Mohammad Reza Taesiri
Leveraging Foundation Models for Video Game Quality Assurance PhD Thesis
2024.
Abstract | BibTeX | Tags: Computer games, Computer vision, Game development, Game testing, Gameplay videos, Machine learning, Software quality
@phdthesis{phd_taesiri,
title = {Leveraging Foundation Models for Video Game Quality Assurance},
author = {Mohammad Reza Taesiri},
year = {2024},
date = {2024-09-25},
abstract = {The video game industry has become a powerhouse in the global entertainment economy. Creating engaging, high-quality games demands intricate development processes and significant resources. As projects grow in complexity and scale, developers often grapple with demanding schedules, tight deadlines, and the risk of burnout. These pressures highlight the need for more efficient development strategies, with quality assurance (QA) emerging as a critical area for optimization.
Artificial Intelligence (AI) has the potential to address these challenges by enhancing the game QA processes in large gaming companies. Specifically, foundation models (large pre-trained AI models) offer promising applications to improve these processes. Exploring novel uses of these advanced AI models could reveal their potential and limitations in optimizing game development workflows, potentially alleviating some of the industry’s pressing issues and facilitating the creation of high-quality, engaging games.
In this thesis, my goal is to improve video game testing processes by leveraging foundation models to ensure the final product reaches a desirable quality. I explore new opportunities that foundation models bring to game testing, from searching for instances of game bugs within video repositories to assisting human testers in catching bugs, through three studies:
First, I investigate the utility of image-text foundation models in retrieving gameplay videos. In this study, I create a video search engine designed to help developers efficiently search video repositories for examples of video game bugs using textual descriptions. For example, developers can find all instances of a bug by using a textual description of the bug, such as a horse flying in the air. This study lays the groundwork for AI-based game QA processes, with results demonstrating significant potential.
Next, I introduce GlitchBench, a benchmarking dataset of video game glitches and anomalies designed to assess state-of-the-art large multimodal models, such as GPT-4V, in detecting and understanding game bugs. This extensive dataset includes a wide range of images depicting various glitches, sourced from both online platforms and synthetic sets created within the Unity game engine. GlitchBench includes both common and rare glitches encountered in the video game quality assurance process. The findings from this study highlight both the promise and limitations of existing models, particularly in unusual and rare cases.
Lastly, I introduce VideoGameBunny, a large multimodal model specifically trained for video game content, accompanied by a dataset of 389,565 image-instruction pairs. My analysis demonstrates that VideoGameBunny outperforms much larger models in video game understanding tasks while using 4.2× fewer parameters. This result underscores the effectiveness and promise of using a high-quality dataset to improve models’ understanding of video games, thus making them more effective in the game QA process.
Future work should focus on enhancing the generalization and robustness of AI models in the gaming context, particularly through better integration of vision and language components. This integration could be achieved using either early or late fusion methods. For late fusion methods, where two pre-trained models are connected, better alignment between these components can be achieved through improved training data and strategies. Alternatively, early fusion techniques, which involve training both components simultaneously to enhance their integration, can overcome many issues that existing models have.},
keywords = {Computer games, Computer vision, Game development, Game testing, Gameplay videos, Machine learning, Software quality},
pubstate = {published},
tppubtype = {phdthesis}
}
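The future-work discussion in the abstract above contrasts early and late fusion of vision and language components. As a rough illustration of the late-fusion idea (two frozen pre-trained models connected by a small trainable bridge), the following Python sketch projects vision-encoder features into an LLM's embedding space; the dimensions, module names, and overall design are illustrative assumptions, not the thesis's actual architecture.

# Minimal late-fusion sketch: a frozen vision encoder and a frozen LLM are
# bridged by a small trainable projection, so image features can be fed to
# the LLM as if they were token embeddings. Dimensions are illustrative.
import torch
import torch.nn as nn

class LateFusionBridge(nn.Module):
    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        # Only this projection would be trained; both backbones stay frozen.
        self.projector = nn.Linear(vision_dim, llm_dim)

    def forward(self, image_features, text_embeddings):
        # image_features: (batch, n_patches, vision_dim) from the vision encoder
        # text_embeddings: (batch, n_tokens, llm_dim) from the LLM's embedding layer
        visual_tokens = self.projector(image_features)              # (batch, n_patches, llm_dim)
        return torch.cat([visual_tokens, text_embeddings], dim=1)   # visual "tokens" prepended

bridge = LateFusionBridge()
fused = bridge(torch.randn(1, 256, 1024), torch.randn(1, 32, 4096))
print(fused.shape)  # torch.Size([1, 288, 4096])

Early fusion, by contrast, would train the vision and language components jointly rather than bolting a projection between two frozen models.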
Mohammad Reza Taesiri; Tianjun Feng; Anh Nguyen; Cor-Paul Bezemer
GlitchBench: Can large multimodal models detect video game glitches? Inproceedings
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
Abstract | BibTeX | Tags: Computer games, Foundation models, Game development, Gameplay videos, LLM
@inproceedings{TaesiriCVPR2024,
title = {GlitchBench: Can large multimodal models detect video game glitches?},
author = {Mohammad Reza Taesiri and Tianjun Feng and Anh Nguyen and Cor-Paul Bezemer},
year = {2024},
date = {2024-06-15},
urldate = {2024-03-15},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
abstract = {Large multimodal models (LMMs) have evolved from large language models (LLMs) to integrate multiple input modalities, such as visual inputs. This integration augments the capacity of LLMs for tasks requiring visual comprehension and reasoning. However, the extent and limitations of their enhanced abilities are not fully understood, especially when it comes to real-world tasks. To address this gap, we introduce GlitchBench, a novel benchmark derived from video game quality assurance tasks, to test and evaluate the reasoning capabilities of LMMs. Our benchmark is curated from a variety of unusual and glitched scenarios from video games and aims to challenge both the visual and linguistic reasoning powers of LMMs in detecting and interpreting out-of-the-ordinary events. We evaluate multiple state-of-the-art LMMs, and we show that GlitchBench presents a new challenge for these models. Code and data are available at: https://glitchbench.github.io/},
keywords = {Computer games, Foundation models, Game development, Gameplay videos, LLM},
pubstate = {published},
tppubtype = {inproceedings}
}
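To make the evaluation setting concrete, the Python sketch below shows one plausible way to ask a GPT-4V-class model about a glitched screenshot through the OpenAI chat API. The prompt wording, model name, and file path are illustrative assumptions, not the benchmark's actual harness; the released code and data are at https://glitchbench.github.io/.

# Illustrative query to a vision-capable chat model about a game screenshot.
import base64
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def describe_anomaly(image_path: str) -> str:
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model="gpt-4o",  # any vision-capable chat model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "What is unusual about this video game screenshot?"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

print(describe_anomaly("screenshot.png"))  # placeholder path to a glitched frame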
Mohammad Reza Taesiri; Finlay Macklon; Sarra Habchi; Cor-Paul Bezemer
Searching bug instances in gameplay video repositories Journal Article
IEEE Transactions on Games, 2024.
Abstract | BibTeX | Tags: Bug report, Computer games, Game development, Gameplay videos, Gaming
@article{TaesiriTG2024,
title = {Searching bug instances in gameplay video repositories},
author = {Mohammad Reza Taesiri and Finlay Macklon and Sarra Habchi and Cor-Paul Bezemer},
year = {2024},
date = {2024-01-17},
urldate = {2024-01-17},
journal = {IEEE Transactions on Games},
abstract = {Gameplay videos offer valuable insights into player interactions and game responses, particularly data about game bugs. Despite the abundance of gameplay videos online, extracting useful information remains a challenge. This paper introduces a method for searching and extracting relevant videos from extensive video repositories using English text queries. Our approach requires no external information, like video metadata; it solely depends on video content. Leveraging the zero-shot transfer capabilities of the Contrastive Language-Image Pre-Training (CLIP) model, our approach does not require any data labeling or training. To evaluate our approach, we present the GamePhysics dataset, comprising 26,954 videos from 1,873 games that were collected from the GamePhysics section on the Reddit website. Our approach shows promising results in our extensive analysis of simple and compound queries, indicating that our method is useful for detecting objects and events in gameplay videos. Moreover, we assess the effectiveness of our method by analyzing a carefully annotated dataset of 220 gameplay videos. The results of our study demonstrate the potential of our approach for applications such as the creation of a video search tool tailored to identifying video game bugs, which could greatly benefit Quality Assurance (QA) teams in finding and reproducing bugs. The code and data used in this paper can be found at https://zenodo.org/records/10211390},
keywords = {Bug report, Computer games, Game development, Gameplay videos, Gaming},
pubstate = {published},
tppubtype = {article}
}
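The approach rests on CLIP's zero-shot transfer: a text query and sampled video frames are embedded in a shared space and compared by similarity, with no labeling or training. The following Python sketch illustrates this frame-level matching with the Hugging Face CLIP implementation; the checkpoint, frame files, and query are placeholder assumptions rather than the paper's exact pipeline (released at https://zenodo.org/records/10211390).

# Zero-shot text-to-frame matching with CLIP (sketch).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Placeholder frame files sampled from one gameplay video
frames = [Image.open(p) for p in ["frame_000.jpg", "frame_030.jpg", "frame_060.jpg"]]
query = "a horse flying in the air"

inputs = processor(text=[query], images=frames, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Similarity logits between the single text query and each frame; higher means a better match.
scores = outputs.logits_per_text.squeeze(0)
best = int(scores.argmax())
print(f"best-matching frame index: {best}, score: {scores[best].item():.2f}")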
Mohammad Reza Taesiri; Finlay Macklon; Cor-Paul Bezemer
CLIP meets GamePhysics: Towards bug identification in gameplay videos using zero-shot transfer learning Inproceedings
International Conference on Mining Software Repositories (MSR), 2022.
Abstract | BibTeX | Tags: Bug report, Computer games, Game development, Gameplay videos, Gaming
@inproceedings{TaesiriMSR2022,
title = {CLIP meets GamePhysics: Towards bug identification in gameplay videos using zero-shot transfer learning},
author = {Mohammad Reza Taesiri and Finlay Macklon and Cor-Paul Bezemer},
year = {2022},
date = {2022-03-24},
urldate = {2022-03-24},
booktitle = {International Conference on Mining Software Repositories (MSR)},
abstract = {Gameplay videos contain rich information about how players interact with the game and how the game responds. Sharing gameplay videos on social media platforms, such as Reddit, has become a common practice for many players. Often, players will share gameplay videos that showcase video game bugs. Such gameplay videos are software artifacts that can be utilized for game testing, as they provide insight for bug analysis. Although large repositories of gameplay videos exist, parsing and mining them in an effective and structured fashion has still remained a big challenge. In this paper, we propose a search method that accepts any English text query as input to retrieve relevant videos from large repositories of gameplay videos. Our approach does not rely on any external information (such as video metadata); it works solely based on the content of the video. By leveraging the zero-shot transfer capabilities of the Contrastive Language-Image Pre-Training (CLIP) model, our approach does not require any data labeling or training. To evaluate our approach, we present the GamePhysics dataset consisting of 26,954 videos from 1,873 games, that were collected from the GamePhysics section on the Reddit website. Our approach shows promising results in our extensive analysis of simple queries, compound queries, and bug queries, indicating that our approach is useful for object and event detection in gameplay videos. An example application of our approach is as a gameplay video search engine to aid in reproducing video game bugs. Please visit the following link for the code and the data: https://asgaardlab.github.io/CLIPxGamePhysics/},
keywords = {Bug report, Computer games, Game development, Gameplay videos, Gaming},
pubstate = {published},
tppubtype = {inproceedings}
}
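Because queries are answered at the repository level, per-frame similarities have to be pooled into a single score per video before ranking. The Python sketch below shows one plausible pooling scheme (mean of the top-k frame scores); the pooling choice and the dummy scorer are illustrative assumptions, not necessarily the aggregation used in the paper.

# Rank whole videos for a text query by pooling frame-level similarity scores.
from typing import Callable, Dict, List, Tuple

def rank_videos(
    videos: Dict[str, list],                          # video id -> list of sampled frames
    query: str,
    frame_score: Callable[[list, str], List[float]],  # e.g. a CLIP-based scorer as sketched above
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    ranked = []
    for video_id, frames in videos.items():
        scores = sorted(frame_score(frames, query), reverse=True)
        pooled = sum(scores[:top_k]) / min(top_k, len(scores))  # mean of the top-k frame scores
        ranked.append((video_id, pooled))
    return sorted(ranked, key=lambda item: item[1], reverse=True)

# Toy usage with a dummy scorer; in practice frame_score would wrap a CLIP model.
toy_videos = {"clip_a": [1, 2, 3], "clip_b": [4, 5]}
print(rank_videos(toy_videos, "a horse flying in the air",
                  lambda frames, q: [0.1 * f for f in frames]))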
Dayi Lin; Cor-Paul Bezemer; Ahmed E. Hassan
Identifying Gameplay Videos that Exhibit Bugs in Computer Games Journal Article
Empirical Software Engineering Journal (EMSE), 2019.
Abstract | BibTeX | Tags: Bug report, Computer games, Gameplay videos, Steam
@article{Lin2019videos,
title = {Identifying Gameplay Videos that Exhibit Bugs in Computer Games},
author = {Dayi Lin and Cor-Paul Bezemer and Ahmed E. Hassan},
year = {2019},
date = {2019-05-21},
urldate = {2019-05-21},
journal = {Empirical Software Engineering Journal (EMSE)},
abstract = {With the rapid growing market and competition in the gaming industry, it is challenging to develop a successful game, making the quality of games very important. To improve the quality of games, developers commonly use gamer-submitted bug reports to locate bugs in games. Recently, gameplay videos have become popular in the gaming community. A few of these videos showcase a bug, offering developers a new opportunity to collect context-rich bug information.
In this paper, we investigate whether videos that showcase a bug can automatically be identified from the metadata of gameplay videos that are readily available online. Such bug videos could then be used as a supplemental source of bug information for game developers. We studied the number of gameplay videos on the Steam platform, one of the most popular digital game distribution platforms, and the difficulty of identifying bug videos from these gameplay videos. We show that naïve approaches such as using keywords to search for bug videos are time-consuming and imprecise. We propose an approach which uses a random forest classifier to rank gameplay videos based on their likelihood of being a bug video. Our proposed approach achieves a precision that is 43% higher than that of the naïve keyword searching approach on a manually labelled dataset of 96 videos. In addition, by evaluating 1,400 videos that are identified by our approach as bug videos, we calculated that our approach has both a mean average precision at 10 and a mean average precision at 100 of 0.91. Our study demonstrates that it is feasible to automatically identify gameplay videos that showcase a bug.},
keywords = {Bug report, Computer games, Gameplay videos, Steam},
pubstate = {published},
tppubtype = {article}
}
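The core idea here is to train a random forest on readily available video metadata and rank videos by their predicted probability of showing a bug. The scikit-learn sketch below illustrates that ranking step on synthetic placeholder features; the feature set and data are assumptions for illustration only, not the study's.

# Rank gameplay videos by predicted likelihood of showing a bug (sketch).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Placeholder metadata features, e.g. [title keyword hit, duration (s), like ratio]
X_train = rng.random((200, 3))
y_train = rng.integers(0, 2, 200)              # 1 = video labelled as showing a bug

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

X_new = rng.random((10, 3))                    # metadata of unlabelled videos
bug_probability = clf.predict_proba(X_new)[:, 1]
ranking = np.argsort(-bug_probability)          # most likely bug videos first
print(ranking, bug_probability[ranking])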