Mohammad Reza Taesiri
Leveraging Foundation Models for Video Game Quality Assurance PhD Thesis
2024.
Abstract | BibTeX | Tags: Computer games, Computer vision, Game development, Game testing, Gameplay videos, Machine learning, Software quality
@phdthesis{phd_taesiri,
title = {Leveraging Foundation Models for Video Game Quality Assurance},
author = {Mohammad Reza Taesiri},
year = {2024},
date = {2024-09-25},
abstract = {The video game industry has become a powerhouse in the global entertainment economy. Creating engaging, high-quality games demands intricate development processes and significant resources. As projects grow in complexity and scale, developers often grapple with demanding schedules, tight deadlines, and the risk of burnout. These pressures highlight the need for more efficient development strategies, with quality assurance (QA) emerging as a critical area for optimization.
Artificial Intelligence (AI) has the potential to address these challenges by enhancing the game QA processes in large gaming companies. Specifically, foundation models (large pre-trained AI models) offer promising applications to improve these processes. Exploring novel uses of these advanced AI models could reveal their potential and limitations in optimizing game development workflows, potentially alleviating some of the industry’s pressing issues and facilitating the creation of high-quality, engaging games.
In this thesis, my goal is to improve video game testing processes by leveraging foundation models to ensure the final product reaches a desirable quality. I explore new opportunities that foundation models bring to game testing, from searching for instances of game bugs within video repositories to assisting human testers in catching bugs, through three studies:
First, I investigate the utility of image-text foundation models in retrieving gameplay videos. In this study, I create a video search engine designed to help developers efficiently search video repositories for examples of video game bugs using textual descriptions. For example, developers can find all instances of a bug by using a textual description of the bug, such as a horse flying in the air. This study lays the groundwork for AI-based game QA processes, with results demonstrating significant potential.
Next, I introduce GlitchBench, a benchmarking dataset of video game glitches and anomalies designed to assess state-of-the-art large multimodal models, such as GPT-4V, in detecting and understanding game bugs. This extensive dataset includes a wide range of images depicting various glitches, sourced from both online platforms and synthetic sets created within the Unity game engine. GlitchBench includes both common and rare glitches encountered in the video game quality assurance process. The findings from this study highlight both the promise and limitations of existing models, particularly in unusual and rare cases.
Lastly, I introduce VideoGameBunny, a large multimodal model specifically trained for video game content, accompanied by a dataset of 389,565 image-instruction pairs. My analysis demonstrates that VideoGameBunny outperforms much larger models in video game understanding tasks while using 4.2× fewer parameters. This result underscores the effectiveness and promise of using a high-quality dataset to improve models’ understanding of video games, thus making them more effective in the game QA process.
Future work should focus on enhancing the generalization and robustness of AI models in the gaming context, particularly through better integration of vision and language components. This integration could be achieved using either early or late fusion methods. For late fusion methods, where two pre-trained models are connected, better alignment between these components can be achieved through improved training data and strategies. Alternatively, early fusion techniques, which involve training both components simultaneously to enhance their integration, can overcome many issues that existing models have.},
keywords = {Computer games, Computer vision, Game development, Game testing, Gameplay videos, Machine learning, Software quality},
pubstate = {published},
tppubtype = {phdthesis}
}
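The future-work discussion in the abstract above contrasts early and late fusion of vision and language components. As a rough illustration of the late-fusion idea (two frozen pre-trained models connected by a small trainable bridge), the following Python sketch projects vision-encoder features into an LLM's embedding space; the dimensions, module names, and overall design are illustrative assumptions, not the thesis's actual architecture.

# Minimal late-fusion sketch: a frozen vision encoder and a frozen LLM are
# bridged by a small trainable projection, so image features can be fed to
# the LLM as if they were token embeddings. Dimensions are illustrative.
import torch
import torch.nn as nn

class LateFusionBridge(nn.Module):
    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        # Only this projection would be trained; both backbones stay frozen.
        self.projector = nn.Linear(vision_dim, llm_dim)

    def forward(self, image_features, text_embeddings):
        # image_features: (batch, n_patches, vision_dim) from the vision encoder
        # text_embeddings: (batch, n_tokens, llm_dim) from the LLM's embedding layer
        visual_tokens = self.projector(image_features)              # (batch, n_patches, llm_dim)
        return torch.cat([visual_tokens, text_embeddings], dim=1)   # visual "tokens" prepended

bridge = LateFusionBridge()
fused = bridge(torch.randn(1, 256, 1024), torch.randn(1, 32, 4096))
print(fused.shape)  # torch.Size([1, 288, 4096])

Early fusion, by contrast, would train the vision and language components jointly rather than bolting a projection between two frozen models.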
Mohammad Reza Taesiri; Tianjun Feng; Anh Nguyen; Cor-Paul Bezemer
GlitchBench: Can large multimodal models detect video game glitches? Inproceedings
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
Abstract | BibTeX | Tags: Computer games, Foundation models, Game development, Gameplay videos, LLM
@inproceedings{TaesiriCVPR2024,
title = {GlitchBench: Can large multimodal models detect video game glitches?},
author = {Mohammad Reza Taesiri and Tianjun Feng and Anh Nguyen and Cor-Paul Bezemer},
year = {2024},
date = {2024-06-15},
urldate = {2024-03-15},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
abstract = {Large multimodal models (LMMs) have evolved from large language models (LLMs) to integrate multiple input modalities, such as visual inputs. This integration augments the capacity of LLMs for tasks requiring visual comprehension and reasoning. However, the extent and limitations of their enhanced abilities are not fully understood, especially when it comes to real-world tasks. To address this gap, we introduce GlitchBench, a novel benchmark derived from video game quality assurance tasks, to test and evaluate the reasoning capabilities of LMMs. Our benchmark is curated from a variety of unusual and glitched scenarios from video games and aims to challenge both the visual and linguistic reasoning powers of LMMs in detecting and interpreting out-of-the-ordinary events. We evaluate multiple state-of-the-art LMMs, and we show that GlitchBench presents a new challenge for these models. Code and data are available at: https://glitchbench.github.io/},
keywords = {Computer games, Foundation models, Game development, Gameplay videos, LLM},
pubstate = {published},
tppubtype = {inproceedings}
}
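To make the evaluation setting concrete, the Python sketch below shows one plausible way to ask a GPT-4V-class model about a glitched screenshot through the OpenAI chat API. The prompt wording, model name, and file path are illustrative assumptions, not the benchmark's actual harness; the released code and data are at https://glitchbench.github.io/.

# Illustrative query to a vision-capable chat model about a game screenshot.
import base64
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def describe_anomaly(image_path: str) -> str:
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model="gpt-4o",  # any vision-capable chat model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "What is unusual about this video game screenshot?"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

print(describe_anomaly("screenshot.png"))  # placeholder path to a glitched frame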
Mohammad Reza Taesiri; Finlay Macklon; Sarra Habchi; Cor-Paul Bezemer
Searching bug instances in gameplay video repositories Journal Article
IEEE Transactions on Games, 2024.
Abstract | BibTeX | Tags: Bug report, Computer games, Game development, Gameplay videos, Gaming
@article{TaesiriTG2024,
title = {Searching bug instances in gameplay video repositories},
author = {Mohammad Reza Taesiri and Finlay Macklon and Sarra Habchi and Cor-Paul Bezemer},
year = {2024},
date = {2024-01-17},
urldate = {2024-01-17},
journal = {IEEE Transactions on Games},
abstract = {Gameplay videos offer valuable insights into player interactions and game responses, particularly data about game bugs. Despite the abundance of gameplay videos online, extracting useful information remains a challenge. This paper introduces a method for searching and extracting relevant videos from extensive video repositories using English text queries. Our approach requires no external information, like video metadata; it solely depends on video content. Leveraging the zero-shot transfer capabilities of the Contrastive Language-Image Pre-Training (CLIP) model, our approach does not require any data labeling or training. To evaluate our approach, we present the GamePhysics dataset, comprising 26,954 videos from 1,873 games that were collected from the GamePhysics section on the Reddit website. Our approach shows promising results in our extensive analysis of simple and compound queries, indicating that our method is useful for detecting objects and events in gameplay videos. Moreover, we assess the effectiveness of our method by analyzing a carefully annotated dataset of 220 gameplay videos. The results of our study demonstrate the potential of our approach for applications such as the creation of a video search tool tailored to identifying video game bugs, which could greatly benefit Quality Assurance (QA) teams in finding and reproducing bugs. The code and data used in this paper can be found at https://zenodo.org/records/10211390},
keywords = {Bug report, Computer games, Game development, Gameplay videos, Gaming},
pubstate = {published},
tppubtype = {article}
}
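The approach rests on CLIP's zero-shot transfer: a text query and sampled video frames are embedded in a shared space and compared by similarity, with no labeling or training. The following Python sketch illustrates this frame-level matching with the Hugging Face CLIP implementation; the checkpoint, frame files, and query are placeholder assumptions rather than the paper's exact pipeline (released at https://zenodo.org/records/10211390).

# Zero-shot text-to-frame matching with CLIP (sketch).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Placeholder frame files sampled from one gameplay video
frames = [Image.open(p) for p in ["frame_000.jpg", "frame_030.jpg", "frame_060.jpg"]]
query = "a horse flying in the air"

inputs = processor(text=[query], images=frames, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# Similarity logits between the single text query and each frame; higher means a better match.
scores = outputs.logits_per_text.squeeze(0)
best = int(scores.argmax())
print(f"best-matching frame index: {best}, score: {scores[best].item():.2f}")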
Mohammad Reza Taesiri; Finlay Macklon; Cor-Paul Bezemer
CLIP meets GamePhysics: Towards bug identification in gameplay videos using zero-shot transfer learning Inproceedings
International Conference on Mining Software Repositories (MSR), 2022.
Abstract | BibTeX | Tags: Bug report, Computer games, Game development, Gameplay videos, Gaming
@inproceedings{TaesiriMSR2022,
title = {CLIP meets GamePhysics: Towards bug identification in gameplay videos using zero-shot transfer learning},
author = {Mohammad Reza Taesiri and Finlay Macklon and Cor-Paul Bezemer},
year = {2022},
date = {2022-03-24},
urldate = {2022-03-24},
booktitle = {International Conference on Mining Software Repositories (MSR)},
abstract = {Gameplay videos contain rich information about how players interact with the game and how the game responds. Sharing gameplay videos on social media platforms, such as Reddit, has become a common practice for many players. Often, players will share gameplay videos that showcase video game bugs. Such gameplay videos are software artifacts that can be utilized for game testing, as they provide insight for bug analysis. Although large repositories of gameplay videos exist, parsing and mining them in an effective and structured fashion has still remained a big challenge. In this paper, we propose a search method that accepts any English text query as input to retrieve relevant videos from large repositories of gameplay videos. Our approach does not rely on any external information (such as video metadata); it works solely based on the content of the video. By leveraging the zero-shot transfer capabilities of the Contrastive Language-Image Pre-Training (CLIP) model, our approach does not require any data labeling or training. To evaluate our approach, we present the GamePhysics dataset consisting of 26,954 videos from 1,873 games, that were collected from the GamePhysics section on the Reddit website. Our approach shows promising results in our extensive analysis of simple queries, compound queries, and bug queries, indicating that our approach is useful for object and event detection in gameplay videos. An example application of our approach is as a gameplay video search engine to aid in reproducing video game bugs. Please visit the following link for the code and the data: https://asgaardlab.github.io/CLIPxGamePhysics/},
keywords = {Bug report, Computer games, Game development, Gameplay videos, Gaming},
pubstate = {published},
tppubtype = {inproceedings}
}
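Because queries are answered at the repository level, per-frame similarities have to be pooled into a single score per video before ranking. The Python sketch below shows one plausible pooling scheme (mean of the top-k frame scores); the pooling choice and the dummy scorer are illustrative assumptions, not necessarily the aggregation used in the paper.

# Rank whole videos for a text query by pooling frame-level similarity scores.
from typing import Callable, Dict, List, Tuple

def rank_videos(
    videos: Dict[str, list],                          # video id -> list of sampled frames
    query: str,
    frame_score: Callable[[list, str], List[float]],  # e.g. a CLIP-based scorer as sketched above
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    ranked = []
    for video_id, frames in videos.items():
        scores = sorted(frame_score(frames, query), reverse=True)
        pooled = sum(scores[:top_k]) / min(top_k, len(scores))  # mean of the top-k frame scores
        ranked.append((video_id, pooled))
    return sorted(ranked, key=lambda item: item[1], reverse=True)

# Toy usage with a dummy scorer; in practice frame_score would wrap a CLIP model.
toy_videos = {"clip_a": [1, 2, 3], "clip_b": [4, 5]}
print(rank_videos(toy_videos, "a horse flying in the air",
                  lambda frames, q: [0.1 * f for f in frames]))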
Dayi Lin; Cor-Paul Bezemer; Ahmed E. Hassan
Identifying Gameplay Videos that Exhibit Bugs in Computer Games Journal Article
Empirical Software Engineering Journal (EMSE), 2019.
Abstract | BibTeX | Tags: Bug report, Computer games, Gameplay videos, Steam
@article{Lin2019videos,
title = {Identifying Gameplay Videos that Exhibit Bugs in Computer Games},
author = {Dayi Lin and Cor-Paul Bezemer and Ahmed E. Hassan},
year = {2019},
date = {2019-05-21},
urldate = {2019-05-21},
journal = {Empirical Software Engineering Journal (EMSE)},
abstract = {With the rapid growing market and competition in the gaming industry, it is challenging to develop a successful game, making the quality of games very important. To improve the quality of games, developers commonly use gamer-submitted bug reports to locate bugs in games. Recently, gameplay videos have become popular in the gaming community. A few of these videos showcase a bug, offering developers a new opportunity to collect context-rich bug information.
In this paper, we investigate whether videos that showcase a bug can automatically be identified from the metadata of gameplay videos that are readily available online. Such bug videos could then be used as a supplemental source of bug information for game developers. We studied the number of gameplay videos on the Steam platform, one of the most popular digital game distribution platforms, and the difficulty of identifying bug videos from these gameplay videos. We show that naïve approaches such as using keywords to search for bug videos are time-consuming and imprecise. We propose an approach which uses a random forest classifier to rank gameplay videos based on their likelihood of being a bug video. Our proposed approach achieves a precision that is 43% higher than that of the naïve keyword searching approach on a manually labelled dataset of 96 videos. In addition, by evaluating 1,400 videos that are identified by our approach as bug videos, we calculated that our approach has both a mean average precision at 10 and a mean average precision at 100 of 0.91. Our study demonstrates that it is feasible to automatically identify gameplay videos that showcase a bug.},
keywords = {Bug report, Computer games, Gameplay videos, Steam},
pubstate = {published},
tppubtype = {article}
}
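The core idea here is to train a random forest on readily available video metadata and rank videos by their predicted probability of showing a bug. The scikit-learn sketch below illustrates that ranking step on synthetic placeholder features; the feature set and data are assumptions for illustration only, not the study's.

# Rank gameplay videos by predicted likelihood of showing a bug (sketch).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Placeholder metadata features, e.g. [title keyword hit, duration (s), like ratio]
X_train = rng.random((200, 3))
y_train = rng.integers(0, 2, 200)              # 1 = video labelled as showing a bug

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

X_new = rng.random((10, 3))                    # metadata of unlabelled videos
bug_probability = clf.predict_proba(X_new)[:, 1]
ranking = np.argsort(-bug_probability)          # most likely bug videos first
print(ranking, bug_probability[ranking])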