Mohammad Reza Taesiri; Cor-Paul Bezemer
VIDEOGAMEBUNNY: Towards vision assistants for video games Inproceedings
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2025.
Abstract | BibTeX | Tags: Computer games, Foundation models, Game development, Game testing
@inproceedings{Taesiri_VideoGameBunny,
title = {VIDEOGAMEBUNNY: Towards vision assistants for video games},
author = {Mohammad Reza Taesiri and Cor-Paul Bezemer },
year = {2025},
date = {2025-03-01},
booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
abstract = {Large multimodal models (LMMs) hold substantial promise across various domains, from personal assistance in daily tasks to sophisticated applications like medical diagnostics. However, their capabilities have limitations in the video game domain, such as challenges with scene understanding, hallucinations, and inaccurate descriptions of video game content, especially in open-source models. This paper describes the development of VIDEOGAMEBUNNY, a LLaVA-style model based on Bunny, specifically tailored for understanding images from video games. We release intermediate checkpoints, training logs, and an extensive dataset comprising 185,259 video game images from 413 titles, along with 389,565 image-instruction pairs that include image captions, question-answer pairs, and a JSON representation of 16 elements of 136,974 images. Our experiments show that our high quality game-related data has the potential to make a relatively small model outperform the much larger state-of-the-art model LLaVa-1.6-34b (which has more than 4x the number of parameters). Our study paves the way for future research in video game understanding on tasks such as playing, commentary, and debugging. Code and data are available at: https://videogamebunny.github.io/},
keywords = {Computer games, Foundation models, Game development, Game testing},
pubstate = {published},
tppubtype = {inproceedings}
}
Mohammad Reza Taesiri
Leveraging Foundation Models for Video Game Quality Assurance PhD Thesis
2024.
Abstract | BibTeX | Tags: Computer games, Computer vision, Game development, Game testing, Gameplay videos, Machine learning, Software quality
@phdthesis{phd_taesiri,
title = {Leveraging Foundation Models for Video Game Quality Assurance},
author = {Mohammad Reza Taesiri },
year = {2024},
date = {2024-09-25},
abstract = {The video game industry has become a powerhouse in the global entertainment economy. Creating engaging, high-quality games demands intricate development processes and significant resources. As projects grow in complexity and scale, developers often grapple with demanding schedules, tight deadlines, and the risk of burnout. These pressures highlight the need for more efficient development strategies, with quality assurance (QA) emerging as a critical area for optimization.
Artificial Intelligence (AI) has the potential to address these challenges by enhancing the game QA processes in large gaming companies. Specifically, foundation models—large pre-trained AI models—offer promising applications to improve these processes. Exploring novel uses of these advanced AI models could reveal their potential and limitations in optimizing game development workflows, potentially alleviating some of the industry’s pressing issues and facilitating the creation of high-quality, engaging games.
In this thesis, my goal is to improve video game testing processes by leveraging foundation models to ensure the final product reaches a desirable quality. I explore new opportunities that foundation models bring to game testing, from searching for instances of game bugs within video repositories to assisting human testers in catching bugs, through three studies:
First, I investigate the utility of image-text foundation models in retrieving gameplay videos. In this study, I create a video search engine designed to help developers efficiently search video repositories for examples of video game bugs using textual descriptions. For example, developers can find all instances of a bug by using a textual description of the bug, such as a horse flying in the air. This study lays the groundwork for AI-based game QA processes, with results demonstrating significant potential.
Next, I introduce GlitchBench, a benchmarking dataset of video game glitches and anomalies designed to assess state-of-the-art large multimodal models, such as GPT-4V, in detecting and understanding game bugs. This extensive dataset includes a wide range of images depicting various glitches, sourced from both online platforms and synthetic sets created within the Unity game engine. GlitchBench includes both common and rare glitches encountered in the video game quality assurance process. The findings from this study highlight both the promise and limitations of existing models, particularly in unusual and rare cases.
Lastly, I introduce VideoGameBunny, a large multimodal model specifically trained for video game content, accompanied by a dataset of 389,565 image-instruction pairs. My analysis demonstrates that VideoGameBunny outperforms much larger models in video game understanding tasks while using 4.2× fewer parameters. This result underscores the effectiveness and promise of using a high-quality dataset to improve models’ understanding of video games, thus making them more effective in the game QA process.
Future work should focus on enhancing the generalization and robustness of AI models in the gaming context, particularly through better integration of vision and language components. This integration could be achieved using either early or late fusion methods. For late fusion methods, where two pre-trained models are connected, better alignment between these components can be achieved through improved training data and strategies. Alternatively, early fusion techniques, which involve training both components simultaneously to enhance their integration, can overcome many issues that existing models have.},
keywords = {Computer games, Computer vision, Game development, Game testing, Gameplay videos, Machine learning, Software quality},
pubstate = {published},
tppubtype = {phdthesis}
}
Ian Gauk; Cor-Paul Bezemer
Detecting Discrepancies between Subtitles and Audio in Gameplay Videos with EchoTest Journal Article
IEEE Transactions on Games, 2024.
Abstract | BibTeX | Tags: Accessibility testing, Computer games, Game accessibility, Game development, Game testing
@article{Gauk_EchoTest,
title = {Detecting Discrepancies between Subtitles and Audio in Gameplay Videos with EchoTest},
author = {Ian Gauk and Cor-Paul Bezemer},
year = {2024},
date = {2024-07-30},
journal = {IEEE Transactions on Games},
abstract = {The landscape of accessibility features in video games remains inconsistent, posing challenges for gamers who seek experiences tailored to their needs. Accessibility features such as subtitles are widely used by players but are difficult to test manually due to the large scope of games and the variability in how subtitles can appear.
In this paper, we introduce an automated approach (ECHOTEST) to extract subtitles and spoken audio from a gameplay video, convert them into text, and compare them to detect discrepancies such as typos, desynchronization and missing text. ECHOTEST can be used by game developers to identify discrepancies between subtitles and spoken audio in their games, enabling them to better test the accessibility of their games.
In an empirical study on gameplay videos from 15 popular games, ECHOTEST can verify discrepancies between subtitles and audio with a precision of 98% and a recall of 89%. In addition, ECHOTEST performs well with a precision of 73% and a recall of 99% on a challenging generated benchmark.},
keywords = {Accessibility testing, Computer games, Game accessibility, Game development, Game testing},
pubstate = {published},
tppubtype = {article}
}
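The core check that ECHOTEST automates can be illustrated with a small sketch: align each extracted subtitle line with a speech-to-text transcript and flag lines that have no sufficiently similar counterpart. The sketch below is a simplification under stated assumptions (the function names, similarity threshold, and example lines are hypothetical and not taken from the paper) and uses only the Python standard library.

# Minimal sketch (not the actual ECHOTEST implementation): compare extracted
# subtitle lines against a speech-to-text transcript and flag likely discrepancies.
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Character-level similarity ratio between two normalized strings."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def find_discrepancies(subtitle_lines, transcript_lines, threshold=0.8):
    """Report each subtitle line that has no transcript line similar enough to it."""
    discrepancies = []
    for sub in subtitle_lines:
        best = max((similarity(sub, t) for t in transcript_lines), default=0.0)
        if best < threshold:
            discrepancies.append((sub, best))
    return discrepancies

# Hypothetical example: a typo in the subtitles and a missing subtitle line.
subtitles = ["Welcome to the vilage", "Take the sword"]
transcript = ["welcome to the village", "take the sword", "and follow me"]
print(find_discrepancies(subtitles, transcript))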
Mohammad Reza Taesiri; Tianjun Feng; Anh Nguyen; Cor-Paul Bezemer
GlitchBench: Can large multimodal models detect video game glitches? Inproceedings
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
Abstract | BibTeX | Tags: Computer games, Foundation models, Game development, Gameplay videos, LLM
@inproceedings{TaesiriCVPR2024,
title = {GlitchBench: Can large multimodal models detect video game glitches?},
author = {Mohammad Reza Taesiri and Tianjun Feng and Anh Nguyen and Cor-Paul Bezemer},
year = {2024},
date = {2024-06-15},
urldate = {2024-03-15},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
abstract = {Large multimodal models (LMMs) have evolved from large language models (LLMs) to integrate multiple input modalities, such as visual inputs. This integration augments the capacity of LLMs for tasks requiring visual comprehension and reasoning. However, the extent and limitations of their enhanced abilities are not fully understood, especially when it comes to real-world tasks. To address this gap, we introduce GlitchBench, a novel benchmark derived from video game quality assurance tasks, to test and evaluate the reasoning capabilities of LMMs. Our benchmark is curated from a variety of unusual and glitched scenarios from video games and aims to challenge both the visual and linguistic reasoning powers of LMMs in detecting and interpreting out-of-the-ordinary events. We evaluate multiple state-of-the-art LMMs, and we show that GlitchBench presents a new challenge for these models. Code and data are available at: https://glitchbench.github.io/},
keywords = {Computer games, Foundation models, Game development, Gameplay videos, LLM},
pubstate = {published},
tppubtype = {inproceedings}
}
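A benchmark of this kind is typically scored by prompting a large multimodal model with each glitched screenshot and checking whether its answer mentions the ground-truth anomaly. The sketch below only illustrates that loop; query_lmm is a placeholder stub rather than a real model call, and the keyword-matching criterion and example data are assumptions, not the paper's evaluation harness or metrics.

# Illustrative-only scoring loop for a glitch-detection benchmark; query_lmm is a
# placeholder for whatever multimodal model is under evaluation (hypothetical).
from dataclasses import dataclass

@dataclass
class GlitchExample:
    image_path: str          # screenshot showing a glitch
    keywords: list[str]      # ground-truth terms describing the glitch

def query_lmm(image_path: str, prompt: str) -> str:
    """Stub: replace with a call to the large multimodal model being evaluated."""
    return "The character's arm is stretched and clipping through the wall."

def score(examples: list[GlitchExample], prompt: str = "What is unusual about this image?") -> float:
    """Fraction of examples whose model answer mentions at least one ground-truth keyword."""
    hits = 0
    for ex in examples:
        answer = query_lmm(ex.image_path, prompt).lower()
        if any(k.lower() in answer for k in ex.keywords):
            hits += 1
    return hits / len(examples) if examples else 0.0

examples = [GlitchExample("glitch_001.png", ["stretched", "clipping"])]
print(f"keyword-match accuracy: {score(examples):.2f}")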
Mohammad Reza Taesiri; Finlay Macklon; Sarra Habchi; Cor-Paul Bezemer
Searching bug instances in gameplay video repositories Journal Article
IEEE Transactions on Games, 2024.
Abstract | BibTeX | Tags: Bug report, Computer games, Game development, Gameplay videos, Gaming
@article{TaesiriTG2024,
title = {Searching bug instances in gameplay video repositories},
author = {Mohammad Reza Taesiri and Finlay Macklon and Sarra Habchi and Cor-Paul Bezemer},
year = {2024},
date = {2024-01-17},
urldate = {2024-01-17},
journal = {IEEE Transactions on Games},
abstract = {Gameplay videos offer valuable insights into player interactions and game responses, particularly data about game bugs. Despite the abundance of gameplay videos online, extracting useful information remains a challenge. This paper introduces a method for searching and extracting relevant videos from extensive video repositories using English text queries. Our approach requires no external information, like video metadata; it solely depends on video content. Leveraging the zero-shot transfer capabilities of the Contrastive Language-Image Pre-Training (CLIP) model, our approach does not require any data labeling or training. To evaluate our approach, we present the GamePhysics dataset, comprising 26,954 videos from 1,873 games that were collected from the GamePhysics section on the Reddit website. Our approach shows promising results in our extensive analysis of simple and compound queries, indicating that our method is useful for detecting objects and events in gameplay videos. Moreover, we assess the effectiveness of our method by analyzing a carefully annotated dataset of 220 gameplay videos. The results of our study demonstrate the potential of our approach for applications such as the creation of a video search tool tailored to identifying video game bugs, which could greatly benefit Quality Assurance (QA) teams in finding and reproducing bugs. The code and data used in this paper can be found at https://zenodo.org/records/10211390},
keywords = {Bug report, Computer games, Game development, Gameplay videos, Gaming},
pubstate = {published},
tppubtype = {article}
}
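The retrieval idea described in the abstract, comparing an English query against video content with CLIP and no task-specific training, can be sketched as ranking sampled frames by text-image embedding similarity. The snippet below is a minimal sketch assuming the Hugging Face transformers CLIP checkpoint openai/clip-vit-base-patch32; the frame sampling, aggregation over frames, and file names are hypothetical and much simpler than the paper's pipeline.

# Rough sketch of CLIP-based zero-shot frame retrieval (not the paper's full pipeline):
# rank sampled video frames by their similarity to an English bug query.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def rank_frames(query: str, frame_paths: list[str], top_k: int = 5):
    """Return the frames whose CLIP image embeddings are closest to the text query."""
    images = [Image.open(p).convert("RGB") for p in frame_paths]
    with torch.no_grad():
        img_emb = model.get_image_features(**processor(images=images, return_tensors="pt"))
        txt_emb = model.get_text_features(**processor(text=[query], return_tensors="pt", padding=True))
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    scores = (img_emb @ txt_emb.T).squeeze(1)   # cosine similarity per frame
    best = scores.argsort(descending=True)[:top_k]
    return [(frame_paths[i], scores[i].item()) for i in best]

# Hypothetical usage with frames sampled from gameplay videos:
# print(rank_frames("a horse flying in the air", ["frame_0001.png", "frame_0002.png"]))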
Markos Viggiato; Dale Paas; Cor-Paul Bezemer
Prioritizing Natural Language Test Cases Based on Highly-Used Game Features Inproceedings
Proceedings of the 31st Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE), pp. 1–12, 2023.
Abstract | BibTeX | Tags: Computer games, Game development, Natural language processing, Testing
@inproceedings{ViggiatoFSE2023,
title = {Prioritizing Natural Language Test Cases Based on Highly-Used Game Features},
author = {Markos Viggiato and Dale Paas and Cor-Paul Bezemer },
year = {2023},
date = {2023-12-01},
urldate = {2023-12-01},
booktitle = {Proceedings of the 31st Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE)},
pages = {1--12},
abstract = {Software testing is still a manual activity in many industries, such as the gaming industry. But manually executing tests becomes impractical as the system grows and resources are restricted, mainly in a scenario with short release cycles. Test case prioritization is a commonly used technique to optimize the test execution. However, most prioritization approaches do not work for manual test cases as they require source code information or test execution history, which is often not available in a manual testing scenario. In this paper, we propose a prioritization approach for manual test cases written in natural language based on the tested application features (in particular, highly-used application features). Our approach consists of (1) identifying the tested features from natural language test cases (with zero-shot classification techniques) and (2) prioritizing test cases based on the features that they test. We leveraged the NSGA-II genetic algorithm for the multi-objective optimization of the test case ordering to maximize the coverage of highly-used features while minimizing the cumulative execution time. Our findings show that we can successfully identify the application features covered by test cases using an ensemble of pre-trained models with strong zero-shot capabilities (an F-score of 76.1%). Also, our prioritization approaches can find test case orderings that cover highly-used application features early in the test execution while keeping the time required to execute test cases short. QA engineers can use our approach to focus the test execution on test cases that cover features that are relevant to users.},
keywords = {Computer games, Game development, Natural language processing, Testing},
pubstate = {published},
tppubtype = {inproceedings}
}
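Step (1) of the approach, identifying which application features a natural-language test case exercises via zero-shot classification, can be approximated with an off-the-shelf NLI model. The sketch below uses a single Hugging Face zero-shot-classification pipeline rather than the ensemble of models from the paper; the feature labels, test-case text, and score threshold are hypothetical.

# Sketch of zero-shot feature identification for a natural-language test case.
# Uses one off-the-shelf NLI model; the paper combines an ensemble of such models.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

# Hypothetical feature labels and test-case description.
features = ["inventory", "combat", "multiplayer lobby", "save and load", "settings menu"]
test_case = "Open the backpack, drag the potion onto the hotbar, and verify the slot updates."

result = classifier(test_case, candidate_labels=features, multi_label=True)
covered = [label for label, score in zip(result["labels"], result["scores"]) if score > 0.5]
print(covered)  # features this test case likely exercises

The inferred feature labels would then feed step (2), where an ordering of test cases is searched that covers highly-used features early while keeping cumulative execution time low.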
Markos Viggiato
Leveraging Natural Language Processing Techniques to Improve Manual Game Testing PhD Thesis
2023.
Abstract | BibTeX | Tags: Computer games, Game development, Natural language processing, Testing
@phdthesis{ViggiatoPhD,
title = {Leveraging Natural Language Processing Techniques to Improve Manual Game Testing},
author = {Markos Viggiato },
year = {2023},
date = {2023-01-17},
urldate = {2023-01-17},
abstract = {The gaming industry has experienced a sharp growth in recent years, surpassing other popular entertainment segments, such as the film industry. With the ever-increasing scale of the gaming industry and the fact that players are extremely difficult to satisfy, it has become extremely challenging to develop a successful game. In this context, the quality of games has become a critical issue. Game testing is a widely-performed activity to ensure that games meet the desired quality criteria. However, despite recent advancements in test automation, manual game testing is still prevalent in the gaming industry, with test cases often described in natural language only and consisting of one or more test steps that must be manually performed by the Quality Assurance (QA) engineer (i.e., the tester). This makes game testing challenging and costly. Issues such as redundancy (i.e., when different test cases have the same testing objective) and incompleteness (i.e., when test cases miss one or more steps) become a bigger concern in a manual game testing scenario. In addition, as games become bigger and the number of required test cases increases, it becomes impractical to execute all test cases in a scenario with short game release cycles, for example.
Prior work proposed several approaches to analyze and improve test cases with associated source code. However, there is little research on improving manual game testing. Having higher-quality test cases and optimizing test execution help to reduce wasted developer time and allow testers to use testing resources more effectively, which makes game testing more efficient and effective. In addition, even though players are extremely difficult to satisfy, their priorities are not considered during game testing. In this thesis, we investigate how to improve manual game testing from different perspectives.
In the first part of the thesis, we investigated how we can reduce redundancy in the test suite by identifying similar natural language test cases. We evaluated several unsupervised approaches using text embedding, text similarity, and clustering techniques and showed that we can successfully identify similar test cases with a high performance. We also investigated how we can improve test case descriptions to reduce the number of unclear, ambiguous, and incomplete test cases. We proposed and evaluated an automated framework that leverages statistical and neural language models and (1) provides recommendations to improve test case descriptions, (2) recommends potentially missing steps, and (3) suggests existing similar test cases.
In the second part of the thesis, we investigated how player priorities can be included in the game testing process. We first proposed an approach to prioritize test cases that cover the game features that players use the most, which helps to avoid bugs that could affect a very large number of players. Our approach (1) identifies the game features covered by test cases using an ensemble of zero-shot techniques with a high performance and (2) optimizes the test execution based on highly-used game features covered by test cases. Finally, we investigated how sentiment classifiers perform on game reviews and what issues affect those classifiers. High-performing classifiers can be used to obtain players' sentiments about games and guide testing based on the game features that players like or dislike. We show that, while traditional sentiment classifiers do not perform well, a modern classifier (the OPT-175B Large Language Model) presents a (far) better performance. The research work presented in this thesis provides deep insights, actionable recommendations, and effective and thoroughly evaluated approaches to support QA engineers and developers to improve manual game testing.},
keywords = {Computer games, Game development, Natural language processing, Testing},
pubstate = {published},
tppubtype = {phdthesis}
}
Finlay Macklon; Mohammad Reza Taesiri; Markos Viggiato; Stefan Antoszko; Natalia Romanova; Dale Paas; Cor-Paul Bezemer
Automatically Detecting Visual Bugs in HTML5 <canvas> Games Inproceedings
37th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2022.
BibTeX | Tags: Computer games, Game development, Gaming, Regression testing, Testing, Web applications
@inproceedings{finlay_ase2022,
title = {Automatically Detecting Visual Bugs in HTML5 <canvas> Games},
author = {Finlay Macklon and Mohammad Reza Taesiri and Markos Viggiato and Stefan Antoszko and Natalia Romanova and Dale Paas and Cor-Paul Bezemer},
year = {2022},
booktitle = {37th IEEE/ACM International Conference on Automated Software Engineering (ASE)},
keywords = {Computer games, Game development, Gaming, Regression testing, Testing, Web applications},
pubstate = {published},
tppubtype = {inproceedings}
}
Mohammad Reza Taesiri; Finlay Macklon; Cor-Paul Bezemer
CLIP meets GamePhysics: Towards bug identification in gameplay videos using zero-shot transfer learning Inproceedings
International Conference on Mining Software Repositories (MSR), 2022.
Abstract | BibTeX | Tags: Bug report, Computer games, Game development, Gameplay videos, Gaming
@inproceedings{TaesiriMSR2022,
title = {CLIP meets GamePhysics: Towards bug identification in gameplay videos using zero-shot transfer learning},
author = {Mohammad Reza Taesiri and Finlay Macklon and Cor-Paul Bezemer},
year = {2022},
date = {2022-03-24},
urldate = {2022-03-24},
booktitle = {International Conference on Mining Software Repositories (MSR)},
abstract = {Gameplay videos contain rich information about how players interact with the game and how the game responds. Sharing gameplay videos on social media platforms, such as Reddit, has become a common practice for many players. Often, players will share gameplay videos that showcase video game bugs. Such gameplay videos are software artifacts that can be utilized for game testing, as they provide insight for bug analysis. Although large repositories of gameplay videos exist, parsing and mining them in an effective and structured fashion has still remained a big challenge. In this paper, we propose a search method that accepts any English text query as input to retrieve relevant videos from large repositories of gameplay videos. Our approach does not rely on any external information (such as video metadata); it works solely based on the content of the video. By leveraging the zero-shot transfer capabilities of the Contrastive Language-Image Pre-Training (CLIP) model, our approach does not require any data labeling or training. To evaluate our approach, we present the GamePhysics dataset consisting of 26,954 videos from 1,873 games, that were collected from the GamePhysics section on the Reddit website. Our approach shows promising results in our extensive analysis of simple queries, compound queries, and bug queries, indicating that our approach is useful for object and event detection in gameplay videos. An example application of our approach is as a gameplay video search engine to aid in reproducing video game bugs. Please visit the following link for the code and the data: https://asgaardlab.github.io/CLIPxGamePhysics/},
keywords = {Bug report, Computer games, Game development, Gameplay videos, Gaming},
pubstate = {published},
tppubtype = {inproceedings}
}
Arthur V. Kamienski
Studying Trends, Topics, and Duplicate Questions on Q&A Websites for Game Developers Masters Thesis
University of Alberta, 2021.
Abstract | BibTeX | Tags: Computer games, Q&A websites
@mastersthesis{msc_arthur,
title = {Studying Trends, Topics, and Duplicate Questions on Q&A Websites for Game Developers},
author = {Arthur V. Kamienski},
year = {2021},
date = {2021-09-29},
urldate = {2021-09-29},
school = {University of Alberta},
abstract = {The game development industry is growing and there is a high demand for developers that can produce high-quality games. These developers need resources to learn and improve the skills required to build those games in a reliable and easy manner. Question and Answer (Q&A) websites are learning resources that are commonly used by software developers to share knowledge and acquire the information they need. However, we still know little about how game developers use and interact with Q&A websites. In this thesis, we analyze the largest Q&A websites that discuss game development to understand how effective they are as learning resources and what can be improved to build a better Q&A community for their users.
In the first part of this thesis, we analyzed data collected from four Q&A websites, namely Unity Answers, the Unreal Engine 4 (UE4) AnswerHub, the Game Development Stack Exchange, and Stack Overflow, to assess their effectiveness in helping game developers. We also used the 347 responses collected from a survey we ran with game developers to gauge their perception of Q&A websites. We found that the studied websites are in decline, with their activity and effectiveness decreasing over the last few years and users having an overall negative view of the studied Q&A communities. We also characterized the topics discussed in those websites using a latent Dirichlet allocation (LDA) model, and analyze how those topics differ across websites. Finally, we give recommendations to guide developers to the websites that are most effective in answering the types of questions they have, which could help the websites in overcoming their decline.
In the second part of the thesis, we explored how we can further help Q&A websites for game developers by automatically identifying duplicate questions. Duplicate questions have a negative impact on Q&A websites by overloading them with questions that have already been answered. Therefore, we analyzed the performance of seven unsupervised and pre-trained techniques on the task of detecting duplicate questions on Q&A websites for game developers. We achieved the highest performance when comparing all the text content of questions and their answers using a pre-trained technique based on MPNet. Furthermore, we could almost double the performance by combining all of the techniques into a single question similarity score using supervised models. Lastly, we show that the supervised models can be used on websites different from the ones they were trained on with little to no decrease in performance. Our findings can be used by Q&A websites and future researchers to build better systems for duplicate question detection, which can ultimately provide game developers with better Q&A communities.},
keywords = {Computer games, Q&A websites},
pubstate = {published},
tppubtype = {mastersthesis}
}
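The best-performing unsupervised technique mentioned in the abstract, comparing question text with an MPNet-based encoder, can be sketched with the sentence-transformers library. The model name, similarity threshold, and example questions below are illustrative assumptions rather than the thesis configuration.

# Sketch of unsupervised duplicate-question detection with an MPNet sentence encoder.
# Model name and threshold are illustrative, not the thesis setup.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-mpnet-base-v2")

def is_duplicate(question_a: str, question_b: str, threshold: float = 0.75) -> bool:
    """Embed both questions (title, body, and answers could be concatenated here)
    and flag them as duplicates when cosine similarity exceeds the threshold."""
    emb = model.encode([question_a, question_b], convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item() >= threshold

print(is_duplicate(
    "How do I rotate a GameObject towards the mouse cursor in Unity?",
    "Unity: make my player object face the mouse position",
))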
Quang N. Vu; Cor-Paul Bezemer
Improving the Discoverability of Indie Games by Leveraging their Similarity to Top-Selling Games: Identifying Important Requirements of a Recommender System Inproceedings
International Conference on the Foundations of Digital Games (FDG), pp. 1–12, 2021.
Abstract | BibTeX | Tags: Computer games, Game discoverability, Indie games, itch.io, Steam
@inproceedings{Quang21,
title = {Improving the Discoverability of Indie Games by Leveraging their Similarity to Top-Selling Games: Identifying Important Requirements of a Recommender System},
author = {Quang N. Vu and Cor-Paul Bezemer},
year = {2021},
date = {2021-04-07},
urldate = {2021-04-07},
booktitle = {International Conference on the Foundations of Digital Games (FDG)},
pages = {1--12},
abstract = {Indie games often lack visibility as compared to top-selling games due to their limited marketing budget and the fact that there are a large number of indie games. Players of top-selling games usually like certain types of games or certain game elements such as theme, gameplay, storyline. Therefore, indie games could leverage their shared game elements with top-selling games to get discovered. In this paper, we propose an approach to improve the discoverability of indie games by recommending similar indie games to gamers of top-selling games. We first matched 2,830 itch.io indie games to 326 top-selling Steam games. We then contacted the indie game developers for evaluation feedback and suggestions. We found that the majority of them (67.9%) who offered verbose responses show positive support for our approach. We also analyzed the reasons for bad recommendations and the suggestions by indie game developers to lay out the important requirements for such a recommendation system. The most important ones are: a standardized and extensive tag and genre ontology system is needed to bridge the two platforms, the expectations of players of top-selling games should be managed to avoid disappointment, a player’s preferences should be integrated when making recommendations, a standardized age restriction rule is needed, and finally, the recommendation tool should also show indie games that are the least similar or less popular.},
keywords = {Computer games, Game discoverability, Indie games, itch.io, Steam},
pubstate = {published},
tppubtype = {inproceedings}
}
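The matching idea, recommending an indie game to players of top-selling games that share game elements with it, can be illustrated with a toy tag-overlap score. The game titles, tags, and Jaccard measure below are assumptions for illustration only; the paper's matching of 2,830 itch.io games to 326 Steam games is considerably more involved.

# Toy illustration of tag-based matching between an indie game and top-selling games.
# Titles and tags are made up; this is not the paper's matching pipeline.
def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two tag sets: intersection size over union size."""
    return len(a & b) / len(a | b) if a | b else 0.0

top_selling = {
    "Top Seller A": {"roguelike", "pixel art", "permadeath"},
    "Top Seller B": {"farming", "crafting", "relaxing"},
}
indie_game_tags = {"roguelike", "pixel art", "turn-based"}

matches = sorted(
    ((title, jaccard(indie_game_tags, tags)) for title, tags in top_selling.items()),
    key=lambda x: x[1],
    reverse=True,
)
print(matches)  # recommend the indie game to players of the best-matching titles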
Markos Viggiato; Dayi Lin; Abram Hindle; Cor-Paul Bezemer
What Causes Wrong Sentiment Classifications of Game Reviews? Journal Article
IEEE Transactions on Games, pp. 1–14, 2021.
Abstract | BibTeX | Tags: Computer games, Natural language processing, Sentiment analysis, Steam
@article{markos2021sentiment,
title = {What Causes Wrong Sentiment Classifications of Game Reviews?},
author = {Markos Viggiato and Dayi Lin and Abram Hindle and Cor-Paul Bezemer},
year = {2021},
date = {2021-04-05},
urldate = {2021-04-05},
journal = {IEEE Transactions on Games},
pages = {1--14},
institution = {University of Alberta},
abstract = {Sentiment analysis is a popular technique to identify the sentiment of a piece of text. Several different domains have been targeted by sentiment analysis research, such as Twitter, movie reviews, and mobile app reviews. Although several techniques have been proposed, the performance of current sentiment analysis techniques are still far from acceptable, mainly when applied in domains on which they were not trained. In addition, the causes of wrong classifications are not clear. In this paper, we study how sentiment analysis performs on game reviews. We first report the results of a large scale empirical study on the performance of widely-used sentiment classifiers on game reviews. Then, we investigate the root causes for the wrong classifications and quantify the impact of each cause on the overall performance. We study three existing classifiers: Stanford CoreNLP, NLTK, and SentiStrength. Our results show that most classifiers do not perform well on game reviews, with the best one being NLTK (with an AUC of 0.70). We also identified four main causes for wrong classifications, such as reviews that point out advantages and disadvantages of the game, which might confuse the classifier. The identified causes are not trivial to be resolved and we call upon sentiment analysis and game researchers and developers to prioritize a research agenda that investigates how the performance of sentiment analysis of game reviews can be improved, for instance by developing techniques that can automatically deal with specific game-related issues of reviews (e.g., reviews with advantages and disadvantages). Finally, we show that training sentiment classifiers on reviews that are stratified by the game genre is effective.},
keywords = {Computer games, Natural language processing, Sentiment analysis, Steam},
pubstate = {published},
tppubtype = {article}
}
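For context, this is the kind of off-the-shelf classification the study evaluates: feeding a game review to a general-purpose sentiment analyzer. The example below uses NLTK's VADER analyzer (one possible NLTK-based setup, not necessarily the exact configuration studied) on a made-up review that mixes advantages and disadvantages, the failure case highlighted in the abstract.

# Example of off-the-shelf sentiment classification on a game review,
# using NLTK's VADER analyzer; the review text is invented for illustration.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)
analyzer = SentimentIntensityAnalyzer()

review = ("Great art style and soundtrack, but the game crashes every hour "
          "and the final boss is completely unbalanced.")
print(analyzer.polarity_scores(review))
# Reviews that list both advantages and disadvantages, like this one,
# are among the misclassification causes identified in the paper.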
Dayi Lin; Cor-Paul Bezemer; Ahmed E. Hassan
Identifying Gameplay Videos that Exhibit Bugs in Computer Games Journal Article
Empirical Software Engineering Journal (EMSE), 2019.
Abstract | BibTeX | Tags: Bug report, Computer games, Gameplay videos, Steam
@article{Lin2019videos,
title = {Identifying Gameplay Videos that Exhibit Bugs in Computer Games},
author = {Dayi Lin and Cor-Paul Bezemer and Ahmed E. Hassan},
year = {2019},
date = {2019-05-21},
urldate = {2019-05-21},
journal = {Empirical Software Engineering Journal (EMSE)},
abstract = {With the rapid growing market and competition in the gaming industry, it is challenging to develop a successful game, making the quality of games very important. To improve the quality of games, developers commonly use gamer-submitted bug reports to locate bugs in games. Recently, gameplay videos have become popular in the gaming community. A few of these videos showcase a bug, offering developers a new opportunity to collect context-rich bug information.
In this paper, we investigate whether videos that showcase a bug can automatically be identified from the metadata of gameplay videos that are readily available online. Such bug videos could then be used as a supplemental source of bug information for game developers. We studied the number of gameplay videos on the Steam platform, one of the most popular digital game distribution platforms, and the difficulty of identifying bug videos from these gameplay videos. We show that naïve approaches such as using keywords to search for bug videos are time-consuming and imprecise. We propose an approach which uses a random forest classifier to rank gameplay videos based on their likelihood of being a bug video. Our proposed approach achieves a precision that is 43% higher than that of the naïve keyword searching approach on a manually labelled dataset of 96 videos. In addition, by evaluating 1,400 videos that are identified by our approach as bug videos, we calculated that our approach has both a mean average precision at 10 and a mean average precision at 100 of 0.91. Our study demonstrates that it is feasible to automatically identify gameplay videos that showcase a bug.},
keywords = {Bug report, Computer games, Gameplay videos, Steam},
pubstate = {published},
tppubtype = {article}
}
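The ranking step described in the abstract, using a random forest to order gameplay videos by their likelihood of being a bug video, can be sketched with scikit-learn. The metadata features, toy data, and hyperparameters below are hypothetical and do not reflect the paper's actual feature set.

# Sketch of ranking gameplay videos by bug likelihood with a random forest over
# video metadata; feature names and data are hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Each row: hypothetical metadata features such as
# [title mentions "bug"/"glitch", video length (s), like ratio, comment count].
X_train = np.array([[1, 45, 0.90, 120],
                    [0, 600, 0.75, 10],
                    [1, 30, 0.95, 300],
                    [0, 1200, 0.80, 5]])
y_train = np.array([1, 0, 1, 0])  # 1 = video showcases a bug

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

X_new = np.array([[1, 50, 0.92, 200], [0, 900, 0.70, 8]])
bug_likelihood = clf.predict_proba(X_new)[:, 1]  # rank videos by this probability
print(bug_likelihood.argsort()[::-1])            # indices of the most likely bug videos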
Dayi Lin; Cor-Paul Bezemer; Ying Zou; Ahmed E. Hassan
An Empirical Study of Game Reviews on the Steam Platform Journal Article
Empirical Software Engineering Journal (EMSE), 2018.
Abstract | BibTeX | Tags: Computer games, Game reviews, Steam
@article{Lin2018reviews,
title = {An Empirical Study of Game Reviews on the Steam Platform},
author = {Dayi Lin and Cor-Paul Bezemer and Ying Zou and Ahmed E. Hassan},
year = {2018},
date = {2018-06-15},
urldate = {2018-06-15},
journal = {Empirical Software Engineering Journal (EMSE)},
abstract = {The steadily increasing popularity of computer games has led to the rise of a multi-billion dollar industry. Due to the scale of the computer game industry, developing a successful game is challenging. In addition, prior studies show that gamers are extremely hard to please, making the quality of games an important issue. Most online game stores allow users to review a game that they bought. Such reviews can make or break a game, as other potential buyers often base their purchasing decisions on the reviews of a game. Hence, studying game reviews can help game developers better understand user concerns, and further improve the user-perceived quality of games.
In this paper, we perform an empirical study of the reviews of 6,224 games on the Steam platform, one of the most popular digital game delivery platforms, to better understand if game reviews share similar characteristics with mobile app reviews, and thereby understand whether the conclusions and tools from mobile app review studies can be leveraged by game developers. In addition, new insights from game reviews could possibly open up new research directions for research of mobile app reviews. We first conduct a preliminary study to understand the number of game reviews and the complexity to read through them. In addition, we study the relation between several game-specific characteristics and the fluctuations of the number of reviews that are received on a daily basis. We then focus on the useful information that can be acquired from reviews by studying the major concerns that users express in their reviews, and the amount of play time before players post a review. We find that game reviews are different from mobile app reviews along several aspects. Additionally, the number of playing hours before posting a review is a unique and helpful attribute for developers that is not found in mobile app reviews. Future longitudinal studies should be conducted to help developers and researchers leverage this information. Although negative reviews contain more valuable information about the negative aspects of the game, such as mentioned complaints and bug reports, developers and researchers should also not ignore the potentially useful information in positive reviews. Our study on game reviews serves as a starting point for other game review researchers, and suggests that prior studies on mobile app reviews may need to be revisited.},
keywords = {Computer games, Game reviews, Steam},
pubstate = {published},
tppubtype = {article}
}
Dayi Lin; Cor-Paul Bezemer; Ahmed E. Hassan
An Empirical Study of Early Access Games on the Steam Platform Journal Article
The Empirical Software Engineering Journal (EMSE), 23 (2), pp. 771–799, 2018.
Abstract | BibTeX | Tags: Computer games, Early access games, Steam
@article{Lin16eag,
title = {An Empirical Study of Early Access Games on the Steam Platform},
author = {Dayi Lin and Cor-Paul Bezemer and Ahmed E. Hassan},
year = {2018},
date = {2018-04-01},
urldate = {2018-04-01},
journal = {The Empirical Software Engineering Journal (EMSE)},
volume = {23},
number = {2},
pages = {771--799},
publisher = {Springer},
abstract = {“Early access†is a release strategy for software that allows consumers to purchase an unfinished version of the software. In turn, consumers can influence the software development process by giving developers early feedback. This early access model has become increasingly popular through digital distribution platforms, such as Steam which is the most popular distribution platform for games. The plethora of options offered by Steam to communicate between developers and game players contribute to the popularity of the early access model. The model is considered a success by the game development community as several games using this approach have gained a large user base (i.e., owners) and high sales. On the other hand, the benefits of the early access model have been questioned as well.
In this paper, we conduct an empirical study on 1,182 Early Access Games (EAGs) on the Steam platform to understand the characteristics, advantages and limitations of the early access model. We find that 15% of the games on Steam make use of the early access model, with the most popular EAG having as many as 29 million owners. 88% of the EAGs are classified by their developers as so-called “indie” games, indicating that most EAGs are developed by individual developers or small studios.
We study the interaction between players and developers of EAGs and the Steam platform. We observe that on the one hand, developers update their games more frequently in the early access stage. On the other hand, the percentage of players that review a game during its early access stage is lower than the percentage of players that review the game after it leaves the early access stage. However, the average rating of the reviews is much higher during the early access stage, suggesting that players are more tolerant of imperfections in the early access stage. The positive review rate does not correlate with the length or the game update frequency of the early access stage.
Based on our findings, we suggest that game developers use the early access model as a method for eliciting early feedback and more positive reviews to attract additional new players. In addition, our findings suggest that developers can determine their release schedule without worrying about the length of the early access stage or the game update frequency during the early access stage.},
keywords = {Computer games, Early access games, Steam},
pubstate = {published},
tppubtype = {article}
}
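The comparison of review behaviour during and after the early access stage described above could be computed along the following lines. This is a minimal sketch, not the study's methodology; the review fields (created, voted_up) and the ea_exit date marking when the game left early access are hypothetical placeholders.

from datetime import datetime

def positive_review_rate(reviews, ea_exit):
    """Fraction of positive reviews during vs. after the early access stage."""
    during = [r for r in reviews if r["created"] < ea_exit]
    after = [r for r in reviews if r["created"] >= ea_exit]
    def rate(rs):
        return sum(r["voted_up"] for r in rs) / len(rs) if rs else None
    return {"during_early_access": rate(during), "after_release": rate(after)}

if __name__ == "__main__":
    ea_exit = datetime(2017, 6, 1)  # hypothetical date the game left early access
    sample = [
        {"created": datetime(2017, 3, 10), "voted_up": True},
        {"created": datetime(2017, 9, 2), "voted_up": False},
        {"created": datetime(2017, 10, 15), "voted_up": True},
    ]
    print(positive_review_rate(sample, ea_exit))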
Dayi Lin; Cor-Paul Bezemer; Ahmed E. Hassan
Studying the Urgent Updates of Popular Games on the Steam Platform Journal Article
The Empirical Software Engineering Journal (EMSE), 22 (4), pp. 2095–2126, 2017.
Abstract | BibTeX | Tags: Computer games, Steam, Update cycle, Update strategy, Urgent updates
@article{Lin16urgent,
title = {Studying the Urgent Updates of Popular Games on the Steam Platform},
author = {Dayi Lin and Cor-Paul Bezemer and Ahmed E. Hassan},
year = {2017},
date = {2017-08-01},
urldate = {2017-08-01},
journal = {The Empirical Software Engineering Journal (EMSE)},
volume = {22},
number = {4},
pages = {2095--2126},
publisher = {Springer},
abstract = {The steadily increasing popularity of computer games has led to the rise of a multi-billion dollar industry. This increasing popularity is partly enabled by online digital distribution platforms for games, such as Steam. These platforms offer an insight into the development and test processes of game developers. In particular, we can extract the update cycle of a game and study what makes developers deviate from that cycle by releasing so-called urgent updates.
An urgent update is a software update that fixes problems that are deemed critical enough not to be left unfixed until a regular-cycle update. Urgent updates are made in a state of emergency and outside the regular development and test timelines, which causes unnecessary stress on the development team. Hence, avoiding the need for an urgent update is important for game developers. We define urgent updates as 0-day updates (updates that are released on the same day as a previous update), updates that are released faster than the regular cycle, or self-admitted hotfixes.
We conduct an empirical study of the urgent updates of the 50 most popular games from Steam, the dominant digital game delivery platform. As urgent updates are reflections of mistakes in the development and test processes, a better understanding of urgent updates can in turn stimulate the improvement of these processes, and eventually save resources for game developers. In this paper, we argue that the update strategy that is chosen by a game developer affects the number of urgent updates that are released. Although the choice of update strategy does not appear to have an impact on the percentage of updates that are released faster than the regular cycle or self-admitted hotfixes, games that use a frequent update strategy tend to have a higher proportion of 0-day updates than games that use a traditional update strategy.},
keywords = {Computer games, Steam, Update cycle, Update strategy, Urgent updates},
pubstate = {published},
tppubtype = {article}
}
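The definition of urgent updates given above (0-day updates, updates released faster than the regular cycle, and self-admitted hotfixes) can be expressed as a simple classification rule. The sketch below is illustrative only: the update records are hypothetical, and the regular cycle is crudely approximated by the median inter-update interval, which is not necessarily how the paper operationalises it.

from datetime import date
from statistics import median

def classify_urgent(updates):
    """Flag each update as urgent if it is a 0-day update, faster than the
    regular cycle, or a self-admitted hotfix."""
    updates = sorted(updates, key=lambda u: u["released"])
    gaps = [(b["released"] - a["released"]).days
            for a, b in zip(updates, updates[1:])]
    regular_cycle = median(gaps) if gaps else 0  # crude regular-cycle estimate
    flags = [False]  # the first update has no preceding update to compare with
    for gap, upd in zip(gaps, updates[1:]):
        zero_day = gap == 0
        faster_than_cycle = gap < regular_cycle
        hotfix = "hotfix" in upd.get("title", "").lower()
        flags.append(zero_day or faster_than_cycle or hotfix)
    return flags

if __name__ == "__main__":
    # Hypothetical update history for a single game.
    history = [
        {"released": date(2016, 1, 1), "title": "Content patch 1.0"},
        {"released": date(2016, 1, 15), "title": "Content patch 1.1"},
        {"released": date(2016, 1, 15), "title": "Hotfix for crash on load"},
        {"released": date(2016, 2, 1), "title": "Content patch 1.2"},
    ]
    print(classify_urgent(history))

On this sample history the third update is flagged as urgent (a 0-day, self-admitted hotfix), while the regular-cycle patches are not.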