Publications – Analytics of Software, GAmes And Repository Data (ASGAARD) Lab

1.

Balreet Grewal; James Graham; Jeff Muizelaar; Jan Odvarko; Suhaib Mujahid; Marco Castelluccio; Cor-Paul Bezemer

XBIDetective: Leveraging Vision Language Models for Identifying Cross-Browser Visual Inconsistencies Inproceedings

International Conference on Software Engineering - Software Engineering in Practice (ICSE - SEIP) Track, 2026, 2026.

Files:

Abstract | BibTeX | Tags: Regression testing, Testing, Web applications

2.

Md Saeed Siddik; Hao Li; Cor-Paul Bezemer

A Systematic Literature Review of Software Engineering Research on Jupyter Notebook Journal Article

Journal of Systems and Software, 2025.

Files:

Abstract | BibTeX | Tags: Jupyter Notebook, SLR

@article{siddik_slr_notebook,

title = {A Systematic Literature Review of Software Engineering Research on Jupyter Notebook},

author = {Md Saeed Siddik and Hao Li and Cor-Paul Bezemer},

year  = {2025},

date = {2025-12-27},

urldate = {2025-12-27},

journal = {Journal of Systems and Software},

abstract = {Context: Jupyter Notebook has emerged as a versatile tool that transforms

how researchers, developers, and data scientists conduct and communicate

their work. As the adoption of Jupyter notebooks continues to rise, so does

the interest from the software engineering research community in improving

the software engineering practices for Jupyter notebooks.

Objective: The purpose of this study is to analyze trends, gaps, and

methodologies used in software engineering research on Jupyter notebooks.

Method : We selected 199 relevant publications up to September 2025,

following established systematic literature review guidelines. We explored

publication trends, categorized them based on software engineering topics,

and reported findings based on those topics.

Results: The most popular venues for publishing software engineering

research on Jupyter notebooks are related to human-computer interaction

instead of traditional software engineering venues. Researchers have ad-

dressed a wide range of software engineering topics on notebooks, such as

code reuse, readability, and execution environment. Although reusability is

one of the research topics for Jupyter notebooks, only 82 of the 199 studies

can be reused based on their provided URLs. Additionally, most replication

packages are not hosted on permanent repositories for long-term availability

and adherence to open science principles.

Conclusion: Solutions specific to notebooks for software engineering is-

sues, including testing, refactoring, and documentation, are underexplored.

Future research opportunities exist in automatic testing frameworks, refac-

toring clones between notebooks, and generating group documentation for

coherent code cells.},

keywords = {Jupyter Notebook, SLR},

pubstate = {published},

tppubtype = {article}

}

Close

3.

Lizhi Liao; Simon Eismann; Heng Li; Cor-Paul Bezemer; Diego Costa; André van Hoorn; Weiyi Shang

Early Detection of Performance Regressions by Bridging Local Performance Data and Architectural Models Inproceedings

International Conference on Software Engineering (ICSE), 2025.

Files:

BibTeX | Tags: Performance, Performance analysis, Performance engineering, Performance evaluation, Performance regressions, Performance testing

4.

Mohammad Reza Taesiri; Abhijay Ghildyal; Saman Zadtootaghaj; Nabajeet Barman; Cor-Paul Bezemer

VideoGameQA-Bench: Evaluating Vision-Language Models for Video Game Quality Assurance Inproceedings

39th Conference on Neural Information Processing Systems (NeurIPS 2025) Track on Datasets and Benchmarks, 2025.

Files:

Abstract | Links | BibTeX | Tags: Computer games, Game development, Game testing, Gameplay videos, LLM, Software quality

5.

Hao Li; Cor-Paul Bezemer; Ahmed E. Hassan

Software Engineering and Foundation Models: Insights from Industry Blogs Using a Jury of Foundation Models Inproceedings

International Conference on Software Engineering - Software Engineering in Practice (ICSE - SEIP) Track, 2025.

Files:

Abstract | BibTeX | Tags: FM4SE, Foundation models, SE4AI, SE4FM, SE4ML

@inproceedings{Li_SEFM_blogs,

title = {Software Engineering and Foundation Models: Insights from Industry Blogs Using a Jury of Foundation Models},

author = {Hao Li and Cor-Paul Bezemer and Ahmed E. Hassan},

year  = {2025},

date = {2025-04-27},

booktitle = {International Conference on Software Engineering - Software Engineering in Practice (ICSE - SEIP) Track},

abstract = {Foundation models (FMs) such as large language

models (LLMs) have significantly impacted many fields, including

software engineering (SE). The interaction between SE and FMs

has led to the integration of FMs into SE practices (FM4SE)

and the application of SE methodologies to FMs (SE4FM). While

several literature surveys exist on academic contributions to these

trends, we are the first to provide a practitioner’s view. We

analyze 155 FM4SE and 997 SE4FM blog posts from leading

technology companies, leveraging an FM-powered surveying

approach to systematically label and summarize the discussed

activities and tasks. We observed that while code generation is the

most prominent FM4SE task, FMs are leveraged for many other

SE activities such as code understanding, summarization, and

API recommendation. The majority of blog posts on SE4FM are

about model deployment & operation, and system architecture

& orchestration. Although the emphasis is on cloud deployments,

there is a growing interest in compressing FMs and deploying

them on smaller devices such as edge or mobile devices. We

outline eight future research directions inspired by our gained

insights, aiming to bridge the gap between academic findings

and real-world applications. Our study not only enriches the

body of knowledge on practical applications of FM4SE and

SE4FM but also demonstrates the utility of FMs as a powerful

and efficient approach in conducting literature surveys within

technical and grey literature domains. Our dataset, results, code

and used prompts can be found in our online replication package

at https://zenodo.org/records/14563992.},

keywords = {FM4SE, Foundation models, SE4AI, SE4FM, SE4ML},

pubstate = {published},

tppubtype = {inproceedings}

}

Close

6.

Tajkia Rahman Toma; Balreet Grewal; Cor-Paul Bezemer

Answering User Questions about Machine Learning Models through Standardized Model Cards Inproceedings

International Conference on Software Engineering (ICSE), 2025.

Files:

Abstract | BibTeX | Tags: Hugging Face, Q&A communities, Q&A websites, SE4AI, SE4FM, SE4ML

@inproceedings{Toma_UserQuestions,

title = {Answering User Questions about Machine Learning Models through Standardized Model Cards},

author = {Tajkia Rahman Toma and Balreet Grewal and Cor-Paul Bezemer },

year  = {2025},

date = {2025-04-27},

booktitle = {International Conference on Software Engineering (ICSE)},

abstract = {Reusing pre-trained machine learning models is

becoming very popular due to model hubs such as Hugging Face

(HF). However, similar to when reusing software, many issues

may arise when reusing an ML model. In many cases, users

resort to asking questions on discussion forums such as the HF

community forum. In this paper, we study how we can reduce the

community’s workload in answering these questions and increase

the likelihood that questions receive a quick answer. We analyze

11,278 discussions from the HF model community that contain

user questions about ML models. We focus on the effort spent

handling questions, the high-level topics of discussions, and the

potential for standardizing responses in model cards based on

a model card template. Our findings indicate that there is not

much effort involved in responding to user questions, however,

40.1% of the questions remain open without any response. A

topic analysis shows that discussions are more centered around

technical details on model development and troubleshooting,

indicating that more input from model providers is required. We

show that 42.5% of the questions could have been answered if the

model provider followed a standard model card template for the

model card. Based on our analysis, we recommend that model

providers add more development-related details on the model’s

architecture, algorithm, data preprocessing and training code in

existing documentation (sub)sections and add new (sub)sections

to the template to address common questions about model usage

and hardware requirements.},

keywords = {Hugging Face, Q&A communities, Q&A websites, SE4AI, SE4FM, SE4ML},

pubstate = {published},

tppubtype = {inproceedings}

}

Close

7.

Mohammad Reza Taesiri; Cor-Paul Bezemer

VIDEOGAMEBUNNY: Towards vision assistants for video games Inproceedings

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2025.

Files:

Abstract | BibTeX | Tags: Computer games, Foundation models, Game development, Game testing

8.

Hao Li; Cor-Paul Bezemer

Bridging the language gap: an empirical study of bindings for open source machine learning libraries across software package ecosystems Journal Article

Empirical Software Engineering, 30 (6), 2024.

Files:

Abstract | BibTeX | Tags: Library bindings, Machine learning, SE4AI, SE4ML

@article{li_MLbindings,

title = {Bridging the language gap: an empirical study of bindings for open source machine learning libraries across software package ecosystems},

author = {Hao Li and Cor-Paul Bezemer},

year  = {2024},

date = {2024-10-18},

urldate = {2024-10-18},

journal = {Empirical Software Engineering},

volume = {30},

number = {6},

abstract = {Open source machine learning (ML) libraries enable developers to

integrate advanced ML functionality into their own applications. However,

popular ML libraries, such as TensorFlow, are not available natively in all

programming languages and software package ecosystems. Hence, developers

who wish to use an ML library which is not available in their programming lan-

guage or ecosystem of choice, may need to resort to using a so-called binding

library (or binding). Bindings provide support across programming languages

and package ecosystems for reusing a host library. For example, the Keras

.NET binding provides support for the Keras library in the NuGet (.NET)

ecosystem even though the Keras library was written in Python. In this pa-

per, we collect 2,436 cross-ecosystem bindings for 546 ML libraries across 13

software package ecosystems by using an approach called BindFind, which can

automatically identify bindings and link them to their host libraries. Further-

more, we conduct an in-depth study of 133 cross-ecosystem bindings and their

development for 40 popular open source ML libraries. Our findings reveal that

the majority of ML library bindings are maintained by the community, with

npm being the most popular ecosystem for these bindings. Our study also

indicates that most bindings cover only a limited range of the host library’s

releases, often experience considerable delays in supporting new releases, and

have widespread technical lag. Our findings highlight key factors to consider

for developers integrating bindings for ML libraries and open avenues for re-

searchers to further investigate bindings in software package ecosystems.},

keywords = {Library bindings, Machine learning, SE4AI, SE4ML},

pubstate = {published},

tppubtype = {article}

}

Close

9.

Mohammad Reza Taesiri

Leveraging Foundation Models for Video Game Quality Assurance PhD Thesis

2024.

Files:

Abstract | BibTeX | Tags: Computer games, Computer vision, Game development, Game testing, Gameplay videos, Machine learning, Software quality

@phdthesis{phd_taesiri,

title = {Leveraging Foundation Models for Video Game Quality Assurance},

author = {Mohammad Reza Taesiri },

year = {2024},

date = {2024-09-25},

abstract = {The video game industry has become a powerhouse in the global entertainment econ-

omy. Creating engaging, high-quality games demands intricate development processes

and significant resources. As projects grow in complexity and scale, developers often

grapple with demanding schedules, tight deadlines, and the risk of burnout. These

pressures highlight the need for more efficient development strategies, with quality

assurance (QA) emerging as a critical area for optimization.

Artificial Intelligence (AI) has the potential to address these challenges by en-

hancing the game QA processes in large gaming companies. Specifically, foundation

models—large pre-trained AI models—offer promising applications to improve these

processes. Exploring novel uses of these advanced AI models could reveal their poten-

tial and limitations in optimizing game development workflows, potentially alleviating

some of the industry’s pressing issues and facilitating the creation of high-quality, en-

gaging games.

In this thesis, my goal is to improve video game testing processes by leveraging

foundation models to ensure the final product reaches a desirable quality. I explore

new opportunities that foundation models bring to game testing, from searching for

instances of game bugs within video repositories to assisting human testers in catching

bugs, through three studies:

First, I investigate the utility of image-text foundation models in retrieving game-

play videos. In this study, I create a video search engine designed to help developers

efficiently search video repositories for examples of video game bugs using textual

descriptions. For example, developers can find all instances of a bug by using a tex-

tual description of the bug, such as a horse flying in the air. This study lays the

groundwork for AI-based game QA processes, with results demonstrating significant

potential.

Next, I introduce GlitchBench, a benchmarking dataset of video game glitches

and anomalies designed to assess state-of-the-art large multimodal models, such as

GPT-4V, in detecting and understanding game bugs. This extensive dataset includes

a wide range of images depicting various glitches, sourced from both online platforms

and synthetic sets created within the Unity game engine. GlitchBench includes both

common and rare glitches encountered in the video game quality assurance process.

The findings from this study highlight both the promise and limitations of existing

models, particularly in unusual and rare cases.

Lastly, I introduce VideoGameBunny, a large multimodal model specifically

trained for video game content, accompanied by a dataset of 389,565 image-instruction

pairs. My analysis demonstrates that VideoGameBunny outperforms much larger

models in video game understanding tasks while using 4.2× fewer parameters. This

result underscores the effectiveness and promise of using a high-quality dataset to

improve models’ understanding of video games, thus making them more effective in

the game QA process.

Future work should focus on enhancing the generalization and robustness of AI

models in the gaming context, particularly through better integration of vision and

language components. This integration could be achieved using either early or late fu-

sion methods. For late fusion methods, where two pre-trained models are connected,

better alignment between these components can be achieved through improved train-

ing data and strategies. Alternatively, early fusion techniques, which involve training

both components simultaneously to enhance their integration, can overcome many

issues that existing models have.},

keywords = {Computer games, Computer vision, Game development, Game testing, Gameplay videos, Machine learning, Software quality},

pubstate = {published},

tppubtype = {phdthesis}

}

Close

The video game industry has become a powerhouse in the global entertainment econ-
omy. Creating engaging, high-quality games demands intricate development processes
and significant resources. As projects grow in complexity and scale, developers often
grapple with demanding schedules, tight deadlines, and the risk of burnout. These
pressures highlight the need for more efficient development strategies, with quality
assurance (QA) emerging as a critical area for optimization.
Artificial Intelligence (AI) has the potential to address these challenges by en-
hancing the game QA processes in large gaming companies. Specifically, foundation
models—large pre-trained AI models—offer promising applications to improve these
processes. Exploring novel uses of these advanced AI models could reveal their poten-
tial and limitations in optimizing game development workflows, potentially alleviating
some of the industry’s pressing issues and facilitating the creation of high-quality, en-
gaging games.
In this thesis, my goal is to improve video game testing processes by leveraging
foundation models to ensure the final product reaches a desirable quality. I explore
new opportunities that foundation models bring to game testing, from searching for
instances of game bugs within video repositories to assisting human testers in catching
bugs, through three studies:
First, I investigate the utility of image-text foundation models in retrieving game-
play videos. In this study, I create a video search engine designed to help developers
efficiently search video repositories for examples of video game bugs using textual
descriptions. For example, developers can find all instances of a bug by using a tex-
tual description of the bug, such as a horse flying in the air. This study lays the
groundwork for AI-based game QA processes, with results demonstrating significant
potential.
Next, I introduce GlitchBench, a benchmarking dataset of video game glitches
and anomalies designed to assess state-of-the-art large multimodal models, such as
GPT-4V, in detecting and understanding game bugs. This extensive dataset includes
a wide range of images depicting various glitches, sourced from both online platforms
and synthetic sets created within the Unity game engine. GlitchBench includes both
common and rare glitches encountered in the video game quality assurance process.
The findings from this study highlight both the promise and limitations of existing
models, particularly in unusual and rare cases.
Lastly, I introduce VideoGameBunny, a large multimodal model specifically
trained for video game content, accompanied by a dataset of 389,565 image-instruction
pairs. My analysis demonstrates that VideoGameBunny outperforms much larger
models in video game understanding tasks while using 4.2× fewer parameters. This
result underscores the effectiveness and promise of using a high-quality dataset to
improve models’ understanding of video games, thus making them more effective in
the game QA process.
Future work should focus on enhancing the generalization and robustness of AI
models in the gaming context, particularly through better integration of vision and
language components. This integration could be achieved using either early or late fu-
sion methods. For late fusion methods, where two pre-trained models are connected,
better alignment between these components can be achieved through improved train-
ing data and strategies. Alternatively, early fusion techniques, which involve training
both components simultaneously to enhance their integration, can overcome many
issues that existing models have.

Close

10.

Hao Li

Investigating the Quality of Bindings for Machine Learning Libraries in Software Package Ecosystems PhD Thesis

2024.

Files:

Abstract | BibTeX | Tags: Machine learning, Software Ecosystem, Software quality

@phdthesis{phd_haoli,

title = {Investigating the Quality of Bindings for Machine Learning Libraries in Software Package Ecosystems},

author = {Hao Li },

year  = {2024},

date = {2024-08-21},

urldate = {2024-08-21},

abstract = {Machine learning (ML) has revolutionized many domains, with developers often re-

lying on open source ML libraries to integrate ML capabilities into their projects.

However, these libraries primarily support a single programming language, limiting

their availability for projects in other languages. Bindings serve as bridges between

programming languages by providing interfaces to ML libraries. This thesis investi-

gates the quality of bindings for ML libraries in software package ecosystems, focusing

on their maintenance and software quality.

The first study presented in this thesis introduces BindFind, an automated ap-

proach to identify bindings and link them with their corresponding host libraries

across various software package ecosystems. By analyzing 2,436 bindings for 546 ML

libraries, we find that most bindings are community-maintained, with npm being the

most popular choice for publishing these bindings. The analysis reveals that these

bindings usually cover a limited range of releases from their host library and experi-

ence significant delays in supporting new releases.

In the second study, we investigate the usage and rationale behind release-level

deprecation in bindings for ML libraries within the Cargo and npm ecosystems. We

discover that bindings in Cargo have a higher percentage of deprecated releases com-

pared to general packages, while the percentages of deprecated releases and general

packages are similar in npm. The primary reasons for deprecation are package re-

moval or replacement and defects in both ecosystems. We also identify the issue of

implicitly deprecated releases in Cargo due to deprecation propagation through the

dependency network.

The third study evaluates the impact of using different bindings on the software

quality of ML systems through experiments on model training and inference using

TensorFlow and PyTorch across four programming languages. The results show that

models trained with one binding perform consistently in inference tasks when utilized

with another binding. Furthermore, non-default bindings can outperform the default

Python bindings in specific tasks without sacrificing accuracy. We also find significant

differences in inference times across bindings, highlighting the benefits of choosing ap-

propriate bindings based on specific performance requirements to maximize efficiency

in ML projects.

The work presented in this thesis provides deep insights, actionable recommenda-

tions, and effective and thoroughly evaluated approaches for assessing and improving

the quality of bindings for ML libraries in software package ecosystems.},

keywords = {Machine learning, Software Ecosystem, Software quality},

pubstate = {published},

tppubtype = {phdthesis}

}

Close

Machine learning (ML) has revolutionized many domains, with developers often re-
lying on open source ML libraries to integrate ML capabilities into their projects.
However, these libraries primarily support a single programming language, limiting
their availability for projects in other languages. Bindings serve as bridges between
programming languages by providing interfaces to ML libraries. This thesis investi-
gates the quality of bindings for ML libraries in software package ecosystems, focusing
on their maintenance and software quality.
The first study presented in this thesis introduces BindFind, an automated ap-
proach to identify bindings and link them with their corresponding host libraries
across various software package ecosystems. By analyzing 2,436 bindings for 546 ML
libraries, we find that most bindings are community-maintained, with npm being the
most popular choice for publishing these bindings. The analysis reveals that these
bindings usually cover a limited range of releases from their host library and experi-
ence significant delays in supporting new releases.
In the second study, we investigate the usage and rationale behind release-level
deprecation in bindings for ML libraries within the Cargo and npm ecosystems. We
discover that bindings in Cargo have a higher percentage of deprecated releases com-
pared to general packages, while the percentages of deprecated releases and general
packages are similar in npm. The primary reasons for deprecation are package re-
moval or replacement and defects in both ecosystems. We also identify the issue of
implicitly deprecated releases in Cargo due to deprecation propagation through the
dependency network.
The third study evaluates the impact of using different bindings on the software
quality of ML systems through experiments on model training and inference using
TensorFlow and PyTorch across four programming languages. The results show that
models trained with one binding perform consistently in inference tasks when utilized
with another binding. Furthermore, non-default bindings can outperform the default
Python bindings in specific tasks without sacrificing accuracy. We also find significant
differences in inference times across bindings, highlighting the benefits of choosing ap-
propriate bindings based on specific performance requirements to maximize efficiency
in ML projects.
The work presented in this thesis provides deep insights, actionable recommenda-
tions, and effective and thoroughly evaluated approaches for assessing and improving
the quality of bindings for ML libraries in software package ecosystems.

Close

11.

Ian Gauk; Cor-Paul Bezemer

Detecting Discrepancies between Subtitles and Audio in Gameplay Videos with EchoTest Journal Article

IEEE Transactions on Games, 2024.

Files:

Abstract | BibTeX | Tags: Accessibility testing, Computer games, Game accessibility, Game development, Game testing

12.

Hao Li; Gopi Krishnan Rajbahadur; Cor-Paul Bezemer

Studying the Impact of TensorFlow and PyTorch Bindings on Machine Learning Software Quality Journal Article

ACM Transactions on Software Engineering and Methodology, 2024.

Files:

Abstract | BibTeX | Tags: Library bindings, Machine learning, SE4AI, SE4ML, Software quality

13.

Mohammad Reza Taesiri; Tianjun Feng; Anh Nguyen; Cor-Paul Bezemer

GlitchBench: Can large multimodal models detect video game glitches? Inproceedings

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.

Files:

Abstract | BibTeX | Tags: Computer games, Foundation models, Game development, Gameplay videos, LLM

14.

Hao Li; Gopi Krishnan Rajbahadur; Dayi Lin; Cor-Paul Bezemer; Zhen Ming (Jack) Jiang

Keeping Deep Learning Models in Check: A History-Based Approach to Mitigate Overfitting Journal Article

IEEE Access, 12 , pp. 70676–70689, 2024.

Files:

Abstract | BibTeX | Tags: Machine learning, Overfitting

@article{Li_Overfitting,

title = {Keeping Deep Learning Models in Check: A History-Based Approach to Mitigate Overfitting},

author = {Hao Li and Gopi Krishnan Rajbahadur and Dayi Lin and Cor-Paul Bezemer and Zhen Ming (Jack) Jiang},

year  = {2024},

date = {2024-05-17},

journal = {IEEE Access},

volume = {12},

pages = {70676--70689},

abstract = {In software engineering, deep learning models are increasingly deployed for critical tasks such as bug detection and code review. However, overfitting remains a challenge that affects the quality, reliability, and trustworthiness of software systems that utilize deep learning models. Overfitting can be (1) prevented (e.g., using dropout or early stopping) or (2) detected in a trained model (e.g., using correlation-based approaches). Both overfitting detection and prevention approaches that are currently used have constraints (e.g., requiring modification of the model structure, and high computing resources). In this paper, we propose a simple, yet powerful approach that can both detect and prevent overfitting based on the training history (i.e., validation losses). Our approach first trains a time series classifier on training histories of overfit models. This classifier is then used to detect if a trained model is overfit. In addition, our trained classifier can be used to prevent overfitting by identifying the optimal point to stop a model’s training. We evaluate our approach on its ability to identify and prevent overfitting in real-world samples. We compare our approach against correlation-based detection approaches and the most commonly used prevention approach (i.e., early stopping). Our approach achieves an F1 score of 0.91 which is at least 5% higher than the current best-performing non-intrusive overfitting detection approach. Furthermore, our approach can stop training to avoid overfitting at least 32% of the times earlier than early stopping and has the same or a better rate of returning the best model.},

keywords = {Machine learning, Overfitting},

pubstate = {published},

tppubtype = {article}

}

Close

15.

Balreet Grewal; Wentao Lu; Sarah Nadi; Cor-Paul Bezemer

Analyzing Developer Use of ChatGPT Generated Code in Open Source GitHub Projects Inproceedings

International Conference on Mining Software Repositories (MSR), 2024.

Files:

Abstract | BibTeX | Tags: Code reuse, LLM, SE4AI

16.

Mikael Sabuhi; Petr Musilek; Cor-Paul Bezemer

Micro-FL: A Fault-Tolerant Scalable Microservice Based Platform for Federated Learning Journal Article

Future Internet, 16 (3), pp. 1-19, 2024.

Files:

Abstract | BibTeX | Tags: Federated learning, Machine learning, Microservices

17.

Tajkia Rahman Toma; Cor-Paul Bezemer

An Exploratory Study of Dataset and Model Management in Open Source Machine Learning Applications Inproceedings

3rd IEEE/ACM International Conference on AI Engineering - Software Engineering for AI (CAIN), pp. 1–11, 2024.

Files:

Abstract | BibTeX | Tags: Data maintenance, SE4ML

18.

Mohammad Reza Taesiri; Finlay Macklon; Sarra Habchi; Cor-Paul Bezemer

Searching bug instances in gameplay video repositories Journal Article

IEEE Transactions on Games, 2024.

Files:

Abstract | BibTeX | Tags: Bug report, Computer games, Game development, Gameplay videos, Gaming

19.

Mohammad Reza Taesiri; Giang Nguyen; Sarra Habchi; Cor-Paul Bezemer; Anh Nguyen

ImageNet-Hard: The Hardest Images Remaining from a Study of the Power of Zoom and Spatial Biases in Image Classification Inproceedings

NeurIPS Dataset and Benchmark track, 2023.

Files:

BibTeX | Tags: Benchmark, Computer vision, Dataset, Image classification, Machine learning

20.

Markos Viggiato; Dale Paas; Cor-Paul Bezemer

Prioritizing Natural Language Test Cases Based on Highly-Used Game Features Inproceedings

Proceedings of the 31st Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE), pp. 1–12, 2023.

Files:

Abstract | BibTeX | Tags: Computer games, Game development, Natural language processing, Testing

@inproceedings{ViggiatoFSE2023,

title = {Prioritizing Natural Language Test Cases Based on Highly-Used Game Features},

author = {Markos Viggiato and Dale Paas and Cor-Paul Bezemer },

year  = {2023},

date = {2023-12-01},

urldate = {2023-12-01},

booktitle = {Proceedings of the 31st Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE)},

pages = {1--12},

abstract = {Software testing is still a manual activity in many industries, such

as the gaming industry. But manually executing tests becomes im-

practical as the system grows and resources are restricted, mainly

in a scenario with short release cycles. Test case prioritization is a

commonly used technique to optimize the test execution. However,

most prioritization approaches do not work for manual test cases

as they require source code information or test execution history,

which is often not available in a manual testing scenario. In this

paper, we propose a prioritization approach for manual test cases

written in natural language based on the tested application features

(in particular, highly-used application features). Our approach con-

sists of (1) identifying the tested features from natural language test

cases (with zero-shot classification techniques) and (2) prioritizing

test cases based on the features that they test. We leveraged the

NSGA-II genetic algorithm for the multi-objective optimization of

the test case ordering to maximize the coverage of highly-used

features while minimizing the cumulative execution time. Our find-

ings show that we can successfully identify the application features

covered by test cases using an ensemble of pre-trained models

with strong zero-shot capabilities (an F-score of 76.1%). Also, our

prioritization approaches can find test case orderings that cover

highly-used application features early in the test execution while

keeping the time required to execute test cases short. QA engineers

can use our approach to focus the test execution on test cases that

cover features that are relevant to users.},

keywords = {Computer games, Game development, Natural language processing, Testing},

pubstate = {published},

tppubtype = {inproceedings}

}

Close