Mohammad Reza Taesiri; Tianjun Feng; Anh Nguyen; Cor-Paul Bezemer
GlitchBench: Can large multimodal models detect video game glitches? Inproceedings
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
Abstract | BibTeX | Tags: Computer games, Foundation models, Game development, Gameplay videos, LLM
@inproceedings{TaesiriCVPR2024,
title = {GlitchBench: Can large multimodal models detect video game glitches?},
author = {Mohammad Reza Taesiri and Tianjun Feng and Anh Nguyen and Cor-Paul Bezemer},
year = {2024},
date = {2024-06-15},
urldate = {2024-03-15},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
abstract = {Large multimodal models (LMMs) have evolved from large language models (LLMs) to integrate multiple input modalities, such as visual inputs. This integration augments the capacity of LLMs for tasks requiring visual comprehension and reasoning. However, the extent and limitations of their enhanced abilities are not fully understood, especially when it comes to real-world tasks. To address this gap, we introduce GlitchBench, a novel benchmark derived from video game quality assurance tasks, to test and evaluate the reasoning capabilities of LMMs. Our benchmark is curated from a variety of unusual and glitched scenarios from video games and aims to challenge both the visual and linguistic reasoning powers of LMMs in detecting and interpreting out-of-the-ordinary events. We evaluate multiple state-of-the-art LMMs, and we show that GlitchBench presents a new challenge for these models. Code and data are available at: https://glitchbench.github.io/},
keywords = {Computer games, Foundation models, Game development, Gameplay videos, LLM},
pubstate = {published},
tppubtype = {inproceedings}
}
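As a rough illustration of the kind of evaluation GlitchBench enables, the sketch below sends a gameplay screenshot to a vision-capable LLM through the OpenAI API and asks it to describe anything unusual. The model name, prompt, and file name are illustrative assumptions, not the benchmark's actual protocol.

```python
# Hypothetical sketch: asking a vision-capable LLM whether a screenshot shows a glitch.
# The prompt, model name, and file name are illustrative, not GlitchBench's protocol.
import base64
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask_about_glitch(image_path: str) -> str:
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    response = client.chat.completions.create(
        model="gpt-4o",  # any vision-capable chat model
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "What is unusual about this video game screenshot?"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

print(ask_about_glitch("screenshot.png"))
```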
Balreet Grewal; Wentao Lu; Sarah Nadi; Cor-Paul Bezemer
Analyzing Developer Use of ChatGPT Generated Code in Open Source GitHub Projects Inproceedings
International Conference on Mining Software Repositories (MSR), 2024.
Abstract | BibTeX | Tags: Code reuse, LLM, SE4AI
@inproceedings{GrewalMSR2024,
title = {Analyzing Developer Use of ChatGPT Generated Code in Open Source GitHub Projects},
author = {Balreet Grewal and Wentao Lu and Sarah Nadi and Cor-Paul Bezemer },
year = {2024},
date = {2024-04-14},
urldate = {2024-04-14},
booktitle = {International Conference on Mining Software Repositories (MSR)},
abstract = {The rapid development of large language models such as ChatGPT has made them particularly useful to developers in generating code snippets for their projects. To understand how ChatGPT’s generated code is leveraged by developers, we conducted an empirical study of 3,044 ChatGPT-generated code snippets integrated within GitHub projects. A median of 54% of the generated lines of code is found in the project’s code, and this code typically remains unchanged once added. The modifications of the 76 code snippets that changed in a subsequent commit consisted of minor functionality changes and code reorganizations that were made within a day. Our findings offer insights that help drive the development of AI-assisted programming tools. We highlight the importance of making changes in ChatGPT code before integrating it into a project.},
keywords = {Code reuse, LLM, SE4AI},
pubstate = {published},
tppubtype = {inproceedings}
}
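As a toy illustration of the line-level matching behind the 54% figure, the sketch below computes the fraction of a snippet's non-blank lines that also appear in a project file. This is an assumption about how such a measurement could be done, not the paper's actual pipeline.

```python
# Illustrative sketch (not the paper's exact pipeline): measure how many lines of a
# ChatGPT-generated snippet also appear in a project file, ignoring blank lines.
def retained_fraction(snippet: str, project_file: str) -> float:
    snippet_lines = [ln.strip() for ln in snippet.splitlines() if ln.strip()]
    with open(project_file, encoding="utf-8") as f:
        project_lines = {ln.strip() for ln in f}
    if not snippet_lines:
        return 0.0
    kept = sum(1 for ln in snippet_lines if ln in project_lines)
    return kept / len(snippet_lines)

generated = "def add(a, b):\n    return a + b\n"
print(f"{retained_fraction(generated, 'utils.py'):.0%} of generated lines retained")
```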
Mikael Sabuhi; Petr Musilek; Cor-Paul Bezemer
Micro-FL: A Fault-Tolerant Scalable Microservice Based Platform for Federated Learning Journal Article
Future Internet, 16 (3), pp. 1-19, 2024.
Abstract | BibTeX | Tags: Federated learning, Machine learning, Microservices
@article{Sabuhi_MicroFL,
title = {Micro-FL: A Fault-Tolerant Scalable Microservice Based Platform for Federated Learning},
author = {Mikael Sabuhi and Petr Musilek and Cor-Paul Bezemer },
year = {2024},
date = {2024-02-19},
journal = {Future Internet},
volume = {16},
number = {3},
pages = {1-19},
abstract = {As the number of machine learning applications increases, growing concerns about data privacy expose the limitations of traditional cloud-based machine learning methods that rely on centralized data collection and processing. Federated learning emerges as a promising alternative, offering a novel approach to training machine learning models that safeguards data privacy. Federated learning facilitates collaborative model training across various entities. In this approach, each user trains models locally and shares only the local model parameters with a central server, which then generates a global model based on these individual updates. This approach ensures data privacy since the training data itself is never directly shared with a central entity. However, existing federated machine learning frameworks are not without challenges. In terms of server design, these frameworks exhibit limited scalability with an increasing number of clients and are highly vulnerable to system faults, particularly as the central server becomes a single point of failure. This paper introduces Micro-FL, a federated learning framework that uses a microservices architecture to implement the federated learning system. It demonstrates that the framework is fault-tolerant and scalable, showing its ability to handle an increasing number of clients. A comprehensive performance evaluation confirms that Micro-FL proficiently handles component faults, enabling a smooth and uninterrupted operation.},
keywords = {Federated learning, Machine learning, Microservices},
pubstate = {published},
tppubtype = {article}
}
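For context on the learning step Micro-FL orchestrates, here is a minimal federated-averaging (FedAvg) sketch in NumPy: the server combines local model parameters into a global model, weighted by each client's sample count. It illustrates only the aggregation; the paper's contribution is the fault-tolerant microservices platform around it.

```python
# Minimal FedAvg sketch: server-side weighted averaging of client model parameters.
import numpy as np

def federated_average(client_weights, client_sizes):
    """client_weights: one list of ndarrays (layers) per client."""
    total = sum(client_sizes)
    global_weights = []
    for layer in zip(*client_weights):  # iterate over layers across clients
        stacked = np.stack([w * (n / total) for w, n in zip(layer, client_sizes)])
        global_weights.append(stacked.sum(axis=0))
    return global_weights

# Example: two clients, each holding a single 2x2 weight matrix.
w1 = [np.ones((2, 2))]
w2 = [np.zeros((2, 2))]
print(federated_average([w1, w2], client_sizes=[3, 1]))  # 0.75 everywhere
```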
Tajkia Rahman Toma; Cor-Paul Bezemer
An Exploratory Study of Dataset and Model Management in Open Source Machine Learning Applications Inproceedings
3rd IEEE/ACM International Conference on AI Engineering - Software Engineering for AI (CAIN), pp. 1–11, 2024.
Abstract | BibTeX | Tags: Data maintenance, SE4ML
@inproceedings{TomaCAIN2024,
title = {An Exploratory Study of Dataset and Model Management in Open Source Machine Learning Applications},
author = {Tajkia Rahman Toma and Cor-Paul Bezemer},
year = {2024},
date = {2024-01-17},
urldate = {2024-01-17},
booktitle = {3rd IEEE/ACM International Conference on AI Engineering - Software Engineering for AI (CAIN)},
pages = {1--11},
abstract = {Datasets and models are two key artifacts in machine learning (ML) applications. Although there exist tools to support dataset and model developers in managing ML artifacts, little is known about how these datasets and models are integrated into ML applications. In this paper, we study how datasets and models in ML applications are managed. In particular, we focus on how these artifacts are stored and versioned alongside the applications. After analyzing 93 repositories, we identified that the most common location to store datasets and models is the file system, which causes availability issues. Notably, large data and model files, exceeding approximately 60 MB, are stored exclusively in remote storage and downloaded as needed. Most of the datasets and models lack proper integration with the version control system, posing potential traceability and reproducibility issues. Additionally, although datasets and models are likely to evolve during the application development, they are rarely updated in application repositories.},
keywords = {Data maintenance, SE4ML},
pubstate = {published},
tppubtype = {inproceedings}
}
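A minimal sketch of the kind of repository check the findings suggest: flag files over roughly 60 MB and report whether Git tracks them. The threshold and the use of `git ls-files` are illustrative choices, not the paper's methodology.

```python
# Heuristic sketch inspired by the paper's ~60 MB observation: list files in a
# repository that exceed the threshold and check whether Git tracks them.
import os
import subprocess

def large_artifacts(repo, threshold_mb=60):
    tracked = set(subprocess.run(
        ["git", "-C", repo, "ls-files"],
        capture_output=True, text=True, check=True).stdout.splitlines())
    for root, _dirs, files in os.walk(repo):
        if ".git" in root.split(os.sep):
            continue
        for name in files:
            path = os.path.join(root, name)
            if os.path.getsize(path) > threshold_mb * 1024 * 1024:
                rel = os.path.relpath(path, repo)
                yield rel, rel in tracked

for path, is_tracked in large_artifacts("."):
    print(f"{path}: {'tracked' if is_tracked else 'NOT tracked'} by Git")
```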
Mohammad Reza Taesiri; Finlay Macklon; Sarra Habchi; Cor-Paul Bezemer
Searching bug instances in gameplay video repositories Journal Article
IEEE Transactions on Games, 2024.
Abstract | BibTeX | Tags: Bug report, Computer games, Game development, Gameplay videos, Gaming
@article{TaesiriTG2024,
title = {Searching bug instances in gameplay video repositories},
author = {Mohammad Reza Taesiri and Finlay Macklon and Sarra Habchi and Cor-Paul Bezemer},
year = {2024},
date = {2024-01-17},
urldate = {2024-01-17},
journal = {IEEE Transactions on Games},
abstract = {Gameplay videos offer valuable insights into player interactions and game responses, particularly data about game bugs. Despite the abundance of gameplay videos online, extracting useful information remains a challenge. This paper introduces a method for searching and extracting relevant videos from extensive video repositories using English text queries. Our approach requires no external information, like video metadata; it solely depends on video content. Leveraging the zero-shot transfer capabilities of the Contrastive Language-Image Pre-Training (CLIP) model, our approach does not require any data labeling or training. To evaluate our approach, we present the GamePhysics dataset, comprising 26,954 videos from 1,873 games that were collected from the GamePhysics section on the Reddit website. Our approach shows promising results in our extensive analysis of simple and compound queries, indicating that our method is useful for detecting objects and events in gameplay videos. Moreover, we assess the effectiveness of our method by analyzing a carefully annotated dataset of 220 gameplay videos. The results of our study demonstrate the potential of our approach for applications such as the creation of a video search tool tailored to identifying video game bugs, which could greatly benefit Quality Assurance (QA) teams in finding and reproducing bugs. The code and data used in this paper can be found at https://zenodo.org/records/10211390},
keywords = {Bug report, Computer games, Game development, Gameplay videos, Gaming},
pubstate = {published},
tppubtype = {article}
}
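The sketch below shows the core retrieval idea under stated assumptions: sample frames from a video, embed them and the text query with an off-the-shelf CLIP checkpoint, and score the video by its best-matching frame. The sampling rate, model checkpoint, and max-pooling choice are illustrative; consult the paper and linked artifacts for the actual setup.

```python
# Sketch: score a gameplay video against an English query with CLIP's zero-shot
# image-text similarity. Sampling rate and checkpoint are illustrative choices.
import cv2  # pip install opencv-python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def score_video(video_path: str, query: str, every_n_frames: int = 30) -> float:
    cap = cv2.VideoCapture(video_path)
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n_frames == 0:  # sample every Nth frame
            frames.append(Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)))
        idx += 1
    cap.release()
    inputs = processor(text=[query], images=frames, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # shape: (num_frames, 1)
    return logits.max().item()  # a video is as relevant as its best frame

print(score_video("gameplay.mp4", "a horse flying in the air"))
```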
Mohammad Reza Taesiri; Giang Nguyen; Sarra Habchi; Cor-Paul Bezemer; Anh Nguyen
ImageNet-Hard: The Hardest Images Remaining from a Study of the Power of Zoom and Spatial Biases in Image Classification Inproceedings
NeurIPS Dataset and Benchmark track, 2023.
BibTeX | Tags: Benchmark, Computer vision, Dataset, Image classification, Machine learning
@inproceedings{TaesiriNeurIPS2023,
title = {ImageNet-Hard: The Hardest Images Remaining from a Study of the Power of Zoom and Spatial Biases in Image Classification},
author = {Mohammad Reza Taesiri and Giang Nguyen and Sarra Habchi and Cor-Paul Bezemer and Anh Nguyen},
year = {2023},
date = {2023-12-07},
urldate = {2023-12-07},
booktitle = {NeurIPS Dataset and Benchmark track},
keywords = {Benchmark, Computer vision, Dataset, Image classification, Machine learning},
pubstate = {published},
tppubtype = {inproceedings}
}
Markos Viggiato; Dale Paas; Cor-Paul Bezemer
Prioritizing Natural Language Test Cases Based on Highly-Used Game Features Inproceedings
Proceedings of the 31st Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE), pp. 1–12, 2023.
Abstract | BibTeX | Tags: Computer games, Game development, Natural language processing, Testing
@inproceedings{ViggiatoFSE2023,
title = {Prioritizing Natural Language Test Cases Based on Highly-Used Game Features},
author = {Markos Viggiato and Dale Paas and Cor-Paul Bezemer },
year = {2023},
date = {2023-12-01},
urldate = {2023-12-01},
booktitle = {Proceedings of the 31st Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE)},
pages = {1--12},
abstract = {Software testing is still a manual activity in many industries, such as the gaming industry. But manually executing tests becomes impractical as the system grows and resources are restricted, mainly in a scenario with short release cycles. Test case prioritization is a commonly used technique to optimize the test execution. However, most prioritization approaches do not work for manual test cases as they require source code information or test execution history, which is often not available in a manual testing scenario. In this paper, we propose a prioritization approach for manual test cases written in natural language based on the tested application features (in particular, highly-used application features). Our approach consists of (1) identifying the tested features from natural language test cases (with zero-shot classification techniques) and (2) prioritizing test cases based on the features that they test. We leveraged the NSGA-II genetic algorithm for the multi-objective optimization of the test case ordering to maximize the coverage of highly-used features while minimizing the cumulative execution time. Our findings show that we can successfully identify the application features covered by test cases using an ensemble of pre-trained models with strong zero-shot capabilities (an F-score of 76.1%). Also, our prioritization approaches can find test case orderings that cover highly-used application features early in the test execution while keeping the time required to execute test cases short. QA engineers can use our approach to focus the test execution on test cases that cover features that are relevant to users.},
keywords = {Computer games, Game development, Natural language processing, Testing},
pubstate = {published},
tppubtype = {inproceedings}
}
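The paper searches for orderings with the NSGA-II genetic algorithm; as a much simpler stand-in for the same trade-off, the sketch below greedily picks the test case with the highest not-yet-covered usage weight per minute of execution time. All names and numbers are invented for illustration.

```python
# Greedy ordering sketch (a simplification of the paper's NSGA-II optimization):
# repeatedly pick the test case covering the most uncovered usage weight per minute.
def greedy_order(test_cases):
    """test_cases: list of (name, minutes, {feature: usage_weight})."""
    remaining = list(test_cases)
    covered, order = set(), []
    while remaining:
        def gain(tc):
            _name, minutes, feats = tc
            new_weight = sum(w for f, w in feats.items() if f not in covered)
            return new_weight / minutes
        best = max(remaining, key=gain)
        remaining.remove(best)
        covered.update(best[2])
        order.append(best[0])
    return order

tests = [
    ("login_test", 2, {"login": 0.9}),
    ("shop_test", 5, {"shop": 0.4, "inventory": 0.3}),
    ("tutorial_test", 1, {"tutorial": 0.1}),
]
print(greedy_order(tests))  # highest usage-per-minute first
```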
Md Saeed Siddik; Cor-Paul Bezemer
Do Code Quality and Style Issues Differ Across (Non-)Machine Learning Notebooks? Yes! Inproceedings
23rd IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM), pp. 1–12, IEEE, 2023.
Abstract | BibTeX | Tags: Computational notebooks, Empirical software engineering, Mining software repositories
@inproceedings{SiddikSCAM2023,
title = {Do Code Quality and Style Issues Differ Across (Non-)Machine Learning Notebooks? Yes!},
author = {Md Saeed Siddik and Cor-Paul Bezemer},
year = {2023},
date = {2023-10-03},
urldate = {2023-10-03},
booktitle = {23rd IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM)},
pages = {1--12},
publisher = {IEEE},
abstract = {The popularity of computational notebooks is rapidly increasing because of their interactive code-output visualization and on-demand non-sequential code block execution. These notebook features have made notebooks especially popular with machine learning developers and data scientists. However, as prior work shows, notebooks generally contain low quality code. In this paper, we investigate whether the low quality code is inherent to the programming style in notebooks, or whether it is correlated with the use of machine learning techniques. We present a large-scale empirical analysis of 246,599 open-source notebooks to explore how machine learning code quality in Jupyter Notebooks differs from non-machine learning code, thereby focusing on code style issues. We explored code style issues across the Error, Convention, Warning, and Refactoring categories. We found that machine learning notebooks are of lower quality regarding PEP-8 code standards than non-machine learning notebooks, and their code quality distributions significantly differ with a small effect size. We identified several code style issues with large differences in occurrences between machine learning and non-machine learning notebooks. For example, package and import-related issues are more prevalent in machine learning notebooks. Our study shows that code quality and code style issues differ significantly across machine learning and non-machine learning notebooks.},
keywords = {Computational notebooks, Empirical software engineering, Mining software repositories},
pubstate = {published},
tppubtype = {inproceedings}
}
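One plausible way to reproduce such a measurement (an assumption about tooling, not the paper's exact setup) is to convert each notebook to a script with nbconvert and count Pylint messages per category.

```python
# Sketch: convert a notebook to a script and group Pylint messages by category
# (convention/warning/error/refactor). Requires jupyter and pylint on PATH.
import json
import subprocess
from collections import Counter

def lint_notebook(notebook_path: str) -> Counter:
    subprocess.run(["jupyter", "nbconvert", "--to", "script", notebook_path],
                   check=True)
    script = notebook_path.replace(".ipynb", ".py")
    result = subprocess.run(["pylint", script, "--output-format=json"],
                            capture_output=True, text=True)  # nonzero exit is normal
    messages = json.loads(result.stdout or "[]")
    return Counter(m["type"] for m in messages)

print(lint_notebook("analysis.ipynb"))  # e.g., Counter({'convention': 12, ...})
```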
Mikael Sabuhi
Strategies For Building Performant Containerized Applications PhD Thesis
2023.
Abstract | BibTeX | Tags: Docker, Docker Hub, Microservices, Performance, Performance analysis, Performance engineering
@phdthesis{phd_mikael,
title = {Strategies For Building Performant Containerized Applications},
author = {Mikael Sabuhi},
year = {2023},
date = {2023-09-25},
urldate = {2023-09-25},
abstract = {The evolution of cloud computing in the last decade has offered unprecedented access to sizable, configurable computing resources with minimal management effort. Containerization of applications, particularly through Docker, has been pivotal in this progression. As modern software increasingly relies on various cloud services, designing performant cloud applications has emerged as a critical concern. Key attributes of such applications include reliability, scalability, efficiency, fault tolerance, and responsiveness. This thesis seeks to address the challenges intrinsic to creating performant cloud applications by developing strategies aimed at achieving these characteristics through: 1) the application of autoscaling techniques to enhance scalability, efficiency, and responsiveness; 2) the introduction of a methodology for assessing the impact of Docker image upgrades on containerized applications to prevent performance degradation; and 3) the utilization of microservices architecture to develop scalable, reliable, and fault-tolerant cloud applications. In our initial research, we propose a pioneering approach to optimize the performance and resource usage of containerized cloud applications using adaptive controllers grounded in control theory. Our methodology harnesses the capacity of neural networks to capture the intrinsic non-linearity of these applications, and adapts the parameters of a proportional-integral-derivative (PID) controller to accommodate environmental changes. The outcomes demonstrate significant enhancements in resource utilization and a reduction in service level agreement violations, surpassing the performance of other examined autoscaling techniques. In the subsequent study, we present a method to evaluate the performance implications of Docker image upgrades on cloud software systems and their correlation with application dependencies. Our case study of 90 official WordPress images underscores the need for comprehensive performance testing before upgrades, the importance of maintaining a performance repository for reporting test results, and the potential benefits of extending semantic versioning to encompass performance modifications. This investigation encourages an enlightened approach to Docker image management, promoting enhanced cloud application performance. Lastly, we introduce Micro-FL, a fault-tolerant federated learning framework crafted to enhance the reliability and scalability of cloud-based machine learning platforms. By incorporating a microservices-based architecture within Docker containers, Micro-FL overcomes challenges typically associated with federated learning, such as resource constraints, scalability, and system faults. Performance assessments demonstrate Micro-FL’s capability to efficiently manage faults and streamline federated learning processes, offering a more robust and scalable solution for federated learning. The research work presented in this thesis provides deep insights, actionable recommendations, and effective and thoroughly evaluated approaches for building performant cloud applications.
},
keywords = {Docker, Docker Hub, Microservices, Performance, Performance analysis, Performance engineering},
pubstate = {published},
tppubtype = {phdthesis}
}
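As background for the control-theoretic autoscaling described above, here is a minimal discrete PID controller sketch. The thesis additionally adapts the gains with neural networks; the fixed gains and setpoint below are illustrative only.

```python
# Minimal discrete PID controller sketch for container autoscaling.
# Positive output means the measured load is above target: add replicas.
class PIDAutoscaler:
    def __init__(self, kp, ki, kd, setpoint):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint  # e.g., target CPU utilization (0-1)
        self.integral = 0.0
        self.prev_error = 0.0

    def replicas_delta(self, measured, dt=1.0):
        error = measured - self.setpoint
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

controller = PIDAutoscaler(kp=10, ki=1, kd=0.5, setpoint=0.6)
print(round(controller.replicas_delta(measured=0.85), 2))  # positive: scale up
```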
Finlay Macklon; Markos Viggiato; Natalia Romanova; Chris Buzon; Dale Paas; Cor-Paul Bezemer
A Taxonomy of Testable HTML5 Canvas Issues Journal Article
Transactions of Software Engineering (TSE), 49 (6), pp. 3647–3659, 2023.
Abstract | BibTeX | Tags: Testing, Web applications
@article{MacklonTSE2023,
title = {A Taxonomy of Testable HTML5 Canvas Issues},
author = {Finlay Macklon and Markos Viggiato and Natalia Romanova and Chris Buzon and Dale Paas and Cor-Paul Bezemer},
year = {2023},
date = {2023-06-01},
urldate = {2023-06-01},
journal = {Transactions of Software Engineering (TSE)},
volume = {49},
number = {6},
pages = {3647--3659},
abstract = {The HTML5 canvas is widely used to display high quality graphics in web applications. However, the combination of web, GUI, and visual techniques that are required to build canvas applications, together with the lack of testing and debugging tools, makes developing such applications very challenging. To help direct future research on testing canvas applications, in this paper we present a taxonomy of testable canvas issues. First, we extracted 2,403 canvas-related issue reports from 123 open source GitHub projects that use the HTML5 canvas. Second, we constructed our taxonomy by manually classifying a random sample of 332 issue reports. Our manual classification identified five broad categories of testable canvas issues, such as Visual and Performance issues. We found that Visual issues are the most frequent (35%), while Performance issues are relatively infrequent (5%). We also found that many testable canvas issues that present themselves visually on the canvas are actually caused by other components of the web application. Our taxonomy of testable canvas issues can be used to steer future research into canvas issues and testing.},
keywords = {Testing, Web applications},
pubstate = {published},
tppubtype = {article}
}
Markos Viggiato
Leveraging Natural Language Processing Techniques to Improve Manual Game Testing PhD Thesis
2023.
Abstract | BibTeX | Tags: Computer games, Game development, Natural language processing, Testing
@phdthesis{ViggiatoPhD,
title = {Leveraging Natural Language Processing Techniques to Improve Manual Game Testing},
author = {Markos Viggiato },
year = {2023},
date = {2023-01-17},
urldate = {2023-01-17},
abstract = {The gaming industry has experienced a sharp growth in recent years, surpassing other popular entertainment segments, such as the film industry. With the ever-increasing scale of the gaming industry and the fact that players are extremely difficult to satisfy, it has become extremely challenging to develop a successful game. In this context, the quality of games has become a critical issue. Game testing is a widely-performed activity to ensure that games meet the desired quality criteria. However, despite recent advancements in test automation, manual game testing is still prevalent in the gaming industry, with test cases often described in natural language only and consisting of one or more test steps that must be manually performed by the Quality Assurance (QA) engineer (i.e., the tester). This makes game testing challenging and costly. Issues such as redundancy (i.e., when different test cases have the same testing objective) and incompleteness (i.e., when test cases miss one or more steps) become a bigger concern in a manual game testing scenario. In addition, as games become bigger and the number of required test cases increases, it becomes impractical to execute all test cases in a scenario with short game release cycles, for example.
Prior work proposed several approaches to analyze and improve test cases with associated source code. However, there is little research on improving manual game testing. Having higher-quality test cases and optimizing test execution help to reduce wasted developer time and allow testers to use testing resources more effectively, which makes game testing more efficient and effective. In addition, even though players are extremely difficult to satisfy, their priorities are not considered during game testing. In this thesis, we investigate how to improve manual game testing from different perspectives.
In the first part of the thesis, we investigated how we can reduce redundancy in the test suite by identifying similar natural language test cases. We evaluated several unsupervised approaches using text embedding, text similarity, and clustering techniques and showed that we can successfully identify similar test cases with a high performance. We also investigated how we can improve test case descriptions to reduce the number of unclear, ambiguous, and incomplete test cases. We proposed and evaluated an automated framework that leverages statistical and neural language models and (1) provides recommendations to improve test case descriptions, (2) recommends potentially missing steps, and (3) suggests existing similar test cases.
In the second part of the thesis, we investigated how player priorities can be included in the game testing process. We first proposed an approach to prioritize test cases that cover the game features that players use the most, which helps to avoid bugs that could affect a very large number of players. Our approach (1) identifies the game features covered by test cases using an ensemble of zero-shot techniques with a high performance and (2) optimizes the test execution based on highly-used game features covered by test cases. Finally, we investigated how sentiment classifiers perform on game reviews and what issues affect those classifiers. High-performing classifiers can be used to obtain players' sentiments about games and guide testing based on the game features that players like or dislike. We show that, while traditional sentiment classifiers do not perform well, a modern classifier (the OPT-175B Large Language Model) presents a (far) better performance. The research work presented in this thesis provides deep insights, actionable recommendations, and effective and thoroughly evaluated approaches to support QA engineers and developers to improve manual game testing.},
keywords = {Computer games, Game development, Natural language processing, Testing},
pubstate = {published},
tppubtype = {phdthesis}
}
Arthur V. Kamienski; Abram Hindle; Cor-Paul Bezemer
Analyzing Techniques for Duplicate Question Detection on Q&A Websites for Game Developers Journal Article
Empirical Software Engineering Journal (EMSE), 28 (17), 2022.
@article{arthur2022,
title = {Analyzing Techniques for Duplicate Question Detection on Q&A Websites for Game Developers},
author = {Arthur V. Kamienski and Abram Hindle and Cor-Paul Bezemer},
year = {2022},
date = {2022-12-08},
journal = {Empirical Software Engineering Journal (EMSE)},
volume = {28},
number = {17},
abstract = {Game development is currently the largest industry in the entertainment segment and has a high demand for skilled game developers that can produce high-quality games. To satiate this demand, game developers need resources that can provide them with the knowledge they need to learn and improve their skills. Question and Answer (Q&A) websites are one of such resources that provide a valuable source of knowledge about game development practices. However, the presence of duplicate questions on Q&A websites hinders their ability to effectively provide information for their users. While several researchers created and analyzed techniques for duplicate question detection on websites such as Stack Overflow, so far no studies have explored how well those techniques work on Q&A websites for game development. With that in mind, in this paper we analyze how we can use pre-trained and unsupervised techniques to detect duplicate questions on Q&A websites focused on game development using data extracted from the Game Development Stack Exchange and Stack Overflow. We also explore how we can leverage a small set of labelled data to improve the performance of those techniques. The pre-trained technique based on MPNet achieved the highest results in identifying duplicate questions about game development, and we could achieve a better performance when combining multiple unsupervised techniques into a single supervised model. Furthermore, the supervised models could identify duplicate questions on websites different from those they were trained on with little to no decrease in performance. Our results lay the groundwork for building better duplicate question detection systems in Q&A websites for game developers and ultimately providing game developers with a more effective Q&A community.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
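A minimal sketch of the best-performing pre-trained technique, assuming the `all-mpnet-base-v2` checkpoint from the sentence-transformers library and an illustrative 0.8 similarity threshold (the paper's configuration and data differ).

```python
# Sketch: embed question titles with an MPNet sentence encoder and flag
# high-cosine-similarity pairs as duplicate candidates.
from sentence_transformers import SentenceTransformer, util  # pip install sentence-transformers

model = SentenceTransformer("all-mpnet-base-v2")

questions = [
    "How do I detect collisions between 2D sprites?",
    "What is the best way to check if two 2D sprites overlap?",
    "How do I serialize a saved game to disk?",
]
embeddings = model.encode(questions, convert_to_tensor=True)
similarities = util.cos_sim(embeddings, embeddings)

for i in range(len(questions)):
    for j in range(i + 1, len(questions)):
        if similarities[i][j] > 0.8:  # illustrative threshold
            print(f"Possible duplicates: {questions[i]!r} / {questions[j]!r}")
```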
Finlay Macklon; Mohammad Reza Taesiri; Markos Viggiato; Stefan Antoszko; Natalia Romanova; Dale Paas; Cor-Paul Bezemer
Automatically Detecting Visual Bugs in HTML5 <canvas> Games Inproceedings
37th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2022.
BibTeX | Tags: Computer games, Game development, Gaming, Regression testing, Testing, Web applications
@inproceedings{finlay_ase2022,
title = {Automatically Detecting Visual Bugs in HTML5 <canvas> Games},
author = {Finlay Macklon and Mohammad Reza Taesiri and Markos Viggiato and Stefan Antoszko and Natalia Romanova and Dale Paas and Cor-Paul Bezemer},
year = {2022},
booktitle = {37th IEEE/ACM International Conference on Automated Software Engineering (ASE)},
keywords = {Computer games, Game development, Gaming, Regression testing, Testing, Web applications},
pubstate = {published},
tppubtype = {inproceedings}
}
Luisa Palechor
Characterizing (un)successful open source blockchain projects and their testing practices Masters Thesis
2022.
Abstract | BibTeX | Tags: blockchain, Smart contracts, Testing
@mastersthesis{luisa2022,
title = {Characterizing (un)successful open source blockchain projects and their testing practices},
author = {Luisa Palechor},
year = {2022},
date = {2022-09-26},
urldate = {2022-09-26},
abstract = {The most well-known blockchain applications are cryptocurrencies, e.g., Ether and Bitcoin, which together have a market cap of more than 560 billion US dollars. Besides cryptocurrency applications, programmable blockchain allows the development of different applications, e.g., peer-to-peer selling of renewable energy from smart grids, digital rights management, and supply chain tracking and operation. These applications can be developed and deployed on the blockchain through smart contracts, which are small programs that run on the blockchain under particular conditions. As bugs in blockchain applications (in particular, cryptocurrencies) can have a large financial impact, it is important to ensure that these applications are well-developed and well-tested. However, the software development and testing practices of blockchain projects are currently largely unstudied. In this thesis, we study data from GitHub and CoinMarketCap to understand the characteristics of successful and unsuccessful blockchain projects and reveal the testing practices in Solidity projects, with the aim of helping developers to identify projects from which they can learn, or should contribute to. In the first part of the thesis, we study data from CoinMarketCap and GitHub to gain knowledge about the characteristics of successful and unsuccessful blockchain projects. We build a random forest classifier with 320 labelled projects and metrics from 3 dimensions (activity, popularity, and complexity). We found that a large number of stars and a project’s age can help distinguish between successful and unsuccessful projects. Additionally, we found that code cloning practices tend to be common in unsuccessful projects written in Python, C++, Java and Solidity. In the second part of the thesis, we explore how quality is addressed in blockchain applications by studying how 139 open source Solidity projects are tested. We show that core development team members are the developers who usually contribute to testing files, leaving external contributions rare. In addition, our results indicate that only functional testing is practiced among the majority of Solidity projects, with Truffle and Hardhat being the tools commonly used to test Solidity smart contracts. Moreover, security testing is a practice rarely conducted, and performance testing is not conducted at all. We finally found that audits by a third party are common in several smart contracts. Future researchers and developers can use our findings to understand what characterizes successful and unsuccessful blockchain projects and be aware of the testing practices developers conduct in open source blockchain projects.},
keywords = {blockchain, Smart contracts, Testing},
pubstate = {published},
tppubtype = {mastersthesis}
}
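A minimal sketch of the classifier described above, with synthetic feature rows standing in for the 320 labelled projects; the thesis' actual metric set spans the activity, popularity, and complexity dimensions.

```python
# Sketch: a random forest over project metrics to separate successful (1) from
# unsuccessful (0) blockchain projects. Feature values below are synthetic.
from sklearn.ensemble import RandomForestClassifier

# Each row: [stars, age_in_years, commits, contributors]
X = [
    [12000, 7.0, 5400, 210],  # successful
    [30, 0.5, 40, 2],         # unsuccessful
    [4500, 5.0, 2100, 80],    # successful
    [5, 1.0, 15, 1],          # unsuccessful
]
y = [1, 0, 1, 0]

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.predict([[800, 3.0, 600, 12]]))  # predicted success label
```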
Markos Viggiato; Dale Paas; Chris Buzon; Cor-Paul Bezemer
Using Natural Language Processing Techniques to Improve Manual Test Case Descriptions Inproceedings
International Conference on Software Engineering - Software Engineering in Practice (ICSE - SEIP) Track, 2022.
@inproceedings{ViggiatoSEIP2022,
title = {Using Natural Language Processing Techniques to Improve Manual Test Case Descriptions},
author = {Markos Viggiato and Dale Paas and Chris Buzon and Cor-Paul Bezemer},
year = {2022},
date = {2022-05-08},
booktitle = {International Conference on Software Engineering - Software Engineering in Practice (ICSE - SEIP) Track},
abstract = {Despite the recent advancements in test automation, software testing often remains a manual, and costly, activity in many industries. Manual test cases, often described only in natural language, consist of one or more test steps, which are instructions that must be performed to achieve the testing objective. Having different employees specifying test cases might result in redundant, unclear, or incomplete test cases. Manually reviewing and validating newly-specified test cases is time-consuming and becomes impractical in a scenario with a large test suite. Therefore, in this paper, we propose an automated framework to automatically analyze test cases that are specified in natural language and provide actionable recommendations on how to improve the test cases. Our framework consists of configurable components and modules for analysis, which are capable of recommending improvements to the following: (1) the terminology of a new test case through language modeling, (2) potentially missing test steps for a new test case through frequent itemset and association rule mining, and (3) recommendation of similar test cases that already exist in the test suite through text embedding and clustering. We thoroughly evaluated the three modules on data from our industry partner. Our framework can provide actionable recommendations, which is an important challenge given the widespread occurrence of test cases that are described only in natural language in the software industry (in particular, the game industry).},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
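To make module (2) concrete, the sketch below mines association rules over test steps with mlxtend; a rule whose antecedent appears in a new test case but whose consequent does not can then be surfaced as a potentially missing step. The example steps and thresholds are invented for illustration.

```python
# Sketch: frequent itemset and association rule mining over test steps.
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

test_cases = [
    ["launch game", "open settings", "change volume"],
    ["launch game", "open settings", "change resolution"],
    ["launch game", "start new game"],
]
encoder = TransactionEncoder()
onehot = pd.DataFrame(encoder.fit(test_cases).transform(test_cases),
                      columns=encoder.columns_)
itemsets = apriori(onehot, min_support=0.5, use_colnames=True)
rules = association_rules(itemsets, metric="confidence", min_threshold=0.9)
# e.g., {open settings} -> {launch game}: suggest "launch game" if it is absent
# from a new test case that contains "open settings".
print(rules[["antecedents", "consequents", "confidence"]])
```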
Markos Viggiato; Dale Paas; Chris Buzon; Cor-Paul Bezemer
Identifying Similar Test Cases That Are Specified in Natural Language Journal Article
Transactions of Software Engineering (TSE), 2022.
Abstract | BibTeX | Tags: Game development, Testing
@article{ViggiatoTSE2022,
title = {Identifying Similar Test Cases That Are Specified in Natural Language},
author = {Markos Viggiato and Dale Paas and Chris Buzon and Cor-Paul Bezemer},
year = {2022},
date = {2022-04-21},
urldate = {2022-04-21},
journal = {Transactions of Software Engineering (TSE)},
abstract = {Software testing is still a manual process in many industries, despite the recent improvements in automated testing techniques. As a result, test cases (which consist of one or more test steps that need to be executed manually by the tester) are often specified in natural language by different employees and many redundant test cases might exist in the test suite. This increases the (already high) cost of test execution. Manually identifying similar test cases is a time-consuming and error-prone task. Therefore, in this paper, we propose an unsupervised approach to identify similar test cases. Our approach uses a combination of text embedding, text similarity and clustering techniques to identify similar test cases. We evaluate five different text embedding techniques, two text similarity metrics, and two clustering techniques to cluster similar test steps and three techniques to identify similar test cases from the test step clusters. Through an evaluation in an industrial setting, we showed that our approach achieves a high performance to cluster test steps (an F-score of 87.39%) and identify similar test cases (an F-score of 83.47%). Furthermore, a validation with developers indicates several different practical usages of our approach (such as identifying redundant test cases), which help to reduce the manual testing effort and time.},
keywords = {Game development, Testing},
pubstate = {published},
tppubtype = {article}
}
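A compact sketch of the pipeline's core, assuming a sentence-transformers encoder and cosine-distance agglomerative clustering; the distance threshold is an illustrative value, and the paper systematically compares several embedding, similarity, and clustering choices.

```python
# Sketch: embed test steps and group them with agglomerative clustering.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import AgglomerativeClustering

steps = [
    "Click the start button",
    "Press the start button",
    "Open the inventory screen",
    "Navigate to the inventory screen",
]
embeddings = SentenceTransformer("all-MiniLM-L6-v2").encode(steps)
clustering = AgglomerativeClustering(
    n_clusters=None, distance_threshold=0.5, metric="cosine", linkage="average")
print(clustering.fit_predict(embeddings))  # same label = likely-similar steps
```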
Mikael Sabuhi; Petr Musilek; Cor-Paul Bezemer
Studying the Performance Risks of Upgrading Docker Hub Images: A Case Study of WordPress Inproceedings
ACM/SPEC International Conference on Performance Engineering (ICPE), 2022.
@inproceedings{SabuhiICPE2022,
title = {Studying the Performance Risks of Upgrading Docker Hub Images: A Case Study of WordPress},
author = {Mikael Sabuhi and Petr Musilek and Cor-Paul Bezemer},
year = {2022},
date = {2022-04-09},
booktitle = {ACM/SPEC International Conference on Performance Engineering (ICPE)},
abstract = {The Docker Hub repository contains Docker images of applications, which allow users to do in-place upgrades to benefit from the latest released features and security patches. However, prior work showed that upgrading a Docker image not only changes the main application, but can also change many dependencies. In this paper, we present a methodology to study the performance impact of upgrading the Docker Hub image of an application, thereby focusing on changes to dependencies. We demonstrate our methodology through a case study of 90 official images of the WordPress application. Our study shows that Docker image users should be cautious and conduct a performance test before upgrading to a newer Docker image in most cases. Our methodology can assist them to better understand the performance risks of such upgrades, and helps them to decide how thorough such a performance test should be.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
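A bare-bones sketch of the kind of before/after check the paper recommends: run the same workload against containers for the current and upgraded images and compare median latencies. The ports and the image versions in the comments are assumptions for illustration.

```python
# Sketch: compare median response times of two running container versions.
import statistics
import time
import requests  # pip install requests

def median_latency(url: str, n: int = 100) -> float:
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        requests.get(url, timeout=10)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

current = median_latency("http://localhost:8080/")   # e.g., container from the current image
upgraded = median_latency("http://localhost:8081/")  # e.g., container from the upgraded image
print(f"median latency changed by {(upgraded / current - 1) * 100:+.1f}%")
```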
Mohammad Reza Taesiri; Finlay Macklon; Cor-Paul Bezemer
CLIP meets GamePhysics: Towards bug identification in gameplay videos using zero-shot transfer learning Inproceedings
International Conference on Mining Software Repositories (MSR), 2022.
Abstract | BibTeX | Tags: Bug report, Computer games, Game development, Gameplay videos, Gaming
@inproceedings{TaesiriMSR2022,
title = {CLIP meets GamePhysics: Towards bug identification in gameplay videos using zero-shot transfer learning},
author = {Mohammad Reza Taesiri and Finlay Macklon and Cor-Paul Bezemer},
year = {2022},
date = {2022-03-24},
urldate = {2022-03-24},
booktitle = {International Conference on Mining Software Repositories (MSR)},
abstract = {Gameplay videos contain rich information about how players interact with the game and how the game responds. Sharing gameplay videos on social media platforms, such as Reddit, has become a common practice for many players. Often, players will share gameplay videos that showcase video game bugs. Such gameplay videos are software artifacts that can be utilized for game testing, as they provide insight for bug analysis. Although large repositories of gameplay videos exist, parsing and mining them in an effective and structured fashion has still remained a big challenge. In this paper, we propose a search method that accepts any English text query as input to retrieve relevant videos from large repositories of gameplay videos. Our approach does not rely on any external information (such as video metadata); it works solely based on the content of the video. By leveraging the zero-shot transfer capabilities of the Contrastive Language-Image Pre-Training (CLIP) model, our approach does not require any data labeling or training. To evaluate our approach, we present the GamePhysics dataset consisting of 26,954 videos from 1,873 games that were collected from the GamePhysics section on the Reddit website. Our approach shows promising results in our extensive analysis of simple queries, compound queries, and bug queries, indicating that our approach is useful for object and event detection in gameplay videos. An example application of our approach is as a gameplay video search engine to aid in reproducing video game bugs. Please visit the following link for the code and the data: https://asgaardlab.github.io/CLIPxGamePhysics/},
keywords = {Bug report, Computer games, Game development, Gameplay videos, Gaming},
pubstate = {published},
tppubtype = {inproceedings}
}
Simon Eismann; Diego Costa; Lizhi Liao; Cor-Paul Bezemer; Weiyi Shang; André van Hoorn; Samuel Kounev
A Case Study on the Stability of Performance Tests for Serverless Applications Journal Article
Journal of Systems and Software, 2022.
Abstract | BibTeX | Tags: Performance engineering, Performance regressions, Performance testing, Serverless
@article{EismannJSS2022,
title = {A Case Study on the Stability of Performance Tests for Serverless Applications},
author = {Simon Eismann and Diego Costa and Lizhi Liao and Cor-Paul Bezemer and Weiyi Shang and André van Hoorn and Samuel Kounev},
year = {2022},
date = {2022-03-17},
urldate = {2022-03-17},
journal = {Journal of Systems and Software},
abstract = {Context. While in serverless computing, application resource management and operational concerns are generally delegated to the cloud provider, ensuring that serverless applications meet their performance requirements is still a responsibility of the developers. Performance testing is a commonly used performance assessment practice; however, it traditionally requires visibility of the resource environment.
Objective. In this study, we investigate whether performance tests of serverless applications are stable, that is, if their results are reproducible, and what implications the serverless paradigm has for performance tests.
Method. We conduct a case study where we collect two datasets of performance test results: (a) repetitions of performance tests for varying memory size and load intensities and (b) three repetitions of the same performance test every day for ten months.
Results. We find that performance tests of serverless applications are comparatively stable if conducted on the same day. However, we also observe short-term performance variations and frequent long-term performance changes.
Conclusion. Performance tests for serverless applications can be stable; however, the serverless model impacts the planning, execution, and analysis of performance tests.},
keywords = {Performance engineering, Performance regressions, Performance testing, Serverless},
pubstate = {published},
tppubtype = {article}
}
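A small sketch of one way to operationalize such a stability check (an illustration, not the study's analysis): compare the latency distributions of two test repetitions with a Mann-Whitney U test.

```python
# Sketch: test whether two repetitions of the same performance test produced
# significantly different latency distributions (alpha = 0.05 is illustrative).
from scipy.stats import mannwhitneyu

day1_latencies_ms = [102, 98, 110, 105, 99, 101, 97, 108]
day2_latencies_ms = [140, 151, 138, 149, 145, 150, 142, 147]

statistic, p_value = mannwhitneyu(day1_latencies_ms, day2_latencies_ms)
if p_value < 0.05:
    print(f"distributions differ (p={p_value:.4f}): possible performance change")
else:
    print(f"no significant difference detected (p={p_value:.4f})")
```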
Luisa Palechor; Cor-Paul Bezemer
How are Solidity smart contracts tested in open source projects? An exploratory study Inproceedings
3rd IEEE/ACM International Conference on Automation of Software Test (AST), 2022.
Abstract | BibTeX | Tags: Smart contracts, Testing
@inproceedings{PalechorAST2022,
title = {How are Solidity smart contracts tested in open source projects? An exploratory study},
author = {Luisa Palechor and Cor-Paul Bezemer},
year = {2022},
date = {2022-03-10},
urldate = {2022-03-10},
booktitle = {3rd IEEE/ACM International Conference on Automation of Software Test (AST)},
abstract = {Smart contracts are self-executing programs that are stored on the blockchain. Once a smart contract is compiled and deployed on the blockchain, it cannot be modified. Therefore, having a bug-free smart contract is vital. To ensure a bug-free smart contract, it must be tested thoroughly. However, little is known about how developers test smart contracts in practice. Our study explores 139 open source smart contract projects that are written in Solidity to investigate the state of smart contract testing from three dimensions: (1) the developers working on the tests, (2) the used testing frameworks and testnets and (3) the type of tests that are conducted. We found that mostly core developers of a project are responsible for testing the contracts. Second, developers typically use only functional testing frameworks to test a smart contract, with Truffle being the most popular one. Finally, our results show that functional testing is conducted in most of the studied projects (93%), security testing is only performed in a few projects (9.4%) and traditional performance testing is conducted in none. In addition, we found 25 projects that mentioned or published external audit reports.},
keywords = {Smart contracts, Testing},
pubstate = {published},
tppubtype = {inproceedings}
}
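A heuristic sketch of one mining step, detecting a project's testing framework from well-known configuration files; the marker file names are assumptions, and the study's actual detection procedure may differ.

```python
# Sketch: infer which testing frameworks a Solidity project uses from
# well-known configuration files in the repository root.
import os

FRAMEWORK_MARKERS = {
    "Truffle": ["truffle-config.js", "truffle.js"],
    "Hardhat": ["hardhat.config.js", "hardhat.config.ts"],
    "Foundry": ["foundry.toml"],
}

def detect_frameworks(repo_path: str) -> list:
    found = []
    for framework, markers in FRAMEWORK_MARKERS.items():
        if any(os.path.exists(os.path.join(repo_path, m)) for m in markers):
            found.append(framework)
    return found

print(detect_frameworks("."))  # e.g., ['Truffle']
```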