“Automatically Detecting Visual Bugs in HTML5 <canvas> Games” accepted at ASE 2022!

Finlay’s paper “Automatically Detecting Visual Bugs in HTML5 <canvas> Games” was accepted for publication at the International Conference on Automated Software Engineering (ASE) 2022! Super congrats Finlay and co-authors Mohammad Reza, Stefan and Markos! This paper was a collaboration with Natalia Romanova and Dale Paas from our industry partner Prodigy Education.

Abstract: “The HTML5 <canvas> is used to display high quality graphics in web applications such as web games (i.e., <canvas> games). However, automatically testing <canvas> games is not possible with existing web testing techniques and tools, and manual testing is laborious. Many widely used web testing tools rely on the Document Object Model (DOM) to drive web test automation, but the contents of the <canvas> are not represented in the DOM. The main alternative approach, snapshot testing, involves comparing oracle snapshot images with test-time snapshot images using an image similarity metric to catch visual bugs, i.e., bugs in the graphics of the web application. However, creating and maintaining oracle snapshot images for <canvas> games is onerous, defeating the purpose of test automation. In this paper, we present a novel approach to automatically detect visual bugs in <canvas> games. By leveraging an internal representation of objects on the <canvas>, we decompose snapshot images into a set of object images, each of which is compared with a respective oracle asset (e.g., a sprite) using four similarity metrics: percentage overlap, mean squared error, structural similarity, and embedding similarity. We evaluate our approach by injecting 24 visual bugs into a custom <canvas> game, and find that our approach achieves an accuracy of 100%, compared to an accuracy of 44.6% with traditional snapshot testing.”
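
For readers curious what such a per-object comparison could look like in code, here is a minimal, hypothetical sketch (not the paper's implementation) that compares an extracted object image with its oracle asset using two of the four metrics named in the abstract. The function name, images, and threshold below are illustrative assumptions.

```python
# Hypothetical sketch: compare an extracted <canvas> object image with its
# oracle asset using two of the similarity metrics named in the abstract
# (mean squared error and structural similarity). Not the paper's code.
import numpy as np
from skimage.metrics import mean_squared_error, structural_similarity

def compare_object_to_asset(object_img: np.ndarray, asset_img: np.ndarray) -> dict:
    """Both inputs are grayscale images of the same shape, values in [0, 1]."""
    mse = mean_squared_error(object_img, asset_img)
    ssim = structural_similarity(object_img, asset_img, data_range=1.0)
    return {"mse": mse, "ssim": ssim}

# Flag a potential visual bug when similarity drops below an illustrative threshold.
scores = compare_object_to_asset(np.random.rand(64, 64), np.random.rand(64, 64))
if scores["ssim"] < 0.9:  # threshold chosen for illustration only
    print("possible visual bug:", scores)
```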

A preprint of the paper is available here.

“Identifying Similar Test Cases That Are Specified in Natural Language” accepted in TSE!

Markos’ paper “Identifying Similar Test Cases That Are Specified in Natural Language” was accepted for publication in the Transactions on Software Engineering (TSE) journal! Super congrats Markos! This paper was a collaboration with Dale Paas and Chris Buzon from our industry partner Prodigy Education.

Abstract:
“Software testing is still a manual process in many industries, despite the recent improvements in automated testing techniques. As a result, test cases (which consist of one or more test steps that need to be executed manually by the tester) are often specified in natural language by different employees and many redundant test cases might exist in the test suite. This increases the (already high) cost of test execution. Manually identifying similar test cases is a time-consuming and error-prone task. Therefore, in this paper, we propose an unsupervised approach to identify similar test cases. Our approach uses a combination of text embedding, text similarity and clustering techniques to identify similar test cases. We evaluate five different text embedding techniques, two text similarity metrics, and two clustering techniques to cluster similar test steps and three techniques to identify similar test cases from the test step clusters. Through an evaluation in an industrial setting, we showed that our approach achieves a high performance in clustering test steps (an F-score of 87.39%) and identifying similar test cases (an F-score of 83.47%). Furthermore, a validation with developers indicates several different practical usages of our approach (such as identifying redundant test cases), which help to reduce the manual testing effort and time.”
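
As a rough illustration of the general idea (not the exact pipeline from the paper), a minimal sketch might embed test steps and cluster them by cosine similarity; the embedding model, clustering settings, and example steps below are assumptions made for illustration.

```python
# Hypothetical sketch: embed natural-language test steps, cluster them by
# cosine similarity, and treat test cases whose steps land in the same clusters
# as candidates for similarity. Library choices and parameters are illustrative.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import AgglomerativeClustering

test_steps = [
    "Open the login page",
    "Navigate to the login screen",
    "Enter a valid username and password",
    "Type valid credentials into the form",
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative embedding model
embeddings = model.encode(test_steps, normalize_embeddings=True)

# Cosine distance with a distance threshold instead of a fixed number of clusters.
clustering = AgglomerativeClustering(
    n_clusters=None, metric="cosine", linkage="average", distance_threshold=0.4
)
labels = clustering.fit_predict(embeddings)
for step, label in zip(test_steps, labels):
    print(label, step)
```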

See our Publications for the full paper.

“CLIP meets GamePhysics: Towards bug identification in gameplay videos using zero-shot transfer learning” accepted at MSR 2022!

Mohammad Reza’s paper “CLIP meets GamePhysics: Towards bug identification in gameplay videos using zero-shot transfer learning” was accepted for publication at the Mining Software Repositories (MSR) conference 2022! Super congrats Mohammad Reza and co-author Finlay!

Abstract:
“Gameplay videos contain rich information about how players interact with the game and how the game responds. Sharing gameplay videos on social media platforms, such as Reddit, has become a common practice for many players. Often, players will share game-play videos that showcase video game bugs. Such gameplay videos are software artifacts that can be utilized for game testing, as they provide insight for bug analysis. Although large repositories of gameplay videos exist, parsing and mining them in an effective and structured fashion has still remained a big challenge. In this paper, we propose a search method that accepts any English text query as input to retrieve relevant videos from large repositories of gameplay videos. Our approach does not rely on any external information (such as video metadata); it works solely based on the content of the video. By leveraging the zero-shot transfer capabilities of the Contrastive Language-Image Pre-Training (CLIP) model, our approach does not require any data labeling or training. To evaluate our approach, we present the GamePhysics dataset consisting of 26,954 videos from 1,873 games, that were collected from the GamePhysics section on the Reddit website. Our approach shows promising results in our extensive analysis of simple queries, compound queries, and bug queries, indicating that our approach is useful for object and event detection in gameplay videos. An example application of our approach is as a gameplay video search engine to aid in reproducing video game bugs. Please visit the following link for the code and the data: https://asgaardlab.github.io/CLIPxGamePhysics/.”
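
To give a flavour of the zero-shot retrieval idea, here is a minimal sketch (not the paper's code) that scores sampled gameplay frames against a free-text query with CLIP via Hugging Face Transformers. The model checkpoint, stand-in frames, and query are illustrative assumptions.

```python
# Hypothetical sketch: embed sampled video frames and a text query in CLIP's
# shared space, then rank frames by their text-to-image similarity scores.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Stand-ins for frames sampled from a gameplay video.
frames = [Image.new("RGB", (224, 224), color) for color in ["red", "blue"]]
query = "a car flying in the air"

inputs = processor(text=[query], images=frames, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_text holds the (scaled) text-to-image similarity scores.
scores = outputs.logits_per_text.squeeze(0)
ranking = scores.argsort(descending=True)
print("most relevant frame index:", ranking[0].item())
```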

See our Publications or arXiv for the full paper.

“A Case Study on the Stability of Performance Tests for Serverless Applications” accepted in JSS!

Simon’s paper “A Case Study on the Stability of Performance Tests for Serverless Applications” was accepted for publication in the Journal of Systems and Software (JSS)! This paper was a collaboration with Diego Costa, Lizhi Liao, Weiyi Shang, Andre van Hoorn and Samuel Kounev through the SPEC RG DevOps Performance Working Group.

Abstract:
“Context. While in serverless computing, application resource management and operational concerns are generally delegated to the cloud provider, ensuring that serverless applications meet their performance requirements is still a responsibility of the developers. Performance testing is a commonly used performance assessment practice; however, it traditionally requires visibility of the resource environment.
Objective. In this study, we investigate whether performance tests of serverless applications are stable, that is, if their results are reproducible, and what implications the serverless paradigm has for performance tests.
Method. We conduct a case study where we collect two datasets of performance test results: (a) repetitions of performance tests for varying memory size and load intensities and (b) three repetitions of the same performance test every day for ten months.
Results. We find that performance tests of serverless applications are comparatively stable if conducted on the same day. However, we also observe short-term performance variations and frequent long-term performance changes.
Conclusion. Performance tests for serverless applications can be stable; however, the serverless model impacts the planning, execution, and analysis of performance tests.”
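
As a toy illustration of what a stability check over repeated runs could look like (not the paper's methodology), one might compare the coefficient of variation of response times across repetitions and test whether two days' runs differ significantly. All numbers below are simulated.

```python
# Toy illustration: quantify the stability of repeated performance test runs
# via the coefficient of variation, and compare two days' runs with a
# Mann-Whitney U test. Generic stability check, not the paper's analysis.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(42)
run_day1 = rng.normal(loc=120, scale=8, size=500)  # simulated response times (ms)
run_day2 = rng.normal(loc=135, scale=8, size=500)  # same test, a later day

def coefficient_of_variation(samples: np.ndarray) -> float:
    return float(np.std(samples) / np.mean(samples))

print("CV day 1:", coefficient_of_variation(run_day1))
print("CV day 2:", coefficient_of_variation(run_day2))

# A significant difference between repetitions hints at a performance change
# between days rather than noise within a single run.
stat, p_value = mannwhitneyu(run_day1, run_day2)
print("Mann-Whitney U p-value:", p_value)
```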

See our Publications for the full paper.

“How are Solidity smart contracts tested in open source projects? An exploratory study” accepted at AST 2022!

Luisa’s paper “How are Solidity smart contracts tested in open source projects? An exploratory study” was accepted for publication at AST 2022! Super congrats Luisa!

Abstract:
“Smart contracts are self-executing programs that are stored on the blockchain. Once a smart contract is compiled and deployed on the blockchain, it cannot be modified. Therefore, having a bug-free smart contract is vital. To ensure a bug-free smart contract, it must be tested thoroughly. However, little is known about how developers test smart contracts in practice. Our study explores 139 open source smart contract projects that are written in Solidity to investigate the state of smart contract testing from three dimensions: (1) the developers working on the tests, (2) the used testing frameworks and testnets and (3) the type of tests that are conducted. We found that mostly core developers of a project are responsible for testing the contracts. Second, developers typically use only functional testing frameworks to test a smart contract, with Truffle being the most popular one. Finally, our results show that functional testing is conducted in most of the studied projects (93%), security testing is only performed in a few projects (9.4%) and traditional performance testing is conducted in none. In addition, we found 25 projects that mentioned or published external audit reports.”

See our Publications for the full paper.

“An Empirical Study of Yanked Releases in the Rust Package Registry” accepted in TSE!

Li Hao’s paper “An Empirical Study of Yanked Releases in the Rust Package Registry” was accepted for publication in the Transactions on Software Engineering (TSE) journal! Super congrats Li Hao! This was a collaboration with Filipe R. Cogo from the Huawei Centre for Software Excellence.

Abstract:
“Cargo, the software packaging manager of Rust, provides a yank mechanism to support release-level deprecation, which can prevent packages from depending on yanked releases. Most prior studies focused on code-level (i.e., deprecated APIs) and package-level deprecation (i.e., deprecated packages). However, few studies have focused on release-level deprecation. In this study, we investigate how often and how the yank mechanism is used, the rationales behind its usage, and the adoption of yanked releases in the Cargo ecosystem. Our study shows that 9.6% of the packages in Cargo have at least one yanked release, and the proportion of yanked releases kept increasing from 2014 to 2020. Package owners yank releases for other reasons than withdrawing a defective release, such as fixing a release that does not follow semantic versioning or indicating a package is removed or replaced. In addition, we found that 46% of the packages directly adopted at least one yanked release and the yanked releases propagated through the dependency network, which leads to 1.4% of the releases in the ecosystem having unresolved dependencies.”

See our Publications for the full paper.

“Studying the Performance Risks of Upgrading Docker Hub Images: A Case Study of WordPress” accepted at ICPE 2022!

Mikael’s paper “Studying the Performance Risks of Upgrading Docker Hub Images: A Case Study of WordPress” was accepted for publication at ICPE 2022! Super congrats Mikael!

Abstract:
“The Docker Hub repository contains Docker images of applications, which allow users to do in-place upgrades to benefit from the latest released features and security patches. However, prior work showed that upgrading a Docker image not only changes the main application, but can also change many dependencies. In this paper, we present a methodology to study the performance impact of upgrading the Docker Hub image of an application, thereby focusing on changes to dependencies. We demonstrate our methodology through a case study of 90 official images of the WordPress application. Our study shows that Docker image users should be cautious and conduct a performance test before upgrading to a newer Docker image in most cases. Our methodology can assist them to better understand the performance risks of such upgrades, and helps them to decide how thorough such a performance test should be.”

See our Publications for the full paper.

“Using Natural Language Processing Techniques to Improve Manual Test Case Descriptions” accepted at ICSE-SEIP 2022!

Markos’ paper “Using Natural Language Processing Techniques to Improve Manual Test Case Descriptions” was accepted for publication at the Software Engineering in Practice (SEIP) track of ICSE 2022! Super congrats Markos!

Abstract:
“Despite the recent advancements in test automation, software testing often remains a manual, and costly, activity in many industries. Manual test cases, often described only in natural language, consist of one or more test steps, which are instructions that must be performed to achieve the testing objective. Having different employees specifying test cases might result in redundant, unclear, or incomplete test cases. Manually reviewing and validating newly-specified test cases is time-consuming and becomes impractical in a scenario with a large test suite. Therefore, in this paper, we propose an automated framework to automatically analyze test cases that are specified in natural language and provide actionable recommendations on how to improve the test cases. Our framework consists of configurable components and modules for analysis, which are capable of recommending improvements to the following: (1) the terminology of a new test case through language modeling, (2) potentially missing test steps for a new test case through frequent itemset and association rule mining, and (3) recommendation of similar test cases that already exist in the test suite through text embedding and clustering. We thoroughly evaluated the three modules on data from our industry partner. Our framework can provide actionable recommendations, which is an important challenge given the widespread occurrence of test cases that are described only in natural language in the software industry (in particular, the game industry).”
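
As an illustration of module (2) from the abstract, a minimal sketch (with a small, hypothetical test suite and a deliberately simplified miner, not the paper's exact setup) could mine association rules over the steps of existing test cases and flag steps that a new test case appears to be missing.

```python
# Hypothetical sketch: mine simple association rules ("test cases containing
# step A usually also contain step B") from an existing test suite, then flag
# rules whose antecedent appears in a new test case but whose consequent does
# not. Toy data and thresholds are illustrative assumptions.
from itertools import combinations

existing_test_cases = [
    {"open app", "log in", "open settings", "log out"},
    {"open app", "log in", "play level", "log out"},
    {"open app", "log in", "log out"},
]

MIN_SUPPORT = 0.5      # fraction of test cases that must contain the rule's steps
MIN_CONFIDENCE = 0.9   # P(consequent | antecedent)

def support(steps: set) -> float:
    return sum(steps <= case for case in existing_test_cases) / len(existing_test_cases)

# Mine single-step -> single-step rules only, for simplicity.
all_steps = set().union(*existing_test_cases)
rules = []
for a, b in combinations(sorted(all_steps), 2):
    for antecedent, consequent in [({a}, {b}), ({b}, {a})]:
        joint = support(antecedent | consequent)
        if joint >= MIN_SUPPORT and joint / support(antecedent) >= MIN_CONFIDENCE:
            rules.append((antecedent, consequent))

# Recommend potentially missing steps for a new, incomplete test case.
new_test_case = {"open app", "log in"}
missing = {step for ant, cons in rules if ant <= new_test_case for step in cons - new_test_case}
print("possibly missing steps:", missing)
```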

See our Publications for the full paper.

“Applications of Generative Adversarial Networks in Anomaly Detection: A Systematic Literature Review” accepted in IEEE ACCESS!

Mikael and Chloe’s paper “Applications of Generative Adversarial Networks in Anomaly Detection: A Systematic Literature Review” was accepted for publication in the IEEE ACCESS journal! Super congrats Mikael and Chloe!

Abstract:
“Anomaly detection has become an indispensable tool for modern society, applied in a wide range of applications, from detecting fraudulent transactions to malignant brain tumors. Over time, many anomaly detection techniques have been introduced. However, in general, they all suffer from the same problem: lack of data that represents anomalous behaviour. As anomalous behaviour is usually costly (or dangerous) for a system, it is difficult to gather enough data that represents such behaviour. This, in turn, makes it difficult to develop and evaluate anomaly detection techniques. Recently, generative adversarial networks (GANs) have attracted much attention in anomaly detection research, due to their unique ability to generate new data. In this paper, we present a systematic review of the literature in this area, covering 128 papers. The goal of this review paper is to analyze the relation between anomaly detection techniques and types of GANs, to identify the most common application domains for GAN-assisted and GAN-based anomaly detection, and to assemble information on datasets and performance metrics used to assess them. Our study helps researchers and practitioners to find the most suitable GAN-assisted anomaly detection technique for their application. In addition, we present a research roadmap for future studies in this area. In summary, GANs are used in anomaly detection to address the problem of insufficient amount of data for the anomalous behaviour, either through data augmentation or representation learning. The most commonly used GAN architectures are DCGANs, standard GANs, and cGANs. The primary application domains include medicine, surveillance and intrusion detection.”

See our Publications for the full paper.

“An Empirical Study of Q&A Websites for Game Developers” accepted for publication in the EMSE journal!

Arthur’s paper “An Empirical Study of Q&A Websites for Game Developers” was accepted for publication in the Empirical Software Engineering journal! Super congrats Arthur!

Abstract:
“The game development industry is growing, and training new developers in game development-specific abilities is essential to satisfying its need for skilled game developers. These developers require effective learning resources to acquire the information they need and improve their game development skills. Question and Answer (Q&A) websites stand out as some of the most used online learning resources in software development. Many studies have investigated how Q&A websites help software developers become more experienced. However, no studies have explored Q&A websites aimed at game development, and there is little information about how game developers use and interact with these websites. In this paper, we study four Q&A communities by analyzing game development data we collected from their websites and the 347 responses received on a survey we ran with game developers. We observe that the communities have declined over the past few years and identify factors that correlate to these changes. Using a Latent Dirichlet Allocation (LDA) model, we characterize the topics discussed in the communities. We also analyze how topics differ across communities and identify the most discussed topics. Furthermore, we find that survey respondents have a mostly negative view of the communities and tended to stop using the websites once they became more experienced. Finally, we provide recommendations on where game developers should post their questions, which can help mitigate the websites’ declines and improve their effectiveness.”
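
For readers unfamiliar with LDA, here is a minimal, hypothetical sketch of how topics in Q&A posts can be characterized with scikit-learn; the example posts, vectorizer settings, and number of topics are illustrative and not the paper's setup.

```python
# Hypothetical sketch: characterize topics in Q&A posts with Latent Dirichlet
# Allocation. Example posts and parameters are for illustration only.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

posts = [
    "How do I animate a sprite in Unity?",
    "Shader compilation fails on Android builds",
    "Best way to structure game state for a roguelike",
    "Unity sprite animation stutters on mobile",
]

vectorizer = CountVectorizer(stop_words="english")
doc_term = vectorizer.fit_transform(posts)

lda = LatentDirichletAllocation(n_components=2, random_state=0)  # 2 topics for illustration
lda.fit(doc_term)

# Print the top terms per topic.
terms = vectorizer.get_feature_names_out()
for topic_idx, weights in enumerate(lda.components_):
    top_terms = [terms[i] for i in weights.argsort()[::-1][:5]]
    print(f"Topic {topic_idx}: {', '.join(top_terms)}")
```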

See our Publications for the full paper.