“Identifying Similar Test Cases That Are Specified in Natural Language” accepted in TSE!

Markos’ paper “Identifying Similar Test Cases That Are Specified in Natural Language” was accepted for publication in the Transactions on Software Engineering (TSE) journal! Super congrats Markos! This paper was a collaboration with Dale Paas and Chris Buzon from our industry partner Prodigy Education.

Abstract:
“Software testing is still a manual process in many industries, despite the recent improvements in automated testing
techniques. As a result, test cases (which consist of one or more test steps that need to be executed manually by the tester) are often
specified in natural language by different employees and many redundant test cases might exist in the test suite. This increases the
(already high) cost of test execution. Manually identifying similar test cases is a time-consuming and error-prone task. Therefore, in this
paper, we propose an unsupervised approach to identify similar test cases. Our approach uses a combination of text embedding, text
similarity and clustering techniques to identify similar test cases. We evaluate five different text embedding techniques, two text
similarity metrics, and two clustering techniques to cluster similar test steps and three techniques to identify similar test cases from the
test step clusters. Through an evaluation in an industrial setting, we showed that our approach achieves a high performance to cluster
test steps (an F-score of 87.39%) and identify similar test cases (an F-score of 83.47%). Furthermore, a validation with developers
indicates several different practical usages of our approach (such as identifying redundant test cases), which help to reduce the testing
manual effort and time.”

See our Publications for the full paper.

“CLIP meets GamePhysics: Towards bug identification in gameplay videos using zero-shot transfer learning” accepted at MSR 2022!

Mohammad Reza’s paper “CLIP meets GamePhysics: Towards bug identification in gameplay videos using zero-shot transfer learning” was accepted for publication at the Mining Software Repositories (MSR) conference 2022! Super congrats Mohammad Reza and co-author Finlay!

Abstract:
“Gameplay videos contain rich information about how players interact with the game and how the game responds. Sharing gameplay videos on social media platforms, such as Reddit, has become a common practice for many players. Often, players will share game-play videos that showcase video game bugs. Such gameplay videos are software artifacts that can be utilized for game testing, as they provide insight for bug analysis. Although large repositories of gameplay videos exist, parsing and mining them in an effective and structured fashion has still remained a big challenge. In this paper, we propose a search method that accepts any English text query as input to retrieve relevant videos from large repositories of gameplay videos. Our approach does not rely on any external information (such as video metadata); it works solely based on the content of the video. By leveraging the zero-shot transfer capabilities of the Contrastive Language-Image Pre-Training (CLIP) model, our approach does not require any data labeling or training. To evaluate our approach, we present the GamePhysics dataset consisting of 26,954 videos from 1,873 games, that were collected from the GamePhysics section on the Reddit website. Our approach shows promising results in our extensive analysis of simple queries, compound queries, and bug queries, indicating that our approach is useful for object and event detection in gameplay videos. An example application of our approach is as a gameplay video search engine to aid in reproducing video game bugs. Please visit the following link for the code and the data: https://asgaardlab.github.io/CLIPxGamePhysics/.”

See our Publications or arXiv for the full paper

“A Case Study on the Stability of Performance Tests for Serverless Applications” accepted in JSS!

Simon’s paper “A Case Study on the Stability of Performance Tests for Serverless Applications” was accepted for publication in the Journal of Systems and Software (JSS)! This paper was a collaboration with Diego Costa, Lizhi Liao, Weiyi Shang, Andre van Hoorn and Samuel Kounev through the SPEC RG DevOps Performance Working Group.

Abstract:
“Context. While in serverless computing, application resource management and operational concerns are generally delegated to the cloud provider, ensuring that serverless applications meet their performance requirements is still a responsibility of the developers. Performance testing is a commonly used performance assessment practice; however, it traditionally requires visibility of the resource environment.
Objective. In this study, we investigate whether performance tests of serverless applications are stable, that is, if their results are reproducible, and what implications the serverless paradigm has for performance tests.
Method. We conduct a case study where we collect two datasets of performance test results: (a) repetitions of performance tests for varying memory size and load intensities and (b) three repetitions of the same performance test every day for ten months.
Results. We find that performance tests of serverless applications are comparatively stable if conducted on the same day. However, we also observe short-term performance variations and frequent long-term performance changes.
Conclusion. Performance tests for serverless applications can be stable; however, the serverless model impacts the planning, execution, and analysis of performance tests.”

See our Publications for the full paper.

“How are Solidity smart contracts tested in open source projects? An exploratory study” accepted at AST 2022!

Luisa’s paper “How are Solidity smart contracts tested in open source projects? An exploratory study” was accepted for publication at AST 2022! Super congrats Luisa!

Abstract:
“Smart contracts are self-executing programs that are stored on the blockchain. Once a smart contract is compiled and deployed on the blockchain, it cannot be modified. Therefore, having a bug-free smart contract is vital. To ensure a bug-free smart contract, it must be tested thoroughly. However, little is known about how developers test smart contracts in practice. Our study explores 139 open source smart contract projects that are written in Solidity to investigate the state of smart contract testing from three dimensions: (1) the developers working on the tests, (2) the used testing frameworks and testnets and (3) the type of tests that are conducted. We found that mostly core developers of a project are responsible for testing the contracts. Second, developers typically use only functional testing frameworks to test a smart contract, with Truffle being the most popular one. Finally, our results show that functional testing is conducted in most of the studied projects (93%), security testing is only performed in a few projects (9.4%) and traditional performance testing is conducted in none. In addition, we found 25 projects that mentioned or published external audit reports.”

See our Publications for the full paper.

“Studying the Performance Risks of Upgrading Docker Hub Images: A Case Study of WordPress” accepted at ICPE 2022!

Mikael’s paper “Studying the Performance Risks of Upgrading Docker Hub Images: A Case Study of WordPress” was accepted for publication at ICPE 2022! Super congrats Mikael!

Abstract:
“The Docker Hub repository contains Docker images of applications, which allow users to do in-place upgrades to benefit from the latest released features and security patches. However, prior work showed that upgrading a Docker image not only changes the main application, but can also change many dependencies. In this paper, we present a methodology to study the performance impact of upgrading the Docker Hub image of an application, thereby focusing on changes to dependencies. We demonstrate our methodology through a case study of 90 official images of the WordPress application. Our study shows that Docker image users should be cautious and conduct a performance test before upgrading to a newer Docker image in most cases. Our methodology can assist them to better understand the performance risks of such upgrades, and helps them to decide how thorough such a performance test should be.”

See our Publications for the full paper.

“Using Natural Language Processing Techniques to Improve Manual Test Case Descriptions” accepted at ICSE-SEIP 2022!

Markos’ paper “Using Natural Language Processing Techniques to Improve Manual Test Case Descriptions” was accepted for publication at the Software Engineering in Practice (SEIP) track of ICSE 2022! Super congrats Markos!

Abstract:
“Despite the recent advancements in test automation, software testing often remains a manual, and costly, activity in many industries. Manual test cases, often described only in natural language, consist of one or more test steps, which are instructions that must be performed to achieve the testing objective. Having different employees specifying test cases might result in redundant, unclear, or incomplete test cases. Manually reviewing and validating newly-specified test cases is time-consuming and becomes impractical in a scenario with a large test suite. Therefore, in this paper, we propose an automated framework to automatically analyze test cases that are specified in natural language and provide actionable recommendations on how to improve the test cases. Our framework consists of configurable components and modules for analysis, which are capable of recommending improvements to the following: (1) the terminology of a new test case through language modeling, (2) potentially missing test steps for a new test case through frequent itemset and association rule mining, and (3) recommendation of similar test cases that already exist in the test suite through text embedding and clustering. We thoroughly evaluated the three modules on data from our industry partner. Our framework can provide actionable recommendations, which is an important challenge given the widespread occurrence of test cases that are described only in natural language in the software industry (in particular, the game industry).”

See our Publications for the full paper.

“Applications of Generative Adversarial Networks in Anomaly Detection: A Systematic Literature Review” accepted in IEEE ACCESS!

Mikael and Chloe’s paper “Applications of Generative Adversarial Networks in Anomaly Detection: A Systematic Literature Review” was accepted for publication in the IEEE ACCESS journal! Super congrats Mikael and Chloe!

Abstract:
“Anomaly detection has become an indispensable tool for modern society, applied in a wide range of applications, from detecting fraudulent transactions to malignant brain tumors. Over time, many
anomaly detection techniques have been introduced. However, in general, they all suffer from the same problem: lack of data that represents anomalous behaviour. As anomalous behaviour is usually costly (or
dangerous) for a system, it is difficult to gather enough data that represents such behaviour. This, in turn, makes it difficult to develop and evaluate anomaly detection techniques. Recently, generative adversarial
networks (GANs) have attracted much attention in anomaly detection research, due to their unique ability to generate new data. In this paper, we present a systematic review of the literature in this area, covering
128 papers. The goal of this review paper is to analyze the relation between anomaly detection techniques and types of GANs, to identify the most common application domains for GAN-assisted and GAN-based
anomaly detection, and to assemble information on datasets and performance metrics used to assess them. Our study helps researchers and practitioners to find the most suitable GAN-assisted anomaly detection
technique for their application. In addition, we present a research roadmap for future studies in this area. In summary, GANs are used in anomaly detection to address the problem of insufficient amount of data for the
anomalous behaviour, either through data augmentation or representation learning. The most commonly used GAN architectures are DCGANs, standard GANs, and cGANs. The primary application domains include
medicine, surveillance and intrusion detection.”

See our Publications for the full paper.

“An Empirical Study of Q&A Websites for Game Developers” accepted for publication in the EMSE journal!

Arthur’s paper “An Empirical Study of Q&A Websites for Game Developers” was accepted for publication in the Empirical Software Engineering journal! Super congrats Arthur!

Abstract:
The game development industry is growing, and training new developers in game development-specific abilities is essential to satisfying its need for skilled game developers. These developers require effective learning resources to acquire the information they need and improve their game development skills. Question and Answer (Q&A) websites stand out as some of the most used online learning resources in software development. Many studies have investigated how Q&A websites help software developers become more experienced. However, no studies have explored Q&A websites aimed at game development, and there is little information about how game developers use and interact with these websites. In this paper, we study four Q&A communities by analyzing game development data we collected from their websites and the 347 responses received on a survey we ran with game developers. We observe that the communities have declined over the past few years and identify factors that correlate to these changes. Using a Latent Dirichlet Allocation (LDA) model, we characterize the topics discussed in the communities. We also analyze how topics differ across communities and identify the most discussed topics. Furthermore, we find that survey respondents have a mostly negative view of the communities and tended to stop using the websites once they became more experienced. Finally, we provide recommendations on where game developers should post their questions, which can help mitigate the websites’ declines and improve their effectiveness.

See our Publications for the full paper.

“An empirical study of same-day releases of popular packages in the npm ecosystem” accepted in the EMSE journal!

Filipe’s paper “An empirical study of same-day releases of popular packages in the npm ecosystem” was accepted for publication in the Empirical Software Engineering journal! Super congrats Filipe! This was a collaboration with Gustavo Oliva and Ahmed Hassan.

Abstract:
Within a software ecosystem, client packages can reuse provider packages as third-party libraries. The reuse relation between client and provider packages is called a dependency. When a client package depends on the code of a provider package, every change that is introduced in a release of the provider has the potential to impact the client package. Since a large number of dependencies exist within a software ecosystem, releases of a popular provider package can impact a large number of clients. Occasionally, multiple releases of a popular package need to be published on the same day, leading to a scenario in which the time available to revise, test, build, and document the release is restricted compared to releases published within a regular schedule. In this paper, our objective is to study the same-day releases that are published by popular packages in the npm ecosystem. We design an exploratory study to characterize the type of changes that are introduced in same-day releases, the prevalence of same-day releases in the npm ecosystem, and the adoption of same-day releases by client packages. A preliminary manual analysis of the existing release notes suggests that same-day releases introduce non-trivial changes (e.g., bug fixes). We then focus on three RQs. First, we study how often same-day releases are published. We found that the median proportion of regularly scheduled releases that are interrupted by a same-day release (per popular package) is 22%, suggesting the importance of having timely and systematic procedures to cope with same-day releases. Second, we study the performed code changes in same-day releases. We observe that 32% of the same-day releases have larger changes compared with their prior release, thus showing that some same-day releases can undergo significant maintenance activity despite their time-constrained nature. In our third RQ, we study how client packages react to same-day releases of their providers. We observe the vast majority of client packages that adopt the release preceding the same-day release would also adopt the latter without having to change their versioning statement (implicit updates). We also note that explicit adoptions of same-day releases (i.e., adoptions that require a change to the versioning statement of the provider in question) is significantly faster than the explicit adoption of regular releases. Based on our findings, we argue that (i) third-party tools that support the automation of dependency management (e.g., Dependabot) should consider explicitly flagging same-day releases, (ii) popular packages should strive for optimized release pipelines that can properly handle same-day releases, and (iii) future research should design scalable, ecosystem-ready tools that support provider packages in assessing the impact of their code changes on client packages.

See our Publications for the full paper.

“Improving the Discoverability of Indie Games by Leveraging their Similarity to Top-Selling Games Identifying Important Requirements of a Recommender System” accepted at FDG 2021!

Quang’s paper “Improving the Discoverability of Indie Games by Leveraging their Similarity to Top-Selling Games Identifying Important Requirements of a Recommender System” was accepted for publication at the International Conference on the Foundations of Digital Games (FDG) 2021! Super congrats Quang!

* Update July 23: This paper won a best paper award at FDG 2021!

Abstract:
Indie games often lack visibility as compared to top-selling games due to their limited marketing budget and the fact that there are a large number of indie games. Players of top-selling games usually like certain types of games or certain game elements such as theme, gameplay, storyline. Therefore, indie games could leverage their shared game elements with top-selling games to get discovered. In this paper, we propose an approach to improve the discoverability of indie games by recommending similar indie games to gamers of top-selling games. We first matched 2,830 itch.io indie games to 326 top-selling Steam games. We then contacted the indie game developers for evaluation feedback and suggestions. We found that the majority of them (67.9%) who offered verbose responses show positive support for our approach. We also analyzed the reasons for bad recommendations and the suggestions by indie game developers to lay out the important requirements for such a recommendation system. The most important ones are: a standardized and extensive tag and genre ontology system is needed to bridge the two platforms, the expectations of players of top-selling games should be managed to avoid disappointment, a player’s preferences should be integrated when making recommendations, a standardized age restriction rule is needed, and finally, the recommendation tool should also show indie games that are the least similar or less popular.

The paper can be downloaded here.