“Searching bug instances in gameplay video repositories” accepted in IEEE Transactions on Games!

Mohammad Reza’s paper “Searching bug instances in gameplay video repositories” was accepted for publication in IEEE Transactions on Games! Super congrats Mohammad Reza!

Abstract: “Gameplay videos offer valuable insights into player interactions and game responses, particularly data about game bugs. Despite the abundance of gameplay videos online, extracting useful information remains a challenge. This paper introduces a method for searching and extracting relevant videos from extensive video repositories using English text queries. Our approach requires no external information, like video metadata; it solely depends on video content. Leveraging the zero-shot transfer capabilities of the Contrastive Language-Image Pre-Training (CLIP) model, our approach does not require any data labeling or training. To evaluate our approach, we present the GamePhysics dataset, comprising 26,954 videos from 1,873 games that were collected from the GamePhysics section on the Reddit website. Our approach shows promising results in our extensive analysis of simple and compound queries, indicating that our method is useful for detecting objects and events in gameplay videos. Moreover, we assess the effectiveness of our method by analyzing a carefully annotated dataset of 220 gameplay videos. The results of our study demonstrate the potential of our approach for applications such as the creation of a video search tool tailored to identifying video game bugs, which could greatly benefit Quality Assurance (QA) teams in finding and reproducing bugs. The code and data used in this paper can be found at https://zenodo.org/records/10211390.”
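The retrieval idea in the abstract — score every video against a text query purely from its content, with no metadata — can be sketched as follows. This is a toy illustration: the tiny 3-d vectors below are made-up stand-ins for the CLIP embeddings of video frames and the query, and the max-pooling scoring rule is an assumption about how per-frame scores are combined.

```python
import numpy as np

def rank_videos(query_vec, video_frame_vecs):
    """Rank videos by their best-matching frame for the query.

    query_vec: (d,) embedding of the English text query.
    video_frame_vecs: list of (n_frames, d) arrays, one per video.
    Returns video indices sorted from most to least relevant.
    """
    def cosine(q, frames):
        # Cosine similarity between the query and every frame of one video.
        return (frames @ q) / (np.linalg.norm(frames, axis=1) * np.linalg.norm(q))

    # Score each video by its single most query-similar frame (max-pooling),
    # so a bug visible in one frame is enough to surface the whole video.
    scores = [cosine(query_vec, frames).max() for frames in video_frame_vecs]
    return np.argsort(scores)[::-1]

# Toy 3-d "embeddings" standing in for CLIP outputs.
query = np.array([1.0, 0.0, 0.0])        # e.g. a query like "a flying car"
video_a = np.array([[0.0, 1.0, 0.0],     # mostly irrelevant frames...
                    [0.9, 0.1, 0.0]])    # ...but one highly relevant frame
video_b = np.array([[0.0, 0.8, 0.6],
                    [0.1, 0.9, 0.0]])
print(rank_videos(query, [video_a, video_b]))  # video_a (index 0) ranks first
```

In the real pipeline the embeddings come from CLIP's image and text encoders, which is what makes the search zero-shot: no labels or training are needed for new queries.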

A preprint of the paper is available here.

“An Exploratory Study of Dataset and Model Management in Open Source Machine Learning Applications” accepted at CAIN 2024!

Tajkia’s paper “An Exploratory Study of Dataset and Model Management in Open Source Machine Learning Applications” was accepted for publication at CAIN 2024! Super congrats Tajkia!

Abstract: “Datasets and models are two key artifacts in machine learning (ML) applications. Although there exist tools to support dataset and model developers in managing ML artifacts, little is known about how these datasets and models are integrated into ML applications. In this paper, we study how datasets and models in ML applications are managed. In particular, we focus on how these artifacts are stored and versioned alongside the applications. After analyzing 93 repositories, we identified that the most common storage location for datasets and models is the file system, which causes availability issues. Notably, large data and model files, exceeding approximately 60 MB, are stored exclusively in remote storage and downloaded as needed. Most of the datasets and models lack proper integration with the version control system, posing potential traceability and reproducibility issues. Additionally, although datasets and models are likely to evolve during the application development, they are rarely updated in application repositories.”

A preprint of the paper is available here.

“Analyzing Developer Use of ChatGPT Generated Code in Open Source GitHub Projects” accepted at MSR Mining Challenge 2024!

Balreet’s paper “Analyzing Developer Use of ChatGPT Generated Code in Open Source GitHub Projects” was accepted for publication at the MSR Mining Challenge 2024! Super congrats Balreet and co-author Wentao! This paper was a collaboration with Dr. Sarah Nadi from New York University Abu Dhabi.

Abstract: “The rapid development of large language models such as ChatGPT has made them particularly useful to developers in generating code snippets for their projects. To understand how ChatGPT’s generated code is leveraged by developers, we conducted an empirical study of 3,044 ChatGPT-generated code snippets integrated within GitHub projects. A median of 54% of the generated lines of code is found in the project’s code and this code typically remains unchanged once added. The modifications of the 76 code snippets that changed in a subsequent commit consisted of minor functionality changes and code reorganizations that were made within a day. Our findings offer insights that help drive the development of AI-assisted programming tools. We highlight the importance of making changes in ChatGPT code before integrating it into a project.”

A preprint of the paper is available here.

“ImageNet-Hard: The Hardest Images Remaining from a Study of the Power of Zoom and Spatial Biases in Image Classification” accepted at the NeurIPS 2023 Datasets and Benchmarks track!

Mohammad Reza’s paper “ImageNet-Hard: The Hardest Images Remaining from a Study of the Power of Zoom and Spatial Biases in Image Classification” was accepted for publication at the Datasets and Benchmarks track of NeurIPS 2023! Super congrats Mohammad Reza and co-author Giang! This paper was a collaboration with Sarra Habchi from our industry partner Ubisoft La Forge, and with Giang Nguyen and Anh Nguyen from Auburn University.

Abstract: “Image classifiers are information-discarding machines, by design. Yet, how these models discard information remains mysterious. We hypothesize that one way for image classifiers to reach high accuracy is to zoom to the most discriminative region in the image and then extract features from there to predict image labels, discarding the rest of the image. Studying six popular networks ranging from AlexNet to CLIP, we find that proper framing of the input image can lead to the correct classification of 98.91% of ImageNet images. Furthermore, we uncover positional biases in various datasets, especially a strong center bias in two popular datasets: ImageNet-A and ObjectNet. Finally, leveraging our insights into the potential of zooming, we propose a test-time augmentation (TTA) technique that improves classification accuracy by forcing models to explicitly perform zoom-in operations before making predictions. Our method is more interpretable, accurate, and faster than MEMO, a state-of-the-art (SOTA) TTA method. We introduce ImageNet-Hard, a new benchmark that challenges SOTA classifiers including large vision-language models even when optimal zooming is allowed.”
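The zoom-based test-time augmentation described in the abstract can be illustrated with a minimal sketch: classify several center crops of the image at different zoom levels and keep the most confident prediction. The classifier below is a toy stand-in (a brightness-based stub), not the networks studied in the paper, and the specific crop fractions are assumptions.

```python
import numpy as np

def zoom_tta_predict(image, classify, crop_fracs=(1.0, 0.8, 0.6)):
    """Classify center crops at several zoom levels and keep the most
    confident prediction -- a minimal stand-in for zoom-based TTA."""
    h, w = image.shape[:2]
    best_label, best_conf = None, -1.0
    for frac in crop_fracs:
        ch, cw = int(h * frac), int(w * frac)
        top, left = (h - ch) // 2, (w - cw) // 2
        crop = image[top:top + ch, left:left + cw]
        label, conf = classify(crop)  # classifier returns (label, confidence)
        if conf > best_conf:
            best_label, best_conf = label, conf
    return best_label, best_conf

# Toy stand-in classifier: more "confident" when the bright object patch
# fills more of the crop, mimicking how zooming helps real classifiers.
def toy_classifier(crop):
    return ("object", float(crop.mean()))

img = np.zeros((100, 100))
img[40:60, 40:60] = 1.0  # centered 20x20 bright patch
label, conf = zoom_tta_predict(img, toy_classifier)
print(label, conf)  # the most zoomed-in crop wins: confidence 400/3600
```

The key design point the paper exploits is that cropping concentrates the discriminative region, so the same classifier applied to a tighter framing can become more confident and more accurate.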

A preprint of the paper is available here.

“Prioritizing Natural Language Test Cases Based on Highly-Used Game Features” accepted at ESEC/FSE 2023!

Markos’ paper “Prioritizing Natural Language Test Cases Based on Highly-Used Game Features” was accepted for publication at the industry track of the Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE) 2023! Super congrats Markos! This paper was a collaboration with Dale Paas from our industry partner Prodigy Education.

Abstract: “Software testing is still a manual activity in many industries, such as the gaming industry. But manually executing tests becomes impractical as the system grows and resources are restricted, mainly in a scenario with short release cycles. Test case prioritization is a commonly used technique to optimize the test execution. However, most prioritization approaches do not work for manual test cases as they require source code information or test execution history, which is often not available in a manual testing scenario. In this paper, we propose a prioritization approach for manual test cases written in natural language based on the tested application features (in particular, highly-used application features). Our approach consists of (1) identifying the tested features from natural language test cases (with zero-shot classification techniques) and (2) prioritizing test cases based on the features that they test. We leveraged the NSGA-II genetic algorithm for the multi-objective optimization of the test case ordering to maximize the coverage of highly-used features while minimizing the cumulative execution time. Our findings show that we can successfully identify the application features covered by test cases using an ensemble of pre-trained models with strong zero-shot capabilities (an F-score of 76.1%). Also, our prioritization approaches can find test case orderings that cover highly-used application features early in the test execution while keeping the time required to execute test cases short. QA engineers can use our approach to focus the test execution on test cases that cover features that are relevant to users.”
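The two objectives in the abstract — cover highly-used features early while keeping cumulative execution time low — can be illustrated with a much simpler greedy sketch than the paper's actual NSGA-II search. The test names, features, and usage counts below are all made up for illustration.

```python
def prioritize(tests, feature_usage):
    """Greedy sketch: repeatedly pick the test with the best ratio of
    not-yet-covered feature usage to execution time.
    tests: {name: (set_of_features, exec_time_in_minutes)}
    feature_usage: {feature: how often users exercise it}
    (The paper itself uses the NSGA-II genetic algorithm for a true
    multi-objective search over orderings; this is only an illustration.)"""
    covered, order = set(), []
    remaining = dict(tests)
    while remaining:
        def gain(item):
            _, (features, time) = item
            usage = sum(feature_usage.get(f, 0) for f in features - covered)
            return usage / time
        name, (features, _) = max(remaining.items(), key=gain)
        order.append(name)
        covered |= features
        del remaining[name]
    return order

# Hypothetical manual test suite with usage-weighted features.
tests = {
    "login_test":  ({"login"}, 2.0),
    "battle_test": ({"battle", "rewards"}, 5.0),
    "shop_test":   ({"shop"}, 1.0),
}
usage = {"login": 900, "battle": 800, "rewards": 300, "shop": 100}
print(prioritize(tests, usage))  # login_test first: best usage-per-minute
```

A greedy ratio heuristic returns a single ordering, whereas NSGA-II explores the trade-off between the two objectives and yields a Pareto front of candidate orderings — which is why the paper uses it.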

A preprint of the paper is available here.

“Do Code Quality and Style Issues Differ Across (Non-)Machine Learning Notebooks? Yes!” accepted at SCAM 2023!

Saeed’s paper “Do Code Quality and Style Issues Differ Across (Non-)Machine Learning Notebooks? Yes!” was accepted for publication at the International Working Conference on Source Code Analysis and Manipulation (SCAM) 2023! Super congrats Saeed!

Abstract: “The popularity of computational notebooks is rapidly increasing because of their interactive code-output visualization and on-demand non-sequential code block execution. These notebook features have made notebooks especially popular with machine learning developers and data scientists. However, as prior work shows, notebooks generally contain low quality code. In this paper, we investigate whether the low quality code is inherent to the programming style in notebooks, or whether it is correlated with the use of machine learning techniques. We present a large-scale empirical analysis of 246,599 open-source notebooks to explore how machine learning code quality in Jupyter Notebooks differs from non-machine learning code, thereby focusing on code style issues. We explored code style issues across the Error, Convention, Warning, and Refactoring categories. We found that machine learning notebooks are of lower quality regarding PEP-8 code standards than non-machine learning notebooks, and their code quality distributions significantly differ with a small effect size. We identified several code style issues with large differences in occurrences between machine learning and non-machine learning notebooks. For example, package and import-related issues are more prevalent in machine learning notebooks. Our study shows that code quality and code style issues differ significantly across machine learning and non-machine learning notebooks.”

A preprint of the paper is available here.

“A Taxonomy of Testable HTML5 Canvas Issues” accepted in TSE!

Finlay’s paper “A Taxonomy of Testable HTML5 Canvas Issues” was accepted for publication in the Transactions on Software Engineering (TSE) journal! Super congrats Finlay and co-author Markos! This paper was a collaboration with Natalia Romanova, Chris Buzon, and Dale Paas from our industry partner Prodigy Education.

Abstract: “The HTML5 canvas is widely used to display high quality graphics in web applications. However, the combination of web, GUI, and visual techniques that are required to build canvas applications, together with the lack of testing and debugging tools, makes developing such applications very challenging. To help direct future research on testing canvas applications, in this paper we present a taxonomy of testable canvas issues. First, we extracted 2,403 canvas related issue reports from 123 open source GitHub projects that use the HTML5 canvas. Second, we constructed our taxonomy by manually classifying a random sample of 332 issue reports. Our manual classification identified five broad categories of testable canvas issues, such as Visual and Performance issues. We found that Visual issues are the most frequent (35%), while Performance issues are relatively infrequent (5%). We also found that many testable canvas issues that present themselves visually on the canvas are actually caused by other components of the web application. Our taxonomy of testable canvas issues can be used to steer future research into canvas issues and testing.”

See our Publications for the full paper.

“Analyzing Techniques for Duplicate Question Detection on Q&A Websites for Game Developers” accepted for publication in EMSE!

Arthur’s paper “Analyzing Techniques for Duplicate Question Detection on Q&A Websites for Game Developers” was accepted for publication in the Empirical Software Engineering journal! Super congrats Arthur! This was a collaboration with Dr. Abram Hindle.

Abstract: “Game development is currently the largest industry in the entertainment segment and has a high demand for skilled game developers who can produce high-quality games. To satiate this demand, game developers need resources that can provide them with the knowledge they need to learn and improve their skills. Question and Answer (Q&A) websites are one such resource that provides a valuable source of knowledge about game development practices. However, the presence of duplicate questions on Q&A websites hinders their ability to effectively provide information for their users. While several researchers created and analyzed techniques for duplicate question detection on websites such as Stack Overflow, so far no studies have explored how well those techniques work on Q&A websites for game development. With that in mind, in this paper we analyze how we can use pre-trained and unsupervised techniques to detect duplicate questions on Q&A websites focused on game development using data extracted from the Game Development Stack Exchange and Stack Overflow. We also explore how we can leverage a small set of labelled data to improve the performance of those techniques. The pre-trained technique based on MPNet achieved the highest results in identifying duplicate questions about game development, and we could achieve a better performance when combining multiple unsupervised techniques into a single supervised model. Furthermore, the supervised models could identify duplicate questions on websites different from those they were trained on with little to no decrease in performance. Our results lay the groundwork for building better duplicate question detection systems in Q&A websites for game developers and ultimately providing game developers with a more effective Q&A community.”

See our Publications for the full paper, or access the preprint directly.

“Identifying Similar Test Cases That Are Specified in Natural Language” accepted in TSE!

Markos’ paper “Identifying Similar Test Cases That Are Specified in Natural Language” was accepted for publication in the Transactions on Software Engineering (TSE) journal! Super congrats Markos! This paper was a collaboration with Dale Paas and Chris Buzon from our industry partner Prodigy Education.

Abstract: “Software testing is still a manual process in many industries, despite the recent improvements in automated testing techniques. As a result, test cases (which consist of one or more test steps that need to be executed manually by the tester) are often specified in natural language by different employees and many redundant test cases might exist in the test suite. This increases the (already high) cost of test execution. Manually identifying similar test cases is a time-consuming and error-prone task. Therefore, in this paper, we propose an unsupervised approach to identify similar test cases. Our approach uses a combination of text embedding, text similarity and clustering techniques to identify similar test cases. We evaluate five different text embedding techniques, two text similarity metrics, and two clustering techniques to cluster similar test steps and three techniques to identify similar test cases from the test step clusters. Through an evaluation in an industrial setting, we showed that our approach achieves a high performance to cluster test steps (an F-score of 87.39%) and identify similar test cases (an F-score of 83.47%). Furthermore, a validation with developers indicates several different practical usages of our approach (such as identifying redundant test cases), which help to reduce the manual testing effort and time.”
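The embedding-plus-clustering pipeline in the abstract can be sketched in miniature: embed each test step, then group steps whose embeddings are sufficiently similar. The bag-of-words embedding and the single-pass threshold grouping below are toy stand-ins for the trained text embeddings and proper clustering techniques the paper evaluates; the vocabulary and test steps are made up.

```python
import numpy as np

def group_similar(items, embed, threshold=0.8):
    """Group items whose embedding cosine similarity to a group's first
    member exceeds a threshold -- a single-pass toy stand-in for the
    paper's embedding + similarity + clustering pipeline."""
    groups = []  # list of (representative_unit_vector, [items])
    for item in items:
        v = embed(item)
        v = v / np.linalg.norm(v)  # unit-normalize so dot product = cosine
        for rep, members in groups:
            if float(rep @ v) >= threshold:
                members.append(item)
                break
        else:
            groups.append((v, [item]))
    return [members for _, members in groups]

# Toy bag-of-words embedding standing in for learned text embeddings.
VOCAB = ["click", "tap", "login", "button", "page", "open"]
def embed(text):
    words = text.lower().split()
    return np.array([float(w in words) for w in VOCAB]) + 1e-9

steps = ["click login button", "tap login button", "open page"]
print(group_similar(steps, embed, threshold=0.6))
```

Once similar test steps are clustered, whole test cases can be compared by the step clusters they share, which is how the approach surfaces redundant test cases to developers.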

See our Publications for the full paper.

“CLIP meets GamePhysics: Towards bug identification in gameplay videos using zero-shot transfer learning” accepted at MSR 2022!

Mohammad Reza’s paper “CLIP meets GamePhysics: Towards bug identification in gameplay videos using zero-shot transfer learning” was accepted for publication at the Mining Software Repositories (MSR) conference 2022! Super congrats Mohammad Reza and co-author Finlay!

Abstract: “Gameplay videos contain rich information about how players interact with the game and how the game responds. Sharing gameplay videos on social media platforms, such as Reddit, has become a common practice for many players. Often, players will share game-play videos that showcase video game bugs. Such gameplay videos are software artifacts that can be utilized for game testing, as they provide insight for bug analysis. Although large repositories of gameplay videos exist, parsing and mining them in an effective and structured fashion has still remained a big challenge. In this paper, we propose a search method that accepts any English text query as input to retrieve relevant videos from large repositories of gameplay videos. Our approach does not rely on any external information (such as video metadata); it works solely based on the content of the video. By leveraging the zero-shot transfer capabilities of the Contrastive Language-Image Pre-Training (CLIP) model, our approach does not require any data labeling or training. To evaluate our approach, we present the GamePhysics dataset consisting of 26,954 videos from 1,873 games, that were collected from the GamePhysics section on the Reddit website. Our approach shows promising results in our extensive analysis of simple queries, compound queries, and bug queries, indicating that our approach is useful for object and event detection in gameplay videos. An example application of our approach is as a gameplay video search engine to aid in reproducing video game bugs. Please visit the following link for the code and the data: https://asgaardlab.github.io/CLIPxGamePhysics/.”

See our Publications or arXiv for the full paper.