“Searching bug instances in gameplay video repositories” accepted to IEEE Transactions on Games!

Mohammad Reza’s paper “Searching bug instances in gameplay video repositories” was accepted for publication in IEEE Transactions on Games! Super congrats Mohammad Reza!

Abstract: “Gameplay videos offer valuable insights into player interactions and game responses, particularly data about game bugs. Despite the abundance of gameplay videos online, extracting useful information remains a challenge. This paper introduces a method for searching and extracting relevant videos from extensive video repositories using English text queries. Our approach requires no external information, like video metadata; it solely depends on video content. Leveraging the zero-shot transfer capabilities of the Contrastive Language-Image Pre-Training (CLIP) model, our approach does not require any data labeling or training. To evaluate our approach, we present the GamePhysics dataset, comprising 26,954 videos from 1,873 games that were collected from the GamePhysics section on the Reddit website. Our approach shows promising results in our extensive analysis of simple and compound queries, indicating that our method is useful for detecting objects and events in gameplay videos. Moreover, we assess the effectiveness of our method by analyzing a carefully annotated dataset of 220 gameplay videos. The results of our study demonstrate the potential of our approach for applications such as the creation of a video search tool tailored to identifying video game bugs, which could greatly benefit Quality Assurance (QA) teams in finding and reproducing bugs. The code and data used in this paper can be found at https://zenodo.org/records/10211390.”

A preprint of the paper is available here.
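
For readers curious how CLIP-based zero-shot retrieval of this kind can work, here is a minimal, illustrative sketch, not the paper’s actual pipeline: the frame sampling, model checkpoint, and function names such as rank_videos are assumptions made for illustration. Frames sampled from each video are embedded with CLIP’s image encoder, the English query with its text encoder, and videos are ranked by their best frame-to-query similarity. See the Zenodo archive above for the authors’ actual code and data.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Publicly available CLIP checkpoint; the paper may use a different variant.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed_frames(frame_paths):
    """Embed video frames that were sampled beforehand (e.g., with ffmpeg)."""
    images = [Image.open(p).convert("RGB") for p in frame_paths]
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)

def rank_videos(query, video_frames):
    """Rank videos by the best cosine similarity between the text query and any frame."""
    text_inputs = processor(text=[query], return_tensors="pt", padding=True)
    with torch.no_grad():
        text_feat = model.get_text_features(**text_inputs)
    text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

    scores = {}
    for video_id, frame_paths in video_frames.items():
        frame_feats = embed_frames(frame_paths)
        # Max-pool over frames: a video matches if any of its frames matches the query.
        scores[video_id] = (frame_feats @ text_feat.T).max().item()
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Example (hypothetical paths): search a few pre-sampled videos for a physics bug.
# results = rank_videos("a horse flying in the air", {"clip_001": ["f0.jpg", "f1.jpg"]})
```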

“An Exploratory Study of Dataset and Model Management in Open Source Machine Learning Applications” accepted at CAIN 2024!

Tajkia’s paper “An Exploratory Study of Dataset and Model Management in Open Source Machine Learning Applications” was accepted for publication at CAIN 2024! Super congrats Tajkia!

Abstract: “Datasets and models are two key artifacts in machine learning (ML) applications. Although there exist tools to support dataset and model developers in managing ML artifacts, little is known about how these datasets and models are integrated into ML applications. In this paper, we study how datasets and models in ML applications are managed. In particular, we focus on how these artifacts are stored and versioned alongside the applications. After analyzing 93 repositories, we identified that the most common storage location for datasets and models is the file system, which causes availability issues. Notably, large data and model files, exceeding approximately 60 MB, are stored exclusively in remote storage and downloaded as needed. Most of the datasets and models lack proper integration with the version control system, posing potential traceability and reproducibility issues. Additionally, although datasets and models are likely to evolve during application development, they are rarely updated in application repositories.”

A preprint of the paper is available here.
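
To illustrate the “downloaded as needed” pattern the study observed for large files, here is a hypothetical sketch (the URL, paths, and helper below are invented for illustration, not taken from any studied repository): the application fetches the model on first use instead of committing it to the repository.

```python
import urllib.request
from pathlib import Path

# Hypothetical remote location; studied repositories use various hosts
# (cloud storage, release assets, model hubs) rather than committing the file.
MODEL_URL = "https://example.com/models/classifier-v1.bin"
MODEL_PATH = Path("models/classifier-v1.bin")

def ensure_model() -> Path:
    """Download the model file on first use instead of versioning it in Git."""
    if not MODEL_PATH.exists():
        MODEL_PATH.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(MODEL_URL, MODEL_PATH)
    return MODEL_PATH
```

This keeps a large binary out of the Git history, but, as the paper notes, it ties availability to the remote host and leaves the artifact itself unversioned and hard to trace.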

“Analyzing Developer Use of ChatGPT Generated Code in Open Source GitHub Projects” accepted at MSR Mining Challenge 2024!

Balreet’s paper “Analyzing Developer Use of ChatGPT Generated Code in Open Source GitHub Projects” was accepted for publication at the MSR Mining Challenge 2024! Super congrats Balreet and co-author Wentao! This paper was a collaboration with Dr. Sarah Nadi from New York University Abu Dhabi.

Abstract: “The rapid development of large language models such as ChatGPT has made them particularly useful to developers in generating code snippets for their projects. To understand how ChatGPT’s generated code is leveraged by developers, we conducted an empirical study of 3,044 ChatGPT-generated code snippets integrated within GitHub projects. A median of 54% of the generated lines of code is found in the project’s code, and this code typically remains unchanged once added. The modifications of the 76 code snippets that changed in a subsequent commit consisted of minor functionality changes and code reorganizations that were made within a day. Our findings offer insights that help drive the development of AI-assisted programming tools. We highlight the importance of making changes in ChatGPT code before integrating it into a project.”

A preprint of the paper is available here.
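
As a rough illustration of the kind of line-level overlap behind the “54% of generated lines” figure, one could compare a ChatGPT snippet against the file it was integrated into. This is a toy metric for illustration only, not the paper’s methodology; the function name and whitespace normalization are assumptions.

```python
def fraction_of_lines_retained(snippet: str, project_file: str) -> float:
    """Rough share of non-empty generated lines that also appear in the project file."""
    snippet_lines = [ln.strip() for ln in snippet.splitlines() if ln.strip()]
    project_lines = {ln.strip() for ln in project_file.splitlines()}
    if not snippet_lines:
        return 0.0
    kept = sum(1 for ln in snippet_lines if ln in project_lines)
    return kept / len(snippet_lines)
```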