“An Empirical Study of Q&A Websites for Game Developers” accepted for publication in the EMSE journal!

Arthur’s paper “An Empirical Study of Q&A Websites for Game Developers” was accepted for publication in the Empirical Software Engineering journal! Super congrats Arthur!

The game development industry is growing, and training new developers in game development-specific abilities is essential to satisfying its need for skilled game developers. These developers require effective learning resources to acquire the information they need and improve their game development skills. Question and Answer (Q&A) websites stand out as some of the most used online learning resources in software development. Many studies have investigated how Q&A websites help software developers become more experienced. However, no studies have explored Q&A websites aimed at game development, and little is known about how game developers use and interact with these websites. In this paper, we study four Q&A communities by analyzing game development data we collected from their websites and the 347 responses to a survey we ran with game developers. We observe that the communities have declined over the past few years and identify factors that correlate with these changes. Using a Latent Dirichlet Allocation (LDA) model, we characterize the topics discussed in the communities. We also analyze how topics differ across communities and identify the most discussed topics. Furthermore, we find that survey respondents have a mostly negative view of the communities and tend to stop using the websites once they become more experienced. Finally, we provide recommendations on where game developers should post their questions, which can help mitigate the websites’ declines and improve their effectiveness.

See our Publications for the full paper.

“An empirical study of same-day releases of popular packages in the npm ecosystem” accepted in the EMSE journal!

Filipe’s paper “An empirical study of same-day releases of popular packages in the npm ecosystem” was accepted for publication in the Empirical Software Engineering journal! Super congrats Filipe! This was a collaboration with Gustavo Oliva and Ahmed Hassan.

Within a software ecosystem, client packages can reuse provider packages as third-party libraries. The reuse relation between client and provider packages is called a dependency. When a client package depends on the code of a provider package, every change that is introduced in a release of the provider has the potential to impact the client package. Since a large number of dependencies exist within a software ecosystem, releases of a popular provider package can impact a large number of clients. Occasionally, multiple releases of a popular package need to be published on the same day, leading to a scenario in which the time available to revise, test, build, and document the release is restricted compared to releases published within a regular schedule. In this paper, our objective is to study the same-day releases that are published by popular packages in the npm ecosystem. We design an exploratory study to characterize the type of changes that are introduced in same-day releases, the prevalence of same-day releases in the npm ecosystem, and the adoption of same-day releases by client packages. A preliminary manual analysis of the existing release notes suggests that same-day releases introduce non-trivial changes (e.g., bug fixes). We then focus on three RQs. First, we study how often same-day releases are published. We found that the median proportion of regularly scheduled releases that are interrupted by a same-day release (per popular package) is 22%, suggesting the importance of having timely and systematic procedures to cope with same-day releases. Second, we study the performed code changes in same-day releases. We observe that 32% of the same-day releases have larger changes compared with their prior release, thus showing that some same-day releases can undergo significant maintenance activity despite their time-constrained nature. In our third RQ, we study how client packages react to same-day releases of their providers. 
We observe that the vast majority of client packages that adopt the release preceding a same-day release would also adopt the same-day release without having to change their versioning statement (implicit updates). We also note that explicit adoptions of same-day releases (i.e., adoptions that require a change to the versioning statement of the provider in question) are significantly faster than explicit adoptions of regular releases. Based on our findings, we argue that (i) third-party tools that support the automation of dependency management (e.g., Dependabot) should consider explicitly flagging same-day releases, (ii) popular packages should strive for optimized release pipelines that can properly handle same-day releases, and (iii) future research should design scalable, ecosystem-ready tools that support provider packages in assessing the impact of their code changes on client packages.
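The implicit-versus-explicit distinction above hinges on how npm version ranges work. As a minimal sketch (only caret ranges are handled; npm’s full semver grammar is richer), a caret range in a client’s package.json already matches a follow-up patch release, so the client adopts it implicitly:

```python
# Minimal sketch (not npm's full semver grammar) of how a client's
# versioning statement decides whether a provider release is adopted
# implicitly. Only caret ranges like "^1.2.3" are handled here.

def parse(version):
    return tuple(int(p) for p in version.split("."))

def caret_satisfies(statement, release):
    """True if `release` falls inside the caret range `statement`."""
    assert statement.startswith("^")
    base = parse(statement[1:])
    ver = parse(release)
    if ver < base:
        return False  # older than the declared minimum
    # A caret range allows changes that do not touch the leftmost
    # non-zero component (the major version for releases >= 1.0.0).
    if base[0] > 0:
        return ver[0] == base[0]
    if base[1] > 0:
        return ver[:2] == base[:2]
    return ver == base

# A same-day patch release is adopted without editing package.json:
print(caret_satisfies("^1.2.3", "1.2.4"))  # True: implicit update
print(caret_satisfies("^1.2.3", "2.0.0"))  # False: needs explicit adoption
```

An explicit adoption, by contrast, requires editing the versioning statement itself (e.g., from `^1.2.3` to `^2.0.0`), which is the change the paper measures the speed of.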

See our Publications for the full paper.

“Improving the Discoverability of Indie Games by Leveraging their Similarity to Top-Selling Games: Identifying Important Requirements of a Recommender System” accepted at FDG 2021!

Quang’s paper “Improving the Discoverability of Indie Games by Leveraging their Similarity to Top-Selling Games: Identifying Important Requirements of a Recommender System” was accepted for publication at the International Conference on the Foundations of Digital Games (FDG) 2021! Super congrats Quang!

* Update July 23: This paper won a best paper award at FDG 2021!

Indie games often lack visibility compared to top-selling games, due to their limited marketing budgets and the sheer number of indie games. Players of top-selling games usually like certain types of games or certain game elements, such as theme, gameplay, or storyline. Therefore, indie games could leverage the game elements they share with top-selling games to get discovered. In this paper, we propose an approach to improve the discoverability of indie games by recommending similar indie games to gamers of top-selling games. We first matched 2,830 itch.io indie games to 326 top-selling Steam games. We then contacted the indie game developers for evaluation feedback and suggestions. We found that the majority (67.9%) of those who offered verbose responses expressed positive support for our approach. We also analyzed the reasons for bad recommendations and the suggestions made by indie game developers to lay out the important requirements for such a recommender system. The most important ones are: a standardized and extensive tag and genre ontology is needed to bridge the two platforms; the expectations of players of top-selling games should be managed to avoid disappointment; a player’s preferences should be integrated when making recommendations; a standardized age restriction rule is needed; and finally, the recommendation tool should also show indie games that are less similar or less popular.
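The core idea of matching games through shared game elements can be illustrated with a toy similarity measure. This is a hypothetical sketch, not the paper’s actual matching pipeline: it ranks made-up indie games against a made-up top-seller by the Jaccard similarity of their tag sets.

```python
# Hypothetical sketch: ranking indie games by tag overlap with a
# top-selling game using Jaccard similarity over tag sets. The paper's
# actual matching pipeline is more involved; all names and tags below
# are made up for illustration.

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

top_seller_tags = {"roguelike", "pixel-art", "permadeath", "dungeon-crawler"}

indie_games = {
    "Indie A": {"roguelike", "pixel-art", "permadeath"},
    "Indie B": {"puzzle", "relaxing"},
    "Indie C": {"dungeon-crawler", "pixel-art", "co-op"},
}

ranked = sorted(indie_games.items(),
                key=lambda kv: jaccard(top_seller_tags, kv[1]),
                reverse=True)
for name, tags in ranked:
    # Indie A ranks first (0.75), then Indie C (0.4), then Indie B (0.0)
    print(name, round(jaccard(top_seller_tags, tags), 2))
```

A measure like this also makes the paper’s first requirement concrete: without a tag and genre ontology shared by both platforms, the tag sets being intersected are simply not comparable.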

The paper can be downloaded here.

“What Causes Wrong Sentiment Classifications of Game Reviews?” accepted for publication in the TG journal!

Markos’ paper “What Causes Wrong Sentiment Classifications of Game Reviews?” was accepted for publication in the IEEE Transactions on Games journal! Super congrats Markos! This was a collaboration with Dayi Lin and Abram Hindle.

Sentiment analysis is a popular technique for identifying the sentiment of a piece of text. Sentiment analysis research has targeted several domains, such as Twitter, movie reviews, and mobile app reviews. Although several techniques have been proposed, the performance of current sentiment analysis techniques is still far from acceptable, especially when they are applied to domains on which they were not trained. In addition, the causes of wrong classifications are not clear. In this paper, we study how sentiment analysis performs on game reviews. We first report the results of a large-scale empirical study of the performance of widely-used sentiment classifiers on game reviews. Then, we investigate the root causes of the wrong classifications and quantify the impact of each cause on the overall performance. We study three existing classifiers: Stanford CoreNLP, NLTK, and SentiStrength. Our results show that most classifiers do not perform well on game reviews, the best one being NLTK (with an AUC of 0.70). We also identified four main causes of wrong classifications, such as reviews that point out both advantages and disadvantages of a game, which might confuse the classifier. The identified causes are not trivial to resolve, and we call upon sentiment analysis and game researchers and developers to prioritize a research agenda that investigates how the performance of sentiment analysis of game reviews can be improved, for instance by developing techniques that can automatically deal with specific game-related issues of reviews (e.g., reviews with advantages and disadvantages). Finally, we show that training sentiment classifiers on reviews that are stratified by game genre is effective.
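Why a review that lists both advantages and disadvantages confuses a classifier can be seen with a toy lexicon-based scorer (deliberately simplistic, and not one of the three classifiers studied in the paper): the positive and negative cues cancel out, leaving no usable signal.

```python
# Toy lexicon-based scorer (not CoreNLP, NLTK, or SentiStrength) showing
# why a review that lists both advantages and disadvantages is hard to
# classify: the positive and negative cues cancel each other out.

POSITIVE = {"great", "fun", "beautiful", "love"}
NEGATIVE = {"buggy", "crashes", "boring", "hate"}

def score(review):
    # Crude tokenization: strip commas/periods, lowercase, split on spaces.
    words = review.lower().replace(",", " ").replace(".", " ").split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

mixed = "Great art and fun combat, but it crashes constantly and gets boring."
print(score(mixed))  # 0: the cues cancel, even though the reviewer
                     # clearly has an overall verdict in mind
```

Real classifiers are far more sophisticated, but the underlying difficulty is the same: a mixed review carries both polarities at once, while the label is a single verdict.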

See our Publications for the full paper.

“PySStuBs: Characterizing Single-Statement Bugs in Popular Open-Source Python Projects” accepted at MSR 2021 mining challenge!

Arthur and Luisa’s paper “PySStuBs: Characterizing Single-Statement Bugs in Popular Open-Source Python Projects” was accepted for publication at the MSR 2021 mining challenge! Super congrats Arthur and Luisa! This was a collaboration with Dr. Abram Hindle.

“Single-statement bugs (SStuBs) can have a severe impact on developer productivity. Despite usually being simple and not offering much of a challenge to fix, these bugs may still disturb a developer’s workflow and waste precious development time. However, few studies have paid attention to these simple bugs, focusing instead on bugs of any size and complexity. In this study, we explore the occurrence of SStuBs in some of the most popular open-source Python projects on GitHub, while also characterizing their patterns and distribution. We further compare these bugs to SStuBs found in a previous study on Java Maven projects. We find that these Python projects have different SStuB patterns than the ones in Java Maven projects and identify 7 new SStuB patterns. Our results may help uncover the importance of understanding these bugs for the Python programming language, and how developers can handle them more effectively.”
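To make the notion of a single-statement bug concrete, here is a hedged sketch (not the paper’s mining pipeline, which works on commit diffs at scale) of one way to check whether a Python fix changes exactly one statement, by comparing the dumped AST of each top-level statement before and after:

```python
# Hedged sketch: deciding whether a Python fix is a single-statement
# change by diffing the dumped AST of each top-level statement. Real
# SStuB mining works on commit diffs across whole repositories; this
# only illustrates the core idea on a two-line snippet.
import ast

def statement_dumps(source):
    return [ast.dump(stmt) for stmt in ast.parse(source).body]

def is_single_statement_change(before, after):
    old, new = statement_dumps(before), statement_dumps(after)
    if len(old) != len(new):
        return False  # statements were added or removed, not just edited
    return sum(a != b for a, b in zip(old, new)) == 1

buggy = "x = a + b\ny = x * 2\n"
fixed = "x = a - b\ny = x * 2\n"  # wrong binary operator: a classic SStuB pattern
print(is_single_statement_change(buggy, fixed))  # True
```

The example fix above (swapping `+` for `-`) is an instance of the “wrong binary operator” pattern, one of the pattern families such studies classify.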

A preprint of the paper can be found on our publications page.

“An Empirical Study of Trends of Popular Virtual Reality Games and Their Complaints” was accepted for publication in the TG journal!

Rain’s paper “An Empirical Study of Trends of Popular Virtual Reality Games and Their Complaints” was accepted for publication in the IEEE Transactions on Games journal! Super congrats Rain!

The market for virtual reality (VR) games is growing rapidly, and is expected to grow from $3.3B in 2018 to $13.7B in 2022. Due to the immersive nature of such games and the use of VR headsets, players may have complaints about VR games that are distinct from those about traditional computer games, and an understanding of those complaints could enable developers to better take advantage of the growing VR market. We conduct an empirical study of 750 popular VR games and 17,635 user reviews on Steam to understand trends in VR games and their complaints. We find that the VR games market is maturing: fewer VR games are released each month, but their quality appears to be improving over time. Most games support multiple headsets and play areas, and support for smaller-scale play areas is increasing. Complaints of cybersickness are rare and declining, indicating that players are generally more concerned with other issues. Recently, complaints about game-specific issues have become the most frequent type of complaint, and VR game developers can now focus on these issues and worry less about VR-comfort issues such as cybersickness.

See our Publications for the full paper.

“Should you Upgrade Official Docker Hub Images in Production Environments?” accepted for publication at ICSE NIER’21!

Sara’s paper “Should you Upgrade Official Docker Hub Images in Production Environments?” was accepted for publication at ICSE New Ideas and Emerging Results (NIER)’21! Super congrats Sara! This was a joint work with Hamzeh Khazaei (York University).

Docker, one of the most popular software containerization technologies, allows a user to deploy Docker images to create and run containers. While Docker images facilitate the deployment and in-place upgrading of an application in a production environment by replacing its container with one based on a newer image, many dependencies could change at once during such an image upgrade, which can potentially be a source of risk. In this paper, we study the official Docker images on Docker Hub and explore how packages are changing in these images. We found that the number of package changes varies across different types of applications and that often the changing packages are utility packages. Our study takes a first important look at potential risks when doing an in-place upgrade of a Docker image.
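The kind of package churn the study measures can be sketched as a diff between the installed-package lists of two image versions (e.g., obtained by running `dpkg -l` inside each image). The package names and versions below are made up for illustration:

```python
# Illustrative sketch: given the package lists of two image versions
# (name -> version), count how many packages an in-place upgrade would
# change at once. All package names and versions are made up.

def package_changes(old, new):
    added = [p for p in new if p not in old]
    removed = [p for p in old if p not in new]
    upgraded = [p for p in old if p in new and old[p] != new[p]]
    return added, removed, upgraded

old_image = {"openssl": "1.1.1f", "curl": "7.68.0", "bash": "5.0"}
new_image = {"openssl": "1.1.1k", "curl": "7.68.0", "zlib": "1.2.11"}

added, removed, upgraded = package_changes(old_image, new_image)
print(added, removed, upgraded)  # ['zlib'] ['bash'] ['openssl']
```

Even in this tiny example, a single image upgrade swaps out three packages at once, which is exactly the bundled-change risk the paper highlights.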

See our Publications for the paper.

“How are issue reports discussed in Gitter chat rooms?” accepted for publication in JSS!

Hareem’s paper “How are issue reports discussed in Gitter chat rooms?” was accepted for publication in Elsevier’s Journal of Systems and Software! Congrats Hareem! This was a collaboration between Hareem Sahar, Abram Hindle and Cor-Paul. See Abram’s site for the original post and the pre-print of the paper.

Abstract: “Informal communication channels like mailing lists, IRC and instant messaging play a vital role in open source software development by facilitating communication within geographically diverse project teams, for example, to discuss issue reports as part of the bug-fixing process. More recently, chat systems like Slack and Gitter have gained a lot of popularity and developers are rapidly adopting them. Gitter is a chat system that is specifically designed to address the needs of GitHub users. Gitter hosts project-based asynchronous chats which foster frequent project discussions among participants. Developer discussions contain a wealth of information, such as the rationale behind decisions made during the evolution of a project. In this study, we explore 24 open source project chat rooms that are hosted on Gitter, containing a total of 3,407,622 messages and 16,665 issue references. We manually analyze the contents of chat room discussions around 476 issue reports. The results of our study show the prevalence of issue discussions on Gitter, and that the discussed issue reports have a longer resolution time than issue reports that are never brought up on Gitter.”
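Mining issue references like the 16,665 counted above boils down to pattern matching over chat messages. This toy sketch handles only two assumed reference forms, bare `#123` mentions and full GitHub issue URLs; the study’s actual extraction likely covers more variants:

```python
# Sketch of extracting issue references from chat messages. Only two
# forms are handled here: bare "#123" mentions and full GitHub issue
# URLs. Real chat data references issues in more ways than this.
import re

ISSUE_REF = re.compile(
    r"(?:https?://github\.com/[\w.-]+/[\w.-]+/issues/(\d+))|(?<!\w)#(\d+)"
)

def issue_numbers(message):
    # findall yields (url_group, hash_group) tuples; one of the two is empty.
    return [int(a or b) for a, b in ISSUE_REF.findall(message)]

msg = "this looks like #42 again, see https://github.com/org/repo/issues/7"
print(issue_numbers(msg))  # [42, 7]
```

The negative lookbehind `(?<!\w)` keeps the pattern from firing inside identifiers like `abc#1`, a common false-positive source in chat logs.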

“Trouncing in Dota 2: An Investigation of Blowout Matches” accepted for publication at AIIDE’20!

Markos’ paper “Trouncing in Dota 2: An Investigation of Blowout Matches” was accepted for publication at AIIDE’20! Super congrats Markos!

“With their increasing popularity, Multiplayer Online Battle Arena (MOBA) games in which two teams compete against each other, such as Dota 2, play a major role in esports tournaments, attracting millions of spectators. Some matches (so-called blowout matches) end extremely quickly or have a very large difference in scores. Understanding which factors lead to a victory in a blowout match is useful knowledge for players who wish to improve their chances of winning and for improving the accuracy of recommendation systems for heroes. In this paper, we perform a comparative study of blowout and regular matches. We study 55,287 past professional Dota 2 matches to (1) investigate how accurately we can predict victory using only pre-match features and (2) explain the factors that are correlated with victory. We investigate three machine learning algorithms and find that Gradient Boosting Machines (XGBoost) perform best, with an Area Under the Curve (AUC) of up to 0.86. Our results show that the experience of the player with the picked hero has a different importance for blowout and regular matches. Also, hero attributes are more important for blowouts with a large score difference. Based on our results, we suggest that players (1) pick heroes with which they achieved a high performance in previous matches to increase their chances of winning and (2) focus on heroes’ attributes such as intelligence to win with a large score difference.”
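The AUC the paper reports has a simple interpretation worth spelling out: it is the probability that a randomly chosen won match receives a higher predicted win probability than a randomly chosen lost match. The sketch below computes it directly from that definition, with made-up labels and scores (not data from the study):

```python
# Hedged sketch: computing the Area Under the ROC Curve (AUC) without
# ML libraries, straight from its rank-based definition. Labels and
# scores below are made up, not data from the paper.

def auc(labels, scores):
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    pairs = len(pos) * len(neg)
    # Count correctly ranked won/lost pairs; ties count as half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / pairs

labels = [1, 1, 0, 1, 0]             # 1 = the predicted team won
scores = [0.9, 0.45, 0.4, 0.6, 0.5]  # model's pre-match win probabilities
print(auc(labels, scores))  # ≈ 0.83: 5 of 6 won/lost pairs ranked correctly
```

By this reading, the paper’s AUC of 0.86 means the model ranks a winning match above a losing one about 86% of the time, using pre-match features alone.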

See our publications for the full paper.

“Towards Reducing the Time Needed for Load Testing” was accepted for publication in the JSEP journal!

Hammam’s paper “Towards Reducing the Time Needed for Load Testing” was accepted for publication in the Journal of Software Evolution and Process! Super congrats Hammam!

The performance of large-scale systems must be thoroughly tested under various levels of workload, as load-related issues can have a disastrous impact on the system. However, load tests often require a large amount of time, running from hours to even days, to execute. Nowadays, with the increased popularity of rapid releases and continuous deployment, testing time is at a premium and should be minimized while still delivering a complete test of the system. In our prior work, we proposed to reduce the execution time of a load test by detecting repetitiveness in individual performance metric values, such as CPU utilization or memory usage, that are observed during the test. However, as we explain in this paper, disregarding combinations of performance metrics may miss important information about the load-related behaviour of a system. Therefore, in this paper we revisit our prior approach, by proposing a new approach that reduces the execution time of a load test by detecting whether a test no longer exercises new combinations of the observed performance metrics. We conduct an experimental case study on three open source systems (CloudStore, PetClinic, and Dell DVD Store 2), in which we use our new and prior approaches to reduce the execution time of a 24-hour load test. We show that our new approach is capable of reducing the execution time of the test to less than 8.5 hours, while preserving a coverage of at least 95% of the combinations that are observed between the performance metrics during the 24-hour tests. In addition, we show that our prior approach recommends a stopping time that is too early for two of the three studied systems. Finally, we discuss the challenges of applying our approach to an industrial setting, and we call upon the community to help us to address these challenges.
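The stopping idea above can be sketched in a few lines: discretize each performance metric into bins, track the set of observed metric *combinations*, and suggest stopping once no new combination has appeared for a while. The bin width and patience threshold below are made-up parameters, and real load tests track far more than two metrics:

```python
# Illustrative sketch of the stopping criterion: watch combinations of
# binned performance metrics and stop when the test has not produced a
# new combination for `patience` consecutive intervals. Bin width and
# patience are made-up parameters for this toy example.

def to_bin(value, width=10):
    return int(value // width)

def suggested_stop(samples, patience=3):
    """samples: list of (cpu, memory) readings, one per interval."""
    seen, idle = set(), 0
    for i, (cpu, mem) in enumerate(samples):
        combo = (to_bin(cpu), to_bin(mem))
        if combo in seen:
            idle += 1
            if idle >= patience:
                return i  # no new combination for `patience` intervals
        else:
            seen.add(combo)
            idle = 0
    return len(samples) - 1  # test ran to completion

readings = [(12, 55), (18, 57), (35, 60), (37, 61), (36, 62), (38, 60), (35, 63)]
print(suggested_stop(readings))  # 5: intervals 3-5 repeat known combinations
```

The key contrast with the prior approach is the `combo` tuple: tracking each metric individually would miss behaviour that only shows up in joint CPU/memory states.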

See our Publications for the full paper.