“What Causes Wrong Sentiment Classifications of Game Reviews?” accepted for publication in the TG journal!

Markos’ paper “What Causes Wrong Sentiment Classifications of Game Reviews?” was accepted for publication in the IEEE Transactions on Games journal! Super congrats Markos! This was a collaboration with Dayi Lin and Abram Hindle.

Abstract:
Sentiment analysis is a popular technique to identify the sentiment of a piece of text. Several different domains have been targeted by sentiment analysis research, such as Twitter, movie reviews, and mobile app reviews. Although several techniques have been proposed, the performance of current sentiment analysis techniques is still far from acceptable, mainly when applied in domains on which they were not trained. In addition, the causes of wrong classifications are not clear. In this paper, we study how sentiment analysis performs on game reviews. We first report the results of a large scale empirical study on the performance of widely-used sentiment classifiers on game reviews. Then, we investigate the root causes for the wrong classifications and quantify the impact of each cause on the overall performance. We study three existing classifiers: Stanford CoreNLP, NLTK, and SentiStrength. Our results show that most classifiers do not perform well on game reviews, with the best one being NLTK (with an AUC of 0.70). We also identified four main causes for wrong classifications, such as reviews that point out advantages and disadvantages of the game, which might confuse the classifier. The identified causes are not trivial to be resolved and we call upon sentiment analysis and game researchers and developers to prioritize a research agenda that investigates how the performance of sentiment analysis of game reviews can be improved, for instance by developing techniques that can automatically deal with specific game-related issues of reviews (e.g., reviews with advantages and disadvantages). Finally, we show that training sentiment classifiers on reviews that are stratified by the game genre is effective.

See our Publications for the full paper.

“PySStuBs: Characterizing Single-Statement Bugs in Popular Open-Source Python Projects” accepted at MSR 2021 mining challenge!

Arthur and Luisa’s paper “PySStuBs: Characterizing Single-Statement Bugs in Popular Open-Source Python Projects” was accepted for publication at the MSR 2021 mining challenge! Super congrats Arthur and Luisa! This was a collaboration with Dr. Abram Hindle.

Abstract:
“Single-statement bugs (SStuBs) can have a severe impact on developer productivity. Despite usually being simple and not offering much of a challenge to fix, these bugs may still disturb a developer’s workflow and waste precious development time. However, few studies have paid attention to these simple bugs, focusing instead on bugs of any size and complexity. In this study, we explore the occurrence of SStuBs in some of the most popular open-source Python projects on GitHub, while also characterizing their patterns and distribution. We further compare these bugs to SStuBs found in a previous study on Java Maven projects. We find that these Python projects have different SStuB patterns than the ones in Java Maven projects and identify 7 new SStuB patterns. Our results may help uncover the importance of understanding these bugs for the Python programming language, and how developers can handle them more effectively.”

A preprint of the paper can be found on our publications page.

“An Empirical Study of Trends of Popular Virtual Reality Games and Their Complaints” was accepted for publication in the TG journal!

Rain’s paper “An Empirical Study of Trends of Popular Virtual Reality Games and Their Complaints” was accepted for publication in the IEEE Transactions on Games journal! Super congrats Rain!

Abstract:
The market for virtual reality (VR) games is growing rapidly, and is expected to grow from $3.3B in 2018 to $13.7B in 2022. Due to the immersive nature of such games and the use of VR headsets, players may have complaints about VR games which are distinct from those about traditional computer games, and an understanding of those complaints could enable developers to better take advantage of the growing VR market. We conduct an empirical study of 750 popular VR games and 17,635 user reviews on Steam in order to understand trends in VR games and their complaints. We find that the VR games market is maturing. Fewer VR games are released each month but their quality appears to be improving over time. Most games support multiple headsets and play areas, and support for smaller-scale play areas is increasing. Complaints of cybersickness are rare and declining, indicating that players are generally more concerned with other issues. Recently, complaints about game-specific issues have become the most frequent type of complaint, and VR game developers can now focus on these issues and worry less about VR-comfort issues such as cybersickness.

See our Publications for the full paper.

“Should you Upgrade Official Docker Hub Images in Production Environments?” accepted for publication at ICSE NIER’21!

Sara’s paper “Should you Upgrade Official Docker Hub Images in Production Environments?” was accepted for publication at ICSE New Ideas and Emerging Results (NIER)’21! Super congrats Sara! This was joint work with Hamzeh Khazaei (York University).

Abstract:
Docker, one of the most popular software containerization technologies, allows a user to deploy Docker images to create and run containers. While Docker images facilitate the deployment and in-place upgrading of an application in a production environment by replacing its container with one based on a newer image, many dependencies could change at once during such an image upgrade, which can potentially be a source of risk. In this paper, we study the official Docker images on Docker Hub and explore how packages are changing in these images. We found that the number of package changes varies across different types of applications and that often the changing packages are utility packages. Our study takes a first important look at potential risks when doing an in-place upgrade of a Docker image.

See our Publications for the paper.

“How are issue reports discussed in Gitter chat rooms?” accepted for publication in JSS!

Hareem’s paper “How are issue reports discussed in Gitter chat rooms?” was accepted for publication in Elsevier’s Journal of Systems and Software! Congrats Hareem! This was a collaboration between Hareem Sahar, Abram Hindle and Cor-Paul. See Abram’s site for the original post and the pre-print of the paper.

Abstract: “Informal communication channels like mailing lists, IRC and instant messaging play a vital role in open source software development by facilitating communication within geographically diverse project teams e.g., to discuss issue reports to facilitate the bug-fixing process. More recently, chat systems like Slack and Gitter have gained a lot of popularity and developers are rapidly adopting them. Gitter is a chat system that is specifically designed to address the needs of GitHub users. Gitter hosts project-based asynchronous chats which foster frequent project discussions among participants. Developer discussions contain a wealth of information such as the rationale behind decisions made during the evolution of a project. In this study, we explore 24 open source project chat rooms that are hosted on Gitter, containing a total of 3,407,622 messages and 16,665 issue references. We manually analyze the contents of chat room discussions around 476 issue reports. The results of our study show the prevalence of issue discussions on Gitter, and that the discussed issue reports have a longer resolution time than the issue reports that are never brought on Gitter.”

“Trouncing in Dota 2: An Investigation of Blowout Matches” accepted for publication at AIIDE’20!

Markos’ paper “Trouncing in Dota 2: An Investigation of Blowout Matches” was accepted for publication at AIIDE’20! Super congrats Markos!

Abstract:
“With an increasing popularity, Multiplayer Online Battle Arena games where two teams compete against each other, such as Dota 2, play a major role in esports tournaments, attracting millions of spectators. Some matches (so-called blowout matches) end extremely quickly or have a very large difference in scores. Understanding which factors lead to a victory in a blowout match is useful knowledge for players who wish to improve their chances of winning and for improving the accuracy of recommendation systems for heroes. In this paper, we perform a comparative study between blowout and regular matches. We study 55,287 past professional Dota 2 matches to (1) investigate how accurately we can predict victory using only pre-match features and (2) explain the factors that are correlated with the victory. We investigate three machine learning algorithms and find that Gradient Boosting Machines (XGBoost) perform best with an Area Under the Curve (AUC) of up to 0.86. Our results show that the experience of the player with the picked hero has a different importance for blowout and regular matches. Also, hero attributes are more important for blowouts with a large score difference. Based on our results, we suggest that players (1) pick heroes with which they achieved a high performance in previous matches to increase their chances of winning and (2) focus on heroes’ attributes such as intelligence to win with a large score difference.”

See our Publications for the full paper.

“Towards Reducing the Time Needed for Load Testing” was accepted for publication in the JSEP journal!

Hammam’s paper “Towards Reducing the Time Needed for Load Testing” was accepted for publication in the Journal of Software Evolution and Process! Super congrats Hammam!

Abstract:
The performance of large-scale systems must be thoroughly tested under various levels of workload, as load-related issues can have a disastrous impact on the system. However, load tests often require a large amount of time, running from hours to even days, to execute. Nowadays, with the increased popularity of rapid releases and continuous deployment, testing time is at a premium and should be minimized while still delivering a complete test of the system. In our prior work, we proposed to reduce the execution time of a load test by detecting repetitiveness in individual performance metric values, such as CPU utilization or memory usage, that are observed during the test. However, as we explain in this paper, disregarding combinations of performance metrics may miss important information about the load-related behaviour of a system. Therefore, in this paper we revisit our prior approach, by proposing a new approach that reduces the execution time of a load test by detecting whether a test no longer exercises new combinations of the observed performance metrics. We conduct an experimental case study on three open source systems (CloudStore, PetClinic, and Dell DVD Store 2), in which we use our new and prior approaches to reduce the execution time of a 24-hour load test. We show that our new approach is capable of reducing the execution time of the test to less than 8.5 hours, while preserving a coverage of at least 95% of the combinations that are observed between the performance metrics during the 24-hour tests. In addition, we show that our prior approach recommends a stopping time that is too early for two of the three studied systems. Finally, we discuss the challenges of applying our approach to an industrial setting, and we call upon the community to help us to address these challenges.

See our Publications for the full paper.

“An Empirical Study of the Characteristics of Popular Minecraft Mods” was accepted for publication in the EMSE journal!

Daniel’s paper “An Empirical Study of the Characteristics of Popular Minecraft Mods” was accepted for publication in the Empirical Software Engineering journal! Super congrats Daniel!

Abstract:
It is becoming increasingly difficult for game developers to manage the cost of developing a game, while meeting the high expectations of gamers. One way to balance the increasing gamer expectation and development stress is to build an active modding community around the game. There exist several examples of games with an extremely active and successful modding community, with the Minecraft game being one of the most notable ones. This paper reports on an empirical study of 1,114 popular and 1,114 unpopular Minecraft mods from the CurseForge mod distribution platform, one of the largest distribution platforms for Minecraft mods. We analyzed the relationship between 33 features across 5 dimensions of mod characteristics and the popularity of mods (i.e., mod category, mod documentation, environmental context of the mod, remuneration for the mod, and community contribution for the mod), to understand the characteristics of popular Minecraft mods. We firstly verify that the studied dimensions have significant explanatory power in distinguishing the popularity of the studied mods. Then we evaluated the contribution of each of the 33 features across the 5 dimensions. We observed that popular mods tend to have a high quality description and promote community contribution. In addition, simplifying the mod development is positively correlated with mod popularity.

See our Publications for the full paper.

“An Empirical Study of the Characteristics of Popular Game Jams and Their High-ranking Submissions on itch.io” accepted for publication at FDG!

Quang’s paper “An Empirical Study of the Characteristics of Popular Game Jams and Their High-ranking Submissions on itch.io” was accepted for publication at the International Conference on the Foundations of Digital Games (FDG) 2020! Super congrats Quang!

Abstract:
Game jams are hackathon-like events that allow participants to develop a playable game prototype within a time limit. They foster creativity and the exchange of ideas by letting developers with different skill sets collaborate. Having a high-ranking game is a great bonus to a beginning game developer’s résumé and their pursuit of a career in the game industry. However, participants often face time constraints set by jam hosts while balancing what aspects of their games should be emphasized to have the highest chance of winning. Similarly, hosts need to understand what to emphasize when organizing online jams so that their jams are more popular, in terms of submission rate. In this paper, we study 1,290 past game jams and their 3,752 submissions on itch.io to understand better what makes popular jams and high-ranking games perceived well by the audience. We find that a quality description has a positive contribution to both a jam’s popularity and a game’s ranking. Additionally, more manpower organizing a jam or developing a game increases a jam’s popularity and a game’s high-ranking likelihood. High-ranking games tend to support Windows or macOS, and belong to the “Puzzle”, “Platformer”, “Interactive Fiction”, or “Action” genres. Also, shorter competitive jams tend to be more popular. Based on our findings, we suggest jam hosts and participants improve the description of their products and consider co-organizing or co-participating in a jam. Furthermore, jam participants should develop multi-platform multi-genre games. Finally, jam hosts should introduce a tighter time limit to increase their jam’s popularity.

See our Publications for the full paper.

“Studying the Association between Bountysource Bounties and the Issue-addressing Likelihood of GitHub Issue Reports” was accepted for publication in the TSE journal!

Jiayuan’s paper “Studying the Association between Bountysource Bounties and the Issue-addressing Likelihood of GitHub Issue Reports” was accepted for publication in the IEEE Transactions on Software Engineering (TSE) journal! Super congrats Jiayuan!

Abstract:
Due to the voluntary nature of open source software, it can be hard to find a developer to work on a particular task. For example, some issue reports may be too cumbersome and unexciting for someone to volunteer to do them, yet these issue reports may be of high priority to the success of a project. To provide an incentive for implementing such issue reports, one can propose a monetary reward, i.e., a bounty, to the developer who completes that particular task. In this paper, we study bounties in open source projects on GitHub to better understand how bounties can be leveraged to evolve such projects in terms of addressing issue reports. We investigated 5,445 bounties for GitHub projects. These bounties were proposed through the Bountysource platform with a total bounty value of $406,425. We find that 1) in general, the timing of proposing bounties is the most important factor that is associated with the likelihood of an issue being addressed. More specifically, issue reports are more likely to be addressed if they are for projects in which bounties are used more frequently and if they are proposed earlier. 2) The bounty value of an issue report is the most important factor that is associated with the issue-addressing likelihood in the projects in which no bounties were used before. 3) There is a risk of wasting money for backers who invest money on long-standing issue reports.

See our Publications for the full paper.