Hao Li; Filipe R. Cogo; Cor-Paul Bezemer
An Empirical Study of Yanked Releases in the Rust Package Registry Journal Article
IEEE Transactions on Software Engineering (TSE), 2022.
Abstract | BibTeX | Tags: Release Management, Software Ecosystem
@article{LiTSE2022,
title = {An Empirical Study of Yanked Releases in the Rust Package Registry},
author = {Hao Li and Filipe R. Cogo and Cor-Paul Bezemer},
year = {2022},
date = {2022-02-14},
urldate = {2022-02-14},
journal = {IEEE Transactions on Software Engineering (TSE)},
abstract = {Cargo, the software packaging manager of Rust, provides a yank mechanism to support release-level deprecation, which
can prevent packages from depending on yanked releases. Most prior studies focused on code-level (i.e., deprecated APIs) and
package-level deprecation (i.e., deprecated packages). However, few studies have focused on release-level deprecation. In this study,
we investigate how often and how the yank mechanism is used, the rationales behind its usage, and the adoption of yanked releases in
the Cargo ecosystem. Our study shows that 9.6% of the packages in Cargo have at least one yanked release, and the proportion of
yanked releases kept increasing from 2014 to 2020. Package owners yank releases for other reasons than withdrawing a defective
release, such as fixing a release that does not follow semantic versioning or indicating a package is removed or replaced. In addition,
we found that 46% of the packages directly adopted at least one yanked release and the yanked releases propagated through the
dependency network, which leads to 1.4% of the releases in the ecosystem having unresolved dependencies.},
keywords = {Release Management, Software Ecosystem},
pubstate = {published},
tppubtype = {article}
}
Mikael Sabuhi; Ming (Chloe) Zhou; Cor-Paul Bezemer; Petr Musilek
Applications of Generative Adversarial Networks in Anomaly Detection: A Systematic Literature Review Journal Article
IEEE Access, 2021.
@article{mikael_gan2021,
title = {Applications of Generative Adversarial Networks in Anomaly Detection: A Systematic Literature Review},
author = {Mikael Sabuhi and Ming (Chloe) Zhou and Cor-Paul Bezemer and Petr Musilek},
year = {2021},
date = {2021-12-01},
urldate = {2021-12-01},
journal = {IEEE Access},
abstract = {Anomaly detection has become an indispensable tool for modern society, applied in a wide
range of applications, from detecting fraudulent transactions to malignant brain tumors. Over time, many
anomaly detection techniques have been introduced. However, in general, they all suffer from the same
problem: lack of data that represents anomalous behaviour. As anomalous behaviour is usually costly (or
dangerous) for a system, it is difficult to gather enough data that represents such behaviour. This, in turn,
makes it difficult to develop and evaluate anomaly detection techniques. Recently, generative adversarial
networks (GANs) have attracted much attention in anomaly detection research, due to their unique ability
to generate new data. In this paper, we present a systematic review of the literature in this area, covering
128 papers. The goal of this review paper is to analyze the relation between anomaly detection techniques
and types of GANs, to identify the most common application domains for GAN-assisted and GAN-based
anomaly detection, and to assemble information on datasets and performance metrics used to assess them.
Our study helps researchers and practitioners to find the most suitable GAN-assisted anomaly detection
technique for their application. In addition, we present a research roadmap for future studies in this area. In
summary, GANs are used in anomaly detection to address the problem of insufficient amount of data for the
anomalous behaviour, either through data augmentation or representation learning. The most commonly used
GAN architectures are DCGANs, standard GANs, and cGANs. The primary application domains include
medicine, surveillance and intrusion detection.},
keywords = {},
pubstate = {published},
tppubtype = {article}
}
Arthur V. Kamienski
Studying Trends, Topics, and Duplicate Questions on Q&A Websites for Game Developers Masters Thesis
University of Alberta, 2021.
Abstract | BibTeX | Tags: Computer games, Q&A websites
@mastersthesis{msc_arthur,
title = {Studying Trends, Topics, and Duplicate Questions on Q&A Websites for Game Developers},
author = {Arthur V. Kamienski},
year = {2021},
date = {2021-09-29},
urldate = {2021-09-29},
school = {University of Alberta},
abstract = {The game development industry is growing and there is a high demand for developers
that can produce high-quality games. These developers need resources to learn
and improve the skills required to build those games in a reliable and easy manner.
Question and Answer (Q&A) websites are learning resources that are commonly used
by software developers to share knowledge and acquire the information they need.
However, we still know little about how game developers use and interact with Q&A
websites. In this thesis, we analyze the largest Q&A websites that discuss game development
to understand how effective they are as learning resources and what can
be improved to build a better Q&A community for their users.
In the first part of this thesis, we analyzed data collected from four Q&A websites,
namely Unity Answers, the Unreal Engine 4 (UE4) AnswerHub, the Game Development
Stack Exchange, and Stack Overflow, to assess their effectiveness in helping
game developers. We also used the 347 responses collected from a survey we ran
with game developers to gauge their perception of Q&A websites. We found that
the studied websites are in decline, with their activity and effectiveness decreasing
over the last few years and users having an overall negative view of the studied Q&A
communities. We also characterized the topics discussed in those websites using a
latent Dirichlet allocation (LDA) model, and analyze how those topics differ across
websites. Finally, we give recommendations to guide developers to the websites that
are most effective in answering the types of questions they have, which could help the
websites in overcoming their decline.
In the second part of the thesis, we explored how we can further help Q&A websites
for game developers by automatically identifying duplicate questions. Duplicate
questions have a negative impact on Q&A websites by overloading them with questions
that have already been answered. Therefore, we analyzed the performance of
seven unsupervised and pre-trained techniques on the task of detecting duplicate
questions on Q&A websites for game developers. We achieved the highest performance
when comparing all the text content of questions and their answers using a
pre-trained technique based on MPNet. Furthermore, we could almost double the
performance by combining all of the techniques into a single question similarity score
using supervised models. Lastly, we show that the supervised models can be used
on websites different from the ones they were trained on with little to no decrease in
performance. Our findings can be used by Q&A websites and future researchers to
build better systems for duplicate question detection, which can ultimately provide
game developers with better Q&A communities.},
keywords = {Computer games, Q&A websites},
pubstate = {published},
tppubtype = {mastersthesis}
}
Arthur V. Kamienski; Cor-Paul Bezemer
An Empirical Study of Q&A Websites for Game Developers Journal Article
Empirical Software Engineering Journal (EMSE), 2021.
Abstract | BibTeX | Tags: Game development, Q&A communities
@article{arthur2021,
title = {An Empirical Study of Q&A Websites for Game Developers},
author = {Arthur V. Kamienski and Cor-Paul Bezemer},
year = {2021},
date = {2021-07-07},
urldate = {2021-07-07},
journal = {Empirical Software Engineering Journal (EMSE)},
abstract = {The game development industry is growing, and training new developers in game development-specific abilities is essential to satisfying its need for skilled game developers. These developers require effective learning resources to acquire the information they need and improve their game development skills. Question and Answer (Q&A) websites stand out as some of the most used online learning resources in software development. Many studies have investigated how Q&A websites help software developers become more experienced. However, no studies have explored Q&A websites aimed at game development, and there is little information about how game developers use and interact with these websites. In this paper, we study four Q&A communities by analyzing game development data we collected from their websites and the 347 responses received on a survey we ran with game developers. We observe that the communities have declined over the past few years and identify factors that correlate to these changes. Using a Latent Dirichlet Allocation (LDA) model, we characterize the topics discussed in the communities. We also analyze how topics differ across communities and identify the most discussed topics. Furthermore, we find that survey respondents have a mostly negative view of the communities and tended to stop using the websites once they became more experienced. Finally, we provide recommendations on where game developers should post their questions, which can help mitigate the websites’ declines and improve their effectiveness.},
keywords = {Game development, Q&A communities},
pubstate = {published},
tppubtype = {article}
}
Quang N. Vu; Cor-Paul Bezemer
Improving the Discoverability of Indie Games by Leveraging their Similarity to Top-Selling Games: Identifying Important Requirements of a Recommender System Inproceedings
International Conference on the Foundations of Digital Games (FDG), pp. 1–12, 2021.
Abstract | BibTeX | Tags: Computer games, Game discoverability, Indie games, itch.io, Steam
@inproceedings{Quang21,
title = {Improving the Discoverability of Indie Games by Leveraging their Similarity to Top-Selling Games: Identifying Important Requirements of a Recommender System},
author = {Quang N. Vu and Cor-Paul Bezemer},
year = {2021},
date = {2021-04-07},
urldate = {2021-04-07},
booktitle = {International Conference on the Foundations of Digital Games (FDG)},
pages = {1--12},
abstract = {Indie games often lack visibility as compared to top-selling games due to their limited marketing budget and the fact that there are a large number of indie games. Players of top-selling games usually like certain types of games or certain game elements such as theme, gameplay, storyline. Therefore, indie games could leverage their shared game elements with top-selling games to get discovered. In this paper, we propose an approach to improve the discoverability of indie games by recommending similar indie games to gamers of top-selling games. We first matched 2,830 itch.io indie games to 326 top-selling Steam games. We then contacted the indie game developers for evaluation feedback and suggestions. We found that the majority of them (67.9%) who offered verbose responses show positive support for our approach. We also analyzed the reasons for bad recommendations and the suggestions by indie game developers to lay out the important requirements for such a recommendation system. The most important ones are: a standardized and extensive tag and genre ontology system is needed to bridge the two platforms, the expectations of players of top-selling games should be managed to avoid disappointment, a player’s preferences should be integrated when making recommendations, a standardized age restriction rule is needed, and finally, the recommendation tool should also show indie games that are the least similar or less popular.},
keywords = {Computer games, Game discoverability, Indie games, itch.io, Steam},
pubstate = {published},
tppubtype = {inproceedings}
}
Filipe R. Cogo; Gustavo A. Oliva; Cor-Paul Bezemer; Ahmed E. Hassan
An Empirical Study of Same-day Releases of Popular Packages in the npm Ecosystem Journal Article
Empirical Software Engineering Journal (EMSE), 2021.
Abstract | BibTeX | Tags: Dependencies, Release Management, Same-day Release, Software Ecosystem
@article{cogo2021,
title = {An Empirical Study of Same-day Releases of Popular Packages in the npm Ecosystem},
author = {Filipe R. Cogo and Gustavo A. Oliva and Cor-Paul Bezemer and Ahmed E. Hassan},
year = {2021},
date = {2021-04-05},
urldate = {2021-04-05},
journal = {Empirical Software Engineering Journal (EMSE)},
abstract = {Within a software ecosystem, client packages can reuse provider
packages as third-party libraries. The reuse relation between client and provider packages is called a dependency. When a client package depends on the code of a provider package, every change that is introduced in a release of the provider has the potential to impact the client package. Since a large number of dependencies exist within a software ecosystem, releases of a popular provider package can impact a large number of clients. Occasionally, multiple releases of a popular package need to be published on the same day, leading to a scenario in which the time available to revise, test, build, and document the release is restricted compared to releases published within a regular schedule. In this paper, our objective is to study the same-day releases that are published by popular packages in the npm ecosystem. We design an exploratory study to characterize the type of changes that are introduced in same-day releases, the prevalence of same-day releases in the npm ecosystem, and the adoption of same-day releases by client packages. A preliminary manual analysis of the existing release notes suggests that same-day releases introduce non-trivial changes (e.g., bug fixes). We then focus on three RQs. First, we study how often same-day releases are published. We found that the median proportion of regularly scheduled releases that are interrupted by a same-day release (per popular package) is 22%, suggesting the importance of having timely and systematic procedures to cope with same-day releases. Second, we
study the performed code changes in same-day releases. We observe that 32% of the same-day releases have larger changes compared with their prior release, thus showing that some same-day releases can undergo significant maintenance activity despite their time-constrained nature. In our third RQ, we study how client packages react to same-day releases of their providers. We observe the vast majority of client packages that adopt the release preceding the same-day release would also adopt the latter without having to change their versioning statement (implicit updates). We also note that explicit adoptions of same-day releases (i.e., adoptions that require a change to the versioning statement of the provider in question) are significantly faster than the explicit adoption of regular releases. Based on our findings, we argue that (i) third-party tools that support the automation of dependency management (e.g., Dependabot) should consider explicitly flagging same-day releases, (ii) popular packages should strive for optimized release pipelines that can properly handle same-day releases, and (iii) future research should design scalable, ecosystem-ready tools that support provider packages in assessing the impact of their code changes on client packages.},
keywords = {Dependencies, Release Management, Same-day Release, Software Ecosystem},
pubstate = {published},
tppubtype = {article}
}
Markos Viggiato; Dayi Lin; Abram Hindle; Cor-Paul Bezemer
What Causes Wrong Sentiment Classifications of Game Reviews? Journal Article
IEEE Transactions on Games, pp. 1–14, 2021.
Abstract | BibTeX | Tags: Computer games, Natural language processing, Sentiment analysis, Steam
@article{markos2021sentiment,
title = {What Causes Wrong Sentiment Classifications of Game Reviews?},
author = {Markos Viggiato and Dayi Lin and Abram Hindle and Cor-Paul Bezemer},
year = {2021},
date = {2021-04-05},
urldate = {2021-04-05},
journal = {IEEE Transactions on Games},
pages = {1--14},
institution = {University of Alberta},
abstract = {Sentiment analysis is a popular technique to identify the sentiment of a piece of text. Several different domains have been targeted by sentiment analysis research, such as Twitter, movie reviews, and mobile app reviews. Although several techniques have been proposed, the performance of current sentiment analysis techniques is still far from acceptable, mainly when applied in domains on which they were not trained. In addition, the causes of wrong classifications are not clear. In this paper, we study how sentiment analysis performs on game reviews. We first report the results of a large scale empirical study on the performance of widely-used sentiment classifiers on game reviews. Then, we investigate the root causes for the wrong classifications and quantify the impact of each cause on the overall performance. We study three existing classifiers: Stanford CoreNLP, NLTK, and SentiStrength. Our results show that most classifiers do not perform well on game reviews, with the best one being NLTK (with an AUC of 0.70). We also identified four main causes for wrong classifications, such as reviews that point out advantages and disadvantages of the game, which might confuse the classifier. The identified causes are not trivial to resolve, and we call upon sentiment analysis and game researchers and developers to prioritize a research agenda that investigates how the performance of sentiment analysis of game reviews can be improved, for instance by developing techniques that can automatically deal with specific game-related issues of reviews (e.g., reviews with advantages and disadvantages). Finally, we show that training sentiment classifiers on reviews that are stratified by the game genre is effective.},
keywords = {Computer games, Natural language processing, Sentiment analysis, Steam},
pubstate = {published},
tppubtype = {article}
}
Arthur V. Kamienski; Luisa Palechor; Cor-Paul Bezemer; Abram Hindle
PySStuBs: Characterizing Single-Statement Bugs in Popular Open-Source Python Projects Inproceedings
MSR Mining Challenge, pp. 1–5, 2021.
Abstract | BibTeX | Tags: Open-source projects, Python, Single-statement bugs
@inproceedings{athur2021pysstubs,
title = {PySStuBs: Characterizing Single-Statement Bugs in Popular Open-Source Python Projects},
author = {Arthur V. Kamienski and Luisa Palechor and Cor-Paul Bezemer and Abram Hindle},
year = {2021},
date = {2021-03-08},
urldate = {2021-03-08},
booktitle = {MSR Mining Challenge},
pages = {1--5},
abstract = {Single-statement bugs (SStuBs) can have a severe impact on developer productivity. Despite usually being simple and not offering much of a challenge to fix, these bugs may still disturb a developer’s workflow and waste precious development time. However, few studies have paid attention to these simple bugs, focusing instead on bugs of any size and complexity. In this study, we explore the occurrence of SStuBs in some of the most popular open-source Python projects on GitHub, while also characterizing their patterns and distribution. We further compare these bugs to SStuBs found in a previous study on Java Maven projects. We find that these Python projects have different SStuB patterns than the ones in Java Maven projects and identify 7 new SStuB patterns. Our results may help uncover the importance of understanding these bugs for the Python programming language, and how developers can handle them more effectively.},
keywords = {Open-source projects, Python, Single-statement bugs},
pubstate = {published},
tppubtype = {inproceedings}
}
Rain Epp; Dayi Lin; Cor-Paul Bezemer
An Empirical Study of Trends of Popular Virtual Reality Games and Their Complaints Journal Article
IEEE Transactions on Games, pp. 1–12, 2021.
Abstract | BibTeX | Tags: Gamer complaints, Virtual reality games
@article{rain2021vr,
title = {An Empirical Study of Trends of Popular Virtual Reality Games and Their Complaints},
author = {Rain Epp and Dayi Lin and Cor-Paul Bezemer},
year = {2021},
date = {2021-01-29},
urldate = {2021-01-29},
journal = {IEEE Transactions on Games},
pages = {1--12},
institution = {University of Alberta},
abstract = {The market for virtual reality (VR) games is growing rapidly, and is expected to grow from $3.3B in 2018 to $13.7B in 2022. Due to the immersive nature of such games and the use of VR headsets, players may have complaints about VR games which are distinct from those about traditional computer games, and an understanding of those complaints could enable developers to better take advantage of the growing VR market. We conduct an empirical study of 750 popular VR games and 17,635 user reviews on Steam in order to understand trends in VR games and their complaints. We find that the VR games market is maturing. Fewer VR games are released each month but their quality appears to be improving over time. Most games support multiple headsets and play areas, and support for smaller-scale play areas is increasing. Complaints of cybersickness are rare and declining, indicating that players are generally more concerned with other issues. Recently, complaints about game-specific issues have become the most frequent type of complaint, and VR game developers can now focus on these issues and worry less about VR-comfort issues such as cybersickness.},
keywords = {Gamer complaints, Virtual reality games},
pubstate = {published},
tppubtype = {article}
}
Sara Gholami; Hamzeh Khazaei; Cor-Paul Bezemer
Should you Upgrade Official Docker Hub Images in Production Environments? Inproceedings
ICSE New Ideas and Emerging Results (NIER), pp. 1–5, 2021.
Abstract | BibTeX | Tags: Containerization, Dependency upgrades, Docker, Docker Hub, Downgrades
@inproceedings{sara2021icsenier,
title = {Should you Upgrade Official Docker Hub Images in Production Environments?},
author = {Sara Gholami and Hamzeh Khazaei and Cor-Paul Bezemer},
year = {2021},
date = {2021-01-29},
urldate = {2021-01-29},
booktitle = {ICSE New Ideas and Emerging Results (NIER)},
pages = {1--5},
abstract = {Docker, one of the most popular software containerization technologies, allows a user to deploy Docker images to create and run containers. While Docker images facilitate the deployment and in-place upgrading of an application in a production environment by replacing its container with one based on a newer image, many dependencies could change at once during such an image upgrade, which can potentially be a source of risk. In this paper, we study the official Docker images on Docker Hub and explore how packages are changing in these images. We found that the number of package changes varies across different types of applications and that often the changing packages are utility packages. Our study takes a first important look at potential risks when doing an in-place upgrade of a Docker image.},
keywords = {Containerization, Dependency upgrades, Docker, Docker Hub, Downgrades},
pubstate = {published},
tppubtype = {inproceedings}
}
Hareem Sahar; Abram Hindle; Cor-Paul Bezemer
How are Issue Reports Discussed in Gitter Chat Rooms? Journal Article
Journal of Systems and Software (JSS), pp. 1–53, 2020.
Abstract | BibTeX | Tags: Developer discussions, Gitter, Issue reports
@article{sahar2020JSS-Gitter-Issues,
title = {How are Issue Reports Discussed in Gitter Chat Rooms?},
author = {Hareem Sahar and Abram Hindle and Cor-Paul Bezemer},
year = {2020},
date = {2020-10-29},
urldate = {2020-10-29},
journal = {Journal of Systems and Software (JSS)},
pages = {1--53},
institution = {University of Alberta},
abstract = {Informal communication channels like mailing lists, IRC and instant messaging play a vital role in open source software development by facilitating communication within geographically diverse project teams e.g., to discuss issue reports to facilitate the bug-fixing process. More recently, chat systems like Slack and Gitter have gained a lot of popularity and developers are rapidly adopting them. Gitter is a chat system that is specifically designed to address the needs of GitHub users. Gitter hosts project-based asynchronous chats which foster frequent project discussions among participants. Developer discussions contain a wealth of information such as the rationale behind decisions made during the evolution of a project. In this study, we explore 24 open source project chat rooms that are hosted on Gitter, containing a total of 3,133,106 messages and 14,096 issue references. We manually analyze the contents of chat room discussions around 457 issue reports. The results of our study show the prevalence of issue discussions on Gitter, and that the discussed issue reports have a longer resolution time than the issue reports that are never brought on Gitter.},
keywords = {Developer discussions, Gitter, Issue reports},
pubstate = {published},
tppubtype = {article}
}
Quang N. Vu
Leveraging Data From the Itch.io Online Game Distribution Platform to Help Indie Game Developers Masters Thesis
University of Alberta, 2020.
@mastersthesis{msc_quang,
title = {Leveraging Data From the Itch.io Online Game Distribution Platform to Help Indie Game Developers},
author = {Quang N. Vu},
year = {2020},
date = {2020-09-01},
urldate = {2020-09-01},
school = {University of Alberta},
abstract = {In the game distribution world, Steam is often regarded as the most prominent digital platform for its many famous games made by large developers. On the other hand, the itch.io game distribution platform is praised for its friendliness toward small independent (indie) games developed by small teams or even a single developer. itch.io allows game developers to participate in online game jams (hackathons during which games are built) or publish their games at no publishing cost. In this thesis, we study game data mined from itch.io to help indie game developers: (1) have a higher chance of winning a game jam and (2) increase the discoverability of their games.
In the first part of the thesis, we study the game jams and their high-ranking submissions to better understand the characteristics of a popular game jam (i.e., a jam that receives many submissions) and the characteristics of high-ranking game submissions in these jams. We collected data of 1,290 past game jams and their 3,752 submissions for our analysis. We found that a quality description contributes positively to a jam's popularity and a game's ranking. Additionally, more manpower organizing a jam or developing a game increases their likelihood of being popular or high-ranking respectively. High-ranking games tend to support Windows or macOS, and belong to the Puzzle, Platformer, Interactive Fiction, or Action genres. Finally, shorter competitive jams tend to be more popular. Our findings are useful for both future game jam organizers and participants.
In the second part of the thesis, we study an approach to increase the discoverability of the indie games hosted on itch.io by recommending similar indie games to players of top-selling Steam games. We implemented a content-based recommendation technique that leverages the similarity in tags, genres, and game description between an indie game and a top-selling game using the metadata of 2,830 itch.io indie games and 326 top-selling Steam games. We then contacted the indie game developers for feedback and suggestion on our approach. We found that the majority (67.9%) of them show positive support for our idea. We analyzed the downvoted recommendations to understand the reasons and lay out the important requirements for such an indie game recommendation approach. These requirements are useful for future research and development in indie game discoverability and recommendation.},
keywords = {},
pubstate = {published},
tppubtype = {mastersthesis}
}
Markos Viggiato; Cor-Paul Bezemer
Trouncing in Dota 2: An Investigation of Blowout Matches Inproceedings
The 16th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE), pp. 1–7, 2020.
@inproceedings{Markos2020dota2,
title = {Trouncing in Dota 2: An Investigation of Blowout Matches},
author = {Markos Viggiato and Cor-Paul Bezemer},
year = {2020},
date = {2020-08-10},
urldate = {2020-08-10},
booktitle = {The 16th AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment (AIIDE)},
pages = {1--7},
abstract = {With an increasing popularity, Multiplayer Online Battle Arena games where two teams compete against each other, such as Dota 2, play a major role in esports tournaments, attracting millions of spectators. Some matches (so-called blowout matches) end extremely quickly or have a very large difference in scores. Understanding which factors lead to a victory in a blowout match is useful knowledge for players who wish to improve their chances of winning and for improving the accuracy of recommendation systems for heroes. In this paper, we perform a comparative study between blowout and regular matches. We study 55,287 past professional Dota 2 matches to (1) investigate how accurately we can predict victory using only pre-match features and (2) explain the factors that are correlated with the victory. We investigate three machine learning algorithms and find that Gradient Boosting Machines (XGBoost) perform best with an Area Under the Curve (AUC) of up to 0.86. Our results show that the experience of the player with the picked hero has a different importance for blowout and regular matches. Also, hero attributes are more important for blowouts with a large score difference. Based on our results, we suggest that players (1) pick heroes with which they achieved a high performance in previous matches to increase their chances of winning and (2) focus on heroes’ attributes such as intelligence to win with a large score difference.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Safwat Hassan; Cor-Paul Bezemer; Ahmed E. Hassan
Studying Bad Updates of Top Free-to-Download Apps in the Google Play Store Journal Article
The Transactions of Software Engineering (TSE) journal, 2020.
Abstract | BibTeX | Tags: Android mobile apps, Bad updates, Google Play Store, Mobile app reviews
@article{safwat_tse,
title = {Studying Bad Updates of Top Free-to-Download Apps in the Google Play Store},
author = {Safwat Hassan and Cor-Paul Bezemer and Ahmed E. Hassan},
year = {2020},
date = {2020-07-01},
urldate = {2020-07-01},
journal = {The Transactions of Software Engineering (TSE) journal},
publisher = {IEEE},
abstract = {Developers always focus on delivering high-quality updates to improve, or maintain the rating of their apps. Prior work has studied user reviews by analyzing all reviews of an app. However, this app-level analysis misses the point that users post reviews to provide their feedback on a certain update. For example, two bad updates of an app with a history of good updates would not be spotted using app-level analysis. In this paper, we examine reviews at the update-level to better understand how users perceive bad updates. We focus our study on the top 250 bad updates (i.e., updates with the highest increase in the percentage of negative reviews relative to the prior updates of the app) from 26,726 updates of 2,526 top free-to-download apps in the Google Play Store. We find that feature removal and UI issues have the highest increase in the percentage of negative reviews. Bad updates with crashes and functional issues are the most likely to be fixed by a later update. However, developers often do not mention these fixes in the release notes. Our work demonstrates the necessity of an update-level analysis of reviews to capture the feelings of an app’s user-base about a particular update.},
keywords = {Android mobile apps, Bad updates, Google Play Store, Mobile app reviews},
pubstate = {published},
tppubtype = {article}
}
Sara Gholami
Studying Dependency Updates and a Framework for Multi-Versioning in Docker Containers Masters Thesis
University of Alberta, 2020.
@mastersthesis{msc_sara,
title = {Studying Dependency Updates and a Framework for Multi-Versioning in Docker Containers},
author = {Sara Gholami},
year = {2020},
date = {2020-06-01},
urldate = {2020-06-01},
school = {University of Alberta},
abstract = {Containerized software systems are becoming more popular and complex as they are one of the essential techniques that enable cloud computing. One of the enabling technologies for containerized software systems is the Docker framework. Docker is an open-source framework for deploying containers, lightweight, standalone, and executable units of software with all their dependencies (packages and libraries) that can run on any computing environment. Docker images facilitate deploying and upgrading systems as all of the dependencies required for a software package are included in an image. However, there exist several risks with running Docker images in production environments. One risky situation can occur when upgrading images, as an upgrade may result in many changing packages or libraries at once.
Therefore, in this thesis, we study the Docker images and analyze them to identify the risks of package changes. Also, we propose our solution, DockerMV, to mitigate this risk by running multiple versions of an image at the same time.
In the first part of this thesis, we analyze the official Docker image repositories that are available on Docker Hub, Docker’s public registry that holds Docker images. For each image in these repositories, we extract details about its native, Node, and Python packages. Afterward, we investigate which types of applications have more package changes in their image upgrades. We find that, depending on the type of applications, the package changes have different trends. For example, Operating systems and Base Images repositories have a lower median number of changes. However, Analytics and Application Services repositories have the highest median number of package changes. Our findings show that practitioners should be extra cautious when doing in-place upgrades of images of such applications in their production environments.
In the second part of this thesis, we provide a solution for mitigating this risk by applying software multi-versioning to Docker images. We present DockerMV, an open-source extension of the Docker framework that supports multi-versioning for containerized software systems. We demonstrate the usefulness of DockerMV from the performance point of view and test it on two open-source subject systems. In particular, we demonstrate how DockerMV can be used to balance the workload between Docker images that contain different versions of the same application. In both experiments, DockerMV maintained the system’s performance while using a limited set of resources.},
keywords = {},
pubstate = {published},
tppubtype = {mastersthesis}
}
Daniel Lee; Gopi Krishnan Rajbahadur; Dayi Lin; Mohammed Sayagh; Cor-Paul Bezemer; Ahmed E. Hassan
An Empirical Study of the Characteristics of Popular Minecraft Mods Journal Article
Empirical Software Engineering (EMSE) Journal, 2020.
Abstract | BibTeX | Tags: CurseForge, Minecraft, Mod development, Mods
@article{Lee2020curseforge,
title = {An Empirical Study of the Characteristics of Popular Minecraft Mods},
author = {Daniel Lee and Gopi Krishnan Rajbahadur and Dayi Lin and Mohammed Sayagh and Cor-Paul Bezemer and Ahmed E. Hassan},
year = {2020},
date = {2020-06-01},
urldate = {2020-06-01},
journal = {Empirical Software Engineering (EMSE) Journal},
abstract = {It is becoming increasingly difficult for game developers to manage the cost of developing a game, while meeting the high expectations of gamers. One way to balance the increasing gamer expectation and development stress is to build an active modding community around the game. There exist several examples of games with an extremely active and successful modding community, with the Minecraft game being one of the most notable ones.
This paper reports on an empirical study of 1,114 popular and 1,114 unpopular Minecraft mods from the CurseForge mod distribution platform, one of the largest distribution platforms for Minecraft mods. We analyzed the relationship between 33 features across 5 dimensions of mod characteristics and the popularity of mods (i.e., mod category, mod documentation, environmental context of the mod, remuneration for the mod, and community contribution for the mod), to understand the characteristics of popular Minecraft mods. We firstly verify that the studied dimensions have significant explanatory power in distinguishing the popularity of the studied mods. Then we evaluated the contribution of each of the 33 features across the 5 dimensions. We observed that popular mods tend to have a high quality description and promote community contribution. In addition, simplifying the mod development is positively correlated with mod popularity.},
keywords = {CurseForge, Minecraft, Mod development, Mods},
pubstate = {published},
tppubtype = {article}
}
Hammam M. AlGhamdi; Cor-Paul Bezemer; Weiyi Shang; Ahmed E. Hassan; Parminder Flora
Towards Reducing the Time Needed for Load Testing Journal Article
Journal of Software Evolution and Process (JSEP), 2020.
Abstract | BibTeX | Tags: Load testing, Performance analysis, Performance testing
@article{AlGhamdi2020loadtests,
title = {Towards Reducing the Time Needed for Load Testing},
author = {Hammam M. AlGhamdi and Cor-Paul Bezemer and Weiyi Shang and Ahmed E. Hassan and Parminder Flora},
year = {2020},
date = {2020-05-12},
urldate = {2020-05-12},
journal = {Journal of Software Evolution and Process (JSEP)},
abstract = {The performance of large-scale systems must be thoroughly tested under various levels of workload, as load-related issues can have a disastrous impact on the system. However, load tests often require a large amount of time, running from hours to even days, to execute. Nowadays, with the increased popularity of rapid releases and continuous deployment, testing time is at a premium and should be minimized while still delivering a complete test of the system. In our prior work, we proposed to reduce the execution time of a load test by detecting repetitiveness in individual performance metric values, such as CPU utilization or memory usage, that are observed during the test. However, as we explain in this paper, disregarding combinations of performance metrics may miss important information about the load-related behaviour of a system.
Therefore, in this paper we revisit our prior approach, by proposing a new approach that reduces the execution time of a load test by detecting whether a test no longer exercises new combinations of the observed performance metrics. We conduct an experimental case study on three open source systems (CloudStore, PetClinic, and Dell DVD Store 2), in which we use our new and prior approaches to reduce the execution time of a 24-hour load test. We show that our new approach is capable of reducing the execution time of the test to less than 8.5 hours, while preserving a coverage of at least 95% of the combinations that are observed between the performance metrics during the 24-hour tests. In addition, we show that our prior approach recommends a stopping time that is too early for two of the three studied systems. Finally, we discuss the challenges of applying our approach to an industrial setting, and we call upon the community to help us to address these challenges.},
keywords = {Load testing, Performance analysis, Performance testing},
pubstate = {published},
tppubtype = {article}
}
Quang N. Vu; Cor-Paul Bezemer
An Empirical Study of the Characteristics of Popular Game Jams and Their High-ranking Submissions on itch.io Inproceedings
International Conference on the Foundations of Digital Games (FDG), pp. 1–12, 2020.
Abstract | BibTeX | Tags: Empirical software engineering, Game development, Game jams, itch.io, Mining software repositories
@inproceedings{Quang20,
title = {An Empirical Study of the Characteristics of Popular Game Jams and Their High-ranking Submissions on itch.io},
author = {Quang N. Vu and Cor-Paul Bezemer},
year = {2020},
date = {2020-04-14},
urldate = {2020-04-14},
booktitle = {International Conference on the Foundations of Digital Games (FDG)},
pages = {1--12},
abstract = {Game jams are hackathon-like events that allow participants to develop a playable game prototype within a time limit. They foster creativity and the exchange of ideas by letting developers with different skill sets collaborate. Having a high-ranking game is a great bonus to a beginning game developer’s résumé and their pursuit of a career in the game industry. However, participants often face time constraints set by jam hosts while balancing what aspects of their games should be emphasized to have the highest chance of winning. Similarly, hosts need to understand what to emphasize when organizing online jams so that their jams are more popular, in terms of submission rate. In this paper, we study 1,290 past game jams and their 3,752 submissions on itch.io to understand better what makes popular jams and high-ranking games perceived well by the audience. We find that a quality description has a positive contribution to both a jam’s popularity and a game’s ranking. Additionally, more manpower organizing a jam or developing a game increases a jam’s popularity and a game’s high-ranking likelihood. High-ranking games tend to support Windows or macOS, and belong to the “Puzzle”, “Platformer”, “Interactive Fiction”, or “Action” genres. Also, shorter competitive jams tend to be more popular. Based on our findings, we suggest jam hosts and participants improve the description of their products and consider co-organizing or co-participating in a jam. Furthermore, jam participants should develop multi-platform multi-genre games. Finally, jam hosts should introduce a tighter time limit to increase their jam’s popularity.},
keywords = {Empirical software engineering, Game development, Game jams, itch.io, Mining software repositories},
pubstate = {published},
tppubtype = {inproceedings}
}
Jiayuan Zhou; Shaowei Wang; Cor-Paul Bezemer; Ying Zou; Ahmed E. Hassan
Studying the Association between Bountysource Bounties and the Issue-addressing Likelihood of GitHub Issue Reports Journal Article
Transactions on Software Engineering (TSE), 2020.
Abstract | BibTeX | Tags: Bounties, Bountysource, GitHub, Open source software, Software evolution
@article{Zhou2020bountysource,
title = {Studying the Association between Bountysource Bounties and the Issue-addressing Likelihood of GitHub Issue Reports},
author = {Jiayuan Zhou and Shaowei Wang and Cor-Paul Bezemer and Ying Zou and Ahmed E. Hassan},
year = {2020},
date = {2020-02-12},
urldate = {2020-02-12},
journal = {Transactions on Software Engineering (TSE)},
abstract = {Due to the voluntary nature of open source software, it can be hard to find a developer to work on a particular task. For example, some issue reports may be too cumbersome and unexciting for someone to volunteer to do them, yet these issue reports may be of high priority to the success of a project. To provide an incentive for implementing such issue reports, one can propose a monetary reward, i.e., a bounty, to the developer who completes that particular task. In this paper, we study bounties in open source projects on GitHub to better understand how bounties can be leveraged to evolve such projects in terms of addressing issue reports. We investigated 5,445 bounties for GitHub projects. These bounties were proposed through the Bountysource platform with a total bounty value of $406,425. We find that 1) in general, the timing of proposing bounties is the most important factor that is associated with the likelihood of an issue being addressed. More specifically, issue reports are more likely to be addressed if they are for projects in which bounties are used more frequently and if they are proposed earlier. 2) The bounty value of an issue report is the most important factor that is associated with the issue-addressing likelihood in the projects in which no bounties were used before. 3) There is a risk of wasting money for backers who invest money on long-standing issue reports.},
keywords = {Bounties, Bountysource, GitHub, Open source software, Software evolution},
pubstate = {published},
tppubtype = {article}
}
Simon Eismann; Cor-Paul Bezemer; Weiyi Shang; Dušan Okanović; André van Hoorn
Microservices: A Performance Tester's Dream or Nightmare? Inproceedings
ACM/SPEC International Conference on Performance Engineering (ICPE), pp. 1–12, 2020.
Abstract | BibTeX | Tags: DevOps, Microservices, Performance, Regression testing
@inproceedings{Simon20,
title = {Microservices: A Performance Tester's Dream or Nightmare?},
author = {Simon Eismann and Cor-Paul Bezemer and Weiyi Shang and Dušan Okanović and André van Hoorn},
year = {2020},
date = {2020-01-24},
urldate = {2020-01-24},
booktitle = {ACM/SPEC International Conference on Performance Engineering (ICPE)},
pages = {1--12},
abstract = {In recent years, there has been a shift in software development towards microservice-based architectures, which consist of small services that focus on one particular functionality. Many companies are migrating their applications to such architectures to reap the benefits of microservices, such as increased flexibility, scalability and a smaller granularity of the offered functionality by a service.
On the one hand, the benefits of microservices for functional testing are often praised, as the focus on one functionality and their smaller granularity allow for more targeted and more convenient testing. On the other hand, using microservices has their consequences (both positive and negative) on other types of testing, such as performance testing. Performance testing is traditionally done by establishing the baseline performance of a software version, which is then used to compare the performance testing results of later software versions. However, as we show in this paper, establishing such a baseline performance is challenging in microservice applications.
In this paper, we discuss the benefits and challenges of microservices from a performance tester’s point of view. Through a series of experiments on the TeaStore application, we demonstrate how microservices affect the performance testing process, and we demonstrate that it is not straightforward to achieve reliable performance testing results for a microservice application.},
keywords = {DevOps, Microservices, Performance, Regression testing},
pubstate = {published},
tppubtype = {inproceedings}
}