Mohamed Sami Rakha; Cor-Paul Bezemer; Ahmed E. Hassan
Revisiting the Performance Evaluation of Automated Approaches for the Identification of Duplicate Issue Reports Journal Article
IEEE Transactions on Software Engineering (TSE), 44 (12), pp. 1245–1268, 2017.
@article{sami16tse,
title = {Revisiting the Performance Evaluation of Automated Approaches for the Identification of Duplicate Issue Reports},
author = {Mohamed Sami Rakha and Cor-Paul Bezemer and Ahmed E. Hassan},
year = {2017},
date = {2017-09-21},
urldate = {2017-09-21},
journal = {IEEE Transactions on Software Engineering (TSE)},
volume = {44},
number = {12},
pages = {1245--1268},
publisher = {IEEE},
abstract = {Issue tracking systems (ITSs), such as Bugzilla, are commonly used to track reported bugs, improvements and change requests for a software project. To avoid wasting developer resources on previously-reported (i.e., duplicate) issues, it is necessary to identify such duplicates as soon as they are reported. Several automated approaches have been proposed for retrieving duplicate reports, i.e., identifying the duplicate of a new issue report in a list of n candidates. These approaches rely on leveraging the textual, categorical, and contextual information in previously-reported issues to decide whether a newly-reported issue has previously been reported. In general, these approaches are evaluated using data that spans a relatively short period of time (i.e., the classical evaluation). However, in this paper, we show that the classical evaluation tends to overestimate the performance of automated approaches for retrieving duplicate issue reports. Instead, we propose a realistic evaluation using all the reports that are available in the ITS of a software project. We conduct experiments in which we evaluate two popular approaches for retrieving duplicate issues (BM25F and REP) using the classical and realistic evaluations. We find that for the issue tracking data of the Mozilla foundation, the Eclipse foundation and OpenOffice, the realistic evaluation shows that previously proposed approaches perform considerably lower than previously reported using the classical evaluation. As a result, we conclude that the reported performance of approaches for retrieving duplicate issue reports is significantly overestimated in literature. In order to improve the performance of the automated retrieval of duplicate issue reports, we propose to leverage the resolution field of issue reports. 
Our experiments show that a relative improvement in the performance of a median of 7-21.5% and a maximum of 19-60% can be achieved by leveraging the resolution field of issue reports for the automated retrieval of duplicates.},
keywords = {Performance evaluation, Software engineering, Text analysis},
pubstate = {published},
tppubtype = {article}
}
Dayi Lin; Cor-Paul Bezemer; Ahmed E. Hassan
Studying the Urgent Updates of Popular Games on the Steam Platform Journal Article
Empirical Software Engineering (EMSE), 22 (4), pp. 2095–2126, 2017.
@article{Lin16urgent,
title = {Studying the Urgent Updates of Popular Games on the Steam Platform},
author = {Dayi Lin and Cor-Paul Bezemer and Ahmed E. Hassan},
year = {2017},
date = {2017-08-01},
urldate = {2017-08-01},
journal = {Empirical Software Engineering (EMSE)},
volume = {22},
number = {4},
pages = {2095--2126},
publisher = {Springer},
abstract = {The steadily increasing popularity of computer games has led to the rise of a multi-billion dollar industry. This increasing popularity is partly enabled by online digital distribution platforms for games, such as Steam. These platforms offer an insight into the development and test processes of game developers. In particular, we can extract the update cycle of a game and study what makes developers deviate from that cycle by releasing so-called urgent updates.
An urgent update is a software update that fixes problems that are deemed critical enough to not be left unfixed until a regular-cycle update. Urgent updates are made in a state of emergency and outside the regular development and test timelines which causes unnecessary stress on the development team. Hence, avoiding the need for an urgent update is important for game developers. We define urgent updates as 0-day updates (updates that are released on the same day), updates that are released faster than the regular cycle, or self-admitted hotfixes.
We conduct an empirical study of the urgent updates of the 50 most popular games from Steam, the dominant digital game delivery platform. As urgent updates are reflections of mistakes in the development and test processes, a better understanding of urgent updates can in turn stimulate the improvement of these processes, and eventually save resources for game developers. In this paper, we argue that the update strategy that is chosen by a game developer affects the number of urgent updates that are released. Although the choice of update strategy does not appear to have an impact on the percentage of updates that are released faster than the regular cycle or self-admitted hotfixes, games that use a frequent update strategy tend to have a higher proportion of 0-day updates than games that use a traditional update strategy.},
keywords = {Computer games, Steam, Update cycle, Update strategy, Urgent updates},
pubstate = {published},
tppubtype = {article}
}
Philipp Leitner; Cor-Paul Bezemer
An Exploratory Study of the State of Practice of Performance Testing in Java-based Open Source Projects Inproceedings
The International Conference on Performance Engineering (ICPE), pp. 373–384, ACM/SPEC, 2017.
@inproceedings{leitner16oss,
title = {An Exploratory Study of the State of Practice of Performance Testing in Java-based Open Source Projects},
author = {Philipp Leitner and Cor-Paul Bezemer},
year = {2017},
date = {2017-04-22},
urldate = {2017-04-22},
booktitle = {The International Conference on Performance Engineering (ICPE)},
pages = {373--384},
publisher = {ACM/SPEC},
abstract = {The usage of open source (OS) software is nowadays widespread across many industries and domains. While the functional quality of OS projects is considered to be up to par with that of closed-source software, much is unknown about the quality in terms of non-functional attributes, such as performance. One challenge for OS developers is that, unlike for functional testing, there is a lack of accepted best practices for performance testing.
To reveal the state of practice of performance testing in OS projects, we conduct an exploratory study on 111 Java-based OS projects from GitHub. We study the performance tests of these projects from five perspectives: (1) the developers, (2) the size, (3) the organization, (4) the types of performance tests and (5) the tooling used for performance testing.
First, in a quantitative study we show that writing performance tests is not a popular task in OS projects: performance tests form only a small portion of the test suite, are rarely updated, and are usually maintained by a small group of core project developers. Second, we show through a qualitative study that even though many projects are aware that they need performance tests, developers appear to struggle implementing them. We argue that future performance testing frameworks should provide better support for low-friction testing, for instance via non-parameterized methods or performance test generation, as well as focus on a tight integration with standard continuous integration tooling.},
keywords = {Empirical software engineering, Mining software repositories, Open source, Performance engineering, Performance testing},
pubstate = {published},
tppubtype = {inproceedings}
}
Suhas Kabinna; Cor-Paul Bezemer; Weiyi Shang; Ahmed E. Hassan
Logging Library Migrations: A Case Study for the Apache Software Foundation Projects Inproceedings
International Conference on Mining Software Repositories (MSR), pp. 154–164, ACM, 2016.
@inproceedings{Kabinna16msr,
title = {Logging Library Migrations: A Case Study for the Apache Software Foundation Projects},
author = {Suhas Kabinna and Cor-Paul Bezemer and Weiyi Shang and Ahmed E. Hassan},
year = {2016},
date = {2016-05-14},
urldate = {2016-05-14},
booktitle = {International Conference on Mining Software Repositories (MSR)},
pages = {154--164},
publisher = {ACM},
abstract = {Developers leverage logs for debugging, performance monitoring and load testing. The increased dependence on logs has led to the development of numerous logging libraries which help developers in logging their code. As new libraries emerge and current ones evolve, projects often migrate from an older library to a newer one.
In this paper we study logging library migrations within Apache Software Foundation (ASF) projects. From our manual analysis of JIRA issues, we find that 33 out of 223 (i.e., 14%) ASF projects have undergone at least one logging library migration. We find that the five main drivers for logging library migration are: 1) to increase flexibility (i.e., the ability to use different logging libraries within a project), 2) to improve performance, 3) to reduce effort spent on code maintenance, 4) to reduce dependence on other libraries and 5) to obtain specific features from the new logging library. We find that over 70% of the migrated projects encounter on average two post-migration bugs due to the new logging library. Furthermore, our findings suggest that performance (traditionally one of the primary drivers for migrations) is rarely improved after a migration.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Tarek M. Ahmed; Cor-Paul Bezemer; Tse-Hsun Chen; Ahmed E. Hassan; Weiyi Shang
Studying the Effectiveness of Application Performance Management (APM) Tools for Detecting Performance Regressions for Web Applications: An Experience Report Inproceedings
International Conference on Mining Software Repositories (MSR), pp. 1–12, ACM, 2016.
@inproceedings{Ahmed16msr,
title = {Studying the Effectiveness of Application Performance Management (APM) Tools for Detecting Performance Regressions for Web Applications: An Experience Report},
author = {Tarek M. Ahmed and Cor-Paul Bezemer and Tse-Hsun Chen and Ahmed E. Hassan and Weiyi Shang},
year = {2016},
date = {2016-05-14},
urldate = {2016-05-14},
booktitle = {International Conference on Mining Software Repositories (MSR)},
pages = {1--12},
publisher = {ACM},
abstract = {Performance regressions, such as a higher CPU utilization than in the previous version of an application, are caused by software application updates that negatively affect the performance of an application. Although a plethora of mining software repository research has been done to detect such regressions, research tools are generally not readily available to practitioners. Application Performance Management (APM) tools are commonly used in practice for detecting performance issues in the field by mining operational data.
In contrast to performance regression detection tools that assume a changing code base and a stable workload, APM tools mine operational data to detect performance anomalies caused by a changing workload in an otherwise stable code base. Although APM tools are widely used in practice, no research has been done to understand 1) whether APM tools can identify performance regressions caused by code changes and 2) how well these APM tools support diagnosing the root-cause of these regressions.
In this paper, we explore if the readily accessible APM tools can help practitioners detect performance regressions. We perform a case study using three commercial (AppDynamics, New Relic and Dynatrace) and one open source (Pinpoint) APM tools. In particular, we examine the effectiveness of leveraging these APM tools in detecting and diagnosing injected performance regressions (excessive memory usage, high CPU utilization and inefficient database queries) in three open source applications. We find that APM tools can detect most of the injected performance regressions, making them good candidates to detect performance regressions in practice. However, there is a gap between mining approaches that are proposed in state-of-the-art performance regression detection research and the ones used by APM tools. In addition, APM tools lack the ability to be extended, which makes it hard to enhance them when exploring novel mining approaches for detecting performance regressions.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Suhas Kabinna; Cor-Paul Bezemer; Weiyi Shang; Ahmed E. Hassan
Examining the Stability of Logging Statements Inproceedings
IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER), pp. 326-337, 2016.
@inproceedings{Kabinna16,
title = {Examining the Stability of Logging Statements},
author = {Suhas Kabinna and Cor-Paul Bezemer and Weiyi Shang and Ahmed E. Hassan},
year = {2016},
date = {2016-03-14},
urldate = {2016-03-14},
booktitle = {IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER)},
pages = {326-337},
abstract = {Logging statements (embedded in the source code) produce logs that assist in understanding system behavior, monitoring choke-points and debugging. Prior work showcases the importance of logging statements in operating, understanding and improving software systems. The wide dependence on logs has led to a new market of log processing and management tools. However, logs are often unstable, i.e., the logging statements that generate logs are often changed without the consideration of other stakeholders, causing sudden failures of log processing tools and increasing the maintenance costs of such tools. We examine the stability of logging statements in four open source applications namely: Liferay, ActiveMQ, Camel and CloudStack. We find that 20–45% of their logging statements change throughout their lifetime. The median number of days between the introduction of a logging statement and the first change to that statement is between 1 and 17 in our studied applications. These numbers show that in order to reduce maintenance effort, developers of log processing tools must be careful when selecting the logging statements on which their tools depend. In order to effectively mitigate the issues that are caused by unstable logging statements, we make an important first step towards determining whether a logging statement is likely to remain unchanged in the future. First, we use a random forest classifier to determine whether a just-introduced logging statement will change in the future, based solely on metrics that are calculated when it is introduced. Second, we examine whether a long-lived logging statement is likely to change based on its change history. We leverage Cox proportional hazards models (Cox models) to determine the change risk of long-lived logging statements in the source code. Through our case study on four open source applications, we show that our random forest classifier achieves an 83–91% precision, a 65–85% recall and a 0.95–0.96 AUC.
We find that file ownership, developer experience, log density and SLOC are important metrics in our studied projects for determining the stability of logging statements in both our random forest classifiers and Cox models. Developers can use our approach to determine the risk of a logging statement changing in their own projects, and to construct more robust log processing tools by ensuring that these tools depend on logs that are generated by more stable logging statements.},
keywords = {Log file stability, Log processing tools, Logging statements},
pubstate = {published},
tppubtype = {inproceedings}
}
Ravjot Singh; Cor-Paul Bezemer; Weiyi Shang; Ahmed E. Hassan
Optimizing the Performance Configuration of Object-Relational Mapping Frameworks Using a Multi-Objective Genetic Algorithm Inproceedings
ACM/SPEC International Conference on Performance Engineering (ICPE), pp. 309–320, 2016.
@inproceedings{Singh16,
title = {Optimizing the Performance Configuration of Object-Relational Mapping Frameworks Using a Multi-Objective Genetic Algorithm},
author = {Ravjot Singh and Cor-Paul Bezemer and Weiyi Shang and Ahmed E. Hassan},
year = {2016},
date = {2016-03-12},
urldate = {2016-03-12},
booktitle = {ACM/SPEC International Conference on Performance Engineering (ICPE)},
pages = {309--320},
abstract = {Object-relational mapping (ORM) frameworks map low-level database operations onto a high-level programming API that can be accessed from within object-oriented source code. ORM frameworks often provide configuration options to optimize the performance of such database operations. However, determining the set of optimal configuration options is a challenging task.
Through an exploratory study on two open source applications (Spring PetClinic and ZK), we find that the difference in execution time between two configurations can be large. In addition, both applications are not shipped with an ORM configuration that is related to performance: instead, they use the default values provided by the ORM framework. We show that in 89% of the 9 analyzed test cases for PetClinic and in 96% of the 54 analyzed test cases for ZK, the default configuration values supplied by the ORM framework performed significantly slower than the optimal configuration for that test case. Based on these observations, this paper proposes an approach for automatically finding an optimal ORM configuration using a multi-objective genetic algorithm. We evaluate our approach by conducting a case study of Spring PetClinic and ZK. We find that our approach finds near-optimal configurations in 360-450 seconds for PetClinic and in 9-12 hours for ZK. These execution times allow our approach to be executed to find an optimal configuration before each new release of an application.},
keywords = {Object-relational mapping performance, Performance configuration optimization},
pubstate = {published},
tppubtype = {inproceedings}
}
Cor-Paul Bezemer; Johan Pouwelse; Brendan Gregg
Understanding Software Performance Regressions Using Differential Flame Graphs Inproceedings
IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER), pp. 535–539, 2015.
@inproceedings{Bezemer15,
title = {Understanding Software Performance Regressions Using Differential Flame Graphs},
author = {Cor-Paul Bezemer and Johan Pouwelse and Brendan Gregg},
year = {2015},
date = {2015-03-02},
urldate = {2015-03-02},
booktitle = {IEEE International Conference on Software Analysis, Evolution, and Reengineering (SANER)},
pages = {535--539},
abstract = {Flame graphs are gaining rapidly in popularity in industry to visualize performance profiles collected by stack-trace based profilers. In some cases, for example, during performance regression detection, profiles of different software versions have to be compared. Doing this manually using two or more flame graphs or textual profiles is tedious and error-prone.
In this ‘Early Research Achievements’-track paper, we present our preliminary results on using differential flame graphs instead. Differential flame graphs visualize the differences between two performance profiles. In addition, we discuss which research fields we expect to benefit from using differential flame graphs. We have implemented our approach in an open source prototype called FLAMEGRAPHDIFF, which is available on GitHub. FLAMEGRAPHDIFF makes it easy to generate interactive differential flame graphs from two existing performance profiles. These graphs facilitate easy tracing of elements in the different graphs to ease the understanding of the (d)evolution of the performance of an application.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Jaap Kabbedijk; Cor-Paul Bezemer; Andy Zaidman; Slinger Jansen
Defining Multi-Tenancy: A Structured Mapping Study on the Academic and Industrial Perspective Journal Article
Journal of Systems and Software (JSS), 100 , pp. 139-148, 2015.
@article{KabbedijkJSS14,
title = {Defining Multi-Tenancy: A Structured Mapping Study on the Academic and Industrial Perspective},
author = {Jaap Kabbedijk and Cor-Paul Bezemer and Andy Zaidman and Slinger Jansen},
year = {2015},
date = {2015-01-01},
urldate = {2015-01-01},
journal = {Journal of Systems and Software (JSS)},
volume = {100},
pages = {139-148},
publisher = {Elsevier},
abstract = {Software as a service is frequently offered in a multi-tenant style, where customers of the application and their end-users share resources such as software and hardware among all users, without necessarily sharing data. It is surprising that, with such a popular paradigm, little agreement exists with regard to the definition, domain, and challenges of multi-tenancy. This absence is detrimental to the research community and the industry, as it hampers progress in the domain of multi-tenancy and enables organizations and academics to wield their own definitions to further their commercial or research agendas.
In this article, a systematic mapping study on multi-tenancy is described in which 761 academic papers and 371 industrial blogs are analysed. Both the industrial and academic perspective are assessed, in order to get a complete overview. The definition and topic maps provide a comprehensive overview of the domain, while the research agenda, listing four important research topics, provides a roadmap for future research efforts.},
keywords = {Academic perspective, Definition, Industrial perspective, Multi-tenancy, Systematic mapping study},
pubstate = {published},
tppubtype = {article}
}
Cor-Paul Bezemer; Elric Milon; Andy Zaidman; Johan Pouwelse
Detecting and Analyzing I/O Performance Regressions Journal Article
Journal of Software: Evolution and Process (JSEP), 26 (12), pp. 1193–1212, 2014.
@article{Bezemer14jsep,
title = {Detecting and Analyzing I/O Performance Regressions},
author = {Cor-Paul Bezemer and Elric Milon and Andy Zaidman and Johan Pouwelse},
year = {2014},
date = {2014-07-17},
urldate = {2014-07-17},
journal = {Journal of Software: Evolution and Process (JSEP)},
volume = {26},
number = {12},
pages = {1193--1212},
publisher = {John Wiley & Sons, Ltd},
abstract = {Regression testing can be done by re-executing a test suite on different software versions and comparing the outcome. For functional testing, the outcome of such tests is either pass (correct behaviour) or fail (incorrect behaviour). For non-functional testing, such as performance testing, this is more challenging as correct and incorrect are not clearly defined concepts for these types of testing.
In this paper, we present an approach for detecting and analyzing I/O performance regressions. Our method is supplemental to existing profilers and its goal is to analyze the effect of source code changes on the performance of a system. In this paper, we focus on analyzing the amount of I/O writes being done. The open source implementation of our approach, SPECTRAPERF, is available for download.
We evaluate our approach in a field user study on Tribler, an open source peer-to-peer client and its decentralized solution for synchronizing messages, Dispersy. In this evaluation, we show that our approach can guide the performance optimization process, as it helps developers to find performance bottlenecks on the one hand, and on the other allows them to validate the effect of performance optimizations. In addition, we perform a feasibility study on Django, the most popular Python project on Github, to demonstrate our applicability on other projects. Copyright (c) 2013 John Wiley & Sons, Ltd.},
keywords = {Performance analysis, Performance optimization, Performance regressions},
pubstate = {published},
tppubtype = {article}
}
Cor-Paul Bezemer
Performance Optimization of Multi-Tenant Software Applications PhD Thesis
Delft University of Technology, 2014.
BibTeX | Tags:
@phdthesis{phd_cp,
title = {Performance Optimization of Multi-Tenant Software Applications},
author = {Cor-Paul Bezemer},
year = {2014},
date = {2014-04-14},
urldate = {2014-04-14},
school = {Delft University of Technology},
howpublished = {Delft University of Technology},
keywords = {},
pubstate = {published},
tppubtype = {phdthesis}
}
Cor-Paul Bezemer; Andy Zaidman
Performance Optimization of Deployed Software-as-a-service Applications Journal Article
Journal of Systems and Software (JSS), 87, pp. 87-103, 2014.
Abstract | BibTeX | Tags: Performance analysis, Performance maintenance
@article{BezemerJSS13,
title = {Performance Optimization of Deployed Software-as-a-service Applications},
author = {Cor-Paul Bezemer and Andy Zaidman},
year = {2014},
date = {2014-01-01},
urldate = {2014-01-01},
journal = {Journal of Systems and Software (JSS)},
volume = {87},
pages = {87-103},
publisher = {Elsevier},
abstract = {The goal of performance maintenance is to improve the performance of a software system after delivery. As the performance of a system is often characterized by unexpected combinations of metric values, manual analysis of performance is hard in complex systems. In this paper, we propose an approach that helps performance experts locate and analyze spots – so called performance improvement opportunities (PIOs) –, for possible performance improvements. PIOs give performance experts a starting point for performance improvements, e.g., by pinpointing the bottleneck component. The technique uses a combination of association rules and performance counters to generate the rule coverage matrix, a matrix which assists with the bottleneck detection.
In this paper, we evaluate our technique in two case studies. In the first, we show that our technique is accurate in detecting the timeframe during which a PIO occurs. In the second, we show that the starting point given by our approach is indeed useful and assists a performance expert in diagnosing the bottleneck component in a system with high precision.},
keywords = {Performance analysis, Performance maintenance},
pubstate = {published},
tppubtype = {article}
}
Riccardo Petrocco; Cor-Paul Bezemer; Johan Pouwelse; Dick Epema
Libswift: the PPSPP Reference Implementation Technical Report
Delft University of Technology (PDS-2014-004), 2014.
BibTeX | Tags:
@techreport{Petrocco2014,
title = {Libswift: the PPSPP Reference Implementation},
author = {Riccardo Petrocco and Cor-Paul Bezemer and Johan Pouwelse and Dick Epema},
year = {2014},
date = {2014-01-01},
urldate = {2014-01-01},
number = {PDS-2014-004},
institution = {Delft University of Technology},
keywords = {},
pubstate = {published},
tppubtype = {techreport}
}
Cor-Paul Bezemer; Andy Zaidman
Improving the Diagnostic Capabilities of a Performance Optimization Approach Technical Report
Delft University of Technology (TUD-SERG-2013-015), 2013.
@techreport{Bezemer13tech,
title = {Improving the Diagnostic Capabilities of a Performance Optimization Approach},
author = {Cor-Paul Bezemer and Andy Zaidman},
year = {2013},
date = {2013-01-01},
urldate = {2013-01-01},
number = {TUD-SERG-2013-015},
institution = {Delft University of Technology},
abstract = {Understanding the performance of a system is difficult because it is affected by every aspect of the design, code and execution environment. Performance maintenance tools can assist in getting a better understanding of the system by monitoring and analyzing performance data. In previous work, we have presented an approach which assists the performance expert in obtaining insight into and subsequently optimizing the performance of a deployed application. This approach is based on the classification results made by a single classifier. Following results from literature, we have extended this approach with the possibility of using a set (ensemble) of classifiers, in order to improve the classification results. While this ensemble is maintained with the goal of optimizing its accuracy, the completeness (or coverage) is neglected. In this paper, we present a method for improving both the coverage and accuracy of an ensemble. By doing so, we improve the diagnostic capabilities of our existing approach, i.e., the range of possible causes it is able to identify as the bottleneck of a performance issue. We present several metrics for measuring coverage and comparing two classifiers. We evaluate our approach on real performance data from a large industrial application. From our evaluation we get a strong indication that our approach is capable of improving the diagnostic capabilities of an ensemble, while maintaining at least the same degree of accuracy.},
keywords = {},
pubstate = {published},
tppubtype = {techreport}
}
Cor-Paul Bezemer; Andy Zaidman; Ad van Hoeven; Andre de Graaf; Maarten Wiertz; Remko Weijers
Locating Performance Improvement Opportunities in an Industrial Software-as-a-Service Application Inproceedings
International Conference on Software Maintenance (ICSM), pp. 547-556, 2012.
@inproceedings{Bezemer12,
title = {Locating Performance Improvement Opportunities in an Industrial Software-as-a-Service Application},
author = {Cor-Paul Bezemer and Andy Zaidman and Ad van Hoeven and Andre de Graaf and Maarten Wiertz and Remko Weijers},
year = {2012},
date = {2012-09-23},
urldate = {2012-09-23},
booktitle = {International Conference on Software Maintenance (ICSM)},
pages = {547-556},
abstract = {The goal of performance maintenance is to improve the performance of a software system after delivery. As the performance of a system is often characterized by unexpected combinations of metric values, manual analysis of performance is hard in complex systems. In this paper, we extend our previous work on performance anomaly detection with a technique that helps performance experts locate spots — so-called performance improvement opportunities (PIOs) —, for possible performance improvements. PIOs give performance experts a starting point for performance improvements, e.g., by pinpointing the bottleneck component. The technique uses a combination of association rules and several visualizations, such as heat maps, which were implemented in an open source tool called WEDJAT.
In this paper, we evaluate our technique and WEDJAT in a field user study with three performance experts from industry using data from a large-scale industrial application. From our field study we conclude that our technique is useful for speeding up the performance maintenance process and that heat maps are a valuable way of visualizing performance data.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Cor-Paul Bezemer; Andy Zaidman
Server Overload Detection and Prediction Using Pattern Classification Inproceedings
International Conference on Autonomic Computing (ICAC), pp. 163-164, 2011.
BibTeX | Tags: Performance
@inproceedings{Bezemer2011,
title = {Server Overload Detection and Prediction Using Pattern Classification},
author = {Cor-Paul Bezemer and Andy Zaidman},
year = {2011},
date = {2011-06-14},
urldate = {2011-06-14},
booktitle = {International Conference on Autonomic Computing (ICAC)},
pages = {163-164},
keywords = {Performance},
pubstate = {published},
tppubtype = {inproceedings}
}
Cor-Paul Bezemer; Andy Zaidman
Multi-tenant SaaS applications: maintenance dream or nightmare? Inproceedings
Joint ERCIM Workshop on Software Evolution (EVOL) and International Workshop on Principles of Software Evolution (IWPSE), pp. 88–92, ACM, 2010, ISBN: 978-1-4503-0128-2.
@inproceedings{Bezemer10iwpse,
title = {Multi-tenant SaaS applications: maintenance dream or nightmare?},
author = {Cor-Paul Bezemer and Andy Zaidman},
isbn = {978-1-4503-0128-2},
year = {2010},
date = {2010-09-20},
urldate = {2010-09-20},
booktitle = {Joint ERCIM Workshop on Software Evolution (EVOL) and International Workshop on Principles of Software Evolution (IWPSE)},
pages = {88--92},
publisher = {ACM},
abstract = {Multi-tenancy is a relatively new software architecture principle in the realm of the Software as a Service (SaaS) business model. It allows to make full use of the economy of scale, as multiple customers – “tenants” – share the same application and database instance. All the while, the tenants enjoy a highly configurable application, making it appear that the application is deployed on a dedicated server. The major benefits of multi-tenancy are increased utilization of hardware resources and improved ease of maintenance, in particular on the deployment side. These benefits should result in lower overall application costs, making the technology attractive for service providers targeting small and medium enterprises (SME). However, as this paper advocates, a wrong architectural choice might entail that multi-tenancy becomes a maintenance nightmare.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Cor-Paul Bezemer; Andy Zaidman; Bart Platzbeecker; Toine Hurkmans; Aad 't Hart
Enabling Multi-tenancy: An Industrial Experience Report Inproceedings
International Conference on Software Maintenance (ICSM), pp. 1-8, 2010.
@inproceedings{Bezemer10ICSM,
title = {Enabling Multi-tenancy: An Industrial Experience Report},
author = {Cor-Paul Bezemer and Andy Zaidman and Bart Platzbeecker and Toine Hurkmans and Aad 't Hart},
year = {2010},
date = {2010-09-12},
urldate = {2010-09-12},
booktitle = {International Conference on Software Maintenance (ICSM)},
pages = {1-8},
abstract = {Multi-tenancy is a relatively new software architecture principle in the realm of the Software as a Service (SaaS) business model. It allows to make full use of the economy of scale, as multiple customers – “tenants” – share the same application and database instance. All the while, the tenants enjoy a highly configurable application, making it appear that the application is deployed on a dedicated server. The major benefits of multi-tenancy are increased utilization of hardware resources and improved ease of maintenance, resulting in lower overall application costs, making the technology attractive for service providers targeting small and medium enterprises (SME). Therefore, migrating existing single-tenant to multi-tenant applications can be interesting for SaaS software companies. In this paper we report on our experiences with reengineering an existing industrial, single-tenant software system into a multi-tenant one using a lightweight reengineering approach.},
keywords = {},
pubstate = {published},
tppubtype = {inproceedings}
}
Cor-Paul Bezemer; Ali Mesbah; Arie van Deursen
Automated Security Testing of Web Widget Interactions Inproceedings
European Software Engineering Conference/ACM SIGSOFT International Symposium on the Foundations of Software Engineering (ESEC/FSE), pp. 81-90, 2009.
Abstract | BibTeX | Tags: Security testing, Web applications
@inproceedings{cp_fse,
title = {Automated Security Testing of Web Widget Interactions},
author = {Cor-Paul Bezemer and Ali Mesbah and Arie van Deursen},
year = {2009},
date = {2009-08-24},
urldate = {2009-08-24},
booktitle = {European Software Engineering Conference/ACM SIGSOFT International Symposium on the Foundations of Software Engineering (ESEC/FSE)},
pages = {81-90},
abstract = {We present a technique for automatically detecting security vulnerabilities in client-side self-contained components, called web widgets, that can co-exist independently on a single web page. In this paper we focus on two security scenarios, namely the case in which (1) a malicious widget changes the content (DOM) of another widget, and (2) a widget steals data from another widget and sends it to the server via an HTTP request. We propose a dynamic analysis approach for automatically executing the web application and analyzing the runtime changes in the user interface, as well as the outgoing HTTP calls, to detect inter-widget interaction violations.
Our approach, implemented in a number of open source Atusa plugins, called Diva, requires no modification of application code, and has few false positives. We discuss the results of an empirical evaluation of the violation revealing capabilities, performance, and scalability of our approach, by means of two case studies, on the Exact Widget Framework and Pageflakes, a commercial, widely used widget framework.},
keywords = {Security testing, Web applications},
pubstate = {published},
tppubtype = {inproceedings}
}
Cor-Paul Bezemer
Automated Security Testing of AJAX Web Widgets Masters Thesis
Delft University of Technology, 2009.
@mastersthesis{msc_cp,
title = {Automated Security Testing of AJAX Web Widgets},
author = {Cor-Paul Bezemer},
year = {2009},
date = {2009-03-27},
urldate = {2009-03-27},
school = {Delft University of Technology},
abstract = {Over the years AJAX, a technique for improving the responsiveness of web applications, has become increasingly popular. One of the results of AJAX is the development of a new type of web application component called web widget. Widgets are mini-applications which are placed next to each other on a web page. This has consequences for their security. In this report two security threats are explained. The first threat discussed is the case in which a widget changes the DOM of another widget. The second threat discussed is the case in which a widget steals data from another widget. We propose a dynamic approach for automatically detecting these issues. Our approach uses ATUSA, a testing framework capable of crawling AJAX applications, for which we have developed two security testing plugins. In this report we also evaluate our approach using three case studies. The first case study is conducted on test widgets, which we created for a simplified widget framework. The second case study is conducted on the Exact Widget Framework, a widget framework which is being prototyped by the Research and Innovation team of Exact Software. The final case study is performed on Pageflakes, an industrial, widely used widget framework. The results of these case studies show that our approach has high violation-detection capabilities with a low false positive detection rate.},
keywords = {},
pubstate = {published},
tppubtype = {mastersthesis}
}