Publications – Analytics of Software, GAmes And Repository Data (ASGAARD) Lab

( = Paper PDF, = Presentation slides, = Presentation video)

Show all

Md. Ahasanuzzaman; Safwat Hassan; Cor-Paul Bezemer; Ahmed E. Hassan

A Longitudinal Study of Popular Ad Libraries in the Google Play Store Journal Article

Empirical Software Engineering Journal (EMSE), 2019.

Files:

Abstract | BibTeX | Tags: Ad library, Android mobile apps, Google Play Store, Longitudinal study, Software engineering

@article{Ahsan2019adlibraries,

title = {A Longitudinal Study of Popular Ad Libraries in the Google Play Store},

author = {Md. Ahasanuzzaman and Safwat Hassan and Cor-Paul Bezemer and Ahmed E. Hassan},

year = {2019},

date = {2019-08-08},

urldate = {2019-08-08},

journal = {Empirical Software Engineering Journal (EMSE)},

abstract = {In-app advertisements have become an integral part of the revenue

model of mobile apps. To gain ad revenue, app developers integrate ad libraries into their apps. Such libraries are integrated to serve advertisements (ads) to users; developers then gain revenue based on the displayed ads and the usersâ€™ interactions with such ads. As a result, ad libraries have become an essential part of the mobile app ecosystem. However, little is known about how such ad libraries have evolved over time.

In this paper, we study the evolution of the 8 most popular ad libraries

(e.g., Google AdMob and Facebook Audience Network) over a period of 33 months (from April 2016 until December 2018). In particular, we look at their evolution in terms of size, the main drivers for releasing a new version, and their architecture. To identify popular ad libraries, we collect 35,462 updates of 1,840 top free-to-download apps in the Google Play Store. Then, we identify 63 ad libraries that are integrated into the studied popular apps. We observe that an ad library represents 10% of the binary size of mobile apps, and that the proportion of the ad library size compared to the app size has grown by 10% over our study period. By taking a closer look at the 8 most popular ad libraries, we find that ad libraries are continuously evolving with a median release interval of 34 days. In addition, we observe that some libraries have grown exponentially in size (e.g, Facebook Audience Network), while other libraries have attempted to reduce their size as they evolved. The libraries that reduced their size have done so through: (1) creating a lighter version of the ad library, (2) removing parts of the ad library, and (3) redesigning their architecture into a more modular one.

To identify the main drivers for releasing a new version, we manually analyze the release notes of the eight studied ad libraries. We observe that fixing issues that are related to displaying video ads is the main driver for releasing new versions. We also observe that ad library developers are constantly updating their libraries to support a wider range of Android platforms (i.e., to ensure that more devices can use the libraries without errors). Finally, we derive a reference architecture from the studied eight ad libraries, and we study how these libraries deviated from this architecture in the study period.

Our study is important for ad library developers as it provides the first in-depth look into how the important mobile app market segment of ad libraries has evolved. Our findings and the reference architecture are valuable for ad library developers who wish to learn about how other developers built and evolved their successful ad libraries. For example, our reference architecture provides a new ad library developer with a foundation for understanding the interactions between the most important components of an ad library.},

keywords = {Ad library, Android mobile apps, Google Play Store, Longitudinal study, Software engineering},

pubstate = {published},

tppubtype = {article}

}

In-app advertisements have become an integral part of the revenue
model of mobile apps. To gain ad revenue, app developers integrate ad libraries into their apps. Such libraries are integrated to serve advertisements (ads) to users; developers then gain revenue based on the displayed ads and the usersâ€™ interactions with such ads. As a result, ad libraries have become an essential part of the mobile app ecosystem. However, little is known about how such ad libraries have evolved over time.
In this paper, we study the evolution of the 8 most popular ad libraries
(e.g., Google AdMob and Facebook Audience Network) over a period of 33 months (from April 2016 until December 2018). In particular, we look at their evolution in terms of size, the main drivers for releasing a new version, and their architecture. To identify popular ad libraries, we collect 35,462 updates of 1,840 top free-to-download apps in the Google Play Store. Then, we identify 63 ad libraries that are integrated into the studied popular apps. We observe that an ad library represents 10% of the binary size of mobile apps, and that the proportion of the ad library size compared to the app size has grown by 10% over our study period. By taking a closer look at the 8 most popular ad libraries, we find that ad libraries are continuously evolving with a median release interval of 34 days. In addition, we observe that some libraries have grown exponentially in size (e.g, Facebook Audience Network), while other libraries have attempted to reduce their size as they evolved. The libraries that reduced their size have done so through: (1) creating a lighter version of the ad library, (2) removing parts of the ad library, and (3) redesigning their architecture into a more modular one.
To identify the main drivers for releasing a new version, we manually analyze the release notes of the eight studied ad libraries. We observe that fixing issues that are related to displaying video ads is the main driver for releasing new versions. We also observe that ad library developers are constantly updating their libraries to support a wider range of Android platforms (i.e., to ensure that more devices can use the libraries without errors). Finally, we derive a reference architecture from the studied eight ad libraries, and we study how these libraries deviated from this architecture in the study period.
Our study is important for ad library developers as it provides the first in-depth look into how the important mobile app market segment of ad libraries has evolved. Our findings and the reference architecture are valuable for ad library developers who wish to learn about how other developers built and evolved their successful ad libraries. For example, our reference architecture provides a new ad library developer with a foundation for understanding the interactions between the most important components of an ad library.

Mohamed Sami Rakha; Cor-Paul Bezemer; Ahmed E. Hassan

Revisiting the Performance Evaluation of Automated Approaches for the Identification of Duplicate Issue Reports Journal Article

The Transactions of Software Engineering (TSE) journal, 44 (12), pp. 1245–1268, 2017.

Files:

Abstract | BibTeX | Tags: Performance evaluation, Software engineering, Text analysis

@article{sami16tse,

title = {Revisiting the Performance Evaluation of Automated Approaches for the Identification of Duplicate Issue Reports},

author = {Mohamed Sami Rakha and Cor-Paul Bezemer and Ahmed E. Hassan},

year  = {2017},

date = {2017-09-21},

urldate = {2017-09-21},

journal = {The Transactions of Software Engineering (TSE) journal},

volume = {44},

number = {12},

pages = {1245--1268},

publisher = {IEEE},

abstract = {Issue tracking systems (ITSs), such as Bugzilla, are commonly used to track reported bugs, improvements and change requests for a software project. To avoid wasting developer resources on previously-reported (i.e., duplicate) issues, it is necessary to identify such duplicates as soon as they are reported. Several automated approaches have been proposed for retrieving duplicate reports, i.e., identifying the duplicate of a new issue report in a list of n candidates. These approaches rely on leveraging the textual, categorical, and contextual information in previously-reported issues to decide whether a newly-reported issue has previously been reported. In general, these approaches are evaluated using data that spans a relatively short period of time (i.e., the classical evaluation). However, in this paper, we show that the classical evaluation tends to overestimate the performance of automated approaches for retrieving duplicate issue reports. Instead, we propose a realistic evaluation using all the reports that are available in the ITS of a software project. We conduct experiments in which we evaluate two popular approaches for retrieving duplicate issues (BM25F and REP) using the classical and realistic evaluations. We find that for the issue tracking data of the Mozilla foundation, the Eclipse foundation and OpenOffice, the realistic evaluation shows that previously proposed approaches perform considerably lower than previously reported using the classical evaluation. As a result, we conclude that the reported performance of approaches for retrieving duplicate issue reports is significantly overestimated in literature. In order to improve the performance of the automated retrieval of duplicate issue reports, we propose to leverage the resolution field of issue reports. Our experiments show that a relative improvement in the performance of a median of 7-21.5% and a maximum of 19-60% can be achieved by leveraging the resolution field of issue reports for the automated retrieval of duplicates.},

keywords = {Performance evaluation, Software engineering, Text analysis},

pubstate = {published},

tppubtype = {article}

}

Issue tracking systems (ITSs), such as Bugzilla, are commonly used to track reported bugs, improvements and change requests for a software project. To avoid wasting developer resources on previously-reported (i.e., duplicate) issues, it is necessary to identify such duplicates as soon as they are reported. Several automated approaches have been proposed for retrieving duplicate reports, i.e., identifying the duplicate of a new issue report in a list of n candidates. These approaches rely on leveraging the textual, categorical, and contextual information in previously-reported issues to decide whether a newly-reported issue has previously been reported. In general, these approaches are evaluated using data that spans a relatively short period of time (i.e., the classical evaluation). However, in this paper, we show that the classical evaluation tends to overestimate the performance of automated approaches for retrieving duplicate issue reports. Instead, we propose a realistic evaluation using all the reports that are available in the ITS of a software project. We conduct experiments in which we evaluate two popular approaches for retrieving duplicate issues (BM25F and REP) using the classical and realistic evaluations. We find that for the issue tracking data of the Mozilla foundation, the Eclipse foundation and OpenOffice, the realistic evaluation shows that previously proposed approaches perform considerably lower than previously reported using the classical evaluation. As a result, we conclude that the reported performance of approaches for retrieving duplicate issue reports is significantly overestimated in literature. In order to improve the performance of the automated retrieval of duplicate issue reports, we propose to leverage the resolution field of issue reports. Our experiments show that a relative improvement in the performance of a median of 7-21.5% and a maximum of 19-60% can be achieved by leveraging the resolution field of issue reports for the automated retrieval of duplicates.