Md Saeed Siddik; Cor-Paul Bezemer
Do Code Quality and Style Issues Differ Across (Non-)Machine Learning Notebooks? Yes! Inproceedings
23rd IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM), pp. 1–12, IEEE, 2023.
Abstract | BibTeX | Tags: Computational notebooks, Empirical software engineering, Mining software repositories
@inproceedings{SiddikSCAM2023,
title = {Do Code Quality and Style Issues Differ Across (Non-)Machine Learning Notebooks? Yes!},
author = {Md Saeed Siddik and Cor-Paul Bezemer},
year = {2023},
date = {2023-10-03},
urldate = {2023-10-03},
booktitle = {23rd IEEE International Working Conference on Source Code Analysis and Manipulation (SCAM)},
pages = {1--12},
publisher = {IEEE},
abstract = {The popularity of computational notebooks is rapidly increasing because of their interactive code-output visualization and on-demand non-sequential code block execution. These notebook features have made notebooks especially popular with machine learning developers and data scientists. However, as prior work shows, notebooks generally contain low quality code. In this paper, we investigate whether the low quality code is inherent to the programming style in notebooks, or whether it is correlated with the use of machine learning techniques. We present a large-scale empirical analysis of 246,599 open-source notebooks to explore how machine learning code quality in Jupyter Notebooks differs from non-machine learning code, thereby focusing on code style issues. We explored code style issues across the Error, Convention, Warning, and Refactoring categories. We found that machine learning notebooks are of lower quality regarding PEP-8 code standards than non-machine learning notebooks, and their code quality distributions significantly differ with a small effect size. We identified several code style issues with large differences in occurrences between machine learning and non-machine learning notebooks. For example, package and import-related issues are more prevalent in machine learning notebooks. Our study shows that code quality and code style issues differ significantly across machine learning and non-machine learning notebooks.},
keywords = {Computational notebooks, Empirical software engineering, Mining software repositories},
pubstate = {published},
tppubtype = {inproceedings}
}
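The abstract's Error, Convention, Warning, and Refactoring categories correspond to Pylint's message types, so a minimal sketch of how notebook code could be style-checked might look as follows. This is an illustration only: it assumes Pylint and nbformat are installed and that the paper's actual pipeline and tool configuration may differ.

import json
import subprocess
import tempfile
from collections import Counter

import nbformat


def lint_notebook(notebook_path):
    """Count Pylint message types for the code cells of one notebook."""
    nb = nbformat.read(notebook_path, as_version=4)
    # Concatenate code cells; a real pipeline would also strip IPython magics
    # (lines starting with % or !), which plain Pylint cannot parse.
    code = "\n\n".join(c.source for c in nb.cells if c.cell_type == "code")
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as tmp:
        tmp.write(code)
        script = tmp.name
    # Pylint exits non-zero when it reports issues, so the return code is ignored.
    result = subprocess.run(
        ["pylint", "--output-format=json", script],
        capture_output=True,
        text=True,
    )
    messages = json.loads(result.stdout or "[]")
    # "type" is one of: convention, refactor, warning, error, fatal, information.
    return Counter(m["type"] for m in messages)


if __name__ == "__main__":
    print(lint_notebook("example.ipynb"))

Aggregating such per-notebook counts over the machine learning and non-machine learning notebook sets would yield the kind of code quality distributions the paper compares.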
Quang N. Vu; Cor-Paul Bezemer
An Empirical Study of the Characteristics of Popular Game Jams and Their High-ranking Submissions on itch.io Inproceedings
International Conference on the Foundations of Digital Games (FDG), pp. 1–12, 2020.
Abstract | BibTeX | Tags: Empirical software engineering, Game development, Game jams, itch.io, Mining software repositories
@inproceedings{Quang20,
title = {An Empirical Study of the Characteristics of Popular Game Jams and Their High-ranking Submissions on itch.io},
author = {Quang N. Vu and Cor-Paul Bezemer},
year = {2020},
date = {2020-04-14},
urldate = {2020-04-14},
booktitle = {International Conference on the Foundations of Digital Games (FDG)},
pages = {1--12},
abstract = {Game jams are hackathon-like events that allow participants to develop a playable game prototype within a time limit. They foster creativity and the exchange of ideas by letting developers with different skill sets collaborate. Having a high-ranking game is a great bonus to a beginning game developer’s résumé and their pursuit of a career in the game industry. However, participants often face time constraints set by jam hosts while balancing which aspects of their games should be emphasized to have the highest chance of winning. Similarly, hosts need to understand what to emphasize when organizing online jams so that their jams are more popular, in terms of submission rate. In this paper, we study 1,290 past game jams and their 3,752 submissions on itch.io to better understand what makes jams popular and games high-ranking and well perceived by the audience. We find that a quality description has a positive contribution to both a jam’s popularity and a game’s ranking. Additionally, more manpower organizing a jam or developing a game increases a jam’s popularity and a game’s high-ranking likelihood. High-ranking games tend to support Windows or macOS, and belong to the “Puzzle”, “Platformer”, “Interactive Fiction”, or “Action” genres. Also, shorter competitive jams tend to be more popular. Based on our findings, we suggest jam hosts and participants improve the description of their products and consider co-organizing or co-participating in a jam. Furthermore, jam participants should develop multi-platform multi-genre games. Finally, jam hosts should introduce a tighter time limit to increase their jam’s popularity.},
keywords = {Empirical software engineering, Game development, Game jams, itch.io, Mining software repositories},
pubstate = {published},
tppubtype = {inproceedings}
}
Philipp Leitner; Cor-Paul Bezemer
An Exploratory Study of the State of Practice of Performance Testing in Java-based Open Source Projects Inproceedings
The International Conference on Performance Engineering (ICPE), pp. 373–384, ACM/SPEC, 2017.
Abstract | BibTeX | Tags: Empirical software engineering, Mining software repositories, Open source, Performance engineering, Performance testing
@inproceedings{leitner16oss,
title = {An Exploratory Study of the State of Practice of Performance Testing in Java-based Open Source Projects},
author = {Philipp Leitner and Cor-Paul Bezemer},
year = {2017},
date = {2017-04-22},
urldate = {2017-04-22},
booktitle = {The International Conference on Performance Engineering (ICPE)},
pages = {373--384},
publisher = {ACM/SPEC},
abstract = {The usage of open source (OS) software is nowadays widespread across many industries and domains. While the functional quality of OS projects is considered to be up to par with that of closed-source software, much is unknown about the quality in terms of non-functional attributes, such as performance. One challenge for OS developers is that, unlike for functional testing, there is a lack of accepted best practices for performance testing.
To reveal the state of practice of performance testing in OS projects, we conduct an exploratory study on 111 Java-based OS projects from GitHub. We study the performance tests of these projects from five perspectives: (1) the developers, (2) size, (3) organization, (4) types of performance tests, and (5) the tooling used for performance testing.
First, in a quantitative study we show that writing performance tests is not a popular task in OS projects: performance tests form only a small portion of the test suite, are rarely updated, and are usually maintained by a small group of core project developers. Second, we show through a qualitative study that even though many projects are aware that they need performance tests, developers appear to struggle implementing them. We argue that future performance testing frameworks should provide better support for low-friction testing, for instance via non-parameterized methods or performance test generation, as well as focus on a tight integration with standard continuous integration tooling.},
keywords = {Empirical software engineering, Mining software repositories, Open source, Performance engineering, Performance testing},
pubstate = {published},
tppubtype = {inproceedings}
}
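The abstract does not spell out how the performance tests of the 111 projects were identified, but as a rough illustration, one assumed heuristic for locating likely performance-test files in a cloned Java project is to scan for JMH's @Benchmark annotation or benchmark-like file names. This sketch is not claimed to be the paper's actual identification procedure.

from pathlib import Path


def find_performance_test_files(repo_root):
    """Return Java files that look like performance tests (illustrative heuristic)."""
    hits = []
    for java_file in Path(repo_root).rglob("*.java"):
        text = java_file.read_text(encoding="utf-8", errors="ignore")
        name = java_file.name.lower()
        # JMH microbenchmarks use the @Benchmark annotation; benchmark-like
        # file names are a weaker, second signal.
        if "@Benchmark" in text or "benchmark" in name or "perf" in name:
            hits.append(java_file)
    return hits


if __name__ == "__main__":
    for path in find_performance_test_files("path/to/cloned/project"):
        print(path)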