1.
Hao Li; Cor-Paul Bezemer
Bridging the language gap: an empirical study of bindings for open source machine learning libraries across software package ecosystems Journal Article
Empirical Software Engineering, 30 (6), 2024.
Tags: Library bindings, Machine learning, SE4AI, SE4ML
@article{li_MLbindings,
title = {Bridging the language gap: an empirical study of bindings for open source machine learning libraries across software package ecosystems},
author = {Hao Li and Cor-Paul Bezemer},
year = {2024},
date = {2024-10-18},
urldate = {2024-10-18},
journal = {Empirical Software Engineering},
volume = {30},
number = {6},
abstract = {Open source machine learning (ML) libraries enable developers to integrate advanced ML functionality into their own applications. However, popular ML libraries, such as TensorFlow, are not available natively in all programming languages and software package ecosystems. Hence, developers who wish to use an ML library which is not available in their programming language or ecosystem of choice may need to resort to using a so-called binding library (or binding). Bindings provide support across programming languages and package ecosystems for reusing a host library. For example, the Keras .NET binding provides support for the Keras library in the NuGet (.NET) ecosystem even though the Keras library was written in Python. In this paper, we collect 2,436 cross-ecosystem bindings for 546 ML libraries across 13 software package ecosystems by using an approach called BindFind, which can automatically identify bindings and link them to their host libraries. Furthermore, we conduct an in-depth study of 133 cross-ecosystem bindings and their development for 40 popular open source ML libraries. Our findings reveal that the majority of ML library bindings are maintained by the community, with npm being the most popular ecosystem for these bindings. Our study also indicates that most bindings cover only a limited range of the host library’s releases, often experience considerable delays in supporting new releases, and have widespread technical lag. Our findings highlight key factors to consider for developers integrating bindings for ML libraries and open avenues for researchers to further investigate bindings in software package ecosystems.},
keywords = {Library bindings, Machine learning, SE4AI, SE4ML},
pubstate = {published},
tppubtype = {article}
}
Open source machine learning (ML) libraries enable developers to integrate advanced ML functionality into their own applications. However, popular ML libraries, such as TensorFlow, are not available natively in all programming languages and software package ecosystems. Hence, developers who wish to use an ML library which is not available in their programming language or ecosystem of choice may need to resort to using a so-called binding library (or binding). Bindings provide support across programming languages and package ecosystems for reusing a host library. For example, the Keras .NET binding provides support for the Keras library in the NuGet (.NET) ecosystem even though the Keras library was written in Python. In this paper, we collect 2,436 cross-ecosystem bindings for 546 ML libraries across 13 software package ecosystems by using an approach called BindFind, which can automatically identify bindings and link them to their host libraries. Furthermore, we conduct an in-depth study of 133 cross-ecosystem bindings and their development for 40 popular open source ML libraries. Our findings reveal that the majority of ML library bindings are maintained by the community, with npm being the most popular ecosystem for these bindings. Our study also indicates that most bindings cover only a limited range of the host library’s releases, often experience considerable delays in supporting new releases, and have widespread technical lag. Our findings highlight key factors to consider for developers integrating bindings for ML libraries and open avenues for researchers to further investigate bindings in software package ecosystems.
2.
Hao Li; Gopi Krishnan Rajbahadur; Cor-Paul Bezemer
Studying the Impact of TensorFlow and PyTorch Bindings on Machine Learning Software Quality Journal Article
ACM Transactions on Software Engineering and Methodology, 2024.
Tags: Library bindings, Machine learning, SE4AI, SE4ML, Software quality
@article{Li_BindingsQuality,
title = {Studying the Impact of TensorFlow and PyTorch Bindings on Machine Learning Software Quality},
author = {Hao Li and Gopi Krishnan Rajbahadur and Cor-Paul Bezemer},
year = {2024},
date = {2024-07-07},
journal = {ACM Transactions on Software Engineering and Methodology},
abstract = {Bindings for machine learning frameworks (such as TensorFlow and PyTorch) allow developers to integrate a framework’s functionality using a programming language different from the framework’s default language (usually Python). In this paper, we study the impact of using TensorFlow and PyTorch bindings in C#, Rust, Python and JavaScript on the software quality in terms of correctness (training and test accuracy) and time cost (training and inference time) when training and performing inference on five widely used deep learning models. Our experiments show that a model can be trained in one binding and used for inference in another binding for the same framework without losing accuracy. Our study is the first to show that using a non-default binding can help improve machine learning software quality from the time cost perspective compared to the default Python binding while still achieving the same level of correctness.},
keywords = {Library bindings, Machine learning, SE4AI, SE4ML, Software quality},
pubstate = {published},
tppubtype = {article}
}
Bindings for machine learning frameworks (such as TensorFlow and PyTorch) allow developers to integrate a framework’s functionality using a programming language different from the framework’s default language (usually Python). In this paper, we study the impact of using TensorFlow and PyTorch bindings in C#, Rust, Python and JavaScript on the software quality in terms of correctness (training and test accuracy) and time cost (training and inference time) when training and performing inference on five widely used deep learning models. Our experiments show that a model can be trained in one binding and used for inference in another binding for the same framework without losing accuracy. Our study is the first to show that using a non-default binding can help improve machine learning software quality from the time cost perspective compared to the default Python binding while still achieving the same level of correctness.