Hao Li; Cor-Paul Bezemer; Ahmed E. Hassan
Software Engineering and Foundation Models: Insights from Industry Blogs Using a Jury of Foundation Models Inproceedings
International Conference on Software Engineering - Software Engineering in Practice (ICSE - SEIP) Track, 2025.
Abstract | BibTeX | Tags: FM4SE, Foundation models, SE4AI, SE4FM, SE4ML
@inproceedings{Li_SEFM_blogs,
title = {Software Engineering and Foundation Models: Insights from Industry Blogs Using a Jury of Foundation Models},
author = {Hao Li and Cor-Paul Bezemer and Ahmed E. Hassan},
year = {2025},
date = {2025-04-27},
booktitle = {International Conference on Software Engineering - Software Engineering in Practice (ICSE - SEIP) Track},
abstract = {Foundation models (FMs) such as large language
models (LLMs) have significantly impacted many fields, including
software engineering (SE). The interaction between SE and FMs
has led to the integration of FMs into SE practices (FM4SE)
and the application of SE methodologies to FMs (SE4FM). While
several literature surveys exist on academic contributions to these
trends, we are the first to provide a practitioner’s view. We
analyze 155 FM4SE and 997 SE4FM blog posts from leading
technology companies, leveraging an FM-powered surveying
approach to systematically label and summarize the discussed
activities and tasks. We observed that while code generation is the
most prominent FM4SE task, FMs are leveraged for many other
SE activities such as code understanding, summarization, and
API recommendation. The majority of blog posts on SE4FM are
about model deployment & operation, and system architecture
& orchestration. Although the emphasis is on cloud deployments,
there is a growing interest in compressing FMs and deploying
them on smaller devices such as edge or mobile devices. We
outline eight future research directions inspired by our gained
insights, aiming to bridge the gap between academic findings
and real-world applications. Our study not only enriches the
body of knowledge on practical applications of FM4SE and
SE4FM but also demonstrates the utility of FMs as a powerful
and efficient approach in conducting literature surveys within
technical and grey literature domains. Our dataset, results, code
and used prompts can be found in our online replication package
at https://zenodo.org/records/14563992.},
keywords = {FM4SE, Foundation models, SE4AI, SE4FM, SE4ML},
pubstate = {published},
tppubtype = {inproceedings}
}
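The FM-powered surveying approach in this paper labels blog posts with a jury of foundation models. As a rough sketch of the jury idea only (the model names, the prompt wording, and the query callback below are hypothetical placeholders, not the authors' pipeline, which is documented in the replication package), majority voting over several models could look like this:

from collections import Counter
from typing import Callable

# Hypothetical jury members; the actual models and prompts used in the paper
# are not reproduced here.
JURY = ["model-a", "model-b", "model-c"]

def label_blog_post(text: str, query: Callable[[str, str], str]) -> str:
    """Ask every jury model for a label and return the majority vote.

    `query(model, prompt)` is assumed to call some concrete FM API and return
    the model's raw text answer.
    """
    prompt = (
        "Classify this blog post as FM4SE (foundation models applied to software "
        "engineering) or SE4FM (software engineering applied to foundation models). "
        "Answer with exactly one label.\n\n" + text
    )
    votes = [query(model, prompt).strip() for model in JURY]
    label, _ = Counter(votes).most_common(1)[0]
    return label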
Tajkia Rahman Toma; Balreet Grewal; Cor-Paul Bezemer
Answering User Questions about Machine Learning Models through Standardized Model Cards Inproceedings
International Conference on Software Engineering (ICSE), 2025.
Abstract | BibTeX | Tags: Hugging Face, Q&A communities, Q&A websites, SE4AI, SE4FM, SE4ML
@inproceedings{Toma_UserQuestions,
title = {Answering User Questions about Machine Learning Models through Standardized Model Cards},
author = {Tajkia Rahman Toma and Balreet Grewal and Cor-Paul Bezemer},
year = {2025},
date = {2025-04-27},
booktitle = {International Conference on Software Engineering (ICSE)},
abstract = {Reusing pre-trained machine learning models is
becoming very popular due to model hubs such as Hugging Face
(HF). However, similar to when reusing software, many issues
may arise when reusing an ML model. In many cases, users
resort to asking questions on discussion forums such as the HF
community forum. In this paper, we study how we can reduce the
community’s workload in answering these questions and increase
the likelihood that questions receive a quick answer. We analyze
11,278 discussions from the HF model community that contain
user questions about ML models. We focus on the effort spent
handling questions, the high-level topics of discussions, and the
potential for standardizing responses in model cards based on
a model card template. Our findings indicate that there is not
much effort involved in responding to user questions, however,
40.1% of the questions remain open without any response. A
topic analysis shows that discussions are more centered around
technical details on model development and troubleshooting,
indicating that more input from model providers is required. We
show that 42.5% of the questions could have been answered if the
model provider followed a standard model card template for the
model card. Based on our analysis, we recommend that model
providers add more development-related details on the model’s
architecture, algorithm, data preprocessing and training code in
existing documentation (sub)sections and add new (sub)sections
to the template to address common questions about model usage
and hardware requirements.},
keywords = {Hugging Face, Q&A communities, Q&A websites, SE4AI, SE4FM, SE4ML},
pubstate = {published},
tppubtype = {inproceedings}
}
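The paper's recommendation maps naturally onto a fuller, standardized model card. As an illustrative sketch only (the section names below are inferred from the question topics named in the abstract; the template used in the study may differ), a model provider could generate a card covering development details, usage, and hardware requirements:

from pathlib import Path

# Illustrative model card skeleton; the section list is an assumption based on
# the question topics in the abstract, not the paper's actual template.
MODEL_CARD = """\
# Model Card: example-model

## Model Details
- Architecture: describe the network architecture here.
- Training algorithm: describe the training procedure and hyperparameters.

## Training Data and Preprocessing
Document the datasets used and how they were preprocessed.

## Training Code
Link to the training scripts or notebooks.

## How to Use
Show a minimal loading and inference snippet.

## Hardware Requirements
State the minimum GPU/CPU and memory needed for inference.
"""

Path("README.md").write_text(MODEL_CARD)  # model cards on Hugging Face live in README.md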
Hao Li; Cor-Paul Bezemer
Bridging the language gap: an empirical study of bindings for open source machine learning libraries across software package ecosystems Journal Article
Empirical Software Engineering, 30 (6), 2024.
Abstract | BibTeX | Tags: Library bindings, Machine learning, SE4AI, SE4ML
@article{li_MLbindings,
title = {Bridging the language gap: an empirical study of bindings for open source machine learning libraries across software package ecosystems},
author = {Hao Li and Cor-Paul Bezemer},
year = {2024},
date = {2024-10-18},
urldate = {2024-10-18},
journal = {Empirical Software Engineering},
volume = {30},
number = {6},
abstract = {Open source machine learning (ML) libraries enable developers to
integrate advanced ML functionality into their own applications. However,
popular ML libraries, such as TensorFlow, are not available natively in all
programming languages and software package ecosystems. Hence, developers
who wish to use an ML library which is not available in their programming
language or ecosystem of choice, may need to resort to using a so-called
binding library (or binding). Bindings provide support across programming
languages and package ecosystems for reusing a host library. For example,
the Keras.NET binding provides support for the Keras library in the NuGet
(.NET) ecosystem even though the Keras library was written in Python. In
this paper, we collect 2,436 cross-ecosystem bindings for 546 ML libraries
across 13 software package ecosystems by using an approach called BindFind,
which can automatically identify bindings and link them to their host
libraries. Furthermore, we conduct an in-depth study of 133 cross-ecosystem
bindings and their development for 40 popular open source ML libraries. Our
findings reveal that the majority of ML library bindings are maintained by
the community, with npm being the most popular ecosystem for these bindings.
Our study also indicates that most bindings cover only a limited range of
the host library’s releases, often experience considerable delays in
supporting new releases, and have widespread technical lag. Our findings
highlight key factors to consider for developers integrating bindings for
ML libraries and open avenues for researchers to further investigate
bindings in software package ecosystems.},
keywords = {Library bindings, Machine learning, SE4AI, SE4ML},
pubstate = {published},
tppubtype = {article}
}
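The release coverage and technical lag findings can be made concrete with a small sketch. Assuming, purely for illustration, that a binding mirrors the host library's version numbers (real bindings often do not) and using the public PyPI and npm registry endpoints, one could compare the versions published for a host library and a hypothetical binding package; this is not the BindFind approach from the paper:

import json
from urllib.request import urlopen

def pypi_versions(package):
    """Versions published for a package on PyPI."""
    with urlopen(f"https://pypi.org/pypi/{package}/json") as resp:
        return set(json.load(resp)["releases"])

def npm_versions(package):
    """Versions published for a package on the npm registry."""
    with urlopen(f"https://registry.npmjs.org/{package}") as resp:
        return set(json.load(resp)["versions"])

host = pypi_versions("keras")                 # host library releases
binding = npm_versions("some-keras-binding")  # hypothetical binding package name
print(f"binding covers {len(host & binding)} of {len(host)} host releases")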
Hao Li; Gopi Krishnan Rajbahadur; Cor-Paul Bezemer
Studying the Impact of TensorFlow and PyTorch Bindings on Machine Learning Software Quality Journal Article
ACM Transactions on Software Engineering and Methodology, 2024.
Abstract | BibTeX | Tags: Library bindings, Machine learning, SE4AI, SE4ML, Software quality
@article{Li_BindingsQuality,
title = {Studying the Impact of TensorFlow and PyTorch Bindings on Machine Learning Software Quality},
author = {Hao Li and Gopi Krishnan Rajbahadur and Cor-Paul Bezemer},
year = {2024},
date = {2024-07-07},
journal = {ACM Transactions on Software Engineering and Methodology},
abstract = {Bindings for machine learning frameworks (such as TensorFlow and PyTorch) allow developers to integrate a framework’s functionality using a programming language different from the framework’s default language (usually Python). In this paper, we study the impact of using TensorFlow and PyTorch bindings in C#, Rust, Python and JavaScript on the software quality in terms of correctness (training and test accuracy) and time cost (training and inference time) when training and performing inference on five widely used deep learning models. Our experiments show that a model can be trained in one binding and used for inference in another binding for the same framework without losing accuracy. Our study is the first to show that using a non-default binding can help improve machine learning software quality from the time cost perspective compared to the default Python binding while still achieving the same level of correctness.},
keywords = {Library bindings, Machine learning, SE4AI, SE4ML, Software quality},
pubstate = {published},
tppubtype = {article}
}
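The cross-binding finding (train with one binding, run inference with another) rests on the frameworks' serialized model formats. A minimal sketch under that reading, and not the paper's experimental setup: train a toy model with PyTorch's default Python binding and export it as TorchScript, which LibTorch-based bindings in other languages can then load for inference.

import torch
import torch.nn as nn

# Toy model and data; the paper evaluates five widely used deep learning
# models, which are not reproduced here.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()
x = torch.randn(64, 4)
y = torch.randint(0, 2, (64,))

for _ in range(10):  # a few training steps with the default Python binding
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()

# Export a serialized TorchScript artifact; a non-default binding for the same
# framework (e.g., one built on LibTorch) can load "model.pt" for inference.
scripted = torch.jit.trace(model, torch.randn(1, 4))
scripted.save("model.pt")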
Tajkia Rahman Toma; Cor-Paul Bezemer
An Exploratory Study of Dataset and Model Management in Open Source Machine Learning Applications Inproceedings
3rd IEEE/ACM International Conference on AI Engineering - Software Engineering for AI (CAIN), pp. 1–11, 2024.
Abstract | BibTeX | Tags: Data maintenance, SE4ML
@inproceedings{TomaCAIN2024,
title = {An Exploratory Study of Dataset and Model Management in Open Source Machine Learning Applications},
author = {Tajkia Rahman Toma and Cor-Paul Bezemer},
year = {2024},
date = {2024-01-17},
urldate = {2024-01-17},
booktitle = {3rd IEEE/ACM International Conference on AI Engineering - Software Engineering for AI (CAIN)},
pages = {1--11},
abstract = {Datasets and models are two key artifacts in machine learning
(ML) applications. Although there exist tools to support dataset and model
developers in managing ML artifacts, little is known about how these
datasets and models are integrated into ML applications. In this paper, we
study how datasets and models in ML applications are managed. In
particular, we focus on how these artifacts are stored and versioned
alongside the applications. After analyzing 93 repositories, we identified
that the most common storage location to store datasets and models is the
file system, which causes availability issues. Notably, large data and
model files, exceeding approximately 60 MB, are stored exclusively in
remote storage and downloaded as needed. Most of the datasets and models
lack proper integration with the version control system, posing potential
traceability and reproducibility issues. Additionally, although datasets
and models are likely to evolve during the application development, they
are rarely updated in application repositories.},
keywords = {Data maintenance, SE4ML},
pubstate = {published},
tppubtype = {inproceedings}
}
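The "stored in remote storage and downloaded as needed" pattern the study observes amounts to lazily fetching large artifacts instead of committing them to the repository. A minimal sketch of that pattern, with a placeholder URL and path rather than anything taken from the studied repositories:

from pathlib import Path
from urllib.request import urlretrieve

MODEL_URL = "https://example.com/models/weights.bin"  # placeholder remote location
MODEL_PATH = Path("models/weights.bin")               # not tracked in version control

def ensure_model() -> Path:
    """Download the large model file on first use instead of storing it in the repo."""
    if not MODEL_PATH.exists():
        MODEL_PATH.parent.mkdir(parents=True, exist_ok=True)
        urlretrieve(MODEL_URL, str(MODEL_PATH))
    return MODEL_PATH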