Markos Viggiato; Dale Paas; Cor-Paul Bezemer
Prioritizing Natural Language Test Cases Based on Highly-Used Game Features Inproceedings
Proceedings of the 31st Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE), pp. 1–12, 2023.
Abstract | BibTeX | Tags: Computer games, Game development, Natural language processing, Testing
@inproceedings{ViggiatoFSE2023,
title = {Prioritizing Natural Language Test Cases Based on Highly-Used Game Features},
author = {Markos Viggiato and Dale Paas and Cor-Paul Bezemer },
year = {2023},
date = {2023-12-01},
urldate = {2023-12-01},
booktitle = {Proceedings of the 31st Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE)},
pages = {1--12},
abstract = {Software testing is still a manual activity in many industries, such as the gaming industry. But manually executing tests becomes impractical as the system grows and resources are restricted, mainly in a scenario with short release cycles. Test case prioritization is a commonly used technique to optimize the test execution. However, most prioritization approaches do not work for manual test cases as they require source code information or test execution history, which is often not available in a manual testing scenario. In this paper, we propose a prioritization approach for manual test cases written in natural language based on the tested application features (in particular, highly-used application features). Our approach consists of (1) identifying the tested features from natural language test cases (with zero-shot classification techniques) and (2) prioritizing test cases based on the features that they test. We leveraged the NSGA-II genetic algorithm for the multi-objective optimization of the test case ordering to maximize the coverage of highly-used features while minimizing the cumulative execution time. Our findings show that we can successfully identify the application features covered by test cases using an ensemble of pre-trained models with strong zero-shot capabilities (an F-score of 76.1%). Also, our prioritization approaches can find test case orderings that cover highly-used application features early in the test execution while keeping the time required to execute test cases short. QA engineers can use our approach to focus the test execution on test cases that cover features that are relevant to users.},
keywords = {Computer games, Game development, Natural language processing, Testing},
pubstate = {published},
tppubtype = {inproceedings}
}
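A minimal sketch of the first step described in the abstract above (identifying the tested features of a natural language test case with zero-shot classification), assuming the Hugging Face transformers library; the model name, candidate feature labels, example test case, and score threshold are illustrative assumptions, and the paper's ensemble and NSGA-II prioritization step are not shown:

```python
# Zero-shot identification of the game features covered by a natural language
# test case. Model, labels, and threshold are illustrative assumptions; the
# paper uses an ensemble of pre-trained zero-shot models.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

test_case = "Open the in-game store, buy a booster pack and verify that the coin balance updates."
features = ["store and purchases", "level progression", "audio settings", "multiplayer lobby"]

result = classifier(test_case, candidate_labels=features, multi_label=True)

# Keep every feature whose score passes a (hypothetical) threshold.
covered = [label for label, score in zip(result["labels"], result["scores"]) if score >= 0.5]
print(covered)
```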
Finlay Macklon; Markos Viggiato; Natalia Romanova; Chris Buzon; Dale Paas; Cor-Paul Bezemer
A Taxonomy of Testable HTML5 Canvas Issues Journal Article
Transactions of Software Engineering (TSE), 49 (6), pp. 3647–3659, 2023.
Abstract | BibTeX | Tags: Testing, Web applications
@article{MacklonTSE2023,
title = {A Taxonomy of Testable HTML5 Canvas Issues},
author = {Finlay Macklon and Markos Viggiato and Natalia Romanova and Chris Buzon and Dale Paas and Cor-Paul Bezemer},
year = {2023},
date = {2023-06-01},
urldate = {2023-06-01},
journal = {Transactions of Software Engineering (TSE)},
volume = {49},
number = {6},
pages = {3647--3659},
abstract = {The HTML5 canvas is widely used to display high quality graphics in web applications. However, the combination of web, GUI, and visual techniques that are required to build canvas applications, together with the lack of testing and debugging tools, makes developing such applications very challenging. To help direct future research on testing canvas applications, in this paper we present a taxonomy of testable canvas issues. First, we extracted 2,403 canvas related issue reports from 123 open source GitHub projects that use the HTML5 canvas. Second, we constructed our taxonomy by manually classifying a random sample of 332 issue reports. Our manual classification identified five broad categories of testable canvas issues, such as Visual and Performance issues. We found that Visual issues are the most frequent (35%), while Performance issues are relatively infrequent (5%). We also found that many testable canvas issues that present themselves visually on the canvas are actually caused by other components of the web application. Our taxonomy of testable canvas issues can be used to steer future research into canvas issues and testing.},
keywords = {Testing, Web applications},
pubstate = {published},
tppubtype = {article}
}
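The taxonomy above is built from canvas-related issue reports mined from open source GitHub projects. A rough sketch of how such reports could be collected, assuming the PyGithub library, a personal access token in an environment variable, a placeholder repository name, and a simple keyword filter (none of which are the paper's actual selection procedure):

```python
# Collect issue reports that mention the HTML5 canvas from one GitHub project.
# Repository name, token handling, and the keyword filter are illustrative
# assumptions, not the selection criteria used in the paper.
import os
from github import Github  # PyGithub

gh = Github(os.environ["GITHUB_TOKEN"])
repo = gh.get_repo("example-org/canvas-game")  # placeholder project name

canvas_issues = []
for issue in repo.get_issues(state="all"):
    text = f"{issue.title} {issue.body or ''}".lower()
    if "canvas" in text:
        canvas_issues.append({"number": issue.number, "title": issue.title})

print(f"Collected {len(canvas_issues)} canvas-related issue reports")
```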
Markos Viggiato
Leveraging Natural Language Processing Techniques to Improve Manual Game Testing PhD Thesis
2023.
Abstract | BibTeX | Tags: Computer games, Game development, Natural language processing, Testing
@phdthesis{ViggiatoPhD,
title = {Leveraging Natural Language Processing Techniques to Improve Manual Game Testing},
author = {Markos Viggiato },
year = {2023},
date = {2023-01-17},
urldate = {2023-01-17},
abstract = {The gaming industry has experienced a sharp growth in recent years, surpassing other popular entertainment segments, such as the film industry. With the ever-increasing scale of the gaming industry and the fact that players are extremely difficult to satisfy, it has become extremely challenging to develop a successful game. In this context, the quality of games has become a critical issue. Game testing is a widely-performed activity to ensure that games meet the desired quality criteria. However, despite recent advancements in test automation, manual game testing is still prevalent in the gaming industry, with test cases often described in natural language only and consisting of one or more test steps that must be manually performed by the Quality Assurance (QA) engineer (i.e., the tester). This makes game testing challenging and costly. Issues such as redundancy (i.e., when different test cases have the same testing objective) and incompleteness (i.e., when test cases miss one or more steps) become a bigger concern in a manual game testing scenario. In addition, as games become bigger and the number of required test cases increases, it becomes impractical to execute all test cases in a scenario with short game release cycles, for example.
Prior work proposed several approaches to analyze and improve test cases with associated source code. However, there is little research on improving manual game testing. Having higher-quality test cases and optimizing test execution help to reduce wasted developer time and allow testers to use testing resources more effectively, which makes game testing more efficient and effective. In addition, even though players are extremely difficult to satisfy, their priorities are not considered during game testing. In this thesis, we investigate how to improve manual game testing from different perspectives.
In the first part of the thesis, we investigated how we can reduce redundancy in the test suite by identifying similar natural language test cases. We evaluated several unsupervised approaches using text embedding, text similarity, and clustering techniques and showed that we can successfully identify similar test cases with a high performance. We also investigated how we can improve test case descriptions to reduce the number of unclear, ambiguous, and incomplete test cases. We proposed and evaluated an automated framework that leverages statistical and neural language models and (1) provides recommendations to improve test case descriptions, (2) recommends potentially missing steps, and (3) suggests existing similar test cases.
In the second part of the thesis, we investigated how player priorities can be included in the game testing process. We first proposed an approach to prioritize test cases that cover the game features that players use the most, which helps to avoid bugs that could affect a very large number of players. Our approach (1) identifies the game features covered by test cases using an ensemble of zero-shot techniques with a high performance and (2) optimizes the test execution based on highly-used game features covered by test cases. Finally, we investigated how sentiment classifiers perform on game reviews and what issues affect those classifiers. High-performing classifiers can be used to obtain players' sentiments about games and guide testing based on the game features that players like or dislike. We show that, while traditional sentiment classifiers do not perform well, a modern classifier (the OPT-175B Large Language Model) presents a (far) better performance. The research work presented in this thesis provides deep insights, actionable recommendations, and effective and thoroughly evaluated approaches to support QA engineers and developers to improve manual game testing.},
keywords = {Computer games, Game development, Natural language processing, Testing},
pubstate = {published},
tppubtype = {phdthesis}
}
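The last study in the thesis compares sentiment classifiers on game reviews. A small illustration of off-the-shelf sentiment classification, assuming the Hugging Face transformers library and a small publicly available model as a stand-in (the thesis evaluates traditional classifiers and the OPT-175B large language model, which is not used here):

```python
# Off-the-shelf sentiment classification of game reviews. The small model below
# is a stand-in for illustration; the thesis evaluates traditional classifiers
# and the OPT-175B large language model.
from transformers import pipeline

sentiment = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

reviews = [
    "The new co-op mode is fantastic, matchmaking is fast and stable.",
    "Constant crashes after the latest patch, the store menu is unusable.",
]

for review, prediction in zip(reviews, sentiment(reviews)):
    print(prediction["label"], f"{prediction['score']:.2f}", "-", review)
```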
Finlay Macklon; Mohammad Reza Taesiri; Markos Viggiato; Stefan Antoszko; Natalia Romanova; Dale Paas; Cor-Paul Bezemer
Automatically Detecting Visual Bugs in HTML5 <canvas> Games Inproceedings
37th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2022.
BibTeX | Tags: Computer games, Game development, Gaming, Regression testing, Testing, Web applications
@inproceedings{finlay_ase2022,
title = {Automatically Detecting Visual Bugs in HTML5 <canvas> Games},
author = {Finlay Macklon and Mohammad Reza Taesiri and Markos Viggiato and Stefan Antoszko and Natalia Romanova and Dale Paas and Cor-Paul Bezemer},
year = {2022},
booktitle = {37th IEEE/ACM International Conference on Automated Software Engineering (ASE)},
keywords = {Computer games, Game development, Gaming, Regression testing, Testing, Web applications},
pubstate = {published},
tppubtype = {inproceedings}
}
Luisa Palechor
Characterizing (un)successful open source blockchain projects and their testing practices Masters Thesis
2022.
Abstract | BibTeX | Tags: blockchain, Smart contracts, Testing
@mastersthesis{luisa2022,
title = {Characterizing (un)successful open source blockchain projects and their testing practices},
author = {Luisa Palechor},
year = {2022},
date = {2022-09-26},
urldate = {2022-09-26},
abstract = {The most well-known blockchain applications are cryptocurrencies, e.g., Ether and Bitcoin, which both sum a market cap of more than 560 billion US dollars. Besides cryptocurrency applications, programmable blockchain allows the development of different applications, e.g., peer-to-peer selling of renewable energy from smart grids, digital rights management, and supply chain tracking and operation. These applications can be developed and deployed on the blockchain through smart contracts, which are small programs that run on the blockchain under particular conditions. As bugs in blockchain applications (in particular, cryptocurrencies) can have large financial impact, it is important to ensure that these applications are well-developed and well-tested. However, currently software development and testing practices of blockchain projects are largely unstudied. In this thesis, we study data from GitHub and CoinMarketCap to understand the characteristics of successful and unsuccessful blockchain projects and reveal the testing practices in Solidity projects with the aim of helping developers to identify projects from which they can learn, or should contribute to. In the first part of the thesis, we study data from CoinMarketCap and GitHub to gain knowledge about the characteristics of successful and unsuccessful blockchain projects. We build a random forest classifier with 320 labelled projects and metrics from 3 dimensions (activity, popularity, and complexity). We found that a large number of stars and a project’s age can help distinguish between successful and unsuccessful projects. Additionally, we found that code cloning practices tend to be common in unsuccessful projects written in Python, C++, Java and Solidity. In the second part of the thesis, we explore how quality is addressed in blockchain applications by studying how 139 open source Solidity projects are tested. We show that core development team members are the developers who usually contribute to testing files, leaving external contributions rare. In addition, our results indicate that only functional testing is practiced among the majority of Solidity projects, with Truffle and Hardhat being the tools commonly used to test Solidity smart contracts. Moreover, security testing is a practice rarely conducted, and performance testing is not conducted at all. We finally found that audits by a third party are common in several smart contracts. Future researchers and developers can use our findings to understand what characterizes successful and unsuccessful blockchain projects and be aware of the testing practices developers conduct in open source blockchain projects.},
keywords = {blockchain, Smart contracts, Testing},
pubstate = {published},
tppubtype = {mastersthesis}
}
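The first part of the thesis builds a random forest classifier over activity, popularity, and complexity metrics of labelled projects. A minimal sketch of such a classifier with scikit-learn; the metric names and the tiny in-line dataset are made up for illustration and do not reproduce the 320-project dataset used in the thesis:

```python
# Random forest separating successful from unsuccessful blockchain projects
# based on repository metrics. Feature names and data are illustrative only.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Hypothetical per-project metrics covering activity, popularity, and complexity.
data = pd.DataFrame({
    "commits_last_year": [1200, 15, 800, 40, 2300, 5],
    "stars":             [9000, 30, 4500, 12, 15000, 3],
    "project_age_days":  [2100, 200, 1800, 90, 2600, 400],
    "successful":        [1, 0, 1, 0, 1, 0],  # label
})

X = data.drop(columns="successful")
y = data["successful"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42, stratify=y
)

clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```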
Markos Viggiato; Dale Paas; Chris Buzon; Cor-Paul Bezemer
Identifying Similar Test Cases That Are Specified in Natural Language Journal Article
Transactions of Software Engineering (TSE), 2022.
Abstract | BibTeX | Tags: Game development, Testing
@article{ViggiatoTSE2022,
title = {Identifying Similar Test Cases That Are Specified in Natural Language},
author = {Markos Viggiato and Dale Paas and Chris Buzon and Cor-Paul Bezemer},
year = {2022},
date = {2022-04-21},
urldate = {2022-04-21},
journal = {Transactions of Software Engineering (TSE)},
abstract = {Software testing is still a manual process in many industries, despite the recent improvements in automated testing techniques. As a result, test cases (which consist of one or more test steps that need to be executed manually by the tester) are often specified in natural language by different employees and many redundant test cases might exist in the test suite. This increases the (already high) cost of test execution. Manually identifying similar test cases is a time-consuming and error-prone task. Therefore, in this paper, we propose an unsupervised approach to identify similar test cases. Our approach uses a combination of text embedding, text similarity and clustering techniques to identify similar test cases. We evaluate five different text embedding techniques, two text similarity metrics, and two clustering techniques to cluster similar test steps and three techniques to identify similar test cases from the test step clusters. Through an evaluation in an industrial setting, we showed that our approach achieves a high performance to cluster test steps (an F-score of 87.39%) and identify similar test cases (an F-score of 83.47%). Furthermore, a validation with developers indicates several different practical usages of our approach (such as identifying redundant test cases), which help to reduce the testing manual effort and time.},
keywords = {Game development, Testing},
pubstate = {published},
tppubtype = {article}
}
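The approach above combines text embedding, text similarity, and clustering to group similar test steps and test cases. A minimal sketch of the embedding-and-clustering idea, assuming the sentence-transformers library and scikit-learn; the model name, distance threshold, and example steps are illustrative, and the paper evaluates several alternative techniques:

```python
# Embed natural language test steps and cluster similar ones. Model, threshold,
# and example steps are illustrative assumptions.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import AgglomerativeClustering

test_steps = [
    "Launch the game and log in with a valid account",
    "Start the game and sign in with valid credentials",
    "Open the settings menu and mute the sound effects",
    "Go to settings and turn off sound effects",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(test_steps, normalize_embeddings=True)

# Agglomerative clustering with cosine distance; similar steps share a cluster.
clustering = AgglomerativeClustering(
    n_clusters=None, distance_threshold=0.4, metric="cosine", linkage="average"
)
labels = clustering.fit_predict(embeddings)

for step, label in zip(test_steps, labels):
    print(label, step)
```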
Luisa Palechor; Cor-Paul Bezemer
How are Solidity smart contracts tested in open source projects? An exploratory study Inproceedings
3rd IEEE/ACM International Conference on Automation of Software Test (AST), 2022.
Abstract | BibTeX | Tags: Smart contracts, Testing
@inproceedings{PalechorAST2022,
title = {How are Solidity smart contracts tested in open source projects? An exploratory study},
author = {Luisa Palechor and Cor-Paul Bezemer},
year = {2022},
date = {2022-03-10},
urldate = {2022-03-10},
booktitle = {3rd IEEE/ACM International Conference on Automation of Software Test (AST)},
abstract = {Smart contracts are self-executing programs that are stored on the blockchain. Once a smart contract is compiled and deployed on the blockchain, it cannot be modified. Therefore, having a bug-free smart contract is vital. To ensure a bug-free smart contract, it must be tested thoroughly. However, little is known about how developers test smart contracts in practice. Our study explores 139 open source smart contract projects that are written in Solidity to investigate the state of smart contract testing from three dimensions: (1) the developers working on the tests, (2) the used testing frameworks and testnets and (3) the type of tests that are conducted. We found that mostly core developers of a project are responsible for testing the contracts. Second, developers typically use only functional testing frameworks to test a smart contract, with Truffle being the most popular one. Finally, our results show that functional testing is conducted in most of the studied projects (93%), security testing is only performed in a few projects (9.4%) and traditional performance testing is conducted in none. In addition, we found 25 projects that mentioned or published external audit reports.},
keywords = {Smart contracts, Testing},
pubstate = {published},
tppubtype = {inproceedings}
}
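One dimension of the study above is which testing frameworks a project uses. A rough sketch of a file-based heuristic for a locally cloned project, looking for Truffle or Hardhat config files and package.json dependencies; this heuristic is an illustrative assumption, not the study's classification procedure:

```python
# Heuristically detect the testing framework of a cloned Solidity project by
# inspecting config files and package.json. Illustrative heuristic only.
import json
from pathlib import Path

def detect_test_frameworks(project_dir: str) -> set:
    root = Path(project_dir)
    frameworks = set()

    # Framework-specific config files are a strong hint.
    if (root / "truffle-config.js").exists() or (root / "truffle.js").exists():
        frameworks.add("Truffle")
    if (root / "hardhat.config.js").exists() or (root / "hardhat.config.ts").exists():
        frameworks.add("Hardhat")

    # Fall back to declared JavaScript dependencies.
    package_json = root / "package.json"
    if package_json.exists():
        manifest = json.loads(package_json.read_text())
        deps = {**manifest.get("dependencies", {}), **manifest.get("devDependencies", {})}
        if "truffle" in deps:
            frameworks.add("Truffle")
        if "hardhat" in deps:
            frameworks.add("Hardhat")

    return frameworks

print(detect_test_frameworks("path/to/cloned-solidity-project"))
```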