1.
Mohammad Reza Taesiri; Tianjun Feng; Anh Nguyen; Cor-Paul Bezemer
GlitchBench: Can large multimodal models detect video game glitches? Inproceedings
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
Abstract | BibTeX | Tags: Computer games, Foundation models, Game development, Gameplay videos, LLM
@inproceedings{TaesiriCVPR2024,
title = {GlitchBench: Can large multimodal models detect video game glitches?},
author = {Mohammad Reza Taesiri and Tianjun Feng and Anh Nguyen and Cor-Paul Bezemer},
year = {2024},
date = {2024-06-15},
urldate = {2024-03-15},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
abstract = {Large multimodal models (LMMs) have evolved from large language models (LLMs) to integrate multiple input modalities, such as visual inputs. This integration augments the capacity of LLMs for tasks requiring visual comprehension and reasoning. However, the extent and limitations of their enhanced abilities are not fully understood, especially when it comes to real-world tasks. To address this gap, we introduce GlitchBench, a novel benchmark derived from video game quality assurance tasks, to test and evaluate the reasoning capabilities of LMMs. Our benchmark is curated from a variety of unusual and glitched scenarios from video games and aims to challenge both the visual and linguistic reasoning powers of LMMs in detecting and interpreting out-of-the-ordinary events. We evaluate multiple state-of-the-art LMMs, and we show that GlitchBench presents a new challenge for these models. Code and data are available at: https://glitchbench.github.io/},
keywords = {Computer games, Foundation models, Game development, Gameplay videos, LLM},
pubstate = {published},
tppubtype = {inproceedings}
}
Large multimodal models (LMMs) have evolved from large language models (LLMs) to integrate multiple input modalities, such as visual inputs. This integration augments the capacity of LLMs for tasks requiring visual comprehension and reasoning. However, the extent and limitations of their enhanced abilities are not fully understood, especially when it comes to real-world tasks. To address this gap, we introduce GlitchBench, a novel benchmark derived from video game quality assurance tasks, to test and evaluate the reasoning capabilities of LMMs. Our benchmark is curated from a variety of unusual and glitched scenarios from video games and aims to challenge both the visual and linguistic reasoning powers of LMMs in detecting and interpreting out-of-the-ordinary events. We evaluate multiple state-of-the-art LMMs, and we show that GlitchBench presents a new challenge for these models. Code and data are available at: https://glitchbench.github.io/
2.
Balreet Grewal; Wentao Lu; Sarah Nadi; Cor-Paul Bezemer
Analyzing Developer Use of ChatGPT Generated Code in Open Source GitHub Projects Inproceedings
International Conference on Mining Software Repositories (MSR), 2024.
Abstract | BibTeX | Tags: Code reuse, LLM, SE4AI
@inproceedings{GrewalMSR2024,
title = {Analyzing Developer Use of ChatGPT Generated Code in Open Source GitHub Projects},
author = {Balreet Grewal and Wentao Lu and Sarah Nadi and Cor-Paul Bezemer},
year = {2024},
date = {2024-04-14},
urldate = {2024-04-14},
booktitle = {International Conference on Mining Software Repositories (MSR)},
abstract = {The rapid development of large language models such as ChatGPT has made them particularly useful to developers in generating code snippets for their projects. To understand how ChatGPT’s generated code is leveraged by developers, we conducted an empirical study of 3,044 ChatGPT-generated code snippets integrated within GitHub projects. A median of 54% of the generated lines of code is found in the project’s code, and this code typically remains unchanged once added. The modifications of the 76 code snippets that changed in a subsequent commit consisted of minor functionality changes and code reorganizations that were made within a day. Our findings offer insights that help drive the development of AI-assisted programming tools. We highlight the importance of making changes in ChatGPT code before integrating it into a project.},
keywords = {Code reuse, LLM, SE4AI},
pubstate = {published},
tppubtype = {inproceedings}
}
The rapid development of large language models such as ChatGPT has made them particularly useful to developers in generating code snippets for their projects. To understand how ChatGPT’s generated code is leveraged by developers, we conducted an empirical study of 3,044 ChatGPT-generated code snippets integrated within GitHub projects. A median of 54% of the generated lines of code is found in the project’s code, and this code typically remains unchanged once added. The modifications of the 76 code snippets that changed in a subsequent commit consisted of minor functionality changes and code reorganizations that were made within a day. Our findings offer insights that help drive the development of AI-assisted programming tools. We highlight the importance of making changes in ChatGPT code before integrating it into a project.
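The BibTeX keys above (TaesiriCVPR2024, GrewalMSR2024) can be used directly from a LaTeX document. A minimal sketch, assuming the entries are saved to a file named publications.bib (the filename is an assumption, not given in the list):

```latex
\documentclass{article}
\begin{document}
GlitchBench~\cite{TaesiriCVPR2024} benchmarks the reasoning abilities of
large multimodal models, while \cite{GrewalMSR2024} studies how developers
reuse ChatGPT-generated code in open-source projects.

\bibliographystyle{plain}
\bibliography{publications} % assumes the entries above are in publications.bib
\end{document}
```

Running latex, then bibtex, then latex twice resolves the citations against these entries.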