Do Large Language Models Generate Similar Codes from Mutated Prompts?: A Case Study of Gemini Pro (FSE 2024 - Posters)

Who

Hetvi Patel, Kevin Amit Shah, Shouvick Mondal

Track

FSE 2024 Posters

Time Zone

The program is currently displayed in (GMT-03:00) Brasilia, Distrito Federal, Brazil.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Wed 17 Jul 2024 10:30 - 11:00 at Lounge - Poster Session 1

Abstract

In this work, we study source code similarity detection using Large Language Models (LLMs). Our investigation is motivated by the need to identify similarities among different pieces of source code, which is critical for tasks such as plagiarism detection and code reuse. We focus on the effectiveness of LLMs for this purpose. We used the LLMSecEval dataset, which comprises 150 natural language (NL) prompts for code generation across two languages, C and Python, and employed radamsa, a mutation-based input generator, to create 27 different mutations per NL prompt. Using Gemini Pro, we then generated code for the original and mutated NL prompts and measured code similarity with CodeBERT. Our experiment aims to uncover the extent to which LLMs consistently generate similar code despite mutations in the input NL prompts, providing insight into the robustness and generalization capabilities of LLMs in understanding and comparing the structure and semantics of code.
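
The comparison step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: `mutate_prompt` is a hypothetical single-character mutation standing in for radamsa, `toy_embed` is a hypothetical bag-of-tokens vector standing in for a CodeBERT embedding, and the use of cosine similarity is an assumption, since the abstract does not name the similarity measure. Only the counts of 150 prompts and 27 mutations per prompt come from the abstract.

```python
import math
import random
from collections import Counter

# Figures stated in the abstract.
N_PROMPTS = 150
MUTATIONS_PER_PROMPT = 27
# Derived counts: mutated prompts, and total prompts including the originals.
N_MUTATED = N_PROMPTS * MUTATIONS_PER_PROMPT
N_TOTAL = N_MUTATED + N_PROMPTS


def mutate_prompt(prompt: str, seed: int) -> str:
    """Hypothetical stand-in for radamsa: apply one random character edit."""
    if not prompt:
        return prompt
    rng = random.Random(seed)  # seeded so each mutation is reproducible
    i = rng.randrange(len(prompt))
    op = rng.choice(("drop", "duplicate", "swapcase"))
    if op == "drop":
        return prompt[:i] + prompt[i + 1:]
    if op == "duplicate":
        return prompt[:i] + prompt[i] + prompt[i:]
    return prompt[:i] + prompt[i].swapcase() + prompt[i + 1:]


def toy_embed(code: str, vocab: list[str]) -> list[float]:
    """Hypothetical stand-in for a CodeBERT embedding: token counts."""
    counts = Counter(code.split())
    return [float(counts[token]) for token in vocab]


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def code_similarity(code_a: str, code_b: str) -> float:
    """Embed both snippets over a shared vocabulary and compare them."""
    vocab = sorted(set(code_a.split()) | set(code_b.split()))
    return cosine_similarity(toy_embed(code_a, vocab), toy_embed(code_b, vocab))
```

In an actual study, `toy_embed` would be replaced by vectors from a code embedding model, and `code_a` and `code_b` would be the code generated for an original prompt and for one of its mutations. Under the figures in the abstract, such a setup involves 150 × 27 = 4050 mutated prompts, or 4200 prompts including the originals.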

Link to Preprint

https://drive.google.com/file/d/1NsG7LEj4gDmM6Bcz31TAcv999kyUKfAH/view?usp=sharing

DOI

https://doi.org/10.1145/3663529.3663873

File attachments

Poster (FSE_2024_LLM_Sim.pdf)	632KiB

Hetvi Patel

IIT Gandhinagar

Kevin Amit Shah

IIT Gandhinagar

Shouvick Mondal

IIT Gandhinagar

India

Session Program

Wed 17 Jul
Displayed time zone: Brasilia, Distrito Federal, Brazil

10:30 - 11:00	Poster Session 1 (Posters) at Lounge

10:30 30m Poster		MicroSensor: Towards an Extensible Tool for the Static Analysis of Microservices Systems in Continuous Integration. Track: Posters. Authors: Edson Soares, Instituto Atlantico & State University of Ceara (UECE); Matheus Paixao, State University of Ceará; Allysson Allex Araújo, Federal University of Cariri
10:30 30m Poster		SORBET: A Framework to Evaluate the Robustness of LiDAR 3D Object Detection and Its Impacts on Autonomous Driving. Track: Posters. Authors: Tri Minh-Triet Pham, Concordia University; Jinqiu Yang, Concordia University
10:30 30m Poster		An Analysis of the Costs and Benefits of Autocomplete in IDEs. Track: Posters. Authors: Shaokang Jiang, University of California, San Diego; Michael Coblenz, University of California, San Diego
10:30 30m Poster		Go the Extra Mile: Fixing Propagated Error-Handling Bugs. Track: Posters. Authors: Haoran Liu, National University of Defense Technology; Zhouyang Jia, National University of Defense Technology; Huiping Zhou, National University of Defense Technology; Haifang Zhou, National University of Defense Technology; Shanshan Li, National University of Defense Technology
10:30 30m Poster		Hybrid Regression Test Selection by Synergizing File and Method Call Dependences. Track: Posters. Authors: Luyao Liu, College of Computer, National University of Defense Technology; Guofeng Zhang, College of Computer, National University of Defense Technology; Zhenbang Chen, College of Computer, National University of Defense Technology; Ji Wang, School of Computer, National University of Defense Technology, China
10:30 30m Poster		Do Large Language Models Generate Similar Codes from Mutated Prompts?: A Case Study of Gemini Pro. Track: Posters. Authors: Hetvi Patel, IIT Gandhinagar; Kevin Amit Shah, IIT Gandhinagar; Shouvick Mondal, IIT Gandhinagar
10:30 30m Poster		Towards Realistic SATD Identification Through Machine Learning Models: Ongoing Research and Preliminary Results. Track: Posters. Authors: Eliakim Gama, State University of Ceará; Matheus Paixao, State University of Ceará; Mariela I. Cortés, State University of Ceará; Lucas Monteiro, State University of Ceará
10:30 30m Poster		Building Software Engineering Capacity through a University Open Source Program Office. Track: Posters. Authors: Ekaterina Holdener, Saint Louis University; Daniel Shown, Saint Louis University
10:30 30m Poster		Inferring Natural Preconditions via Program Transformation. Track: Posters. Authors: Elizabeth Dinella, Bryn Mawr College; Shuvendu K. Lahiri, Microsoft Research; Mayur Naik, UPenn
10:30 30m Poster		RFNIT: Robotic Framework for Non-Invasive Testing. Track: Posters. Authors: Davi Simoes Freitas, Centro de Informática at Universidade Federal de Pernambuco; Breno Miranda, Centro de Informática at Universidade Federal de Pernambuco; Juliano Iyoda, Centro de Informática at Universidade Federal de Pernambuco

Information for Participants

Info for room Lounge:

This room is joined to the Foyer to provide additional space for the coffee break, and it hosts poster presentations throughout the event.