Empirical software engineering research on production systems has brought forth a better understanding of the software engineering process for practitioners and researchers alike. However, only a small subset of production systems is studied, limiting the impact of this research. While software engineering practitioners benefit from replicating research on their own data, this poses its own set of challenges, since performing replications requires a deep understanding of research methodologies and subtle nuances in software engineering data. Given that large language models (LLMs), such as GPT-4, show promise in tackling both software engineering and science-related tasks, these models could help democratize empirical software engineering research.
In this paper, we examine GPT-4’s abilities to perform replications of empirical software engineering research on new data. We specifically study their ability to surface assumptions made in empirical software engineering research methodologies, as well as their ability to plan and generate code for analysis pipelines on seven empirical software engineering papers. We perform a user study with 14 participants with software engineering research expertise, who evaluate GPT-4-generated assumptions and analysis plans (i.e., a list of module specifications) from the papers. We find that GPT-4 is able to surface correct assumptions, but struggle to generate ones that reflect common knowledge about software engineering data. In a manual analysis of the generated code, we find that the GPT-4-generated code contains the correct high-level logic, given a subset of the methodology. However, the code contains many small implementation-level errors, reflecting a lack of software engineering knowledge. Our findings have implications for leveraging LLMs for software engineering research as well as practitioner data scientists in software teams.
Thu 18 JulDisplayed time zone: Brasilia, Distrito Federal, Brazil change
15:30 - 16:00 | |||
15:30 30mPoster | Can GPT-4 Replicate Empirical Software Engineering Research? Posters Jenny T. Liang Carnegie Mellon University, Carmen Badea Microsoft Research, Christian Bird Microsoft Research, Robert DeLine Microsoft Research, Denae Ford Microsoft Research, Nicole Forsgren Microsoft Research, Thomas Zimmermann Microsoft Research | ||
15:30 30mPoster | Evaluating Directed Fuzzers: Are We Heading in the Right Direction? Posters Tae Eun Kim KAIST, Jaeseung Choi Sogang University, Seongjae Im KAIST, Kihong Heo KAIST, Sang Kil Cha KAIST Link to publication Media Attached File Attached | ||
15:30 30mPoster | Glitch Tokens in Large Language Models: Categorization Taxonomy and Effective Detection Posters Yuxi Li Huazhong University of Science and Technology, Yi Liu Nanyang Technological University, Gelei Deng Nanyang Technological University, Ying Zhang Virginia Tech, Wenjia Song Virginia Tech, Ling Shi Nanyang Technological University, Kailong Wang Huazhong University of Science and Technology, Yuekang Li The University of New South Wales, Yang Liu Nanyang Technological University, Haoyu Wang Huazhong University of Science and Technology | ||
15:30 30mPoster | Do Words Have Power? Understanding and Fostering Civility in Code Review Discussion Posters Md Shamimur Rahman University of Saskatchewan, Canada, Zadia Codabux University of Saskatchewan, Chanchal K. Roy University of Saskatchewan, Canada | ||
15:30 30mPoster | CodePlan: Repository-level Coding using LLMs and Planning Posters Ramakrishna Bairi Microsoft Research, India, Atharv Sonwane Microsoft Research, India, Aditya Kanade Microsoft Research, India, Vageesh D C Microsoft Research, India, Arun Iyer Microsoft Research, India, Suresh Parthasarathy Microsoft Research, India, Sriram Rajamani Microsoft Research Indua, B. Ashok Microsoft Research. India, Shashank Shet Microsoft Research. India | ||
15:30 30mPoster | Understanding and Detecting Annotation-induced Faults of Static Analyzers Posters Huaien Zhang The Hong Kong Polytechnic University, Southern University of Science and Technology, Yu Pei The Hong Kong Polytechnic University, Shuyun Liang Southern University of Science and Technology, Shin Hwei Tan Concordia University | ||
15:30 30mPoster | Partial Solution Based Constraint Solving Cache in Symbolic Execution Posters Ziqi Shuai School of Computer, National University of Defense Technology, China, Zhenbang Chen College of Computer, National University of Defense Technology, Kelin Ma School of Computer, National University of Defense Technology, China, Kunlin Liu School of Computer, National University of Defense Technology, China, Yufeng Zhang Hunan University, Jun Sun School of Information Systems, Singapore Management University, Singapore, Ji Wang School of Computer, National University of Defense Technology, China | ||
15:30 30mPoster | Characterizing Python Library Migrations Posters Mohayeminul Islam University of Alberta, Ajay Jha North Dakota State University, Ildar Akhmetov Northeastern University, Sarah Nadi New York University Abu Dhabi, University of Alberta File Attached | ||
15:30 30mPoster | DeSQL: Interactive Debugging of SQL in Data-Intensive Scalable Computing Posters | ||
15:30 30mPoster | BARO: Robust Root Cause Analysis for Microservices via Multivariate Bayesian Online Change Point Detection Posters | ||
15:30 30mPoster | Less Cybersickness, Please: Demystifying and Detecting Stereoscopic Visual Inconsistencies in Virtual Reality Applications Posters Shuqing Li The Chinese University of Hong Kong, Cuiyun Gao Harbin Institute of Technology, Jianping Zhang The Chinese University of Hong Kong, Yujia Zhang Harbin Institute of Technology, Yepang Liu Southern University of Science and Technology, Jiazhen Gu The Chinese University of Hong Kong, Yun Peng The Chinese University of Hong Kong, Michael Lyu The Chinese University of Hong Kong | ||
15:30 30mPoster | CORE: Resolving Code Quality Issues Using LLMs Posters Nalin Wadhwa Microsoft Research, India, Jui Pradhan Microsoft Research, India, Atharv Sonwane Microsoft Research, India, Surya Prakash Sahu Microsoft Research, India, Nagarajan Natarajan Microsoft Research India, Aditya Kanade Microsoft Research, India, Suresh Parthasarathy Microsoft Research, India, Sriram Rajamani Microsoft Research Indua | ||
15:30 30mPoster | Abstraction-Aware Inference of Metamorphic Relations Posters Agustin Nolasco University of Rio Cuarto, Facundo Molina IMDEA Software Institute, Renzo Degiovanni Luxembourg Institute of Science and Technology, Alessandra Gorla IMDEA Software Institute, Diego Garbervetsky Departamento de Computación, FCEyN, UBA, Mike Papadakis University of Luxembourg, Sebastian Uchitel Imperial College and University of Buenos Aires, Nazareno Aguirre University of Rio Cuarto and CONICET, Marcelo F. Frias Dept. of Software Engineering Instituto Tecnológico de Buenos Aires | ||
15:30 30mPoster | State Reconciliation Defects in Infrastructure as Code Posters Md Mahadi Hassan Auburn University, John Salvador Auburn University, Shubhra Kanti Karmaker Santu Auburn University, Akond Rahman Auburn University |
This room is conjoined with the Foyer to provide additional space for the coffee break, and hold poster presentations throughout the event.