Learning to Detect and Localize Multilingual Bugs (FSE 2024 - Research Papers)

Mon 15 - Fri 19 July 2024, Porto de Galinhas, Brazil

Who

Haoran Yang, Yu Nong, Tao Zhang, Xiapu Luo, Haipeng Cai

Track

FSE 2024 Research Papers

When

Fri 19 Jul 2024 11:18 - 11:36 at Pitomba - AI4SE 4 Chair(s): Wesley Assunção

Abstract

Increasing studies have shown bugs in multi-language software as a critical loophole in modern software quality assurance, especially those induced by language interactions (i.e., multilingual bugs). Yet existing tool support for bug detection/localization remains largely limited to single-language software, despite the long-standing prevalence of multi-language systems in various real-world software domains. Extant static/dynamic analysis and deep learning (DL) based approaches all face major challenges in addressing multilingual bugs. In this paper, we present xLoc, a DL-based technique/tool for detecting and localizing multilingual bugs. Motivated by results of our bug-characteristics study on top locations of multilingual bugs, xLoc first learns the general knowledge relevant to differentiating various multilingual control-flow structures. This is achieved by pre-training a Transformer model with customized position encoding against novel objectives. Then, xLoc learns task-specific knowledge for the task of multilingual bug detection/localization, through another new position encoding scheme (based on cross-language API vicinity) that allows for the model to attend particularly to control-flow constructs that bear most multilingual bugs during fine-tuning. We have implemented xLoc for Python-C software and curated a dataset of 3,770 buggy and 15,884 non-buggy Python-C samples, which enabled our extensive evaluation of xLoc against two state-of-the-art baselines: fine-tuned CodeT5 and zero-shot ChatGPT. Our results show that xLoc achieved 94.98% F1 and 87.24%@Top-1 accuracy, which are significantly (up to 162.88% and 511.75%) higher than the baselines. Ablation studies further confirmed significant contributions of each of the novel design elements in xLoc. With respective bug-location characteristics and labeled bug datasets for fine-tuning, our design may be applied to other language combinations beyond Python-C.
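The two headline metrics in the abstract, F1 for detection and Top-1 accuracy for localization, can be illustrated with a small sketch. This is not the authors' evaluation code: the function names, the binary label convention (1 = buggy), and the rule that a sample counts as localized when any of its top-k ranked lines is a true buggy line are assumptions made for illustration, and the paper's exact definitions may differ.

```python
def f1_score(y_true, y_pred):
    """F1 for binary detection labels (assumed: 1 = buggy, 0 = non-buggy)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # avoids division by zero when nothing is correctly flagged
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def top_k_accuracy(ranked_lines, true_lines, k=1):
    """Fraction of buggy samples with a true buggy line among the top-k ranked lines.

    ranked_lines: per sample, line numbers ordered from most to least suspicious.
    true_lines:   per sample, the ground-truth buggy line numbers.
    (Hypothetical data layout, chosen for this sketch.)
    """
    if not ranked_lines:
        return 0.0
    hits = sum(
        1
        for ranked, truth in zip(ranked_lines, true_lines)
        if set(ranked[:k]) & set(truth)
    )
    return hits / len(ranked_lines)


if __name__ == "__main__":
    # Toy data only; these are not results from the paper.
    print(f1_score([1, 1, 0, 0], [1, 0, 1, 0]))
    print(top_k_accuracy([[3, 1], [2, 5]], [[3], [5]], k=1))
```

Under this reading, Top-1 accuracy is the strictest setting: only the single most suspicious line reported for each buggy sample is checked against the ground truth.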

Link to Preprint

https://chapering.github.io/pubs/fse24haoran.pdf

DOI

https://doi.org/10.1145/3660804

Haoran Yang

Washington State University

United States

Yu Nong

Washington State University

Tao Zhang

Macau University of Science and Technology

China

Xiapu Luo

The Hong Kong Polytechnic University

China

Haipeng Cai

Washington State University

United States

Session Program

Fri 19 Jul
Displayed time zone: Brasilia, Distrito Federal, Brazil

11:00 - 12:30	AI4SE 4 (Research Papers) at Pitomba. Chair(s): Wesley Assunção (North Carolina State University)

11:00	18m Talk	Improving the Learning of Code Review Successive Tasks with Cross-Task Knowledge Distillation (Research Papers). Oussama Ben Sghaier (DIRO, Université de Montréal), Houari Sahraoui (DIRO, Université de Montréal)
11:18	18m Talk	Learning to Detect and Localize Multilingual Bugs (Research Papers). Haoran Yang (Washington State University), Yu Nong (Washington State University), Tao Zhang (Macau University of Science and Technology), Xiapu Luo (The Hong Kong Polytechnic University), Haipeng Cai (Washington State University)
11:36	18m Talk	Mining Action Rules for Defect Reduction Planning (Research Papers). Khouloud Oueslati (Polytechnique Montréal, Canada), Gabriel Laberge (Polytechnique Montréal, Canada), Maxime Lamothe (Polytechnique Montréal), Foutse Khomh (Polytechnique Montréal)
11:54	18m Talk	Predicting Failures of Autoscaling Distributed Applications (Research Papers). Giovanni Denaro (University of Milano - Bicocca), Noura El Moussa (USI Università della Svizzera Italiana & SIT Schaffhausen Institute of Technology), Rahim Heydarov (USI Università della Svizzera Italiana), Francesco Lomio (SIT Schaffhausen Institute of Technology), Mauro Pezze (USI Università della Svizzera Italiana & SIT Schaffhausen Institute of Technology), Ketai Qiu (USI Università della Svizzera Italiana)
12:12	18m Talk	RavenBuild: Context, Relevance, and Dependency Aware Build Outcome Prediction (Research Papers). Gengyi Sun (University of Waterloo), Sarra Habchi (Ubisoft Montréal), Shane McIntosh (University of Waterloo)