Do Code Generation Models Think Like Us? - A Study of Attention Alignment between Large Language Models and Human Programmers (FSE 2024 - Posters)

Who

Bonan Kou, Shengmai Chen, Zhijie Wang, Lei Ma, Tianyi Zhang

Track

FSE 2024 Posters

Time Zone

The program is currently displayed in (GMT-03:00) Brasilia, Distrito Federal, Brazil.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Fri 19 Jul 2024 10:30 - 11:00 at Lounge - Poster Session 4

Abstract

Large Language Models (LLMs) have been demonstrated to be effective for code generation. Due to the complexity and opacity of LLMs, little is known about how these models generate code. We made the first attempt to bridge this knowledge gap by investigating whether LLMs attend to the same parts of a task description as human programmers during code generation. An analysis of six LLMs, including GPT-4, on two popular code generation benchmarks revealed a consistent misalignment between LLMs’ and programmers’ attention. Furthermore, an in-depth analysis of 273 incorrect LLM-generated code snippets showed that 35% of the errors can be explained by two attention patterns: incorrect attention and semantic misunderstanding. Finally, through a quantitative experiment and a user study, we demonstrated that model attention computed by a perturbation-based method is most aligned with human attention and is often favored by human programmers. Our findings deepen the understanding of code generation models and highlight the need for human-aligned LLMs for better interpretability and programmer trust.

Bonan Kou

Purdue University

Shengmai Chen

Purdue University

Zhijie Wang

University of Alberta

Canada

Lei Ma

The University of Tokyo & University of Alberta

Japan

Tianyi Zhang

Purdue University

United States

Session Program

Fri 19 Jul
Displayed time zone: Brasilia, Distrito Federal, Brazil

10:30 - 11:00	Poster Session 4 (Posters) at Lounge

10:30 30m Poster	Understanding the Impact of APIs Behavioral Breaking Changes on Client Applications. Dhanushka Jayasuriya (University of Auckland); Valerio Terragni (University of Auckland); Jens Dietrich (Victoria University of Wellington); Kelly Blincoe (University of Auckland)
10:30 30m Poster	Your Code Secret Belongs to Me: Neural Code Completion Tools Can Memorize Hard-coded Credentials. Yizhan Huang (The Chinese University of Hong Kong); Yichen LI (The Chinese University of Hong Kong); Weibin Wu (Sun Yat-sen University); Jianping Zhang (The Chinese University of Hong Kong); Michael Lyu (The Chinese University of Hong Kong)
10:30 30m Poster	Natural Is The Best: Model-Agnostic Code Simplification for Pre-trained Large Language Models. Yan Wang (Central University of Finance and Economics); Xiaoning Li (Central University of Finance and Economics); Tien N. Nguyen (University of Texas at Dallas); Shaohua Wang (Central University of Finance and Economics); Chao Ni (School of Software Technology, Zhejiang University); Ling Ding (Central University of Finance and Economics)
10:30 30m Poster	PyRadar: Towards Automatically Retrieving and Validating Source Code Repository Information for PyPI Packages. Kai Gao (University of Science and Technology Beijing); Weiwei Xu (Peking University); Wenhao Yang (Peking University); Minghui Zhou (Peking University)
10:30 30m Poster	"The Law Doesn’t Work Like a Computer": Exploring Software Licensing Issues Faced by Legal Practitioners. Nathan Wintersgill (William & Mary); Trevor Stalnaker (William & Mary); Laura A. Heymann (William & Mary); Oscar Chaparro (William & Mary); Denys Poshyvanyk (William & Mary)
10:30 30m Poster	RavenBuild: Context, Relevance, and Dependency Aware Build Outcome Prediction. Gengyi Sun (University of Waterloo); Sarra Habchi (Ubisoft Montréal); Shane McIntosh (University of Waterloo)
10:30 30m Poster	MirrorFair: Fixing Fairness Bugs in Machine Learning Software via Counterfactual Predictions. Ying Xiao (King's College London / Southern University of Science and Technology); Jie M. Zhang (King's College London); Yepang Liu (Southern University of Science and Technology); Mohammad Reza Mousavi (King's College London); Sicen Liu (Southern University of Science and Technology); Dingyuan Xue (Southern University of Science and Technology)
10:30 30m Poster	Do Code Generation Models Think Like Us? - A Study of Attention Alignment between Large Language Models and Human Programmers. Bonan Kou (Purdue University); Shengmai Chen (Purdue University); Zhijie Wang (University of Alberta); Lei Ma (The University of Tokyo & University of Alberta); Tianyi Zhang (Purdue University)
10:30 30m Poster	Dependency-Induced Waste in Continuous Integration: An Empirical Study on NPM Dependencies. Nimmi Weeraddana (University of Waterloo); Mahmoud Alfadel (University of Waterloo); Shane McIntosh (University of Waterloo)
10:30 30m Poster	A Miss Is as Good as A Mile: Metamorphic Testing for Deep Learning Operators. Jinyin Chen (Zhejiang University of Technology); Chengyu Jia (Zhejiang University of Technology); Yunjie Yan (Zhejiang University of Technology); Jie Ge (Zhejiang University of Technology); haibin zheng (Zhejiang University of Technology); Yao Cheng (TÜV SÜD Asia Pacific Pte. Ltd.)
10:30 30m Poster	Investigating Documented Privacy Changes in Android OS. Chuan Yan (University of Queensland); Mark Huasong Meng (National University of Singapore); Fuman Xie (University of Queensland); Guangdong Bai (University of Queensland)
10:30 30m Poster	Analyzing Quantum Programs with LintQ: A Static Analysis Framework for Qiskit. Matteo Paltenghi (University of Stuttgart); Michael Pradel (University of Stuttgart)
10:30 30m Poster	Generative AI for Pull Request Descriptions: Adoption, Impact, and Developer Interventions. Tao Xiao (Nara Institute of Science and Technology); Hideaki Hata (Shinshu University); Christoph Treude (Singapore Management University); Kenichi Matsumoto (Nara Institute of Science and Technology)
10:30 30m Poster	Bloat beneath Python's Scales: A Fine-Grained Inter-Project Dependency Analysis. Georgios-Petros Drosos (ETH Zurich); Thodoris Sotiropoulos (ETH Zurich); Diomidis Spinellis (Athens University of Economics and Business & Delft University of Technology); Dimitris Mitropoulos (University of Athens)

Information for Participants

Info for room Lounge:

This room is conjoined with the Foyer to provide additional space for the coffee break and to hold poster presentations throughout the event.