Large Language Models (LLMs) have been shown to be effective for code generation. However, due to the complexity and opacity of LLMs, little is known about how these models generate code. We made the first attempt to bridge this knowledge gap by investigating whether LLMs attend to the same parts of a task description as human programmers during code generation. An analysis of six LLMs, including GPT-4, on two popular code generation benchmarks revealed a consistent misalignment between the attention of LLMs and that of human programmers. Furthermore, an in-depth analysis of 273 incorrect LLM-generated code snippets showed that 35% of the errors can be explained by two attention patterns: incorrect attention and semantic misunderstanding. Finally, through a quantitative experiment and a user study, we demonstrated that model attention computed by a perturbation-based method aligns most closely with human attention and is often favored by human programmers. Our findings deepen the understanding of code generation models and highlight the need for human-aligned LLMs to improve interpretability and programmer trust.