Are Human Rules Necessary? Generating Reusable APIs with CoT Reasoning and In-context Learning (FSE 2024 - Research Papers)

Who

Yubo Mai, Zhipeng Gao, Xing Hu, Lingfeng Bao, Yu Liu, JianLing Sun

Track

FSE 2024 Research Papers

Time Zone

The program is currently displayed in (GMT-03:00) Brasilia, Distrito Federal, Brazil.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Wed 17 Jul 2024 14:00 - 14:18 at Pitomba - AI4SE 1 Chair(s): Mauro Pezze

Abstract

Nowadays, more and more developers turn to Stack Overflow for solutions (e.g., code snippets) when they encounter technical problems. Although domain experts provide a large number of valuable solutions on Stack Overflow, these code snippets are often difficult to reuse directly. Developers have to digest the information in the relevant posts and make the necessary modifications, and the whole solution-seeking process can be time-consuming and tedious. To facilitate the reuse of Stack Overflow code snippets, Terragni et al. first explored transforming a Stack Overflow code snippet into a well-formed method API (Application Programming Interface) using a rule-based approach named APIzator. The reported performance of their approach is promising. However, after an in-depth analysis of their experimental results, we find two problems. (1) 92.5% of the APIs generated by APIzator are pointless and thus difficult to use in practice. This is because the method name generated by APIzator (verb + object) rarely represents the method's functionality, so the result can hardly be claimed to be a meaningful, reusable API. (2) The authors manually summarized a number of rules to identify parameter variables and return statements for Java methods. These hand-crafted rules are extremely complex, and the manual rule design process is labor-intensive and error-prone. Moreover, since these rules are designed for Java, they can hardly be extended to other programming languages.

Inspired by the great potential of Large Language Models (LLMs) for solving complex coding tasks, in this paper we propose a novel approach, named Code2API, to automatically perform APIzation for Stack Overflow code snippets. Code2API does not require additional model training or any manually crafted rules, and it can be easily deployed on personal computers without relying on other external tools. Specifically, Code2API guides the LLMs through well-designed prompts to generate well-formed APIs for a given code snippet. To elicit knowledge and logical reasoning from LLMs, we use chain-of-thought (CoT) reasoning and few-shot in-context learning, which help the LLMs fully understand the APIzation task and solve it step by step in a manner similar to a developer. Our evaluations show that Code2API achieves remarkable accuracy in identifying method parameters (65%) and return statements (66%) equivalent to human-generated ones, surpassing the current state-of-the-art approach, APIzator, by 15.0% and 16.5% respectively. Moreover, our user study demonstrates that, compared with APIzator, Code2API exhibits superior performance in generating meaningful method names, even surpassing human-level performance, and that developers are more willing to use APIs generated by our approach, highlighting the applicability of our tool in practice. Finally, we successfully extend our framework to a Python dataset, achieving performance comparable to that on Java, which verifies the generalizability of our tool.
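
As a rough illustration of what combining few-shot in-context learning with chain-of-thought reasoning could look like for this task, the Python sketch below assembles a prompt from worked examples. It is not the prompt used by Code2API: the `Demo` class, the `build_prompt` function, the four reasoning steps, and the example snippets are all assumptions made for illustration, and the call to an actual LLM is left out.

```python
from dataclasses import dataclass


@dataclass
class Demo:
    """One worked example for the few-shot prompt (hypothetical structure)."""
    snippet: str    # raw Stack Overflow-style code
    reasoning: str  # step-by-step explanation (the chain of thought)
    api: str        # the resulting well-formed method


def build_prompt(demos: list[Demo], target_snippet: str) -> str:
    """Assemble a few-shot chain-of-thought prompt for the APIzation task."""
    # Task description; the step breakdown is an assumption, not the paper's.
    parts = [
        "Task: turn the code snippet into a reusable method. "
        "Think step by step: (1) summarize what the snippet does, "
        "(2) choose the parameters, (3) choose the return value, "
        "(4) pick a descriptive method name, then output the method."
    ]
    # Each demonstration shows the snippet, the reasoning, and the final API.
    for i, demo in enumerate(demos, start=1):
        parts.append(
            f"### Example {i}\n"
            f"Snippet:\n{demo.snippet}\n"
            f"Reasoning:\n{demo.reasoning}\n"
            f"API:\n{demo.api}"
        )
    # The prompt ends at "Reasoning:" so the model continues with its own steps.
    parts.append(f"### Your turn\nSnippet:\n{target_snippet}\nReasoning:")
    return "\n\n".join(parts)


# A single made-up demonstration.
demo = Demo(
    snippet='text = open("notes.txt").read()\nprint(len(text.split()))',
    reasoning=(
        "1. The snippet counts the words in a file.\n"
        "2. The hard-coded path should become a parameter.\n"
        "3. The printed count should be returned instead.\n"
        "4. A fitting name is count_words_in_file."
    ),
    api=(
        "def count_words_in_file(path):\n"
        "    with open(path) as f:\n"
        "        return len(f.read().split())"
    ),
)

prompt = build_prompt([demo], 'import os\nprint(os.path.getsize("a.bin"))')
print(prompt)
```

The resulting string would be sent to an LLM of choice. Ending the prompt at "Reasoning:" is one simple way to encourage the model to write out its intermediate steps before producing the method.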

Yubo Mai

Zhejiang University

Zhipeng Gao

Shanghai Institute for Advanced Study - Zhejiang University

China

Xing Hu

Zhejiang University

China

Lingfeng Bao

Zhejiang University

China

Yu Liu

Zhejiang University

JianLing Sun

Zhejiang University

China

Session Program

Wed 17 Jul
Displayed time zone: Brasilia, Distrito Federal, Brazil

14:00 - 15:30	AI4SE 1 (Research Papers) at Pitomba Chair(s): Mauro Pezze USI Università della Svizzera Italiana & SIT Schaffhausen Institute of Technology

14:00 18m Talk		Are Human Rules Necessary? Generating Reusable APIs with CoT Reasoning and In-context Learning Research Papers Yubo Mai Zhejiang University, Zhipeng Gao Shanghai Institute for Advanced Study - Zhejiang University, Xing Hu Zhejiang University, Lingfeng Bao Zhejiang University, Yu Liu Zhejiang University, JianLing Sun Zhejiang University
14:18 18m Talk		CodeArt: Better Code Models by Attention Regularization When Symbols Are Lacking Research Papers Zian Su Purdue University, Xiangzhe Xu Purdue University, Ziyang Huang Purdue University, Zhuo Zhang Purdue University, Yapeng Ye Purdue University, Jianjun Huang Renmin University of China, Xiangyu Zhang Purdue University
14:36 18m Talk		Enhancing Code Understanding for Impact Analysis by Combining Transformers and Program Dependence Graphs Research Papers Yanfu Yan William & Mary, Nathan Cooper William & Mary, Kevin Moran University of Central Florida, Gabriele Bavota Software Institute @ Università della Svizzera Italiana, Denys Poshyvanyk William & Mary, Steve Rich Cisco Systems
14:54 18m Talk		Exploring and Unleashing the Power of Large Language Models in Automated Code Translation Research Papers Zhen Yang Shandong University, Fang Liu Beihang University, Zhongxing Yu Shandong University, Jacky Keung City University of Hong Kong, Jia Li Peking University, Shuo Liu City University of Hong Kong, Hong Yifan City University of Hong Kong, Xiaoxue Ma City University of Hong Kong, Zhi Jin Peking University, Ge Li Peking University
15:12 18m Talk		Glitch Tokens in Large Language Models: Categorization Taxonomy and Effective Detection Research Papers Yuxi Li Huazhong University of Science and Technology, Yi Liu Nanyang Technological University, Gelei Deng Nanyang Technological University, Ying Zhang Virginia Tech, Wenjia Song Virginia Tech, Ling Shi Nanyang Technological University, Kailong Wang Huazhong University of Science and Technology, Yuekang Li The University of New South Wales, Yang Liu Nanyang Technological University, Haoyu Wang Huazhong University of Science and Technology