Wed 17 Jul 2024 14:00 - 14:18 at Pitomba - AI4SE 1 Chair(s): Mauro Pezze

Nowadays, more and more developers resort to Stack Overflow for solutions (e.g., code snippets) when they encounter technical problems. Although domain experts provide a huge number of valuable solutions on Stack Overflow, these code snippets are often difficult to reuse directly. Developers have to digest the information within relevant posts and make the necessary modifications, and the whole solution-seeking process can be time-consuming and tedious. To facilitate the reuse of Stack Overflow code snippets, Terragni et al. first explored transforming a Stack Overflow code snippet into a well-formed method API (Application Programming Interface) using a rule-based approach named APIzator. The reported performance of their approach is promising; however, after an in-depth analysis of their experimental results, we find that (1) 92.5% of the APIs generated by APIzator are pointless and thus difficult to use in practice. This is because the method names generated by APIzator (verb + object) rarely represent the method’s functionality, so the resulting APIs can hardly be claimed to be meaningful or reusable. (2) The authors manually summarized a number of rules to identify parameter variables and return statements for Java methods. These hand-crafted rules are extremely complex, and the manual rule design process is labor-intensive and error-prone. Moreover, since these rules are designed for Java, they can hardly be extended to other programming languages.

Inspired by the great potential of Large Language Models (LLMs) for solving complex coding tasks, in this paper we propose a novel approach, named Code2API, to automatically perform APIzation for Stack Overflow code snippets. Code2API does not require additional model training or any manually crafted rules and can be easily deployed on personal computers without relying on other external tools. Specifically, Code2API guides the LLMs through well-designed prompts to generate well-formed APIs for a given code snippet. To elicit knowledge and logical reasoning from LLMs, we use chain-of-thought (CoT) reasoning and few-shot in-context learning, which help the LLMs fully understand the APIzation task and solve it step by step in a manner similar to a developer. Our evaluations show that Code2API achieves remarkable accuracy in identifying method parameters (65%) and return statements (66%) equivalent to human-generated ones, surpassing the current state-of-the-art approach, APIzator, by 15.0% and 16.5%, respectively. Moreover, our user study demonstrates that, compared with APIzator, Code2API exhibits superior performance in generating meaningful method names, even surpassing human-level performance, and that developers are more willing to use APIs generated by our approach, highlighting the applicability of our tool in practice. Finally, we successfully extend our framework to a Python dataset, achieving performance comparable to that on Java, which verifies the generalizability of our tool.
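The prompting idea described above (few-shot demonstrations combined with step-by-step reasoning) can be illustrated with a minimal sketch. This is not the Code2API implementation: the abstract does not give the actual prompts, so the instruction text, the `Example` class, the `build_prompt` function, and the demonstration snippet are all hypothetical and only show how such a prompt might be assembled before being sent to an LLM.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One hypothetical few-shot demonstration: snippet, reasoning, resulting method."""
    snippet: str
    reasoning: str
    api: str


# Hypothetical instruction text; the real Code2API prompts are not given in the abstract.
INSTRUCTION = (
    "Turn the Stack Overflow code snippet into a well-formed method. "
    "Think step by step: identify the parameters, identify the return "
    "statement, then choose a descriptive method name."
)


def build_prompt(examples: list[Example], target_snippet: str) -> str:
    """Assemble a few-shot, chain-of-thought prompt as plain text."""
    parts = [INSTRUCTION]
    for i, ex in enumerate(examples, start=1):
        parts.append(
            f"### Example {i}\n"
            f"Snippet:\n{ex.snippet}\n"
            f"Reasoning:\n{ex.reasoning}\n"
            f"Method:\n{ex.api}"
        )
    # The task ends with an open "Reasoning:" cue so the model reasons before answering.
    parts.append(f"### Task\nSnippet:\n{target_snippet}\nReasoning:")
    return "\n\n".join(parts)


# A made-up demonstration in the spirit of the APIzation task.
demo = Example(
    snippet='String s = "a,b";\nString[] parts = s.split(",");',
    reasoning=(
        "1. `s` is hard-coded input, so it becomes a parameter.\n"
        "2. `parts` is the computed result, so it is returned.\n"
        "3. The code splits text on commas, so name it splitByComma."
    ),
    api=(
        "public static String[] splitByComma(String s) {\n"
        '    return s.split(",");\n'
        "}"
    ),
)

prompt = build_prompt([demo], "int n = 5;\nint sq = n * n;")
print(prompt)
```

The resulting string would be passed to whichever LLM is in use; the model call is left out because the abstract does not name a model or client library.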

Wed 17 Jul

Displayed time zone: Brasilia, Distrito Federal, Brazil

14:00 - 15:30
AI4SE 1 - Research Papers at Pitomba
Chair(s): Mauro Pezze USI Università della Svizzera Italiana & SIT Schaffhausen Institute of Technology
14:00
18m
Talk
Are Human Rules Necessary? Generating Reusable APIs with CoT Reasoning and In-context Learning
Research Papers
Yubo Mai Zhejiang University, Zhipeng Gao Shanghai Institute for Advanced Study - Zhejiang University, Xing Hu Zhejiang University, Lingfeng Bao Zhejiang University, Yu Liu Zhejiang University, JianLing Sun Zhejiang University
14:18
18m
Talk
CodeArt: Better Code Models by Attention Regularization When Symbols Are Lacking
Research Papers
Zian Su Purdue University, Xiangzhe Xu Purdue University, Ziyang Huang Purdue University, Zhuo Zhang Purdue University, Yapeng Ye Purdue University, Jianjun Huang Renmin University of China, Xiangyu Zhang Purdue University
14:36
18m
Talk
Enhancing Code Understanding for Impact Analysis by Combining Transformers and Program Dependence Graphs
Research Papers
Yanfu Yan William & Mary, Nathan Cooper William & Mary, Kevin Moran University of Central Florida, Gabriele Bavota Software Institute @ Università della Svizzera Italiana, Denys Poshyvanyk William & Mary, Steve Rich Cisco Systems
14:54
18m
Talk
Exploring and Unleashing the Power of Large Language Models in Automated Code Translation
Research Papers
Zhen Yang Shandong University, Fang Liu Beihang University, Zhongxing Yu Shandong University, Jacky Keung City University of Hong Kong, Jia Li Peking University, Shuo Liu City University of Hong Kong, Hong Yifan City University of Hong Kong, Xiaoxue Ma City University of Hong Kong, Zhi Jin Peking University, Ge Li Peking University
15:12
18m
Talk
Glitch Tokens in Large Language Models: Categorization Taxonomy and Effective Detection
Research Papers
Yuxi Li Huazhong University of Science and Technology, Yi Liu Nanyang Technological University, Gelei Deng Nanyang Technological University, Ying Zhang Virginia Tech, Wenjia Song Virginia Tech, Ling Shi Nanyang Technological University, Kailong Wang Huazhong University of Science and Technology, Yuekang Li The University of New South Wales, Yang Liu Nanyang Technological University, Haoyu Wang Huazhong University of Science and Technology