Are Human Rules Necessary? Generating Reusable APIs with CoT Reasoning and In-context Learning
Nowadays, more and more developers resort to Stack Overflow for solutions (e.g., code snippets) when they encounter technical problems. Although domain experts provide huge amounts of valuable solutions in Stack Overflow, these code snippets are often difficult to reuse directly. Developers have to digest the information within relevant posts and make necessary modifications, and the whole solution-seeking process can be time-consuming and tedious. To facilitate the reuse of Stack Overflow code snippets, Terragni et al. first explored transforming a code snippet in Stack Overflow into a well-formed method API (Application Program Interface) by using a rule-based approach, named APIzator. The reported performance of their approach is promising, however, after our in-depth analysis of their experiment results, we find that (1) 92.5% of APIs generated by APIzator are pointless and thus are difficult to use in practice. This is because the method name generated by APIzator (verb + object) can rarely represent the method’s functionality, which can hardly be claimed as meaningful/reusable APIs. (2) The authors manually summarized a number of rules to identify parameter variables and return statements for Java methods. These hand-crafted rules are extremely complex and sophisticated, and the manual rule design process is labor-intensive and error-prone. Moreover, since these rules are designed for Java, they can hardly be extended to other programming languages.
Inspired by the great potential of Large Language Models (LLMs) for solving complex coding tasks, in this paper, we propose a novel approach, named Code2API, to automatically perform APIzation for Stack Overflow code snippets. Code2API does not require additional model training or any manual crafting rules and can be easily deployed on personal computers without relying on other external tools. Specifically, Code2API guides the LLMs through well-designed prompts to generate well-formed APIs for the given code snippet. To elicit knowledge and logical reasoning from LLMs, we used chain-of-thought (CoT) reasoning and few-shot in-context learning, which can help the LLMs fully understand the APIzation task and solve it step by step in a manner similar to a developer. Our evaluations show that Code2API achieves a remarkable accuracy in identifying method parameters (65%) and return statements (66%) equivalent to human-generated ones, surpassing the current state-of-the-art approach, APIzator, by 15.0% and 16.5% respectively. Moreover, compared with APIzator, our user study demonstrates that Code2API exhibits superior performance in generating meaningful method names, even surpassing the human-level performance, and developers are more willing to use APIs generated by our approach, highlighting the applicability of our tool in practice. Finally, we successfully extend our framework to the Python dataset, achieving a comparable performance with Java, which verifies the generalizability of our tool.
Wed 17 JulDisplayed time zone: Brasilia, Distrito Federal, Brazil change
14:00 - 15:30 | AI4SE 1Research Papers at Pitomba Chair(s): Mauro Pezze USI Università della Svizzera Italiana & SIT Schaffhausen Institute of Technology | ||
14:00 18mTalk | Are Human Rules Necessary? Generating Reusable APIs with CoT Reasoning and In-context Learning Research Papers Yubo Mai Zhejiang University, Zhipeng Gao Shanghai Institute for Advanced Study - Zhejiang University, Xing Hu Zhejiang University, Lingfeng Bao Zhejiang University, Yu Liu Zhejiang University, JianLing Sun Zhejiang University | ||
14:18 18mTalk | CodeArt: Better Code Models by Attention Regularization When Symbols Are Lacking Research Papers Zian Su Purdue University, Xiangzhe Xu Purdue University, Ziyang Huang Purdue University, Zhuo Zhang Purdue University, Yapeng Ye Purdue University, Jianjun Huang Renmin University of China, Xiangyu Zhang Purdue University | ||
14:36 18mTalk | Enhancing Code Understanding for Impact Analysis by Combining Transformers and Program Dependence Graphs Research Papers Yanfu Yan William & Mary, Nathan Cooper William & Mary, Kevin Moran University of Central Florida, Gabriele Bavota Software Institute @ Università della Svizzera Italiana, Denys Poshyvanyk William & Mary, Steve Rich Cisco Systems | ||
14:54 18mTalk | Exploring and Unleashing the Power of Large Language Models in Automated Code Translation Research Papers Zhen Yang Shandong University, Fang Liu Beihang University, Zhongxing Yu Shandong University, Jacky Keung City University of Hong Kong, Jia Li Peking University, Shuo Liu City University of Hong Kong, Hong Yifan City University of Hong Kong, Xiaoxue Ma City University of Hong Kong, Zhi Jin Peking University, Ge Li Peking University Pre-print | ||
15:12 18mTalk | Glitch Tokens in Large Language Models: Categorization Taxonomy and Effective Detection Research Papers Yuxi Li Huazhong University of Science and Technology, Yi Liu Nanyang Technological University, Gelei Deng Nanyang Technological University, Ying Zhang Virginia Tech, Wenjia Song Virginia Tech, Ling Shi Nanyang Technological University, Kailong Wang Huazhong University of Science and Technology, Yuekang Li The University of New South Wales, Yang Liu Nanyang Technological University, Haoyu Wang Huazhong University of Science and Technology |