Exploring and Unleashing the Power of Large Language Models in Automated Code Translation
Automated code translation tools, known as transpilers, translate programs from one language to another (e.g., Java to Python). Current state-of-the-art learning-based transpilers (e.g., TransCoder) have demonstrated impressive gains in both translation accuracy and readability over rule-based counterparts (e.g., j2py), largely thanks to task-specific pre-training on extensive monolingual corpora. Nonetheless, despite these advancements, their performance remains unsatisfactory for practical deployment, and the associated training resources are prohibitively expensive. Large Language Models (LLMs), pre-trained on huge amounts of human-written code and text, have shown remarkable performance in many software engineering tasks (e.g., code generation and program repair) owing to their strong generality, even without task-specific re-training or fine-tuning. LLMs can therefore potentially circumvent the above limitations, but this potential has not been thoroughly explored yet.
In this paper, we perform the first extensive study of five LLMs and three state-of-the-art learning-based transpilers on automated code translation between Python, Java, and C++. Our investigation finds that, although certain LLMs already outperform current transpilers, their accuracy still falls short. Taking GPT-3.5, one of the state-of-the-art LLMs, as an example, we carry out an in-depth analysis and categorization of its translation failures. The results show that most failures are induced by (1) a lack of comprehension of the source programs (38.51%), (2) missing clear instructions on Input/Output (I/O) types in translation (14.94%), and (3) ignoring the discrepancies between source and target programs (41.38%).
Enlightened by the above findings, we further propose UniTrans, a Unified code Translation framework applicable to various LLMs, to unleash their power in this field. Specifically, UniTrans first crafts a series of test cases for the target programs with the assistance of the source programs. Next, because test cases encode the programs' requirements and carry explicit I/O type instructions, UniTrans harnesses them to augment the code translation and then evaluates the translations' correctness via execution. Afterward, to alleviate failures caused by ignored discrepancies, UniTrans repairs incorrectly translated programs prompted by the test-case execution results, with an option of iterative repair for practitioners. Extensive experiments are conducted on six translation-dataset settings among Python, Java, and C++. Three state-of-the-art LLMs of diverse sizes, namely GPT-3.5, LLaMA-13B, and LLaMA-7B, are tested with UniTrans, and all achieve substantial improvements in Computational Accuracy (CA) and Exact Match Accuracy (EM Acc) across almost all translation settings, showing the universal effectiveness of UniTrans in practice.
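To make the workflow concrete, here is a minimal sketch of the three-stage pipeline the abstract describes, assuming a generic `llm` completion callable (e.g., a thin wrapper around an API client). The helper names, prompt wording, and repair budget are illustrative assumptions, not the paper's actual implementation:

```python
import subprocess
import tempfile


def run_python(code: str, timeout: int = 10) -> tuple[bool, str]:
    """Write a candidate translation to a temp file, execute it, and report (passed, log)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run(["python", path], capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return False, "timeout"
    return proc.returncode == 0, proc.stdout + proc.stderr


def unitrans(java_source: str, llm, max_repairs: int = 3) -> str:
    """Java-to-Python translation following the three UniTrans stages (sketch)."""
    # Stage 1: craft test cases for the target program with help from the source program.
    tests = llm(
        "Write Python assert-based test cases for a faithful Python "
        f"translation of this Java program:\n{java_source}"
    )
    # Stage 2: translate with the tests in the prompt, so the model sees the
    # program's requirements and the expected I/O types.
    candidate = llm(
        "Translate this Java program to Python so that the following tests pass.\n"
        f"Tests:\n{tests}\n\nJava source:\n{java_source}"
    )
    # Stage 3: execute the tests; on failure, feed the log back for (iterative) repair.
    for _ in range(max_repairs):
        passed, log = run_python(candidate + "\n\n" + tests)
        if passed:
            break
        candidate = llm(
            f"This Python translation fails its tests.\nExecution log:\n{log}\n"
            f"Fix the program:\n{candidate}"
        )
    return candidate
```

Note the design choice this sketch highlights: the same test cases serve double duty, first as prompt augmentation that spells out requirements and I/O types, then as an execution oracle whose failure logs drive the repair loop.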
Wed 17 Jul (displayed time zone: Brasilia, Distrito Federal, Brazil)
14:00 - 15:30 | AI4SE 1 | Research Papers | Room: Pitomba | Chair(s): Mauro Pezze (USI Università della Svizzera Italiana & SIT Schaffhausen Institute of Technology)
14:00 (18m) Talk | Are Human Rules Necessary? Generating Reusable APIs with CoT Reasoning and In-context Learning | Research Papers | Yubo Mai (Zhejiang University), Zhipeng Gao (Shanghai Institute for Advanced Study, Zhejiang University), Xing Hu (Zhejiang University), Lingfeng Bao (Zhejiang University), Yu Liu (Zhejiang University), JianLing Sun (Zhejiang University)
14:18 (18m) Talk | CodeArt: Better Code Models by Attention Regularization When Symbols Are Lacking | Research Papers | Zian Su (Purdue University), Xiangzhe Xu (Purdue University), Ziyang Huang (Purdue University), Zhuo Zhang (Purdue University), Yapeng Ye (Purdue University), Jianjun Huang (Renmin University of China), Xiangyu Zhang (Purdue University)
14:36 (18m) Talk | Enhancing Code Understanding for Impact Analysis by Combining Transformers and Program Dependence Graphs | Research Papers | Yanfu Yan (William & Mary), Nathan Cooper (William & Mary), Kevin Moran (University of Central Florida), Gabriele Bavota (Software Institute @ Università della Svizzera Italiana), Denys Poshyvanyk (William & Mary), Steve Rich (Cisco Systems)
14:54 (18m) Talk | Exploring and Unleashing the Power of Large Language Models in Automated Code Translation | Research Papers | Zhen Yang (Shandong University), Fang Liu (Beihang University), Zhongxing Yu (Shandong University), Jacky Keung (City University of Hong Kong), Jia Li (Peking University), Shuo Liu (City University of Hong Kong), Hong Yifan (City University of Hong Kong), Xiaoxue Ma (City University of Hong Kong), Zhi Jin (Peking University), Ge Li (Peking University) | Pre-print
15:12 (18m) Talk | Glitch Tokens in Large Language Models: Categorization Taxonomy and Effective Detection | Research Papers | Yuxi Li (Huazhong University of Science and Technology), Yi Liu (Nanyang Technological University), Gelei Deng (Nanyang Technological University), Ying Zhang (Virginia Tech), Wenjia Song (Virginia Tech), Ling Shi (Nanyang Technological University), Kailong Wang (Huazhong University of Science and Technology), Yuekang Li (The University of New South Wales), Yang Liu (Nanyang Technological University), Haoyu Wang (Huazhong University of Science and Technology)