Wed 17 Jul 2024 14:54 - 15:12 at Pitomba - AI4SE 1 Chair(s): Mauro Pezze

Automated code translation tools, known as transpilers, perform source-to-source translation (e.g., Java to Python) automatically. Current state-of-the-art learning-based transpilers (e.g., TransCoder) have demonstrated impressive improvements in both translation accuracy and readability over rule-based counterparts (e.g., j2py), largely thanks to task-specific pre-training on extensive monolingual corpora. Nonetheless, despite these advances, their performance remains unsatisfactory for practical deployment, and the associated training resources are prohibitively expensive. Large Language Models (LLMs), pre-trained on huge amounts of human-written code and text, have shown remarkable performance in many software engineering tasks (e.g., code generation and program repair) owing to their strong generality, even without task-specific re-training or fine-tuning. LLMs can thus potentially circumvent the above limitations, but they have not yet been thoroughly explored for code translation.

In this paper, we perform the first extensive study of five LLMs and three state-of-the-art learning-based transpilers on automated code translation between Python, Java, and C++. Our investigation finds that, although certain LLMs outperform current transpilers, they still suffer from accuracy issues. Taking GPT-3.5, one of the state-of-the-art LLMs, as an example, we carry out an in-depth analysis and categorization of its failures. The results show that most failures are caused by (1) a lack of comprehension of the source programs (38.51%), (2) missing clear instructions on Input/Output (I/O) types during translation (14.94%), and (3) ignoring discrepancies between source and target programs (41.38%).

Motivated by these findings, we further propose UniTrans, a Unified code Translation framework applicable to various LLMs, for unleashing their power in this field. Specifically, UniTrans first crafts a series of test cases for target programs with the assistance of the source programs. Next, because test cases convey the requirements programs must satisfy and carry explicit I/O type instructions, UniTrans harnesses them to augment the code translation and then evaluates translation correctness via execution. Afterward, to alleviate failures caused by ignored source-target discrepancies, UniTrans repairs incorrectly translated programs guided by test-case execution results, with an option of iterative repair also provided for practitioners. Extensive experiments are conducted on six translation settings between Python, Java, and C++. Three state-of-the-art LLMs of diverse sizes, i.e., GPT-3.5, LLaMA-13B, and LLaMA-7B, are tested with UniTrans, and all achieve substantial improvements in Computational Accuracy (CA) and Exact Match Accuracy (EM Acc) across almost all translation settings, showing the universal effectiveness of UniTrans in practice.
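The three-stage pipeline the abstract describes (test-case crafting, test-augmented translation, and execution-guided repair) can be sketched as follows. This is a minimal illustration only: the `llm` stub, the helper names, and the toy `add` program are hypothetical placeholders, not the paper's actual implementation.

```python
def llm(prompt):
    # Placeholder for a real LLM call (e.g., GPT-3.5). For illustration it
    # returns a buggy translation first, then a fixed one after "repair".
    llm.calls += 1
    if llm.calls == 1:
        return "def add(a, b): return a - b"  # deliberately wrong
    return "def add(a, b): return a + b"
llm.calls = 0

def run_tests(program_src, tests):
    """Execute a translated program against crafted test cases;
    return the (input, expected) pairs that fail."""
    scope = {}
    exec(program_src, scope)
    return [(inp, out) for inp, out in tests if scope["add"](*inp) != out]

def unitrans(source_program, tests, max_repairs=3):
    # Stage 1 (test crafting) is assumed done: `tests` were derived
    # from the source program.
    # Stage 2: test-augmented translation prompt.
    candidate = llm(f"Translate:\n{source_program}\nTests: {tests}")
    # Stage 3: iterative repair guided by test execution results.
    for _ in range(max_repairs):
        failures = run_tests(candidate, tests)
        if not failures:
            return candidate
        candidate = llm(f"Repair:\n{candidate}\nFailing cases: {failures}")
    return candidate

tests = [((1, 2), 3), ((0, 0), 0)]
result = unitrans("int add(int a, int b) { return a + b; }", tests)
```

Here the first candidate fails the crafted tests, so the failing cases are fed back as a repair prompt; the loop stops as soon as all tests pass, mirroring the optional iterative-repair mode mentioned above.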

Wed 17 Jul

Displayed time zone: Brasilia, Distrito Federal, Brazil

14:00 - 15:30
AI4SE 1: Research Papers at Pitomba
Chair(s): Mauro Pezze USI Università della Svizzera Italiana & SIT Schaffhausen Institute of Technology
14:00
18m
Talk
Are Human Rules Necessary? Generating Reusable APIs with CoT Reasoning and In-context Learning
Research Papers
Yubo Mai Zhejiang University, Zhipeng Gao Shanghai Institute for Advanced Study - Zhejiang University, Xing Hu Zhejiang University, Lingfeng Bao Zhejiang University, Yu Liu Zhejiang University, JianLing Sun Zhejiang University
14:18
18m
Talk
CodeArt: Better Code Models by Attention Regularization When Symbols Are Lacking
Research Papers
Zian Su Purdue University, Xiangzhe Xu Purdue University, Ziyang Huang Purdue University, Zhuo Zhang Purdue University, Yapeng Ye Purdue University, Jianjun Huang Renmin University of China, Xiangyu Zhang Purdue University
14:36
18m
Talk
Enhancing Code Understanding for Impact Analysis by Combining Transformers and Program Dependence Graphs
Research Papers
Yanfu Yan William & Mary, Nathan Cooper William & Mary, Kevin Moran University of Central Florida, Gabriele Bavota Software Institute @ Università della Svizzera Italiana, Denys Poshyvanyk William & Mary, Steve Rich Cisco Systems
14:54
18m
Talk
Exploring and Unleashing the Power of Large Language Models in Automated Code Translation
Research Papers
Zhen Yang Shandong University, Fang Liu Beihang University, Zhongxing Yu Shandong University, Jacky Keung City University of Hong Kong, Jia Li Peking University, Shuo Liu City University of Hong Kong, Hong Yifan City University of Hong Kong, Xiaoxue Ma City University of Hong Kong, Zhi Jin Peking University, Ge Li Peking University
Pre-print
15:12
18m
Talk
Glitch Tokens in Large Language Models: Categorization Taxonomy and Effective Detection
Research Papers
Yuxi Li Huazhong University of Science and Technology, Yi Liu Nanyang Technological University, Gelei Deng Nanyang Technological University, Ying Zhang Virginia Tech, Wenjia Song Virginia Tech, Ling Shi Nanyang Technological University, Kailong Wang Huazhong University of Science and Technology, Yuekang Li The University of New South Wales, Yang Liu Nanyang Technological University, Haoyu Wang Huazhong University of Science and Technology