Enhancing Function Name Prediction using Votes-Based Name Tokenization and Multi-Task Learning
Reverse engineers gain valuable insights from descriptive function names, but these names are absent from publicly released binaries. Recent advances in binary function name prediction using data-driven machine learning show promise. However, existing approaches have difficulty capturing function semantics across binaries compiled at diverse optimization levels, and they fail to preserve the meaning of labels in function names. We propose Epitome, a framework that enhances function name prediction using votes-based name tokenization and multi-task learning, specifically tailored to binaries compiled with different optimization levels. Epitome learns comprehensive function semantics using a pre-trained assembly language model and a graph neural network, and it incorporates a function semantics similarity prediction task to maximize the similarity of function semantics across compilation optimization levels. In addition, we present two data preprocessing methods to improve the comprehensibility of function names. We evaluate the performance of Epitome using 2,597,346 functions extracted from binaries compiled with 5 optimization levels (O0-Os) for 4 architectures (x64, x86, ARM, and MIPS). Epitome outperforms the state-of-the-art function name prediction tool by up to 41.41%, 66.87%, and 54.34% in precision, recall, and F1 score, respectively, while also exhibiting superior generalizability. Finally, through case studies involving firmware images, we show the practical applications of Epitome in real-world scenarios.
Wed 17 Jul (displayed time zone: Brasilia, Distrito Federal, Brazil)
11:00 - 12:30 | Software Maintenance and Comprehension 1 | Research Papers / Ideas, Visions and Reflections / Demonstrations at Acerola | Chair(s): Wesley Assunção (North Carolina State University)
11:00 18m Talk | Enhancing Function Name Prediction using Votes-Based Name Tokenization and Multi-Task Learning | Research Papers | Xiaoling Zhang (Institute of Information Engineering, Chinese Academy of Sciences, School of Cyber Security, University of Chinese Academy of Sciences); Zhengzi Xu (Nanyang Technological University); Shouguo Yang (Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China); Zhi Li (Institute of Information Engineering, Chinese Academy of Sciences, China); Zhiqiang Shi (Institute of Information Engineering, Chinese Academy of Sciences, School of Cyber Security, University of Chinese Academy of Sciences); Limin Sun (Institute of Information Engineering, Chinese Academy of Sciences, School of Cyber Security, University of Chinese Academy of Sciences)
11:18 18m Talk | Only diff is Not Enough: Generating Commit Messages Leveraging Reasoning and Action of Large Language Model | Research Papers | Jiawei Li (University of California, Irvine); David Faragó (Innoopract GmbH & QPR Technologies); Christian Petrov (Innoopract GmbH); Iftekhar Ahmed (University of California, Irvine)
11:36 18m Talk | Towards Efficient Build Ordering for Incremental Builds with Multiple Configurations | Research Papers | Jun Lyu (Nanjing University); Shanshan Li (Software Institute, Nanjing University); He Zhang (Nanjing University); Lanxin Yang (Nanjing University); Bohan Liu (Nanjing University); Manuel Rigger (National University of Singapore)
11:54 18m Talk | Unprecedented Code Change Automation: The Fusion of LLMs and Transformation by Example | Research Papers | Malinda Dilhara (University of Colorado Boulder); Abhiram Bellur (University of Colorado Boulder); Timofey Bryksin (JetBrains Research); Danny Dig (University of Colorado Boulder, JetBrains Research)
12:12 9m Talk | Variability-Aware Differencing with DiffDetective (Best Demo Paper) | Demonstrations | Paul Maximilian Bittner (Paderborn University); Alexander Schultheiß (Paderborn University); Benjamin Moosherr (University of Ulm); Timo Kehrer (University of Bern); Thomas Thüm (Paderborn University)
12:21 9m Talk | From Models to Practice: Enhancing OSS Project Sustainability with Evidence-Based Advice | Ideas, Visions and Reflections | Nafiz Imtiaz Khan (Department of Computer Science, University of California, Davis); Vladimir Filkov (University of California at Davis, USA)