Wed 17 Jul 2024 11:00 - 11:18 at Acerola - Software Maintenance and Comprehension 1 Chair(s): Wesley Assunção

Reverse engineers would gain valuable insights from descriptive function names, which are absent in publicly released binaries. Recent advances in binary function name prediction using data-driven machine learning show promise. However, existing approaches have difficulty capturing function semantics across diversely optimized binaries and fail to preserve the meaning of labels in function names. We propose Epitome, a framework that enhances function name prediction using votes-based name tokenization and multi-task learning, specifically tailored to binaries compiled at different optimization levels. Epitome learns comprehensive function semantics using a pre-trained assembly language model and a graph neural network, and incorporates a function semantics similarity prediction task to maximize the similarity of function semantics across different compilation optimization levels. In addition, we present two data preprocessing methods to improve the comprehensibility of function names. We evaluate the performance of Epitome using 2,597,346 functions extracted from binaries compiled with 5 optimization levels (O0-Os) for 4 architectures (x64, x86, ARM, and MIPS). Epitome outperforms the state-of-the-art function name prediction tool by up to 41.41%, 66.87%, and 54.34% in precision, recall, and F1 score, respectively, while also exhibiting superior generalizability. Finally, through case studies involving firmware images, we show the practical applications of Epitome in real-world scenarios.

Wed 17 Jul

Displayed time zone: Brasilia, Distrito Federal, Brazil

11:00 - 12:30
Software Maintenance and Comprehension 1
Research Papers / Ideas, Visions and Reflections / Demonstrations at Acerola
Chair(s): Wesley Assunção North Carolina State University
11:00
18m
Talk
Enhancing Function Name Prediction using Votes-Based Name Tokenization and Multi-Task Learning
Research Papers
Xiaoling Zhang Institute of Information Engineering, Chinese Academy of Sciences, School of Cyber Security, University of Chinese Academy of Sciences, Zhengzi Xu Nanyang Technological University, Shouguo Yang Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China, Zhi Li Institute of Information Engineering, Chinese Academy of Sciences, China, Zhiqiang Shi Institute of Information Engineering, Chinese Academy of Sciences, School of Cyber Security, University of Chinese Academy of Sciences, Limin Sun Institute of Information Engineering, Chinese Academy of Sciences, School of Cyber Security, University of Chinese Academy of Sciences
DOI Pre-print
11:18
18m
Talk
Only diff is Not Enough: Generating Commit Messages Leveraging Reasoning and Action of Large Language Model
Distinguished Paper Award
Research Papers
Jiawei Li University of California, Irvine, David Faragó Innoopract GmbH & QPR Technologies, Christian Petrov Innoopract GmbH, Iftekhar Ahmed University of California, Irvine
11:36
18m
Talk
Towards Efficient Build Ordering for Incremental Builds with Multiple Configurations
Research Papers
Jun Lyu Nanjing University, Shanshan Li Software Institute, Nanjing University, He Zhang Nanjing University, Lanxin Yang Nanjing University, Bohan Liu Nanjing University, Manuel Rigger National University of Singapore
11:54
18m
Talk
Unprecedented Code Change Automation: The Fusion of LLMs and Transformation by Example
Research Papers
Malinda Dilhara University of Colorado Boulder, Abhiram Bellur University of Colorado Boulder, Timofey Bryksin JetBrains Research, Danny Dig University of Colorado Boulder, JetBrains Research
Pre-print
12:12
9m
Talk
Variability-Aware Differencing with DiffDetective
Best Demo Paper
Demonstrations
Paul Maximilian Bittner Paderborn University, Alexander Schultheiß Paderborn University, Benjamin Moosherr University of Ulm, Timo Kehrer University of Bern, Thomas Thüm Paderborn University
Pre-print Media Attached
12:21
9m
Talk
From Models to Practice: Enhancing OSS Project Sustainability with Evidence-Based Advice
Ideas, Visions and Reflections
Nafiz Imtiaz Khan Department of Computer Science, University of California, Davis, Vladimir Filkov University of California at Davis, USA
Link to publication DOI