Enhancing Function Name Prediction using Votes-Based Name Tokenization and Multi-Task Learning (FSE 2024 - Research Papers)

Who

Xiaoling Zhang, Zhengzi Xu, Shouguo Yang, Zhi Li, Zhiqiang Shi, Limin Sun

Track

FSE 2024 Research Papers

Time Zone

The program is currently displayed in (GMT-03:00) Brasilia, Distrito Federal, Brazil.

When

Wed 17 Jul 2024 11:00 - 11:18 at Acerola - Software Maintenance and Comprehension 1 Chair(s): Wesley Assunção

Abstract

Descriptive function names give reverse engineers valuable insight, but they are absent from publicly released binaries. Recent data-driven machine learning approaches to binary function name prediction show promise. However, existing approaches struggle to capture function semantics across binaries compiled with different optimizations, and they fail to preserve the meaning of the labels in function names. We propose Epitome, a framework that enhances function name prediction using votes-based name tokenization and multi-task learning, tailored to binaries compiled at different optimization levels. Epitome learns comprehensive function semantics with a pre-trained assembly language model and a graph neural network, and adds a function semantics similarity prediction task to maximize the similarity of function semantics across compilation optimization levels. In addition, we present two data preprocessing methods to improve the comprehensibility of function names. We evaluate Epitome on 2,597,346 functions extracted from binaries compiled with 5 optimizations (O0-Os) for 4 architectures (x64, x86, ARM, and MIPS). Epitome outperforms the state-of-the-art function name prediction tool by up to 41.41%, 66.87%, and 54.34% in precision, recall, and F1 score, respectively, while also exhibiting superior generalizability. Finally, through case studies involving firmware images, we show the practical applications of Epitome in real-world scenarios.
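The abstract reports precision, recall, and F1 for predicted function names, but this page does not say how those scores are computed. The following is a minimal sketch, assuming token-level matching between a predicted and a reference name after splitting on underscores. The helpers `split_name` and `token_prf` are hypothetical names introduced for illustration; the splitting rule is a simplification and is not Epitome's votes-based tokenization.

```python
from collections import Counter


def split_name(name):
    """Split a function name into lowercase tokens on underscores.

    Assumption: underscore splitting only; this is NOT the votes-based
    tokenization described in the paper.
    """
    return [tok for tok in name.lower().split("_") if tok]


def token_prf(predicted, reference):
    """Return (precision, recall, F1) over the tokens of two names."""
    pred = Counter(split_name(predicted))
    ref = Counter(split_name(reference))
    # Multiset intersection counts tokens shared by both names.
    overlap = sum((pred & ref).values())
    precision = overlap / sum(pred.values()) if pred else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical example names, not taken from the paper's dataset.
    print(token_prf("read_file", "read_config_file"))
```

Under this assumption, a prediction that omits one token of a three-token reference name keeps full precision but loses recall, which is why the three scores can move independently.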

Link to Preprint

https://arxiv.org/abs/2405.09112

DOI

https://doi.org/10.1145/3660782

Xiaoling Zhang

Institute of Information Engineering, Chinese Academy of Sciences, School of Cyber Security, University of Chinese Academy of Sciences

Zhengzi Xu

Nanyang Technological University

Singapore

Shouguo Yang

Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China

Zhi Li

Institute of Information Engineering, Chinese Academy of Sciences, China

Zhiqiang Shi

Institute of Information Engineering, Chinese Academy of Sciences, School of Cyber Security, University of Chinese Academy of Sciences

Limin Sun

Institute of Information Engineering, Chinese Academy of Sciences, School of Cyber Security, University of Chinese Academy of Sciences

China

Session Program

Wed 17 Jul
Displayed time zone: Brasilia, Distrito Federal, Brazil

11:00 - 12:30	Software Maintenance and Comprehension 1 (Research Papers / Ideas, Visions and Reflections / Demonstrations) at Acerola. Chair(s): Wesley Assunção (North Carolina State University)

11:00 18m Talk		Enhancing Function Name Prediction using Votes-Based Name Tokenization and Multi-Task Learning. Research Papers. Xiaoling Zhang (Institute of Information Engineering, Chinese Academy of Sciences, School of Cyber Security, University of Chinese Academy of Sciences), Zhengzi Xu (Nanyang Technological University), Shouguo Yang (Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China), Zhi Li (Institute of Information Engineering, Chinese Academy of Sciences, China), Zhiqiang Shi (Institute of Information Engineering, Chinese Academy of Sciences, School of Cyber Security, University of Chinese Academy of Sciences), Limin Sun (Institute of Information Engineering, Chinese Academy of Sciences, School of Cyber Security, University of Chinese Academy of Sciences)
11:18 18m Talk		Only diff is Not Enough: Generating Commit Messages Leveraging Reasoning and Action of Large Language Model. Research Papers. Jiawei Li (University of California, Irvine), David Faragó (Innoopract GmbH & QPR Technologies), Christian Petrov (Innoopract GmbH), Iftekhar Ahmed (University of California, Irvine)
11:36 18m Talk		Towards Efficient Build Ordering for Incremental Builds with Multiple Configurations. Research Papers. Jun Lyu (Nanjing University), Shanshan Li (Software Institute, Nanjing University), He Zhang (Nanjing University), Lanxin Yang (Nanjing University), Bohan Liu (Nanjing University), Manuel Rigger (National University of Singapore)
11:54 18m Talk		Unprecedented Code Change Automation: The Fusion of LLMs and Transformation by Example. Research Papers. Malinda Dilhara (University of Colorado Boulder), Abhiram Bellur (University of Colorado Boulder), Timofey Bryksin (JetBrains Research), Danny Dig (University of Colorado Boulder, JetBrains Research)
12:12 9m Talk		Variability-Aware Differencing with DiffDetective (Best Demo Paper). Demonstrations. Paul Maximilian Bittner (Paderborn University), Alexander Schultheiß (Paderborn University), Benjamin Moosherr (University of Ulm), Timo Kehrer (University of Bern), Thomas Thüm (Paderborn University)
12:21 9m Talk		From Models to Practice: Enhancing OSS Project Sustainability with Evidence-Based Advice. Ideas, Visions and Reflections. Nafiz Imtiaz Khan (Department of Computer Science, University of California, Davis), Vladimir Filkov (University of California at Davis, USA)