Bin2Summary: Beyond Function Name Prediction in Stripped Binaries with Functionality-specific Code Embeddings (FSE 2024 - Research Papers)

Mon 15 - Fri 19 July 2024 Porto de Galinhas, Brazil, Brazil

Who

Zirui Song, Jiongyi Chen, Kehuan Zhang

Track

FSE 2024 Research Papers

Time Zone

The program is currently displayed in (GMT-03:00) Brasilia, Distrito Federal, Brazil.

Use conference time zone: (GMT-03:00) Brasilia, Distrito Federal, BrazilSelect other time zone

The GMT offsets shown reflect the offsets at the moment of the conference.

Time Band

By setting a time band, the program will dim events that are outside this time window. This is useful for (virtual) conferences with a continuous program (with repeated sessions).
The time band will also limit the events that are included in the personal iCalendar subscription service.

Display full programSpecify a time band

Save

When

Fri 19 Jul 2024 14:00 - 14:18 at Mandacaru - Program Analysis and Performance 3 Chair(s): Shaukat Ali

Abstract

Nowadays, closed-source software only with stripped binaries still dominates the ecosystem, which brings obstacles to understanding the functionalities of the software and further conducting the security analysis. With such an urgent need, research has traditionally focused on predicting function names, which can only provide fragmented and abbreviated information about functionality. To advance the state-of-the-art, this paper presents Bin2Summary to automatically summarize the functionality of the function in stripped binaries with natural language sentences. Specifically, the proposed framework includes a functionality-specific code embedding module to facilitate fine-grained similarity detection and an attention-based seq2seq model to generate summaries in natural language. Based on 16 widely-used projects (e.g., Coreutils), we have evaluated Bin2Summary with 38,167 functions, which are filtered from 162,406 functions, and all of them have a high-quality comment. Bin2Summary achieves 0.728 in precision and 0.729 in recall on our datasets, and the functionality-specific embedding module can improve the existing assembly language model by up to 109.5% and 109.9% in precision and recall. Meanwhile, the experiments demonstrated that Bin2Summary has outstanding transferability in analyzing the cross-architecture (i.e., in x64 and x86) and cross-environment (i.e., in Cygwin and MSYS2) binaries. Finally, the case study illustrates how Bin2Summary outperforms the existing works in providing functionality summaries with abundant semantics beyond function names.

Zirui Song

The Chinese University of Hong Kong

Jiongyi Chen

National University of Defense Technology

Kehuan Zhang