Natural Is The Best: Model-Agnostic Code Simplification for Pre-trained Large Language Models
Pre-trained Large Language Models (LLMs) have achieved remarkable successes in several domains. However, code-oriented LLMs are often computationally heavy, with a cost that grows quadratically with the length of the input code sequence. To simplify the input program of an LLM, the state-of-the-art approach filters the input code tokens based on the attention scores given by the LLM. However, the decision to simplify the input program should not rely on the attention patterns of an LLM, as these patterns are influenced by both the model architecture and the pre-training dataset. Since the model and dataset are part of the solution domain, not the problem domain where the input program belongs, the outcome may differ when the model is trained on a different dataset. We propose SlimCode, a model-agnostic code simplification solution for LLMs that depends on the nature of the input code tokens. We conduct an empirical study on LLMs including CodeBERT, CodeT5, and GPT-4 for two main tasks: code search and summarization. We report that 1) the reduction ratio of code has a linear-like relation with the saving ratio on training time, 2) the impact of categorized tokens on code simplification can vary significantly, 3) the impact of categorized tokens on code simplification is task-specific but model-agnostic, and 4) the above findings also hold for the paradigm of prompt engineering and interactive in-context learning, where this approach can reduce the cost of invoking GPT-4 by 24% per API query. Importantly, SlimCode simplifies the input code with its greedy strategy and is up to 133 times faster than the state-of-the-art technique, with a significant improvement. This paper calls for a new direction on code-based, model-agnostic code simplification solutions to further empower LLMs.
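As a rough illustration of the idea of simplifying code by token category rather than by attention score, the sketch below greedily drops tokens, one category at a time, until a target reduction ratio is reached. It is not the authors' implementation: the category names, their removal order, and the function name `simplify` are assumptions made for this example, since the abstract does not specify them.

```python
from typing import List, Tuple

# Hypothetical removal order: categories listed first are dropped first.
# This ranking is an assumption for illustration, not taken from the paper.
REMOVAL_ORDER = ["symbol", "keyword", "identifier"]


def simplify(tokens: List[Tuple[str, str]], ratio: float) -> List[str]:
    """Greedily drop tokens category by category until roughly `ratio`
    of the tokens are removed. Surviving tokens keep their original order."""
    budget = int(len(tokens) * ratio)  # number of tokens to remove
    removed = set()
    for category in REMOVAL_ORDER:
        for i, (_, cat) in enumerate(tokens):
            if budget == 0:
                break
            if cat == category:
                removed.add(i)
                budget -= 1
    return [text for i, (text, _) in enumerate(tokens) if i not in removed]


# Each token is (text, category); the categories are assigned by hand here.
tokens = [
    ("int", "keyword"), ("add", "identifier"), ("(", "symbol"),
    ("a", "identifier"), (",", "symbol"), ("b", "identifier"),
    (")", "symbol"), ("{", "symbol"), ("return", "keyword"),
    ("a", "identifier"), ("+", "symbol"), ("b", "identifier"),
    (";", "symbol"), ("}", "symbol"),
]
slim = simplify(tokens, 0.5)
print(" ".join(slim))
```

With a reduction ratio of 0.5, the seven symbol tokens in this toy input are removed first, and the keywords and identifiers survive. Because the decision uses only the token categories, it does not depend on any particular model or pre-training dataset.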
Fri 19 Jul. Displayed time zone: Brasilia, Distrito Federal, Brazil
10:30 - 11:00
10:30 30m Poster | Understanding the Impact of APIs Behavioral Breaking Changes on Client Applications | Posters | Dhanushka Jayasuriya University of Auckland, Valerio Terragni University of Auckland, Jens Dietrich Victoria University of Wellington, Kelly Blincoe University of Auckland
10:30 30m Poster | Your Code Secret Belongs to Me: Neural Code Completion Tools Can Memorize Hard-coded Credentials | Posters | Yizhan Huang The Chinese University of Hong Kong, Yichen LI The Chinese University of Hong Kong, Weibin Wu Sun Yat-sen University, Jianping Zhang The Chinese University of Hong Kong, Michael Lyu The Chinese University of Hong Kong
10:30 30m Poster | Natural Is The Best: Model-Agnostic Code Simplification for Pre-trained Large Language Models | Posters | Yan Wang Central University of Finance and Economics, Xiaoning Li Central University of Finance and Economics, Tien N. Nguyen University of Texas at Dallas, Shaohua Wang Central University of Finance and Economics, Chao Ni School of Software Technology, Zhejiang University, Ling Ding Central University of Finance and Economics
10:30 30m Poster | PyRadar: Towards Automatically Retrieving and Validating Source Code Repository Information for PyPI Packages | Posters | Kai Gao University of Science and Technology Beijing, Weiwei Xu Peking University, Wenhao Yang Peking University, Minghui Zhou Peking University
10:30 30m Poster | "The Law Doesn’t Work Like a Computer": Exploring Software Licensing Issues Faced by Legal Practitioners | Posters | Nathan Wintersgill William & Mary, Trevor Stalnaker William & Mary, Laura A. Heymann William & Mary, Oscar Chaparro William & Mary, Denys Poshyvanyk William & Mary
10:30 30m Poster | RavenBuild: Context, Relevance, and Dependency Aware Build Outcome Prediction | Posters | Gengyi Sun University of Waterloo, Sarra Habchi Ubisoft Montréal, Shane McIntosh University of Waterloo
10:30 30m Poster | MirrorFair: Fixing Fairness Bugs in Machine Learning Software via Counterfactual Predictions | Posters | Ying Xiao King's College London / Southern University of Science and Technology, Jie M. Zhang King's College London, Yepang Liu Southern University of Science and Technology, Mohammad Reza Mousavi King's College London, Sicen Liu Southern University of Science and Technology, Dingyuan Xue Southern University of Science and Technology
10:30 30m Poster | Do Code Generation Models Think Like Us? - A Study of Attention Alignment between Large Language Models and Human Programmers | Posters | Bonan Kou Purdue University, Shengmai Chen Purdue University, Zhijie Wang University of Alberta, Lei Ma The University of Tokyo & University of Alberta, Tianyi Zhang Purdue University
10:30 30m Poster | Dependency-Induced Waste in Continuous Integration: An Empirical Study on NPM Dependencies | Posters | Nimmi Weeraddana University of Waterloo, Mahmoud Alfadel University of Waterloo, Shane McIntosh University of Waterloo
10:30 30m Poster | A Miss Is as Good as A Mile: Metamorphic Testing for Deep Learning Operators | Posters | Jinyin Chen Zhejiang University of Technology, Chengyu Jia Zhejiang University of Technology, Yunjie Yan Zhejiang University of Technology, Jie Ge Zhejiang University of Technology, haibin zheng Zhejiang University of Technology, Yao Cheng TÜV SÜD Asia Pacific Pte. Ltd.
10:30 30m Poster | Investigating Documented Privacy Changes in Android OS | Posters | Chuan Yan University of Queensland, Mark Huasong Meng National University of Singapore, Fuman Xie University of Queensland, Guangdong Bai University of Queensland
10:30 30m Poster | Analyzing Quantum Programs with LintQ: A Static Analysis Framework for Qiskit | Posters
10:30 30m Poster | Generative AI for Pull Request Descriptions: Adoption, Impact, and Developer Interventions | Posters | Tao Xiao Nara Institute of Science and Technology, Hideaki Hata Shinshu University, Christoph Treude Singapore Management University, Kenichi Matsumoto Nara Institute of Science and Technology
10:30 30m Poster | Bloat beneath Python's Scales: A Fine-Grained Inter-Project Dependency Analysis | Posters | Georgios-Petros Drosos ETH Zurich, Thodoris Sotiropoulos ETH Zurich, Diomidis Spinellis Athens University of Economics and Business & Delft University of Technology, Dimitris Mitropoulos University of Athens
This room is conjoined with the Foyer to provide additional space for the coffee break and to hold poster presentations throughout the event.