Fri 19 Jul 2024 14:00 - 14:18 at Acerola - Security and Privacy 2 Chair(s): Kihong Heo

In recent years, a plethora of Large Code Generation Models (LCGMs) have been proposed, showing significant potential for assisting developers with complex programming tasks. Amid this surge of proposals, a critical aspect of code generation research is effectively benchmarking the programming capabilities of each model. Benchmarking LCGMs requires a diverse set of programming problems, each comprising a prompt, a canonical solution, and test inputs. Existing methods for constructing such problem sets fall into two main types: manual and perturbation-based. Both have major limitations. First, manual methods require substantial human effort and do not scale. Moreover, manually created problem sets struggle to maintain long-term data integrity: because LCGMs greedily collect public training data, benchmark problems eventually leak into training sets. Perturbation-based approaches, on the other hand, primarily produce semantically homogeneous problems whose canonical solutions are identical to those of the seed problems. They also tend to introduce typos into the prompt that are easily flagged by IDEs, making the generated problems unrealistic.

Addressing these limitations raises three challenges: (1) how to automatically generate semantically diverse canonical solutions, (2) how to ensure long-term data integrity, and (3) how to generate grammatically correct programming problems. For the first challenge, our key insight is to view a program as a mapping from an input domain to an output domain: the output of one program can serve as the input to another. Building on this insight, we propose programming problem merging, which combines two existing programming problems to create semantically diverse new ones. For the second challenge, we introduce randomness into the generation process: given a sufficiently large random search space, the tool can guarantee, with high confidence, that two random trials will not produce the same problem. For the third challenge, we propose the Lambda Programming Problem, a concise one-sentence task description in natural language paired with a corresponding program implementation. Since each task description is grammatically correct, prompts composed from them remain grammatically correct. The tool additionally uses return-value type analysis to verify the correctness of newly created canonical solutions.

In our empirical evaluation, we apply the tool to two widely used datasets and compare it against six baseline methods on eight code generation models. The results demonstrate that our tool generates more challenging, diverse, and natural coding problems than the baselines.
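To make the merging idea concrete, the sketch below composes two HumanEval-style problems in Python. This is a minimal illustration of what the abstract describes, not the authors' implementation; the names LambdaProblem, merge_problems, return_type, and first_param_type are hypothetical. It treats each canonical solution as a typed mapping and uses return-value type analysis to check that the first problem's output type matches the second problem's input type before chaining them.

```python
# Illustrative sketch of "programming problem merging" (hypothetical names,
# not the paper's API): a program maps inputs to outputs, so the output of
# one canonical solution can feed the input of a type-compatible one.
from dataclasses import dataclass
from typing import Any, Callable, get_type_hints

@dataclass
class LambdaProblem:
    """A Lambda Programming Problem: a one-sentence task description in
    natural language plus a program implementing it."""
    description: str      # concise one-sentence task description
    solution: Callable    # canonical solution
    test_inputs: list     # test inputs for the problem

def return_type(fn: Callable) -> Any:
    """Return-value type of a solution, read from its annotations."""
    return get_type_hints(fn).get("return")

def first_param_type(fn: Callable) -> Any:
    """Type of the solution's first parameter."""
    hints = get_type_hints(fn)
    hints.pop("return", None)
    return next(iter(hints.values()), None)

def merge_problems(a: LambdaProblem, b: LambdaProblem) -> LambdaProblem:
    """Compose two problems into a semantically new one: run a's solution,
    then feed its output into b's solution. Return-value type analysis
    guards the composition."""
    if return_type(a.solution) != first_param_type(b.solution):
        raise TypeError("incompatible problems: output/input types differ")

    def merged_solution(x):
        return b.solution(a.solution(x))

    merged_solution.__annotations__ = {
        "x": first_param_type(a.solution),
        "return": return_type(b.solution),
    }
    # Concatenating two grammatical one-sentence descriptions keeps the
    # merged prompt grammatical.
    merged_description = (
        f"{a.description} Then, {b.description[0].lower()}{b.description[1:]}"
    )
    # a's test inputs remain valid: their outputs flow into b.
    return LambdaProblem(merged_description, merged_solution, a.test_inputs)

# Two seed problems in the HumanEval style.
def sort_desc(nums: list) -> list:
    return sorted(nums, reverse=True)

def sum_list(nums: list) -> int:
    return sum(nums)

a = LambdaProblem("Sort the list in descending order.", sort_desc, [[3, 1, 2]])
b = LambdaProblem("Return the sum of the given list.", sum_list, [[1, 2]])

# In the full pipeline, seed pairs would be drawn at random from a large
# pool, so two independent runs rarely yield the same merged problem.
merged = merge_problems(a, b)
print(merged.description)          # merged two-sentence prompt
print(merged.solution([3, 1, 2]))  # 6
```

A full generator would presumably also smooth the merged prompt's wording and regenerate reference outputs by executing the chained solution on the sampled inputs; those details are omitted here.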

Fri 19 Jul

Displayed time zone: Brasilia, Distrito Federal, Brazil

14:00 - 15:30
Security and Privacy 2 (Industry Papers / Research Papers) at Acerola
Chair(s): Kihong Heo (KAIST)
14:00
18m
Talk
PPM: Automated Generation of Diverse Programming Problems for Benchmarking Code Generation Models
Research Papers
Simin Chen (University of Texas at Dallas), XiaoNing Feng (Taiyuan University of Technology), Xiaohong Han (Taiyuan University of Technology), Cong Liu (University of California, Riverside), Wei Yang (University of Texas at Dallas)
14:18
18m
Talk
Demystifying Invariant Effectiveness for Securing Smart Contracts
Research Papers
Zhiyang Chen (University of Toronto), Ye Liu (Nanyang Technological University), Sidi Mohamed Beillahi (University of Toronto), Yi Li (Nanyang Technological University), Fan Long (University of Toronto)
14:36
18m
Talk
Static Application Security Testing (SAST) Tools for Smart Contracts: How Far Are We? (Distinguished Paper Award)
Research Papers
Kaixuan Li (East China Normal University), Yue Xue (Metatrust Labs), Sen Chen (Tianjin University), Han Liu (East China Normal University), Kairan Sun (Nanyang Technological University), Ming Hu (Singapore Management University), Haijun Wang (Xi'an Jiaotong University), Yang Liu (Nanyang Technological University), Yixiang Chen (East China Normal University)
14:54
18m
Talk
On the Contents and Utility of IoT Cybersecurity Guidelines
Research Papers
Jesse Chen (University of Arizona), Dharun Anandayuvaraj (Purdue University), James C. Davis (Purdue University), Sazzadur Rahaman (University of Arizona)
15:12
18m
Talk
CVECenter: Industry Practice of Automated Vulnerability Management for Linux Distribution Community
Industry Papers
Jing Luo (Central South University), Heyuan Shi (Central South University), Yongchao Zhang (Alibaba), Runzhe Wang (Alibaba Group), Yuheng Shen (Tsinghua University), Yuao Chen (Alibaba), Rongkai Liu (Central South University), Xiaohai Shi (Alibaba Group), Chao Hu (Central South University), Yu Jiang (Tsinghua University)