PPM: Automated Generation of Diverse Programming Problems for Benchmarking Code Generation Models
In recent years, a plethora of Large Code Generation Models (LCGMs) have been proposed, showing significant potential for assisting developers with complex programming tasks. Amid this surge of proposals, a critical aspect of code generation research is effectively benchmarking the programming capabilities of each model. Benchmarking LCGMs requires a diverse set of programming problems, each comprising a prompt, a canonical solution, and test inputs.

Existing methods for constructing such problem sets fall into two categories: manual construction and perturbation of existing problems. Both have major limitations. Manual construction requires substantial human effort and does not scale, and manually created problem sets struggle to maintain long-term data integrity because LCGMs greedily collect training data, so published problems eventually leak into training corpora. Perturbation-based approaches, in contrast, mainly produce semantically homogeneous problems whose canonical solutions are identical to those of the seed problems, and they tend to introduce typos into the prompt that IDEs easily detect, making the perturbed problems unrealistic.

Addressing these limitations raises three challenges: (1) automatically generating semantically diverse canonical solutions, (2) ensuring long-term data integrity, and (3) generating grammatically correct programming problems. For the first challenge, our key insight is to view a program as a mapping from an input domain to an output domain: the output of one program can serve as the input of another. Building on this insight, we propose programming problem merging, which combines two existing programming problems into a semantically new one. For the second challenge, we introduce randomness into the generation process; because the random search space is large, two independent generation trials repeat a problem only with negligible probability (for a uniform search space of size N, the collision probability is 1/N). For the third challenge, we propose the Lambda Programming Problem: a concise, grammatically correct one-sentence task description in natural language paired with a corresponding program implementation. Because each appended sentence is grammatically correct, the merged prompt remains grammatically correct, and the tool additionally uses return-value type analysis to verify the correctness of newly created canonical solutions.

In our empirical evaluation, we apply our tool to two widely used datasets and compare it against six baseline methods on eight code generation models. The results demonstrate that our tool generates challenging, diverse, and natural coding problems, surpassing the baselines.
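To make the merging idea concrete, here is a minimal Python sketch of how two problems could be composed. This is our illustration of the technique described in the abstract, not the paper's implementation; the `Problem`, `LambdaProblem`, and `merge_problems` names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, List

@dataclass
class Problem:
    prompt: str                     # natural-language task description
    solution: Callable[[Any], Any]  # canonical solution
    test_inputs: List[Any]          # inputs used to exercise the solution

@dataclass
class LambdaProblem:
    description: str                # one grammatically correct sentence
    func: Callable[[Any], Any]      # its program implementation

def merge_problems(seed: Problem, lam: LambdaProblem) -> Problem:
    """Feed the seed solution's output into the lambda problem's program,
    producing a new problem whose canonical solution differs from the seed's."""
    new_prompt = seed.prompt.rstrip() + " " + lam.description
    new_solution = lambda x: lam.func(seed.solution(x))
    return Problem(new_prompt, new_solution, seed.test_inputs)

# Example: a seed that sums a list, merged with a squaring lambda problem.
seed = Problem("Return the sum of the given list of integers.",
               lambda xs: sum(xs), [[1, 2, 3], [4, 5]])
square = LambdaProblem("Then return the square of that result.",
                       lambda y: y * y)
merged = merge_problems(seed, square)
assert merged.solution([1, 2, 3]) == 36  # (1 + 2 + 3) ** 2
```

Because the merged prompt is the seed prompt plus one grammatically correct sentence, no typos are introduced, unlike perturbation-based approaches.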
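The return-value type analysis could similarly be approximated with a dynamic check before merging. The sketch below reuses the records from the example above; `returns_type` is again a hypothetical name, and the real tool may well analyze types statically.

```python
def returns_type(seed: Problem, expected: type) -> bool:
    """Dynamically check that the seed solution's return values match the
    input type a candidate lambda problem expects, before merging."""
    return all(isinstance(seed.solution(x), expected) for x in seed.test_inputs)

# The sum-of-list seed returns ints, valid inputs for the squaring lambda.
assert returns_type(seed, int)
```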
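The data-integrity claim can also be sketched with back-of-the-envelope numbers (ours, not the paper's): chaining k lambda problems drawn uniformly from a pool of size m yields m ** k distinct variants, so two independent trials coincide with probability 1 / m ** k.

```python
import random

def random_chain(pool, k, rng=random):
    """Sample k lambda problems uniformly (with replacement); the resulting
    search space has size len(pool) ** k."""
    return [rng.choice(pool) for _ in range(k)]

# A pool of 50 lambda problems chained 4 deep gives 50 ** 4 (6,250,000)
# variants, so two trials repeat a problem with probability about 1.6e-7.
```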
Fri 19 Jul, 14:00 - 15:30 (times in Brasilia, Distrito Federal, Brazil)

14:00 (18 min, Talk) PPM: Automated Generation of Diverse Programming Problems for Benchmarking Code Generation Models. Research Papers. Simin Chen (University of Texas at Dallas), XiaoNing Feng (Taiyuan University of Technology), Xiaohong Han (Taiyuan University of Technology), Cong Liu (University of California, Riverside), Wei Yang (University of Texas at Dallas)

14:18 (18 min, Talk) Demystifying Invariant Effectiveness for Securing Smart Contracts. Research Papers. Zhiyang Chen (University of Toronto), Ye Liu (Nanyang Technological University), Sidi Mohamed Beillahi (University of Toronto), Yi Li (Nanyang Technological University), Fan Long (University of Toronto)

14:36 (18 min, Talk) Static Application Security Testing (SAST) Tools for Smart Contracts: How Far Are We? Research Papers. Kaixuan Li (East China Normal University), Yue Xue (Metatrust Labs), Sen Chen (Tianjin University), Han Liu (East China Normal University), Kairan Sun (Nanyang Technological University), Ming Hu (Singapore Management University), Haijun Wang (Xi'an Jiaotong University), Yang Liu (Nanyang Technological University), Yixiang Chen (East China Normal University)

14:54 (18 min, Talk) On the Contents and Utility of IoT Cybersecurity Guidelines. Research Papers. Jesse Chen (University of Arizona), Dharun Anandayuvaraj (Purdue University), James C. Davis (Purdue University), Sazzadur Rahaman (University of Arizona)

15:12 (18 min, Talk) CVECenter: Industry Practice of Automated Vulnerability Management for Linux Distribution Community. Industry Papers. Jing Luo (Central South University), Heyuan Shi (Central South University), Yongchao Zhang (Alibaba), Runzhe Wang (Alibaba Group), Yuheng Shen (Tsinghua University), Yuao Chen (Alibaba), Rongkai Liu (Central South University), Xiaohai Shi (Alibaba Group), Chao Hu (Central South University), Yu Jiang (Tsinghua University)