Natural Symbolic Execution-based Testing for Big Data Analytics
Symbolic execution is an automated test input generation technique that models individual program paths as logical constraints. However, the realism of concrete test inputs generated by SMT solvers often comes into question. Existing symbolic execution tools only seek arbitrary solutions for given path constraints. These constraints do not incorporate the naturalness of inputs that observe statistical distributions, range constraints, or preferred string constants. This results in unnatural-looking inputs that fail to emulate real-world data.
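To make the problem concrete, the following is a minimal sketch, not taken from the paper, of how path constraints and arbitrary solutions interact. The program `classify`, the `PATH_CONSTRAINTS` table, and the `arbitrary_solution` helper are all hypothetical, and the helper is only a brute-force stand-in for an SMT solver that returns whichever satisfying assignment it finds first.

```python
from itertools import product

# Hypothetical program under test: classify a record of (age, zipcode).
def classify(age: int, zipcode: str) -> str:
    if age > 65:
        if zipcode.startswith("9"):
            return "senior-west"
        return "senior-other"
    return "non-senior"

# Each program path is the conjunction of the branch predicates taken along it.
PATH_CONSTRAINTS = {
    "senior-west": [lambda a, z: a > 65, lambda a, z: z.startswith("9")],
    "senior-other": [lambda a, z: a > 65, lambda a, z: not z.startswith("9")],
    "non-senior": [lambda a, z: not a > 65],
}

def arbitrary_solution(constraints):
    """Stand-in for an SMT solver: return the first satisfying assignment.

    The search order deliberately starts at boundary values, mimicking a
    solver that has no notion of which values look realistic.
    """
    ages = [-(2**31), 2**31 - 1, 0, 66]
    zipcodes = ["", "9", "\x00"]
    for age, zipcode in product(ages, zipcodes):
        if all(c(age, zipcode) for c in constraints):
            return age, zipcode
    return None

for label, constraints in PATH_CONSTRAINTS.items():
    print(label, arbitrary_solution(constraints))
```

Every path is covered, but with inputs such as an age of 2147483647 and a one-character zipcode: valid with respect to the constraints, yet unlike any real record.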
In this paper, we extend symbolic execution to account for input naturalness. Our key insight is that users typically understand the semantics of program inputs, such as the distribution of height or the possible values of a zipcode, and that this knowledge can be leveraged to make symbolic execution produce natural test inputs. We instantiate this idea in NaturalSym, a symbolic execution-based test generation tool for data-intensive scalable computing (DISC) applications. NaturalSym uses user-provided input semantics to generate natural-looking data that mimics real-world distributions, drastically enhancing the naturalness of inputs while preserving strong bug-finding potential.
On DISC applications and commercial big data test benchmarks, NaturalSym achieves a higher degree of realism, as evidenced by a perplexity score 35.1 points lower on median, and detects 1.29× as many injected faults as the state-of-the-art symbolic executor for DISC, BigTest. This is because BigTest draws inputs purely based on the satisfiability of path constraints constructed from branch predicates, while NaturalSym draws natural concrete values based on user-specified semantics and prioritizes these values during input generation. Our empirical results demonstrate that NaturalSym finds 47.8× more injected faults than NaturalFuzz (a coverage-guided fuzzer) and 19.1× more than ChatGPT. Meanwhile, TestMiner (a mining-based approach) fails to detect any injected faults. NaturalSym is the first symbolic executor that incorporates the notion of input naturalness into symbolic path constraints during SMT-based input generation. We make our code available at https://github.com/UCLA-SEAL/NaturalSym.
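The contrast between satisfiability-only solving and prioritizing natural values can be illustrated with a small sketch. It is an assumption-laden toy, not NaturalSym's implementation: the abstract describes naturalness being combined with path constraints during SMT-based generation, whereas this example simply samples candidates from hypothetical user-provided semantics (`AGE_MEAN`, `AGE_STD`, `AGE_RANGE`, `PREFERRED_ZIPCODES`) and keeps the first one that satisfies the path constraint.

```python
import random

# Hypothetical user-provided input semantics (all names and values are illustrative).
AGE_MEAN, AGE_STD, AGE_RANGE = 40.0, 15.0, (0, 120)
PREFERRED_ZIPCODES = ["90095", "24061", "10001", "60601"]

def natural_candidates(rng, n=5000):
    """Yield (age, zipcode) pairs drawn from the user-specified semantics."""
    lo, hi = AGE_RANGE
    for _ in range(n):
        age = int(round(rng.gauss(AGE_MEAN, AGE_STD)))
        if lo <= age <= hi:  # enforce the range constraint
            yield age, rng.choice(PREFERRED_ZIPCODES)

def solve_naturally(constraints, seed=0):
    """Return the first natural candidate that satisfies every branch predicate."""
    rng = random.Random(seed)
    for age, zipcode in natural_candidates(rng):
        if all(c(age, zipcode) for c in constraints):
            return age, zipcode
    return None  # a real tool would fall back to a plain solver here

# Path constraint for a hypothetical branch: age > 65 and zipcode starts with "9".
senior_west = [lambda a, z: a > 65, lambda a, z: z.startswith("9")]
print(solve_naturally(senior_west))
```

Any input returned this way still satisfies the path constraint, so it exercises the same path, but the age stays within a plausible range and the zipcode is one of the preferred constants.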
Fri 19 Jul (displayed time zone: Brasilia, Distrito Federal, Brazil)
11:00 - 12:30 | Testing 4 | Research Papers / Industry Papers at Pitanga | Chair(s): Antonia Bertolino (National Research Council, Italy)
11:00 | 18m | Talk | Partial Solution Based Constraint Solving Cache in Symbolic Execution | Research Papers | Ziqi Shuai (School of Computer, National University of Defense Technology, China); Zhenbang Chen (College of Computer, National University of Defense Technology); Kelin Ma (School of Computer, National University of Defense Technology, China); Kunlin Liu (School of Computer, National University of Defense Technology, China); Yufeng Zhang (Hunan University); Jun Sun (School of Information Systems, Singapore Management University, Singapore); Ji Wang (School of Computer, National University of Defense Technology, China)
11:18 | 18m | Talk | Natural Symbolic Execution-based Testing for Big Data Analytics | Research Papers | Yaoxuan Wu (UCLA); Ahmad Humayun (Virginia Tech); Muhammad Ali Gulzar (Virginia Tech); Miryung Kim (UCLA and Amazon Web Services)
11:36 | 18m | Talk | MTAS: A Reference-Free Approach for Evaluating Abstractive Summarization Systems | Research Papers | Xiaoyan Zhu (Zhejiang Sci-Tech University); Mingyue Jiang (Zhejiang Sci-Tech University); Xiao-Yi Zhang (University of Science and Technology Beijing); Liming Nie (Nanyang Technological University); Zuohua Ding (Zhejiang Sci-Tech University)
11:54 | 18m | Talk | Observation-based unit test generation at Meta | Industry Papers | Mark Harman (Meta Platforms, Inc. and UCL); Rotem Tal (Meta platforms); Alexandru Marginean (Meta platforms); Eddy Wang (Meta platforms); Nadia Alshahwan (Meta Platforms)
12:12 | 18m | Talk | Property-based Testing for Validating User Privacy-Related Functionalities in Social Media Apps | Industry Papers | Jingling Sun (University of Electronic Science and Technology of China); Ting Su (East China Normal University); Jun Sun (School of Information Systems, Singapore Management University, Singapore); Jianwen Li (East China Normal University, China); Mengfei Wang (ByteDance); Geguang Pu (East China Normal University, China)