DeSQL: Interactive Debugging of SQL in Data-Intensive Scalable Computing
Data-intensive scalable computing (DISC) frameworks, such as Apache Spark, support runtimes in many popular languages. Yet SQL remains the most commonly used front-end language for DISC applications because of its broad presence in new and legacy workflows and its shallow learning curve. However, DISC-backed SQL introduces several layers of abstraction that significantly reduce the visibility and transparency of workflows, making it challenging for developers to find and fix errors in a query. When a query returns incorrect output, it takes non-trivial manual effort to comprehend every stage of the query execution and find the root cause of the bug among the input data and the complex SQL query. We aim to bring the benefits of step-through interactive debugging to DISC-powered SQL with DeSQL. When a SQL query is executed on a DISC system, DeSQL automatically decomposes it into subqueries and closely monitors the execution to identify the precise intermediate data corresponding to every constituent subquery. This enables a complete interactive debugging experience with full access to the intermediate query states. We evaluate DeSQL’s scalability, overhead, and efficiency against two baselines. The experimental results show that DeSQL can provide a complete debugging view in 13% less time than the original job time while incurring an average overhead of 10%, and that it retains Apache Spark’s scale-out and scale-up properties. Through a user study of 10 participants engaged in two debugging tasks, we find that participants using DeSQL identify the root cause behind a wrong query output in 75% less time than with de facto manual debugging.
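To make the idea of per-subquery intermediate data concrete, here is a minimal sketch. It is not DeSQL's mechanism: the decomposition is written by hand, Python's built-in sqlite3 stands in for a DISC engine such as Apache Spark, and the table, the step names, and the helper debug_steps are all hypothetical.

```python
import sqlite3


def debug_steps(conn, steps):
    """Run each hand-written subquery in order and collect its rows.

    `steps` is a list of (view_name, sql) pairs; later steps may read
    from the views created by earlier ones.
    """
    intermediate = {}
    for name, sql in steps:
        # Expose the subquery as a temp view so the next step can build on it.
        conn.execute(f"CREATE TEMP VIEW {name} AS {sql}")
        intermediate[name] = conn.execute(f"SELECT * FROM {name}").fetchall()
    return intermediate


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "ann", 120.0), (2, "bob", 30.0), (3, "ann", 80.0), (4, "bob", None)],
)

# Query under inspection (hypothetical):
#   SELECT customer, SUM(amount) FROM orders WHERE amount > 50 GROUP BY customer
# split by hand into scan, filter, and aggregate steps.
steps = [
    ("step_scan", "SELECT id, customer, amount FROM orders ORDER BY id"),
    ("step_filter", "SELECT * FROM step_scan WHERE amount > 50"),
    (
        "step_aggregate",
        "SELECT customer, SUM(amount) AS total FROM step_filter GROUP BY customer",
    ),
]

intermediate = debug_steps(conn, steps)
for name, rows in intermediate.items():
    print(name, rows)
```

In this toy data, customer bob is missing from the final result. Inspecting the intermediate rows shows both of bob's orders present after the scan step and absent after the filter step (one amount is below 50, the other is NULL), which localizes the cause to the filter predicate and the input data rather than the aggregation.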
Thu 18 Jul | Displayed time zone: Brasilia, Distrito Federal, Brazil
16:00 - 18:00 | Log Analysis and Debugging | Research Papers / Industry Papers at Acerola | Chair(s): Domenico Bianculli University of Luxembourg
16:00 | 18m | Talk | Go Static: Contextualized Logging Statement Generation | Research Papers | Yichen LI The Chinese University of Hong Kong, Yintong Huo The Chinese University of Hong Kong, Renyi Zhong The Chinese University of Hong Kong, Zhihan Jiang The Chinese University of Hong Kong, Jinyang Liu The Chinese University of Hong Kong, Junjie Huang The Chinese University of Hong Kong, Jiazhen Gu The Chinese University of Hong Kong, Pinjia He Chinese University of Hong Kong, Shenzhen, Michael Lyu The Chinese University of Hong Kong
16:18 | 18m | Talk | DeSQL: Interactive Debugging of SQL in Data-Intensive Scalable Computing | Research Papers
16:36 | 18m | Talk | DTD: Comprehensive and Scalable Testing for Debuggers | Research Papers | Hongyi Lu Southern University of Science and Technology/Hong Kong University of Science and Technology, Zhibo Liu The Hong Kong University of Science and Technology, Shuai Wang The Hong Kong University of Science and Technology, Fengwei Zhang Southern University of Science and Technology
16:54 | 9m | Talk | Decoding Anomalies! Unraveling Operational Challenges in Human-in-the-Loop Anomaly Validation | Industry Papers | Dong Jae Kim Concordia University, Steven Locke, Tse-Hsun (Peter) Chen Concordia University, Andrei Toma ERA Environmental Management Solutions, Sarah Sajedi ERA Environmental Management Solutions, Steve Sporea, Laura Weinkam
17:03 | 18m | Talk | A Critical Review of Common Log Data Sets Used for Evaluation of Sequence-based Anomaly Detection Techniques | Research Papers | Max Landauer AIT Austrian Institute of Technology, Florian Skopik AIT Austrian Institute of Technology, Markus Wurzenberger AIT Austrian Institute of Technology
17:21 | 18m | Research paper | LILAC: Log Parsing using LLMs with Adaptive Parsing Cache | Research Papers | Zhihan Jiang The Chinese University of Hong Kong, Jinyang Liu The Chinese University of Hong Kong, Zhuangbin Chen School of Software Engineering, Sun Yat-sen University, Yichen LI The Chinese University of Hong Kong, Junjie Huang The Chinese University of Hong Kong, Yintong Huo The Chinese University of Hong Kong, Pinjia He Chinese University of Hong Kong, Shenzhen, Jiazhen Gu The Chinese University of Hong Kong, Michael Lyu The Chinese University of Hong Kong
17:39 | 18m | Talk | TraStrainer: Adaptive Sampling for Distributed Traces with System Runtime State | Research Papers | Haiyu Huang Sun Yat-sen University, Xiaoyu Zhang HUAWEI CLOUD COMPUTING TECHNOLOGIES CO. LTD., Pengfei Chen Sun Yat-sen University, Zilong He Sun Yat-sen University, Zhiming Chen Sun Yat-sen University, Guangba Yu Sun Yat-sen University, Hongyang Chen Sun Yat-sen University, Chen Sun Huawei