DeSQL: Interactive Debugging of SQL in Data-Intensive Scalable Computing
Data-intensive scalable computing (DISC) frameworks, such as Apache Spark, support runtimes in many popular languages. Yet SQL is still the most commonly used front-end language for DISC applications because of its broad presence in new and legacy workflows and its shallow learning curve. However, DISC-backed SQL introduces several layers of abstraction that significantly reduce the visibility and transparency of workflows, making it challenging for developers to find and fix errors in a query. When a query returns incorrect output, it takes non-trivial manual effort to comprehend every stage of the query execution and to locate the root cause of the bug in either the input data or the complex SQL query. We aim to bring the benefits of step-through interactive debugging to DISC-powered SQL with DeSQL. When a SQL query is executed on a DISC system, DeSQL automatically decomposes it into subqueries and closely monitors the execution to identify the precise intermediate data corresponding to every constituent subquery. This enables a complete interactive debugging experience with full access to the intermediate query states. We evaluate DeSQL’s scalability, overhead, and efficiency against two baselines. The experimental results show that DeSQL can provide a complete debugging view in 13% less time than the original job time while incurring an average overhead of 10%, and that it retains Apache Spark’s scale-out and scale-up properties. Through a user study in which 10 participants each performed two debugging tasks, we find that participants using DeSQL identify the root cause behind a wrong query output in 75% less time than with de facto manual debugging.
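To make the idea of per-subquery intermediate data concrete, the following is a minimal sketch of the manual, step-by-step inspection that the abstract contrasts DeSQL with. It is not DeSQL's mechanism and does not use Spark: it assumes Python's built-in sqlite3 module as a stand-in engine, and the `orders` table, the `debug_steps` helper, and the hand-written decomposition into `filtered`, `totals`, and `result` are all hypothetical names chosen for illustration.

```python
import sqlite3


def debug_steps(conn, steps):
    """Run each named subquery in order and capture its rows.

    Each step is materialised as a temporary view so that later steps
    can refer to earlier ones by name. This is a hand-rolled illustration,
    not how DeSQL instruments a DISC engine.
    """
    snapshots = {}
    for name, sql in steps:
        conn.execute(f"CREATE TEMP VIEW {name} AS {sql}")
        snapshots[name] = conn.execute(f"SELECT * FROM {name}").fetchall()
    return snapshots


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 40), ("ann", 70), ("bob", 20), ("cy", None)],
)

# Hypothetical original query, decomposed by hand into its constituent parts:
#   SELECT customer FROM (
#       SELECT customer, SUM(amount) AS total
#       FROM orders WHERE amount > 30 GROUP BY customer
#   ) WHERE total > 100
steps = [
    ("filtered", "SELECT customer, amount FROM orders WHERE amount > 30"),
    ("totals", "SELECT customer, SUM(amount) AS total "
               "FROM filtered GROUP BY customer ORDER BY customer"),
    ("result", "SELECT customer FROM totals WHERE total > 100"),
]

snapshots = debug_steps(conn, steps)
for name, rows in snapshots.items():
    print(name, rows)
```

Inspecting `snapshots["filtered"]` shows, for instance, that the row with a NULL amount is dropped by the filter, which is the kind of intermediate state a developer otherwise has to reconstruct by rewriting the query stage by stage.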
Thu 18 Jul
Displayed time zone: Brasilia, Distrito Federal, Brazil
16:00 - 18:00: Log Analysis and Debugging (Research Papers / Industry Papers) at Acerola. Chair(s): Domenico Bianculli, University of Luxembourg

| Time | Duration | Type | Title | Track | Authors |
|---|---|---|---|---|---|
| 16:00 | 18m | Talk | Go Static: Contextualized Logging Statement Generation | Research Papers | Yichen LI (The Chinese University of Hong Kong); Yintong Huo (The Chinese University of Hong Kong); Renyi Zhong (The Chinese University of Hong Kong); Zhihan Jiang (The Chinese University of Hong Kong); Jinyang Liu (The Chinese University of Hong Kong); Junjie Huang (The Chinese University of Hong Kong); Jiazhen Gu (The Chinese University of Hong Kong); Pinjia He (Chinese University of Hong Kong, Shenzhen); Michael Lyu (The Chinese University of Hong Kong) |
| 16:18 | 18m | Talk | DeSQL: Interactive Debugging of SQL in Data-Intensive Scalable Computing | Research Papers | |
| 16:36 | 18m | Talk | DTD: Comprehensive and Scalable Testing for Debuggers | Research Papers | Hongyi Lu (Southern University of Science and Technology / Hong Kong University of Science and Technology); Zhibo Liu (The Hong Kong University of Science and Technology); Shuai Wang (The Hong Kong University of Science and Technology); Fengwei Zhang (Southern University of Science and Technology) |
| 16:54 | 9m | Talk | Decoding Anomalies! Unraveling Operational Challenges in Human-in-the-Loop Anomaly Validation | Industry Papers | Dong Jae Kim (Concordia University); Steven Locke; Tse-Hsun (Peter) Chen (Concordia University); Andrei Toma (ERA Environmental Management Solutions); Sarah Sajedi (ERA Environmental Management Solutions); Steve Sporea; Laura Weinkam |
| 17:03 | 18m | Talk | A Critical Review of Common Log Data Sets Used for Evaluation of Sequence-based Anomaly Detection Techniques | Research Papers | Max Landauer (AIT Austrian Institute of Technology); Florian Skopik (AIT Austrian Institute of Technology); Markus Wurzenberger (AIT Austrian Institute of Technology) |
| 17:21 | 18m | Research paper | LILAC: Log Parsing using LLMs with Adaptive Parsing Cache | Research Papers | Zhihan Jiang (The Chinese University of Hong Kong); Jinyang Liu (The Chinese University of Hong Kong); Zhuangbin Chen (School of Software Engineering, Sun Yat-sen University); Yichen LI (The Chinese University of Hong Kong); Junjie Huang (The Chinese University of Hong Kong); Yintong Huo (The Chinese University of Hong Kong); Pinjia He (Chinese University of Hong Kong, Shenzhen); Jiazhen Gu (The Chinese University of Hong Kong); Michael Lyu (The Chinese University of Hong Kong) |
| 17:39 | 18m | Talk | TraStrainer: Adaptive Sampling for Distributed Traces with System Runtime State | Research Papers | Haiyu Huang (Sun Yat-sen University); Xiaoyu Zhang (HUAWEI CLOUD COMPUTING TECHNOLOGIES CO. LTD.); Pengfei Chen (Sun Yat-sen University); Zilong He (Sun Yat-sen University); Zhiming Chen (Sun Yat-sen University); Guangba Yu (Sun Yat-sen University); Hongyang Chen (Sun Yat-sen University); Chen Sun (Huawei) |

