ChangeRCA: Finding Root Causes from Software Changes in Large Online Systems
In large-scale online service systems, software changes are inevitable and frequent. Despite rigorous pre-deployment testing, defective software changes cannot be completely eliminated from the online environment. Consequently, there is a pressing need for automated techniques that can effectively identify these defective changes. However, current abnormal change detection (ACD) approaches fall short in accurately pinpointing defective changes, primarily because they disregard the propagation of faults. To address the limitations of ACD, we propose a novel concept called root cause change analysis (RCCA) to identify the underlying root causes of change-inducing incidents. To apply the RCCA concept to practical scenarios, we have devised an intelligent RCCA framework named ChangeRCA. This framework aims to localize the defective change associated with a change-inducing incident among multiple changes. To assess the effectiveness of ChangeRCA, we have conducted an extensive evaluation using a real-world dataset from WeChat and a simulated dataset encompassing 81 diverse defective changes. The evaluation results demonstrate that ChangeRCA outperforms state-of-the-art approaches, achieving a Top-1 Hit Rate of 85% and significantly reducing the time required to identify defective changes.
Wed 17 Jul. Displayed time zone: Brasilia, Distrito Federal, Brazil
16:00 - 18:00 | Fault Diagnosis and Root Cause Analysis 1 (Demonstrations / Research Papers / Industry Papers) at Sapoti | Chair(s): Muhammad Ali Gulzar (Virginia Tech)
16:00 | 18m, Talk | A Quantitative and Qualitative Evaluation of LLM-based Explainable Fault Localization | Research Papers | Sungmin Kang (Korea Advanced Institute of Science and Technology); Gabin An (Korea Advanced Institute of Science and Technology); Shin Yoo (Korea Advanced Institute of Science and Technology)
16:18 | 18m, Talk | BARO: Robust Root Cause Analysis for Microservices via Multivariate Bayesian Online Change Point Detection | Research Papers
16:36 | 18m, Talk | Fault Diagnosis for Test Alarms in Microservices Through Multi-source Data | Industry Papers | Shenglin Zhang (Nankai University); Jun Zhu (Nankai University); Bowen Hao (Nankai University); Yongqian Sun (Nankai University); Xiaohui Nie (CNIC, CAS); Jingwen Zhu (Nankai University); Xilin Liu (Huawei Cloud); Xiaoqian Li (Huawei Cloud); Yuchi Ma (Huawei Cloud Computing Technologies CO., LTD.); Dan Pei (Tsinghua University)
16:54 | 18m, Talk | Costs and Benefits of Machine Learning Software Defect Prediction: Industrial Case Study | Industry Papers | Szymon Stradowski (Wroclaw University of Science and Technology & NOKIA); Lech Madeyski (Wroclaw University of Science and Technology)
17:12 | 18m, Talk | Chain-of-Event: Interpretable Root Cause Analysis for Microservices through Automatically Learning Weighted Event Causal Graph | Industry Papers | Zhenhe Yao (Tsinghua University); Changhua Pei (Computer Network Information Center at Chinese Academy of Sciences); Wenxiao Chen (Tsinghua University); Hanzhang Wang (Walmart Global Tech); Liangfei Su (eBay, USA); Huai Jiang (eBay, USA); Zhe Xie (Tsinghua University); Xiaohui Nie (CNIC, CAS); Dan Pei (Tsinghua University)
17:30 | 18m, Talk | ChangeRCA: Finding Root Causes from Software Changes in Large Online Systems | Research Papers | Guangba Yu (Sun Yat-sen University); Pengfei Chen (Sun Yat-sen University); Zilong He (Sun Yat-sen University); Qiuyu Yan (Tencent); Yu Luo (Tencent); Fangyuan Li (Tencent); Zibin Zheng (Sun Yat-sen University)
17:48 | 9m, Talk | MineCPP: Mining Bug Fix Pairs and Their Structures | Demonstrations