Wed 17 Jul 2024 16:36 - 16:54 at Sapoti - Fault Diagnosis and Root Cause Analysis 1 Chair(s): Muhammad Ali Gulzar

Testing large-scale microservices can produce an enormous number of test alarms every day. Manually diagnosing these alarms is time-consuming and laborious for testers. Automatic fault diagnosis, combining fault classification and localization, can help testers handle the growing volume of failed test cases efficiently. However, current methods for diagnosing test alarms struggle with complex, frequently updated microservices. In this paper, we introduce SynthoDiag, a novel fault diagnosis framework for test alarms in microservices that uses multi-source logs (execution logs, trace logs, and test case information) organized with a knowledge graph. An Entity Fault Association and Position Value (EFA-PV) algorithm is proposed to localize the fault-indicative log entries. Additionally, an efficient block-based differentiation approach filters out fault-irrelevant entries in the test cases, significantly improving the overall performance of fault diagnosis. Finally, SynthoDiag is systematically evaluated on a large-scale real-world dataset from a top-tier global cloud service provider, Huawei Cloud, which serves more than three million users. The results show that SynthoDiag improves the Micro-F1 and Macro-F1 scores over baseline methods in fault classification by 21% and 30%, respectively, and that its top-5 accuracy in fault localization is 81.9%, significantly surpassing previous methods.
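For readers unfamiliar with the metrics quoted in the abstract, the following is a minimal sketch of how Micro-F1, Macro-F1, and top-k accuracy are conventionally computed. It is not taken from the paper: the function names and the fault labels (`"env"`, `"code"`) are hypothetical, and SynthoDiag's actual evaluation code may differ.

```python
from collections import Counter


def f1(tp, fp, fn):
    """F1 from raw counts; returns 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label fault classification."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0, 0.0
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    # Micro-F1 pools the counts over all classes before computing F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro-F1 averages per-class F1, so rare classes weigh as much as common ones.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


def top_k_accuracy(ranked_candidates, true_locations, k=5):
    """Fraction of cases whose true fault location appears in the top k candidates."""
    if not true_locations:
        return 0.0
    hits = sum(
        1
        for ranked, truth in zip(ranked_candidates, true_locations)
        if truth in ranked[:k]
    )
    return hits / len(true_locations)


if __name__ == "__main__":
    # Hypothetical fault classes, for illustration only.
    y_true = ["env", "env", "env", "code"]
    y_pred = ["env", "env", "code", "code"]
    print(micro_macro_f1(y_true, y_pred))
```

Because Macro-F1 weights every fault class equally, a larger gain in Macro-F1 than in Micro-F1 usually indicates better handling of less frequent fault types.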

Wed 17 Jul

Displayed time zone: Brasilia, Distrito Federal, Brazil

16:00 - 18:00
Fault Diagnosis and Root Cause Analysis 1 - Demonstrations / Research Papers / Industry Papers at Sapoti
Chair(s): Muhammad Ali Gulzar Virginia Tech
16:00
18m
Talk
A Quantitative and Qualitative Evaluation of LLM-based Explainable Fault Localization
Research Papers
Sungmin Kang Korea Advanced Institute of Science and Technology, Gabin An Korea Advanced Institute of Science and Technology, Shin Yoo Korea Advanced Institute of Science and Technology
16:18
18m
Talk
BARO: Robust Root Cause Analysis for Microservices via Multivariate Bayesian Online Change Point Detection
Research Papers
Luan Pham RMIT University, Huong Ha RMIT University, Hongyu Zhang Chongqing University
16:36
18m
Talk
Fault Diagnosis for Test Alarms in Microservices Through Multi-source Data
Industry Papers
Shenglin Zhang Nankai University, Jun Zhu Nankai University, Bowen Hao Nankai University, Yongqian Sun Nankai University, Xiaohui Nie CNIC, CAS, Jingwen Zhu Nankai University, Xilin Liu Huawei Cloud, Xiaoqian Li Huawei Cloud, Yuchi Ma Huawei Cloud Computing Technologies CO., LTD., Dan Pei Tsinghua University
16:54
18m
Talk
Costs and Benefits of Machine Learning Software Defect Prediction: Industrial Case Study
Industry Papers
Szymon Stradowski Wroclaw University of Science and Technology & NOKIA, Lech Madeyski Wroclaw University of Science and Technology
17:12
18m
Talk
Chain-of-Event: Interpretable Root Cause Analysis for Microservices through Automatically Learning Weighted Event Causal Graph
Industry Papers
Zhenhe Yao Tsinghua University, Changhua Pei Computer Network Information Center at Chinese Academy of Sciences, Wenxiao Chen Tsinghua University, Hanzhang Wang Walmart Global Tech, Liangfei Su eBay, USA, Huai Jiang eBay, USA, Zhe Xie Tsinghua University, Xiaohui Nie CNIC, CAS, Dan Pei Tsinghua University
17:30
18m
Talk
ChangeRCA: Finding Root Causes from Software Changes in Large Online Systems
Research Papers
Guangba Yu Sun Yat-sen University, Pengfei Chen Sun Yat-sen University, Zilong He Sun Yat-sen University, Qiuyu Yan Tencent, Yu Luo Tencent, Fangyuan Li Tencent, Zibin Zheng Sun Yat-sen University
17:48
9m
Talk
MineCPP: Mining Bug Fix Pairs and Their Structures
Demonstrations
Sai Krishna Avula IIT Gandhinagar, Shouvick Mondal IIT Gandhinagar