LM-PACE: Confidence Estimation by Large Language Models for Effective Root Causing of Cloud Incidents
Major cloud providers have employed advanced AI-based solutions like large language models to aid humans in identifying the root causes of cloud incidents. Even though AI-driven assistants are becoming more common in root cause analysis, their usefulness in supporting on-call engineers is limited by their unstable accuracy. This limitation arises from the fundamental challenges of the task, the tendency of language model-based methods to produce hallucinated information, and the difficulty in distinguishing these well-disguised hallucinations. To address this challenge, we propose a novel confidence estimation method that assigns reliable confidence scores to root cause recommendations, aiding on-call engineers in deciding whether to trust the model's predictions. We make retraining-free confidence estimation on out-of-domain tasks possible via retrieval augmentation. To elicit better-calibrated confidence estimates, we adopt a two-stage prompting procedure and a learnable transformation, which reduces the expected calibration error (ECE) to 31% of the direct prompting baseline on a dataset comprising over 100,000 incidents from Microsoft. Additionally, we demonstrate that our method is applicable across various root cause prediction models. Our study takes an important step towards reliably and effectively embedding LLMs into cloud incident management systems.
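For readers unfamiliar with the ECE metric cited in the abstract, the following is a minimal sketch of the commonly used binned definition of expected calibration error: predictions are grouped by confidence, and the gap between average confidence and empirical accuracy in each bin is averaged, weighted by bin size. The function name, the choice of 10 equal-width bins, and the sample inputs are illustrative assumptions; the abstract does not state how the authors computed the metric.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,  # assumption: 10 equal-width bins, a common default
) -> float:
    """Binned ECE: sum over bins of (bin share) * |avg confidence - accuracy|."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if len(confidences) == 0:
        raise ValueError("at least one prediction is required")

    conf_sum = [0.0] * n_bins  # sum of confidence scores per bin
    hits = [0] * n_bins        # number of correct predictions per bin
    count = [0] * n_bins       # number of predictions per bin

    for conf, hit in zip(confidences, correct):
        if not 0.0 <= conf <= 1.0:
            raise ValueError("confidence scores must lie in [0, 1]")
        # Half-open bins [k/n, (k+1)/n); a score of exactly 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        conf_sum[idx] += conf
        hits[idx] += int(hit)
        count[idx] += 1

    total = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        if count[b] == 0:
            continue  # empty bins contribute nothing
        avg_conf = conf_sum[b] / count[b]
        accuracy = hits[b] / count[b]
        ece += (count[b] / total) * abs(avg_conf - accuracy)
    return ece


if __name__ == "__main__":
    # Hypothetical scores: four recommendations rated 0.9, three of them correct.
    print(expected_calibration_error([0.9] * 4, [True, True, True, False]))
```

A lower value means the reported confidence tracks the observed accuracy more closely, which is the property an on-call engineer relies on when deciding whether to trust a recommendation.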
Thu 18 Jul. Displayed time zone: Brasilia, Distrito Federal, Brazil
16:00 - 18:00: AI4SE 3 (Industry Papers / Demonstrations / Journal First / Research Papers) at Pitomba. Chair(s): Maliheh Izadi (Delft University of Technology)

| Time | Duration | Title | Track | Authors |
|---|---|---|---|---|
| 16:00 | 18m Talk | Rethinking Software Engineering in the Era of Foundation Models | Industry Papers | Ahmed E. Hassan (Queen's University); Dayi Lin (Centre for Software Excellence, Huawei Canada); Gopi Krishnan Rajbahadur (Centre for Software Excellence, Huawei, Canada); Keheliya Gallaba (Centre for Software Excellence, Huawei Canada); Filipe Cogo (Centre for Software Excellence, Huawei Canada); Boyuan Chen (Centre for Software Excellence, Huawei Canada); Haoxiang Zhang (Huawei); Kishanthan Thangarajah (Centre for Software Excellence, Huawei Canada); Gustavo Oliva (Centre for Software Excellence, Huawei Canada); Jiahuei (Justina) Lin (Centre for Software Excellence, Huawei Canada); Wali Mohammad Abdullah (Centre for Software Excellence, Huawei Canada); Zhen Ming (Jack) Jiang (York University) |
| 16:18 | 18m Talk | LM-PACE: Confidence Estimation by Large Language Models for Effective Root Causing of Cloud Incidents | Industry Papers | Shizhuo Zhang (University of Illinois Urbana-Champaign); Xuchao Zhang (Microsoft); Chetan Bansal (Microsoft Research); Pedro Las-Casas (Microsoft); Rodrigo Fonseca (Microsoft Research); Saravan Rajmohan (Microsoft) |
| 16:36 | 18m Talk | Application of Quantum Extreme Learning Machines for QoS Prediction of Elevators' Software in an Industrial Context | Industry Papers | Xinyi Wang (Simula Research Laboratory and University of Oslo); Shaukat Ali (Simula Research Laboratory and Oslo Metropolitan University); Aitor Arrieta (Mondragon University); Paolo Arcaini (National Institute of Informatics); Maite Arratibel (Orona) |
| 16:54 | 18m Talk | X-lifecycle Learning for Cloud Incident Management using LLMs | Industry Papers | Drishti Goel (Microsoft); Fiza Husain (Microsoft); Aditya Kumar Singh (Microsoft); Supriyo Ghosh (Microsoft); Anjaly Parayil (Microsoft); Chetan Bansal (Microsoft Research); Xuchao Zhang (Microsoft); Saravan Rajmohan (Microsoft) |
| 17:12 | 18m Talk | Neat: Mobile App Layout Similarity Comparison based on Graph Convolutional Networks | Industry Papers | Zhu Tao (ByteDance); Yongqiang Gao (ByteDance); Jiayi Qi (ByteDance); Chao Peng (ByteDance, China); Qinyun Wu (Bytedance Ltd.); Xiang Chen (ByteDance); Ping Yang (Bytedance Network Technology) |
| 17:30 | 18m Talk | Transformers and Meta-Tokenization in Sentiment Analysis for Software Engineering | Journal First | Nathan Cassee (Eindhoven University of Technology); Andrei Agaronian (Eindhoven University of Technology); Eleni Constantinou (University of Cyprus); Nicole Novielli (University of Bari); Alexander Serebrenik (Eindhoven University of Technology) |
| 17:48 | 9m Talk | EM-Assist: Safe automated ExtractMethod refactoring with LLMs | Demonstrations | Dorin Pomian (University of Colorado Boulder); Abhiram Bellur (University of Colorado Boulder); Malinda Dilhara (University of Colorado Boulder); Zarina Kurbatova (JetBrains Research); Egor Bogomolov (JetBrains Research); Andrey Sokolov (JetBrains Research); Timofey Bryksin (JetBrains Research); Danny Dig (University of Colorado Boulder, JetBrains Research) |
