Test input prioritization for Machine Learning Classifiers (FSE 2024 - Journal First)

Who

Xueqi Dang, Yinghua Li, Mike Papadakis, Jacques Klein, Tegawendé F. Bissyandé, Yves Le Traon

Track

FSE 2024 Journal First

Time Zone

The program is currently displayed in (GMT-03:00) Brasilia, Distrito Federal, Brazil.

Use conference time zone: (GMT-03:00) Brasilia, Distrito Federal, BrazilSelect other time zone

The GMT offsets shown reflect the offsets at the moment of the conference.

Time Band

By setting a time band, the program will dim events that are outside this time window. This is useful for (virtual) conferences with a continuous program (with repeated sessions).
The time band will also limit the events that are included in the personal iCalendar subscription service.

Display full programSpecify a time band

Save

When

Thu 18 Jul 2024 16:45 - 17:03 at Mandacaru - SE4AI 2 Chair(s): Wei Yang

Abstract

Machine learning has achieved remarkable success across diverse domains. Nevertheless, concerns about interpretability in black-box models, especially within Deep Neural Networks (DNNs), have become pronounced in safety-critical fields like healthcare and finance. Classical machine learning (ML) classifiers, known for their higher interpretability, are preferred in these domains. Similar to DNNs, classical ML classifiers can exhibit bugs that could lead to severe consequences in practice. Test input prioritization has emerged as a promising approach to ensure the quality of an ML system, which prioritizes potentially misclassified tests so that such tests can be identified earlier with limited manual labeling costs. However, when applying to classical ML classifiers, existing DNN test prioritization methods are constrained from three perspectives: 1) Coverage-based methods are inefficient and time-consuming; 2) Mutation-based methods cannot be adapted to classical ML models due to mismatched model mutation rules; 3) Confidence-based methods are restricted to a single dimension when applying to binary ML classifiers, solely depending on the model’s prediction probability for one class. To overcome the challenges, we propose MLPrior, a test prioritization approach specifically tailored for classical ML models. MLPrior leverages the characteristics of classical ML classifiers (i.e., interpretable models and carefully engineered attribute features) to prioritize test inputs. The foundational principles are: 1) tests more sensitive to mutations are more likely to be misclassified, and 2) tests closer to the model’s decision boundary are more likely to be misclassified. Building on the first concept, we design mutation rules to generate two types of mutation features (i.e., model mutation features and input mutation features) for each test. Drawing from the second notion, MLPrior generates attribute features of each test based on its attribute values, which can indirectly reveal the proximity between the test and the decision boundary. For each test, MLPrior combines all three types of features of it into a final vector. Subsequently, MLPrior employs a pre-trained ranking model to predict the misclassification probability of each test based on its final vector and ranks tests accordingly. We conducted an extensive study to evaluate MLPrior based on 305 subjects, encompassing natural datasets, mixed noisy datasets, and fairness datasets. The results demonstrate that MLPrior outperforms all the compared test prioritization approaches, with an average improvement of 14.74%~66.93% on natural datasets, 18.55%~67.73% on mixed noisy datasets, and 15.34%~62.72% on fairness datasets.

Xueqi Dang

University of Luxembourg

Luxembourg

Yinghua Li

University of Luxembourg

Luxembourg

Mike Papadakis

University of Luxembourg

Luxembourg

Jacques Klein

University of Luxembourg

Luxembourg

Tegawendé F. Bissyandé

University of Luxembourg

Luxembourg

Yves Le Traon

University of Luxembourg, Luxembourg

Luxembourg

Time Zone

The program is currently displayed in (GMT-03:00) Brasilia, Distrito Federal, Brazil.

Use conference time zone: (GMT-03:00) Brasilia, Distrito Federal, BrazilSelect other time zone

The GMT offsets shown reflect the offsets at the moment of the conference.

Time Band

Display full programSpecify a time band

Save

Session Program

Thu 18 Jul
Displayed time zone: Brasilia, Distrito Federal, Brazil change

16:00 - 18:00	SE4AI 2Research Papers / Industry Papers / Demonstrations / Journal First at Mandacaru Chair(s): Wei Yang University of Texas at Dallas

16:00 18m Talk		Natural Is The Best: Model-Agnostic Code Simplification for Pre-trained Large Language Models Research Papers Yan Wang Central University of Finance and Economics, Xiaoning Li Central University of Finance and Economics, Tien N. Nguyen University of Texas at Dallas, Shaohua Wang Central University of Finance and Economics, Chao Ni School of Software Technology, Zhejiang University, Ling Ding Central University of Finance and Economics Pre-print Media Attached File Attached
16:18 18m Talk		On Reducing Undesirable Behavior in Deep-Reinforcement-Learning-Based Software Research Papers Ophir Carmel The Hebrew University of Jerusalem, Guy Katz The Hebrew University of Jerusalem
16:36 9m Talk		Decide: Knowledge-based Version Incompatibility Detection in Deep Learning Stacks Demonstrations Zihan Zhou The University of Hong Kong, Zhongkai Zhao National University of Singapore, Bonan Kou Purdue University, Tianyi Zhang Purdue University DOI Pre-print Media Attached
16:45 18m Talk		Test input prioritization for Machine Learning Classifiers Journal First Xueqi Dang University of Luxembourg, Yinghua Li University of Luxembourg, Mike Papadakis University of Luxembourg, Jacques Klein University of Luxembourg, Tegawendé F. Bissyandé University of Luxembourg, Yves Le Traon University of Luxembourg, Luxembourg
17:03 18m Talk		How Far Are We with Automated Machine Learning? Characterization and Challenges of AutoML Toolkits Journal First Md Abdullah Al Alamin University of Calgary, Gias Uddin York University, Canada
17:21 18m Talk		Automated Root Causing of Cloud Incidents using In-Context Learning with GPT-4 Industry Papers Xuchao Zhang Microsoft, Supriyo Ghosh Microsoft, Chetan Bansal Microsoft Research, Rujia Wang Microsoft, Minghua Ma Microsoft Research, Yu Kang Microsoft Research, Saravan Rajmohan Microsoft
17:39 18m Talk		Exploring LLM-based Agents for Root Cause Analysis Industry Papers Devjeet Roy Washington State University, Xuchao Zhang Microsoft, Rashi Bhave Microsoft Research, Chetan Bansal Microsoft Research, Pedro Las-Casas Microsoft, Rodrigo Fonseca Microsoft Research, Saravan Rajmohan Microsoft