On Reducing Undesirable Behavior in Deep-Reinforcement-Learning-Based Software (FSE 2024 - Research Papers) - FSE 2024

Mon 15 - Fri 19 July 2024 Porto de Galinhas, Brazil, Brazil

Who

Ophir Carmel, Guy Katz

Track

FSE 2024 Research Papers

Time Zone

The program is currently displayed in (GMT-03:00) Brasilia, Distrito Federal, Brazil.

Use conference time zone: (GMT-03:00) Brasilia, Distrito Federal, BrazilSelect other time zone

The GMT offsets shown reflect the offsets at the moment of the conference.

Time Band

By setting a time band, the program will dim events that are outside this time window. This is useful for (virtual) conferences with a continuous program (with repeated sessions).
The time band will also limit the events that are included in the personal iCalendar subscription service.

Display full programSpecify a time band

When

Thu 18 Jul 2024 16:18 - 16:36 at Mandacaru - SE4AI 2 Chair(s): Wei Yang

Abstract

Deep reinforcement learning (DRL) has proven extremely useful in a large variety of application domains. However, even successful DRL-based software can exhibit highly undesirable behavior. This is due to DRL training being based on maximizing a reward function, which typically captures general trends but cannot precisely capture, or rule out, certain behaviors of the system. In this paper, we propose a novel framework aimed at drastically reducing the undesirable behavior of DRL-based software, while maintaining its excellent performance. In addition, our framework can assist in providing engineers with a comprehensible characterization of such undesirable behavior. Under the hood, our approach is based on extracting decision tree classifiers from erroneous state-action pairs, and then integrating these trees into the DRL training loop, penalizing the system whenever it performs an error. We provide a proof-of-concept implementation of our approach, and use it to evaluate the technique on three significant case studies. We find that our approach can extend existing frameworks in a straightforward manner, and incurs only a slight overhead in training time. Further, it incurs only a very slight hit to performance, or even in some cases - improves it, while significantly reducing the frequency of undesirable behavior.

Ophir Carmel

The Hebrew University of Jerusalem

Guy Katz

The Hebrew University of Jerusalem

Israel

Time Zone

The program is currently displayed in (GMT-03:00) Brasilia, Distrito Federal, Brazil.

Use conference time zone: (GMT-03:00) Brasilia, Distrito Federal, BrazilSelect other time zone

The GMT offsets shown reflect the offsets at the moment of the conference.

Time Band

By setting a time band, the program will dim events that are outside this time window. This is useful for (virtual) conferences with a continuous program (with repeated sessions).
The time band will also limit the events that are included in the personal iCalendar subscription service.

Display full programSpecify a time band

Session Program

Thu 18 Jul
Displayed time zone: Brasilia, Distrito Federal, Brazil change

	16:00 - 18:00	SE4AI 2Research Papers / Industry Papers / Demonstrations / Journal First at Mandacaru Chair(s): Wei Yang University of Texas at Dallas

	16:00 18m Talk		Natural Is The Best: Model-Agnostic Code Simplification for Pre-trained Large Language Models Research Papers Yan Wang Central University of Finance and Economics, Xiaoning Li Central University of Finance and Economics, Tien N. Nguyen University of Texas at Dallas, Shaohua Wang Central University of Finance and Economics, Chao Ni School of Software Technology, Zhejiang University, Ling Ding Central University of Finance and Economics Pre-print Media Attached File Attached
	16:18 18m Talk		On Reducing Undesirable Behavior in Deep-Reinforcement-Learning-Based Software Research Papers Ophir Carmel The Hebrew University of Jerusalem, Guy Katz The Hebrew University of Jerusalem
	16:36 9m Talk		Decide: Knowledge-based Version Incompatibility Detection in Deep Learning Stacks Demonstrations Zihan Zhou The University of Hong Kong, Zhongkai Zhao National University of Singapore, Bonan Kou Purdue University, Tianyi Zhang Purdue University DOI Pre-print Media Attached
	16:45 18m Talk		Test input prioritization for Machine Learning Classifiers Journal First Xueqi Dang University of Luxembourg, Yinghua Li University of Luxembourg, Mike Papadakis University of Luxembourg, Jacques Klein University of Luxembourg, Tegawendé F. Bissyandé University of Luxembourg, Yves Le Traon University of Luxembourg, Luxembourg
	17:03 18m Talk		How Far Are We with Automated Machine Learning? Characterization and Challenges of AutoML Toolkits Journal First Md Abdullah Al Alamin University of Calgary, Gias Uddin York University, Canada
	17:21 18m Talk		Automated Root Causing of Cloud Incidents using In-Context Learning with GPT-4 Industry Papers Xuchao Zhang Microsoft, Supriyo Ghosh Microsoft, Chetan Bansal Microsoft Research, Rujia Wang Microsoft, Minghua Ma Microsoft Research, Yu Kang Microsoft Research, Saravan Rajmohan Microsoft
	17:39 18m Talk		Exploring LLM-based Agents for Root Cause Analysis Industry Papers Devjeet Roy Washington State University, Xuchao Zhang Microsoft, Rashi Bhave Microsoft Research, Chetan Bansal Microsoft Research, Pedro Las-Casas Microsoft, Rodrigo Fonseca Microsoft Research, Saravan Rajmohan Microsoft