On Reducing Undesirable Behavior in Deep-Reinforcement-Learning-Based Software
Deep reinforcement learning (DRL) has proven extremely useful in a wide variety of application domains. However, even successful DRL-based software can exhibit highly undesirable behavior, because DRL training maximizes a reward function that typically captures general trends but cannot precisely capture, or rule out, specific behaviors of the system. In this paper, we propose a novel framework aimed at drastically reducing the undesirable behavior of DRL-based software while maintaining its excellent performance. In addition, our framework can assist engineers by providing a comprehensible characterization of such undesirable behavior. Under the hood, our approach extracts decision tree classifiers from erroneous state-action pairs and then integrates these trees into the DRL training loop, penalizing the system whenever it commits an error. We provide a proof-of-concept implementation of our approach and use it to evaluate the technique on three significant case studies. We find that our approach extends existing frameworks in a straightforward manner and incurs only a slight overhead in training time. Furthermore, it causes only a very slight hit to performance, and in some cases even improves it, while significantly reducing the frequency of undesirable behavior.
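The penalty mechanism described in the abstract can be sketched as reward shaping: a decision tree, extracted from previously observed erroneous state-action pairs, flags errors during training, and a penalty is subtracted from the environment reward before the update step. The tree structure, state encoding, and penalty value below are illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative sketch only: the tree rules, the (speed, distance) state
# encoding, and the penalty value are assumptions, not the paper's setup.

def error_tree(state, action):
    """A toy decision tree extracted from erroneous state-action pairs.

    Returns True when (state, action) matches a learned error pattern.
    """
    speed, distance = state
    if distance < 5.0:                  # close to an obstacle...
        return action == "accelerate"   # ...accelerating is flagged as an error
    return False

def shaped_reward(env_reward, state, action, penalty=10.0):
    """Subtract a fixed penalty whenever the tree flags an error."""
    if error_tree(state, action):
        return env_reward - penalty
    return env_reward

# Usage: inside the DRL training loop, feed the shaped reward (instead of
# the raw environment reward) to the learning algorithm's update step.
r_bad = shaped_reward(1.0, state=(30.0, 2.0), action="accelerate")   # penalized
r_ok = shaped_reward(1.0, state=(30.0, 10.0), action="accelerate")   # unchanged
```

In this sketch the tree is hard-coded for readability; in practice it would be fitted (e.g., with a standard decision-tree learner) to the logged erroneous state-action pairs and re-queried on every transition.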
Thu 18 Jul (times shown in the Brasilia, Distrito Federal, Brazil time zone)
16:00 - 18:00 | SE4AI 2 (Research Papers / Industry Papers / Demonstrations / Journal First) at Mandacaru. Chair(s): Wei Yang (University of Texas at Dallas)

16:00 (18m, Talk) Natural Is The Best: Model-Agnostic Code Simplification for Pre-trained Large Language Models. Research Papers. Yan Wang (Central University of Finance and Economics), Xiaoning Li (Central University of Finance and Economics), Tien N. Nguyen (University of Texas at Dallas), Shaohua Wang (Central University of Finance and Economics), Chao Ni (School of Software Technology, Zhejiang University), Ling Ding (Central University of Finance and Economics). Pre-print available.

16:18 (18m, Talk) On Reducing Undesirable Behavior in Deep-Reinforcement-Learning-Based Software. Research Papers.

16:36 (9m, Talk) Decide: Knowledge-based Version Incompatibility Detection in Deep Learning Stacks. Demonstrations. Zihan Zhou (The University of Hong Kong), Zhongkai Zhao (National University of Singapore), Bonan Kou (Purdue University), Tianyi Zhang (Purdue University). DOI and pre-print available.

16:45 (18m, Talk) Test Input Prioritization for Machine Learning Classifiers. Journal First. Xueqi Dang (University of Luxembourg), Yinghua Li (University of Luxembourg), Mike Papadakis (University of Luxembourg), Jacques Klein (University of Luxembourg), Tegawendé F. Bissyandé (University of Luxembourg), Yves Le Traon (University of Luxembourg).

17:03 (18m, Talk) How Far Are We with Automated Machine Learning? Characterization and Challenges of AutoML Toolkits. Journal First.

17:21 (18m, Talk) Automated Root Causing of Cloud Incidents using In-Context Learning with GPT-4. Industry Papers. Xuchao Zhang (Microsoft), Supriyo Ghosh (Microsoft), Chetan Bansal (Microsoft Research), Rujia Wang (Microsoft), Minghua Ma (Microsoft Research), Yu Kang (Microsoft Research), Saravan Rajmohan (Microsoft).

17:39 (18m, Talk) Exploring LLM-based Agents for Root Cause Analysis. Industry Papers. Devjeet Roy (Washington State University), Xuchao Zhang (Microsoft), Rashi Bhave (Microsoft Research), Chetan Bansal (Microsoft Research), Pedro Las-Casas (Microsoft), Rodrigo Fonseca (Microsoft Research), Saravan Rajmohan (Microsoft).