Software companies have widely used online A/B testing to evaluate the impact of a new technology by offering it to groups of users and comparing it against the unmodified product. However, running an online A/B test requires not only effort in design and implementation and stakeholders' approval to serve it in production, but also several weeks of iterative data collection. To address these issues, a recently emerging topic called offline A/B testing is receiving increasing attention; it aims to evaluate new technologies offline by estimating their effects from historical logged data. Although this approach is promising due to its lower implementation effort, faster turnaround time, and absence of potential user harm, several limitations need to be addressed before its results can be effectively prioritized as requirements in practice, including discrepancies with online A/B test results and the lack of systematic updates as data and parameters vary. In response, in this vision paper, I introduce AutoOffAB, an idea to automatically run variants of offline A/B testing against recent logging and update the offline evaluation results, which are used to make decisions on requirements more reliably and systematically.
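
The paper does not prescribe a concrete implementation, but the core idea can be sketched as a recurring job that re-estimates each candidate variant against a recent logging window using an off-policy estimator such as inverse propensity scoring (IPS). The sketch below is illustrative only; the names `load_recent_logs`, `logging_propensity`, and `auto_offline_ab` are hypothetical and stand in for whatever data-access layer and estimator a team actually uses.

```python
import numpy as np

def ips_estimate(logs, target_policy):
    """Inverse propensity scoring (IPS) estimate of a variant's value
    from logged (context, action, reward, logging_propensity) records."""
    weights, rewards = [], []
    for rec in logs:
        # Probability that the candidate variant would have taken the logged action.
        pi_target = target_policy(rec["context"], rec["action"])
        weights.append(pi_target / rec["logging_propensity"])
        rewards.append(rec["reward"])
    return float(np.mean(np.asarray(weights) * np.asarray(rewards)))

def auto_offline_ab(load_recent_logs, variants, window_days=7):
    """Recurring job: re-run offline A/B estimates for every candidate
    variant against the most recent window of logged data."""
    logs = load_recent_logs(window_days)  # hypothetical data-access helper
    return {name: ips_estimate(logs, policy) for name, policy in variants.items()}
```

A scheduler (for example, a nightly job) could invoke `auto_offline_ab` and persist the refreshed estimates, so that requirements decisions rest on evaluations of recent data rather than a one-off snapshot.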

Jie “JW” Wu is currently a Postdoctoral Research Fellow at the University of British Columbia, working at the intersection of Software Engineering and AI. He received his PhD in Systems Engineering from George Washington University, and his M.Sc. degree and B.Sc. degree (ACM Class, an elite CS program) in Computer Science from Shanghai Jiao Tong University. He worked as a software engineer in industry for nearly a decade at Snap Inc., Microsoft, and a startup (ArcSite).