How Far Are We with Automated Machine Learning? Characterization and Challenges of AutoML Toolkits (FSE 2024 - Journal First)

Mon 15 - Fri 19 July 2024 Porto de Galinhas, Brazil, Brazil

Who

Md Abdullah Al Alamin, Gias Uddin

Track

FSE 2024 Journal First

Time Zone

The program is currently displayed in (GMT-03:00) Brasilia, Distrito Federal, Brazil.

Use conference time zone: (GMT-03:00) Brasilia, Distrito Federal, BrazilSelect other time zone

The GMT offsets shown reflect the offsets at the moment of the conference.

Time Band

By setting a time band, the program will dim events that are outside this time window. This is useful for (virtual) conferences with a continuous program (with repeated sessions).
The time band will also limit the events that are included in the personal iCalendar subscription service.

Display full programSpecify a time band

Save

When

Thu 18 Jul 2024 17:03 - 17:21 at Mandacaru - SE4AI 2 Chair(s): Wei Yang

Abstract

Automated Machine Learning aka AutoML toolkits are low/no-code software that aim to democratize ML system application development by ensuring rapid prototyping of ML models and by enabling collaboration across different stakeholders in ML system design (e.g., domain experts, data scientists, etc.). It is thus important to know the state of current AutoML toolkits and the challenges ML practitioners face while using those toolkits. In this paper, we first offer a characterization of currently available AutoML toolits by analyzing 37 top AutoML tools and platforms. We find that the top AutoML platforms are mostly cloud-based. Most of the tools are optimized for the adoption of shallow ML models. Second, we present an empirical study of 14.3K AutoML related posts from Stack Overflow (SO) that we analyzed using topic modelling algorithm LDA (Latent Dirichlet Allocation) to understand the challenges of ML practitioners while using the AutoML toolkits. We find 13 topics in the AutoML related discussions in SO. The 13 topics are grouped into four categories: MLOps (43% of all questions), Model (28% questions), Data (27% questions), and Documentation (2% questions). Most questions are asked during Model training (29%) and Data preparation (25%) phases. AutoML practitioners find the MLOps topic category most challenging. Topics related to the MLOps category are the most prevalent and popular for cloud-based AutoML toolkits. Based on our study findings, we provide 15 recommendations to improve the adoption and development of AutoML toolkits.

Md Abdullah Al Alamin

University of Calgary

Canada

Gias Uddin

York University, Canada