Thu 18 Jul 2024 17:30 - 17:48 at Pitomba - AI4SE 3 Chair(s): Maliheh Izadi

Abstract

Sentiment analysis has been used to study aspects of software engineering, such as issue resolution, toxicity, and self-admitted technical debt. The automatic classification of software engineering texts into three different polarity classes (negative, neutral, and positive) makes it possible to understand how developers communicate. To address the peculiarities of software engineering texts, sentiment analysis tools often consider the specific technical lingo practitioners use. With the emergence of more advanced deep-learning models, it has become increasingly important to understand the performance and limitations of sentiment analysis tools when applied to software engineering data. This is especially true because existing replications of software engineering studies that apply sentiment analysis tools show that tool choice can influence the conclusions obtained. Moreover, we believe that it is important to assess the performance of newer deep-learning tools and models and compare it to that of existing tools.

Therefore, we validated two existing recommendations made in the software engineering literature: the recommendation to use pre-trained transformer models to classify sentiment, and the recommendation to replace non-natural language elements with meta-tokens.
We validated both recommendations in a set of rigorous benchmarks.
We selected five different sentiment analysis tools, taking care to pick a diverse set: two pre-trained transformer models and three machine learning tools.
Because recent benchmarks show that ChatGPT is not competitive in sentiment analysis compared to fine-tuned tools, we did not include it in these benchmarks.
To train and evaluate the selected tools, we used two state-of-the-art, manually labeled datasets sampled from GitHub and Stack Overflow.
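
To make the first recommendation concrete, the minimal sketch below shows how a pre-trained transformer can be applied to classify the sentiment of a developer comment using the Hugging Face transformers library. This is not the article's benchmark setup; the model name is an illustrative assumption (a generic binary positive/negative sentiment model rather than a software-engineering-specific, three-class tool).

    # Minimal sketch, not the article's benchmark setup: classify the sentiment of a
    # developer comment with a pre-trained transformer via Hugging Face's `transformers`.
    # The model below is a generic binary (positive/negative) sentiment model chosen for
    # illustration; it is an assumption, not one of the tools benchmarked in the article.
    from transformers import pipeline

    classifier = pipeline(
        "sentiment-analysis",
        model="distilbert-base-uncased-finetuned-sst-2-english",
    )
    print(classifier("This fix finally resolves the crash, thanks a lot!"))
    # Example output: [{'label': 'POSITIVE', 'score': 0.99}]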

Based on the results of the benchmarks, we conclude that these ``common-knowledge'' guidelines do not hold as previously believed.
We find that the pre-trained transformers outperform the best machine learning tool on only one of the two datasets,
and even there the performance difference is only a few percentage points.
Therefore, we recommend that software engineering researchers consider more than just predictive performance when selecting a sentiment analysis tool, because the best-performing sentiment analysis tools perform very similarly to each other (within 4 percentage points).
Additionally, we find that meta-tokenization, the practice of pre-processing datasets to replace non-natural language elements with meta-tokens, does not further improve the predictive performance of sentiment analysis tools.
These findings are relevant to researchers who apply sentiment analysis tools to software engineering data, as this information can help them select the appropriate tool. 
These findings can also help builders of sentiment analysis tools who seek to further adapt their tools to the software engineering domain.
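
For readers unfamiliar with meta-tokenization, the sketch below illustrates the idea. The regular expressions and the meta-token names (CODE, URL) are illustrative assumptions; the article's actual pre-processing pipeline may differ.

    import re

    # Minimal sketch of meta-tokenization, assuming Markdown-style markup as found on
    # GitHub and Stack Overflow; the patterns and token names are illustrative only.
    CODE_BLOCK = re.compile(r"```.*?```", re.DOTALL)  # fenced code blocks
    INLINE_CODE = re.compile(r"`[^`]+`")              # inline code spans
    URL = re.compile(r"https?://\S+")                 # links

    def meta_tokenize(text: str) -> str:
        """Replace non-natural language elements with placeholder meta-tokens."""
        text = CODE_BLOCK.sub(" CODE ", text)   # replace fenced blocks first
        text = INLINE_CODE.sub(" CODE ", text)  # then inline code spans
        text = URL.sub(" URL ", text)           # then links
        return re.sub(r"\s+", " ", text).strip()

    print(meta_tokenize("`npm install` fails, see https://example.com for the ```stack trace```"))
    # Output: "CODE fails, see URL for the CODE"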

Information

The article was accepted for publication by the Springer Journal of Empirical Software Engineering (EMSE) on the 23rd of February 2024. The article has not yet been published by EMSE itself; however, the camera-ready version submitted to the Author Services can be accessed online.\footnote{\url{https://cassee.dev/files/meta-tokenization-transformers.pdf}} The article is an original, journal-first article. It has not been presented at, nor is it under consideration for, any other journal-first program or conference.

Thu 18 Jul

Displayed time zone: Brasilia, Distrito Federal, Brazil

16:00 - 18:00
AI4SE 3 - Industry Papers / Demonstrations / Journal First at Pitomba
Chair(s): Maliheh Izadi Delft University of Technology
16:00
18m
Talk
Rethinking Software Engineering in the Era of Foundation Models
Industry Papers
Ahmed E. Hassan (Queen’s University), Dayi Lin (Centre for Software Excellence, Huawei Canada), Gopi Krishnan Rajbahadur (Centre for Software Excellence, Huawei, Canada), Keheliya Gallaba (Centre for Software Excellence, Huawei Canada), Filipe Cogo (Centre for Software Excellence, Huawei Canada), Boyuan Chen (Centre for Software Excellence, Huawei Canada), Haoxiang Zhang (Huawei), Kishanthan Thangarajah (Centre for Software Excellence, Huawei Canada), Gustavo Oliva (Centre for Software Excellence, Huawei Canada), Jiahuei (Justina) Lin (Centre for Software Excellence, Huawei Canada), Wali Mohammad Abdullah (Centre for Software Excellence, Huawei Canada), Zhen Ming (Jack) Jiang (York University)
16:18
18m
Talk
LM-PACE: Confidence Estimation by Large Language Models for Effective Root Causing of Cloud Incidents
Industry Papers
Shizhuo Zhang (University of Illinois Urbana-Champaign), Xuchao Zhang (Microsoft), Chetan Bansal (Microsoft Research), Pedro Las-Casas (Microsoft), Rodrigo Fonseca (Microsoft Research), Saravan Rajmohan (Microsoft)
16:36
18m
Talk
Application of Quantum Extreme Learning Machines for QoS Prediction of Elevators' Software in an Industrial Context
Industry Papers
Xinyi Wang (Simula Research Laboratory and University of Oslo), Shaukat Ali (Simula Research Laboratory and Oslo Metropolitan University), Aitor Arrieta (Mondragon University), Paolo Arcaini (National Institute of Informatics), Maite Arratibel (Orona)
16:54
18m
Talk
X-lifecycle Learning for Cloud Incident Management using LLMs
Industry Papers
Drishti Goel (Microsoft), Fiza Husain (Microsoft), Aditya Kumar Singh (Microsoft), Supriyo Ghosh (Microsoft), Anjaly Parayil (Microsoft), Chetan Bansal (Microsoft Research), Xuchao Zhang (Microsoft), Saravan Rajmohan (Microsoft)
Media Attached
17:12
18m
Talk
Neat: Mobile App Layout Similarity Comparison based on Graph Convolutional Networks
Industry Papers
Zhu Tao (ByteDance), Yongqiang Gao (ByteDance), Jiayi Qi (ByteDance), Chao Peng (ByteDance, China), Qinyun Wu (Bytedance Ltd.), Xiang Chen (ByteDance), Ping Yang (Bytedance Network Technology)
17:30
18m
Talk
Transformers and Meta-Tokenization in Sentiment Analysis for Software Engineering
Journal First
Nathan Cassee (Eindhoven University of Technology), Andrei Agaronian (Eindhoven University of Technology), Eleni Constantinou (University of Cyprus), Nicole Novielli (University of Bari), Alexander Serebrenik (Eindhoven University of Technology)
17:48
9m
Talk
EM-Assist: Safe automated ExtractMethod refactoring with LLMs
Demonstrations
Dorin Pomian (University of Colorado Boulder), Abhiram Bellur (University of Colorado Boulder), Malinda Dilhara (University of Colorado Boulder), Zarina Kurbatova (JetBrains Research), Egor Bogomolov (JetBrains Research), Andrey Sokolov (JetBrains Research), Timofey Bryksin (JetBrains Research), Danny Dig (University of Colorado Boulder, JetBrains Research)
Pre-print