Thu 18 Jul 2024 12:12 - 12:30 at Mandacaru - Human Aspects 2 Chair(s): Bianca Trinkenreich

During code reviews, an essential step in software quality assurance, reviewers face the difficult task of understanding and evaluating code changes to validate their quality and prevent the introduction of faults into the codebase. This is a tedious process whose effort depends heavily on the submitted code as well as the author's and the reviewer's experience, leading to median wait times for review feedback of 15-64 hours. This paper aims to improve the velocity and effectiveness of code reviews by predicting three review activity tasks at code submission time: which parts of a patch (1) need to be commented on, (2) need to be revised, or (3) are hotspots (will be commented on or revised). We evaluate two types of text embeddings (i.e., Bag-of-Words and Large Language Model encodings) and review process features (i.e., code size-based and history-based features) to predict these tasks. Our empirical study on three open-source and two industrial datasets shows that combining code embeddings and review process features yields better results than the state-of-the-art approach. F1-scores (median of 40-62%) are significantly better than the state-of-the-art for all tasks (from +1 to +9%). Furthermore, we find that size-based review process features improve performance the most across all datasets, whereas history-based features are less important, though they still improve performance.
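For intuition only, the sketch below illustrates the general idea described in the abstract: combining a Bag-of-Words encoding of a code change with size-based and history-based review process features to predict whether a patch hunk is a hotspot. This is not the authors' implementation; the feature names, toy data, and classifier choice are assumptions made purely for illustration.

```python
# Illustrative sketch (not the paper's implementation): Bag-of-Words text
# features for the diff plus size-/history-based process features feeding a
# simple classifier that predicts "hotspot" hunks. All column names and the
# toy data are hypothetical.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# Hypothetical training data: one row per submitted patch hunk.
data = pd.DataFrame({
    "diff_text": [
        "+ if user is None: return None",
        "+ cache.invalidate(key)",
        "- log.debug(msg)",
        "+ retry(request, attempts=3)",
    ],
    "lines_added": [1, 1, 0, 5],      # size-based process feature
    "lines_deleted": [0, 0, 1, 2],    # size-based process feature
    "prior_revisions": [3, 0, 1, 4],  # history-based process feature
    "is_hotspot": [1, 0, 0, 1],       # label: later commented on or revised
})

features = ColumnTransformer([
    # Bag-of-Words (TF-IDF) encoding of the code change text.
    ("bow", TfidfVectorizer(token_pattern=r"\w+"), "diff_text"),
    # Review process features passed through unchanged.
    ("process", "passthrough",
     ["lines_added", "lines_deleted", "prior_revisions"]),
])

model = Pipeline([
    ("features", features),
    ("clf", LogisticRegression(max_iter=1000)),
])

X_train, X_test, y_train, y_test = train_test_split(
    data.drop(columns="is_hotspot"), data["is_hotspot"],
    test_size=0.5, random_state=0, stratify=data["is_hotspot"],
)
model.fit(X_train, y_train)
print("F1:", f1_score(y_test, model.predict(X_test)))
```

In practice, the Bag-of-Words block could be swapped for embeddings produced by a large language model, and evaluation would use the held-out F1-score per task, as in the study.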

Thu 18 Jul

Displayed time zone: Brasilia, Distrito Federal, Brazil

11:00 - 12:30
Human Aspects 2 (Research Papers) at Mandacaru
Chair(s): Bianca Trinkenreich Colorado State University
11:00
18m
Talk
Can GPT-4 Replicate Empirical Software Engineering Research?
Research Papers
Jenny T. Liang Carnegie Mellon University, Carmen Badea Microsoft Research, Christian Bird Microsoft Research, Robert DeLine Microsoft Research, Denae Ford Microsoft Research, Nicole Forsgren Microsoft Research, Thomas Zimmermann Microsoft Research
Pre-print
11:18
18m
Talk
Do Code Generation Models Think Like Us? - A Study of Attention Alignment between Large Language Models and Human Programmers
Research Papers
Bonan Kou Purdue University, Shengmai Chen Purdue University, Zhijie Wang University of Alberta, Lei Ma The University of Tokyo & University of Alberta, Tianyi Zhang Purdue University
Pre-print
11:36
18m
Talk
Do Words Have Power? Understanding and Fostering Civility in Code Review Discussion
Research Papers
Md Shamimur Rahman University of Saskatchewan, Canada, Zadia Codabux University of Saskatchewan, Chanchal K. Roy University of Saskatchewan, Canada
11:54
18m
Talk
Effective Teaching through Code Reviews: Patterns and Anti-Patterns
Research Papers
Anita Sarma Oregon State University, Nina Chen Google
DOI
12:12
18m
Talk
An Empirical Study on Code Review Activity Prediction in Practice
Research Papers
Doriane Olewicki Queen's University, Sarra Habchi Ubisoft Montréal, Bram Adams Queen's University
Pre-print