Sharing Software-Evolution Datasets: Practices, Challenges, and Recommendations
Sharing research artifacts (e.g., software, data, protocols) is immensely important for improving transparency, replicability, and reusability in research, and has recently gained traction in software engineering. For instance, recent studies have focused on artifact reviewing, the impact of open science, and specific legal or ethical issues of sharing artifacts. Most such studies are concerned with artifacts created by the researchers themselves (e.g., scripts, algorithms, tools) and with processes for quality-assuring these artifacts (e.g., through artifact-evaluation committees). In contrast, the practices and challenges of sharing software-evolution datasets (i.e., republished version-control data with person-related information) have only been touched upon in such works. To tackle this gap, we conducted a meta study of software-evolution datasets published at the International Conference on Mining Software Repositories from 2017 until 2021 and snowballed a set of papers that build upon these datasets. Investigating 200 papers, we elicited what types of software-evolution datasets have been shared following what practices, and what challenges researchers experienced with sharing or using the datasets. We discussed our findings with an authority on research-data management and ethics reviews through a semi-structured interview to put the practices and challenges into context. Through our meta study, we provide an overview of the sharing practices for software-evolution datasets and the corresponding challenges. The expert interview enriched this analysis by discussing how to solve the challenges and by defining recommendations for sharing software-evolution datasets in the future. Our results extend and complement current research, and we are confident that they help researchers share software-evolution datasets (as well as datasets involving the same types of data) in a reliable, ethical, and trustworthy way.
Thu 18 Jul (displayed time zone: Brasilia, Distrito Federal, Brazil)
14:00 - 15:30 | Software Maintenance and Comprehension 3 | Research Papers / Journal First at Pitomba | Chair(s): Xin Xia (Huawei Technologies)
14:00 | 18m | Talk: Revealing Software Development Work Patterns with PR-Issue Graph Topologies (Research Papers) | Cleidson de Souza (Federal University of Pará, Brazil), Emilie Ma (University of British Columbia), Jesse Wong (University of British Columbia), Dongwook Yoon (University of British Columbia), Ivan Beschastnikh (University of British Columbia)
14:18 | 18m | Talk: Using acceptance tests to predict merge conflict risk (Journal First) | Thaís Rocha (UFAPE - Universidade Federal do Agreste de Pernambuco), Paulo Borba (Federal University of Pernambuco) | Pre-print
14:36 | 18m | Talk: Generative AI for Pull Request Descriptions: Adoption, Impact, and Developer Interventions (Research Papers) | Tao Xiao (Nara Institute of Science and Technology), Hideaki Hata (Shinshu University), Christoph Treude (Singapore Management University), Kenichi Matsumoto (Nara Institute of Science and Technology) | Pre-print, Media Attached
14:54 | 18m | Talk: SimLLM: Measuring Semantic Similarity in Code Summaries Using a Large Language Model-Based Approach (Research Papers)
15:12 | 18m | Talk: Sharing Software-Evolution Datasets: Practices, Challenges, and Recommendations (Research Papers) | David Broneske (DZHW Hannover, Germany), Sebastian Kittan (Otto-von-Guericke University Magdeburg, Germany), Jacob Krüger (Eindhoven University of Technology)