Thu 18 Jul 2024 11:54 - 12:12 at Pitomba - Software Maintenance and Comprehension 2 Chair(s): Denys Poshyvanyk

Pythonic idioms are highly valued and widely used in the Python programming community. However, many Python users find it challenging to use them. Adopting a rule-based approach or an LLM-only approach alone is not sufficient to overcome three persistent challenges of code idiomatization: code miss, wrong detection, and wrong refactoring. Motivated by the determinism of rules and the adaptability of LLMs, we propose a hybrid approach consisting of three modules. We not only write prompts that instruct LLMs to complete tasks but also invoke APIs to complete tasks, where the APIs themselves are generated by prompting LLMs to write code. We first construct a knowledge module with three elements (ASTscenario, ASTcomponent, and Condition) and prompt LLMs to generate Python code that is incorporated into an API library for subsequent use. Then, for any syntax-error-free Python code, we invoke APIs from the library to extract each ASTcomponent from the ASTscenario and filter out any ASTcomponent that does not meet the Condition. Finally, we design prompts that instruct LLMs to abstract and idiomatize code, and then invoke APIs from the library to rewrite the non-idiomatic code into idiomatic code. We conduct a comprehensive evaluation of our approach, RIdiom, and Prompt-LLM on the nine established Pythonic idioms in RIdiom. Our approach exhibits superior accuracy, F1-score, and recall while maintaining precision comparable to RIdiom; every metric for every idiom exceeds or comes close to 90%. Lastly, we extend our evaluation to four new Pythonic idioms, where our approach consistently outperforms Prompt-LLM, with accuracy, F1-score, precision, and recall all exceeding 90%.

Thu 18 Jul

Displayed time zone: Brasilia, Distrito Federal, Brazil

11:00 - 12:30
Software Maintenance and Comprehension 2 - Research Papers at Pitomba
Chair(s): Denys Poshyvanyk William & Mary
11:00
18m
Talk
Bloat beneath Python's Scales: A Fine-Grained Inter-Project Dependency Analysis
Research Papers
Georgios-Petros Drosos ETH Zurich, Thodoris Sotiropoulos ETH Zurich, Diomidis Spinellis Athens University of Economics and Business & Delft University of Technology, Dimitris Mitropoulos University of Athens
11:18
18m
Research paper
Characterizing Python Library Migrations
Research Papers
Mohayeminul Islam University of Alberta, Ajay Jha North Dakota State University, Ildar Akhmetov Northeastern University, Sarah Nadi New York University Abu Dhabi, University of Alberta
11:36
18m
Talk
PyRadar: Towards Automatically Retrieving and Validating Source Code Repository Information for PyPI Packages
Research Papers
Kai Gao University of Science and Technology Beijing, Weiwei Xu Peking University, Wenhao Yang Peking University, Minghui Zhou Peking University
11:54
18m
Talk
Refactoring to Pythonic Idioms: A Hybrid Knowledge-Driven Approach Leveraging Large Language Models
Research Papers
Zejun Zhang Australian National University, Zhenchang Xing CSIRO's Data61, Xiaoxue Ren Zhejiang University, Qinghua Lu Data61, CSIRO, Xiwei (Sherry) Xu Data61, CSIRO
12:12
18m
Talk
Dependency-Induced Waste in Continuous Integration: An Empirical Study of Unused Dependencies in the NPM Ecosystem
Research Papers
Nimmi Weeraddana University of Waterloo, Mahmoud Alfadel University of Waterloo, Shane McIntosh University of Waterloo