Can Large Language Models Transform Natural Language Intent into Formal Method Postconditions?
Informal natural language that describes code functionality, such as code comments or function documentation, may contain substantial information about a program’s intent. However, there is typically no guarantee that a program’s implementation and natural language documentation are aligned. In the case of a conflict, leveraging information in code-adjacent natural language has the potential to enhance fault localization, debugging, and code trustworthiness. In practice, however, this information is often underutilized due to the inherent ambiguity of natural language which makes natural language intent challenging to check programmatically. The “emergent abilities” of Large Language Models (LLMs) have the potential to facilitate the translation of natural language intent to programmatically checkable assertions. However, it is unclear if LLMs can correctly translate informal natural language specifications into formal specifications that match programmer intent. Additionally, it is unclear if such translation could be useful in practice.
In this paper, we describe LLM4nl2post, the problem leveraging LLMs for transforming informal natural language to formal method postconditions, expressed as program assertions. We introduce and validate metrics to measure and compare different LLM4nl2post approaches, using the correctness and {\it discriminative power} of generated postconditions. We then perform qualitative and quantitative methods to assess the quality of LLM4nl2post postconditions, finding that they are generally correct and able to discriminate incorrect code. Finally, we find that LLM4nl2post via LLMs has the potential to be helpful in practice; specifications generated from natural language were able to catch 70 real-world historical bugs from Defects4J.
Thu 18 JulDisplayed time zone: Brasilia, Distrito Federal, Brazil change
11:00 - 12:30 | Program Analysis and Performance 2Research Papers at Pitanga Chair(s): Rahul Purandare University of Nebraska-Lincoln | ||
11:00 18mTalk | Adapting Multi-objectivized Software Configuration Tuning Research Papers Pre-print | ||
11:18 18mTalk | Can Large Language Models Transform Natural Language Intent into Formal Method Postconditions? Research Papers Madeline Endres University of Massachusetts Amherst, Sarah Fakhoury Microsoft Research, Saikat Chakraborty Microsoft Research, Shuvendu K. Lahiri Microsoft Research | ||
11:36 18mTalk | Analyzing Quantum Programs with LintQ: A Static Analysis Framework for Qiskit Research Papers Pre-print | ||
11:54 18mTalk | Abstraction-Aware Inference of Metamorphic Relations Research Papers Agustin Nolasco University of Rio Cuarto, Facundo Molina IMDEA Software Institute, Renzo Degiovanni Luxembourg Institute of Science and Technology, Alessandra Gorla IMDEA Software Institute, Diego Garbervetsky Departamento de Computación, FCEyN, UBA, Mike Papadakis University of Luxembourg, Sebastian Uchitel Imperial College and University of Buenos Aires, Nazareno Aguirre University of Rio Cuarto and CONICET, Marcelo F. Frias Dept. of Software Engineering Instituto Tecnológico de Buenos Aires | ||
12:12 18mTalk | Predicting Configuration Performance in Multiple Environments with Sequential Meta-Learning Research Papers Pre-print |