AI-assisted Code Authoring at Scale: Fine-tuning, deploying, and mixed methods evaluation
The rise of large language models (LLMs) has unlocked various applications of this technology in software development. In particular, generative LLMs have been shown to effectively power AI-based code authoring tools that can suggest entire statements or blocks of code during code authoring. In this paper we present ComposeCode, an AI-assisted code authoring tool developed and deployed internally at CompanyA. ComposeCode is based on the InCoder LLM, which merges generative capabilities with bi-directionality. We have scaled up ComposeCode to serve tens of thousands of developers at CompanyA, across 9 programming languages and several coding surfaces. We present our experience in making design decisions about the model and system architecture for ComposeCode that address the challenges of deploying at this scale.
To release an LLM at this scale, we first needed to ensure that it is sufficiently accurate. In a random sample of 20K source code files, depending on the language, we are able to reproduce hidden lines between 40% and 58% of the time, an improvement of between 1.4× and 4.1× over a model trained only on public data.
We gradually rolled ComposeCode out to developers. At the time of this writing, 16K developers have used it, with 8% of their code coming directly from ComposeCode.
To triangulate our numerical findings, we conducted a thematic analysis of the feedback from 70 developers. We find that 91.5% of the feedback is positive, with the most common themes being discovering APIs, dealing with boilerplate code, and accelerating coding. CompanyA continues to integrate this feedback into ComposeCode.