"Programming by Voice" by Sadia Nowrin

Date of Award

2024

Document Type

Open Access Dissertation

Degree Name

Doctor of Philosophy in Computer Science (PhD)

Administrative Home Department

Department of Computer Science

Advisor 1

Keith Vertanen

Committee Member 1

Laura Brown

Committee Member 2

Leo Ureel

Committee Member 3

Patricia Ordonez

Abstract

Programmers typically rely on a keyboard and mouse for input, which poses significant challenges for individuals with motor impairments and limits their ability to enter programs effectively. Voice-based programming offers a promising alternative, enabling a more inclusive and accessible programming environment. Interviews with motor-impaired programmers revealed that memorizing the unnatural commands required by existing voice-based programming systems led to frustration. In this work, we explore how programmers naturally speak a single line of code and present a comprehensive methodology for a voice programming system aimed at making programming more accessible to diverse users.

To achieve this, we adopted a two-step pipeline. The first step recognizes single lines of spoken code by adapting a large pre-trained speech recognition model. By adapting the model with just one hour of spoken programs and leveraging existing natural English language data, we reduced the word error rate (WER) from 28.4% to 8.7%. Further improvements came from decoding with a domain-specific N-gram model and rescoring with a large language model fine-tuned on programming languages, yielding a WER of 5.5%.

The second step translates the recognized text into the target line of code. Our approach to text-to-code translation is the first to address spoken programs, converting a single line of text into a single line of code, whereas current systems typically translate comments into blocks of code. We took a large language model known for generating code from comments and adapted it to generate single lines of code, improving the CodeBLEU score from 56.9% to 83.3% on our test set. In addition, when translating recognized transcripts to target code, our best-adapted model showed marked success: the CodeBLEU score improved from 53.7% to 76.7%, demonstrating the model's ability to handle errors from the speech recognizer.
