Date of Award

2024

Document Type

Open Access Dissertation

Degree Name

Doctor of Philosophy in Computer Science (PhD)

Administrative Home Department

Department of Computer Science

Advisor 1

Keith Vertanen

Committee Member 1

Laura Brown

Committee Member 2

Leo Ureel

Committee Member 3

Patricia Ordonez

Abstract

Programmers typically rely on a keyboard and mouse for input, which poses significant challenges for individuals with motor impairments and limits their ability to enter programs effectively. Voice-based programming offers a promising alternative, enabling a more inclusive and accessible programming environment. Interviews with motor-impaired programmers revealed that memorizing the unnatural commands required by existing voice-based programming systems led to frustration. In this work, we explore how programmers naturally speak a single line of code and present a comprehensive methodology for a voice programming system aimed at making programming more accessible to diverse users.

To achieve this, we adopted a two-step pipeline. The first step recognizes single lines of spoken code by adapting a large pre-trained speech recognition model. By adapting the model with just one hour of spoken programs and leveraging existing natural English language data, we reduced the word error rate (WER) from 28.4% to 8.7%. Further improvements came from decoding with a domain-specific N-gram model and rescoring with a large language model fine-tuned on programming languages, yielding a WER of 5.5%.

The second step translates the recognized text into the target line of code. Our approach to text-to-code translation is the first to address spoken programs, converting a single line of text into a single line of code, whereas current systems typically translate comments into blocks of code. We took a large language model known for generating code from comments and adapted it to generate single lines of code. This adaptation improved the CodeBLEU score on our test set from 56.9% to 83.3%. When translating recognized transcripts into target code, our best-adapted model also performed well: the CodeBLEU score improved from 53.7% to 76.7%, demonstrating the model's ability to handle errors from the speech recognizer.
