Date of Award
2024
Document Type
Open Access Dissertation
Degree Name
Doctor of Philosophy in Computer Science (PhD)
Administrative Home Department
Department of Computer Science
Advisor 1
Keith Vertanen
Committee Member 1
Laura Brown
Committee Member 2
Leo Ureel
Committee Member 3
Patricia Ordonez
Abstract
Programmers typically rely on a keyboard and mouse for input, which poses significant challenges for individuals with motor impairments, limiting their ability to effectively input programs. Voice-based programming offers a promising alternative, enabling a more inclusive and accessible programming environment. Insights from interviews with motor-impaired programmers revealed that memorizing unnatural commands in existing voice-based programming systems led to frustration. In this work, we explore how programmers naturally speak a single line of code and present a comprehensive methodology for a voice programming system aimed at making programming more accessible for diverse users. To achieve this, we adopted a two-step pipeline. The first step focuses on recognizing single lines of spoken code by adapting a large pre-trained speech recognition model. By adapting the model with just one hour of spoken programs and leveraging existing natural English language data, we reduced the word error rate (WER) from 28.4% to 8.7%. Additional improvements were achieved by decoding with a domain-specific N-gram model and rescoring with a fine-tuned large language model tailored to programming languages, resulting in a WER of 5.5%. The second step involves translating the recognized text into the target line of code. Our approach to text-to-code translation is the first to address spoken programs, converting a single line of text to a single line of code, whereas current systems typically translate comments to blocks of code. We used a large language model known for generating code from comments and adapted it to learn how to generate single lines of code. This adaptation led to a significant improvement in the CodeBLEU score from 56.9% to 83.3% on our test set. In addition, when translating recognized transcripts to target code, our best-adapted model showed marked success. The CodeBLEU score improved from 53.7% to 76.7%, demonstrating the model’s ability to handle errors from the speech recognizer.
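For readers unfamiliar with the metric, word error rate is conventionally defined as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal Python sketch of that standard definition. It is not taken from the dissertation: the function name, the whitespace tokenization, and the example utterances are illustrative assumptions, and the dissertation's own scoring setup may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: both strings are tokenized by splitting on whitespace.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref_words)


# Hypothetical spoken line of code and a recognizer output with one wrong word.
reference = "for i in range of ten"
hypothesis = "for eye in range of ten"
print(f"WER: {word_error_rate(reference, hypothesis):.1%}")
```

In this hypothetical example the hypothesis contains one substitution against a six-word reference, so the score is one sixth. Because insertions are counted in the numerator, the measure can exceed 100% when the hypothesis is much longer than the reference.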
Recommended Citation
Nowrin, Sadia, "Programming by Voice", Open Access Dissertation, Michigan Technological University, 2024.
Included in
Artificial Intelligence and Robotics Commons, Graphics and Human Computer Interfaces Commons