Leveraging Large Pretrained Models for Line-by-Line Spoken Program Recognition
Document Type
Conference Proceeding
Publication Date
3-18-2024
Department
Department of Computer Science
Abstract
Spoken programming languages differ significantly from natural English due to the inherent variability in speech patterns among programmers and the wide range of programming constructs. In this paper, we employ Wav2Vec 2.0 to improve the accuracy of transcribing spoken programming languages such as Java. By adapting a model that had prior exposure to a substantial amount of labeled natural English data using just one hour of spoken programs, we achieve a word error rate (WER) of 8.7%, a large improvement over the 28.4% WER of a model trained solely on natural English. Decoding with a domain-specific N-gram model and subsequently rescoring the N-best list with a large language model fine-tuned to the programming domain resulted in a WER of 5.5% on our test set.
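The abstract describes a three-stage pipeline: a Wav2Vec 2.0 acoustic model adapted on spoken programs, CTC beam-search decoding with a domain-specific N-gram language model, and rescoring of the N-best hypotheses with a fine-tuned language model. The sketch below is an illustrative reconstruction of that pipeline, not the authors' released code; the checkpoint names, the KenLM file spoken_java_4gram.arpa, the beam width, and the 0.5 interpolation weight are all assumptions made for the example.

import torch
import librosa
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor, AutoModelForCausalLM, AutoTokenizer
from pyctcdecode import build_ctcdecoder

# 1) Acoustic model: Wav2Vec 2.0 pretrained/fine-tuned on natural English;
#    in the paper this would be further adapted on ~1 hour of spoken programs.
processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
acoustic = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")  # swap in the adapted checkpoint
acoustic.eval()

# 2) CTC beam-search decoder with a domain-specific N-gram LM (KenLM/ARPA format).
#    Labels must be ordered by token id; Wav2Vec 2.0 uses "|" as the word delimiter.
vocab_dict = processor.tokenizer.get_vocab()
labels = [tok.replace("|", " ") for tok, _ in sorted(vocab_dict.items(), key=lambda kv: kv[1])]
decoder = build_ctcdecoder(labels, kenlm_model_path="spoken_java_4gram.arpa")  # hypothetical LM file

# 3) Rescoring LM: a causal language model fine-tuned on programming-domain text
#    (a plain GPT-2 checkpoint stands in for the fine-tuned model here).
lm_tok = AutoTokenizer.from_pretrained("gpt2")
rescorer = AutoModelForCausalLM.from_pretrained("gpt2")
rescorer.eval()

def lm_log_prob(text: str) -> float:
    """Length-normalized log-probability of a hypothesis under the rescoring LM."""
    ids = lm_tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = rescorer(ids, labels=ids).loss  # mean negative log-likelihood per token
    return -loss.item()

def transcribe_line(wav_path: str, n_best: int = 10, lm_weight: float = 0.5) -> str:
    """Transcribe one spoken line of code: CTC + N-gram beam search, then LLM rescoring."""
    audio, _ = librosa.load(wav_path, sr=16_000)
    inputs = processor(audio, sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        logits = acoustic(inputs.input_values).logits[0].numpy()

    # Keep the top-N beams; each beam carries the decoded text (index 0) and a
    # combined acoustic + N-gram score (last element).
    beams = decoder.decode_beams(logits, beam_width=100)[:n_best]
    rescored = [(lm_weight * lm_log_prob(b[0]) + (1.0 - lm_weight) * b[-1], b[0]) for b in beams]
    return max(rescored, key=lambda t: t[0])[1]

A possible usage would be transcribe_line("line_042.wav") for a single recorded line of Java; the linear interpolation of the first-pass beam score with the rescoring LM's score is one simple way to combine the two models and is assumed here for illustration.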
Publication Title
ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Recommended Citation
Nowrin, S., & Vertanen, K. (2024). Leveraging Large Pretrained Models for Line-by-Line Spoken Program Recognition. ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 12216-12220. http://doi.org/10.1109/ICASSP48485.2024.10448435
Retrieved from: https://digitalcommons.mtu.edu/michigantech-p2/887