LEVERAGING LARGE PRETRAINED MODELS FOR LINE-BY-LINE SPOKEN PROGRAM RECOGNITION

Document Type

Conference Proceeding

Publication Date

3-18-2024

Department

Department of Computer Science

Abstract

Spoken programming languages differ significantly from natural English due to the inherent variability in speech patterns among programmers and the wide range of programming constructs. In this paper, we employ Wav2Vec 2.0 to improve the accuracy of transcribing spoken programming languages such as Java. By adapting a model that was pretrained on a substantial amount of labeled natural English with just one hour of spoken programs, we achieve a word error rate (WER) of 8.7%, well below the 28.4% WER of a model trained on natural English alone. Decoding with a domain-specific N-gram model and subsequently rescoring the N-best list with a large language model fine-tuned to the programming domain reduced the WER to 5.5% on our test set.
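The abstract describes a three-stage pipeline: a fine-tuned Wav2Vec 2.0 acoustic model, beam-search decoding with a domain-specific N-gram language model, and N-best rescoring with a domain-tuned large language model. The sketch below illustrates one plausible realization of that pipeline using the Hugging Face transformers and pyctcdecode libraries; the checkpoint names ("w2v2-java", "gpt2") and the KenLM path ("java.arpa") are hypothetical placeholders, and the interpolation weight alpha is an assumed tuning knob, not a value from the paper.

```python
# Sketch of the CTC + n-gram decoding and LLM rescoring pipeline.
# Assumptions: "w2v2-java" is a hypothetical Wav2Vec 2.0 checkpoint fine-tuned
# on spoken Java; "java.arpa" is a hypothetical KenLM n-gram model trained on
# verbalized programs; "gpt2" stands in for the paper's domain-tuned LLM.
import torch
from pyctcdecode import build_ctcdecoder
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Wav2Vec2ForCTC, Wav2Vec2Processor)

# Acoustic model: emits frame-level CTC logits over a character vocabulary.
processor = Wav2Vec2Processor.from_pretrained("w2v2-java")  # hypothetical
am = Wav2Vec2ForCTC.from_pretrained("w2v2-java").eval()

# Beam-search decoder fused with a domain-specific n-gram LM (KenLM format).
# Labels must be ordered by vocabulary index.
vocab = processor.tokenizer.get_vocab()
labels = [tok for tok, _ in sorted(vocab.items(), key=lambda kv: kv[1])]
decoder = build_ctcdecoder(labels, kenlm_model_path="java.arpa")  # hypothetical

# Rescoring LM: any causal LM fine-tuned on verbalized programs would do.
lm_tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def lm_logprob(text: str) -> float:
    """Mean token log-probability of `text` under the rescoring LM."""
    ids = lm_tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # Cross-entropy over shifted labels is the mean negative log-prob.
        loss = lm(ids, labels=ids).loss
    return -loss.item()

def transcribe(waveform, sample_rate=16_000, beam_width=50, alpha=0.5):
    """Return the best hypothesis for one spoken line of code."""
    inputs = processor(waveform, sampling_rate=sample_rate, return_tensors="pt")
    with torch.no_grad():
        logits = am(inputs.input_values).logits[0].numpy()
    # N-best list from CTC beam search with the n-gram LM.
    # Each beam is (text, lm_state, text_frames, logit_score, combined_score).
    beams = decoder.decode_beams(logits, beam_width=beam_width)
    # Rescore: interpolate the decoder's combined score with the LLM score.
    rescored = [(text, combined + alpha * lm_logprob(text))
                for text, _, _, _, combined in beams]
    return max(rescored, key=lambda pair: pair[1])[0]
```

In this sketch the LLM score is linearly interpolated with the decoder's combined acoustic-plus-n-gram score; in practice the weight would be tuned on a development set, and the paper does not specify its exact rescoring formula.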

Publication Title

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

ISBN

979-8-3503-4485-1
