StegoType: Surface Typing from Egocentric Cameras
Document Type
Conference Proceeding
Publication Date
10-13-2024
Department
Department of Computer Science
Abstract
Text input is a critical component of any general-purpose computing system, yet efficient and natural text input remains a challenge in AR and VR. Headset-based hand-tracking has recently become pervasive among consumer VR devices and affords the opportunity to enable touch typing on virtual keyboards. We present an approach for decoding touch typing on uninstrumented flat surfaces using only egocentric camera-based hand-tracking as input. While egocentric hand-tracking accuracy is limited by issues such as self-occlusion and image fidelity, we show that a sufficiently diverse training set of hand motions paired with typed text enables a deep learning model to extract signal from this noisy input. Furthermore, by carefully designing a closed-loop data collection process, we can train an end-to-end text decoder that accounts for naturally sloppy typing on virtual keyboards. We evaluate our work with a user study (n=18) in which our method achieved a mean online throughput of 42.4 WPM with an uncorrected error rate (UER) of 7%, compared to a physical keyboard baseline of 74.5 WPM at 0.8% UER, demonstrating progress toward unlocking productivity and high-throughput use cases in AR/VR.
Publication Title
UIST Adjunct 2024 - Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology
ISBN
9798400707186
Recommended Citation
Richardson, M., Botros, F., Shi, Y., Snow, B., Guo, P., Zhang, L., Dong, J., Vertanen, K., Ma, S., & Wang, R. (2024). StegoType: Surface Typing from Egocentric Cameras. UIST Adjunct 2024 - Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology. http://doi.org/10.1145/3672539.3686762
Retrieved from: https://digitalcommons.mtu.edu/michigantech-p2/1287