Speech recognition technology continues to improve, but users still experience significant difficulty using the software to create and edit documents. The reported composition speed using speech software is only between 8 and 15 words per minute [Proc CHI 99 (1999) 568; Universal Access Inform Soc 1 (2001) 4], far below the normal speaking rate of 125-150 words per minute. What causes the huge gap between natural speaking and composing with speech recognition? Is it possible to narrow the gap and make speech recognition more attractive to users? In this paper we discuss users' learning processes and the difficulties they experience in continuous dictation tasks using state-of-the-art automatic speech recognition (ASR) software. Detailed data were collected, for the first time, on various aspects of the three activities involved in document composition tasks: dictation, navigation, and correction. The results indicate that navigation and error correction accounted for a large portion of the dictation task during the early stages of interaction. As users gained more experience, they became more efficient at dictation, navigation, and error correction. However, the major improvements in productivity came from better dictation quality and the use of navigation commands. These results provide insights into the factors that cause the gap between users' expectations of speech recognition software and the reality of use, and into how those factors changed with experience. Specific advice is given to researchers regarding the most critical issues that must be addressed.
All Science Journal Classification (ASJC) codes
- Human-Computer Interaction