The Effects of Automatic Speech Recognition Quality on Human Transcription Latency
The 17th International ACM SIGACCESS Conference on Computers and Accessibility - Student Research Competition (ASSETS-SRC 2015)
Lisbon, Portugal, October 26-28, 2015
Converting speech to text quickly is the fundamental task in making aural content accessible to people who are deaf or hard of hearing. Despite the high cost, this work is done by human captionists, because automatic speech recognition (ASR) does not perform satisfactorily in real-world settings. Offering ASR output to captionists as a starting point seems easier and more economical, yet the effectiveness of this approach clearly depends on ASR quality, because fixing inaccurate ASR output may take longer than producing the transcription without ASR support. In this paper, we empirically study how the time captionists need to produce transcriptions from partially correct ASR output varies with the accuracy of that output. Our studies with 160 participants recruited on Amazon's Mechanical Turk indicate that starting with ASR output is worse than transcribing without it unless the output is sufficiently accurate (Word Error Rate (WER) under 30%).
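For readers unfamiliar with the metric, the following is a minimal sketch of how WER is commonly computed: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the ASR hypothesis, divided by the number of reference words. The abstract does not say how WER was computed in the study, so this standard definition is an assumption, and the function names `word_error_rate` and `asr_worth_offering` are hypothetical. The default threshold of 0.30 simply mirrors the 30% figure reported above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: whitespace tokenization and exact, case-sensitive
    word matching; real evaluations usually normalize text first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def asr_worth_offering(wer: float, threshold: float = 0.30) -> bool:
    """Hypothetical helper: True when WER is under the threshold."""
    return wer < threshold


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    hypothesis = "the cat sit on mat"  # one substitution, one deletion
    wer = word_error_rate(reference, hypothesis)
    print(f"WER = {wer:.2f}, offer ASR draft: {asr_worth_offering(wer)}")
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.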