Which words are hard to recognize? Prosodic, lexical, and disfluency factors that increase speech recognition error rates

Sharon Goldwater; Dan Jurafsky; Christopher D. Manning

doi:10.1016/j.specom.2009.10.001

Journal article, Speech Communication, 2010


Sharon Goldwater

Role: Corresponding author
PersonId: 905463

Dan Jurafsky

Role: Author

Christopher D. Manning

Role: Author

Abstract

Despite years of speech recognition research, little is known about which words tend to be misrecognized and why. Previous work has shown that errors increase for infrequent words, short words, and very loud or fast speech, but many other presumed causes of error (e.g., nearby disfluencies, turn-initial words, phonetic neighborhood density) have never been carefully tested. The reasons for the large differences in error rates between speakers also remain largely unexplained. Using a mixed-effects regression model, we investigate these and other factors by analyzing the errors of two state-of-the-art recognizers on conversational speech. Words with higher error rates include those with extreme prosodic characteristics, those occurring turn-initially or as discourse markers, and doubly confusable pairs: acoustically similar words that also have similar language model probabilities. Words preceding disfluent interruption points (first repetition tokens and words before fragments) also have higher error rates. Finally, even after accounting for other factors, speaker differences cause enormous variance in error rates, suggesting that speaker error rate variance is not fully explained by differences in word choice, fluency, or prosodic characteristics. We also propose that doubly confusable pairs, rather than high neighborhood density, may better explain phonetic neighborhood errors in human speech processing.

Keywords

speech recognition; conversational; error analysis; individual differences; mixed-effects model

Main file

PEER_stage2_10.1016%2Fj.specom.2009.10.001.pdf (1.71 MB)

Origin: Files produced by the author(s)


https://hal.science/hal-00608401

Submitted on: Wednesday, 13 July 2011, 02:56:24

Last modified on: Wednesday, 13 July 2011, 02:56:24

Long-term archiving on: Sunday, 4 December 2016, 09:23:13

Dates and versions

hal-00608401, version 1 (13-07-2011)

Identifiers

HAL Id: hal-00608401, version 1
DOI: 10.1016/j.specom.2009.10.001

Cite

Sharon Goldwater, Dan Jurafsky, Christopher D. Manning. Which words are hard to recognize? Prosodic, lexical, and disfluency factors that increase speech recognition error rates. Speech Communication, 2010, 52 (3), pp.181. ⟨10.1016/j.specom.2009.10.001⟩. ⟨hal-00608401⟩


Collections

PEER

