Journal articles

Spoken document representations for probabilistic retrieval

Abstract : This paper presents some developments in query expansion and document representation for our spoken document retrieval system and shows how various retrieval techniques affect performance for different sets of transcriptions derived from a common speech source. Modifications of the document representation are used, which combine several techniques for query expansion, knowledge-based on the one hand and statistics-based on the other. Taken together, these techniques can improve Average Precision by over 19% relative to a system similar to the one we presented at TREC-7. These new experiments have also confirmed that the degradation of Average Precision due to a word error rate (WER) of 25% is quite small (3.7% relative) and can be reduced to almost zero (0.2% relative). The overall improvement of the retrieval system can also be observed for seven different sets of transcriptions from different recognition engines, with WER ranging from 24.8% to 61.5%. We hope to repeat these experiments when larger document collections become available, in order to evaluate the scalability of these techniques.
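The figures in the abstract are relative changes in Average Precision. As a minimal sketch of how such numbers are typically derived, the Python below computes non-interpolated Average Precision for one ranked list and the relative change between two scores. It assumes the common TREC-style definition; the exact variant used in the paper is not stated on this page, and the function names, document identifiers and scores are hypothetical illustrations, not values from the paper.

```python
def average_precision(ranked_ids, relevant_ids):
    """Non-interpolated Average Precision for a single query.

    Assumption: TREC-style definition, i.e. the mean of the precision
    values at each rank where a relevant document is retrieved, divided
    by the total number of relevant documents.
    """
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def relative_change(baseline, new):
    """Relative change of `new` with respect to `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline


# Hypothetical ranking: relevant documents d1 and d3 appear at ranks 1 and 3.
ap = average_precision(["d1", "d2", "d3", "d4"], {"d1", "d3"})

# Hypothetical scores: a drop from 0.50 to 0.25 is a -50% relative change.
change = relative_change(0.50, 0.25)
```

Under this reading, a "3.7% relative" degradation means the Average Precision obtained on the automatic transcriptions is 3.7% lower than the score obtained on the reference transcriptions, expressed as a fraction of the reference score rather than as an absolute difference.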

Contributor : Pierre Jourlin
Submitted on : Thursday, June 13, 2019 - 11:29:47 AM
Last modification on : Wednesday, June 16, 2021 - 6:14:01 PM


Publisher files allowed on an open archive


  • HAL Id : hal-02152860, version 1



Pierre Jourlin, Sue E Johnson, Karen Spärck Jones, Philip C. Woodland. Spoken document representations for probabilistic retrieval. Speech Communication, 2000. ⟨hal-02152860⟩


