Toward the Integration of Natural Language Processing and Automatic Speech Recognition: Using Morpho-Syntax and Pragmatics for Transcription

Stéphane Huet 1, Gwénolé Lecorvé 1, Guillaume Gravier 1, Pascale Sébillot 1
1 TEXMEX - Multimedia content-based indexing
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
Abstract: In the framework of multimedia analysis and interaction, speech and language processing plays a major role. Many multimedia documents contain speech from which high level semantic information can be extracted, as in broadcast news or sports videos, with typical applications such as spoken document indexing, topic tracking and summarization. Hence, many multimedia document analysis applications require a collaboration between speech recognition and natural language processing (NLP) techniques. As NLP techniques are traditionally designed for text analysis, this combination can be seen as a multimodal fusion issue where the two modalities are audio and text. However, most of the time, both modalities are considered sequentially. A typical approach consists in automatically transcribing the audio track before analyzing the output (here considered as a regular text) with NLP methods. Independently processing the two modalities clearly seems suboptimal. This chapter focuses on recent research work toward a better integration between automatic speech recognition (ASR) and NLP for the analysis of spoken multimedia documents with the goal of achieving a better transcription of multimedia streams.

https://hal.archives-ouvertes.fr/hal-02021921
Contributor : Stéphane Huet
Submitted on : Saturday, February 16, 2019 - 9:46:55 PM
Last modification on : Wednesday, February 27, 2019 - 2:47:22 PM
Long-term archiving on : Friday, May 17, 2019 - 2:36:57 PM

Citation

Stéphane Huet, Gwénolé Lecorvé, Guillaume Gravier, Pascale Sébillot. Toward the Integration of Natural Language Processing and Automatic Speech Recognition: Using Morpho-Syntax and Pragmatics for Transcription. Petros Maragos, Alexandros Potamianos, Patrick Gros. Multimodal Processing and Interaction: Audio, Video, Text, Springer US, pp.201-218, 2008, 978-0-387-76316-3. ⟨10.1007/978-0-387-76316-3_9⟩. ⟨hal-02021921⟩
