Cross-Lingual Spoken Language Understanding from Unaligned Data using Discriminative Classification Models and Machine Translation

Abstract: This paper investigates several approaches to bootstrapping a new spoken language understanding (SLU) component in a target language given a large dataset of semantically-annotated utterances in some other source language. The aim is to reduce the cost associated with porting a spoken dialogue system from one language to another by minimising the amount of data required in the target language. Since word-level semantic annotations are costly, Semantic Tuple Classifiers (STCs) are used in conjunction with statistical machine translation models, both of which are trained from unaligned data, to further reduce development time. The paper presents experiments in which a French SLU component in the tourist information domain is bootstrapped from English data. Results show that training STCs on automatically translated data produced the best performance for predicting the utterance's dialogue act type; however, individual slot/value pairs are best predicted by training STCs on the source language and using them to decode translated utterances.

Index Terms: spoken dialogue system, spoken language understanding, portability, bootstrapping
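The abstract contrasts two ways of combining STCs with machine translation: (A) translate the source-language training data and train STCs on the target language, or (B) train STCs on the source language and translate target-language test utterances before decoding. The sketch below illustrates both pipelines under loudly stated assumptions. Every name in it (`ngram_features`, `Perceptron`, `SemanticTupleClassifier`, `EN_FR`, `translate`, `decode_strategy_a`, `decode_strategy_b`) is hypothetical. A plain perceptron stands in for whatever classifier the paper actually uses. A word-for-word toy lexicon stands in for a statistical MT system. The English utterances and their labels are invented, and slot/value labels are assumed to stay untranslated. The one idea taken from the abstract is the STC structure: one dialogue-act classifier plus one binary classifier per slot/value tuple, trained on n-gram features without any word-level alignment.

```python
from collections import defaultdict

def ngram_features(utterance, n_max=2):
    """Set of word n-grams (n = 1..n_max), used as sparse binary features."""
    words = utterance.lower().split()
    return {" ".join(words[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(words) - n + 1)}

class Perceptron:
    """Binary perceptron over sparse binary features (assumed stand-in learner)."""
    def __init__(self):
        self.w = defaultdict(float)
        self.b = 0.0

    def score(self, feats):
        return self.b + sum(self.w.get(f, 0.0) for f in feats)

    def fit(self, X, y, max_epochs=100):
        for _ in range(max_epochs):
            mistakes = 0
            for feats, positive in zip(X, y):
                target = 1.0 if positive else -1.0
                if target * self.score(feats) <= 0:  # wrong side of, or on, the boundary
                    for f in feats:
                        self.w[f] += target
                    self.b += target
                    mistakes += 1
            if mistakes == 0:  # training data separated: stop early
                break
        return self

class SemanticTupleClassifier:
    """One-vs-rest dialogue-act classifier plus one binary classifier per
    (slot, value) tuple; labels are per utterance, so no alignment is needed."""
    def fit(self, utterances, acts, tuple_sets):
        X = [ngram_features(u) for u in utterances]
        self.acts = sorted(set(acts))
        self.act_clf = {a: Perceptron().fit(X, [act == a for act in acts])
                        for a in self.acts}
        all_tuples = sorted(set().union(*tuple_sets))
        self.tuple_clf = {t: Perceptron().fit(X, [t in ts for ts in tuple_sets])
                          for t in all_tuples}
        return self

    def predict(self, utterance):
        feats = ngram_features(utterance)
        act = max(self.acts, key=lambda a: self.act_clf[a].score(feats))
        tuples = {t for t, clf in self.tuple_clf.items() if clf.score(feats) > 0}
        return act, tuples

# Hypothetical word-for-word lexicon standing in for a real SMT system.
EN_FR = {"i": "je", "want": "veux", "a": "un", "cheap": "économique",
         "hotel": "hôtel", "restaurant": "restaurant", "what": "quel",
         "is": "est", "the": "le", "address": "adresse",
         "phone": "téléphone", "number": "numéro"}
FR_EN = {fr: en for en, fr in EN_FR.items()}

def translate(utterance, lexicon):
    """Word-by-word 'translation'; unknown words pass through unchanged."""
    return " ".join(lexicon.get(w, w) for w in utterance.lower().split())

# Invented English training data: (utterance, dialogue act, slot/value tuples).
TRAIN_EN = [
    ("i want a cheap hotel", "inform", {("type", "hotel"), ("pricerange", "cheap")}),
    ("i want a restaurant", "inform", {("type", "restaurant")}),
    ("what is the address", "request", {("request", "addr")}),
    ("what is the phone number", "request", {("request", "phone")}),
]
utts, acts, tuples = zip(*TRAIN_EN)

# Strategy A: translate the training data, train STCs in the target language.
stc_fr = SemanticTupleClassifier().fit(
    [translate(u, EN_FR) for u in utts], list(acts), list(tuples))

# Strategy B: train STCs on the source language, translate test input back.
stc_en = SemanticTupleClassifier().fit(list(utts), list(acts), list(tuples))

def decode_strategy_a(fr_utterance):
    return stc_fr.predict(fr_utterance)

def decode_strategy_b(fr_utterance):
    return stc_en.predict(translate(fr_utterance, FR_EN))
```

For example, `decode_strategy_b("quel est le téléphone numéro")` translates the French input back to English before the English-trained STCs decode it, whereas `decode_strategy_a` applies classifiers trained on the translated data directly. With a real MT system, translation errors would affect the two strategies differently; the abstract reports that this trade-off favours strategy A for dialogue act types and strategy B for slot/value pairs.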
Document type: Conference paper

https://hal.archives-ouvertes.fr/hal-01318164
Contributor: Bibliothèque Universitaire Déposants Hal-Avignon
Submitted on: Thursday, May 19, 2016 - 12:11:28 PM
Last modification on: Wednesday, May 15, 2019 - 10:12:03 AM

Identifiers

  • HAL Id: hal-01318164, version 1

Citation

Fabrice Lefèvre, François Mairesse, Steve Young. Cross-Lingual Spoken Language Understanding from Unaligned Data using Discriminative Classification Models and Machine Translation. INTERSPEECH, Sep 2010, Makuhari, Japan. ⟨hal-01318164⟩