Learning from Multiple Partially Observed Views -- an Application to Multilingual Text Categorization

Massih-Reza Amini; Nicolas Usunier; Cyril Goutte

Conference paper, Year: 2009


Massih-Reza Amini

Role: Author
PersonId: 747054
IdHAL: massih-reza-amini
ORCID: 0000-0001-9032-4233
IdRef: 132277042

Machine Learning and Information Retrieval

Nicolas Usunier

Role: Author
PersonId: 933831

Machine Learning and Information Retrieval

Cyril Goutte

Role: Author

Abstract

We address the problem of learning classifiers when observations have multiple views, some of which may not be observed for all examples. We assume the existence of view generating functions that can approximately complete the missing views. This situation arises, for example, when learning text classifiers from multilingual collections in which documents are not available in all languages. In that case, Machine Translation (MT) systems can be used to translate each document into the missing languages. We derive a generalization error bound for classifiers learned on examples with multiple artificially created views. Our result uncovers a trade-off between the size of the training set, the number of views, and the quality of the view generating functions. As a consequence, we identify situations where learning from multiple views is preferable to classical single-view learning. An extension of this framework provides a natural way to leverage unlabeled multi-view data in semi-supervised learning. Experimental results on a subset of the Reuters RCV1/RCV2 collections support our findings by showing that additional views obtained from MT may significantly improve classification performance in the cases identified by our trade-off.
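The setting described in the abstract can be illustrated with a minimal sketch of view completion. Everything named here is an assumption made for illustration and does not come from the paper: the word lookup tables stand in for real MT systems, and `word_by_word`, `GENERATORS`, and `complete_views` are hypothetical helpers. The sketch only shows how a missing language view might be filled in from an observed one and flagged as artificial; it does not implement the paper's learning algorithm or its bound.

```python
from typing import Callable, Dict, Optional, Set, Tuple

# Hypothetical toy lookup tables standing in for real MT systems (assumption).
EN_TO_FR = {"market": "marché", "oil": "pétrole", "price": "prix"}
FR_TO_EN = {fr: en for en, fr in EN_TO_FR.items()}


def word_by_word(table: Dict[str, str]) -> Callable[[str], str]:
    """Build a crude view generating function from a word lookup table."""
    def generate(text: str) -> str:
        # Unknown words are copied unchanged, so the generated view is approximate.
        return " ".join(table.get(word, word) for word in text.split())
    return generate


# GENERATORS[(source, target)] maps an observed view to an approximate target view.
GENERATORS: Dict[Tuple[str, str], Callable[[str], str]] = {
    ("en", "fr"): word_by_word(EN_TO_FR),
    ("fr", "en"): word_by_word(FR_TO_EN),
}


def complete_views(
    doc: Dict[str, Optional[str]],
    generators: Dict[Tuple[str, str], Callable[[str], str]],
) -> Tuple[Dict[str, Optional[str]], Set[str]]:
    """Fill each missing view from the first observed view that has a generator.

    Returns the completed document and the set of views that were generated
    rather than observed.
    """
    completed = dict(doc)
    artificial: Set[str] = set()
    observed = [view for view, text in doc.items() if text is not None]
    for target, text in doc.items():
        if text is not None:
            continue  # this view was observed, nothing to generate
        for source in observed:
            generate = generators.get((source, target))
            if generate is not None:
                completed[target] = generate(doc[source])
                artificial.add(target)
                break
    return completed, artificial


# A document observed only in English; the French view is missing.
doc = {"en": "oil price", "fr": None}
completed, artificial = complete_views(doc, GENERATORS)
print(completed, artificial)
```

Keeping track of which views are artificial is a design choice of this sketch: it makes it possible to treat observed and generated views differently downstream, which is where the quality of the view generating functions would matter.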

Domains

Computer Science [cs]

https://hal.science/hal-01297947

Submitted on: Tuesday, April 5, 2016, 10:56:31

Last modified on: Thursday, March 14, 2024, 14:40:45

Dates and versions

hal-01297947, version 1 (05-04-2016)

Identifiers

HAL Id: hal-01297947, version 1

Cite

Massih-Reza Amini, Nicolas Usunier, Cyril Goutte. Learning from Multiple Partially Observed Views -- an Application to Multilingual Text Categorization. Advances in Neural Information Processing Systems, Dec 2009, Vancouver, Canada. ⟨hal-01297947⟩

Collections

UPMC CNRS LIP6 SORBONNE-UNIVERSITE SU-SCIENCES
