Extraction de propriétés de produits

Patrick Marty; Tian Tian; Isabelle Tellier

Conference paper, Year: 2014

Patrick Marty

Role: Author

Leguide.com

Tian Tian

Role: Author
PersonId: 175382
IdHAL: tian-tian
IdRef: 241824850

Lattice - Langues, Textes, Traitements informatiques, Cognition - UMR 8094

Université Paris Sciences et Lettres

Université Sorbonne Paris Cité

École normale supérieure - Paris

Leguide.com

Isabelle Tellier

Role: Author
PersonId: 10815
IdHAL: isabelle-tellier
ORCID: 0000-0002-0977-2926
IdRef: 154913634

Lattice - Langues, Textes, Traitements informatiques, Cognition - UMR 8094

Université Paris Sciences et Lettres

Université Sorbonne Paris Cité

École normale supérieure - Paris

Abstract

This work aims to automatically extract certain product properties from descriptive texts provided by a merchant website. Building an annotated reference corpus reveals several difficulties, stemming both from the texts themselves and from the specifics of the task. To address the task, we tested two distinct approaches: an extraction method based on dictionaries and a machine learning approach using CRFs (Conditional Random Fields), for which we tried a large number of models. The results of our experiments highlight the advantages and limitations of both methods.

Keywords

product descriptions; information extraction; machine learning; CRFs

Domains

Information Retrieval [cs.IR]

Main file

CORIA-11.pdf (459.61 KB)

Origin: Publisher files allowed on an open archive

https://hal.science/hal-01473389

Submitted on: Tuesday, 28 February 2017, 14:45:20

Last modified on: Friday, 19 April 2024, 16:18:57

Long-term archiving on: Monday, 29 May 2017, 12:14:17

Dates and versions

hal-01473389 , version 1 (28-02-2017)

Identifiers

HAL Id : hal-01473389 , version 1

Cite

Patrick Marty, Tian Tian, Isabelle Tellier. Extraction de propriétés de produits. COnférence en Recherche d’Information et Applications (CORIA 2014), Mar 2014, Nancy, France. pp.121-136. ⟨hal-01473389⟩

Collections

ENS-PARIS CNRS UNIV-PARIS3 LATTICE PSL
