Régularisation spatiale de représentations distribuées de mots

Paul Mousset; Yoann Pitarch; Lynda Tamine

Conference paper, Year: 2019


Paul Mousset

Role: Author
PersonId: 1093793
IdRef: 234265795

Atos Intégration SAS

Recherche d’Information et Synthèse d’Information

Yoann Pitarch

Role: Author
PersonId: 1108646
ORCID: 0000-0002-1508-5436
IdRef: 153248017

Recherche d’Information et Synthèse d’Information

Lynda Tamine

Role: Author
PersonId: 744669
IdHAL: lynda-tamine-lechani
ORCID: 0000-0002-3615-8032
IdRef: 110204875

Recherche d’Information et Synthèse d’Information

Abstract

Stimulated by the heavy use of smartphones, the joint use of textual and spatial data in spatio-textual objects (e.g., tweets) has become the mainstay of many applications, such as finding places of interest. These tasks rely critically on the representation of spatial objects and on the definition of matching functions between them. In this article, we focus on the representation of these objects. More precisely, encouraged by the success of distributed word representation approaches, we propose to use the spatial distributions of words to regularize word embeddings, which can be combined to construct object representations. The purpose is to reveal possible local semantic relationships between words and the multiplicity of meanings of the same word. Experiments based on a semantic location prediction task demonstrate that integrating our method of spatial retrofitting of word embeddings into a basic matching model provides significant improvements over strong baselines.

Driven by the intensive use of mobile phones, the joint exploitation of the textual and spatial data found in spatio-textual objects (e.g., tweets) has become the cornerstone of many applications, such as the search for places of attraction. From a scientific standpoint, these tasks rely critically on the representation of spatial objects and on the definition of matching functions between these objects. In this article, we address the problem of representing these objects. More specifically, encouraged by the success of distributed representations based on neural approaches, we propose to use the spatial distributions of words to regularize their distributed representations (i.e., word embeddings), which can be combined to build object representations. The underlying goal is to reveal possible local semantic relationships between words as well as the multiple senses of a single word. Experiments on an information retrieval task, which consists of returning the physical place that is the subject of a geo-text, show that integrating our method of spatial regularization of distributed word representations into a basic matching model yields significant improvements over the baseline models.

Keywords

retrofitting; geo-text; word embeddings

Word embedding; Offline learning; Geo-text

Domains

Computation and Language [cs.CL]; Machine Learning [cs.LG]

Main file

mousset_24916.pdf (613.98 KB)

Origin: Files produced by the author(s)

Contributor: Open Archive Toulouse Archive Ouverte (OATAO)

https://hal.science/hal-02494102

Submitted on: Friday, February 28, 2020, 14:23:39

Last modified on: Wednesday, January 17, 2024, 10:25:48

Long-term archiving on: Friday, May 29, 2020, 15:19:27

Dates and versions

hal-02494102, version 1 (28-02-2020)

Identifiers

HAL Id: hal-02494102, version 1
OATAO: 24916

Cite

Paul Mousset, Yoann Pitarch, Lynda Tamine. Régularisation spatiale de représentations distribuées de mots. 16ème Conférence francophone en Recherche d'Information et Applications (CORIA 2019), Apr 2019, Lyon, France. pp.1-17. ⟨hal-02494102⟩


Collections

UNIV-TLSE2 CNRS SMS UT1-CAPITOLE IRIT IRIT-IRIS IRIT-GD TOULOUSE-INP UNIV-UT3 UT3-TOULOUSEINP

