Efficient supervised and semi-supervised approaches for affiliations disambiguation

Pascal Cuxac; Jean-Charles Lamirel; Valérie Bonvallot

doi:10.1007/s11192-013-1025-5

Journal article, Scientometrics, Year: 2013

Pascal Cuxac

Role: Author
PersonId : 179348
IdHAL : pascal-cuxac
ORCID : 0000-0002-6809-5654
IdRef : 165835257

Institut de l'information scientifique et technique

Jean-Charles Lamirel

Role: Author
PersonId : 8202
IdHAL : jean-charles-lamirel

Natural Language Processing : representations, inference and semantics

Valérie Bonvallot

Role: Author
PersonId : 1002263

Institut de l'information scientifique et technique

Abstract

The disambiguation of named entities is a challenge in many fields, such as scientometrics, social networks, record linkage, citation analysis, and the semantic web. Name ambiguities can arise from misspellings, typographical or OCR errors, abbreviations, and omissions. Searching for the names of persons or organizations is therefore difficult whenever a single name can appear in many different forms. This paper proposes two approaches to disambiguating the affiliations of authors of scientific papers in bibliographic databases. The first assumes that a training dataset is available and uses a Naive Bayes model. The second assumes that no learning resource is available and uses a semi-supervised approach that combines soft clustering and Bayesian learning. The results are encouraging, and the approach is already partially applied in a scientific survey department. However, our experiments also highlight a limitation: the approach cannot efficiently process highly unbalanced data. Alternative solutions are possible for future developments, particularly the use of a recent clustering algorithm relying on feature maximization.
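To make the supervised setting concrete, here is a minimal sketch of a multinomial Naive Bayes classifier that maps raw affiliation strings to a canonical institution label. It is an illustration only, not the authors' implementation: the tokenization, the add-one smoothing, the class name `NaiveBayesAffiliations`, and the toy training strings are all assumptions made for this example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(affiliation):
    """Lowercase an affiliation string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", affiliation.lower())


class NaiveBayesAffiliations:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing (hypothetical helper)."""

    def fit(self, texts, labels):
        # Count documents per class and token occurrences per class.
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.token_counts[label].update(tokenize(text))
        self.vocabulary = {t for c in self.token_counts.values() for t in c}
        return self

    def predict(self, text):
        # Tokens never seen in training carry no evidence and are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocabulary]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocabulary)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up training variants for two institutions (assumed data).
train_texts = [
    "INIST CNRS, Vandoeuvre les Nancy, France",
    "Inst Informat Sci & Tech, CNRS, Nancy",
    "LORIA, Synalp team, Nancy, France",
    "Lab Lorrain Rech Informat, LORIA, Vandoeuvre",
]
train_labels = ["INIST", "INIST", "LORIA", "LORIA"]

model = NaiveBayesAffiliations().fit(train_texts, train_labels)
print(model.predict("CNRS INIST Nancy"))
print(model.predict("LORIA Vandoeuvre"))
```

With balanced toy data such as this, the class prior has no influence on the decision. On highly unbalanced data the prior and the per-class token totals can dominate, which is consistent with the limitation the abstract reports.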

Keywords

Clustering; Automatic classification; Text; Informetrics; Affiliations; Disambiguation

Domains

Information and communication sciences; Applications [stat.AP]; Neural networks [cs.NE]

Main file

cuxac_lamirel_scientometrics.pdf (918.25 KB)

Origin: Files produced by the author(s)

Contributor: Patricia Gautier

https://hal.science/hal-00960435

Submitted on: Tuesday, March 18, 2014, 11:03:42

Last modified on: Sunday, October 8, 2023, 04:10:29

Long-term archiving on: Wednesday, June 18, 2014, 11:15:36

Dates and versions

hal-00960435 , version 1 (18-03-2014)

License

Attribution

Identifiers

HAL Id : hal-00960435 , version 1
DOI : 10.1007/s11192-013-1025-5

Cite

Pascal Cuxac, Jean-Charles Lamirel, Valérie Bonvallot. Efficient supervised and semi-supervised approaches for affiliations disambiguation. Scientometrics, 2013, 97 (1), pp.47-58. ⟨10.1007/s11192-013-1025-5⟩. ⟨hal-00960435⟩

Collections

CNRS INRIA UNIV-LORRAINE LORIA LORIA-NLPKD INIST

262 views

335 downloads
