Efficient supervised and semi-supervised approaches for affiliations disambiguation

Abstract: The disambiguation of named entities is a challenge in many fields, such as scientometrics, social networks, record linkage, citation analysis, and the semantic web. Name ambiguities can arise from misspellings, typographical or OCR errors, abbreviations, and omissions. Searching for the names of persons or organizations is therefore difficult, because a single name can appear in different forms. This paper proposes two approaches to disambiguating the affiliations of authors of scientific papers in bibliographic databases. The first assumes that a training corpus is available and uses a naive Bayes model. The second assumes that no learning resource is available and uses a semi-supervised approach that combines soft clustering and Bayesian learning. The results are encouraging and are already partially applied in a scientific survey department. However, we are aware that our approach may have limitations: it cannot efficiently process highly unbalanced data, although solutions are possible in future developments.
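
To make the supervised case concrete, the following is a minimal sketch of a multinomial naive Bayes classifier that maps raw affiliation strings to a canonical label. It is not the authors' implementation: the character trigram features, the Laplace smoothing parameter `alpha`, the class name `NaiveBayesAffiliations`, and the tiny training set are all assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n=3):
    """Return the character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayesAffiliations:
    """Hypothetical multinomial naive Bayes over character trigrams."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                          # Laplace smoothing (assumed value)
        self.class_counts = Counter()               # training strings per label
        self.feature_counts = defaultdict(Counter)  # trigram counts per label
        self.vocabulary = set()

    def fit(self, strings, labels):
        for text, label in zip(strings, labels):
            self.class_counts[label] += 1
            for gram in ngrams(text):
                self.feature_counts[label][gram] += 1
                self.vocabulary.add(gram)
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # log prior plus the sum of smoothed log likelihoods
            score = math.log(count / total)
            denom = sum(self.feature_counts[label].values()) + self.alpha * vocab_size
            for gram in ngrams(text):
                score += math.log((self.feature_counts[label][gram] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training variants, for illustration only.
training = [
    ("Massachusetts Institute of Technology", "MIT"),
    ("Mass. Inst. Technol., Cambridge MA", "MIT"),
    ("Massachusetts Inst. of Technology, USA", "MIT"),
    ("Stanford University", "STANFORD"),
    ("Stanford Univ., California", "STANFORD"),
    ("Stanford Univ, Stanford CA, USA", "STANFORD"),
]
model = NaiveBayesAffiliations().fit(
    [text for text, _ in training], [label for _, label in training]
)
```

Calling `model.predict("Mass Inst of Technology")` then scores an unseen variant against each known label. Character n-grams are used here because they tolerate abbreviations and misspellings better than whole-word features, but the choice of features in the paper itself may differ.
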
Document type:
Conference paper
13th COLLNET Meeting, Oct 2012, Seoul, South Korea. 10 p., 2012

https://hal.archives-ouvertes.fr/hal-00956386
Contributor: Patricia Gautier
Submitted on: Thursday, March 6, 2014 - 14:51:38
Last modified on: Tuesday, April 24, 2018 - 13:36:51
Document(s) archived on: Friday, June 6, 2014 - 10:55:34

File

cuxac_lamirel_collnet12.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-00956386, version 1

Citation

Pascal Cuxac, Valérie Bonvallot, Jean-Charles Lamirel. Efficient supervised and semi-supervised approaches for affiliations disambiguation. 13th COLLNET Meeting, Oct 2012, Seoul, South Korea. 10 p., 2012. 〈hal-00956386〉
