Journal articles

Efficient supervised and semi-supervised approaches for affiliations disambiguation

Abstract: The disambiguation of named entities is a challenge in many fields, such as scientometrics, social networks, record linkage, citation analysis and the semantic web. Name ambiguities can arise from misspellings, typographical or OCR errors, abbreviations, omissions and so on. Searching for the names of persons or organizations is therefore difficult, since a single name may appear in many different forms. This paper proposes two approaches to disambiguating the affiliations of the authors of scientific papers in bibliographic databases. The first assumes that a training dataset is available and uses a Naive Bayes model. The second assumes that no training resource is available and uses a semi-supervised approach that combines soft clustering and Bayesian learning. The results are encouraging, and the approach is already partially applied in a scientific survey department. However, our experiments also highlight a limitation of the approach: it cannot efficiently process highly unbalanced data. Alternative solutions are possible for future developments, in particular a recent clustering algorithm based on feature maximization.
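The abstract names the supervised technique but does not describe its features or preprocessing. As a rough illustration only, and not the authors' implementation, the following Python sketch shows a multinomial Naive Bayes classifier that maps variant affiliation strings to canonical institution labels. The tokenizer, the Laplace smoothing parameter `alpha`, the class name `AffiliationNB` and the training strings and labels are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical preprocessing: lowercase, split on non-alphanumerics, drop empties.
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


class AffiliationNB:
    """Multinomial Naive Bayes with Laplace smoothing (illustrative sketch only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                         # additive smoothing constant
        self.class_counts = Counter()              # training strings per label
        self.token_counts = defaultdict(Counter)   # token frequencies per label
        self.vocab = set()                         # all tokens seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.class_counts[label] += 1
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def log_posterior(self, text, label):
        # log P(label) + sum of log P(token | label), up to a constant shared by all labels.
        total = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total)
        counts = self.token_counts[label]
        denom = sum(counts.values()) + self.alpha * len(self.vocab)
        for tok in tokenize(text):
            if tok in self.vocab:                  # ignore tokens never seen in training
                score += math.log((counts[tok] + self.alpha) / denom)
        return score

    def predict(self, text):
        # Return the label with the highest (unnormalized) log posterior.
        return max(self.class_counts, key=lambda c: self.log_posterior(text, c))


# Hypothetical training data: variant spellings mapped to canonical labels.
texts = [
    "Dept of Physics, Univ Alpha, Paris",
    "Univ. Alpha, Physics Department",
    "Beta Institute of Technology, Lyon",
    "Inst. Beta Technol., Lyon, France",
]
labels = ["UNIV_ALPHA", "UNIV_ALPHA", "BETA_INST", "BETA_INST"]
model = AffiliationNB().fit(texts, labels)
print(model.predict("Physics Dept., University Alpha"))  # UNIV_ALPHA
print(model.predict("Technol. Inst. Beta, Lyon"))        # BETA_INST
```

Because the log-prior term is larger for frequent labels, an institution with very few training strings can be outscored by a frequent one. This is one plausible intuition, not necessarily the authors' analysis, for the difficulty with highly unbalanced data noted in the abstract.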

Cited literature: 31 references

https://hal.archives-ouvertes.fr/hal-00960435
Contributor: Patricia Gautier
Submitted on: Tuesday, March 18, 2014, 11:03:42 AM
Last modification on: Wednesday, March 18, 2020, 3:57:16 PM
Document(s) archived on: Wednesday, June 18, 2014, 11:15:36 AM

File

cuxac_lamirel_scientometrics.p...
Files produced by the author(s)


Citation

Pascal Cuxac, Jean-Charles Lamirel, Valérie Bonvallot. Efficient supervised and semi-supervised approaches for affiliations disambiguation. Scientometrics, Springer Verlag, 2013, 97 (1), pp.47-58. ⟨10.1007/s11192-013-1025-5⟩. ⟨hal-00960435⟩

Metrics

Record views: 598
File downloads: 676