InTeReC: In-text Reference Corpus for Applying Natural Language Processing to Bibliometrics - HAL open archive
Conference paper. Year: 2018


Abstract

Bibliometrics is increasingly interested in full-text processing and in the study of the structure of scientific papers. The contexts of the in-text references present in articles are particularly relevant for such studies. This work describes the construction of the InTeReC dataset, an in-text reference corpus that aims to promote experimental reproducibility and to provide a standard dataset for further research. The InTeReC dataset is a set of sentences containing in-text references, together with all the data necessary to recontextualize them in their papers, distributed in standard CSV format. This should encourage the implementation of natural language processing tools for bibliometric studies and for related research in information retrieval and visualization.
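To illustrate what "recontextualization" of CSV sentence records can mean in practice, here is a minimal Python sketch that groups sentences by paper and restores their order. It assumes a hypothetical schema with the columns `paper_id`, `section`, `sentence_index` and `sentence`; these column names, the helper functions `load_sentences` and `recontextualize`, and the toy rows are illustrative assumptions, not the actual InTeReC schema or data.

```python
import csv
import io
from collections import defaultdict

# Hypothetical toy data: column names and values are illustrative
# assumptions, not the actual InTeReC schema or content.
SAMPLE_CSV = """paper_id,section,sentence_index,sentence
p1,Introduction,2,Second sentence citing [2].
p1,Introduction,1,First sentence citing [1].
p2,Methods,5,A sentence citing [3].
"""


def load_sentences(text):
    """Parse CSV text into a list of dicts, one per sentence record."""
    return list(csv.DictReader(io.StringIO(text)))


def recontextualize(rows):
    """Group sentences by paper and sort them by their (assumed) position index."""
    by_paper = defaultdict(list)
    for row in rows:
        by_paper[row["paper_id"]].append(row)
    return {
        paper: [r["sentence"] for r in sorted(items, key=lambda r: int(r["sentence_index"]))]
        for paper, items in by_paper.items()
    }


if __name__ == "__main__":
    rows = load_sentences(SAMPLE_CSV)
    for paper, sentences in recontextualize(rows).items():
        print(paper, sentences)
```

Keeping the position of each sentence as a plain column is what allows a flat CSV file to be joined back to the full text of each paper; the real dataset may encode this information differently.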
No file deposited

Dates and versions

hal-01742178, version 1 (23-03-2018)

Identifiers

  • HAL Id: hal-01742178, version 1

Cite

Marc Bertin, Iana Atanassova. InTeReC: In-text Reference Corpus for Applying Natural Language Processing to Bibliometrics. 7th International Workshop on Bibliometric-enhanced Information Retrieval (BIR 2018) to be held as part of the 40th European Conference on Information Retrieval (ECIR), Mar 2018, Grenoble, France. ⟨hal-01742178⟩
