InTeReC: In-text Reference Corpus for Applying Natural Language Processing to Bibliometrics

Abstract : Bibliometrics is increasingly concerned with full-text processing and with the study of the structure of scientific papers. The contexts of in-text references in articles are particularly relevant to such studies. This work describes the construction of InTeReC, an in-text reference corpus that aims to promote experimental reproducibility and to provide a standard dataset for further research. The dataset is a set of sentences containing in-text references, provided in standard CSV format together with all the data necessary to recontextualize them in the papers they come from. This should encourage the implementation of natural language processing tools for bibliometric studies and for related research in information retrieval and visualization.

https://hal.archives-ouvertes.fr/hal-01742178
Contributor : Marc Bertin
Submitted on : Friday, March 23, 2018 - 11:40:13 PM
Last modification on : Friday, April 26, 2019 - 9:42:44 AM

Identifiers

  • HAL Id : hal-01742178, version 1

Citation

Marc Bertin, Iana Atanassova. InTeReC: In-text Reference Corpus for Applying Natural Language Processing to Bibliometrics. 7th International Workshop on Bibliometric-enhanced Information Retrieval (BIR 2018) to be held as part of the 40th European Conference on Information Retrieval (ECIR), Mar 2018, Grenoble, France. ⟨hal-01742178⟩
