
Content-based subject classification at article level in biomedical context

Abstract : Subject classification is an important task in the analysis of scholarly publications. Two main kinds of approaches are generally used: classification at the journal level and classification at the article level. We propose a mixed approach that leverages embedding techniques from NLP to train classifiers on article metadata (title, abstract and keywords in particular) labelled with the journal-level FoR (Fields of Research) classification, and then applies these classifiers at the article level. We use this approach in the context of biomedical publications, using metadata from PubMed. fastText classifiers are trained with FoR codes and used to classify publications based on their available metadata. Results show that using a stratified sampling strategy for training helps reduce the bias due to the unbalanced distribution of fields. An implementation of the method is proposed on the repository
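As an illustration of the training-data preparation described in the abstract, here is a minimal sketch, not the author's implementation. It assumes stratification means capping the number of training examples drawn per field, which the abstract does not specify. The function names, the `per_label` parameter and the `FIELD_A` / `FIELD_B` labels are hypothetical placeholders, not real FoR codes.

```python
import random
from collections import defaultdict


def stratified_sample(records, per_label, seed=0):
    """Draw at most `per_label` (label, text) pairs for each label.

    Assumption: capping each field at the same size is one simple way
    to limit the effect of an unbalanced field distribution.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for label, text in records:
        by_label[label].append(text)

    sample = []
    for label in sorted(by_label):
        texts = by_label[label]
        k = min(per_label, len(texts))
        # rng.sample draws k distinct items without replacement
        for text in rng.sample(texts, k):
            sample.append((label, text))
    return sample


def to_fasttext_line(label, text):
    """Format one example as '__label__<label> <text>' on a single line."""
    clean_text = " ".join(text.split())  # collapse newlines and repeated spaces
    clean_label = label.replace(" ", "_")  # labels must not contain spaces
    return f"__label__{clean_label} {clean_text}"


# Toy, made-up records: one over-represented field and one rare field.
records = [("FIELD_A", f"title and abstract number {i}") for i in range(8)]
records += [("FIELD_B", f"title and abstract number {i}") for i in range(2)]

balanced = stratified_sample(records, per_label=2)
training_lines = [to_fasttext_line(label, text) for label, text in balanced]
```

Written to a text file, one example per line, lines in this format are what the `fasttext` Python package expects as input to `fasttext.train_supervised(input=...)`. The hyperparameters to use are not stated in the abstract and are left out here.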
Document type :
Preprints, Working Papers, ...
Contributor : Eric Jeangirard
Submitted on : Thursday, April 29, 2021 - 4:47:04 PM
Last modification on : Friday, April 30, 2021 - 9:41:36 AM
Long-term archiving on : Friday, July 30, 2021 - 7:01:58 PM


Files produced by the author(s)


Distributed under a Creative Commons Attribution 4.0 International License


  • HAL Id : hal-03212544, version 1


Eric Jeangirard. Content-based subject classification at article level in biomedical context. 2021. ⟨hal-03212544⟩


