
Content-based subject classification at article level in biomedical context

Abstract : Subject classification is an important task in the analysis of scholarly publications. Two main kinds of approaches are used: classification at the journal level and classification at the article level. We propose a mixed approach that leverages embedding techniques from NLP to train classifiers on article metadata (in particular title, abstract and keywords) labelled with the journal-level FoR (Fields of Research) classification, and then applies these classifiers at the article level. We use this approach in the context of biomedical publications, using metadata from PubMed. fastText classifiers are trained with FoR codes and used to classify publications based on their available metadata. Results show that using a stratified sampling strategy for training helps reduce the bias due to the unbalanced distribution of fields. An implementation of the method is available in the repository https://github.com/dataesr/scientific_tagger
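As an illustration only, the sampling and labelling steps mentioned in the abstract could be sketched as follows. This is not the code from the repository: the function names, the placeholder labels `field_a` and `field_b`, and the choice to cap the number of examples per label are assumptions made for the example. The only convention taken from fastText itself is its supervised training format, in which each line starts with `__label__<label>` followed by the text.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(records, per_label, seed=0):
    """Keep at most `per_label` texts for each label.

    Assumption: capping each label is one simple way to stratify,
    so that frequent fields do not dominate the training set.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for label, text in records:
        by_label[label].append(text)

    sample = []
    for label in sorted(by_label):
        texts = by_label[label]
        k = min(per_label, len(texts))
        sample.extend((label, text) for text in rng.sample(texts, k))
    return sample


def to_fasttext_line(label, text):
    """Format one example as a fastText supervised training line."""
    clean = " ".join(text.lower().split())  # lowercase, collapse whitespace
    return f"__label__{label.replace(' ', '_')} {clean}"


# Hypothetical toy data: (journal-level label, article metadata text).
records = [
    ("field_a", "Title one. Abstract about cardiology."),
    ("field_a", "Title two. Abstract about oncology."),
    ("field_a", "Title three. Abstract about surgery."),
    ("field_a", "Title four. Abstract about radiology."),
    ("field_a", "Title five. Abstract about nephrology."),
    ("field_b", "Title six. Abstract about neurons."),
]

sample = stratified_sample(records, per_label=2, seed=0)
counts = Counter(label for label, _ in sample)
lines = [to_fasttext_line(label, text) for label, text in sample]
```

The resulting `lines` could then be written to a text file and passed to a fastText supervised trainer. In this toy data the over-represented label is reduced from five examples to two, while the rare label keeps its single example.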
Document type :
Preprints, Working Papers, ...

https://hal.archives-ouvertes.fr/hal-03212544
Contributor : Eric Jeangirard
Submitted on : Thursday, April 29, 2021 - 4:47:04 PM
Last modification on : Friday, April 30, 2021 - 9:41:36 AM
Long-term archiving on : Friday, July 30, 2021 - 7:01:58 PM

Files

scientific_tagger.pdf
Files produced by the author(s)

Licence


Distributed under a Creative Commons Attribution 4.0 International License

Identifiers

  • HAL Id : hal-03212544, version 1

Citation

Eric Jeangirard. Content-based subject classification at article level in biomedical context. 2021. ⟨hal-03212544⟩

Metrics

Record views : 56

Files downloads : 14