Conference papers

RFreeStem: A multilanguage rule-free stemmer

Abstract : With the rapid growth of available textual data, text mining has become of special interest. Because such data are unstructured, they require substantial preprocessing. Among the preprocessing steps, stemming is a widely used method that extracts the root of each word. However, the most popular stemming algorithms are based on the application of rules and are therefore highly language-dependent. We propose a new approach, RFreeStem, that is instead based on a corpus and can therefore be applied to many languages.

https://hal.archives-ouvertes.fr/hal-02891675
Contributor : Open Archive Toulouse Archive Ouverte (oatao)
Submitted on : Tuesday, July 7, 2020 - 9:30:43 AM
Last modification on : Wednesday, June 9, 2021 - 10:00:30 AM
Long-term archiving on : Friday, November 27, 2020 - 12:13:38 PM

File

baril_26186.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-02891675, version 1
  • OATAO : 26186

Citation

Xavier Baril, Oihana Coustié, Josiane Mothe, Olivier Teste. RFreeStem: A multilanguage rule-free stemmer. 37e Congrès Informatique des Organisations et Systèmes d'Information et de Décision (INFORSID 2019), Jun 2019, Paris, France. pp.12-29. ⟨hal-02891675⟩

Metrics

  • Record views : 40
  • Files downloads : 251