Classification de texte enrichie à l'aide de motifs séquentiels

Abstract: Sequential pattern mining for text classification. Most methods in text classification rely on contiguous sequences of words as features. Indeed, if non-contiguous (gappy) patterns are taken into account, the number of features increases exponentially with the size of the text. Furthermore, most of these patterns will be mere noise. To overcome both issues, sequential pattern mining can be used to efficiently extract a smaller number of relevant, non-contiguous features. In this paper, we compare the use of constrained frequent pattern mining and δ-free patterns as features for text classification. We show experimentally the advantages and disadvantages of each type of pattern.
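
To make the abstract's argument concrete, here is a minimal Python sketch of gappy patterns used as features. It is not the authors' method: it enumerates subsequences by brute force rather than with a real sequential pattern mining algorithm, and it does not implement δ-free patterns. The function names, the toy corpus, and the `max_len`, `max_gap`, and `min_support` parameters are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def gappy_patterns(tokens, max_len=2, max_gap=1):
    """Return the set of word subsequences of up to max_len tokens.

    Two consecutive pattern items may skip at most max_gap tokens of the text.
    Brute-force enumeration for illustration only (hypothetical helper).
    """
    patterns = set()
    for size in range(1, max_len + 1):
        for idx in combinations(range(len(tokens)), size):
            # Keep the index tuple only if every gap respects the constraint.
            if all(b - a - 1 <= max_gap for a, b in zip(idx, idx[1:])):
                patterns.add(tuple(tokens[i] for i in idx))
    return patterns


def frequent_patterns(corpus, min_support=2, max_len=2, max_gap=1):
    """Keep the patterns that occur in at least min_support documents."""
    support = Counter()
    for tokens in corpus:
        # Each document contributes at most 1 to the support of a pattern.
        support.update(gappy_patterns(tokens, max_len, max_gap))
    return {p: s for p, s in support.items() if s >= min_support}


# Toy corpus (invented for this example).
corpus = [
    "the film was not very good".split(),
    "the plot was not good".split(),
    "a very good film".split(),
]

# Without constraints, a 6-word sentence of distinct words already
# yields 2**6 - 1 = 63 candidate patterns.
unconstrained = gappy_patterns(corpus[0], max_len=6, max_gap=6)

# With length, gap, and support constraints, only 10 patterns remain.
features = frequent_patterns(corpus)
```

In this toy corpus the pattern `("not", "good")` reaches a support of 2 only because the gap constraint lets it match "not very good" in the first document, which a contiguous bigram would miss. The length, gap, and support thresholds are what keep the feature set small here; a real miner would apply such constraints during the search instead of enumerating every subsequence first.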

Cited literature: 16 references

https://hal.archives-ouvertes.fr/hal-01168500
Contributor: Pierre Holat
Submitted on: Friday, June 26, 2015 - 1:06:19 AM
Last modification on: Friday, November 22, 2019 - 1:48:26 PM
Long-term archiving on: Friday, October 9, 2015 - 6:05:18 PM

File

taln-2015-paper.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01168500, version 1

Citation

Pierre Holat, Nadi Tomeh, Thierry Charnois. Classification de texte enrichie à l'aide de motifs séquentiels. TALN 2015, Jun 2015, Caen, France. ⟨hal-01168500⟩
