What About Sequential Data Mining Techniques to Identify Linguistic Patterns for Stylistics?

Solen Quiniou 1 Peggy Cellier 2 Thierry Charnois 3 Dominique Legallois 1
2 LIS - Logical Information Systems
IRISA-D7 - GESTION DES DONNÉES ET DE LA CONNAISSANCE
3 Equipe CODAG - Laboratoire GREYC - UMR6072
GREYC - Groupe de Recherche en Informatique, Image, Automatique et Instrumentation de Caen
Abstract : In this paper, we study the use of data mining techniques for stylistic analysis, from a linguistic point of view, by considering emerging sequential patterns. First, we show that mining sequential patterns of words with gap constraints yields relevant linguistic patterns that patterns built on n-grams do not capture. Then, we investigate how sequential patterns of itemsets can provide more generic linguistic patterns. We validate our approach from a linguistic point of view by conducting experiments on three corpora of different types of French texts (Poetry, Letters, and Fiction). Focusing on poetic texts in particular, we show that characteristic linguistic patterns can be identified using data mining techniques. We also discuss how to improve the proposed approach so that it can be used more efficiently for linguistic analyses.
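To make the contrast between n-grams and gap-constrained sequential patterns concrete, here is a minimal Python sketch. It is not the authors' implementation: the function names, the single `max_gap` parameter, the growth-rate formula (ratio of relative supports between two corpora) and the toy English sentences are all assumptions chosen for illustration. With `max_gap=0` a pattern behaves like a contiguous n-gram; a larger gap lets other words intervene between pattern items.

```python
from typing import List, Sequence


def matches_with_gap(pattern: Sequence[str], tokens: Sequence[str], max_gap: int) -> bool:
    """Return True if `pattern` occurs in `tokens` in order, with at most
    `max_gap` tokens skipped between two consecutive pattern items."""

    def search(p_idx: int, start: int, end: int) -> bool:
        # All pattern items have been matched.
        if p_idx == len(pattern):
            return True
        # Try every admissible position for the current pattern item.
        for i in range(start, min(end, len(tokens))):
            if tokens[i] == pattern[p_idx]:
                # The next item must start within max_gap tokens after i.
                if search(p_idx + 1, i + 1, i + 2 + max_gap):
                    return True
        return False

    # The first item may start anywhere in the sentence.
    return search(0, 0, len(tokens))


def support(pattern: Sequence[str], corpus: List[List[str]], max_gap: int) -> int:
    """Number of sentences in `corpus` that contain `pattern`."""
    return sum(matches_with_gap(pattern, sent, max_gap) for sent in corpus)


def growth_rate(pattern: Sequence[str], target: List[List[str]],
                other: List[List[str]], max_gap: int) -> float:
    """Ratio of relative supports (assumed definition of 'emerging')."""
    rel_target = support(pattern, target, max_gap) / len(target)
    rel_other = support(pattern, other, max_gap) / len(other)
    if rel_other == 0:
        return float("inf") if rel_target > 0 else 0.0
    return rel_target / rel_other


# Toy, made-up sentences (not taken from the paper's corpora).
poetry = [
    ["the", "pale", "moon", "of", "night"],
    ["the", "moon", "of", "the", "sea"],
    ["a", "silent", "moon", "above"],
]
letters = [
    ["the", "price", "of", "bread"],
    ["the", "moon", "was", "bright"],
]

pattern = ["the", "moon", "of"]
print(support(pattern, poetry, max_gap=0))  # contiguous, n-gram-like matching
print(support(pattern, poetry, max_gap=1))  # one intervening word allowed
print(growth_rate(["the", "moon"], poetry, letters, max_gap=1))
```

In this toy setting, allowing a gap lets the pattern also match the sentence where "pale" sits between "the" and "moon", which a strict trigram would miss. A pattern whose growth rate exceeds a chosen threshold would then be treated as characteristic of the target corpus.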

https://hal.archives-ouvertes.fr/hal-00675578
Contributor : Solen Quiniou
Submitted on : Thursday, March 1, 2012 - 2:46:31 PM
Last modification on : Thursday, February 7, 2019 - 5:27:50 PM
Long-term archiving on: Monday, November 26, 2012 - 10:21:32 AM

File

cicling2012.pdf
Publisher files allowed on an open archive

Identifiers

  • HAL Id : hal-00675578, version 1

Citation

Solen Quiniou, Peggy Cellier, Thierry Charnois, Dominique Legallois. What About Sequential Data Mining Techniques to Identify Linguistic Patterns for Stylistics? International Conference on Intelligent Text Processing and Computational Linguistics (CICLing'12), Mar 2012, New Delhi, India. pp.166-177. ⟨hal-00675578⟩
