Conference papers

Machine Learning under the light of Phraseology expertise: use case of presidential speeches, De Gaulle - Hollande (1958-2016)

Mélanie Ducoffe 1, Damon Mayaffre 2, Frédéric Precioso 1, Frédéric Lavigne 3, Laurent Vanni 2, A Tre-Hardy 1
1 Laboratoire d'Informatique, Signaux, et Systèmes de Sophia-Antipolis (I3S) / Projet MinD
Laboratoire I3S - SPARKS - Scalable and Pervasive softwARe and Knowledge Systems
2 BCL, équipe Logométrie : corpus, traitements, modèles
BCL - Bases, Corpus, Langage (UMR 7320 - UNS / CNRS)
3 BCL, équipe Langage et Cognition
BCL - Bases, Corpus, Langage (UMR 7320 - UNS / CNRS)
Abstract : Author identification and text genesis have long been a hot topic in the statistical analysis of textual data community. Recent advances in machine learning have seen the emergence of machines that compete with state-of-the-art computational linguistic methods on specific natural language processing tasks (part-of-speech tagging, chunking, parsing, etc.). In particular, Deep Linguistic Architectures are based on knowledge of language specificities such as grammar or semantic structure. These models are considered the most competitive thanks to their assumed ability to capture syntax. However, although these methods have proven their efficiency, their underlying mechanisms remain hard both to make explicit and to keep stable, from a theoretical as well as an empirical point of view, which restricts their range of applications. Our work sheds light on the mechanisms involved in deep architectures when applied to Natural Language Processing (NLP) tasks. The Query-By-Dropout-Committee (QBDC) algorithm is an active learning technique we have designed for deep architectures: it iteratively selects the most relevant samples to add to the training set, so that the model built from the new training set improves the most. In this article, however, we do not go into the details of the QBDC algorithm, which has already been studied in the original QBDC article; instead, we compare the relevance of the sentences chosen by our active strategy with state-of-the-art phraseology techniques. We have thus conducted experiments on the speeches of presidents C. De Gaulle, N. Sarkozy and F. Hollande in order to show the value of our active deep learning method for speech author identification and to analyze the linguistic patterns extracted by our artificial approach in comparison with standard phraseology techniques.

https://hal.archives-ouvertes.fr/hal-01343209
Contributor : Damon Mayaffre
Submitted on : Wednesday, September 7, 2016 - 10:13:30 PM
Last modification on : Tuesday, May 26, 2020 - 6:50:57 PM

File

JADT2016_Ducoffe_et_al.pdf

Identifiers

  • HAL Id : hal-01343209, version 2

Citation

Mélanie Ducoffe, Damon Mayaffre, Frédéric Precioso, Frédéric Lavigne, Laurent Vanni, et al. Machine Learning under the light of Phraseology expertise: use case of presidential speeches, De Gaulle - Hollande (1958-2016). JADT 2016 - Statistical Analysis of Textual Data, Damon Mayaffre; Céline Poudat; Laurent Vanni; Véronique Magri; Peter Follette; Caroline Daire, Jun 2016, Nice, France. pp.157-168. ⟨hal-01343209v2⟩
