Towards an analysis of cross-linguistic differences between text genres: a case study based on n-grams and correspondence analysis

Abstract : This study assesses the use of n-grams and Correspondence Analysis (CA) for comparing genres in cross-linguistic studies. It is based on an English-French bilingual corpus of original (i.e. non-translated) texts representing three genres: European parliamentary debates, newspaper editorials and academic articles. First, 2- to 4-grams are extracted in each language. Second, the 1,000 most frequent n-grams for each n-gram length and each language are analyzed by means of CA to determine which n-grams are particularly salient in each of the genres examined. Finally, the n-grams are manually classified into a range of categories, such as stance expressions, discourse markers and referential expressions. The results show that the n-gram approach uncovers typical features of the three genres investigated, as well as contrasts between English and French.
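
The first two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the whitespace-tokenized input and the data layout (a dictionary mapping each genre to a list of token lists) are assumptions made for illustration. The sketch extracts n-grams, keeps the k most frequent ones, builds a genre-by-n-gram frequency table, and computes the standardized residuals that CA would then decompose (by singular value decomposition, not shown here), together with the total inertia.

```python
from collections import Counter
from math import sqrt


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngram_table(corpus, n, k):
    """Build a genre-by-n-gram frequency table for the k most frequent n-grams.

    `corpus` (hypothetical layout) maps a genre name to a list of
    tokenized texts, each text being a list of strings.
    """
    per_genre = {
        genre: Counter(ng for text in texts for ng in ngrams(text, n))
        for genre, texts in corpus.items()
    }
    total = Counter()
    for counts in per_genre.values():
        total.update(counts)
    columns = [ng for ng, _ in total.most_common(k)]
    rows = list(corpus)
    table = [[per_genre[genre][ng] for ng in columns] for genre in rows]
    return rows, columns, table


def standardized_residuals(table):
    """Return the matrix (p_ij - r_i * c_j) / sqrt(r_i * c_j).

    p_ij are relative frequencies, r_i and c_j the row and column masses.
    Assumes every row and every column has a non-zero total.
    """
    grand = sum(map(sum, table))
    row_mass = [sum(row) / grand for row in table]
    col_mass = [sum(col) / grand for col in zip(*table)]
    return [
        [(x / grand - r * c) / sqrt(r * c) for x, c in zip(row, col_mass)]
        for row, r in zip(table, row_mass)
    ]


def total_inertia(table):
    """Sum of squared standardized residuals (chi-square statistic / grand total)."""
    return sum(s * s for row in standardized_residuals(table) for s in row)
```

With real data, `top_ngram_table(corpus, n, 1000)` would be called once per language and per n-gram length (n = 2, 3, 4), mirroring the selection described in the abstract. A total inertia close to zero indicates that the selected n-grams are distributed almost independently of genre; larger values indicate stronger genre-specific usage.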

https://hal.archives-ouvertes.fr/hal-01426820
Contributor : Natalia Grabar
Submitted on : Wednesday, January 4, 2017 - 10:56:18 PM
Last modification on : Tuesday, July 3, 2018 - 11:47:18 AM
Document(s) archived on : Wednesday, April 5, 2017 - 3:28:28 PM

File

lefer-TALN2016short.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01426820, version 1

Citation

Marie-Aude Lefer, Yves Bestgen, Natalia Grabar. Vers une analyse des différences interlinguistiques entre les genres textuels : étude de cas basée sur les n-grammes et l'analyse factorielle des correspondances. TALN 2016: Traitement Automatique des Langues Naturelles, Jul 2016, Paris, France. 〈hal-01426820〉
