Veille épidémiologique multilingue : une approche parcimonieuse au grain caractère fondée sur le genre textuel [Multilingual epidemic surveillance: a parsimonious character-level approach based on text genre]

Gaël Lejeune 1
1 Equipe Hultech - Laboratoire GREYC - UMR6072
GREYC - Groupe de Recherche en Informatique, Image, Automatique et Instrumentation de Caen
Abstract : In this dissertation we tackle the problem of multilingual epidemic surveillance. The approach advocated here is differential, endogenous and non-compositional. We maximise factorization by exploiting genre properties and communication principles. Our local analysis does not rely on classical linguistic analyzers for morphology, syntax or semantics. Instead, it exploits the distribution of character strings at key positions, thus avoiding the problem of defining a "word". We implemented this approach in DAnIEL (Data Analysis for Information Extraction in any Language), a system that analyzes press articles in order to detect epidemic events. DAnIEL is fast in comparison to state-of-the-art systems and needs very little additional knowledge to process new languages. DAnIEL is also evaluated on the analysis of scientific articles for classification and keyword extraction. Finally, we propose to use DAnIEL outputs to perform a task-based evaluation of boilerplate removal systems.
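
The idea of exploiting character strings at key positions can be illustrated with a minimal Python sketch. It is not DAnIEL's actual algorithm, which the abstract does not specify. It assumes that the key positions are the first and last paragraphs of an article, that strings are compared as character 5-grams, and that a small list of disease names is available. The function names, the value of `n` and the sample text are all hypothetical.

```python
def char_ngrams(text, n):
    """Return the set of all character n-grams of `text`."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def terms_at_key_positions(paragraphs, terms, n=5):
    """Return the terms that share a character n-gram with both the
    first and the last paragraph (assumed here to be the key positions)."""
    head = char_ngrams(paragraphs[0].lower(), n)
    tail = char_ngrams(paragraphs[-1].lower(), n)
    repeated = head & tail  # character strings present at both positions
    return [t for t in terms if char_ngrams(t.lower(), n) & repeated]


# Hypothetical German article: "dengue" only occurs inside compounds,
# yet it is found without any tokenizer or morphological analyzer.
doc = [
    "Denguefieber breitet sich aus",
    "Die Behörden melden neue Denguefälle",
]
print(terms_at_key_positions(doc, ["dengue", "cholera"]))  # ['dengue']
```

Because the comparison works on raw character strings, the same code runs unchanged on languages where word boundaries are hard to define; only the term list is language-dependent.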
Document type :
Theses

Cited literature : 113 references

https://hal.archives-ouvertes.fr/tel-01074940
Contributor : Greyc Référent
Submitted on : Thursday, October 16, 2014 - 9:54:58 AM
Last modification on : Friday, October 23, 2020 - 4:45:50 PM
Long-term archiving on : Saturday, January 17, 2015 - 10:20:44 AM

Identifiers

  • HAL Id : tel-01074940, version 1

Citation

Gaël Lejeune. Veille épidémiologique multilingue : une approche parcimonieuse au grain caractère fondée sur le genre textuel. Document and Text Processing. Université de Caen, 2013. In French. ⟨tel-01074940⟩