
DAnIEL: Language Independent Character-Based News Surveillance

Gaël Lejeune 1 Romain Brixtel 1 Antoine Doucet 1 Nadine Lucas 1
1 Equipe Hultech - Laboratoire GREYC - UMR6072
GREYC - Groupe de Recherche en Informatique, Image et Instrumentation de Caen
Abstract : This study aims at developing a news surveillance system able to address multilingual web corpora. As an example of a domain where multilingual capacity is crucial, we focus on Epidemic Surveillance. This task requires worldwide coverage of news in order to detect new events as quickly as possible, anywhere, whatever language they are first reported in. In this study, text-genre is used rather than sentence analysis. The news-genre properties allow us to assess the thematic relevance of news, filtered with the help of a specialised lexicon that is automatically collected from Wikipedia. Afterwards, a more detailed analysis of text-specific properties is applied to relevant documents to better characterize the epidemic event (i.e., which disease spreads where?). Results from 400 documents in each language demonstrate the value of this multilingual approach with light resources. DAnIEL achieves an F1-measure of around 85%. Two issues are addressed: the first is morphologically rich languages, e.g. Greek, Polish and Russian, as compared to English. The second is event location detection as related to disease detection. This system provides a reliable alternative to the generic IE architecture, which is constrained by the lack of numerous components in many languages.
Document type :
Book sections

https://hal.archives-ouvertes.fr/hal-01071903
Contributor : Greyc Référent
Submitted on : Tuesday, October 7, 2014 - 9:13:56 AM
Last modification on : Tuesday, October 19, 2021 - 11:34:56 PM
Long-term archiving on : Thursday, January 8, 2015 - 10:25:48 AM

File

CHS-LEJEUNE-2012-1.pdf
Files produced by the author(s)

Identifiers

HAL Id : hal-01071903
DOI : 10.1007/978-3-642-33983-7_7

Citation

Gaël Lejeune, Romain Brixtel, Antoine Doucet, Nadine Lucas. DAnIEL: Language Independent Character-Based News Surveillance. Isahara, Hitoshi and Kanzaki, Kyoko. Advances in Natural Language Processing: 8th International Conference on NLP, JapTAL 2012, Springer, pp.64-75, 2012, 978-3-642-33982-0. ⟨10.1007/978-3-642-33983-7_7⟩. ⟨hal-01071903⟩
