Breast cancer and quality of life: medical information extraction from health forums

Thomas Opitz 1, 2, * Jérôme Azé 3, 4, 5 Sandra Bringay 5, 6 Cyrille Joutard 1 Christian Lavergne 1 Caroline Mollevi 7, 8
* Corresponding author
3 AMIB - Algorithms and Models for Integrative Biology
CNRS - Centre National de la Recherche Scientifique : UMR8623, X - École polytechnique, Inria Saclay - Ile de France, UP11 - Université Paris-Sud - Paris 11, LRI - Laboratoire de Recherche en Informatique, LIX - Laboratoire d'informatique de l'École polytechnique [Palaiseau]
5 ADVANSE - ADVanced Analytics for data SciencE
LIRMM - Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier
Abstract: Internet health forums are a rich textual resource with content generated through free exchanges among patients and, in certain cases, health professionals. We tackle the problem of retrieving clinically relevant information from such forums, with relevant topics being defined from self-administered clinical questionnaires. Texts in forums are largely unstructured and noisy, calling for adapted preprocessing and query methods. We minimize the number of false negatives in queries by using a synonym tool to achieve query expansion of initial topic keywords. To avoid false positives, we propose a new measure based on a statistical comparison of frequent co-occurrences in a large reference corpus (the Web) to keep only relevant expansions. Our work is motivated by a study of breast cancer patients' health-related quality of life (QoL). We consider topics defined from a breast-cancer-specific QoL questionnaire. We quantify and structure occurrences in posts of a specialized French forum and outline important future developments.
Document type:
Conference paper
Medical Informatics Europe, Aug 2014, Istanbul, Turkey. pp.1070-1074, 2014

Cited literature [8 references]

https://hal.archives-ouvertes.fr/hal-01061891
Contributor: Thomas Opitz
Submitted on: Monday, 8 September 2014 - 16:48:08
Last modified on: Thursday, 24 May 2018 - 15:59:25
Document(s) archived on: Tuesday, 9 December 2014 - 12:37:17

File

MIERevision.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01061891, version 1

Citation

Thomas Opitz, Jérôme Azé, Sandra Bringay, Cyrille Joutard, Christian Lavergne, et al. Breast cancer and quality of life: medical information extraction from health forums. Medical Informatics Europe, Aug 2014, Istanbul, Turkey. pp.1070-1074, 2014. 〈hal-01061891〉
