What Patients Can Tell Us: Topic Analysis for Social Media on Breast Cancer

Abstract:

Background: Social media dedicated to health are increasingly used by patients and health professionals. They are rich textual resources with content generated through free exchange between patients. We propose a method to tackle the problem of retrieving clinically relevant information from such social media in order to analyze the quality of life of patients with breast cancer.

Objective: Our aim was to detect the different topics discussed by patients on social media and to relate them to the functional and symptomatic dimensions assessed in the internationally standardized self-administered questionnaires used in cancer clinical trials (European Organization for Research and Treatment of Cancer [EORTC] Quality of Life Questionnaire Core 30 [QLQ-C30] and breast cancer module [QLQ-BR23]).

Methods: First, we applied a classic text mining technique, latent Dirichlet allocation (LDA), to detect the different topics discussed on social media dealing with breast cancer. We applied the LDA model to 2 datasets composed of messages extracted from public Facebook groups and from a public health forum (cancerdusein.org, a French breast cancer forum) with relevant preprocessing. Second, we applied a customized Jaccard coefficient to automatically compute the similarity between the topics detected with LDA and the questions in the self-administered questionnaires used to study quality of life.

Results: Among the 23 topics present in the self-administered questionnaires, 22 matched with the topics discussed by patients on social media. Interestingly, these topics corresponded to 95% (22/23) of the forum and 86% (20/23) of the Facebook group topics. These figures underline that topics related to quality of life are an important concern for patients. However, 5 social media topics had no corresponding topic in the questionnaires, which therefore do not cover all of the patients’ concerns. Of these 5 topics, 2 could potentially be used in the questionnaires, and these 2 topics corresponded to a total of 3.10% (523/16,868) of topics in the cancerdusein.org corpus and 4.30% (3014/70,092) of the Facebook corpus.

Conclusions: We found a good correspondence between detected topics on social media and topics covered by the self-administered questionnaires, which substantiates the sound construction of such questionnaires. We detected new emerging topics from social media that can be used to complete current self-administered questionnaires. Moreover, we confirmed that social media mining is an important source of information for complementary analysis of quality of life.
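The second step of the Methods, matching LDA topics to questionnaire items, can be illustrated with a minimal sketch. The abstract does not say how the Jaccard coefficient was customized, so the code below uses the plain coefficient (size of the intersection divided by size of the union) and should not be read as the authors' implementation. The function names, the threshold value, and the toy word lists are all hypothetical; in particular, the word lists are not the actual wording of the QLQ-C30 or QLQ-BR23 items.

```python
def jaccard(a, b):
    """Plain Jaccard coefficient |A & B| / |A | B| between two word collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_topics(topics, questions, threshold=0.1):
    """Map each topic to its most similar question.

    Returns {topic: (question or None, best_score)}; the question is None
    when the best score falls below the (assumed) threshold.
    """
    matches = {}
    for topic_name, topic_words in topics.items():
        best_question, best_score = None, 0.0
        for question_name, question_words in questions.items():
            score = jaccard(topic_words, question_words)
            if score > best_score:
                best_question, best_score = question_name, score
        matched = best_question if best_score >= threshold else None
        matches[topic_name] = (matched, best_score)
    return matches


# Hypothetical toy data: top words of three LDA topics and two questionnaire items.
topics = {
    "topic_0": ["fatigue", "tired", "sleep", "rest"],
    "topic_1": ["hair", "loss", "wig", "chemotherapy"],
    "topic_2": ["insurance", "loan", "bank", "mortgage"],
}
questions = {
    "fatigue": ["tired", "rest", "weak"],
    "hair_loss": ["hair", "loss", "upset"],
}

matches = match_topics(topics, questions)
for topic_name, (question_name, score) in matches.items():
    print(topic_name, question_name, round(score, 2))
```

In this toy run, the third topic shares no words with either item and is left unmatched, which mirrors the situation described in the Results, where some social media topics had no corresponding topic in the questionnaires.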
Document type: Journal article
JMIR Medical Informatics, JMIR Publications, 2017, 5 (3), pp.e23. 〈10.2196/medinform.7779〉

Cited literature: 73 references

https://hal.archives-ouvertes.fr/hal-01583152
Contributor: Mike Donald Tapi Nzali
Submitted on: Thursday, September 7, 2017 - 11:35:16
Last modified on: Thursday, May 24, 2018 - 15:59:25

File

fc-xsltGalley-7779-138004-11-P...
Publication funded by an institution

Citation

Mike Donald Tapi Nzali, Sandra Bringay, Christian Lavergne, Caroline Mollevi, Thomas Opitz. What Patients Can Tell Us: Topic Analysis for Social Media on Breast Cancer. JMIR Medical Informatics, JMIR Publications, 2017, 5 (3), pp.e23. 〈10.2196/medinform.7779〉. 〈hal-01583152〉
