Sequential pattern mining for discovering gene interactions and their contextual information from biomedical texts

Peggy Cellier 1 Thierry Charnois 2 Marc Plantevit 3 Christophe Rigotti 4, 3, 5 Bruno Crémilleux 6 Olivier Gandrillon 7 Jiri Klema 8 Jean-Luc Manguin 6
1 LIS - Logical Information Systems
IRISA-D7 - GESTION DES DONNÉES ET DE LA CONNAISSANCE
3 DM2L - Data Mining and Machine Learning
LIRIS - Laboratoire d'InfoRmatique en Image et Systèmes d'information
5 BEAGLE - Artificial Evolution and Computational Biology
LIRIS - Laboratoire d'InfoRmatique en Image et Systèmes d'information, Inria Grenoble - Rhône-Alpes, LBBE - Laboratoire de Biométrie et Biologie Evolutive, CarMeN - Laboratoire de recherche en cardiovasculaire, métabolisme, diabétologie et nutrition
6 Equipe CODAG - Laboratoire GREYC - UMR6072
GREYC - Groupe de Recherche en Informatique, Image, Automatique et Instrumentation de Caen
7 DRACULA - Multi-scale modelling of cell dynamics : application to hematopoiesis
CGMC - Centre de génétique moléculaire et cellulaire, Inria Grenoble - Rhône-Alpes, ICJ - Institut Camille Jordan [Villeurbanne], UCBL - Université Claude Bernard Lyon 1 : EA
Abstract: Background: Discovering gene interactions and their characterizations from biological text collections is a crucial issue in bioinformatics. Text collections are large, and it is very difficult for biologists to take full advantage of this amount of knowledge. Natural Language Processing (NLP) methods have been applied to extract background knowledge from biomedical texts. Some existing NLP approaches are based on handcrafted rules and are thus time-consuming to build and often tied to a specific corpus. Machine-learning-based NLP methods give good results but generate outcomes that are not readily understandable by a user. Results: We take advantage of a hybridization of data mining and natural language processing to propose an original symbolic method that automatically produces patterns conveying gene interactions and their characterizations. Our method therefore detects not only gene interactions but also semantic information about the extracted interactions (e.g., modalities, biological contexts, interaction types). Only a limited resource is required: the text collection that is used as a training corpus. Our approach gives results comparable to those of state-of-the-art methods and is even better for gene interaction detection in AIMed. Conclusions: Experiments show how our approach enables the discovery of interactions and their characterizations. To the best of our knowledge, there are few methods that automatically extract interactions together with their associated semantic information. The gene interactions extracted from PubMed are available through a simple web interface at https://bingotexte.greyc.fr/. The software is available at https://bingo2.greyc.fr/?q=node/22.
Document type:
Journal article
Journal of Biomedical Semantics, BioMed Central, 2015, 6, pp.1-27. <http://dx.doi.org/10.1186/s13326-015-0023-3>. <10.1186/s13326-015-0023-3>

https://hal.archives-ouvertes.fr/hal-01192959
Contributor: Marc Plantevit
Submitted on: Friday, 4 September 2015 - 09:06:14
Last modified on: Wednesday, 2 August 2017 - 10:06:11


Citation

Peggy Cellier, Thierry Charnois, Marc Plantevit, Christophe Rigotti, Bruno Crémilleux, et al.. Sequential pattern mining for discovering gene interactions and their contextual information from biomedical texts. Journal of Biomedical Semantics, BioMed Central, 2015, 6, pp.1-27. <http://dx.doi.org/10.1186/s13326-015-0023-3>. <10.1186/s13326-015-0023-3>. <hal-01192959>
