Journal articles

Sequential pattern mining for discovering gene interactions and their contextual information from biomedical texts

Peggy Cellier 1 Thierry Charnois 2 Marc Plantevit 3 Christophe Rigotti 4, 3, 5 Bruno Crémilleux 6 Olivier Gandrillon 7 Jiri Klema 8 Jean-Luc Manguin 6
1 LIS - Logical Information Systems
3 DM2L - Data Mining and Machine Learning
LIRIS - Laboratoire d'InfoRmatique en Image et Systèmes d'information
5 BEAGLE - Artificial Evolution and Computational Biology
LIRIS - Laboratoire d'InfoRmatique en Image et Systèmes d'information, Inria Grenoble - Rhône-Alpes, LBBE - Laboratoire de Biométrie et Biologie Evolutive - UMR 5558
6 Equipe CODAG - Laboratoire GREYC - UMR6072
GREYC - Groupe de Recherche en Informatique, Image, Automatique et Instrumentation de Caen
7 DRACULA - Multi-scale modelling of cell dynamics : application to hematopoiesis
ICJ - Institut Camille Jordan [Villeurbanne], Inria Grenoble - Rhône-Alpes, CGPhiMC - Centre de génétique et de physiologie moléculaire et cellulaire
Abstract: Background: Discovering gene interactions and their characterizations from biological text collections is a crucial issue in bioinformatics. Text collections are large, and it is very difficult for biologists to take full advantage of this amount of knowledge. Natural Language Processing (NLP) methods have been applied to extract background knowledge from biomedical texts. Some existing NLP approaches are based on handcrafted rules and are thus time-consuming and often tied to a specific corpus. Machine-learning-based NLP methods give good results but generate outcomes that are not readily understandable by a user. Results: We take advantage of a hybridization of data mining and natural language processing to propose an original symbolic method that automatically produces patterns conveying gene interactions and their characterizations. Our method therefore detects not only gene interactions but also semantic information about the extracted interactions (e.g., modalities, biological contexts, interaction types). Only a limited resource is required: the text collection that is used as a training corpus. Our approach gives results comparable to those of state-of-the-art methods and is even better for gene interaction detection in AIMed. Conclusions: Experiments show how our approach enables the discovery of interactions and their characterizations. To the best of our knowledge, there are few methods that automatically extract both the interactions and the associated semantic information. The extracted gene interactions from PubMed are available through a simple web interface at webcite. The software is available at webcite.
Contributor : Marc Plantevit
Submitted on : Friday, September 4, 2015 - 9:06:14 AM
Last modification on : Wednesday, July 8, 2020 - 12:43:50 PM

Peggy Cellier, Thierry Charnois, Marc Plantevit, Christophe Rigotti, Bruno Crémilleux, et al. Sequential pattern mining for discovering gene interactions and their contextual information from biomedical texts. Journal of Biomedical Semantics, BioMed Central, 2015, 6, pp.1-27. ⟨10.1186/s13326-015-0023-3⟩. ⟨hal-01192959⟩


