Conference paper, Year: 2017

Text mining tools for extracting information about microbial biodiversity in food

Abstract

Introduction. Information on food microbial biodiversity is scattered across millions of scientific papers (2 million references in the PubMed bibliographic database in 2017). An exhaustive manual analysis of these documents is not feasible. Text-mining and knowledge-engineering methods can help researchers find the relevant information.

Material & Methods. We propose to study bacterial biodiversity using text-mining tools from the Alvis platform. First, we analyzed the terms that designate microorganism and habitat entities in text. Microorganism names were predicted using the NCBI taxonomy. Habitat entities were detected using the syntactic structure of the terms and the OntoBiotope ontology, which has been specifically enriched for the recognition of food terms in text. Second, we predicted links between microorganisms and their habitats (labeled “Lives_in” relationships) using pattern-based and machine-learning-based methods. The text-mining predictions are indexed and presented in a semantic search engine.

Results. The AlvisIR search engine for microbe literature gives online access to 1.2 million PubMed abstracts (as of 2015), 13% of which are specific to food. The tool makes it possible to search for information on bacterial biodiversity using the text-mining results. It covers all types of microbial habitats, which helps in understanding the origin of microbial presence in food.

Significance. This work presents the first semantic search engine dedicated to better understanding microbial food biodiversity from text.
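The two steps described under Material & Methods (entity recognition, then “Lives_in” relation prediction) can be illustrated with a minimal Python sketch. It is not the Alvis platform and does not use its API: the lexicons `TAXA` and `HABITATS` are tiny hypothetical stand-ins for the NCBI taxonomy and the OntoBiotope ontology, the trigger phrases in `TRIGGER` are assumptions chosen for illustration, and the machine-learning component mentioned in the abstract is left out entirely.

```python
import re
from typing import NamedTuple

# Hypothetical toy lexicons. The abstract relies on the NCBI taxonomy and the
# OntoBiotope ontology; these small sets only stand in for them.
TAXA = {"listeria monocytogenes", "lactobacillus plantarum"}
HABITATS = {"soft cheese", "raw milk", "sauerkraut"}

# Assumed trigger phrases for a pattern-based "Lives_in" rule (illustrative only).
TRIGGER = re.compile(r"\b(isolated from|found in|detected in|present in)\b",
                     re.IGNORECASE)


class Entity(NamedTuple):
    kind: str
    text: str
    start: int
    end: int


def find_entities(sentence, lexicon, kind):
    """Return every case-insensitive, whole-word lexicon match in the sentence."""
    lowered = sentence.lower()
    found = []
    for term in lexicon:
        for match in re.finditer(r"\b" + re.escape(term) + r"\b", lowered):
            found.append(Entity(kind, sentence[match.start():match.end()],
                                match.start(), match.end()))
    return sorted(found, key=lambda entity: entity.start)


def lives_in(sentence):
    """Link a microorganism to a later habitat when a trigger phrase lies between them."""
    microbes = find_entities(sentence, TAXA, "Microorganism")
    habitats = find_entities(sentence, HABITATS, "Habitat")
    relations = []
    for microbe in microbes:
        for habitat in habitats:
            between = sentence[microbe.end:habitat.start]
            if microbe.end <= habitat.start and TRIGGER.search(between):
                relations.append((microbe.text, "Lives_in", habitat.text))
    return relations


if __name__ == "__main__":
    example = "Listeria monocytogenes was isolated from soft cheese and raw milk."
    for relation in lives_in(example):
        print(relation)
```

A dictionary lookup plus a single surface pattern is about the simplest instance of this kind of pipeline. It misses synonyms, abbreviated genus names, and habitats that precede the microorganism in the sentence, which is where ontology-based term analysis and learned relation models become necessary.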
Main file
Chaix_Spoiler presentation2_1.pdf (2.64 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-01602552, version 1 (02-06-2020)

Identifiers

  • HAL Id: hal-01602552, version 1
  • PRODINRA: 397819

Cite

Estelle Chaix, Louise Deleger, Robert Bossy, Claire Nédellec. Text mining tools for extracting information about microbial biodiversity in food. Microbial Spoilers in Food 2017, Association pour le Développement de la Recherche Appliquée aux Industries Agricoles et Alimentaires (ADRIA). FRA., Jun 2017, Quimper, France. pp.95. ⟨hal-01602552⟩
129 views
28 downloads
