A process for business entities extraction on the web

Armel Fotsoh; Christian Sallaberry; Annig Le Parc-Lacayrelle; Tanguy Moal

Conference paper. Year: 2016


Armel Fotsoh

Role: Author
PersonId: 1023446

Laboratoire Informatique de l'Université de Pau et des Pays de l'Adour

Christian Sallaberry

Role: Author
PersonId: 172396
IdHAL: christian-sallaberry
ORCID: 0000-0002-3605-3927
IdRef: 166690767

Annig Le Parc-Lacayrelle

Role: Author
PersonId: 174692
IdHAL: annig-lacayrelle
IdRef: 253128331

Laboratoire Informatique de l'Université de Pau et des Pays de l'Adour

Tanguy Moal

Role: Author
PersonId: 1023447

Abstract

Searching for information about local businesses is a difficult task. Most existing services rely on manually recorded data; however, an increasing number of companies are referenced on the Internet and publish information on their websites. In addition, data collected from companies is made available by the administration as open data. We therefore propose a process that extracts company information, such as addresses, activities, jobs, products, emails, and fax and phone numbers, from websites in order to offer a business search service built on data that is constructed and updated at low cost. This process relies on knowledge-based and pattern-based extraction approaches. The proposal is composed of two main modules: the first relies on a heuristic that uses company registration data to bootstrap a web search in order to filter the companies' official corporate websites; the second analyses these websites to extract the targeted data and map it onto a dedicated knowledge graph made of several indexes.
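The pattern-based part of the second module (emails, fax and phone numbers) can be illustrated with a few regular expressions. The following is a minimal sketch, not the authors' implementation: the patterns, the assumption of French-formatted numbers, the 30-character label window and the function names `extract_contacts` and `normalize_phone` are all hypothetical choices made for illustration.

```python
import re

# Hypothetical patterns for illustration; they are not taken from the paper.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Assumed French format: 0X XX XX XX XX or +33 X XX XX XX XX,
# with optional space, dot or dash separators between digit pairs.
PHONE_RE = re.compile(r"(?<!\d)(?:\+33\s?|0)[1-9](?:[\s.-]?\d{2}){4}(?!\d)")
# Group 1 = fax label, group 2 = phone label; "télécopie" is tried before "tél".
LABEL_RE = re.compile(r"(fax|t[ée]l[ée]copie)|(t[ée]l|phone)", re.IGNORECASE)


def normalize_phone(raw: str) -> str:
    """Keep digits only and map the +33 prefix back to the national leading 0."""
    digits = re.sub(r"\D", "", raw)
    return "0" + digits[2:] if raw.startswith("+33") else digits


def extract_contacts(text: str) -> dict[str, list[str]]:
    """Extract e-mails, phone and fax numbers from the visible text of a page."""
    found: dict[str, list[str]] = {"emails": [], "phones": [], "faxes": []}
    for match in EMAIL_RE.finditer(text):
        if match.group(0) not in found["emails"]:
            found["emails"].append(match.group(0))

    prev_end = 0
    for match in PHONE_RE.finditer(text):
        # Only look at text between the previous number and this one,
        # at most 30 characters back; the closest label decides the type.
        context = text[max(prev_end, match.start() - 30):match.start()]
        labels = list(LABEL_RE.finditer(context))
        is_fax = bool(labels) and labels[-1].group(1) is not None
        key = "faxes" if is_fax else "phones"  # unlabeled numbers default to phones
        number = normalize_phone(match.group(0))
        if number not in found[key]:
            found[key].append(number)
        prev_end = match.end()
    return found


page = "Contact : contact@example.com. Tél : 05 12 34 56 78 - Fax : 05.12.34.56.79"
print(extract_contacts(page))
# {'emails': ['contact@example.com'], 'phones': ['0512345678'], 'faxes': ['0512345679']}
```

Deciding between phone and fax from the nearest preceding label is one simple heuristic among many; the digit lookarounds keep the pattern from firing inside longer identifiers such as registration numbers.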

Keywords

Information Extraction; Web Mining

Domains

Information Retrieval [cs.IR]; Artificial Intelligence [cs.AI]; Machine Learning [cs.LG]; Document and Text Processing; Web

Main file

article_review_v1.pdf (283.28 KB)

Origin: Files produced by the author(s)


https://hal.science/hal-01644303

Submitted on: Wednesday, 22 November 2017, 10:06:39

Last modified on: Monday, 7 November 2022, 17:24:33

Dates and versions

hal-01644303, version 1 (22-11-2017)

License

Attribution - NonCommercial

Identifiers

HAL Id: hal-01644303, version 1

Cite

Armel Fotsoh, Christian Sallaberry, Annig Le Parc-Lacayrelle, Tanguy Moal. A process for business entities extraction on the web. iSWAG 2016 Second International Symposium on Web Algorithms, Jun 2016, Deauville, France. ⟨hal-01644303⟩


Collections

UNIV-PAU LIUPPA

