Conference paper, Year: 2016

A process for business entities extraction on the web

Christian Sallaberry
Tanguy Moal
  • Role: Author
  • PersonId : 1023447

Abstract

Searching for information about local businesses is a difficult task. Most existing services are supplied with manually recorded data; however, an increasing number of companies are referenced on the Internet and release information on their websites. In addition, data collected from companies is made available by the administration as open data. Therefore, we propose a process to extract company information such as addresses, activities, jobs, products, emails, fax and phone numbers from websites, in order to offer a business search service whose data is constructed and updated at low cost. This process relies on knowledge-based and pattern-based extraction approaches. The proposal is composed of two main modules: the first relies on a heuristic that uses company registration data to bootstrap the web in order to filter their official corporate websites; the second analyses these websites to extract the targeted data and map it onto a dedicated knowledge graph made of several indexes.
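As a rough illustration of the pattern-based side of the process described in the abstract, the sketch below pulls email addresses and phone-like numbers out of page text with regular expressions. It is not the authors' implementation: the function name `extract_contacts`, both patterns, and the assumption of French-style phone numbers are hypothetical, and the sketch does not distinguish fax from phone numbers.

```python
import re

# Hypothetical patterns chosen for illustration; the abstract does not
# specify the patterns actually used in the paper.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Assumes French-style numbers: "+33" or a leading "0", one non-zero digit,
# then four pairs of digits, with optional space, dot, or hyphen separators.
PHONE_RE = re.compile(r"(?:\+33[\s.-]?|0)[1-9](?:[\s.-]?\d{2}){4}")


def extract_contacts(text: str) -> dict[str, list[str]]:
    """Return the de-duplicated, sorted emails and phone numbers found in text."""
    emails = sorted(set(EMAIL_RE.findall(text)))
    # Strip separators so that differently formatted copies of the same
    # number collapse to a single entry.
    phones = sorted({re.sub(r"[\s.-]", "", match) for match in PHONE_RE.findall(text)})
    return {"emails": emails, "phones": phones}


if __name__ == "__main__":
    sample = (
        "Contact: info@example.com or sales@example.com. "
        "Tel: 05 59 00 00 00, Fax: +33 5 59 00 00 01"
    )
    print(extract_contacts(sample))
```

In a fuller pipeline, values extracted this way would presumably be cross-checked against company registration data before being indexed, which is where the knowledge-based side of the process would come in.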
Main file
article_review_v1.pdf (283.28 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-01644303, version 1 (22-11-2017)

License

Attribution - NonCommercial

Identifiers

  • HAL Id: hal-01644303, version 1

Cite

Armel Fotsoh, Christian Sallaberry, Annig Le Parc-Lacayrelle, Tanguy Moal. A process for business entities extraction on the web. iSWAG 2016 Second International Symposium on Web Algorithms, Jun 2016, Deauville, France. ⟨hal-01644303⟩

Collections

UNIV-PAU LIUPPA
83 views
96 downloads
