A process for business entities extraction on the web
Abstract
Searching for information about local businesses is a difficult task. Most existing services rely on manually recorded data; however, an increasing number of companies are referenced on the Internet and publish information on their websites. In addition, data collected from companies is made available by the administration as open data. We therefore propose a process to extract company information such as addresses, activities, jobs, products, emails, fax and phone numbers from websites, in order to offer a business search service whose data is built and updated at low cost. This process relies on knowledge-based and pattern-based extraction approaches. The proposal is composed of two main modules: the first relies on a heuristic that uses company registration data to bootstrap the web and filter for the companies' official corporate websites; the second analyses these websites to extract the targeted data and map it onto a dedicated knowledge graph made of several indexes.
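As an illustration of the pattern-based side of the second module, the following is a minimal sketch of extracting emails and phone or fax numbers from page text with regular expressions. The patterns, the function name `extract_contacts`, and the French-style 10-digit number format are assumptions made for this example; the abstract does not specify the patterns actually used, and the sketch does not distinguish fax from phone numbers.

```python
import re

# Hypothetical patterns: the source does not state which ones it uses.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Assumed format: 10 digits starting with 0, written in pairs that may be
# separated by a space, a dot or a hyphen (e.g. "01 23 45 67 89").
PHONE_RE = re.compile(r"\b0[1-9](?:[ .-]?\d{2}){4}\b")


def extract_contacts(text):
    """Return the distinct emails and phone/fax numbers found in plain text."""
    emails = sorted(set(EMAIL_RE.findall(text)))
    # Normalise numbers by removing separators so duplicates collapse.
    phones = sorted({re.sub(r"[ .-]", "", m) for m in PHONE_RE.findall(text)})
    return {"emails": emails, "phones": phones}


if __name__ == "__main__":
    sample = "Contact: info@example.com, Tel: 01 23 45 67 89, Fax: 01.23.45.67.90"
    print(extract_contacts(sample))
```

In a fuller pipeline, results like these would be combined with the knowledge-based approach (for instance, matching extracted addresses against registration data) before being mapped onto the indexes of the knowledge graph.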
Origin: Files produced by the author(s)