Automatic Ontology Population from Product Catalogs
Abstract
In this paper we present an approach to ontology population from heterogeneous documents that describe commercial products in varied forms and styles. Its originality lies in generating and progressively refining semantic annotations in order to identify
the types of the products and their features, even though the initial information is of very poor quality. Documents are annotated using an ontology. The annotation process starts from an initial set of known instances, built from terminological elements added to the ontology. Our
approach first applies semi-automated annotation techniques to a small dataset and then uses machine learning techniques to annotate the entire dataset. This work was motivated by specific application needs. Experiments were conducted on real-world datasets in the toy domain.