Mining Information Extraction Rules from Datasheets without Linguistic Parsing - HAL open archive
Conference paper, Year: 2005

Mining Information Extraction Rules from Datasheets without Linguistic Parsing

Abstract

In the context of the Pangea project at IBM, we needed to design an information extraction module to extract information from datasheets. Unlike several information extraction systems that rely on machine learning techniques requiring linguistic parsing of the documents, we propose a hybrid approach based on association rule mining and decision tree learning that does not require any linguistic processing. The system can be parameterized in various ways that influence the efficiency of the information extraction rules it discovers. The experiments show that the system does not need a large training set to perform well.
No file deposited

Dates and versions

hal-00117478, version 1 (01-12-2006)

Identifiers

  • HAL Id: hal-00117478, version 1

Cite

Rakesh Agrawal, Howard Ho, François Jacquenet, Marielle Jacquenet. Mining Information Extraction Rules from Datasheets without Linguistic Parsing. 2005, pp.510-520. ⟨hal-00117478⟩
