Mining Information Extraction Rules from Datasheets without Linguistic Parsing

Rakesh Agrawal; Howard Ho; François Jacquenet; Marielle Jacquenet

Conference paper. Year: 2005

Rakesh Agrawal

Role: Author

IBM Almaden Research Center [San Jose]

Howard Ho

Role: Author

IBM Almaden Research Center [San Jose]

François Jacquenet

Role: Author
PersonId : 7550
IdHAL : francois-jacquenet
ORCID : 0000-0002-0653-0710
IdRef : 130579645

Laboratoire Hubert Curien

Marielle Jacquenet

Role: Author

Abstract

In the context of the Pangea project at IBM, we needed to design an information extraction module to extract information from datasheets. In contrast to many information extraction systems that rely on machine learning techniques requiring linguistic parsing of the documents, we propose a hybrid approach, based on association rule mining and decision tree learning, that does not require any linguistic processing. The system can be parameterized in various ways that influence the efficiency of the information extraction rules it discovers. Experiments show that the system does not need a large training set to perform well.

Domains

Machine Learning [cs.LG]; Artificial Intelligence [cs.AI]

Contributor: François Jacquenet

https://hal.science/hal-00117478

Submitted on: Friday, December 1, 2006, 18:13:16

Last modified on: Friday, March 24, 2023, 14:52:48

Dates and versions

hal-00117478, version 1 (01-12-2006)

Identifiers

HAL Id: hal-00117478, version 1

Cite

Rakesh Agrawal, Howard Ho, François Jacquenet, Marielle Jacquenet. Mining Information Extraction Rules from Datasheets without Linguistic Parsing. 2005, pp.510-520. ⟨hal-00117478⟩

Collections

UNIV-ST-ETIENNE IOGS CNRS LAHC PARISTECH UDL
