Knowledge discovery with CRF-based clustering of named entities without a priori classes

Vincent Claveau 1 Abir Ncibi 1
1 TEXMEX - Multimedia content-based indexing
IRISA - Institut de Recherche en Informatique et Systèmes Aléatoires, Inria Rennes – Bretagne Atlantique
Abstract : Knowledge discovery aims at bringing out coherent groups of entities. It usually relies on clustering, which requires defining a notion of similarity between the entities of interest. In this paper, we propose to repurpose a supervised machine learning technique (namely Conditional Random Fields, widely used for supervised labeling tasks) to compute, indirectly and without supervision, similarities between text sequences. Our approach consists in generating artificial labeling problems on the data, so that regularities between entities are revealed through the labels they receive. We describe how this framework can be implemented and evaluate it on two information extraction/discovery tasks. The results demonstrate the usefulness of this unsupervised approach and open many avenues for defining similarities over complex representations of textual data.
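
The idea of deriving similarities from artificial labeling problems can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: an "artificial labeling problem" is read here as a random split of the entities plus random labels on the training half; a trivial feature-voting learner stands in for the CRF so that the example runs with the standard library only; and all names (`train_stand_in`, `predict`, `co_labeling_similarity`, the toy entities and features) are hypothetical.

```python
import random
from collections import Counter, defaultdict
from itertools import combinations


def train_stand_in(examples):
    """Stand-in for CRF training (assumption): count feature/label co-occurrences."""
    votes = defaultdict(Counter)
    for features, label in examples:
        for feature in features:
            votes[feature][label] += 1
    return votes


def predict(votes, features):
    """Return the most voted label, or None if no feature was seen in training."""
    tally = Counter()
    for feature in features:
        tally.update(votes.get(feature, {}))
    if not tally:
        return None
    # Ties are broken deterministically in favour of the smallest label.
    return max(sorted(tally), key=tally.get)


def co_labeling_similarity(entities, n_rounds=1000, n_labels=3, seed=0):
    """Count how often two entities receive the same label across random problems."""
    rng = random.Random(seed)
    names = sorted(entities)
    similarity = Counter()
    for _ in range(n_rounds):
        shuffled = names[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        train, held_out = shuffled[:half], shuffled[half:]
        # Artificial problem: training entities get arbitrary (random) labels.
        examples = [(entities[name], rng.randrange(n_labels)) for name in train]
        model = train_stand_in(examples)
        predicted = {name: predict(model, entities[name]) for name in held_out}
        # Entities labeled alike by the learned model are counted as similar.
        for a, b in combinations(sorted(held_out), 2):
            if predicted[a] is not None and predicted[a] == predicted[b]:
                similarity[(a, b)] += 1
    return similarity


# Toy data (hypothetical): each entity is described by a set of context features.
toy_entities = {
    "Paris": {"city", "capital", "in"},
    "Lyon": {"city", "in"},
    "Rennes": {"city", "in"},
    "Zidane": {"player", "scored"},
    "Henry": {"player", "scored"},
    "Platini": {"player", "scored"},
}
toy_similarity = co_labeling_similarity(toy_entities, n_rounds=2000)
```

In this sketch, pairs that share context features tend to accumulate higher co-labeling counts than unrelated pairs, and the resulting counts could then be passed to any clustering algorithm that accepts a similarity matrix.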

https://hal.archives-ouvertes.fr/hal-01027520
Contributor : Vincent Claveau
Submitted on : Tuesday, July 22, 2014 - 10:41:44 AM
Last modification on : Wednesday, December 19, 2018 - 1:08:13 PM
Long-term archiving on : Tuesday, November 25, 2014 - 10:25:48 AM

File

Claveau_CICling14.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01027520, version 1

Citation

Vincent Claveau, Abir Ncibi. Knowledge discovery with CRF-based clustering of named entities without a priori classes. Conference on Intelligent Text Processing and Computational Linguistics CICLing, Apr 2014, Kathmandu, Nepal. pp.415-428. ⟨hal-01027520⟩
