Conference papers

Structured Named Entity Recognition by Cascading CRFs

Abstract : Named entity recognition (NER) is an important task in NLP, often used as a basis for further processing. A new challenge has emerged in the last few years: structured named entity recognition, where not only named entities must be identified but also their hierarchical components. In this article, we describe a cascading CRFs approach to address this challenge. It reaches the state of the art on a structured NER challenge while remaining very simple. We then offer an error analysis of our system based on a detailed, yet simple, error classification.

1 Introduction

In this paper, we present a linear CRF cascade approach for structured named entity recognition (SNER) on the Quaero v1 and v2 corpora, used in the ETAPE evaluation campaigns [10]. Named Entity Recognition (NER) is a fundamental NLP task, and its structured variant is increasingly popular. Overall, we can distinguish two main approaches to this task. The first cascades multiple annotations, using either the same method or different methods at each step. In this respect, we can cite [19], which cascaded rules in order to gradually build the structure. We can also cite [5], where a CRF and a PCFG were used, the former giving the leaves while the latter built the rest of the tree. Finally, [22], the winner of ETAPE, used one CRF per entity type, for a total of 68 CRFs, and then aligned their annotations. The second approach to annotating tree-structured named entities is to retrieve the structure directly, as was done by [20], who used partial annotation rules for predicting beginnings and ends of entities and then built the tree in one pass. We can also cite [8], who used a tree-CRF to learn nested biomedical entities on the GENIA corpus [14].

Cascading linear CRFs have also been applied to syntactic parsing, as in [25]. At each step, they retrieved chunks and then kept only their respective heads for the next iteration, until a single chunk covering the whole sentence was found (with the class sentence). The tree was then reconstructed by simply unfolding the chunks found at each step. In this paper, we design a new, more general and effective cascade of CRFs adapted to the ETAPE evaluation campaign (sections 2 and 3), evaluate its efficiency and analyse its errors (section 4), and finally conclude (section 5).
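The chunk-then-keep-heads cascade described for syntactic parsing can be illustrated with a small sketch. This is not the authors' system nor the implementation of [25]: the names `Node`, `cascade`, `toy_chunker` and `to_brackets` are hypothetical, and the chunker is a hand-written rule standing in for a trained linear CRF. It only shows how chunks found at one level are reduced to their heads for the next level, and how the tree falls out of unfolding them.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple, Union

# A span is (start, end_exclusive, label, index_of_head_within_the_level).
Span = Tuple[int, int, str, int]


@dataclass
class Node:
    label: str                          # chunk class, e.g. "NP" or "sentence"
    head: str                           # head word kept for the next level
    children: List[Union[str, "Node"]]  # items covered by this chunk


def head_of(item: Union[str, Node]) -> str:
    """A plain token is its own head; a chunk exposes its stored head."""
    return item if isinstance(item, str) else item.head


def cascade(tokens: List[str],
            chunker: Callable[[List[str], int], List[Span]],
            max_levels: int = 10) -> List[Union[str, Node]]:
    """Repeatedly chunk the sequence of heads until one 'sentence' chunk remains."""
    items: List[Union[str, Node]] = list(tokens)
    for level in range(max_levels):
        done = (len(items) == 1 and isinstance(items[0], Node)
                and items[0].label == "sentence")
        if done:
            break
        heads = [head_of(it) for it in items]
        # Assumption: the chunker returns non-overlapping spans.
        spans = sorted(chunker(heads, level))
        if not spans:
            break
        next_items: List[Union[str, Node]] = []
        pos = 0
        for start, end, label, head_idx in spans:
            next_items.extend(items[pos:start])  # unchunked items pass through
            next_items.append(Node(label, heads[head_idx], items[start:end]))
            pos = end
        next_items.extend(items[pos:])
        items = next_items
    return items


def to_brackets(item: Union[str, Node]) -> str:
    """Unfold the nested chunks into a bracketed tree string."""
    if isinstance(item, str):
        return item
    inner = " ".join(to_brackets(child) for child in item.children)
    return "(" + item.label + " " + inner + ")"


def toy_chunker(heads: List[str], level: int) -> List[Span]:
    """Hypothetical stand-in for a trained CRF, valid only for the example below."""
    if level == 0:
        return [(0, 2, "NP", 1), (2, 3, "VP", 2)]
    return [(0, len(heads), "sentence", len(heads) - 1)]


tree = cascade(["the", "cat", "sleeps"], toy_chunker)
print(to_brackets(tree[0]))  # (sentence (NP the cat) (VP sleeps))
```

In a real setting, each level would be handled by a trained sequence labeller rather than fixed rules; the loop and the unfolding step would stay the same.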

Cited literature [23 references]

https://hal.archives-ouvertes.fr/hal-01579109
Contributor : Marco Dinarelli
Submitted on : Thursday, August 31, 2017 - 3:15:14 PM
Last modification on : Wednesday, May 22, 2019 - 3:46:02 PM
Document(s) archived on : Friday, December 1, 2017 - 5:00:54 PM

File

2017_CICling_CRFCascade.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-01579109, version 1

Citation

Yoann Dupont, Marco Dinarelli, Isabelle Tellier, Christian Lautier. Structured Named Entity Recognition by Cascading CRFs. Intelligent Text Processing and Computational Linguistics (CICling), Apr 2017, Budapest, Hungary. ⟨hal-01579109⟩
