Journal article in Journal of the Association for Information Science and Technology. Year: 2020

Medieval Spanish (12th–15th centuries) named entity recognition and attribute annotation system based on contextual information

Abstract

Recognizing named entities in medieval Spanish texts is difficult for three reasons. First, proper nouns in medieval texts have complex morphosyntactic characteristics. Second, there were no strict orthographic standards. Third, Spanish varied both diachronically and geographically from the 12th to the 15th century. In this period, named entities usually appear as complex text structures; for example, it was common to add nicknames and information about a person's role in society and geographic origin. To tackle this complexity, a named entity recognition and classification system has been implemented. The system uses contextual cues to detect entities and assign them a type. Because entities often occur with attached attributes, entity contexts are also parsed to determine entity-type-specific dependencies for these attributes. In addition, the system uses a variant generator to handle the diachronic evolution of medieval Spanish terms from a phonetic and morphosyntactic viewpoint. The tool iteratively enriches its lexica, dictionaries, and gazetteers. The system was evaluated on a corpus of over 3,000 manually annotated entities of different types and periods, obtaining F1 scores between 0.74 and 0.87. Attribute annotation was evaluated for person and role-name attributes, with an overall F1 of 0.75.
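To make the idea of contextual cues combined with a variant generator more concrete, here is a minimal Python sketch. It is not the authors' implementation: the trigger words in `TRIGGERS`, the spelling rules in `VARIANT_RULES`, and the function names `generate_variants` and `detect_entities` are all hypothetical, chosen only for illustration. The sketch also assumes that entity names are capitalized, a simplifying assumption that real medieval transcriptions may not satisfy.

```python
import re

# Hypothetical trigger words (contextual cues) mapped to an entity type.
# These are illustrative only and are not taken from the published system.
TRIGGERS = {
    "don": "PERSON",
    "rey": "PERSON",
    "villa de": "PLACE",
    "monesterio de": "ORGANIZATION",
}

# Hypothetical spelling substitutions standing in for a variant generator.
VARIANT_RULES = [("f", "h"), ("ss", "s"), ("y", "i")]


def generate_variants(form, rules=VARIANT_RULES):
    """Return the form plus every spelling reachable by applying the rules."""
    variants = {form}
    frontier = [form]
    while frontier:
        current = frontier.pop()
        for old, new in rules:
            candidate = current.replace(old, new)
            if candidate not in variants:
                variants.add(candidate)
                frontier.append(candidate)
    return variants


def detect_entities(text, triggers=TRIGGERS):
    """Find capitalized token runs that follow a trigger word or its variants."""
    found = []
    for trigger, entity_type in triggers.items():
        # Match any spelling variant of the trigger, ignoring case for the cue only.
        alternation = "|".join(
            re.escape(v) for v in sorted(generate_variants(trigger))
        )
        pattern = re.compile(
            rf"\b(?i:{alternation})\s+((?:[A-ZÁÉÍÓÚÑ]\w+)(?:\s+[A-ZÁÉÍÓÚÑ]\w+)*)"
        )
        for match in pattern.finditer(text):
            found.append((match.start(1), match.group(1), entity_type))
    # Report entities in the order they appear in the text.
    return [(surface, entity_type) for _, surface, entity_type in sorted(found)]


if __name__ == "__main__":
    print(detect_entities("el rey don Alfonso dio la villa de Toledo"))
    print(detect_entities("el rei Fernando"))
```

In the second call, the cue `rei` is matched only because it is generated as a spelling variant of `rey`, which is the role a variant generator would play when orthography is not standardized. A fuller system would also parse the surrounding context for attributes such as roles or places of origin, and would feed newly detected names back into its lexica.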

Dates and versions

hal-02970312, version 1 (17-10-2020)

Identifiers

HAL Id: hal-02970312
DOI: 10.1002/asi.24399

Cite

María Luisa Díez Platas, Salvador Ros Muñoz, Elena Gonzalez-Blanco, Pablo Ruiz, Elena Álvarez Mellado. Medieval Spanish (12th–15th centuries) named entity recognition and attribute annotation system based on contextual information. Journal of the Association for Information Science and Technology, 2020, ⟨10.1002/asi.24399⟩. ⟨hal-02970312⟩

Collections

SITE-ALSACE