Annotation-based Digital Text Corpora Analysis within the TXM Platform

Abstract : This paper presents new developments in the TXM textual corpora analysis platform (http://textometrie.org) towards direct text annotation functionalities. Some annotations are related to a web-based external historical ontology called SyMoGIH, and others to co-reference information between words or to word properties such as part of speech or lemma. The paper discusses the methodological stakes of unifying, in a single framework, the production and analysis of those annotations with the traditional ones already available in TXM, which correspond to the XML markup of the text sources and to the linguistic annotations automatically added to texts by NLP tools.

https://hal.archives-ouvertes.fr/hal-02015898
Contributor : Serge Heiden
Submitted on : Tuesday, February 12, 2019 - 2:41:51 PM
Last modification on : Tuesday, May 28, 2019 - 5:30:10 PM

Licence


Distributed under a Creative Commons Attribution - NonCommercial - ShareAlike 4.0 International License

Identifiers

  • HAL Id : hal-02015898, version 1

Citation

Serge Heiden. Annotation-based Digital Text Corpora Analysis within the TXM Platform. 14th International Conference on the Statistical Analysis of Textual Data / 14es Journées internationales d'Analyse statistique des Données Textuelles (JADT 2018), DII– Department of Enterprise Engineering “Mario Lucertini” Tor Vergata University; DSS– Department of Statistical Sciences, Sapienza University, Rome, Jun 2018, Rome, Italy. pp.367-374. ⟨hal-02015898⟩
