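De l'analyse au partage des données, quel(s) format(s) choisir ? L'exemple d'un corpus d'interactions parents-enfant [From analysis to data sharing, which format(s) to choose? The example of a corpus of parent-child interactions]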

Abstract : Any corpus-building project faces an array of challenges, and the choice of data encoding format is central among them. This article describes the processing chain used in the ALIPE project, which aims to build a corpus of verbal interactions between parents and their young children. To produce an organized, structured, documented, open-access resource with maximal interoperability, we selected two encoding formats: CHAT and XML-TEI. We introduce the methods the research team used for data collection and annotation and describe how the data were assembled into a corpus. We also discuss the advantages of the XML format for data analysis and for interoperability between corpus processing and analysis software.
Document type :
Conference papers

https://hal.archives-ouvertes.fr/hal-00850172
Contributor : Loïc Liégeois
Submitted on : Monday, August 5, 2013 - 11:33:47 AM
Last modification on : Tuesday, December 11, 2018 - 11:54:01 AM
Document(s) archived on : Wednesday, November 6, 2013 - 4:19:55 AM

Identifiers

  • HAL Id : hal-00850172, version 1

Citation

Loïc Liégeois. De l'analyse au partage des données, quel(s) format(s) choisir ? L'exemple d'un corpus d'interactions parents-enfant. COLDOC 2012 : Traitement de corpus linguistiques, Oct 2012, Paris, France. pp. 128-142. ⟨hal-00850172⟩
