Journal article, Speech Communication, Year: 2001

Linguistic documents synchronizing sound and text

Abstract

The goal of the LACITO linguistic archive project is to conserve, and to make available for research, recorded and transcribed oral traditions and other linguistic materials in (mainly) unwritten languages, giving simultaneous access to sound recordings and text annotation. The project uses simple, TEI-inspired XML markup for the kinds of annotation traditionally used in field linguistics. Transcriptions are segmented at roughly the sentence and word levels, and annotation is associated with the different levels: metadata at the text level, free translation at the sentence level, interlinear glosses at the word level, etc. Time alignment is at the sentence (and optionally the word) level. To minimize in-house development and maintenance, the project uses standard software to the extent possible. Marked-up data is processed using widely available XML/XSL/XSLT/XQL software tools and displayed using standard browsers. The project has developed (1) an authoring tool, SoundIndex, to facilitate time alignment, (2) a Java applet which enables standard browsers to access time-aligned speech, (3) XSL stylesheets which determine "views" on the data, and (4) a simple CGI interface permitting the user to choose documents and views and to enter queries. The paper describes these elements in detail. Current objectives are further development of the annotation with a view to linguistic research beyond simple browsing, and of a querying system (using a standard XML query processor) to exploit the annotated material.
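
The layered annotation described in the abstract (metadata at the text level, free translation and time alignment at the sentence level, glosses at the word level) can be illustrated with a minimal sketch. The element and attribute names below (`TEXT`, `S`, `AUDIO`, `FORM`, `TRANSL`, `W`, `GLOSS`, `start`, `end`) are assumptions chosen for illustration and are not taken from the abstract; the sample data is invented, and Python's standard `xml.etree.ElementTree` stands in for the XSL/XQL tooling the project actually uses.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: tag and attribute names are illustrative assumptions,
# not the project's actual schema. The content is placeholder data.
SAMPLE = """\
<TEXT id="demo">
  <HEADER><TITLE>Demo text</TITLE></HEADER>
  <S id="s1">
    <AUDIO start="0.00" end="2.35"/>
    <FORM>wordA wordB</FORM>
    <TRANSL>Free translation of sentence one.</TRANSL>
    <W><FORM>wordA</FORM><GLOSS>glossA</GLOSS></W>
    <W><FORM>wordB</FORM><GLOSS>glossB</GLOSS></W>
  </S>
  <S id="s2">
    <AUDIO start="2.35" end="4.10"/>
    <FORM>wordC</FORM>
    <TRANSL>Free translation of sentence two.</TRANSL>
    <W><FORM>wordC</FORM><GLOSS>glossC</GLOSS></W>
  </S>
</TEXT>
"""


def load_sentences(xml_text):
    """Return one dict per sentence with its time span and annotations."""
    root = ET.fromstring(xml_text)
    sentences = []
    for s in root.iter("S"):
        audio = s.find("AUDIO")  # sentence-level time alignment
        # Word-level annotation: (form, interlinear gloss) pairs.
        words = [(w.findtext("FORM"), w.findtext("GLOSS"))
                 for w in s.findall("W")]
        sentences.append({
            "id": s.get("id"),
            "start": float(audio.get("start")),
            "end": float(audio.get("end")),
            "form": s.findtext("FORM"),          # direct child only
            "translation": s.findtext("TRANSL"),
            "words": words,
        })
    return sentences


def sentence_at(sentences, t):
    """Return the sentence whose audio span contains time t (seconds)."""
    for s in sentences:
        if s["start"] <= t < s["end"]:
            return s
    return None


if __name__ == "__main__":
    for s in load_sentences(SAMPLE):
        print(s["id"], s["start"], s["end"], s["translation"])
```

Keeping the time span on the sentence element is what lets a viewer move in both directions: from a clicked sentence to its audio offsets, or, as in `sentence_at`, from a playback position back to the sentence and its glosses.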

Domains

Linguistics
Main file
speechCom33.pdf (639.95 KB)

Dates and versions

hal-00005544, version 1 (22-06-2005)

Identifiers

  • HAL Id: hal-00005544, version 1

Cite

Michel Jacobson, Boyd Michailovsky, John B. Lowe. Linguistic documents synchronizing sound and text. Speech Communication, 2001, 33, p. 79-96. ⟨hal-00005544⟩
