Conference paper, Year: 2019

ROSETTA: Resources for Endangered languages through translated texts

Abstract

Of the world’s 6,000+ languages, only a small fraction currently enjoys the benefits of modern language technologies. The languages left behind are described as endangered or technologically low-resourced, even though some of them have millions of speakers. This collaborative, interdisciplinary digital humanities research project aims to help salvage those languages by combining computational linguistics, American literature, and translation studies. Much as the Rosetta Stone helped decipher the demotic and hieroglyphic scripts thanks to the accompanying Greek translation, our project intends to preserve contemporary endangered languages and assist with their survival through translation. It draws on the extant translations of a single work of fiction, Mark Twain’s Adventures of Huckleberry Finn, into a number of low-resourced languages, translations that span a period of nearly a century and a half. The project relies on human contributors for data collection, while natural language processing tools generate language resources (corpora, dictionaries, thesauri, lexicons) for those endangered languages.
No file deposited

Dates and versions

hal-03083801, version 1 (19-12-2020)

Identifiers

  • HAL Id: hal-03083801, version 1

Cite

Ronald Jenn, Amel Fraisse, Zheng Zhang, Shelley Fisher Fishkin. ROSETTA: Resources for Endangered languages through translated texts. The Center for Spatial and Textual Analysis (CESTA) Seminar Series, Stanford University, Apr 2019, Stanford, United States. ⟨hal-03083801⟩