Conference paper, Year: 2012

Terminology Extraction, Translation Tools and Comparable Corpora: TTC concept, midterm progress and achieved results

Abstract

The TTC project (Terminology Extraction, Translation Tools and Comparable Corpora) has contributed to leveraging computer-assisted translation tools, machine translation systems and multilingual content (corpora and terminology) management tools by automatically generating bilingual terminologies from comparable corpora in seven EU languages, as well as Russian and Chinese. This paper presents the main concept of TTC, discusses the scarcity of parallel corpora and the potential of comparable corpora, and briefly describes the TTC terminology extraction workflow. The workflow comprises three steps: collecting domain-specific comparable corpora from the web, extracting monolingual terminology in two domains (wind energy and mobile technology), and bilingually aligning the extracted terminology. We also present TTC usage scenarios and report on the project's midterm progress and the results achieved during the two years of the project. Finally, we describe how the project deals with the under-resourced languages (for example, Latvian) and disconnected languages (for example, Latvian and Russian) that it covers.
Main file
TTC_LREC_CREDISLAS_2012.pdf (336.99 KB)
Origin: Publisher files allowed on an open archive

Dates and versions

hal-00819909, version 1 (09-05-2013)

Identifiers

  • HAL Id: hal-00819909, version 1

Cite

Tatiana Gornostay, Anita Gojun, Marion Weller, Ulrich Heid, Emmanuel Morin, et al. Terminology Extraction, Translation Tools and Comparable Corpora: TTC concept, midterm progress and achieved results. LREC 2012 Workshop on Creating Cross-language Resources for Disconnected Languages and Styles (CREDISLAS), May 2012, Istanbul, Turkey. 4 p. ⟨hal-00819909⟩
