ParCoLab, a Parallel Corpus of French, Serbian and English - HAL open archive
Other scientific publication. Year: 2015

ParCoLab, a Parallel Corpus of French, Serbian and English

Dejan Stosic
Aleksandra Miletic
  • Role: Author
  • PersonId: 1028050
Saša Marjanović
Veran Stanojevic
  • Role: Author

Abstract

ParCoLab is a 12-million-word parallel corpus containing original and translated texts in three European languages: Serbian, French, and English. Each language functions both as a source and as a target language. The texts in the corpus, which are mainly literary, are aligned at the paragraph and sentence levels. The alignments have been manually validated, which guarantees their quality. ParCoLab also follows current standards of corpus creation and distribution: it is stored in a TEI-compliant XML format. The corpus can be queried online for free. A search engine lets users formulate queries and extract sentences containing the target expression, along with the corresponding sentences in one or both of the other languages. As a work in progress, the corpus is under continuous qualitative, quantitative, and technical development.
File not deposited

Dates and versions

hal-01979624, version 1 (13-01-2019)

License

Copyright (All rights reserved)

Identifiers

  • HAL Id: hal-01979624, version 1

Cite

Dejan Stosic, Aleksandra Miletic, Saša Marjanović, Veran Stanojevic. ParCoLab, a Parallel Corpus of French, Serbian and English. 2015. ⟨hal-01979624⟩