Collaborative construction of a good quality, broad coverage and copyright free Japanese-French dictionary

Abstract: This research project belongs to the field of natural language processing (NLP), at the intersection of computer science and linguistics, and more specifically to multilingual lexicography and lexicology. On the Web, although French and Japanese are each well-resourced languages (Berment, 2004), this is not true of the French-Japanese language pair:
- Electronic French-Japanese bilingual dictionaries (denshi jishô) cannot be copied to a computer or reused;
- There is a French-Japanese dictionary on the Web (http://www.dictionnaire-japonais.com), but it contains only 40,000 entries, has no examples, and is not available for download.
By contrast, collaborative Web dictionaries exist, such as the Japanese-English JMdict project led by Jim Breen (2004), which contains over 173,000 entries. These resources are freely downloadable, which shows that such projects are feasible. During a first stay in Japan from November 2001 to March 2004, we had already noticed the lack of French-Japanese bilingual resources on the Web. This observation gave rise to the Papillon project, which aims to build a multilingual lexical database with a pivot structure (Sérasset et al., 2001). Since then, progress has been made in several areas (technical, theoretical, social) (Mangeot, 2006), but the actual production of data has advanced very little. Meanwhile, there is a new trend toward reusing existing lexical resources (word sense disambiguation, use of open source resources such as Wiktionary and DBpedia, merging with ontologies, etc.). Although these experiments make it possible to consolidate and expand the coverage of existing resources, they still rely on data created by hand by professional lexicographers. There are printed French-Japanese dictionaries that are of good quality and old enough to be royalty-free. It should be possible to reuse these resources within our project to build a good-quality, broad-coverage dictionary available on the Web.
Based on this observation, we defined the following project: to build a rich multilingual lexical system, giving priority to the French-Japanese language pair. Construction will proceed first by reusing existing resources (printed Japanese-French dictionaries, dictionaries between Japanese and other languages, Wikipedia) and by automatic operations (scanning and correction, calculation of translation links), and then through volunteer contributors working as a community on the Web. Contributors will work on dictionary articles according to their level of expertise and knowledge in lexicography or bilingual translation. The resulting resources will be royalty-free and intended for use both by humans, through conventional bilingual dictionaries, and by machines, in automatic language processing tools (analysis, machine translation, etc.).
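The abstract does not say how translation links are calculated. As a purely illustrative sketch, one common way to derive candidate links from dictionaries between Japanese and other languages is to go through a pivot language: if a Japanese word and a French word share an English translation, they become a candidate pair. The function name `pivot_links` and the toy dictionaries `JA_EN` and `EN_FR` below are hypothetical and are not taken from the project.

```python
from collections import defaultdict


def pivot_links(src_to_pivot, pivot_to_tgt):
    """Propose source->target candidates through shared pivot translations.

    Returns {source_word: [(target_word, n_pivots), ...]}, sorted by
    decreasing number of supporting pivot words, then alphabetically.
    """
    candidates = defaultdict(lambda: defaultdict(int))
    for src, pivots in src_to_pivot.items():
        for pivot in set(pivots):
            for tgt in set(pivot_to_tgt.get(pivot, ())):
                # Each distinct pivot word supporting the pair adds one vote.
                candidates[src][tgt] += 1
    return {
        src: sorted(tgts.items(), key=lambda kv: (-kv[1], kv[0]))
        for src, tgts in candidates.items()
    }


# Toy data, invented for illustration only (not project data).
JA_EN = {"犬": ["dog", "hound"], "銀行": ["bank"]}
EN_FR = {
    "dog": ["chien"],
    "hound": ["chien", "limier"],
    "bank": ["banque", "rive"],
}

links = pivot_links(JA_EN, EN_FR)
for word, targets in links.items():
    print(word, targets)
```

In this toy data, the ambiguous pivot word "bank" proposes both "banque" and "rive" for 銀行. Automatically calculated links of this kind are only candidates, which is consistent with the role the abstract gives to volunteer contributors who work on the articles afterwards.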
Document type:
Book chapter
Hosei University International Found Foreign Scholar Fellowship Report, 2016, Volume XVI 2013-2014

https://hal.archives-ouvertes.fr/hal-01294566
Contributor: Mathieu Mangeot
Submitted on: Tuesday, 29 March 2016 - 14:35:42
Last modified on: Thursday, 11 October 2018 - 08:48:03
Document(s) archived on: Thursday, 30 June 2016 - 16:30:37

File

Rapport-HIF-Hosei-en.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01294566, version 1

Citation

Mathieu Mangeot. Collaborative construction of a good quality, broad coverage and copyright free Japanese-French dictionary. Hosei University International Found Foreign Scholar Fellowship Report, 2016, Volume XVI 2013-2014. 〈hal-01294566〉
