Companion report to the Ph.D. dissertation: "Generative Probabilistic Alignment Models for Words and Subwords: a Systematic Exploration of the Limits and Potentials of Neural Parametrizations"

Anh Khoa Ngo Ho

Report (Technical Report), Year: 2021

French title: "Modèles d’Alignement Probabilistes Génératifs pour les Mots et Sous-mots: une Exploration Systématique des Limites et Potentialités des Paramétrisations Neuronales"

Anh Khoa Ngo Ho

Role: Author
PersonId: 744016
IdHAL: anh-khoa-ngo-ho
ORCID: 0000-0002-4844-5012
IdRef: 255027419

Traitement du Langage Parlé - LISN

Abstract

This is a companion document to the Ph.D. dissertation "Generative Probabilistic Alignment Models for Words and Subwords: a Systematic Exploration of the Limits and Potentials of Neural Parametrizations" [Ngo Ho, 2021]. It contains an exhaustive collection of graphs and tables related to the analysis of various aspects of automatic word alignment, such as aligned/unaligned words, rare/unknown words, function/content words, and word order divergences, for six language pairs: English with French, German, Romanian, Czech, Japanese, and Vietnamese. We mostly analyze statistical word alignment models (Giza++ and Fastalign) as well as several variants based on neural models: IBM-style word alignment models, including context-independent, contextual, and character-based models, and variants of a fully generative neural model based on variational autoencoders. We also document an in-depth analysis of Byte-Pair Encoding, a subword tokenization algorithm. For information on these methods, please refer to the thesis.

Keywords

Machine translation; Word alignment; Artificial neural network

Domains

Computer Science [cs]

Main file

Companion report to the PhD dissertation - Generative Probabilistic Alignment Models for Words and Subwords - a Systematic Exploration of the Limits and Potentials of Neural Parametrizations.pdf (67.04 MB)

Origin: Files produced by the author(s)

https://hal.science/hal-03153752

Submitted on: Thursday, 1 April 2021, 18:04:57

Last modified on: Tuesday, 6 February 2024, 14:40:08

Long-term archiving on: Friday, 2 July 2021, 18:02:52

Dates and versions

hal-03153752, version 1 (01-04-2021)

Identifiers

HAL Id: hal-03153752, version 1

Cite

Anh Khoa Ngo Ho. Companion report to the Ph.D. dissertation: "Generative Probabilistic Alignment Models for Words and Subwords: a Systematic Exploration of the Limits and Potentials of Neural Parametrizations". [Technical Report] Université Paris Saclay; Laboratoire Interdisciplinaire des Sciences du Numérique. 2021. ⟨hal-03153752⟩

Collections

CNRS, INRIA, CENTRALESUPELEC, LARA, UNIV-PARIS-SACLAY, LISN, GS-ENGINEERING, GS-COMPUTER-SCIENCE, LISN-TLP

89 views

2 downloads
