Report (Technical Report) Year: 2021

Companion report to the Ph.D. dissertation: "Generative Probabilistic Alignment Models for Words and Subwords: a Systematic Exploration of the Limits and Potentials of Neural Parametrizations"

Rapport d’accompagnement à la thèse de doctorat: "Modèles d’Alignement Probabilistes Génératifs pour les Mots et Sous-mots: une Exploration Systématique des Limites et Potentialités des Paramétrisations Neuronales"

Anh Khoa Ngo Ho

Abstract

This is a companion document to the Ph.D. dissertation "Generative Probabilistic Alignment Models for Words and Subwords: a Systematic Exploration of the Limits and Potentials of Neural Parametrizations" [Ngo Ho, 2021]. It contains an exhaustive collection of graphs and tables related to the analysis of various aspects of automatic word alignment, such as aligned/unaligned words, rare/unknown words, function/content words, and word order divergences, for six language pairs: English with French, German, Romanian, Czech, Japanese and Vietnamese. We mostly analyze statistical word alignment models (Giza++ and Fastalign) as well as several variants based on neural models: IBM-style word alignment models, including context-independent, contextual, and character-based models, and variants of a fully generative neural model based on variational autoencoders. We also document an in-depth analysis of Byte-Pair Encoding, a subword tokenization algorithm. For information regarding these methods, please refer to the thesis.
Main file
Companion report to the PhD dissertation - Generative Probabilistic Alignment Models for Words and Subwords - a Systematic Exploration of the Limits and Potentials of Neural Parametrizations.pdf (67.04 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-03153752, version 1 (01-04-2021)

Identifiers

  • HAL Id: hal-03153752, version 1

Cite

Anh Khoa Ngo Ho. Companion report to the Ph.D. dissertation: "Generative Probabilistic Alignment Models for Words and Subwords: a Systematic Exploration of the Limits and Potentials of Neural Parametrizations". [Technical Report] Université Paris Saclay; Laboratoire Interdisciplinaire des Sciences du Numérique. 2021. ⟨hal-03153752⟩