Theses

Generative Probabilistic Alignment Models for Words and Subwords: a Systematic Exploration of the Limits and Potentials of Neural Parametrizations

Abstract: Alignment consists of establishing a mapping between units in a bitext, that is, a text in a source language paired with its translation in a target language. Alignments can be computed at several levels: between documents, between sentences, between phrases, between words, or even between smaller units, for instance when one of the languages is morphologically complex, which requires aligning fragments of words (morphemes). Alignments can also be considered between more complex linguistic structures such as trees or graphs. This is a complex, under-specified task that even humans accomplish with difficulty. Its automation is a notoriously difficult problem in natural language processing, historically associated with the first probabilistic word-based translation models. New models for natural language processing, based on distributed representations computed by neural networks, allow us to question and revisit how these alignments are computed. This research project therefore aims to understand comprehensively the limitations of existing statistical alignment models, and to design neural models that can be learned without supervision, overcome these drawbacks, and improve the state of the art in alignment accuracy.
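The abstract links automatic word alignment to the first probabilistic word-based translation models. As a point of reference, below is a minimal sketch of IBM Model 1 trained with expectation-maximization (EM), the simplest of those classical statistical models. It is not the neural model developed in the thesis. The function names (`train_ibm1`, `align`), the `<null>` token, and the toy German-English bitext are illustrative assumptions.

```python
from collections import defaultdict

NULL = "<null>"  # hypothetical empty source word that may generate unaligned target words


def train_ibm1(bitext, iterations=10):
    """EM for IBM Model 1. bitext is a list of (source_tokens, target_tokens).
    Returns a dict t with t[(f, e)] approximating p(target word f | source word e)."""
    tgt_vocab = {f for _, tgt in bitext for f in tgt}
    # Uniform initialisation of the lexical translation table.
    t = defaultdict(lambda: 1.0 / len(tgt_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for src, tgt in bitext:
            src = [NULL] + src
            for f in tgt:
                # E-step: posterior over which source position generated f.
                z = sum(t[(f, e)] for e in src)
                for e in src:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalise expected counts per source word.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


def align(src, tgt, t):
    """Viterbi alignment under Model 1: each target word links to its most
    probable source position (position 0 is the NULL word)."""
    src = [NULL] + src
    return [max(range(len(src)), key=lambda i: t.get((f, src[i]), 0.0)) for f in tgt]


# Toy bitext (illustrative only).
bitext = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]
t = train_ibm1(bitext, iterations=10)
print(align(["das", "Haus"], ["the", "house"], t))
```

Because "das" co-occurs with "the" in two sentences, EM gradually concentrates t(the | das), which in turn frees "Haus" to explain "house". This co-occurrence reasoning is exactly what unsupervised alignment models rely on, and it is also where their limitations (rare words, many-to-one links, morphology) become visible.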
Document type: Theses

https://hal.archives-ouvertes.fr/tel-03269967
Contributor: Anh Khoa Ngo Ho
Submitted on: Thursday, June 24, 2021 - 2:33:42 PM
Last modification on: Friday, July 2, 2021 - 3:45:57 AM
Long-term archiving on: Saturday, September 25, 2021 - 6:30:04 PM

File: produced by the author(s)

Identifiers

  • HAL Id : tel-03269967, version 1

Citation

Anh Khoa Ngo Ho. Generative Probabilistic Alignment Models for Words and Subwords: a Systematic Exploration of the Limits and Potentials of Neural Parametrizations. Computer Science [cs]. Université Paris-Saclay, 2021. English. ⟨tel-03269967⟩
