Theses

Méthode pour l'analyse automatique de structures formelles sur documents multilingues

Emmanuel Giguet 1 
1 Equipe Hultech - Laboratoire GREYC - UMR6072
GREYC - Groupe de Recherche en Informatique, Image et Instrumentation de Caen
Abstract : This thesis deals with the automatic parsing of formal structures in written texts. It begins by presenting documents in their multilingual dimension and the need to process them as such. We study their multilingual structure and show how to compute it with the help of a language identification tool. We then present an original method for the syntactic parsing of unrestricted French sentences. This method generalizes and abstracts Jacques Vergne's research. The syntactic structures we are interested in are the minimal syntagm and the proposition; both units can be defined as multilingual units, so the method can be applied to various languages. We propose two processes that build these units. Both processes treat texts as flows and build syntactic structures through the propagation of relational constraints. Because the syntagmatic and propositional structures depend on each other, they are built through the interaction of the two processes. We show that the two processes are identical if we disregard the nature of the unit they build and the rule base they use. The main thread of this thesis is the method: each time a process is described, we emphasize the method behind it, and we show that a single method underlies both. Each structure is computed with the help of formal and positional clues: these clues come either from the study of the units located inside the structure (internal clues) or from the study of the function of the structure in its upper-level units (external clues).

https://hal.archives-ouvertes.fr/tel-03760676
Contributor : Giguet Emmanuel
Submitted on : Thursday, August 25, 2022 - 2:38:31 PM
Last modification on : Thursday, September 15, 2022 - 3:51:16 AM

File

these-EGiguet.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : tel-03760676, version 1

Citation

Emmanuel Giguet. Méthode pour l'analyse automatique de structures formelles sur documents multilingues. Computer Science [cs]. Université de Caen - Basse Normandie, 1998. French. ⟨tel-03760676⟩
