Le développement d'une plate-forme pour l'annotation spécialisée de documents Web : retour d'expérience (Developing a platform for the specialised annotation of Web documents: lessons learned)
Journal article. Revue TAL : traitement automatique des langues. Year: 2008

Thierry Hamon
Adeline Nazarenko

Abstract

Beyond general-purpose search engines, there is a need for tools that can mine specialised document collections to answer precise, domain-specific queries. This requires a semantic analysis of the documents that is specifically adapted to the domain of the collection. This is the role of the Ogmios platform, whose design and development are described in this paper. Processing such web document collections imposes various operational constraints. Integrating pre-existing NLP tools into a single annotation platform raises interoperability problems. Adapting the semantic analysis to a given domain is an additional challenge. This paper shows how these problems have been solved: by distributing the annotation process across several machines, by wrapping NLP tools in modules that ensure their input and output conform to the platform's interchange format, and by integrating in a common architecture the annotation of large document collections and the building of semantic resources from acquisition corpora. Finally, the paper explains how the platform has been integrated into a specialised search engine. The platform annotates web documents fast enough to keep pace with document crawling. The semantic annotations are used to enrich the user interface and to handle the disambiguation, generalisation, or refinement of user queries.
Main file
TAL-2008-49-2-05-Hamon.pdf (981.46 KB)
Origin: publisher files authorised for deposit in an open archive

Dates and versions

hal-00641163, version 1 (15 November 2011)

Identifiers

  • HAL Id: hal-00641163, version 1

Cite

Thierry Hamon, Adeline Nazarenko. Le développement d'une plate-forme pour l'annotation spécialisée de documents Web : retour d'expérience. Revue TAL : traitement automatique des langues, 2008, 49 (2), pp. 127-154. ⟨hal-00641163⟩