Conference paper, Year: 2021

Daniel@FinTOC-2021: Taking Advantage of Images and Vectorial Shapes in Native PDF Document Analysis

Abstract

In this paper, we present our contribution to the FinTOC-2021 Shared Task "Financial Document Structure Extraction". We participated in the tracks dedicated to English and French document processing. Our results for title detection and TOC generation demonstrate good precision. We address the problem in a fairly unusual but ambitious way, which consists in simultaneously considering the text content, vectorial shapes, and images embedded in the native PDF document, and in structuring the document in its entirety.
Main file
Fintoc-2021.fnp-1.13.pdf (101.37 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03744586, version 1 (03-08-2022)

Identifiers

  • HAL Id: hal-03744586, version 1

Cite

Emmanuel Giguet, Gaël Lejeune. Daniel@FinTOC-2021: Taking Advantage of Images and Vectorial Shapes in Native PDF Document Analysis. 3rd Financial Narrative Processing Workshop, Sep 2021, Lancaster, United Kingdom. pp.70-74. ⟨hal-03744586⟩
