Conference papers

Daniel@FinTOC-2019 Shared Task : TOC Extraction and Title Detection

Emmanuel Giguet 1 Gaël Lejeune 2, 3
1 Equipe Hultech - Laboratoire GREYC - UMR6072
GREYC - Groupe de Recherche en Informatique, Image et Instrumentation de Caen
3 STIH-LC - Équipe Linguistique computationnelle
STIH - Sens, Texte, Informatique, Histoire
Abstract : We present different methods for the two tasks of the 2019 FinTOC challenge: Title Detection and Table of Contents (TOC) Extraction. For the Title Detection task, we present several approaches using various features: visual characteristics, punctuation density and character n-grams. Our best approach achieved an official F-measure of 94.88%, ranking sixth on this task. For the TOC Extraction task, we present a method combining visual characteristics of the document layout. With this method we ranked first on this task with a score of 42.72%.

https://hal.archives-ouvertes.fr/hal-02303131
Contributor : Giguet Emmanuel
Submitted on : Wednesday, October 2, 2019 - 9:33:24 AM
Last modification on : Tuesday, January 4, 2022 - 5:46:16 AM

File

article-Giguet-Lejeune.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-02303131, version 1

Citation

Emmanuel Giguet, Gaël Lejeune. Daniel@FinTOC-2019 Shared Task : TOC Extraction and Title Detection. The Second Financial Narrative Processing Workshop (FNP 2019), Sep 2019, Turku, Finland. pp.63-68. ⟨hal-02303131⟩
