Daniel@FinTOC-2019 Shared Task : TOC Extraction and Title Detection

Emmanuel Giguet 1 Gaël Lejeune 2
1 Equipe Hultech - Laboratoire GREYC - UMR6072
GREYC - Groupe de Recherche en Informatique, Image, Automatique et Instrumentation de Caen
Abstract : We present several methods for the two tasks of the 2019 FinTOC challenge: Title Detection and Table of Contents (TOC) Extraction. For the Title Detection task, we present approaches that use three kinds of features: visual characteristics, punctuation density, and character n-grams. Our best approach achieved an official F-measure of 94.88%, ranking 6th on this task. For the TOC Extraction task, we present a method that combines visual characteristics of the document layout. With this method we ranked first on this task, with a score of 42.72%.

https://hal.archives-ouvertes.fr/hal-02303131
Contributor : Giguet Emmanuel
Submitted on : Wednesday, October 2, 2019 - 9:33:24 AM
Last modification on : Friday, October 4, 2019 - 2:00:26 AM

File

article-Giguet-Lejeune.pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-02303131, version 1

Citation

Emmanuel Giguet, Gaël Lejeune. Daniel@FinTOC-2019 Shared Task : TOC Extraction and Title Detection. The Second Financial Narrative Processing Workshop (FNP 2019), Sep 2019, Turku, Finland. pp.63-68. ⟨hal-02303131⟩
