Conference papers

Daniel@FinTOC’2 Shared Task: Title Detection and Structure Extraction

Abstract : We present our contributions to the two tracks of the 2020 FinTOC Shared Task: Table of Contents (ToC) extraction in English documents and in French documents. We describe our work on Title Detection and on ToC Extraction separately. For ToC Extraction, we propose an approach that combines information from multiple sources: the table of contents, the wording of the document, and lexical domain knowledge. For Title Detection, we compare surface features with character-based features across various training configurations. We show that title detection results are very sensitive to the kind of training dataset used.
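
The abstract contrasts surface features with character-based features for title detection. As a rough illustration only, the sketch below shows what the two families of features might look like for a single line of text. The function names, the particular cues chosen, and the n-gram size are all assumptions made for this example; they are not taken from the paper and should not be read as the authors' feature set.

```python
from collections import Counter
import re


def surface_features(line: str) -> dict:
    """Hypothetical hand-crafted cues describing the shape of one text line."""
    stripped = line.strip()
    letters = [c for c in stripped if c.isalpha()]
    upper = sum(1 for c in letters if c.isupper())
    return {
        "n_chars": len(stripped),
        "n_words": len(stripped.split()),
        # Share of alphabetic characters that are uppercase.
        "upper_ratio": upper / len(letters) if letters else 0.0,
        # Matches leading section numbering such as "2 " or "2.1 ".
        "starts_with_numbering": bool(re.match(r"^\d+(\.\d+)*\.?\s", stripped)),
        "ends_with_period": stripped.endswith("."),
    }


def char_ngrams(line: str, n: int = 3) -> Counter:
    """Counts of overlapping, lowercased character n-grams in one text line."""
    text = line.strip().lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


if __name__ == "__main__":
    candidate = "2.1 Risk Factors"
    print(surface_features(candidate))
    print(char_ngrams(candidate).most_common(5))
```

Either representation could then be passed to a classifier of one's choice. Surface cues are compact and easy to interpret, whereas character n-grams make no assumption about which typographic properties matter.
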
Document type :
Conference papers

https://hal.archives-ouvertes.fr/hal-03024867
Contributor : Giguet Emmanuel
Submitted on : Thursday, November 26, 2020 - 9:11:21 AM
Last modification on : Friday, December 3, 2021 - 11:43:31 AM
Long-term archiving on : Saturday, February 27, 2021 - 6:20:08 PM

File

Daniel_FinTOC2020_TOC_detectio...
Files produced by the author(s)

Identifiers

  • HAL Id : hal-03024867, version 1

Citation

Emmanuel Giguet, Gaël Lejeune, Jean-Baptiste Tanguy. Daniel@FinTOC’2 Shared Task: Title Detection and Structure Extraction. 1st Joint Workshop on Financial Narrative Processing and MultiLing Financial Summarisation @COLING’2020, Dec 2020, Barcelona, Spain. ⟨hal-03024867⟩
