
Daniel@FinTOC’2 Shared Task: Title Detection and Structure Extraction

Abstract : We present our contributions to the two tracks of the 2020 FinTOC Shared Task: Table of Contents (ToC) extraction in English documents and in French documents. We describe our work on Title Detection and on ToC Extraction separately. For ToC Extraction, we propose an approach that combines information from multiple sources: the table of contents, the wording of the document, and lexical domain knowledge. For Title Detection, we compare surface features with character-based features across various training configurations. We show that title detection results are very sensitive to the kind of training dataset used.
Document type :
Conference papers
Contributor : Giguet Emmanuel
Submitted on : Thursday, November 26, 2020 - 9:11:21 AM
Last modification on : Saturday, June 25, 2022 - 9:56:24 AM
Long-term archiving on : Saturday, February 27, 2021 - 6:20:08 PM




  • HAL Id : hal-03024867, version 1


Emmanuel Giguet, Gaël Lejeune, Jean-Baptiste Tanguy. Daniel@FinTOC’2 Shared Task: Title Detection and Structure Extraction. 1st Joint Workshop on Financial Narrative Processing and MultiLing Financial Summarisation @COLING’2020, Dec 2020, Barcelona, Spain. ⟨hal-03024867⟩


