
Evaluation of BILBO reference parsing in digital humanities via a comparison of different tools

Abstract: Automatic bibliographic reference annotation involves the tokenization and identification of reference fields. Recent methods use machine learning techniques such as Conditional Random Fields to tackle this problem. However, state-of-the-art methods always train and evaluate their systems on well-structured data with a simple format, such as the bibliography at the end of scientific articles. This is one reason why parsing references that deviate from a regular format does not work well. In our previous work, we established a standard for tokenization and feature selection on less formulaic data such as notes. In this paper, we compare our system, BILBO, with other popular online reference parsing tools on new data from a totally different source. BILBO is built on our own corpora, extracted and annotated from real-world data: digital humanities articles (90% in French) from a site of OpenEdition. The robustness of the BILBO system allows language-independent tagging. We expect that this first evaluation attempt will motivate the development of other efficient techniques for scattered and less formulaic bibliographic references.
Document type: Conference papers
Contributor: Bibliothèque Universitaire Déposants Hal-Avignon
Submitted on: Wednesday, May 18, 2016 - 4:46:42 PM
Last modification on: Tuesday, November 17, 2020 - 3:09:04 AM



Young-Min Kim, Patrice Bellot, Jade Tavernier, Elodie Faath, Marin Dacos. Evaluation of BILBO reference parsing in digital humanities via a comparison of different tools. DocEng '12, Sep 2012, Paris, France. ⟨10.1145/2361354.2361400⟩. ⟨hal-01317656⟩


