Automatic annotation of bibliographical references in digital humanities books, articles and blogs

Abstract : In this paper, we address the problem of extracting and processing useful information from bibliographic references in Digital Humanities (DH) data. A machine learning technique for sequential data analysis, the Conditional Random Field (CRF), is applied to a corpus extracted from the OpenEdition site, a web platform for journals and book collections in the humanities and social sciences. We present our ongoing project toward this goal, which includes, as a preliminary step, the construction of a suitable corpus and of an efficient CRF model trained on it. This project is supported by a Google Grant for Digital Humanities. A number of experiments are conducted to find one of the best settings for a CRF model on the corpus, and we verify these settings through both automatic and manual evaluation.
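To make the sequence-labeling setup concrete, the sketch below shows how a reference string might be split into tokens and turned into per-token feature dictionaries, the usual input to a CRF toolkit. It is only an illustration: the function names (tokenize, token_features, reference_features) and the particular features (capitalization, digits, a rough year pattern, neighboring tokens) are assumptions made here, not the feature set or code used in the paper.

```python
import re

def tokenize(reference):
    """Split a reference string into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", reference)

def token_features(tokens, i):
    """Build a feature dict for token i (illustrative features, not the paper's)."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_capitalized": tok[:1].isupper(),
        "is_all_digits": tok.isdigit(),
        # Rough pattern for a publication year between 1500 and 2099.
        "looks_like_year": re.fullmatch(r"(1[5-9]|20)\d{2}", tok) is not None,
        "is_punct": re.fullmatch(r"[^\w\s]", tok) is not None,
        # Neighboring tokens give the model local context.
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }

def reference_features(tokens):
    """Feature sequence for one reference: one dict per token."""
    return [token_features(tokens, i) for i in range(len(tokens))]

if __name__ == "__main__":
    tokens = tokenize("Kim, Y. 2011. Automatic annotation.")
    for tok, feats in zip(tokens, reference_features(tokens)):
        print(tok, feats)
```

A CRF would then be trained on such feature sequences paired with one label per token (for example author, date or title), so that each token's label is predicted jointly with those of its neighbors rather than in isolation.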
Document type :
Conference papers

https://hal.archives-ouvertes.fr/hal-01317638
Contributor : Bibliothèque Universitaire Déposants Hal-Avignon
Submitted on : Monday, January 21, 2019 - 11:52:03 AM
Last modification on : Friday, March 22, 2019 - 11:34:07 AM

File

article.pdf
Files produced by the author(s)

Citation

Young-Min Kim, Patrice Bellot, Elodie Faath, Marin Dacos. Automatic annotation of bibliographical references in digital humanities books, articles and blogs. 4th ACM workshop on Online books, complementary social media and crowdsourcing - BooksOnline '11, 2011, Glasgow, United Kingdom. ⟨10.1145/2064058.2064068⟩. ⟨hal-01317638⟩
