Conference paper, Year: 2021

Towards Extraction of Theorems and Proofs in Scholarly Articles

Shrey Mishra
  • Role: Author
  • PersonId : 1105903
Lucas Pluvinage
  • Role: Author
  • PersonId : 1078307
Pierre Senellart
  • Role: Author

Abstract

Scholarly articles in mathematical fields often feature mathematical statements (theorems, propositions, etc.) and their proofs. In this paper, we present preliminary work for extracting such information from PDF documents, with several types of approaches: vision (using YOLO), natural language (with transformers), and styling information (with linear conditional random fields). Our main task is to identify which parts of the paper to label as theorem-like environments and proofs. We rely on a dataset collected from arXiv, with LaTeX sources of research articles used to train the models.
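The abstract mentions that LaTeX sources from arXiv are used to train the models. As a minimal sketch of how theorem-like and proof spans might be pulled from a LaTeX source to serve as labels, the Python below scans for a fixed set of environment names. The list `THEOREM_LIKE`, the function `label_environments`, and the two label strings are hypothetical and are not taken from the paper; real articles often define their own environment names, and the authors' actual pipeline may differ.

```python
import re

# Hypothetical list of theorem-like environment names (an assumption;
# papers frequently declare custom names with \newtheorem).
THEOREM_LIKE = ("theorem", "lemma", "proposition", "corollary", "definition")

# Match only the environments of interest, so that an enclosing
# environment such as `document` does not swallow the inner ones.
ENV_PATTERN = re.compile(
    r"\\begin\{(?P<env>" + "|".join(THEOREM_LIKE + ("proof",)) + r")(?P<star>\*?)\}"
    r"(?P<body>.*?)"
    r"\\end\{(?P=env)(?P=star)\}",
    re.DOTALL,
)


def label_environments(latex_source):
    """Return (label, text) pairs for theorem-like environments and proofs.

    Nested environments of the same name are not handled; this is a sketch.
    """
    spans = []
    for match in ENV_PATTERN.finditer(latex_source):
        label = "proof" if match.group("env") == "proof" else "theorem"
        spans.append((label, match.group("body").strip()))
    return spans


if __name__ == "__main__":
    sample = (
        r"\begin{document}"
        r"\begin{lemma}Every bounded sequence has a convergent subsequence.\end{lemma}"
        r"\begin{proof}By bisection.\end{proof}"
        r"\end{document}"
    )
    for label, text in label_environments(sample):
        print(label, "|", text)
```

Such labels would only be a starting point: the task described in the abstract is to recognise the same regions in the rendered PDF, where the LaTeX markup is no longer available.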
Main file
without_copy.pdf (1.38 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-03293643, version 1 (21-07-2021)

Identifiers

  • HAL Id: hal-03293643, version 1

Cite

Shrey Mishra, Lucas Pluvinage, Pierre Senellart. Towards Extraction of Theorems and Proofs in Scholarly Articles. DocEng '21 - 21st ACM Symposium on Document Engineering, Aug 2021, Limerick, Ireland. ⟨hal-03293643⟩
