Journal article, Digital Scholarship in the Humanities, Year: 2021

Topic modelling on archive documents from the 1970s: global policies on refugees

Philip Grant
  • Role: Author
Ratan Sebastian
  • Role: Author
Marc Allassonnière-Tang
Sara Cosemans
  • Role: Author

Abstract

This study conducts a historical analysis of global policies on refugees within typewritten and digitally born documents (c. 55,000 pages) from international and national archives. The data originate from the 1970s and are stored in archives from the UK and US governments, plus the United Nations High Commissioner for Refugees (UNHCR). The overarching theme is to analyse the involvement of the UK, the USA, and the UNHCR in different refugee cases that occurred during the 1970s. To do so, we (1) identify the main topics in each document; (2) investigate the transmission of topics horizontally (between organizations) and vertically (through time); and (3) suggest targeted areas of the document set for further close reading by historians. Standard Optical Character Recognition and object detection are used to extract information from documents and categorize them. Then, natural language processing (NLP) methods like topic modelling and clustering are used to identify topics and the relationships between them across time. The results identify several main themes covered by different organizations and how the focus of each organization changes diachronically. Besides its academic contribution, this study also demonstrates how, through the use of existing techniques with limited customization, digital technologies in the hands of the historian can augment and complement qualitative methods in bringing to light the themes and trends demonstrated in large bodies of historical documents.
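Step (2) of the abstract, following topics between organizations and through time, can be illustrated with a minimal sketch. It assumes each document has already been assigned a dominant topic by some topic model. The records, the topic labels, and the helper names `topic_shares` and `shared_topics` are hypothetical and are not taken from the article; this is not the authors' implementation.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (organization, year, dominant topic label).
# In a real pipeline the labels would come from a fitted topic model;
# these values are invented for illustration and are not study results.
records = [
    ("UNHCR", 1972, "topic_a"),
    ("UNHCR", 1972, "topic_b"),
    ("UK", 1972, "topic_a"),
    ("UK", 1973, "topic_c"),
    ("US", 1973, "topic_a"),
    ("US", 1973, "topic_a"),
]


def topic_shares(records):
    """Return {(org, year): {topic: share}}, shares summing to 1 per key."""
    counts = defaultdict(Counter)
    for org, year, topic in records:
        counts[(org, year)][topic] += 1
    shares = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        shares[key] = {topic: n / total for topic, n in counter.items()}
    return shares


def shared_topics(shares, year):
    """Return topics that occur in more than one organization in `year`."""
    seen = defaultdict(set)
    for (org, y), dist in shares.items():
        if y == year:
            for topic in dist:
                seen[topic].add(org)
    return {t: sorted(orgs) for t, orgs in seen.items() if len(orgs) > 1}


shares = topic_shares(records)
print(shares[("UNHCR", 1972)])      # vertical view: one organization, one year
print(shared_topics(shares, 1972))  # horizontal view: overlap between organizations
```

Comparing the per-year shares of a single organization gives the vertical (diachronic) view, while `shared_topics` gives a crude horizontal view of which topics several organizations address in the same year.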
No file deposited

Dates and versions

hal-03435806, version 1 (18-11-2021)

Cite

Philip Grant, Ratan Sebastian, Marc Allassonnière-Tang, Sara Cosemans. Topic modelling on archive documents from the 1970s: global policies on refugees. Digital Scholarship in the Humanities, 2021, 36 (4), pp.886-904. ⟨10.1093/llc/fqab018⟩. ⟨hal-03435806⟩