Conference paper, Year: 2017

The Karjala database – challenges and solutions for digitizing heterogeneous, old genealogical documents for internet use

Jarmo Saarti
  • Role: Author
  • PersonId: 1024805
Jari Ropponen
  • Role: Author
  • PersonId: 1024806

Abstract

The Karjala database contains digitized demographic data from the parish registers of the regions ceded to the Soviet Union in 1944. The digitization project aims to promote access to digitized records for scientific research and genealogy and to encourage research on the people of the ceded Karelia region. The main sources for the database are catechetical lists, lists of children, and registers of vital statistics (registers of births, marriages, migrations and deaths) from the period 1681–1949, which are available in the Digital Archives of the National Archives of Finland. The database holds about 10.3 million entries, but only data older than 100 years is published openly on the Internet: according to decisions by the Finnish data protection authorities, the Personal Data Act applies to personal registers less than 100 years old. Digitization is still in progress; an estimated 1.2 million entries remain to be processed. The database is available to users via https://katiha.mamk.fi/. At present, about 6.5 million entries are available on the Internet, each presenting data about one individual, e.g. names, dates of birth and death, cause of death, age, gender, marital status, occupation, residence, migration, and parish. The Karjala database can be used for diverse research purposes, and it improves access to church records that are sometimes very difficult to read. Information in the database can be used in historical research, medical genetics, the social sciences, family research and onomastics, for example to clarify family structures, migratory patterns or child mortality. The database also offers excellent opportunities for interdisciplinary research.
Our presentation will describe how we manage the digitization of old, handwritten documents. These documents consist of non-structured data with varied linguistic material: several languages from a historical period when nations, states and languages were still evolving, as well as different calendars, spelling rules, etc. We will also introduce our plans to use text recognition technology so that handwritten material such as that behind the Karjala database can be incorporated into the international READ project network http://read.transkribus.eu/network/. We will discuss the challenges encountered with this type of heterogeneous data and the possibilities for more defined and structured data management that could enable automated use of the database. Finally, we will describe the different phases of the database's evolution, emphasizing its linkage with internet technologies, e.g. how they have either hindered or enabled the digitization project.
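The publication rule stated in the abstract (only data older than 100 years is published openly) can be illustrated with a minimal sketch. It is not the project's implementation: the `Entry` record, its fields, and the `is_publishable` function are hypothetical names chosen for illustration, and the sketch assumes the 100-year limit is measured from the date of the recorded event.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Entry:
    """Hypothetical, simplified record; the real schema is not given in the abstract."""
    name: str
    parish: str
    event_date: date


def is_publishable(entry: Entry, today: date, min_age_years: int = 100) -> bool:
    """Return True when the event date lies at least min_age_years before today.

    Assumption: an entry that is exactly 100 years old is no longer
    "less than 100 years old" and may therefore be published.
    """
    try:
        cutoff = today.replace(year=today.year - min_age_years)
    except ValueError:
        # today is 29 February and the cutoff year is not a leap year
        cutoff = today.replace(year=today.year - min_age_years, day=28)
    return entry.event_date <= cutoff


# Placeholder data, not taken from the database.
entries = [
    Entry("Example Person A", "Example Parish", date(1900, 5, 1)),
    Entry("Example Person B", "Example Parish", date(1940, 1, 1)),
]
public_entries = [e for e in entries if is_publishable(e, today=date(2017, 8, 1))]
```

Passing `today` explicitly rather than reading the system clock keeps the check reproducible, which matters when the set of openly published entries grows every year.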
Main file
331967.pdf (1.19 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-01660143, version 1 (11-12-2017)

License

Attribution

Identifiers

  • HAL Id: hal-01660143, version 1

Cite

Jarmo Saarti, Jari Ropponen, Satu Soivanen. The Karjala database – challenges and solutions for digitizing heterogeneous, old genealogical documents for internet use. DH. Opportunities and Risks. Connecting Libraries and Research, Aug 2017, Berlin, Germany. ⟨hal-01660143⟩
206 views
714 downloads
