DataSHIELD: taking the analysis to the data, not the data to the analysis
A. Gaye (1), Y. Marcon (2), J. Isaeva (3), P. Laflamme (2), A. Turner (1), E. M. Jones (4), J. Minion (1), A. W. Boyd (1), C. J. Newby (5), M. L. Nuotio (6, 7), R. Wilson (1), O. Butters (1), B. Murtagh (8), I. Demir (9), D. Doiron (2), L. Giepmans (10), S. E. Wallace (8), I. Budin-Ljosne (3), C. Oliver Schmidt (11), P. Boffetta (12, 13), M. Boniol (12), M. Bota (12), K. W. Carter (14), N. Deklerk (14), C. Dibben (15), R. W. Francis (14), T. Hiekkalinna (6, 7), K. Hveem (16), K. Kvaloy (16), S. Millar (17), I. J. Perry (17), A. Peters (18), C. M. Phillips (17), F. Popham (19), G. Raab (15), E. Reischl (18), N. Sheehan (8), M. Waldenberger (18), M. Perola (20), E. van Den Heuvel (21), J. Macleod (1), B. M. Knoppers (22), R. P. Stolk (10, 23), I. Fortier (2), J. R. Harris (3), B. H. Woffenbuttel (23, 10), M. J. Murtagh (1), V. Ferretti (24, 2), P. R. Burton (1, 2)
1 School of Social and Community Medicine [Bristol]
2 MUHC - McGill University Health Center [Montreal]
3 NIPH - Norwegian Institute of Public Health [Oslo]
4 UCL - University College London [London]
5 Department of Infection, Immunity and Inflammation, Health Sciences, University of Leicester
6 Institute for Molecular Medicine Finland (FIMM)
7 Department of Chronic Disease Prevention, Unit of Public Health Genomics
8 Department of Health Sciences [Leicester]
9 Department of Sociology [Leicester]
10 University of Groningen [Groningen]
11 Greifswald University Hospital
12 IPRI - International Prevention Research Institute
13 The Tisch Cancer Institute
14 UWA - The University of Western Australia
15 School of Geosciences [Edinburgh]
16 NTNU - Norwegian University of Science and Technology [Trondheim]
17 UCC - University College Cork
18 Research Unit of Molecular Epidemiology, Research Center for Environmental Health
19 MRC/CSO Social and Public Health Sciences Unit [Glasgow, UK]
20 University of Tartu
21 University Medical Center Groningen, Medical Statistic
22 CGP - Centre of Genomics and Policy [Montréal]
23 University Medical Center Groningen, LifeLines Cohort Study
24 OICR - Ontario Institute for Cancer Research [Canada]
ORCID iDs: R. Wilson 0000-0001-9276-2368; A. Peters 0000-0003-0224-647X
Abstract
BACKGROUND: Research in modern biomedicine and social science requires sample sizes so large that they can often be achieved only through a pooled co-analysis of data from several studies. But pooling individual-level information in a central database that researchers can query raises important ethico-legal questions and can be controversial. In the UK this has been highlighted by recent debate and controversy over the proposed 'care.data' initiative, and these issues reflect important societal and professional concerns about privacy, confidentiality and intellectual property. DataSHIELD provides a novel technological solution that can circumvent some of the most basic challenges in giving researchers and other healthcare professionals access to individual-level data.
METHODS: Commands are sent from a central analysis computer (AC) to several data computers (DCs) that store the data sets to be co-analysed. The data sets are analysed simultaneously, but separately and in parallel: each data set stays on its own DC. The parallel analyses are linked by non-disclosive summary statistics and commands transmitted back and forth between the DCs and the AC. This paper describes the technical implementation of DataSHIELD using a modified R statistical environment linked to an Opal database deployed behind the computer firewall of each DC. Analysis is controlled through a standard R environment at the AC.
RESULTS: Based on this Opal/R implementation, DataSHIELD is currently used by the Healthy Obese Project and the Environmental Core Project (BioSHaRE-EU) for the federated analysis of 10 data sets across eight European countries, and this use illustrates the opportunities and challenges presented by the DataSHIELD approach.
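The METHODS paragraph describes an AC that sends commands to DCs and receives only non-disclosive summary statistics. The Python sketch below illustrates that principle for one simple case, a linear regression of y on x; it is not DataSHIELD's R/Opal implementation or API. The names `DataComputer`, `Summary`, `federated_fit` and the `MIN_COUNT` disclosure threshold are hypothetical, chosen only for illustration. Each DC returns sums of values and cross-products, never individual records, and refuses to answer when too few records would go into a sum.

```python
"""Hypothetical sketch of the AC/DC split: individual records stay inside each
DataComputer; only aggregate sums travel to the analysis computer (AC)."""
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical disclosure rule (illustrative value, not a DataSHIELD setting):
# a DC refuses to return aggregates computed on fewer than MIN_COUNT records.
MIN_COUNT = 5


@dataclass
class Summary:
    """Aggregates a data computer (DC) returns to the analysis computer (AC)."""
    n: int
    sx: float
    sy: float
    sxx: float
    sxy: float


class DataComputer:
    """Hypothetical DC: holds individual-level (x, y) records that are never returned."""

    def __init__(self, records: List[Tuple[float, float]]) -> None:
        self._records = list(records)

    def summarise(self) -> Summary:
        """Run the AC's command locally and return only sums of (cross-)products."""
        n = len(self._records)
        if n < MIN_COUNT:
            raise PermissionError("too few records; result could be disclosive")
        return Summary(
            n=n,
            sx=sum(x for x, _ in self._records),
            sy=sum(y for _, y in self._records),
            sxx=sum(x * x for x, _ in self._records),
            sxy=sum(x * y for x, y in self._records),
        )


def fit_from_summaries(summaries: List[Summary]) -> Tuple[float, float]:
    """AC side: pool DC aggregates into the least-squares fit y = a + b * x."""
    n = sum(s.n for s in summaries)
    sx = sum(s.sx for s in summaries)
    sy = sum(s.sy for s in summaries)
    sxx = sum(s.sxx for s in summaries)
    sxy = sum(s.sxy for s in summaries)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope


def federated_fit(dcs: List[DataComputer]) -> Tuple[float, float]:
    """Send the same command to every DC (sequentially here, for simplicity) and combine."""
    return fit_from_summaries([dc.summarise() for dc in dcs])


if __name__ == "__main__":
    # Two hypothetical studies; the AC never sees these individual records.
    study_a = DataComputer([(1, 2.1), (2, 3.9), (3, 6.2), (4, 8.1), (5, 9.8)])
    study_b = DataComputer([(2, 4.2), (4, 7.9), (6, 12.1), (8, 16.0), (10, 19.9)])
    intercept, slope = federated_fit([study_a, study_b])
    print(f"pooled intercept={intercept:.3f}, slope={slope:.3f}")
```

Ordinary least squares depends on the data only through these sums, so in this case the pooled estimate equals the one a central database would produce. Models without a closed-form estimate, such as logistic regression, would instead need several rounds of commands and summaries between the AC and the DCs.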
CONCLUSIONS: DataSHIELD facilitates important research in settings where: (i) a co-analysis of individual-level data from several studies is scientifically necessary, but governance restrictions prohibit the release or sharing of some of the required data and/or render data access unacceptably slow; (ii) a research group (e.g. in a developing nation) is particularly vulnerable to loss of intellectual property: the researchers want to share fully the information held in their data with national and international collaborators, but do not wish to hand over the physical data themselves; and (iii) a data set is to be included in an individual-level co-analysis, but its physical size precludes direct transfer to a new site for analysis.