Scalable privacy-preserving data sharing methodology for genome-wide association studies

2014, Vol 50, pp. 133-141
Author(s): Fei Yu, Stephen E. Fienberg, Aleksandra B. Slavković, Caroline Uhler

2019
Author(s): Mike A. Thelwall, Marcus Munafò, Amalia Mas Bleda, Emma Stuart, Meiko Makita, ...

Abstract: Primary data collected during a research study is increasingly shared and may be re-used for new studies. To assess the extent of data sharing in favourable circumstances and whether such checks can be automated, this article investigates the summary statistics of primary human genome-wide association studies (GWAS). This type of data is highly suitable for sharing because it is a standard research output, is straightforward to use in future studies (e.g., for secondary analysis), and may be already stored in a standard format for internal sharing within multi-site research projects. Manual checks of 1799 articles from 2010 and 2017 matching a simple PubMed query for molecular epidemiology GWAS were used to identify 330 primary human GWAS papers. Of these, only 10.6% reported the location of a complete set of GWAS summary data, increasing from 4.3% in 2010 to 16.8% in 2017. Whilst information about whether data was shared was usually located clearly within a data availability statement, the exact nature of the shared data was usually unspecified. Thus, data sharing is the exception even in suitable research fields with relatively strong norms regarding data sharing. Moreover, the lack of clear data descriptions within data sharing statements greatly complicates the task of automatically characterising shared data sets.

