Faculty Opinions recommendation of Data Sanitization to Reduce Private Information Leakage from Functional Genomics.

Author(s):  
Kevin Yip
Cell ◽  
2020 ◽  
Vol 183 (4) ◽  
pp. 905-917.e16
Author(s):  
Gamze Gürsoy ◽  
Prashant Emani ◽  
Charlotte M. Brannon ◽  
Otto A. Jolanki ◽  
Arif Harmanci ◽  
...  

2018 ◽  
Author(s):  
Gamze Gürsoy ◽  
Prashant Emani ◽  
Charlotte M. Brannon ◽  
Otto A. Jolanki ◽  
Arif Harmanci ◽  
...  

Abstract
The generation of functional genomics datasets is surging, as they provide insight into gene regulation and organismal phenotypes (e.g., genes upregulated in cancer). The intention of functional genomics experiments is not necessarily to study genetic variants, yet they pose privacy concerns due to their use of next-generation sequencing. Moreover, there is a great incentive to share raw reads for better analyses and general research reproducibility. Thus, we need new modes of sharing beyond traditional controlled-access models. Here, we develop a data-sanitization procedure allowing raw functional genomics reads to be shared while minimizing privacy leakage, thus enabling principled privacy-utility trade-offs. It works with traditional Illumina-based assays and newer technologies such as 10x single-cell RNA-sequencing. The procedure depends on quantifying the privacy leakage in reads by statistically linking study participants to known individuals. We carried out these linkages using data from highly accurate reference genomes and more realistic environmental samples.
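The two ideas in this abstract, sanitizing reads and quantifying leakage by linking participants to known individuals, can be illustrated with a minimal sketch. It is not the authors' pipeline, whose sanitization rules and linking statistics are not given in the abstract. It assumes gap-free aligned reads (no indels), a hypothetical genotype representation mapping variant IDs to calls such as "0/1", and a simple match-count score whose margin over the runner-up stands in for "how identifiable" a participant is. All function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: variant ID -> genotype call such as "0/1".
Genotypes = Dict[str, str]


def sanitize_read(read: str, ref_window: str, variant_offsets: Set[int]) -> str:
    """Overwrite bases at known variant offsets with the reference base.

    Assumes the read is aligned to ref_window without gaps (no indels)."""
    return "".join(
        ref_window[i] if i in variant_offsets else base
        for i, base in enumerate(read)
    )


def linking_scores(observed: Genotypes, database: Dict[str, Genotypes]) -> List[Tuple[int, str]]:
    """Score each known individual by the number of matching genotype calls, highest first."""
    scores = [
        (sum(1 for vid, gt in observed.items() if known.get(vid) == gt), name)
        for name, known in database.items()
    ]
    return sorted(scores, reverse=True)


def linking_gap(observed: Genotypes, database: Dict[str, Genotypes]) -> Tuple[str, int]:
    """Return the top-ranked individual and its score margin over the runner-up.

    A large margin suggests the participant can be singled out (privacy leakage)."""
    ranked = linking_scores(observed, database)
    best_score, best_name = ranked[0]
    runner_up = ranked[1][0] if len(ranked) > 1 else 0
    return best_name, best_score - runner_up


db = {
    "alice": {"rs1": "0/1", "rs2": "1/1", "rs3": "0/0"},
    "bob": {"rs1": "0/0", "rs2": "0/1", "rs3": "0/0"},
}
print(sanitize_read("ACGTA", "ACCTA", {2}))  # ACCTA
print(linking_gap({"rs1": "0/1", "rs2": "1/1"}, db))  # ('alice', 2)
```

In this toy setup, sanitizing the reads before variant calling would remove the alternate alleles that drive the match count, shrinking the margin and hence the leakage.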


2017 ◽  
Vol 26 (01) ◽  
pp. 212-213

Agarwal V, Podchiyska T, Banda JM, Goel V, Leung TI, Minty EP, Sweeney TE, Gyang E, Shah NH. Learning statistical models of phenotypes using noisy labeled training data. J Am Med Inform Assoc 2016;23(6):1166-73 https://academic.oup.com/jamia/article-lookup/doi/10.1093/jamia/ocw028
Harmanci A, Gerstein M. Quantification of private information leakage from phenotype-genotype data: linking attacks. Nat Methods 2016;13(3):251-6 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4834871/
Pfiffner PB, Pinyol I, Natter MD, Mandl KD. C3-PRO: Connecting ResearchKit to the Health System Using i2b2 and FHIR. PLoS One 2016;11(3):e0152722 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4816293/
Wilkinson MD, Dumontier M, Aalbersberg IJJ, Appleton G, Axton M, Baak A, Blomberg N, Boiten JW, da Silva Santos LB, Bourne PE, Bouwman J, Brookes AJ, Clark T, Crosas M, Dillo I, Dumon O, Edmunds S, Evelo CT, Finkers R, Gonzalez-Beltran A, Gray AJ, Groth P, Goble C, Grethe JS, Heringa J, ‘t Hoen PA, Hooft R, Kuhn T, Kok R, Kok J, Lusher SJ, Martone ME, Mons A, Packer AL, Persson B, Rocca-Serra P, Roos M, van Schaik R, Sansone SA, Schultes E, Sengstag T, Slater T, Strawn G, Swertz MA, Thompson M, van der Lei J, van Mulligen E, Velterop J, Waagmeester A, Wittenburg P, Wolstencroft K, Zhao J, Mons B. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 2016;3:160018 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4792175/
Springer DB, Tarassenko L, Clifford GD. Logistic regression-HSMM-based heart sound segmentation. IEEE Trans Biomed Eng 2016 Apr;63(4):822-32


Author(s):  
Suriya Murugan ◽  
Anandakumar H.

Online social networks such as Facebook are used by a growing number of people to publish and share data with their friends. The problem is that users' private information can be inferred from their social relations. This chapter studies how to manage such leakage of confidential information, a challenging issue in social networks. Learning methods applied to user-released data can predict private information. Since the goal is to distribute social network data without disclosing sensitive data, sanitization techniques are used to achieve it. The chapter then explores the effectiveness of these techniques, using collective inference methods to discover sensitive attributes in a user-profile dataset. It concludes that sanitization methods can efficiently decrease the accuracy of both local and relational classifiers, allowing information to be shared securely while maintaining user privacy.
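How sanitization can lower the accuracy of a relational classifier is easiest to see on a toy graph. The sketch below is an illustration under stated assumptions, not the chapter's method, which the abstract does not specify. It assumes a simple relational-neighbour classifier (majority vote over friends' publicly disclosed values) and a hypothetical link-removal sanitizer that drops links to friends who disclose the sensitive value; the names `relational_guess` and `sanitize_links` are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional, Set

Friends = Dict[str, Set[str]]  # user -> set of friends
Labels = Dict[str, str]        # user -> publicly disclosed attribute value


def relational_guess(user: str, friends: Friends, labels: Labels) -> Optional[str]:
    """Relational-neighbour classifier: guess a user's hidden attribute as the most
    common value disclosed by their friends (None if no friend discloses one)."""
    votes = Counter(labels[f] for f in friends.get(user, set()) if f in labels)
    return votes.most_common(1)[0][0] if votes else None


def sanitize_links(friends: Friends, labels: Labels, sensitive_value: str) -> Friends:
    """Hypothetical link sanitization: before release, drop links to friends who
    disclose sensitive_value, so they no longer vote for it."""
    return {
        user: {f for f in fs if labels.get(f) != sensitive_value}
        for user, fs in friends.items()
    }


friends = {"u": {"a", "b", "c"}}
labels = {"a": "X", "b": "X", "c": "Y"}
print(relational_guess("u", friends, labels))  # X
print(relational_guess("u", sanitize_links(friends, labels, "X"), labels))  # Y
```

Removing links trades utility (a sparser released graph) for privacy (fewer informative votes); a real sanitizer would choose which links or attributes to remove far more selectively.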


Electronics ◽  
2020 ◽  
Vol 9 (5) ◽  
pp. 719 ◽  
Author(s):  
Yangyang Li ◽  
Hao Jin ◽  
Xiangyi Yu ◽  
Haiyong Xie ◽  
Yabin Xu ◽  
...  

In the information age, leaked private information may cause significant physical and mental harm to the affected parties, with negative social consequences. To evaluate the impact of such leakage in today's social networks, it is necessary to accurately predict the scope and depth of private information diffusion; doing so makes it feasible to prevent and control the improper spread of private information. In this paper, we propose an intelligent prediction method for private information diffusion in social networks based on comprehensive data analysis, using Sina Weibo, one of the most prominent social networks in China, as a case study. First, a prediction model of message-forwarding behavior is built by analyzing the characteristic factors that influence the forwarding behavior of micro-blog users. Then, user influence is calculated from the interaction time and the topological structure of user relationships, and the critical diffusion paths are identified. Finally, the micro-blog diffusion cut-off conditions are determined by propagating users' forwarding probabilities. Simulation results on a Sina Weibo dataset show a prediction accuracy of 86.9%, indicating that the method predicts message diffusion effectively in real-world social networks.
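The final step, propagating forwarding probabilities until a cut-off is reached, can be sketched as follows. This is an assumption-laden illustration, not the paper's model: it takes per-edge forwarding probabilities as given (the paper derives them from user features, interaction time and topology), treats a user's reach probability as the best product of forwarding probabilities along any path from the source, and stops spreading along a path once that product falls below a hypothetical `cutoff` parameter.

```python
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical follower graph: user -> list of (follower, forwarding probability).
Graph = Dict[str, List[Tuple[str, float]]]


def predict_diffusion(graph: Graph, source: str, cutoff: float = 0.1) -> Dict[str, float]:
    """Estimate how likely each user is to receive a message that starts at source.

    Reach probability multiplies forwarding probabilities along a path; a path stops
    spreading once its probability drops below cutoff (the assumed cut-off condition)."""
    reach = {source: 1.0}
    queue = deque([source])
    while queue:
        user = queue.popleft()
        for follower, p_forward in graph.get(user, []):
            p = reach[user] * p_forward
            if p < cutoff or p <= reach.get(follower, 0.0):
                continue  # below the cut-off, or no better than a path already found
            reach[follower] = p
            queue.append(follower)
    return reach


graph = {"a": [("b", 0.5), ("c", 0.05)], "b": [("d", 0.4)]}
print(sorted(predict_diffusion(graph, "a", cutoff=0.1)))  # ['a', 'b', 'd']
```

Because forwarding probabilities are at most 1, a path's probability can only shrink as it grows, so the search terminates even on graphs with cycles; user "c" is never reached here because 0.05 falls below the cut-off.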


2017 ◽  
Vol 26 (01) ◽  
pp. e19-e20
