Genomic Privacy: Performance Analysis, Open Issues and Future Research Directions (Preprint)
BACKGROUND Over the past decade, genomic data has become increasingly important in medical research. As the cost of genomic sequencing continues to fall, genomic data can be included in routine medical care. These data are used to detect and prevent inherited diseases. However, using these data for research increases the risk that private genetic information (sensitive information about individuals) leaks to unidentified users. At present, many issues and challenges remain in preserving the privacy of genomic data. In general, three attacks on genomic data that current work seeks to mitigate are identity tracing attacks, completion attacks, and attribute disclosure attacks. In addition, genomic data are difficult to access, integrate, and analyze in a way that supports useful decisions. This paper discusses the available sequencing methods for genomic data and where and how genomic data are useful for prediction in various applications. It also outlines how genomic analytics may be used in the future to extract useful patterns from these data. Many attempts have been made on this topic, but all existing works are strictly rule based; that is, they provide no quantitative measurement of the risk of privacy breaches of genotype and phenotype information. Here, privacy-preserving linkage of genotype and phenotype information across different locations refers to linking genotypes stored in a sequencing facility with phenotypes stored in an electronic health record. This article discusses several aspects of genomic privacy, focusing on the security vulnerabilities identified in existing works and their suggested solutions. We aim to accelerate discoveries using the best prediction tools while taking a clear-cut position on whether genomic data truly need protection or whether that need is just a myth.
Finally, Appendix A lists several genomic data protection techniques against re-identification attacks and gives a systematic comparison of existing genomic privacy-preserving methodologies proposed by researchers over the previous decade.
OBJECTIVE To review the importance of genomic data and to compare existing genomic data privacy preservation methods.
METHODS Re-identification; cryptographic techniques.
RESULTS Comparison of different methods.
CONCLUSIONS Privacy is a sensitive issue, and private data must be protected from outsiders and malicious (unidentified) users. In response to this concern, this article offers several suggestions and opinions regarding genomic data and other types of data. The paper begins with an introduction to genomic data and its characteristics and to its scope and importance in medical care. We highlight related work in this area and explain the evolution of genomic sequencing and the various metrics used to measure its performance. We then explain where and why genomic data are useful, with the help of one use case, and describe how genomic data differ from other types of big data. We discuss several serious concerns, challenges, and research gaps and identify opportunities for future researchers in genomic privacy. We also briefly compare genomic privacy with other types of privacy. We conclude that privacy, especially genomic privacy, needs protection and requires attention from research communities. We ask the computer science community to develop techniques for data privacy and confidentiality protection that work on, and are tested against, real-world problems.