Gene Sequence Clustering Based on the Profile Hidden Markov Model with Differential Identifiability

2021, Vol. 2021, pp. 1-9
Author(s): Xujie Ren, Tao Shang, Yatong Jiang, Jianwei Liu

In the era of big data, next-generation sequencing produces large amounts of genomic data, and these gene sequence data will further advance research in the biological sciences. However, the growth of data scale often leads to privacy issues: even if the data are not published, an attacker may still steal private information through a membership inference attack. In this paper, we propose a private profile hidden Markov model (PHMM) with differential identifiability for gene sequence clustering. Adding random noise to the model limits the probability of identifying individuals in the database. The gene sequences can then be clustered in an unsupervised manner, without labels, according to the output scores of the private PHMM. The variation of the divergence distance in the experimental results shows that the added noise distorts the profile hidden Markov model to a certain extent, and the maximum divergence distance can reach 15.47 when the amount of data is small. In addition, a cosine-similarity comparison of the clustering model before and after adding noise shows that, as the privacy parameter changes, the clustering model is distorted to a lower or higher degree, which allows it to defend against membership inference attacks.
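The pipeline described in the abstract (perturb the profile with random noise, cluster sequences by model score, then measure distortion with a divergence distance and cosine similarity) can be sketched in a few lines of Python. This is a hedged illustration, not the authors' implementation. It rests on several assumptions: the profile keeps only match-state emissions estimated from pre-aligned, equal-length sequences (a full PHMM also has insert and delete states scored with the forward or Viterbi algorithm); the noise is Laplace noise on the column counts with scale Δf/ε, where ε = ln((m−1)ρ/(1−ρ)) is our assumed differential-identifiability calibration and m is taken to be the number of sequences purely for illustration; the "divergence distance" is taken to be KL divergence. All function names and the toy sequences are hypothetical.

```python
import math
import random

ALPHABET = "ACGT"


def column_counts(seqs):
    """Per-column nucleotide counts for equal-length, pre-aligned sequences."""
    return [{b: sum(s[pos] == b for s in seqs) for b in ALPHABET}
            for pos in range(len(seqs[0]))]


def to_profile(counts, pseudocount=1.0):
    """Turn (possibly noisy) counts into match-state emission probabilities."""
    profile = []
    for col in counts:
        # Clamp negative noisy counts to 0, then add a pseudocount so no probability is 0.
        adj = {b: max(c, 0.0) + pseudocount for b, c in col.items()}
        total = sum(adj.values())
        profile.append({b: v / total for b, v in adj.items()})
    return profile


def laplace_scale(sensitivity, rho, m):
    """ASSUMED calibration: eps = ln((m - 1) * rho / (1 - rho)), scale = sensitivity / eps."""
    eps = math.log((m - 1) * rho / (1 - rho))
    if eps <= 0:
        raise ValueError("rho must be greater than 1/m")
    return sensitivity / eps


def add_laplace(counts, scale, rng):
    """Add Laplace(0, scale) noise to every count (difference of two exponentials)."""
    return [{b: c + rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
             for b, c in col.items()} for col in counts]


def score(seq, profile):
    """Log-likelihood of a sequence under the match-state-only profile."""
    return sum(math.log(col[base]) for col, base in zip(profile, seq))


def cluster(seqs, profiles):
    """Assign each sequence to the index of the profile that scores it highest."""
    return [max(range(len(profiles)), key=lambda k: score(s, profiles[k])) for s in seqs]


def kl_divergence(p, q):
    """Sum over columns of KL(p_col || q_col), used here as the divergence distance."""
    return sum(pc[b] * math.log(pc[b] / qc[b]) for pc, qc in zip(p, q) for b in ALPHABET)


def cosine_similarity(p, q):
    """Cosine similarity of the flattened emission-probability vectors."""
    u = [col[b] for col in p for b in ALPHABET]
    v = [col[b] for col in q for b in ALPHABET]
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


# Hypothetical toy data: two small "families" of aligned sequences.
rng = random.Random(0)
family_a = ["ACGTAC", "ACGTAA", "ACGAAC"]
family_b = ["TTGCAT", "TTGCTT", "TAGCAT"]
seqs = family_a + family_b

clean = [to_profile(column_counts(f)) for f in (family_a, family_b)]
# One sequence changes one count in each of the 6 columns, so the L1 sensitivity is 6.
scale = laplace_scale(sensitivity=len(seqs[0]), rho=0.9, m=len(seqs))
noisy = [to_profile(add_laplace(column_counts(f), scale, rng)) for f in (family_a, family_b)]

print("clean labels:", cluster(seqs, clean))  # [0, 0, 0, 1, 1, 1]
print("noisy labels:", cluster(seqs, noisy))
for c, n in zip(clean, noisy):
    print(f"KL = {kl_divergence(c, n):.3f}, cosine = {cosine_similarity(c, n):.3f}")
```

Under this assumed calibration, moving `rho` down toward 1/m drives ε toward 0 and the noise scale up, so the noisy profiles drift further from the clean ones (higher KL, lower cosine); moving `rho` toward 1 shrinks the noise and the two models converge.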

2018, Vol. 13 (5), pp. 1081-1095
Author(s): Zhongliu Zhuo, Yang Zhang, Zhi-li Zhang, Xiaosong Zhang, Jingzhong Zhang

2003, Vol. 310 (2), pp. 574-579
Author(s): Norihiro Kikuchi, Yeon-Dae Kwon, Masanori Gotoh, Hisashi Narimatsu
