Maximum Likelihood Estimation of Biological Relatedness from Low Coverage Sequencing Data

2015
Author(s): Mikhail Lipatov, Komal Sanjeev, Rob Patro, Krishna Veeramah

The inference of biological relatedness from DNA sequence data has a wide array of applications, including the study of human disease, anthropology and ecology. One of the most common analytical frameworks for this inference is to genotype individuals at large numbers of independent genome-wide markers and use population allele frequencies to infer the probability of identity-by-descent (IBD) given the observed genotypes. Current implementations of this class of methods assume that genotypes are known without error. However, with the advent of second-generation sequencing data, there is an increasing number of situations in which the confidence attached to a particular genotype is poor because of low coverage. Such scenarios may lead to biased estimates of the kinship coefficient, Φ. We describe an approach that uses genotype likelihoods, rather than a single observed best genotype, to estimate Φ. We demonstrate that it can accurately infer relatedness down to at least the third degree in both simulated and real second-generation sequencing data from a wide variety of human populations, even when coverage is as low as 2x for both individuals, whereas other commonly used methods such as PLINK exhibit large biases in such situations. In addition, the method appears to be robust when the assumed population allele frequencies have diverged from the true frequencies by realistic levels of genetic drift. This approach has been implemented in the C++ software lcmlkin.
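The idea of estimating relatedness from genotype likelihoods rather than called genotypes can be illustrated with a small sketch. This is not the lcmlkin implementation; it is one plausible formulation, assuming biallelic, independent sites, Hardy-Weinberg proportions, and an EM update for the IBD-sharing proportions (k0, k1, k2), from which Φ = k1/4 + k2/2. The function names `ibd_genotype_tables`, `estimate_ibd`, and `kinship` are hypothetical and chosen only for illustration.

```python
from itertools import product


def ibd_genotype_tables(p):
    """Return P(g1, g2 | Z=z) for z = 0, 1, 2 at one biallelic site.

    p is the assumed population frequency of the alternate allele;
    genotypes are coded as alternate-allele counts 0, 1, 2.
    """
    freq = (1.0 - p, p)  # allele 0 (reference), allele 1 (alternate)
    hwe = (freq[0] ** 2, 2 * freq[0] * freq[1], freq[1] ** 2)

    # Z = 0: no alleles shared IBD, so the genotypes are independent.
    z0 = [[hwe[a] * hwe[b] for b in range(3)] for a in range(3)]

    # Z = 1: one shared allele s plus one independent allele each.
    z1 = [[0.0] * 3 for _ in range(3)]
    for s, a1, a2 in product((0, 1), repeat=3):
        z1[s + a1][s + a2] += freq[s] * freq[a1] * freq[a2]

    # Z = 2: both alleles shared, so the genotypes must be identical.
    z2 = [[hwe[a] if a == b else 0.0 for b in range(3)] for a in range(3)]
    return z0, z1, z2


def estimate_ibd(gl1, gl2, alt_freqs, n_iter=200):
    """EM estimate of (k0, k1, k2) from per-site genotype likelihoods.

    gl1[i][g] and gl2[i][g] are P(reads | genotype g) at site i for the
    two individuals; alt_freqs[i] is the alternate-allele frequency.
    """
    # Likelihood of the reads at each site under each IBD state,
    # summing over all nine possible genotype pairs.
    site_lik = []
    for l1, l2, p in zip(gl1, gl2, alt_freqs):
        site_lik.append([
            sum(l1[a] * l2[b] * table[a][b]
                for a in range(3) for b in range(3))
            for table in ibd_genotype_tables(p)
        ])

    k = [1.0 / 3.0] * 3  # uninformative starting point
    for _ in range(n_iter):
        totals = [0.0, 0.0, 0.0]
        used = 0
        for lik in site_lik:
            weighted = [k[z] * lik[z] for z in range(3)]
            norm = sum(weighted)
            if norm == 0.0:  # site carries no information; skip it
                continue
            used += 1
            for z in range(3):
                totals[z] += weighted[z] / norm  # posterior of state z
        if used == 0:
            break
        k = [t / used for t in totals]
    return k


def kinship(k):
    """Kinship coefficient implied by IBD proportions (k0, k1, k2)."""
    return k[1] / 4.0 + k[2] / 2.0


if __name__ == "__main__":
    # Made-up genotype likelihoods for three sites, for illustration only.
    gl_a = [[0.7, 0.3, 0.0], [0.1, 0.8, 0.1], [0.0, 0.4, 0.6]]
    gl_b = [[0.6, 0.4, 0.0], [0.2, 0.7, 0.1], [0.1, 0.3, 0.6]]
    k_hat = estimate_ibd(gl_a, gl_b, [0.2, 0.5, 0.7])
    print(k_hat, kinship(k_hat))
```

Summing over all genotype pairs weighted by their likelihoods is what lets low-coverage sites contribute partial information instead of a possibly wrong hard call. This sketch does not enforce the constraints that separate biologically plausible (k0, k1, k2) values from implausible ones, and real data would require many more sites than the toy input shown here.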

2020, Vol 20 (1)
Author(s): Huifang Zhang, Chunyan He, Rui Tian, Ruilan Wang

Abstract. Background: Cellulosimicrobium cellulans is a gram-positive filamentous bacterium found primarily in soil and sewage. It rarely causes human infection, especially in previously healthy adults, but when it does, the infection often indicates a poor prognosis. Case presentation: We report a case of endocarditis and intracranial infection caused by C. cellulans in a 52-year-old woman with normal immune function and no implants. The patient initially presented with a febrile headache that progressed to impaired consciousness after 20 days, and she died after treatment with vancomycin combined with rifampicin. C. cellulans was isolated from her blood cultures on 3 consecutive days after admission; however, C. cellulans sequences were found in only two samples of the second-generation sequencing data generated from her peripheral blood, and these were ignored by the technicians. No C. cellulans bands were detected in her cerebrospinal fluid by second-generation sequencing. Conclusions: Second-generation sequencing seems to have limitations for certain specific strains of bacteria.


2012, Vol 14 (2), pp. 193-202
Author(s): I. Milne, G. Stephen, M. Bayer, P. J. A. Cock, L. Pritchard, ...

2010, Vol 26 (24), pp. 3051-3058
Author(s): Sergii Ivakhno, Tom Royce, Anthony J. Cox, Dirk J. Evers, R. Keira Cheetham, ...

2014, Vol 6 (3), pp. 657-659
Author(s): Selina Patel, Kirsten Thompson, Liam Williams, Peter Tsai, Rochelle Constantine, ...

2013, Vol 7 (4), pp. 409-417
Author(s): David H. Warshauer, David Lin, Kumar Hari, Ravi Jain, Carey Davis, ...

Author(s): Anne Krogh Nøhr, Kristian Hanghøj, Genis Garcia Erill, Zilong Li, Ida Moltke, ...

Abstract. Estimation of relatedness between pairs of individuals is important in many areas of genetic research. When estimating relatedness, it is important to account for admixture if it is present. However, the methods that can account for admixture all take genotype data as input, which is a problem for low-depth next-generation sequencing (NGS) data, from which genotypes are called with high uncertainty. Here we present a software tool, NGSremix, for maximum likelihood estimation of relatedness between pairs of admixed individuals from low-depth NGS data, which takes genotype uncertainty into account via genotype likelihoods. Using both simulated and real NGS data for admixed individuals with an average depth of 4x or below, we show that our method works well and clearly outperforms the commonly used state-of-the-art relatedness estimation methods PLINK, KING, relateAdmix, and ngsRelate, all of which perform quite poorly. Hence, NGSremix is a useful new tool for estimating relatedness in admixed populations from low-depth NGS data. NGSremix is implemented in C/C++ as multi-threaded software and is freely available on GitHub at https://github.com/KHanghoj/NGSremix.

