seGMM: a new tool to infer sex from massively parallel sequencing data

2021 ◽  
Author(s):  
Sihan Liu ◽  
Yuanyuan Zeng ◽  
Meilin Chen ◽  
Qian Zhang ◽  
Lanchen Wang ◽  
...  

Inspecting the concordance between self-reported sex and genotype-inferred sex from genomic data is an important quality control step in clinical genetic testing. Numerous tools have been developed to infer sex from genotyping array, whole-exome sequencing (WES), and whole-genome sequencing (WGS) data. However, sex inference from targeted gene sequencing panels still needs improvement. Here, we propose a new tool, seGMM, which applies unsupervised clustering (a Gaussian Mixture Model) to determine the sex of a sample from called genotype data integrated with aligned reads. seGMM consistently achieved >99% sex inference accuracy on both a publicly available dataset (1000 Genomes) and our in-house panel dataset, clearly outperforming existing popular tools. Compared with using features from the X chromosome alone, our results show that adding features from the Y chromosome (e.g. reads mapped to the Y chromosome) can increase sex classification accuracy. Notably, seGMM is also highly accurate on WES and WGS data. Finally, we demonstrated that seGMM can infer sex in single-patient or trio samples by combining them with reference data, and that it can pinpoint samples with potential sex chromosome abnormalities. In general, seGMM provides a reproducible framework for inferring sex from massively parallel sequencing data and holds great promise in clinical genetics.
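The core idea, unsupervised clustering of per-sample sex-chromosome features with a Gaussian Mixture Model, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not seGMM's implementation: the two features (X-chromosome heterozygosity rate and fraction of reads mapped to Y), the diagonal covariances, the initialization, the variance floor, and the rule that labels the higher-Y cluster as male are our own choices, and the helper names (`fit_gmm2`, `infer_sex`, `discordant`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical per-sample features: (X heterozygosity rate, fraction of reads on Y).
Point = Tuple[float, float]


def _log_normal(x: float, mean: float, var: float) -> float:
    """Log density of a univariate normal distribution."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def _variance(points: Sequence[Point], d: int) -> float:
    """Population variance of feature d across all samples."""
    mu = sum(p[d] for p in points) / len(points)
    return sum((p[d] - mu) ** 2 for p in points) / len(points)


def fit_gmm2(points: Sequence[Point], n_iter: int = 50, min_var: float = 1e-6):
    """Fit a two-component, diagonal-covariance Gaussian mixture by EM.

    Returns (component means, per-sample responsibilities).
    """
    dims = len(points[0])
    # Assumed initialization: samples with the lowest and highest Y fraction.
    means = [list(min(points, key=lambda p: p[1])), list(max(points, key=lambda p: p[1]))]
    variances = [[max(_variance(points, d), min_var) for d in range(dims)] for _ in range(2)]
    weights = [0.5, 0.5]
    resp: List[List[float]] = []
    for _ in range(n_iter):
        # E-step: posterior probability that each sample belongs to each component.
        resp = []
        for p in points:
            logs = [
                math.log(weights[k])
                + sum(_log_normal(p[d], means[k][d], variances[k][d]) for d in range(dims))
                for k in range(2)
            ]
            top = max(logs)  # subtract the max for numerical stability
            probs = [math.exp(v - top) for v in logs]
            total = sum(probs)
            resp.append([q / total for q in probs])
        # M-step: re-estimate mixing weights, means and floored variances.
        for k in range(2):
            nk = sum(r[k] for r in resp) + 1e-12  # avoid a zero weight
            weights[k] = nk / len(points)
            for d in range(dims):
                mu = sum(r[k] * p[d] for r, p in zip(resp, points)) / nk
                var = sum(r[k] * (p[d] - mu) ** 2 for r, p in zip(resp, points)) / nk
                means[k][d] = mu
                variances[k][d] = max(var, min_var)
    return means, resp


def infer_sex(points: Sequence[Point]) -> List[str]:
    """Cluster samples and call the cluster with the higher mean Y fraction 'male'."""
    means, resp = fit_gmm2(points)
    male = 0 if means[0][1] > means[1][1] else 1
    return ["male" if r[male] > 0.5 else "female" for r in resp]


def discordant(reported: Sequence[str], inferred: Sequence[str]) -> List[int]:
    """Indices of samples whose self-reported sex disagrees with the inferred sex."""
    return [i for i, (a, b) in enumerate(zip(reported, inferred)) if a != b]
```

For example, four samples with X heterozygosity near 0.3 and almost no Y reads, plus four with near-zero X heterozygosity and about 0.2% of reads on Y, are labeled female and male respectively, and `discordant` then lists any sample whose self-reported sex disagrees. A sample that fits neither cluster well (for example, one with both X heterozygosity and Y reads, as expected for XXY) would need a separate check; this sketch does not flag such cases.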

2012 ◽  
Vol 58 (10) ◽  
pp. 1467-1475 ◽  
Author(s):  
Kwan-Wood G Lam ◽  
Peiyong Jiang ◽  
Gary J W Liao ◽  
K C Allen Chan ◽  
Tak Y Leung ◽  
...  

Abstract

BACKGROUND: A genomewide genetic and mutational profile of a fetus was recently determined via deep sequencing of maternal plasma DNA. This technology could have important applications for noninvasive prenatal diagnosis (NIPD) of many monogenic diseases. Relative haplotype dosage (RHDO) analysis, a core step of this procedure, would allow one to elucidate the maternally inherited half of the fetal genome. For clinical applications, the cost and complexity of data analysis might be reduced via targeted application of this approach to selected genomic regions containing disease-causing genes. There is thus a need to explore the feasibility of performing RHDO analysis in a targeted manner.

METHODS: We performed target enrichment by using solution-phase hybridization followed by massively parallel sequencing of the β-globin gene region in 2 families undergoing prenatal diagnosis for β-thalassemia. We used digital PCR strategies to physically deduce parental haplotypes. Finally, we performed RHDO analysis with target-enriched sequencing data and parental haplotypes to reveal the β-thalassemic status for the fetuses.

RESULTS: A mean sequencing depth of 206-fold was achieved in the β-globin gene region by targeted sequencing of maternal plasma DNA. RHDO analysis was successful for the sequencing data obtained from the target-enriched samples, including a region in one of the families in which the parents had similar haplotype structures. Data analysis revealed that both fetuses were heterozygous carriers of β-thalassemia.

CONCLUSIONS: Targeted sequencing of maternal plasma DNA for NIPD of monogenic diseases is feasible.
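Relative haplotype dosage analysis asks which maternal haplotype is overrepresented in maternal plasma. The sketch below illustrates the idea with a Wald sequential probability ratio test on a simple binomial model; the SNP type, the expected allele fractions, the α/β error rates, and the function name `rhdo_sprt` are our assumptions for illustration, and the published RHDO procedure may formulate its test differently. At SNPs where the mother is heterozygous and the father is homozygous for the allele carried on maternal haplotype I (Hap I), a fetus that inherited Hap I is homozygous, so the plasma Hap I allele fraction rises to 0.5 + f/2 for fetal DNA fraction f; a fetus that inherited Hap II is heterozygous and leaves the fraction at 0.5.

```python
import math
from typing import Sequence, Tuple


def rhdo_sprt(
    snp_counts: Sequence[Tuple[int, int]],
    fetal_fraction: float,
    alpha: float = 0.01,
    beta: float = 0.01,
) -> Tuple[str, int]:
    """Classify which maternal haplotype the fetus inherited in one block.

    snp_counts: (reads with the Hap I allele, reads with the Hap II allele) at
    SNPs where the father is homozygous for the Hap I allele (assumed SNP type).
    Returns (decision, number of SNPs consumed).
    """
    p_hap_i = 0.5 + fetal_fraction / 2  # fetus inherited Hap I: Hap I overrepresented
    p_hap_ii = 0.5  # fetus inherited Hap II: alleles balanced
    upper = math.log((1 - beta) / alpha)  # Wald boundary for accepting "Hap I"
    lower = math.log(beta / (1 - alpha))  # Wald boundary for accepting "Hap II"
    llr = 0.0
    for i, (n_i, n_ii) in enumerate(snp_counts, start=1):
        # Binomial log-likelihood ratio contributed by this SNP's read counts.
        llr += n_i * math.log(p_hap_i / p_hap_ii)
        llr += n_ii * math.log((1 - p_hap_i) / (1 - p_hap_ii))
        if llr >= upper:
            return "Hap I", i
        if llr <= lower:
            return "Hap II", i
    return "unclassified", len(snp_counts)
```

With a fetal fraction of 0.1, SNPs each showing 110 Hap I versus 90 Hap II reads push the log-likelihood ratio past the upper boundary after five SNPs, while evenly split counts cross the lower boundary after five SNPs. Higher read depth per SNP lets the ratio reach a boundary after fewer SNPs, one reason sequencing depth matters for this kind of analysis.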


2010 ◽  
Vol 3 (1) ◽  
Author(s):  
Stefan Enroth ◽  
Robin Andersson ◽  
Claes Wadelius ◽  
Jan Komorowski

PLoS ONE ◽  
2016 ◽  
Vol 11 (10) ◽  
pp. e0164169 ◽  
Author(s):  
Geik Yong Ang ◽  
Choo Yee Yu ◽  
Vinothini Subramaniam ◽  
Mohd Ikhmal Hanif Abdul Khalid ◽  
Tuan Azlin Tuan Abdu Aziz ◽  
...  

2019 ◽  
Vol 38 ◽  
pp. 15-22 ◽  
Author(s):  
Brian A Young ◽  
Katherine Butler Gettings ◽  
Bruce McCord ◽  
Peter M. Vallone
