An ancestral informative marker panel design for individual ancestry estimation of Hispanic population using whole exome sequencing data

2019 ◽  
Author(s):  
Li-Ju Wang ◽  
Catherine W. Zhang ◽  
Sophia C. Su ◽  
Hung-I H. Chen ◽  
Yu-Chiao Chiu ◽  
...  

Abstract

Background: European and American Indian ancestries are the major genetic ancestries of Hispanics in the U.S. These ancestral groups have markedly different incidence rates and outcomes in many types of cancer. Genetic admixture may therefore bias genetic association studies of cancer susceptibility variants, specifically in Hispanics. The incidence rate and mutational pattern of liver cancer have been shown to differ substantially among Hispanic, Asian, and non-Hispanic white populations. Currently, ancestry informative marker (AIM) panels of up to a few hundred ancestry-informative single nucleotide polymorphisms (SNPs) are widely used to infer ancestry admixture. Notably, currently available AIMs are predominantly located in intronic and intergenic regions, while the whole exome sequencing (WES) protocols commonly used in translational research and clinical practice do not cover these markers. Thus, it remains challenging to accurately determine a patient’s admixture proportions without additional DNA testing.

Methods: Here we designed a bioinformatics pipeline to obtain an AIM panel. The panel infers 3-way genetic admixture from three distinct continental populations (African (AFR), European (EUR), and East Asian (EAS)) and is constrained to evolutionarily conserved exome regions. Briefly, we extracted ∼1 million exonic SNPs from all individuals of the three populations in the 1000 Genomes Project. The SNPs were then trimmed by their linkage disequilibrium (LD), restricted to biallelic variants, and assembled into an AIM panel from the markers with the highest ancestral informativeness according to the In-statistic. The selected AIM panel was applied to a training dataset and a clinical dataset. Finally, the ancestral proportions of each individual were estimated with STRUCTURE.

Results: The optimally selected AIM panel with 250 markers, the UT-AIM250 panel, achieved better accuracy than published AIM panels when tested on the 3 ancestral populations (accuracy: 0.995 ± 0.012 for AFR, 0.997 ± 0.007 for EUR, and 0.994 ± 0.012 for EAS). We demonstrated the utility of the UT-AIM250 panel on the admixed American (AMR) samples of the 1000 Genomes Project and obtained results (AFR: 0.085 ± 0.098; EUR: 0.665 ± 0.182; and EAS: 0.250 ± 0.205) similar to those of previously published AIM panels (Phillips-AIM34: AFR: 0.096 ± 0.127, EUR: 0.575 ± 0.29, and EAS: 0.330 ± 0.315; Wei-AIM278: AFR: 0.070 ± 0.096, EUR: 0.537 ± 0.267, and EAS: 0.393 ± 0.300), with no significant difference between panels (Pearson correlation, P < 10^-50, n = 347 samples). Subsequently, we applied the UT-AIM250 panel to clinical datasets of self-reported Hispanic patients in South Texas with hepatocellular carcinoma (26 patients). The admixture proportions estimated from adjacent non-cancer liver tissue of these patients were AFR: 0.065 ± 0.043; EUR: 0.594 ± 0.150; and EAS: 0.341 ± 0.160, with smaller variation, which we attribute to the unique Texan/Mexican American population of South Texas. We also obtained similar admixture proportions from the corresponding tumor tissue. In addition, we estimated admixture proportions for the entire set of TCGA-LIHC samples (376 patients) using the UT-AIM250 panel. Our AIM panel estimated consistent admixture proportions from DNA derived from tumor and normal tissues, identified 2 possibly incorrect reported race/ethnicity labels, and can provide race/ethnicity determination if necessary.

Conclusions: Taken together, we demonstrated the feasibility of using evolutionarily conserved exome regions to distinguish genetic ancestry based on the proportions of 3 continental ancestries, and provided a robust and reliable control for sample collection or patient stratification in genetic analysis. An R implementation of UT-AIM250 is available at https://github.com/chenlabgccri/UT-AIM250.
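The marker-ranking step described in the Methods can be illustrated with a minimal Python sketch. It assumes that the In-statistic refers to Rosenberg's informativeness for assignment, computed per biallelic SNP from the allele frequencies in the K reference populations as the sum over alleles of -p_j ln p_j + Σ_i (p_ij / K) ln p_ij, where p_j is the mean frequency of allele j across populations. The function names, the SNP identifiers, and the allele frequencies below are hypothetical and are not taken from the UT-AIM250 repository.

```python
import math


def informativeness(freqs):
    """Informativeness for assignment (I_n) of one biallelic SNP.

    freqs: frequency of one allele in each of K populations
    (for example AFR, EUR, EAS). The other allele is 1 - f.
    """
    k = len(freqs)
    total = 0.0
    # Sum the contribution of both alleles of the biallelic SNP.
    for allele_freqs in (freqs, [1.0 - f for f in freqs]):
        p_mean = sum(allele_freqs) / k
        if p_mean > 0:
            total -= p_mean * math.log(p_mean)
        for p in allele_freqs:
            if p > 0:  # 0 * log(0) is taken as 0
                total += (p / k) * math.log(p)
    return total


def top_markers(snp_freqs, n):
    """Return the n SNP identifiers with the highest I_n."""
    ranked = sorted(
        snp_freqs,
        key=lambda snp: informativeness(snp_freqs[snp]),
        reverse=True,
    )
    return ranked[:n]


# Hypothetical allele frequencies in (AFR, EUR, EAS); illustrative only.
example = {
    "snp_a": [0.95, 0.05, 0.10],
    "snp_b": [0.50, 0.50, 0.50],
    "snp_c": [0.10, 0.80, 0.15],
}
print(top_markers(example, 2))
```

A SNP with identical frequencies in all populations scores zero, so it carries no ancestry information, while a SNP that is nearly fixed in one population and rare in the others scores high. In a pipeline like the one described, the ranking would be applied after the LD trimming and biallelic filtering, with n = 250.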


BMC Genomics ◽  
2019 ◽  
Vol 20 (S12) ◽  
Author(s):  
Li-Ju Wang ◽  
Catherine W. Zhang ◽  
Sophia C. Su ◽  
Hung-I H. Chen ◽  
Yu-Chiao Chiu ◽  
...  

Abstract

Background: European and American Indian ancestries are the major genetic ancestries of Hispanics in the U.S. These ancestral groups have markedly different incidence rates and outcomes in many types of cancer. Genetic admixture may therefore bias genetic association studies of cancer susceptibility variants, specifically in Hispanics. For example, the incidence rate of liver cancer has been shown to differ substantially among Hispanic, Asian, and non-Hispanic white populations. Currently, ancestry informative marker (AIM) panels of up to a few hundred ancestry-informative single nucleotide polymorphisms (SNPs) are widely used to infer ancestry admixture. Notably, currently available AIMs are predominantly located in intronic and intergenic regions, while the whole exome sequencing (WES) protocols commonly used in translational research and clinical practice do not cover these markers. Thus, it remains challenging to accurately determine a patient’s admixture proportions without additional DNA testing.

Results: In this study we designed a unique AIM panel that infers 3-way genetic admixture from three distinct, selected continental populations (African (AFR), European (EUR), and East Asian (EAS)) within evolutionarily conserved exonic regions. Initially, about 1 million exonic SNPs from the three selected populations in the 1000 Genomes Project were trimmed by their linkage disequilibrium (LD), restricted to biallelic variants, and finally optimized into an AIM panel of 250 SNP markers, the UT-AIM250 panel, using their ancestral informativeness statistics. Compared with published AIM panels, UT-AIM250 achieved better accuracy when tested on the three ancestral populations (accuracy: 0.995 ± 0.012 for AFR, 0.997 ± 0.007 for EUR, and 0.994 ± 0.012 for EAS). We further demonstrated the performance of the UT-AIM250 panel on admixed American (AMR) samples of the 1000 Genomes Project and obtained results (AFR, 0.085 ± 0.098; EUR, 0.665 ± 0.182; and EAS, 0.250 ± 0.205) similar to those of previously published AIM panels (Phillips-AIM34: AFR, 0.096 ± 0.127, EUR, 0.575 ± 0.290, and EAS, 0.330 ± 0.315; Wei-AIM278: AFR, 0.070 ± 0.096, EUR, 0.537 ± 0.267, and EAS, 0.393 ± 0.300). Subsequently, we applied the UT-AIM250 panel to a clinical dataset of 26 self-reported Hispanic patients in South Texas with hepatocellular carcinoma (HCC). We estimated the admixture proportions using WES data from adjacent non-cancer liver tissues (AFR, 0.065 ± 0.043; EUR, 0.594 ± 0.150; and EAS, 0.341 ± 0.160). Similar admixture proportions were obtained from the corresponding tumor tissues. In addition, we estimated admixture proportions for The Cancer Genome Atlas (TCGA) collection of hepatocellular carcinoma (TCGA-LIHC) samples (376 patients) using the UT-AIM250 panel. The panel obtained consistent admixture proportions from tumor and matched normal tissues, identified 3 possibly incorrectly reported race/ethnicity labels, and can provide race/ethnicity determination if necessary.

Conclusions: Here we demonstrated the feasibility of using evolutionarily conserved exonic regions to infer admixture proportions and provided a robust and reliable control for sample collection or patient stratification in genetic analysis. An R implementation of UT-AIM250 is available at https://github.com/chenlabgccri/UT-AIM250.
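The LD-trimming step mentioned in the abstract can be illustrated with a small Python sketch of greedy pairwise pruning. This is an assumption about the general technique, not the authors' procedure: it approximates LD by the squared Pearson correlation of genotype dosages (0, 1, or 2 copies of an allele), and the threshold of 0.2, the function names, and the genotype data are all hypothetical.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP has no defined correlation
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


def ld_prune(genotypes, order, max_r2=0.2):
    """Greedily keep SNPs whose r^2 with every kept SNP is <= max_r2.

    genotypes: dict mapping SNP id -> list of dosages, one per individual.
    order: SNP ids in the order they should be considered.
    max_r2: hypothetical threshold; the source does not state one.
    """
    kept = []
    for snp in order:
        if all(r_squared(genotypes[snp], genotypes[k]) <= max_r2 for k in kept):
            kept.append(snp)
    return kept


# Hypothetical dosages for six individuals; illustrative only.
genotypes = {
    "snp_a": [0, 1, 2, 0, 1, 2],
    "snp_b": [0, 1, 2, 0, 1, 2],  # perfectly correlated with snp_a
    "snp_c": [0, 2, 0, 1, 1, 2],
}
print(ld_prune(genotypes, ["snp_a", "snp_b", "snp_c"]))
```

This sketch compares every candidate against all kept SNPs, which is quadratic in the number of markers. For about 1 million exonic SNPs a real pipeline would restrict comparisons to nearby positions on the same chromosome.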



Author(s):  
Yang Liu ◽  
Jin-Jin Zhang ◽  
Shun-Yu Piao ◽  
Ren-Juan Shen ◽  
Ya Ma ◽  
...  

High myopia (HM) is one of the leading causes of visual impairment worldwide. In order to expand the myopia gene spectrum in the Chinese population, we investigated genetic mutations in a cohort of 27 families with HM from Northwest China by using whole-exome sequencing (WES). Genetic variations were filtered using bioinformatics tools and cosegregation analysis. A total of 201 candidate mutations were detected, and 139 were cosegregated with the disease in the families. Multistep analysis revealed four missense variants in four unrelated families, including c.904C>T (p.R302C) in CSMD1, c.860G>A (p.R287H) in PARP8, c.G848A (p.G283D) in ADAMTSL1, and c.686A>G (p.H229R) in FNDC3B. These mutations were rare or absent in the Exome Aggregation Consortium (ExAC), 1000 Genomes Project, and Genome Aggregation Database (gnomAD), indicating that they are new candidate disease-causing genes. Our findings not only expand the myopia gene spectrum but also provide reference information for further genetic study of heritable HM.



BMC Genomics ◽  
2014 ◽  
Vol 15 (Suppl 3) ◽  
pp. S2 ◽  
Author(s):  
Maria Angela Diroma ◽  
Claudia Calabrese ◽  
Domenico Simone ◽  
Mariangela Santorsola ◽  
Francesco Maria Calabrese ◽  
...  


2014 ◽  
Vol 62 (S 02) ◽  
Author(s):  
M. Hitz ◽  
S. Al-Turki ◽  
A. Schalinski ◽  
U. Bauer ◽  
T. Pickardt ◽  
...  


2018 ◽  
Author(s):  
Yasemin Dincer ◽  
Michael Zech ◽  
Matias Wagner ◽  
Nikolai Jung ◽  
Volker Mall ◽  
...  

