Whole-Genome Sequencing Association Analyses of Stroke and Its Subtypes in Ancestrally Diverse Populations From Trans-Omics for Precision Medicine Project

Stroke ◽  
2021 ◽  
Author(s):  
Yao Hu ◽  
Jeffrey W. Haessler ◽  
Regina Manansala ◽  
Kerri L. Wiggins ◽  
Arden Moscati ◽  
...  

Background and Purpose: Stroke is a leading cause of death and long-term disability worldwide. Previous genome-wide association studies identified 51 loci associated with stroke (mostly ischemic) and its subtypes among predominantly European populations. Using whole-genome sequencing in ancestrally diverse populations from the Trans-Omics for Precision Medicine (TOPMed) Program, we aimed to identify novel variants, especially low-frequency or ancestry-specific variants, associated with all stroke, ischemic stroke and its subtypes (large artery, cardioembolic, and small vessel), and hemorrhagic stroke and its subtypes (intracerebral and subarachnoid).
Methods: Whole-genome sequencing data were available for 6833 stroke cases and 27 116 controls, including 22 315 European, 7877 Black, 2616 Hispanic/Latino, 850 Asian, 54 Native American, and 237 other ancestry participants. In TOPMed, we performed single variant association analysis examining 40 million common variants and aggregated association analysis focusing on rare variants. We also combined TOPMed European populations with over 28 000 additional European participants from the UK Biobank genome-wide array data through meta-analysis.
Results: In the single variant association analysis in TOPMed, we identified one novel locus, 13q33, for large artery stroke at whole-genome-wide significance (P<5.00×10⁻⁹) and 4 novel loci at genome-wide significance (P<5.00×10⁻⁸), all of which need confirmation in independent studies. Lead variants in all 5 loci are low-frequency but are more common in non-European populations. An aggregation of synonymous rare variants within the gene C6orf26 demonstrated suggestive evidence of association for hemorrhagic stroke (P<3.11×10⁻⁶). By meta-analyzing European ancestry samples in TOPMed and UK Biobank, we replicated several previously reported stroke loci including PITX2, HDAC9, ZFHX3, and LRCH1.
Conclusions: We present the first association analysis for stroke and its subtypes using whole-genome sequencing data from ancestrally diverse populations. While our findings suggest the potential benefits of combining whole-genome sequencing data with populations of diverse genetic backgrounds to identify possible low-frequency or ancestry-specific variants, they also highlight the need to increase genome coverage and sample sizes.
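As a quick consistency check on the figures quoted in this abstract, the sketch below tallies the ancestry counts against the case and control totals and labels a P value by the two significance thresholds mentioned in the Results. The helper name `significance_tier` and the variable names are illustrative assumptions, not part of the study's analysis code.

```python
# Counts copied from the Methods sentence of the abstract.
ancestry_counts = {
    "European": 22315,
    "Black": 7877,
    "Hispanic/Latino": 2616,
    "Asian": 850,
    "Native American": 54,
    "Other": 237,
}
cases, controls = 6833, 27116

# Thresholds quoted in the Results sentence.
WHOLE_GENOME_WIDE = 5.00e-9
GENOME_WIDE = 5.00e-8


def significance_tier(p_value: float) -> str:
    """Label a P value by the strictest quoted threshold it passes.

    Hypothetical helper for illustration only.
    """
    if p_value < WHOLE_GENOME_WIDE:
        return "whole-genome-wide"
    if p_value < GENOME_WIDE:
        return "genome-wide"
    return "not significant"


total_by_ancestry = sum(ancestry_counts.values())
total_by_status = cases + controls
non_european_fraction = 1 - ancestry_counts["European"] / total_by_ancestry

print(total_by_ancestry, total_by_status)
print(f"non-European share: {non_european_fraction:.1%}")
print(significance_tier(2e-8))
```

Both totals come to the same number of participants, and roughly a third of the sample is of non-European ancestry under these counts.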

2020 ◽  
Vol 2020 ◽  
pp. 1-9
Author(s):  
Nemat Hedayat-Evrigh ◽  
Reza Khalkhali-Evrigh ◽  
Mohammad Reza Bakhtiarizadeh

The population of Bactrian camels is smaller than that of dromedaries; they are distributed in cold and mountainous regions and are at risk of extinction in some countries, such as Iran. To identify and investigate genome-wide variations, whole-genome sequencing of two Iranian Bactrian camels was performed for the first time, with 37.4- and 42.6-fold coverage. Along with the Iranian Bactrian camels, sequencing data from two Mongolian domestic and two wild Bactrian camels deposited in the NCBI were reanalyzed. The analysis identified 4,908,998, 4,485,725, and 4,706,654 SNPs for Iranian, Mongolian domestic, and wild Bactrian camels, respectively. INDEL variations ranged from 358,311 to 533,188 across the six camels. Variant annotation in all samples revealed that more than 88 percent of SNPs and INDELs were located in intergenic and intronic regions. We found that 800,530 SNPs were common to all studied camels, including 4,046 missense variants that affected 2,428 genes. Among the genes carrying these shared missense SNPs were 98 zinc finger genes and 4 fertility-related genes (ZP1, ZP2, ZP4, and ZPBP).


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Robert P. Adelson ◽  
Alan E. Renton ◽  
Wentian Li ◽  
Nir Barzilai ◽  
Gil Atzmon ◽  
...  

The success of next-generation sequencing depends on the accuracy of variant calls. Few objective protocols exist for QC following variant calling from whole genome sequencing (WGS) data. After applying QC filtering based on Genome Analysis Toolkit (GATK) best practices, we used genotype discordance of eight samples that were sequenced twice each to evaluate the proportion of potentially inaccurate variant calls. We designed a QC pipeline involving hard filters to improve replicate genotype concordance, which indicates improved accuracy of genotype calls. Our pipeline analyzes the efficacy of each filtering step. We initially applied this strategy to well-characterized variants from the ClinVar database, and subsequently to the full WGS dataset. The genome-wide biallelic pipeline removed 82.11% of discordant and 14.89% of concordant genotypes, and improved the concordance rate from 98.53% to 99.69%. The variant-level read depth filter most improved the genome-wide biallelic concordance rate. We also adapted this pipeline for triallelic sites, given the increasing proportion of multiallelic sites as sample sizes increase. For triallelic sites containing only SNVs, the concordance rate improved from 97.68% to 99.80%. Our QC pipeline removes many potentially false positive calls that pass in GATK, and may inform future WGS studies prior to variant effect analysis.
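The concordance figures in this abstract can be checked with a little arithmetic. The sketch below is not the authors' pipeline: it assumes the quoted percentages are fractions of discordant and concordant replicate genotypes removed by the filters, and that genotypes missing in either replicate are excluded from the comparison. The function names are hypothetical.

```python
def replicate_concordance(calls_a, calls_b):
    """Fraction of sites where two replicate runs give the same genotype.

    Sites missing (None) in either replicate are skipped; this handling
    is an assumption, not something stated in the abstract.
    """
    pairs = [(a, b) for a, b in zip(calls_a, calls_b)
             if a is not None and b is not None]
    return sum(a == b for a, b in pairs) / len(pairs)


def concordance_after_filter(rate_before, frac_discordant_removed,
                             frac_concordant_removed):
    """Concordance rate once a filter removes the given fractions."""
    concordant_kept = rate_before * (1 - frac_concordant_removed)
    discordant_kept = (1 - rate_before) * (1 - frac_discordant_removed)
    return concordant_kept / (concordant_kept + discordant_kept)


# Figures quoted for the genome-wide biallelic pipeline.
rate_after = concordance_after_filter(0.9853, 0.8211, 0.1489)
print(f"{rate_after:.2%}")

# Toy replicate comparison with made-up genotypes.
toy = replicate_concordance(["0/1", "1/1", "0/0", None],
                            ["0/1", "0/1", "0/0", "0/0"])
print(f"{toy:.3f}")
```

Under these assumptions the reported removal fractions reproduce the stated improvement from 98.53% to 99.69%, which shows why a filter that discards far more discordant than concordant genotypes raises the concordance rate even while shrinking the call set.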

