Measuring genetic variation in the multi-ethnic Million Veteran Program (MVP)

Abstract The Million Veteran Program (MVP), initiated by the Department of Veterans Affairs (VA), aims to collect consented biosamples from at least one million Veterans. Presently, blood samples have been collected from over 800,000 enrolled participants. The size and diversity of the MVP cohort, together with the availability of extensive VA electronic health records, make it a promising resource for precision medicine. MVP is conducting array-based genotyping to provide a genome-wide scan of the entire cohort, in parallel with whole-genome sequencing, methylation, and other omics assays. Here, we present the design and performance of the MVP 1.0 custom Axiom® array, which was designed and developed as a single assay for use across the multi-ethnic MVP cohort. A unified genetic quality control (QC) analysis was developed and conducted on an initial tranche of 485,856 individuals, yielding a high-quality dataset of 459,777 unique individuals. In total, 668,418 genetic markers passed QC and showed high-quality genotypes for both common and rare variants. We confirmed the substantial ancestral diversity of MVP, with nearly 30% non-European individuals, surpassing other large biobanks. We also demonstrated the quality of the MVP dataset by replicating established genetic associations with height in individuals of European American and African American ancestry. This dataset has been made available to approved MVP researchers for genome-wide association studies and other downstream analyses. Further data releases will become available for analysis as recruitment at the VA continues and the cohort expands in both size and diversity.
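
The abstract does not list the QC criteria that reduced 485,856 individuals to 459,777. As a generic illustration only, the sketch below applies sample and marker call-rate filters to a small genotype matrix. The 0/1/2 dosage coding, the use of `None` for a missing call, the function names and the thresholds are all assumptions, not the MVP pipeline.

```python
from typing import List, Optional, Tuple

# Hypothetical coding: 0/1/2 = alternate-allele dosage, None = missing call.
Genotype = Optional[int]


def call_rate(calls: List[Genotype]) -> float:
    """Fraction of calls that are not missing (0.0 for an empty list)."""
    if not calls:
        return 0.0
    return sum(c is not None for c in calls) / len(calls)


def call_rate_qc(
    geno: List[List[Genotype]],  # rows = samples, columns = markers
    sample_min: float,
    marker_min: float,
) -> Tuple[List[int], List[int]]:
    """Return indices of samples and markers passing call-rate filters.

    Samples are filtered first; marker call rates are then computed on the
    retained samples only. Thresholds are illustrative, not MVP's values.
    """
    kept_samples = [i for i, row in enumerate(geno) if call_rate(row) >= sample_min]
    n_markers = len(geno[0]) if geno else 0
    kept_markers = [
        j
        for j in range(n_markers)
        if call_rate([geno[i][j] for i in kept_samples]) >= marker_min
    ]
    return kept_samples, kept_markers


geno = [
    [0, 1, 2],
    [1, None, 2],
    [2, 1, 0],
    [0, 1, None],
    [None, None, 1],  # low call rate: removed by the sample filter
]
print(call_rate_qc(geno, sample_min=0.6, marker_min=0.8))  # ([0, 1, 2, 3], [0])
```

Filtering samples before markers keeps a few very poorly genotyped samples from lowering every marker's call rate. Real pipelines typically add heterozygosity, sex, relatedness and duplicate checks (the last is what "unique individuals" implies), none of which are shown here.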

Insights into the genetic basis of retinal detachment

Human Molecular Genetics, 2019, Vol 29 (4), pp. 689-702, doi:10.1093/hmg/ddz294

Author(s): Thibaud S Boutin, David G Charteris, Aman Chandra, Susan Campbell, Caroline Hayward, ...

Keyword(s): Retinal Detachment, Association Studies, Genetic Correlations, Self Report, Cataract Operation, Genome Wide Association Studies, UK Biobank, Genetic Associations, Data Set, Genome Wide

Abstract Retinal detachment (RD) is a serious and common condition, but genetic studies to date have been hampered by the small size of the assembled cohorts. In the UK Biobank data set, where RD was ascertained by self-report or hospital records, the genetic correlations of RD with high myopia and with cataract operation were 0.46 (SE = 0.08) and 0.44 (SE = 0.07), respectively. These correlations are consistent with known epidemiological associations. Through meta-analysis of genome-wide association studies using UK Biobank RD cases (N = 3,977) and two cohorts, each comprising ~1,000 clinically ascertained rhegmatogenous RD patients, we uncovered 11 genome-wide significant association signals. These lie near or within ZC3H11B, BMP3, COL22A1, DLG5, PLCE1, EFEMP2, TYR, FAT3, TRIM29, COL2A1 and LOXL1. Replication in the 23andMe data set, where RD is self-reported by participants, firmly establishes six RD risk loci: FAT3, COL22A1, TYR, BMP3, ZC3H11B and PLCE1. Based on the genetic associations with eye traits described to date, the first two specifically affect the risk of RD, whereas the last four point to aetiologies shared with macular conditions, myopia and glaucoma. Fine-mapping prioritized the lead common missense variant (TYR S192Y) as the causal variant at the TYR locus and a small set of credible causal variants at the FAT3 locus. The larger study size presented here, enabled by resources linked to health records or self-report, provides novel insights into RD aetiology and the underlying pathological pathways.
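
The abstract does not state which meta-analysis model was used. A common choice for combining per-cohort GWAS summary statistics is the fixed-effect inverse-variance model, sketched below for a single variant. The function name, the example effect sizes and the conventional genome-wide threshold of 5 × 10⁻⁸ are assumptions for illustration.

```python
import math
from typing import List, Tuple

GENOME_WIDE_P = 5e-8  # conventional genome-wide significance threshold (assumed)


def ivw_meta(studies: List[Tuple[float, float]]) -> Tuple[float, float, float, float]:
    """Fixed-effect inverse-variance meta-analysis of per-study (beta, se) pairs.

    Returns the pooled beta, its standard error, the z-score and a
    two-sided normal p-value.
    """
    weights = [1.0 / (se * se) for _, se in studies]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, studies)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return beta, se, z, p


# Hypothetical log-odds ratios and standard errors for one variant in three cohorts.
beta, se, z, p = ivw_meta([(0.12, 0.03), (0.15, 0.06), (0.10, 0.05)])
print(f"beta={beta:.4f} se={se:.4f} p={p:.2e} genome-wide={p < GENOME_WIDE_P}")
```

Each cohort is weighted by the inverse of its squared standard error, so the larger, more precisely estimated cohort dominates the pooled effect.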

Impact of Pre and Post Variant Filtration Strategies on Imputation

2020, doi:10.21203/rs.3.rs-128366/v1

Author(s): Celine Charon, Rodrigue Allodji, Vincent Meyer, Jean-François Deleuze

Keyword(s): Quality Control, Rare Variants, Association Studies, Genome Wide Association Studies, Nucleotide Polymorphisms, Direct Effects, Single Nucleotide Variants, Single Nucleotide, Genome Wide, Conservative Post

Abstract Quality control (QC) methods for genome-wide association studies and fine mapping are commonly applied in imputation workflows; however, they result in the loss of many single nucleotide polymorphisms (SNPs). To investigate the consequences of filtration on imputation, we studied its direct effects on the number of markers, their allele frequencies, imputation quality scores and post-filtration events. We pre-phased 1,031 genotyped individuals of diverse ethnicities and compared the imputed variants with those of 1,089 individuals recorded at NCBI for additional validation. Without QC-based variant pre-filtration, we observed no impairment in the imputation of SNPs that failed QC, whereas pre-filtration led to an overall loss of information. Significant differences between allele frequencies with and without pre-filtration were found only for very rare (5E-04 to 1E-03) and rare (1E-03 to 5E-03) variants (p < 1E-04). Raising the post-filtration imputation quality score threshold from 0.3 to 0.8 reduced the number of single nucleotide variants (SNVs) with frequency < 0.001 by 2.5-fold, with or without QC pre-filtration, and halved the number of very rare variants (5E-04). Therefore, to retain both confidence and a sufficient number of SNVs, we propose a 2-step post-filtration approach that increases the number of very rare and rare variants compared with conservative post-filtration methods.
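
The abstract does not spell out the proposed 2-step procedure, so the sketch below only illustrates the underlying measurement: how many variants in each frequency bin survive a post-imputation quality-score threshold of 0.3 versus 0.8. The bin edges follow the abstract but are treated as half-open intervals (an assumption); the quality score stands in for whatever metric the imputation software reports (for example an INFO or R² value), and the variant list is made up.

```python
from typing import Dict, List, Tuple

# Frequency bins from the abstract, treated as half-open intervals [low, high).
BINS: Dict[str, Tuple[float, float]] = {
    "very_rare": (5e-4, 1e-3),
    "rare": (1e-3, 5e-3),
}


def count_retained(variants: List[Tuple[float, float]], threshold: float) -> Dict[str, int]:
    """Count variants per frequency bin whose imputation quality score is >= threshold.

    Each variant is a (minor allele frequency, imputation quality score) pair.
    Variants outside every bin are ignored.
    """
    counts = {name: 0 for name in BINS}
    for maf, score in variants:
        if score < threshold:
            continue
        for name, (low, high) in BINS.items():
            if low <= maf < high:
                counts[name] += 1
    return counts


# Made-up imputed variants: (MAF, quality score).
variants = [(6e-4, 0.5), (7e-4, 0.9), (2e-3, 0.4), (3e-3, 0.85), (0.2, 0.95)]
for threshold in (0.3, 0.8):
    print(threshold, count_retained(variants, threshold))
```

Comparing the two printed dictionaries shows, per bin, how many low-frequency variants a stricter threshold discards, which is the trade-off the proposed post-filtration approach is meant to balance.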

Quality Control for Genome-Wide Association Studies

Methods in Molecular Biology - Genome-Wide Association Studies and Genomic Prediction, 2013, pp. 129-147, doi:10.1007/978-1-62703-447-0_5

Author(s): Cedric Gondro, Seung Hwan Lee, Hak Kyo Lee, Laercio R. Porto-Neto

Keyword(s): Quality Control, Association Studies, Genome Wide Association, Genome Wide Association Studies, Genome Wide

GWASTools: an R/Bioconductor package for quality control and analysis of genome-wide association studies

Bioinformatics, 2012, Vol 28 (24), pp. 3329-3331, doi:10.1093/bioinformatics/bts610

Author(s): S. M. Gogarten, T. Bhangale, M. P. Conomos, C. A. Laurie, C. P. McHugh, ...

Keyword(s): Quality Control, Association Studies, Genome Wide Association, Genome Wide Association Studies, Bioconductor Package, Genome Wide

Genome-wide association study of agronomic traits in bread wheat reveals novel putative alleles for future breeding programs

BMC Plant Biology, 2019, Vol 19 (1), doi:10.1186/s12870-019-2165-4

Author(s): Yousef Rahimi, Mohammad Reza Bihamta, Alireza Taleei, Hadi Alipour, Pär K. Ingvarsson

Keyword(s): Bread Wheat, Genome Wide Association Study, Agronomic Traits, Association Studies, Genome Wide Association, Genome Wide Association Studies, Wheat Varieties, Data Set, Protein Coding, Genome Wide

Abstract Background Identification of loci for agronomic traits and characterization of their genetic architecture are crucial in marker-assisted selection (MAS). Genome-wide association studies (GWAS) have increasingly been used as powerful tools for identifying marker-trait associations (MTAs). Introducing new adaptive alleles into diverse genetic backgrounds may help improve the grain yield of old or newly developed wheat varieties and so help balance supply and demand worldwide. Landraces collected from different climate zones can be an invaluable resource for such adaptive alleles. Results GWAS was performed on a collection of 298 Iranian bread wheat varieties and landraces to explore the genetic basis of agronomic traits during the 2016–2018 cropping seasons under normal (well-watered) and stressed (rain-fed) conditions. A high-quality genotyping-by-sequencing (GBS) dataset based on the W7984 reference genome was obtained, comprising either the 10,938 original single nucleotide polymorphisms (SNPs) or 46,862 SNPs after additional imputation. The results confirm that the B genome carries the highest number of significant marker pairs in both varieties (49,880; 27.37%) and landraces (55,086; 28.99%). The strongest linkage disequilibrium (LD) between pairs of markers was observed on chromosome 2D (0.296). LD decay was lower in the D genome than in the A and B genomes. Association mapping under the two tested environments yielded a total of 313 and 394 significant (−log10 P > 3) MTAs for the original and imputed SNP data sets, respectively. Gene ontology results showed that 27% and 27.5% of the MTAs from the original SNP set were located in protein-coding regions under well-watered and rain-fed conditions, respectively, whereas for the imputed data set 22.6% and 16.6% of MTAs, respectively, were located in protein-coding genes.
Conclusions Our findings suggest that Iranian bread wheat landraces harbor valuable alleles that are adaptive under drought stress conditions. MTAs located within coding genes can be utilized in genome-based breeding of new wheat varieties. Although imputation of missing data increased the number of MTAs, the fraction of these MTAs located in coding genes decreased across the different sub-genomes.
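
The abstract reports LD between marker pairs and calls MTAs at −log10 P > 3. The sketch below shows one common way to estimate pairwise LD from unphased genotypes, the squared Pearson correlation of allele dosages, together with that significance cutoff. The abstract does not say which LD estimator the authors used; the 0/1/2 coding, the example lines and the function names are assumptions.

```python
import math
from typing import List


def genotype_r2(a: List[int], b: List[int]) -> float:
    """Squared Pearson correlation between two markers' allele dosages (0/1/2).

    A common estimate of LD r^2 from unphased genotypes. Returns 0.0 when
    either marker is monomorphic in the sample (r^2 is undefined there).
    """
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov * cov / (var_a * var_b)


def is_significant(p_value: float, cutoff: float = 3.0) -> bool:
    """Apply the -log10(P) > 3 rule the abstract uses to call an MTA."""
    return -math.log10(p_value) > cutoff


# Hypothetical inbred lines: mostly homozygous, so dosages are 0 or 2.
marker_1 = [0, 0, 2, 2, 2, 0]
marker_2 = [0, 0, 2, 2, 0, 0]
print(round(genotype_r2(marker_1, marker_2), 3), is_significant(2e-4))
```

Largely homozygous wheat lines yield dosages of mostly 0 or 2, which this estimator handles without modification.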

Genetic Associations With Serum Total IgE: A Meta-Analysis Of Genome-Wide Association Studies From North American Population Groups

2012, doi:10.1164/ajrccm-conference.2012.185.1_meetingabstracts.a4893

Author(s): Albert M. Levin, Rasika A. Mathias, Lili Huang, Mao Yang, Kathleen C. Barnes, ...

Keyword(s): Association Studies, Meta Analysis, Genome Wide Association, North American Population, Genome Wide Association Studies, American Population, Genetic Associations, Population Groups, Genome Wide, Serum Total IgE

Genetic associations with radiological damage in rheumatoid arthritis: Meta-analysis of seven genome-wide association studies of 2,775 cases

PLoS ONE, 2019, Vol 14 (10), pp. e0223246, doi:10.1371/journal.pone.0223246

Author(s): Matthew Traylor, Rachel Knevel, Jing Cui, John Taylor, Harm-Jan Westra, ...

Keyword(s): Rheumatoid Arthritis, Association Studies, Meta Analysis, Genome Wide Association, Genome Wide Association Studies, Genetic Associations, Genome Wide, Radiological Damage

Animal-ImputeDB: a comprehensive database with multiple animal reference panels for genotype imputation

Nucleic Acids Research, 2019, Vol 48 (D1), pp. D659-D667, doi:10.1093/nar/gkz854

Author(s): Wenqian Yang, Yanbo Yang, Cecheng Zhao, Kun Yang, Dongyang Wang, ...

Keyword(s): Large Scale, Association Studies, Genotype Imputation, Genome Wide Association Studies, Nucleotide Polymorphisms, High Quality, Single Nucleotide, Genome Wide, Whole Genome Resequencing, Missing Genotypes

Abstract Animal-ImputeDB (http://gong_lab.hzau.edu.cn/Animal_ImputeDB/) is a public database with genomic reference panels of 13 animal species for online genotype imputation, genetic variant search, and free download. Genotype imputation is the process of estimating missing genotypes from the haplotypes and genotypes in a reference panel. It can effectively increase the density of single nucleotide polymorphisms (SNPs) and can therefore be widely used in large-scale genome-wide association studies (GWASs) based on relatively inexpensive, low-density SNP arrays. However, most non-human animals lack high-quality reference panels, which greatly limits the application of genotype imputation in animals. To overcome this limitation, we developed Animal-ImputeDB, which is dedicated to collecting genotype data and whole-genome resequencing data of non-human animals from various studies and databases. A computational pipeline was developed to process different types of raw data and construct reference panels. In total, 13 high-quality reference panels comprising ∼400 million SNPs from 2,265 samples were constructed. Animal-ImputeDB provides an easy-to-use online tool that integrates two popular imputation programs. Collectively, Animal-ImputeDB serves as an important resource for animal genotype imputation and will greatly facilitate research on animal genomic selection and genetic improvement.
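
Widely used imputation programs (for example Beagle) model the reference haplotypes with hidden Markov models. The toy sketch below only conveys the basic idea described in the abstract: missing alleles are filled in from the reference-panel haplotype that best matches the target at its observed sites. It is a simplified nearest-haplotype method, not the algorithm of either tool offered by Animal-ImputeDB, and the 0/1 allele coding and function name are assumptions.

```python
from typing import List, Optional

Haplotype = List[int]  # 0 = reference allele, 1 = alternate allele


def impute_nearest(target: List[Optional[int]], panel: List[Haplotype]) -> Haplotype:
    """Fill missing alleles (None) by copying from the closest reference haplotype.

    "Closest" means fewest mismatches at the sites observed in the target;
    ties are broken by panel order.
    """
    observed = [i for i, allele in enumerate(target) if allele is not None]

    def mismatches(ref: Haplotype) -> int:
        return sum(ref[i] != target[i] for i in observed)

    best = min(panel, key=mismatches)
    return [best[i] if allele is None else allele for i, allele in enumerate(target)]


panel = [
    [0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [1, 0, 1, 0, 1],
]
target = [1, None, 0, None, 1]  # None = site not typed on a low-density array
print(impute_nearest(target, panel))  # [1, 1, 0, 1, 1]
```

Observed alleles are never overwritten; only the untyped sites are filled, which is how imputation raises marker density without altering the directly genotyped calls.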

A New Diversity Panel for Winter Rapeseed (Brassica napus L.) Genome-Wide Association Studies

Agronomy, 2020, Vol 10 (12), pp. 2006, doi:10.3390/agronomy10122006

Author(s): David P. Horvath, Michael Stamm, Zahirul I. Talukder, Jason Fiedler, Aidan P. Horvath, ...

Keyword(s): Linkage Disequilibrium, Brassica napus, Association Studies, Decay Rates, Genome Wide Association, Genome Wide Association Studies, High Quality, Brassica napus L., Genome Wide, Quality Markers

A diverse population (429 members) of canola (Brassica napus L.), consisting primarily of winter biotypes, was assembled and used in genome-wide association studies. Genotyping-by-sequencing analysis of the population identified and mapped 290,972 high-quality markers, with missing data per line ranging from 18.5 to 82.4% (average 36.8%). After interpolation, 251,575 high-quality markers remained. After filtering out markers with low minor allele counts (retaining those with count > 5), 190,375 markers were left. The average distance between these markers is 4,463 bases, with a median of 69 and a range of 1 to 281,248 bases. Heterozygosity in the imputed population ranges from 0.9 to 11.0%, with an average of 5.4%. The filtered and imputed dataset was used to determine population structure and kinship, which indicated that the population had minimal structure, with a best K value of 2–3. These results also indicated that most of the population derives substantial sequence from a single source population, with sub-clusters of, and admixtures with, a very small number of other populations. Chromosomal linkage disequilibrium decay ranged from ~7 kb for chromosome A01 to ~68 kb for chromosome C01. Local linkage disequilibrium decay rates, determined for all 500 kb windows with a 10 kb sliding step, varied widely, indicating numerous crossover hotspots within this population; these rates also provide a resource for determining the likely limits of linkage disequilibrium around any given marker within which to search for candidate genes. This population and the resources provided here should serve as helpful tools for investigating genetics in winter canola.
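
The minor-allele-count filter and the marker-spacing statistics in the abstract can be illustrated in outline as follows. This is a sketch under stated assumptions: diploid genotypes coded as 0/1/2 allele dosages, positions from a single chromosome (in practice gaps would be computed per chromosome), and the "count > 5" rule taken from the abstract. The function names and example records are hypothetical.

```python
import statistics
from typing import List, Tuple


def minor_allele_count(dosages: List[int]) -> int:
    """Minor allele count for a biallelic diploid marker coded 0/1/2."""
    alt = sum(dosages)
    ref = 2 * len(dosages) - alt
    return min(alt, ref)


def filter_by_mac(markers: List[Tuple[int, List[int]]], min_count: int = 5) -> List[int]:
    """Return positions of markers whose minor allele count is > min_count."""
    return [pos for pos, dosages in markers if minor_allele_count(dosages) > min_count]


def spacing_summary(positions: List[int]) -> Tuple[float, float, int, int]:
    """Mean, median, minimum and maximum gap (bp) between consecutive markers.

    Assumes all positions lie on one chromosome and at least two are given.
    """
    ordered = sorted(positions)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return statistics.mean(gaps), statistics.median(gaps), min(gaps), max(gaps)


# Made-up (position, dosages) records on one chromosome.
markers = [
    (100, [0, 0, 0, 1]),           # MAC 1: removed
    (150, [2, 2, 1, 1, 0, 0]),     # MAC 6: kept
    (400, [1, 1, 1, 1, 1, 1, 1]),  # MAC 7: kept
    (1000, [2, 2, 2, 2]),          # monomorphic, MAC 0: removed
]
kept = filter_by_mac(markers)
print(kept)
print(spacing_summary([1, 4, 10, 20]))
```

Reporting the median alongside the mean matters here: a mean gap of 4,463 bases next to a median of 69 indicates that markers are tightly clustered, with a few very large gaps pulling the mean up.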

Genome-wide association studies: quality control and population-based measures

Genetic Epidemiology, 2009, Vol 33 (S1), pp. S45-S50, doi:10.1002/gepi.20472

Author(s): Andreas Ziegler

Keyword(s): Quality Control, Association Studies, Population Based, Genome Wide Association, Genome Wide Association Studies, Genome Wide
