Novel Design of Imputation-Enabled SNP Arrays for Breeding and Research Applications Supporting Multi-Species Hybridization

Array-based single nucleotide polymorphism (SNP) genotyping platforms have low genotype error and missing data rates compared to genotyping-by-sequencing technologies. However, design decisions used to create array-based SNP genotyping assays for both research and breeding applications are critical to their success. We describe a novel approach applicable to any animal or plant species for the design of cost-effective imputation-enabled SNP genotyping arrays with broad utility and demonstrate its application through the development of the Illumina Infinium Wheat Barley 40K SNP array Version 1.0. We show that the approach delivers high quality and high resolution data for wheat and barley, including when samples are jointly hybridised. The new array aims to maximally capture haplotypic diversity in globally diverse wheat and barley germplasm while minimizing ascertainment bias. Comprising mostly biallelic markers that were designed to be species-specific and single-copy, the array permits highly accurate imputation in diverse germplasm to improve the statistical power of genome-wide association studies (GWAS) and genomic selection. The SNP content captures tetraploid wheat (A- and B-genome) and Aegilops tauschii Coss. (D-genome) diversity and delineates synthetic and tetraploid wheat from other wheat, as well as tetraploid species and subgroups. The content includes SNPs tagging key trait loci in wheat and barley, as well as direct connections to other genotyping platforms and legacy datasets. The utility of the array is enhanced through the web-based tool Pretzel (https://plantinformatics.io/), which enables the content of the array to be visualized and interrogated interactively in the context of numerous genetic and genomic resources, connecting research and breeding more seamlessly. The array is available for use by the international wheat and barley community.

Novel design of imputation-enabled SNP arrays for breeding and research applications supporting multi-species hybridisation

10.1101/2021.08.03.454059 ◽

2021 ◽

Author(s):

Gabriel Keeble-Gagnère ◽

Raj Pasam ◽

Kerrie L Forrest ◽

Debbie Wong ◽

Hannah Robinson ◽

...

Keyword(s):

Statistical Power ◽

SNP Array ◽

Tetraploid Wheat ◽

Genotyping By Sequencing ◽

Cost Effective ◽

Single Copy ◽

SNP Genotyping ◽

D Genome ◽

High Resolution Data ◽

Haplotypic Diversity

Array-based SNP genotyping platforms have low genotype error and missing data rates compared to genotyping-by-sequencing technologies. However, design decisions used to create array-based SNP genotyping assays for both research and breeding applications are critical to their success. We describe a novel approach applicable to any animal or plant species for the design of cost-effective imputation-enabled SNP genotyping arrays with broad utility and demonstrate its application through the development of the Infinium Wheat Barley 40K SNP array. We show the approach delivers high-quality and high-resolution data for wheat and barley, including when samples are jointly hybridised. The new array aims to maximally capture haplotypic diversity in globally diverse wheat and barley germplasm while minimising ascertainment bias. Comprising mostly biallelic markers designed to be species-specific and single-copy, it permits highly accurate imputation in diverse germplasm to improve statistical power for GWAS and genomic selection. The SNP content captures tetraploid wheat (A- and B-genome) and Ae. tauschii (D-genome) diversity and delineates synthetic and tetraploid wheat from other wheats, as well as tetraploid species and subgroups. The content includes SNPs that tag key trait loci in wheat and barley and SNPs that directly connect to other genotyping platforms and legacy datasets. The utility of the array is enhanced through the web-based tool Pretzel (https://plantinformatics.io/), which enables the array's content to be visualised and interrogated interactively in the context of numerous genetic and genomic resources to more seamlessly connect research and breeding. The array is available for use by the international wheat and barley community.

Association Analysis of Candidate Variants in Admixed Brazilian Patients With Genetic Generalized Epilepsies

Frontiers in Genetics ◽

10.3389/fgene.2021.672304 ◽

2021 ◽

Vol 12 ◽

Author(s):

Felipe S. Kaibara ◽

Tânia K. de Araujo ◽

Patricia A. O. R. A. Araujo ◽

Marina K. M. Alvim ◽

Clarissa L. Yasuda ◽

...

Keyword(s):

Native Americans ◽

Statistical Power ◽

Association Studies ◽

SNP Array ◽

Absence Epilepsy ◽

Candidate SNPs ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Clonic Seizures ◽

Candidate Regions

Genetic generalized epilepsies (GGEs) include well-established epilepsy syndromes with generalized onset seizures: childhood absence epilepsy, juvenile myoclonic epilepsy (JME), juvenile absence epilepsy (JAE), myoclonic absence epilepsy, epilepsy with eyelid myoclonia (Jeavons syndrome), generalized tonic–clonic seizures, and generalized tonic–clonic seizures alone. Genome-wide association studies (GWASs) and exome sequencing have identified 48 single-nucleotide polymorphisms (SNPs) associated with GGE. However, these studies were mainly based on non-admixed, European, and Asian populations. Thus, it remains unclear whether these results apply to patients of other origins. This study aims to evaluate whether these previous results could be replicated in a cohort of admixed Brazilian patients with GGE. We obtained SNP-array data from 87 patients with GGE, compared with 340 controls from the BIPMed public dataset. We could directly access genotypes of 17 candidate SNPs, available in the SNP array, and the remaining 31 SNPs were imputed using the BEAGLE v5.1 software. We performed an association test by logistic regression analysis, including the first five principal components as covariates. Furthermore, to expand the analysis of the candidate regions, we also interrogated 14,047 SNPs that flank the candidate SNPs (1 Mb). The statistical power was evaluated in terms of odds ratio and minor allele frequency (MAF) by the genpwr package. Differences in SNP frequencies between Brazilians and Europeans, sub-Saharan Africans, and Native Americans were evaluated by a two-proportion Z-test. We identified nine flanking SNPs, located in eight candidate regions, which presented association signals that passed the Bonferroni correction (rs12726617; rs9428842; rs1915992; rs1464634; rs6459526; rs2510087; rs9551042; rs9888879; and rs8133217; p-values <3.55e-06). In addition, the two-proportion Z-test indicates that the lack of association of the remaining candidate SNPs could be due to different genomic backgrounds observed in admixed Brazilians. This is the first time that candidate SNPs for GGE are analyzed in an admixed Brazilian population, and we could successfully replicate the association signals in eight candidate regions. In addition, our results provide new insights on how we can account for population structure to improve risk stratification estimation in admixed individuals.
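
The two generic procedures named in this abstract, a Bonferroni threshold and a two-proportion Z-test, can be illustrated with a short sketch. It is not the authors' analysis code. The family-wise alpha of 0.05 and the use of the 14,047 flanking SNPs as the number of tests are assumptions (together they give a threshold close to the reported cutoff), and the function names are hypothetical.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided pooled Z-test for a difference between two proportions.

    x1, x2 are counts of the allele of interest; n1, n2 are the total
    numbers of alleles observed in each population.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("pooled proportion is 0 or 1; the test is undefined")
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Assumed inputs: alpha = 0.05 and 14,047 tests.
threshold = bonferroni_threshold(0.05, 14047)

# Made-up allele counts, for illustration only.
z, p = two_proportion_z_test(60, 100, 40, 100)
```

A frequency difference between two populations would be called significant here only if `p` fell below whatever threshold the analyst chose for that comparison.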

Common genetic variants with fetal effects on birth weight are enriched for proximity to genes implicated in rare developmental disorders

Human Molecular Genetics ◽

10.1093/hmg/ddab060 ◽

2021 ◽

Author(s):

Robin N Beaumont ◽

Isabelle K Mayne ◽

Rachel M Freathy ◽

Caroline F Wright

Keyword(s):

Birth Weight ◽

Statistical Power ◽

Developmental Disorders ◽

Association Studies ◽

Later Life ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Genome Wide ◽

Common Genetic Variants ◽

Causal Genes

Birth weight is an important factor in newborn survival; both low and high birth weights are associated with adverse later-life health outcomes. Genome-wide association studies (GWAS) have identified 190 loci associated with maternal or fetal effects on birth weight. Knowledge of the underlying causal genes is crucial to understand how these loci influence birth weight and the links between infant and adult morbidity. Numerous monogenic developmental syndromes are associated with birth weights at the extreme ends of the distribution. Genes implicated in those syndromes may provide valuable information to prioritize candidate genes at the GWAS loci. We examined the proximity of genes implicated in developmental disorders (DDs) to birth weight GWAS loci using simulations to test whether they fall disproportionately close to the GWAS loci. We found birth weight GWAS single nucleotide polymorphisms (SNPs) fall closer to such genes than expected both when the DD gene is the nearest gene to the birth weight SNP and also when examining all genes within 258 kb of the SNP. This enrichment was driven by genes causing monogenic DDs with dominant modes of inheritance. We found examples of SNPs in the intron of one gene marking plausible effects via different nearby genes, highlighting that the closest gene to the SNP is not necessarily the functionally relevant gene. This is the first application of this approach to birth weight, which has helped identify GWAS loci likely to have direct fetal effects on birth weight, which could not previously be classified as fetal or maternal owing to insufficient statistical power.

Statistical power and utility of meta-analysis methods for cross-phenotype genome-wide association studies

PLoS ONE ◽

10.1371/journal.pone.0193256 ◽

2018 ◽

Vol 13 (3) ◽

pp. e0193256 ◽

Cited By ~ 13

Author(s):

Zhaozhong Zhu ◽

Verneri Anttila ◽

Jordan W. Smoller ◽

Phil H. Lee

Keyword(s):

Statistical Power ◽

Association Studies ◽

Meta Analysis ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Analysis Methods ◽

Genome Wide

Fidelity of SNP Array Genotyping Using Epstein Barr Virus-Transformed B-Lymphocyte Cell Lines: Implications for Genome-Wide Association Studies

PLoS ONE ◽

10.1371/journal.pone.0006915 ◽

2009 ◽

Vol 4 (9) ◽

pp. e6915 ◽

Cited By ~ 25

Author(s):

Joshua T. Herbeck ◽

Geoffrey S. Gottlieb ◽

Kim Wong ◽

Roger Detels ◽

John P. Phair ◽

...

Keyword(s):

Epstein Barr Virus ◽

B Lymphocyte ◽

Association Studies ◽

SNP Array ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Barr Virus ◽

Genome Wide ◽

Epstein Barr ◽

Lymphocyte Cell

Statistical Power of Model Selection Strategies for Genome-Wide Association Studies

PLoS Genetics ◽

10.1371/journal.pgen.1000582 ◽

2009 ◽

Vol 5 (7) ◽

pp. e1000582 ◽

Cited By ~ 14

Author(s):

Zheyang Wu ◽

Hongyu Zhao

Keyword(s):

Model Selection ◽

Statistical Power ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Selection Strategies ◽

Genome Wide

Combining Multiple Hypothesis Testing with Machine Learning Increases the Statistical Power of Genome-wide Association Studies

Scientific Reports ◽

10.1038/srep36671 ◽

2016 ◽

Vol 6 (1) ◽

Cited By ~ 20

Author(s):

Bettina Mieth ◽

Marius Kloft ◽

Juan Antonio Rodríguez ◽

Sören Sonnenburg ◽

Robin Vobruba ◽

...

Keyword(s):

Machine Learning ◽

Hypothesis Testing ◽

Statistical Power ◽

Association Studies ◽

Multiple Hypothesis Testing ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Multiple Hypothesis ◽

Genome Wide

The harmonic mean p-value for combining dependent tests

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1814092116 ◽

2019 ◽

Vol 116 (4) ◽

pp. 1195-1200 ◽

Cited By ~ 43

Author(s):

Daniel J. Wilson

Keyword(s):

Multiple Testing ◽

Statistical Power ◽

Scientific Discovery ◽

Association Studies ◽

Harmonic Mean ◽

P Value ◽

Genome Wide Association Studies ◽

Familywise Error Rate ◽

Significance Threshold ◽

Genome Wide

Analysis of “big data” frequently involves statistical comparison of millions of competing hypotheses to discover hidden processes underlying observed patterns of data, for example, in the search for genetic determinants of disease in genome-wide association studies (GWAS). Controlling the familywise error rate (FWER) is considered the strongest protection against false positives but makes it difficult to reach the multiple testing-corrected significance threshold. Here, I introduce the harmonic mean p-value (HMP), which controls the FWER while greatly improving statistical power by combining dependent tests using the generalized central limit theorem. I show that the HMP effortlessly combines information to detect statistically significant signals among groups of individually nonsignificant hypotheses in examples of a human GWAS for neuroticism and a joint human–pathogen GWAS for hepatitis C viral load. The HMP simultaneously tests all ways to group hypotheses, allowing the smallest groups of hypotheses that retain significance to be sought. The power of the HMP to detect significant hypothesis groups is greater than the power of the Benjamini–Hochberg procedure to detect significant hypotheses, although the latter only controls the weaker false discovery rate (FDR). The HMP has broad implications for the analysis of large datasets, because it enhances the potential for scientific discovery.
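
The statistic named in this abstract, a weighted harmonic mean of p-values, can be sketched as follows. This is a minimal illustration rather than the author's implementation: the function name `harmonic_mean_p` and the equal default weights are assumptions, and the significance calculation that the paper derives from the generalized central limit theorem is not reproduced here, so the returned value is only the raw statistic.

```python
def harmonic_mean_p(p_values, weights=None):
    """Raw weighted harmonic mean of a group of p-values.

    Computes sum(w_i) / sum(w_i / p_i). With no weights supplied,
    every test receives the same weight (an assumption of this sketch).
    """
    if not p_values:
        raise ValueError("at least one p-value is required")
    if weights is None:
        weights = [1.0 / len(p_values)] * len(p_values)
    if len(weights) != len(p_values):
        raise ValueError("weights and p_values must have the same length")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in the interval (0, 1]")
    return sum(weights) / sum(w / p for w, p in zip(weights, p_values))


# Made-up p-values, for illustration only.
group = [0.5, 0.1]
combined = harmonic_mean_p(group)
```

Because each p-value enters through its reciprocal, the smallest p-values in a group dominate the result, and the harmonic mean never exceeds the arithmetic mean of the same values.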

Gene-Based Nonparametric Testing of Interactions Using Distance Correlation Coefficient in Case-Control Association Studies

Genes ◽

10.3390/genes9120608 ◽

2018 ◽

Vol 9 (12) ◽

pp. 608

Author(s):

Yingjie Guo ◽

Chenxi Wu ◽

Maozu Guo ◽

Xiaoyan Liu ◽

Alon Keinan

Keyword(s):

Correlation Coefficient ◽

Statistical Power ◽

Association Studies ◽

Gene Interaction ◽

P Value ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Real World Data ◽

Distance Correlation ◽

The Difference

Among the various statistical methods for identifying gene–gene interactions in qualitative genome-wide association studies (GWAS), gene-based methods have recently grown in popularity because they confer advantages in both statistical power and biological interpretability. However, most of these methods make strong assumptions about the form of the relationship between traits and single-nucleotide polymorphisms, which result in limited statistical power. In this paper, we propose a gene-based method based on the distance correlation coefficient called gene-based gene-gene interaction via distance correlation coefficient (GBDcor). The distance correlation (dCor) is a measurement of the dependency between two random vectors with arbitrary, and not necessarily equal, dimensions. We used the difference in dCor in case and control datasets as an indicator of gene–gene interaction, which was based on the assumption that the joint distribution of two genes in case subjects and in control subjects should not be significantly different if the two genes do not interact. We designed a permutation-based statistical test to evaluate the difference between dCor in cases and controls for a pair of genes, and we provided the p-value for the statistic to represent the significance of the interaction between the two genes. In experiments with both simulated and real-world data, our method outperformed previous approaches in detecting interactions accurately.
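
The sample distance correlation (dCor) that this abstract builds on can be sketched in plain Python as below. This is an illustrative implementation of the standard sample dCor between two sets of vectors, not the GBDcor method itself: the function names are hypothetical, the case-control difference and the permutation test are not reproduced, and `math.dist` requires Python 3.8 or later.

```python
import math


def _centered_distances(rows):
    """Pairwise Euclidean distance matrix of the rows, double-centered."""
    n = len(rows)
    d = [[math.dist(rows[i], rows[j]) for j in range(n)] for i in range(n)]
    # The matrix is symmetric, so row means equal column means.
    row_mean = [sum(r) / n for r in d]
    grand_mean = sum(row_mean) / n
    return [
        [d[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(x, y):
    """Sample distance correlation between paired observations x and y.

    x and y are equal-length sequences of numeric tuples; the tuples in
    x may have a different dimension from the tuples in y.
    """
    if len(x) != len(y):
        raise ValueError("x and y must contain the same number of samples")
    n = len(x)
    a = _centered_distances(x)
    b = _centered_distances(y)

    def mean_product(u, v):
        return sum(u[i][j] * v[i][j] for i in range(n) for j in range(n)) / (n * n)

    dcov2 = mean_product(a, b)
    denom = math.sqrt(mean_product(a, a) * mean_product(b, b))
    if denom == 0:
        return 0.0
    # Guard against tiny negative values caused by floating-point error.
    return math.sqrt(max(dcov2, 0.0) / denom)


# Made-up data: y is x scaled by two and padded to two dimensions.
x = [(0.0,), (1.0,), (2.0,), (4.0,)]
y = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0), (8.0, 0.0)]
dcor_xy = distance_correlation(x, y)
```

Under the approach described in the abstract, a value like this would be computed separately for cases and for controls, and the difference between the two would serve as the interaction indicator.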

Gene Co-Expression Network Analysis Implicates microRNA Processing in Parkinson’s Disease Pathogenesis

Neurodegenerative Diseases ◽

10.1159/000490427 ◽

2018 ◽

Vol 18 (4) ◽

pp. 191-199 ◽

Cited By ~ 3

Author(s):

Jason A. Chen

Keyword(s):

Parkinson's Disease ◽

Network Analysis ◽

Statistical Power ◽

Association Studies ◽

Model Systems ◽

Specific Cell ◽

Genome Wide Association Studies ◽

MicroRNA Processing ◽

Highly Correlated

Background: Recent advances in genetics have provided insights into important inherited causes of Parkinson’s disease (PD), but the underlying biological mechanisms are still incompletely understood. Gene expression studies have pointed toward the dysregulation of neuroinflammation, mitochondrial function, and protein degradation pathways. Objective: We aimed to identify groups of dysregulated genes in PD. Methods: In order to increase statistical power and control for potential confounders, we re-analyzed transcriptomic data from PD patients and model systems, integrating additional genomic data using a systems biology approach. Using weighted gene co-expression network analysis, we partitioned genes into co-expressed modules. Results: One co-expression module, M13, had an expression trajectory that was highly correlated with PD, was not characterized by any specific cell type markers, and was enriched in PD genes identified by genome-wide association studies. Genes within M13 seemed to be related to global microRNA biogenesis, and DICER1 and AGO3 were highly connected within the module. The NUCKS1 gene, previously identified as part of the PARK16 locus, was also a hub gene within M13. Conclusion: These results suggest that microRNA processing and function may play a role in the pathogenesis of PD, and thus may represent a useful target for future drug development.
