The USDA cucumber (Cucumis sativus L.) collection: genetic diversity, population structure, genome-wide association studies, and core collection development

Abstract

A fundamental challenge in modern datasets of ever-increasing dimensionality is variable selection, which has attracted renewed interest with the growth of biological and medical datasets that have complex, non-i.i.d. structure. Naïvely applying classical variable selection methods such as the Lasso to such datasets may lead to a large number of false discoveries. Motivated by genome-wide association studies in genetics, we study variable selection for datasets drawn from multiple subpopulations when this underlying population structure is unknown to the researcher. We propose a unified framework for sparse variable selection that adaptively corrects for population structure via a low-rank linear mixed model. Most importantly, the proposed method does not require prior knowledge of the sample structure and adaptively selects a covariance structure of the correct complexity. Through extensive experiments, we illustrate the effectiveness of this framework relative to existing methods. Further, we test our method on three genomic datasets from plants, mice, and humans, and discuss the knowledge we discover with it.
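The abstract contrasts naïve Lasso selection with selection that corrects for unknown population structure. As a rough illustration of the idea only, the sketch below uses a simpler, swapped-in technique rather than the low-rank linear mixed model the paper proposes: it estimates the top-k sample principal components of the centered genotype matrix by power iteration, projects them out of both the genotypes and the phenotype (equivalent to including them as unpenalized covariates), and then runs a coordinate-descent Lasso. The function names (`top_pc`, `lasso_cd`, `pc_corrected_lasso`) and the parameters `k` and `lam` are hypothetical, and the pure-Python loops are meant for toy data only.

```python
import math
import random

# Hypothetical sketch: PCA-based structure correction followed by the Lasso.
# This is NOT the low-rank LMM described in the abstract.


def soft_threshold(z, t):
    """Proximal operator of t*|b|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def center_columns(X):
    """Subtract each column's mean (n x p list of lists)."""
    n, p = len(X), len(X[0])
    means = [sum(X[i][j] for i in range(n)) / n for j in range(p)]
    return [[X[i][j] - means[j] for j in range(p)] for i in range(n)]


def top_pc(X, iters=100, seed=0):
    """Leading left singular vector of X (a sample-level PC) by power iteration
    on X X^T. Returns None if X is numerically zero."""
    n, p = len(X), len(X[0])
    rng = random.Random(seed)
    u = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        v = [sum(X[i][j] * u[i] for i in range(n)) for j in range(p)]  # v = X^T u
        u = [sum(X[i][j] * v[j] for j in range(p)) for i in range(n)]  # u = X v
        norm = math.sqrt(sum(x * x for x in u))
        if norm < 1e-12:
            return None
        u = [x / norm for x in u]
    return u


def project_out(vec, u):
    """Remove the component of vec along the unit vector u."""
    d = sum(a * b for a, b in zip(vec, u))
    return [a - d * b for a, b in zip(vec, u)]


def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    r = list(y)  # residual y - X b
    for _ in range(n_iter):
        for j in range(p):
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            if z == 0.0:
                continue  # column carries no signal after correction
            rho = sum(X[i][j] * (r[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / z
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta  # keep residual in sync with b
                b[j] = new_bj
    return b


def pc_corrected_lasso(X, y, k, lam):
    """Center X and y, project the top-k sample PCs out of both, then run the Lasso."""
    Xc = center_columns(X)
    ybar = sum(y) / len(y)
    yc = [v - ybar for v in y]
    for _ in range(k):
        u = top_pc(Xc)
        if u is None:
            break  # no structure left to remove
        cols = [project_out([row[j] for row in Xc], u) for j in range(len(Xc[0]))]
        Xc = [list(row) for row in zip(*cols)]
        yc = project_out(yc, u)
    return lasso_cd(Xc, yc, lam)
```

Here `k` has to be chosen by the analyst; the method in the abstract instead selects the complexity of the covariance structure adaptively.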


Multiplex Confounding Factor Correction for Genomic Association Mapping with Squared Sparse Linear Mixed Model

DOI: 10.1101/228114 (2017)

Author(s): Haohan Wang, Xiang Liu, Yunpeng Xiao, Ming Xu, Eric P. Xing

Keyword(s): Population Structure, Association Mapping, Complex Traits, Association Studies, Phenotypic Variability, Genome Wide Association, Genome Wide Association Studies, Confounding Factors, Genetic Loci, Genome Wide

Abstract

Genome-wide association studies have provided a promising way to understand the association between human genomes and complex traits. Many simple polymorphic loci have been shown to explain a significant fraction of phenotypic variability. However, explaining complex traits associated with multifactorial genetic loci remains challenging, especially given the confounding factors caused by population structure, family structure, and cryptic relatedness. In this paper, we propose a Squared-LMM (LMM2) model that aims to jointly correct for population and genetic confounding factors. We offer two strategies for using LMM2 in association mapping: 1) as an extension of the univariate LMM, which can effectively correct for population structure but considers each SNP in isolation; and 2) integrated with a multivariate regression model to discover associations between complex traits and multifactorial genetic loci. We refer to this second model as sparse Squared-LMM (sLMM2). Further, we extend LMM2/sLMM2 by raising the power of our squared model, yielding the LMMn/sLMMn model. We demonstrate the practical use of our model with synthetic phenotypes generated from genetic loci of Arabidopsis thaliana. The experiment shows that our method predicts the associations between traits and loci more accurately and with greater significance. We also evaluate our models on collected phenotypes and genotypes by the number of candidate genes each model discovers. The results suggest that our method is promising for genome-wide association studies.
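The first strategy in the abstract builds on the standard univariate LMM, which tests one SNP at a time while a kinship-based random effect absorbs population structure and relatedness. The sketch below shows only that standard single-kernel baseline, not LMM2: it builds a genetic relationship matrix K from standardized 0/1/2 genotypes and runs a generalized least squares (GLS) test of one SNP under Cov(y) = σ_g²(K + δI), with the variance ratio `delta` and `sigma_g2` assumed already known (in practice they are estimated, for example by REML). The function names `kinship`, `solve`, and `lmm_snp_test` are hypothetical.

```python
import math

# Hypothetical sketch of a standard univariate LMM test with a fixed variance
# ratio; it does not implement the LMM2/sLMM2 models from the abstract.


def kinship(G):
    """GRM K = Z Z^T / m from an n x p genotype matrix (0/1/2 allele counts),
    where Z holds the m polymorphic SNP columns standardized to mean 0, var 1."""
    n, p = len(G), len(G[0])
    cols = []
    for j in range(p):
        col = [float(G[i][j]) for i in range(n)]
        mean = sum(col) / n
        sd = math.sqrt(sum((c - mean) ** 2 for c in col) / n)
        if sd > 0:  # monomorphic SNPs carry no relatedness information
            cols.append([(c - mean) / sd for c in col])
    m = len(cols)
    return [[sum(z[a] * z[b] for z in cols) / m for b in range(n)] for a in range(n)]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(v)] for row, v in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def _dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def lmm_snp_test(x, y, K, delta, sigma_g2=1.0):
    """GLS test of one SNP under y = mu + beta*x + g + e with
    Cov(y) = sigma_g2 * (K + delta * I). Returns (beta, Wald z)."""
    n = len(y)
    V = [[K[i][j] + (delta if i == j else 0.0) for j in range(n)] for i in range(n)]
    ones = [1.0] * n
    vi1, vix, viy = solve(V, ones), solve(V, x), solve(V, y)
    # Normal equations (C^T V^-1 C) [mu, beta]^T = C^T V^-1 y with C = [1, x]
    a11, a12, a22 = _dot(ones, vi1), _dot(ones, vix), _dot(x, vix)
    b1, b2 = _dot(ones, viy), _dot(x, viy)
    det = a11 * a22 - a12 * a12
    beta = (a11 * b2 - a12 * b1) / det
    var_beta = sigma_g2 * a11 / det  # (C^T V^-1 C)^-1 entry for beta, scaled
    return beta, beta / math.sqrt(var_beta)
```

With K set to a zero matrix and δ = 1, the test reduces to ordinary least squares; fixing δ once and reusing it for every SNP is a common speed-up over re-estimating it per SNP.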


Genetic Diversity, Population Structure, and Andean Introgression in Brazilian Common Bean Cultivars after Half a Century of Genetic Breeding

Genes, Vol 11 (11), pp. 1298 (2020). DOI: 10.3390/genes11111298

Author(s): Caléo Panhoca de Almeida, Jean Fausto de Carvalho Paulino, Sérgio Augusto Morais Carbonell, Alisson Fernando Chiorato, Qijian Song, ...

Keyword(s): Genetic Diversity, Population Structure, Common Bean, Gene Diversity, Association Studies, Genome Wide Association Studies, Nucleotide Polymorphisms, Single Nucleotide, Genome Wide, Association Mapping Approach

Brazil is the world's largest consumer and third-largest producer of common beans (Phaseolus vulgaris L.). Since the 1980s, the commercial Carioca variety has been the most consumed in Brazil, followed by Black and Special beans. The present study evaluates the genetic diversity and population structure of 185 Brazilian common bean cultivars using 2827 high-quality single-nucleotide polymorphisms (SNPs). Andean allelic introgression into the Mesoamerican accessions was investigated, and a Carioca panel was tested using an association mapping approach. The results distinguish the Mesoamerican from the Andean accessions, with Mesoamerican accessions predominating (94.6%). Across commercial classes, genetic differentiation was low, and the Carioca group showed the lowest genetic diversity. However, modern Carioca cultivars showed gains in gene diversity and allelic richness. A set of 1060 ‘diagnostic SNPs’ carrying alternative alleles in the pure Mesoamerican and Andean accessions was identified; these SNPs allowed Andean allelic introgression events to be detected and revealed putative introgression segments in regions enriched with resistance genes. Finally, genome-wide association studies revealed SNPs significantly associated with flowering time, pod maturation, and growth habit, showing that the Carioca Association Panel is a powerful tool for crop improvement.
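One simple reading of the ‘diagnostic SNP’ idea is sketched below under stated assumptions: genotypes are homozygous allele calls (strings such as 'A' or 'G', with None for missing data); a SNP is called diagnostic when every non-missing call in the pure Mesoamerican group is one allele and every non-missing call in the pure Andean group is a different allele; and a Mesoamerican accession carrying the Andean allele at a diagnostic SNP is flagged as a putative introgression site. The study's exact filtering criteria are not given in this abstract. The block also includes Nei's gene diversity, H = 1 - Σ p_i², one common definition of the gene diversity statistic mentioned above. All function names are hypothetical.

```python
# Hypothetical sketch; the study's actual SNP filtering may differ.
# Assumption: each genotype row is a list of homozygous allele calls per SNP,
# e.g. ['A', 'C', None], where None marks a missing call.


def diagnostic_snps(group_a, group_b):
    """Return {snp_index: (allele_a, allele_b)} for SNPs where all non-missing
    calls in group_a are one allele and all in group_b are a different one."""
    n_snps = len(group_a[0])
    diag = {}
    for j in range(n_snps):
        alleles_a = {row[j] for row in group_a if row[j] is not None}
        alleles_b = {row[j] for row in group_b if row[j] is not None}
        if len(alleles_a) == 1 and len(alleles_b) == 1 and alleles_a != alleles_b:
            diag[j] = (next(iter(alleles_a)), next(iter(alleles_b)))
    return diag


def introgressed_sites(accession, diag):
    """Diagnostic SNPs at which an accession carries the group_b (e.g. Andean) allele."""
    return sorted(j for j, (_, allele_b) in diag.items() if accession[j] == allele_b)


def gene_diversity(calls):
    """Nei's gene diversity at one locus, H = 1 - sum_i p_i^2 (missing calls ignored)."""
    obs = [c for c in calls if c is not None]
    if not obs:
        raise ValueError("no non-missing calls at this locus")
    counts = {}
    for c in obs:
        counts[c] = counts.get(c, 0) + 1
    return 1.0 - sum((k / len(obs)) ** 2 for k in counts.values())
```

Runs of consecutive flagged SNPs along a chromosome would then be candidates for the introgression segments the abstract describes.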
