A spectral theory for Wright’s inbreeding coefficients and related quantities

PLoS Genetics ◽  
2021 ◽  
Vol 17 (7) ◽  
pp. e1009665
Author(s):  
Olivier François ◽  
Clément Gain

Wright’s inbreeding coefficient, FST, is a fundamental measure in population genetics. Assuming a predefined population subdivision, this statistic is classically used to evaluate population structure at a given genomic locus. With large numbers of loci, unsupervised approaches such as principal component analysis (PCA) have, however, become prominent in recent analyses of population structure. In this study, we describe the relationships between Wright’s inbreeding coefficients and PCA for a model of K discrete populations. Our theory provides an equivalent definition of FST based on the decomposition of the genotype matrix into between and within-population matrices. The average value of Wright’s FST over all loci included in the genotype matrix can be obtained from the PCA of the between-population matrix. Assuming that a separation condition is fulfilled and for reasonably large data sets, this value of FST approximates the proportion of genetic variation explained by the first (K − 1) principal components accurately. The new definition of FST is useful for computing inbreeding coefficients from surrogate genotypes, for example, obtained after correction of experimental artifacts or after removing adaptive genetic variation associated with environmental variables. The relationships between inbreeding coefficients and the spectrum of the genotype matrix not only allow interpretations of PCA results in terms of population genetic concepts but extend those concepts to population genetic analyses accounting for temporal, geographical and environmental contexts.
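The between/within decomposition described above can be illustrated numerically. The following is a minimal sketch under stated assumptions, not the authors' estimator: the simulated allele frequencies, the function names `between_within_fst` and `pve_top_components`, and the use of a plain sum-of-squares ratio as the FST-like quantity are all illustrative choices. It centers a genotype matrix, splits it into a between-population matrix (population means) and a within-population residual, and compares the between-population share of variance with the share explained by the first (K − 1) principal components.

```python
import numpy as np

def between_within_fst(Y, labels):
    """Split a centered genotype matrix into between- and within-population parts.

    Returns the between-population share of the total sum of squares
    (an FST-like variance ratio; an assumption, not the paper's exact formula),
    together with the between matrix B, the within matrix W, and the centered data.
    """
    Y = np.asarray(Y, dtype=float)
    labels = np.asarray(labels)
    Yc = Y - Y.mean(axis=0)              # center each locus
    B = np.zeros_like(Yc)
    for pop in np.unique(labels):
        mask = labels == pop
        B[mask] = Yc[mask].mean(axis=0)  # every individual gets its population mean
    W = Yc - B                           # within-population residual
    fst = np.sum(B ** 2) / np.sum(Yc ** 2)
    return fst, B, W, Yc

def pve_top_components(Yc, k):
    """Proportion of variance explained by the first k principal components."""
    s = np.linalg.svd(Yc, compute_uv=False)
    return np.sum(s[:k] ** 2) / np.sum(s ** 2)

# Hypothetical simulation: K populations drifting around shared ancestral frequencies.
rng = np.random.default_rng(0)
K, n_per_pop, n_loci = 3, 50, 2000
ancestral = rng.uniform(0.1, 0.9, size=n_loci)
freqs = np.clip(ancestral + rng.normal(0.0, 0.1, size=(K, n_loci)), 0.01, 0.99)
labels = np.repeat(np.arange(K), n_per_pop)
Y = rng.binomial(2, freqs[labels])       # diploid genotypes coded 0/1/2

fst, B, W, Yc = between_within_fst(Y, labels)
pve = pve_top_components(Yc, K - 1)
print(f"between-population share: {fst:.4f}")
print(f"variance explained by first {K - 1} PCs: {pve:.4f}")
```

The between matrix has rank at most K − 1 after centering, so its PCA has at most K − 1 nonzero eigenvalues; the share explained by the first K − 1 components of the full matrix can never be smaller than the between-population share, and the abstract's claim is that the two are close when populations are well separated and the data set is large.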

2020 ◽  
Author(s):  
Olivier François ◽  
Clément Gain

Abstract Wright’s inbreeding coefficient, FST, is a fundamental measure in population genetics. Assuming a predefined population subdivision, this statistic is classically used to evaluate population structure at a given genomic locus. With large numbers of loci, unsupervised approaches such as principal component analysis (PCA) have, however, become prominent in recent analyses of population structure. In this study, we describe the relationships between Wright’s inbreeding coefficients and PCA for a model of K discrete populations. Our theory provides an equivalent definition of FST based on the decomposition of the genotype matrix into between and within-population matrices. Assuming that a separation condition is fulfilled, our main result states that the proportion of genetic variation explained by the first (K − 1) principal components can be accurately approximated by the average value of FST over all loci included in the genotype matrix. This equivalent definition of FST can be used to evaluate the fit of discrete population models to the data. It is also useful for computing inbreeding coefficients from surrogate genotypes, for example, obtained after correction of experimental artifacts or after removing genetic variation associated with environmental variables. The relationships between inbreeding coefficients and the spectrum of the genotype matrix not only allow interpretations of PCA results in terms of population genetic concepts but extend those concepts to population genetic analyses accounting for temporal, geographical and environmental contexts.


2021 ◽  
Author(s):  
Simon Frederick Lashmar ◽  
Carina Visser ◽  
Moses Okpeku ◽  
Farai Catherine Muchadeyi ◽  
Ntanganedzeni Olivia Mapholi ◽  
...  

Abstract In southern Africa, the Nguni cattle breed is classified as an indigenous and transboundary animal genetic resource that manifests unique adaptation abilities across distinct agroecological zones. The genetic integrity of various ecotypes is under potential threat due to both indiscriminate crossbreeding and uncontrolled inbreeding. The aim of this study was to assess the genetic diversity and autozygosity that exists both across countries (ES: eSwatini; SA: South Africa) and within-country (SA), between purebred stud animals (SA-S) and research herds (SA-R). Subsets of 96 ES, 96 SA-S and 96 SA-R genotyped for 40 930 common SNPs were used to study inbreeding, runs of homozygosity (ROH) and heterozygosity (ROHet) profiles as well as population structure. The highest proportion (0.513) of the 3 595 ROH was <4Mb in length, while the majority (0.560) of the 4 409 ROHet segments fell within the 0.5-1Mb length category. Inbreeding coefficients indicated low inbreeding (FROH range: 0.025 for SA-S to 0.029 for SA-R). Principal component (PCA) and population structure (K=5) analyses illustrated genomic distinctiveness between SA and ES populations, greater admixture for SA-R (mean±standard deviation proportion shared=0.631±0.353) compared to SA-S (mean±standard deviation proportion shared=0.741±0.123), and three subpopulations for ES. Overall, results illustrated that genetic distinctiveness in the Nguni resulted from both geographic isolation and exposure to different production strategies. Although no impending threat to genetic diversity was observed, further loss should be monitored to prevent endangerment of unique and beneficial indigenous resources.


2020 ◽  
Author(s):  
Zhenjie Zhang ◽  
Suhui Hu ◽  
Wentao Zhao ◽  
Yaqiong Guo ◽  
Na Li ◽  
...  

Abstract Background: Cryptosporidium parvum is a zoonotic pathogen worldwide. Extensive genetic diversity and complex population structures exist in C. parvum in different geographic regions and hosts. Unlike the IIa subtype family, which is responsible for most zoonotic C. parvum infections in industrialized countries, IId is identified as the dominant subtype family in farm animals, rodents, and humans in China. Thus far, the population genetic characteristics of IId subtypes in calves in China are not clear. Methods: In the present study, 46 C. parvum isolates from dairy and beef cattle in six provinces and regions of China were characterized using sequence analysis of eight genetic loci, including msc6-7, rpgr, msc6-5, dz-hrgp, chom3t, hsp70, mucin1, and gp60. They belonged to three IId subtypes in the gp60 gene, including IIdA20G1 (n = 17), IIdA19G1 (n = 24) and IIdA15G1 (n = 5). The data generated were analyzed for population genetic structures of C. parvum using DnaSP and LIAN, and subpopulation structures using STRUCTURE, RAxML, Arlequin, GENALEX and Network. Results: Seventeen multilocus genotypes were identified. The results of linkage disequilibrium analysis indicated the presence of an epidemic genetic structure in the C. parvum IId population. When isolates of various geographical areas were treated as individual subpopulations, maximum likelihood inference of phylogeny, pairwise genetic distance analysis, sub-structure analysis, principal component analysis and network analysis all provided evidence for geographic segregation of subpopulations in Heilongjiang, Hebei, and Xinjiang. In contrast, isolates from Guangdong, Shanghai, and Jiangsu were genetically similar to each other. Conclusions: Data from the multilocus analysis have revealed a much higher genetic diversity of C. parvum than gp60 sequence analysis. Despite an epidemic population structure, there is apparent geographic segregation in C. parvum subpopulations within China.


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Diercles Francisco Cardoso ◽  
Gerardo Alves Fernandes Júnior ◽  
Daiane Cristina Becker Scalez ◽  
Anderson Antonio Carvalho Alves ◽  
Ana Fabrícia Braga Magalhães ◽  
...  

Abstract Highlighting genomic profiles for geographically distinct subpopulations of the same breed may provide insights into adaptation mechanisms to different environments, reveal genomic regions divergently selected, and offer initial guidance to joint genomic analysis. Here, we characterized similarities and differences between the genomic patterns of Angus subpopulations, born and raised in Canada (N = 382) and Brazil (N = 566). Furthermore, we systematically scanned for selection signatures based on the detection of autozygosity islands common between the two subpopulations, and signals of divergent selection, via FST and varLD tests. The principal component analysis revealed a sub-structure with a close connection between the two subpopulations. The averages of genomic relationships, inbreeding coefficients, and linkage disequilibrium at varying genomic distances were rather similar across them, suggesting non-accentuated differences in overall genomic diversity. Autozygosity islands revealed selection signatures common to both subpopulations at chromosomes 13 (63.77–65.25 Mb) and 14 (22.81–23.57 Mb), which are notably known regions affecting growth traits. Nevertheless, further autozygosity islands along with FST and varLD tests unravel particular sites with accentuated population subdivision at BTAs 7 and 18 overlapping with known QTL and candidate genes of reproductive performance, thermoregulation, and resistance to infectious diseases. Our findings indicate overall genomic similarity between Angus subpopulations, with noticeable signals of divergent selection in genomic regions associated with the adaptation in different environments.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Tika B. Adhikari ◽  
Norman Muzhinji ◽  
Dennis Halterman ◽  
Frank J. Louws

Abstract Early blight (EB) caused by Alternaria linariae or Alternaria solani and leaf blight (LB) caused by A. alternata are economically important diseases of tomato and potato. Little is known about the genetic diversity and population structure of these pathogens in the United States. A total of 214 isolates of A. alternata (n = 61), A. linariae (n = 96), and A. solani (n = 57) were collected from tomato and potato in North Carolina and Wisconsin and grouped into populations based on geographic locations and tomato varieties. We exploited 220 single nucleotide polymorphisms derived from DNA sequences of 10 microsatellite loci to analyse the population genetic structure between species and between populations within species and infer the mode of reproduction. High genetic variation and genotypic diversity were observed in all the populations analysed. The null hypothesis of the clonality test based on the index of association ($\bar{r}_d$) was rejected, and equal frequencies of mating types under random mating were detected in some studied populations of Alternaria spp., suggesting that recombination can play an important role in the evolution of these pathogens. Most genetic differences were found between species, and the results showed three distinct genetic clusters corresponding to the three Alternaria spp. We found no evidence for clustering of geographic location populations or tomato variety populations. Analyses of molecular variance revealed high (> 85%) genetic variation within individuals in a population, confirming a lack of population subdivision within species. Alternaria linariae populations harboured more multilocus genotypes (MLGs) than A. alternata and A. solani populations and shared the same MLG between populations within a species, which was suggestive of gene flow and population expansion. Although both A. linariae and A. solani can cause EB on tomatoes and potatoes, these two species are genetically differentiated. Our results provide new insights into the evolution and structure of Alternaria spp. and can lead to new directions in optimizing management strategies to mitigate the impact of these pathogens on tomato and potato production in North Carolina and Wisconsin.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Rong Huang ◽  
Yu Wang ◽  
Kuan Li ◽  
Ying-Qiang Wang

Abstract Background There has always been controversy over whether clonal plants have lower genetic diversity than plants that reproduce sexually. These conflicts could be attributed to the fact that few studies have taken into account the mating system of sexually reproducing plants and their phylogenetic distance. Moreover, most clonal plants in these previous studies regularly produce sexual progeny. Here, we describe a study examining the levels of genetic diversity and differentiation within and between local populations of fully clonal Zingiber zerumbet at a microgeographical scale and compare the results with data for the closely related selfing Z. corallinum and outcrossing Z. nudicarpum. Such studies could disentangle the effects of phylogeny and sexual reproduction on the genetic variation of clonal plants, and thus contribute to an improved understanding of the effects of clonal reproduction on genetic diversity and population structure. Results The results revealed that the level of local population genetic diversity of clonal Z. zerumbet was comparable to that of outcrossing Z. nudicarpum and significantly higher than that of selfing Z. corallinum. However, the level of microgeographic genetic diversity of clonal Z. zerumbet was comparable to that of selfing Z. corallinum and even slightly higher than that of outcrossing Z. nudicarpum. The genetic differentiation among local populations of clonal Z. zerumbet was significantly lower than that of selfing Z. corallinum, but higher than that of outcrossing Z. nudicarpum. A stronger spatial genetic structure appeared within local populations of Z. zerumbet compared with selfing Z. corallinum and outcrossing Z. nudicarpum. Conclusions Our study shows that fully clonal plants can maintain not only a high level of within-population genetic diversity, like outcrossing plants, but also a high level of microgeographic genetic diversity, like selfing plant species, probably due to the accumulation of somatic mutations and absence of a capacity for sexual reproduction. We suggest that conservation strategies for the genetic diversity of clonal and selfing plant species should be focused on the protection of all habitat types, especially fragments within ecosystems, while maintenance of large populations is a key to enhance the genetic diversity of outcrossing species.


2015 ◽  
Vol 112 (26) ◽  
pp. E3441-E3450 ◽  
Author(s):  
David Mimno ◽  
David M. Blei ◽  
Barbara E. Engelhardt

Admixture models are a ubiquitous approach to capture latent population structure in genetic samples. Despite the widespread application of admixture models, little thought has been devoted to the quality of the model fit or the accuracy of the estimates of parameters of interest for a particular study. Here we develop methods for validating admixture models based on posterior predictive checks (PPCs), a Bayesian method for assessing the quality of fit of a statistical model to a specific dataset. We develop PPCs for five population-level statistics of interest: within-population genetic variation, background linkage disequilibrium, number of ancestral populations, between-population genetic variation, and the downstream use of admixture parameters to correct for population structure in association studies. Using PPCs, we evaluate the quality of the admixture model fit to four qualitatively different population genetic datasets: the population reference sample (POPRES) European individuals, the HapMap phase 3 individuals, continental Indians, and African American individuals. We found that the same model fitted to different genomic studies resulted in highly study-specific results when evaluated using PPCs, illustrating the utility of PPCs for model-based analyses in large genomic studies.
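The logic of a posterior predictive check can be shown on a much simpler model than an admixture model. The sketch below is a toy illustration under stated assumptions, not the authors' method: it deliberately fits a mis-specified model (one allele frequency shared by all loci, with a Beta(1, 1) prior) and uses the across-locus variance of allele counts as the discrepancy statistic. The function name `posterior_predictive_pvalue` and all simulation settings are hypothetical.

```python
import numpy as np

def posterior_predictive_pvalue(counts, n_chrom, n_rep=1000, seed=0):
    """Posterior predictive p-value for the across-locus variance of allele counts.

    Toy model (an assumption for illustration): every locus shares one allele
    frequency p with a Beta(1, 1) prior, so the posterior of p is a Beta
    distribution with the updated parameters computed below.
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts)
    n_loci = counts.size
    a = 1 + counts.sum()                  # posterior Beta parameters
    b = 1 + (n_chrom - counts).sum()
    observed = counts.var()               # discrepancy statistic on the real data
    exceed = 0
    for _ in range(n_rep):
        p = rng.beta(a, b)                                   # draw from the posterior
        replicated = rng.binomial(n_chrom, p, size=n_loci)   # simulate replicated data
        exceed += replicated.var() >= observed
    return exceed / n_rep

# Hypothetical data: loci actually have very different allele frequencies,
# so the single-frequency model should fit poorly.
rng = np.random.default_rng(1)
n_chrom = 100
true_freqs = rng.uniform(0.05, 0.95, size=200)
counts = rng.binomial(n_chrom, true_freqs)

p_value = posterior_predictive_pvalue(counts, n_chrom)
print(f"posterior predictive p-value: {p_value:.3f}")
```

A p-value near 0 or 1 signals that replicated data from the fitted model rarely resemble the observed data with respect to the chosen statistic; the statistics listed in the abstract play the role that the count variance plays here.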


Author(s):  
Matthew R. Jones ◽  
Daniel E. Winkler ◽  
Rob Massatti

Abstract Functional connectivity (i.e., the movement of individuals across a landscape) is essential for the maintenance of genetic variation and persistence of rare species. However, illuminating the processes influencing functional connectivity and ultimately translating this knowledge into management practice remains a fundamental challenge. Here, we combine various population structure analyses with pairwise, population-specific demographic modeling to investigate historical functional connectivity in Graham’s beardtongue (Penstemon grahamii), a rare plant narrowly distributed across a dryland region of the western US. While principal component and population structure analyses indicated an isolation-by-distance pattern of differentiation across the species’ range, spatial inferences of effective migration exposed an abrupt shift in population ancestry near the range center. To understand these seemingly conflicting patterns, we tested various models of historical gene flow and found evidence for recent admixture (~3400 generations ago) between populations near the range center. This historical perspective reconciles population structure patterns and suggests management efforts should focus on maintaining connectivity between these previously isolated lineages to promote the ongoing transfer of genetic variation. Beyond providing species-specific knowledge to inform management options, our study highlights how understanding demographic history may be critical to guide conservation efforts when interpreting population genetic patterns and inferring functional connectivity.


2019 ◽  
Author(s):  
Aman Agrawal ◽  
Alec M. Chiu ◽  
Minh Le ◽  
Eran Halperin ◽  
Sriram Sankararaman

Abstract Principal component analysis (PCA) is a key tool for understanding population structure and controlling for population stratification in genome-wide association studies (GWAS). With the advent of large-scale datasets of genetic variation, there is a need for methods that can compute principal components (PCs) with scalable computational and memory requirements. We present ProPCA, a highly scalable method based on a probabilistic generative model, which computes the top PCs on genetic variation data efficiently. We applied ProPCA to compute the top five PCs on genotype data from the UK Biobank, consisting of 488,363 individuals and 146,671 SNPs, in less than thirty minutes. Leveraging the population structure inferred by ProPCA within the White British individuals in the UK Biobank, we scanned for SNPs that are not well-explained by the PCs to identify several novel genome-wide signals of recent putative selection including missense mutations in RPGRIP1L and TLR4.
Author Summary Principal component analysis is a commonly used technique for understanding population structure and genetic variation. With the advent of large-scale datasets that contain the genetic information of hundreds of thousands of individuals, there is a need for methods that can compute principal components (PCs) with scalable computational and memory requirements. In this study, we present ProPCA, a highly scalable statistical method to compute genetic PCs efficiently. We systematically evaluate the accuracy and robustness of our method on large-scale simulated data and apply it to the UK Biobank. Leveraging the population structure inferred by ProPCA within the White British individuals in the UK Biobank, we identify several novel signals of putative recent selection.
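The abstract does not spell out the algorithm, so the following is only a sketch of one classical way to obtain top principal components from a probabilistic view of PCA: the expectation-maximization iteration in the zero-noise limit, which alternates two least-squares solves and never forms the full covariance matrix. It is not ProPCA's implementation, and the function name `em_pca`, the simulated data, and the iteration count are assumptions made for illustration.

```python
import numpy as np

def em_pca(Y, k, n_iter=50, seed=0):
    """Estimate the top-k principal subspace with an EM-style alternating iteration.

    Y is (individuals x markers). Returns an orthonormal basis Q (markers x k)
    of the estimated subspace and the column-centered data.
    """
    rng = np.random.default_rng(seed)
    Yc = Y - Y.mean(axis=0)                      # center each marker
    W = rng.normal(size=(Yc.shape[1], k))        # random initial loadings
    for _ in range(n_iter):
        # E-step: latent scores given the loadings, X = Yc W (W^T W)^{-1}
        X = np.linalg.solve(W.T @ W, W.T @ Yc.T).T
        # M-step: loadings given the scores, W = Yc^T X (X^T X)^{-1}
        W = np.linalg.solve(X.T @ X, X.T @ Yc).T
    Q, _ = np.linalg.qr(W)                       # orthonormalize the loadings
    return Q, Yc

# Hypothetical data: two strong latent axes plus small noise.
rng = np.random.default_rng(42)
scores = rng.normal(size=(200, 2)) * np.array([10.0, 5.0])
loadings = rng.normal(size=(2, 50))
Y = scores @ loadings + 0.1 * rng.normal(size=(200, 50))

Q, Yc = em_pca(Y, k=2)

# Compare with the subspace from a full SVD (feasible only because the toy data are small).
_, _, Vt = np.linalg.svd(Yc, full_matrices=False)
V = Vt[:2].T
gap = np.abs(Q @ Q.T - V @ V.T).max()
print(f"largest difference between projection matrices: {gap:.2e}")
```

Each iteration costs a few products between the data matrix and a thin k-column matrix, which is what makes this family of methods attractive when the number of individuals and SNPs is large; the iteration recovers the principal subspace, and a small k × k eigendecomposition of the projected data would be needed to separate individual components.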


2020 ◽  
Author(s):  
Zhenjie Zhang ◽  
Suhui Hu ◽  
Wentao Zhao ◽  
Yaqiong Guo ◽  
Na Li ◽  
...  

Abstract Background: Cryptosporidium parvum is a zoonotic pathogen worldwide. Extensive genetic diversity and complex population structures exist in C. parvum in different geographic regions and hosts. Unlike the IIa subtype family, which is responsible for most zoonotic C. parvum infections in industrialized countries, IId is identified as the dominant subtype family in farm animals, rodents, and humans in China. Thus far, the population genetic characteristics of IId subtypes in calves in China are not clear. Methods: In the present study, 46 C. parvum isolates from cattle in China were characterized using multilocus sequence analysis of eight genetic loci. They belonged to three IId subtypes in the 60 kDa glycoprotein (gp60) gene, including IIdA20G1 (n = 17), IIdA19G1 (n = 24) and IIdA15G1 (n = 5). The data generated were analyzed for population genetic structures of C. parvum using various tools. Results: Seventeen multilocus genotypes were identified. The results of linkage disequilibrium analysis indicated the presence of an epidemic genetic structure in the C. parvum IId population. When isolates of various geographical areas were treated as individual subpopulations, maximum likelihood inference of phylogeny, pairwise genetic distance analysis, sub-structure analysis, principal component analysis and network analysis all provided evidence for geographic segregation of subpopulations in Heilongjiang, Hebei, and Xinjiang. In contrast, isolates from Guangdong, Shanghai, and Jiangsu were genetically similar to each other. Conclusions: Data from the multilocus analysis have revealed a much higher genetic diversity of C. parvum than gp60 sequence analysis. Despite an epidemic population structure, there is apparent geographic segregation in C. parvum subpopulations within China.

