Trees, Population Structure, F-statistics!

Mapping Intimacies

10.1101/028753

2015

Cited By ~ 1

Author(s):

Benjamin M Peter

Keyword(s):

Population Structure

Genetic Drift

Phylogenetic Trees

Population Substructure

Methodological Framework

Genetic Theory

Genetic History

Branch Lengths

Shared Genetic

F Statistics

Many questions about human genetic history can be addressed by examining the patterns of shared genetic variation between sets of populations. A useful methodological framework for this purpose is provided by F-statistics, which measure shared genetic drift between sets of two, three and four populations and can be used to test simple and complex hypotheses about admixture between populations. Here, we put these statistics in the context of phylogenetic and population genetic theory. We show how measures of genetic drift can be interpreted as branch lengths, as paths through an admixture graph, or in terms of the internal branches in coalescent trees. We show that the admixture tests can be interpreted as testing general properties of phylogenies, allowing us to generalize their application to arbitrary phylogenetic trees. Furthermore, we derive novel expressions for the F-statistics, which enable us to explore the behavior of F-statistics under population structure models. In particular, we show that population substructure may complicate inference.
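
As an illustration of the quantities described above, here is a minimal Python sketch of the two-, three- and four-population statistics computed from per-SNP allele frequencies. It assumes the frequencies are known without error (real tools add finite-sample bias corrections and block-jackknife standard errors, both omitted here), and the function names are illustrative. It also checks numerically that f3 and f4 can be written as sums of f2 terms, which is what lets them be read as paths or branch lengths in a tree.

```python
from typing import Sequence


def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def f2(a: Sequence[float], b: Sequence[float]) -> float:
    """f2(A, B): mean squared allele-frequency difference across SNPs."""
    return _mean([(x - y) ** 2 for x, y in zip(a, b)])


def f3(c: Sequence[float], a: Sequence[float], b: Sequence[float]) -> float:
    """f3(C; A, B): mean of (c - a)(c - b); a negative value suggests C mixes A-like and B-like sources."""
    return _mean([(z - x) * (z - y) for z, x, y in zip(c, a, b)])


def f4(a: Sequence[float], b: Sequence[float], c: Sequence[float], d: Sequence[float]) -> float:
    """f4(A, B; C, D): mean of (a - b)(c - d); expected to be zero if the A-B and C-D paths in a tree share no branch."""
    return _mean([(w - x) * (y - z) for w, x, y, z in zip(a, b, c, d)])


if __name__ == "__main__":
    # Hypothetical allele frequencies at four SNPs (not real data).
    A = [0.1, 0.5, 0.9, 0.3]
    B = [0.2, 0.4, 0.7, 0.6]
    D = [0.6, 0.2, 0.1, 0.5]
    C = [(x + y) / 2 for x, y in zip(A, B)]  # an even A/B mixture with no extra drift

    # f3 and f4 rewritten in terms of f2 (these identities hold exactly).
    print(f3(C, A, B), (f2(C, A) + f2(C, B) - f2(A, B)) / 2)
    print(f4(A, B, C, D), (f2(A, D) + f2(B, C) - f2(A, C) - f2(B, D)) / 2)
```

Because C here is the exact midpoint of A and B, f3(C; A, B) equals -f2(A, B)/4, the negative value that the three-population admixture test looks for.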


Integrating linguistics, social structure, and geography to model genetic diversity within India

10.1101/164640

2017

Author(s):

Aritra Bose

Daniel E. Platt

Laxmi Parida

Petros Drineas

Peristera Paschou

Keyword(s):

Genetic Diversity

Social Structure

Genetic Drift

Social Stratification

Indian Subcontinent

Caste System

Population Substructure

Data Set

Genetic Substructure

Shared Genetic

Abstract India represents an intricate tapestry of population substructure shaped by geography, language, culture and social stratification. While geography closely correlates with genetic structure in other parts of the world, the strict endogamy imposed by the Indian caste system and the large number of spoken languages add further complexity to understanding Indian population structure. To date, no study has attempted to model and evaluate how these factors have interacted to shape the patterns of genetic diversity within India. We merged all publicly available data from the Indian subcontinent into a data set of 891 individuals from 90 well-defined groups. Bringing together geography, genetics and demographic factors, we developed COGG (Correlation Optimization of Genetics and Geodemographics) to build a model that explains the observed population genetic substructure. We show that shared language and social structure have been the most powerful forces in creating paths of gene flow in the subcontinent. Furthermore, we identify the ethnic groups that best capture the diverse genetic substructure highlighted by COGG. Integrating the Indian data with an additional data set of 1,323 individuals from 50 populations, we find that Europeans show shared genetic drift with the Indo-European and Dravidian speakers of India, whereas East Asians have the maximum shared genetic drift with Tibeto-Burman speaking tribal groups.


Integrating Linguistics, Social Structure, and Geography to Model Genetic Diversity within India

Molecular Biology and Evolution

10.1093/molbev/msaa321

2021

Author(s):

Aritra Bose

Daniel E Platt

Laxmi Parida

Petros Drineas

Peristera Paschou

Keyword(s):

Genetic Diversity

Social Structure

Genetic Drift

Social Stratification

Indian Subcontinent

Caste System

Population Substructure

Shared Language

Genetic Substructure

Shared Genetic

Abstract India represents an intricate tapestry of population substructure shaped by geography, language, culture and social stratification. While geography closely correlates with genetic structure in other parts of the world, the strict endogamy imposed by the Indian caste system and the large number of spoken languages add further complexity to understanding Indian population structure. To date, no study has attempted to model and evaluate how these factors have interacted to shape the patterns of genetic diversity within India. We merged all publicly available data from the Indian subcontinent into a dataset of 891 individuals from 90 well-defined groups. Bringing together geography, genetics and demographic factors, we developed COGG (Correlation Optimization of Genetics and Geodemographics) to build a model that explains the observed population genetic substructure. We show that shared language and social structure have been the most powerful forces in creating paths of gene flow in the subcontinent. Furthermore, using a ridge leverage score statistic, we identify the ethnic groups that best capture the diverse genetic substructure. Integrating the Indian data with an additional dataset of 1,323 individuals from 50 Eurasian populations, we find that Indo-European and Dravidian speakers of India show shared genetic drift with Europeans, whereas Tibeto-Burman speaking tribal groups have the maximum shared genetic drift with East Asians.
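
The abstract mentions a ridge leverage score statistic for picking representative groups. As a hedged illustration only, the sketch below computes the textbook ridge leverage score of each row a_i of a matrix A, tau_i = a_i^T (A^T A + lambda I)^{-1} a_i, and then sums the scores per group label. Treating rows as individuals, the value of lambda, and the helper names are all assumptions; the abstract does not say how the authors set these.

```python
from typing import Dict, Sequence

import numpy as np


def ridge_leverage_scores(A: np.ndarray, lam: float) -> np.ndarray:
    """Return tau_i = a_i^T (A^T A + lam * I)^{-1} a_i for every row a_i of A."""
    d = A.shape[1]
    gram = A.T @ A + lam * np.eye(d)
    # Solve gram @ X = A^T rather than forming the inverse explicitly.
    X = np.linalg.solve(gram, A.T)        # shape (d, n)
    return np.einsum("ij,ji->i", A, X)    # diagonal of A gram^{-1} A^T


def scores_by_group(scores: np.ndarray, labels: Sequence[str]) -> Dict[str, float]:
    """Sum per-row scores within each (hypothetical) group label."""
    totals: Dict[str, float] = {}
    for score, label in zip(scores, labels):
        totals[label] = totals.get(label, 0.0) + float(score)
    return totals


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.normal(size=(20, 5))          # toy matrix, not real genotypes
    labels = ["g1"] * 10 + ["g2"] * 10
    tau = ridge_leverage_scores(A, lam=1.0)
    print(tau.round(3))
    print(scores_by_group(tau, labels))
```

With lam = 0 and a full-column-rank A, the scores sum to the number of columns; a positive lam pushes every score below 1 and damps the influence of low-variance directions.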


Predicting the Impact of Describing New Species on Phylogenetic Patterns

Integrative Organismal Biology

10.1093/iob/obz028

2019

Vol 1 (1)

Cited By ~ 1

Author(s):

D C Blackburn

G Giribet

D E Soltis

E L Stanley

Keyword(s):

New Species

Phylogenetic Trees

Branch Length

Length Variation

Tree Shape

Branch Lengths

Taxonomic History

Ecological Patterns

The Impact

Incomplete Sampling

Abstract Although our inventory of Earth’s biodiversity remains incomplete, we still require analyses using the Tree of Life to understand evolutionary and ecological patterns. Because incomplete sampling may bias our inferences, we must evaluate how future additions of newly discovered species might impact analyses performed today. We describe an approach that uses taxonomic history and phylogenetic trees to characterize the impact of past species discoveries on phylogenetic knowledge using patterns of branch-length variation, tree shape, and phylogenetic diversity. This provides a framework for assessing the relative completeness of taxonomic knowledge of lineages within a phylogeny. To demonstrate this approach, we use recent large phylogenies for amphibians, reptiles, flowering plants, and invertebrates. Well-known clades exhibit a decline in the mean and range of branch lengths that are added each year as new species are described. With increased taxonomic knowledge over time, deep lineages of well-known clades become known such that most recently described new species are added close to the tips of the tree, reflecting changing tree shape over the course of taxonomic history. The same analyses reveal other clades to be candidates for future discoveries that could dramatically impact our phylogenetic knowledge. Our work reveals that species are often added non-randomly to the phylogeny over multiyear time-scales in a predictable pattern of taxonomic maturation. Our results suggest that we can make informed predictions about how new species will be added across the phylogeny of a given clade, thus providing a framework for accommodating unsampled undescribed species in evolutionary analyses.
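
Phylogenetic diversity is one of the measures the abstract uses to track how new descriptions change the tree. Here is a minimal sketch of Faith's phylogenetic diversity (PD). It assumes a rooted tree stored as a child-to-(parent, branch length) mapping and the convention that PD includes the path to the root; the toy tree and function names are hypothetical. The change in PD when a species is added is the branch length that species contributes, so a near-tip addition adds little and a deep lineage adds a lot.

```python
from typing import Dict, Iterable, Optional, Tuple

# Each node maps to (parent, length of the branch above it); the root's parent is None.
Tree = Dict[str, Tuple[Optional[str], float]]


def faith_pd(tree: Tree, taxa: Iterable[str]) -> float:
    """Total branch length of the union of paths from each taxon to the root."""
    counted = set()
    total = 0.0
    for node in taxa:
        # Walk rootward, stopping at the first node whose path is already counted.
        while node is not None and node not in counted:
            counted.add(node)
            parent, length = tree[node]
            if parent is not None:
                total += length
            node = parent
    return total


def added_branch_length(tree: Tree, known: Iterable[str], new: str) -> float:
    """PD gained when `new` joins the already-described set `known`."""
    known = list(known)
    return faith_pd(tree, known + [new]) - faith_pd(tree, known)


if __name__ == "__main__":
    toy: Tree = {
        "root": (None, 0.0),
        "X": ("root", 1.0),
        "t1": ("X", 2.0),
        "t2": ("X", 0.5),     # short branch near the tips
        "t3": ("root", 3.0),  # deep, long lineage
    }
    print(added_branch_length(toy, ["t1"], "t2"))  # a near-tip addition
    print(added_branch_length(toy, ["t1"], "t3"))  # a deep-lineage addition
```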


Microsatellite population structure of Newfoundland black bears (Ursus americanus hamiltoni)

Canadian Journal of Zoology

10.1139/z11-056

2011

Vol 89 (9)

pp. 831-839

Cited By ~ 3

Author(s):

H. Dawn Marshall

Edward S. Yaskowiak

Casidhe Dyke

Elizabeth A. Perry

Keyword(s):

Population Structure

Ursus Americanus

Intraspecific Variability

Black Bears

Postglacial Colonization

Correct Assignment

Hair Samples

Males And Females

Two Populations

F Statistics

We investigated the population structure of black bears (Ursus americanus hamiltoni Cameron, 1957) from insular Newfoundland using microsatellite profiles at 12 loci from three broadly distributed areas (Northern, Baie Verte, and Bonavista peninsulas). Our goals were to revisit earlier findings of low heterozygosity in Newfoundland, to increase knowledge of intraspecific variability in black bears, and to make inferences about postglacial colonization and contemporary movements of island black bears. Ninety-three individuals (42 males) were identified among 543 hair samples: 21 from Bonavista, 25 from Northern Peninsula, and 47 from Baie Verte. Genetic diversity is relatively low (HE = 0.42) and decreases from northwest to southeast. Small but significant subpopulation differentiation revealed by F statistics is greatest between the Northern and Baie Verte peninsulas; it is lower and comparable in the remaining pairwise comparisons. We hypothesize that postglacial colonization proceeded southeastward from the Northern Peninsula. Bears migrated from the Northern Peninsula to Baie Verte at some more distant time in the past, and the two populations then diverged by genetic drift. More recently, migration occurred from these two populations to Bonavista, which is characterized by a positive FIS indicative of admixture. Tests of biased dispersal and of the posterior probability of correct assignment to locality reveal contemporary movements of both males and females, with historical dispersal attributable to males.
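
The abstract reports expected heterozygosity (HE) and reads a positive FIS as a sign of admixture. Below is a minimal single-locus sketch of both quantities. It assumes diploid genotypes given as allele pairs and uses the simple estimator HE = 1 - sum(p_i^2) without the small-sample correction most software applies; the function names and toy genotypes are illustrative, and averaging over the 12 loci would be a straightforward extension.

```python
from collections import Counter
from typing import List, Tuple

Genotype = Tuple[str, str]  # one diploid genotype at a single locus


def expected_het(genotypes: List[Genotype]) -> float:
    """HE = 1 - sum of squared allele frequencies."""
    counts = Counter(allele for g in genotypes for allele in g)
    total = sum(counts.values())
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def observed_het(genotypes: List[Genotype]) -> float:
    """Fraction of individuals carrying two different alleles."""
    return sum(1 for a, b in genotypes if a != b) / len(genotypes)


def f_is(genotypes: List[Genotype]) -> float:
    """FIS = 1 - Ho / HE; positive values mean fewer heterozygotes than expected."""
    he = expected_het(genotypes)
    if he == 0.0:
        return float("nan")  # undefined at a monomorphic locus
    return 1.0 - observed_het(genotypes) / he


if __name__ == "__main__":
    # Hypothetical: pooling two groups fixed for different alleles (a Wahlund effect).
    pooled = [("A", "A")] * 10 + [("B", "B")] * 10
    print(expected_het(pooled), f_is(pooled))
    # Hypothetical: genotype counts exactly at Hardy-Weinberg proportions.
    hwe = [("A", "A")] * 25 + [("A", "B")] * 50 + [("B", "B")] * 25
    print(f_is(hwe))
```

Pooling differentiated groups lowers observed heterozygosity relative to HE, which is the usual reasoning behind reading a positive FIS as a sign that a sample mixes individuals from differentiated sources.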


Genetic variation in allozymes of western larch

Canadian Journal of Forest Research

10.1139/x86-177

1986

Vol 16 (5)

pp. 1013-1018

Cited By ~ 25

Author(s):

Lauren Fins

Lisa W. Seeb

Keyword(s):

Genetic Variation

Genetic Drift

Fire History

Glacial History

Geographic Proximity

Electrophoretic Variation

Inland Empire

Genetic History

Expected Heterozygosity

History Of

Seed samples from 19 stands of Larix occidentalis Nutt. were analyzed for electrophoretic variation at 23 loci. Because sample sizes consisted of only 9 or 10 trees per stand (18–20 alleles per locus per stand), samples were grouped by geographic proximity into four larger samples. For all measures of variation, this species scored lower than most species, but within the range observed for other western conifers. Most of the variation was found within rather than between the population groups. The single southern sample appeared to be genetically distinct from the others. Although expected heterozygosity varied somewhat among individual stand samples, the consistently low values for all samples suggest that genetic drift has played a major role in the genetic history of the species in the Inland Empire, both through its glacial history in postulated refugia and through its fire history in recent times.
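
The finding that most variation lies within rather than between population groups is usually quantified by partitioning expected heterozygosity. Here is a hedged sketch using Nei's G_ST = (H_T - H_S) / H_T. It assumes a single locus, equal weighting of groups, and known allele frequencies (no sample-size correction); the abstract does not say which statistic the authors used, and the frequencies below are hypothetical.

```python
from typing import Dict, List

Freqs = Dict[str, float]  # allele -> frequency within one group


def heterozygosity(freqs: Freqs) -> float:
    return 1.0 - sum(p * p for p in freqs.values())


def nei_gst(groups: List[Freqs]) -> float:
    """G_ST = (H_T - H_S) / H_T for one locus, weighting groups equally."""
    k = len(groups)
    alleles = set().union(*groups)
    pooled = {a: sum(g.get(a, 0.0) for g in groups) / k for a in alleles}
    h_s = sum(heterozygosity(g) for g in groups) / k  # mean within-group diversity
    h_t = heterozygosity(pooled)                      # diversity of the pooled frequencies
    if h_t == 0.0:
        return float("nan")  # monomorphic locus: nothing to partition
    return (h_t - h_s) / h_t


if __name__ == "__main__":
    similar = [{"a": 0.6, "b": 0.4}, {"a": 0.4, "b": 0.6}]
    distinct = [{"a": 1.0}, {"b": 1.0}]
    print(nei_gst(similar))   # small: most diversity is within groups
    print(nei_gst(distinct))  # all diversity is between groups
```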


Genetic diversity, population structure, and effective population size in two yellow bat species in south Texas

PeerJ

10.7717/peerj.10348

2020

Vol 8

pp. e10348

Author(s):

Austin S. Chipps

Amanda M. Hale

Sara P. Weaver

Dean A. Williams

Keyword(s):

Genetic Diversity

Population Structure

North America

Wind Energy

Population Size

Effective Population Size

Microsatellite Loci

South Texas

Population Substructure

Effective Population

There are increasing concerns regarding bat mortality at wind energy facilities, especially as installed capacity continues to grow. In North America, wind energy development has recently expanded into the Lower Rio Grande Valley in south Texas, where bat species had not previously been exposed to wind turbines. Our study sought to characterize genetic diversity, population structure, and effective population size in Dasypterus ega and D. intermedius, two tree-roosting yellow bats that are native to this region and whose population biology and seasonal movements are poorly known. There was no evidence of population substructure in either species. Genetic diversity at mitochondrial and microsatellite loci was lower in these yellow bat taxa than in previously studied migratory tree bat species in North America. This may reflect the non-migratory nature of these species at our study site, the location of our study site at a geographic range edge for both taxa, and possibly weak ascertainment bias at microsatellite loci. Historical effective population size (NEF) was large for both species, while current estimates of Ne had upper 95% confidence limits that encompassed infinity. We found evidence of strong mitochondrial differentiation between the two putative subspecies of D. intermedius (D. i. floridanus and D. i. intermedius), which are sympatric in this region of Texas, yet little differentiation at microsatellite loci. We suggest that this pattern is due to secondary contact and hybridization, and possibly incomplete lineage sorting at microsatellite loci. We also found evidence of some hybridization between D. ega and D. intermedius in this region of Texas. We recommend that our data serve as a starting point for long-term genetic monitoring of these species to better understand the impacts of wind-related mortality on these populations over time.


Genetic diversity and population structure of Hemibagrus guttatus (Bagridae, Siluriformes) in the larger subtropical Pearl River based on COI and Cyt b genes analysis

Annales de Limnologie - International Journal of Limnology

10.1051/limn/2021005

2021

Vol 57

pp. 7

Author(s):

Tianxu Kuang

Fangmin Shuai

Xinhui Li

Weitao Chen

Sovan Lek

Keyword(s):

Genetic Diversity

Population Structure

Sustainable Use

Haplotype Diversity

Pearl River

Germplasm Resources

Geographical Populations

Commercially Important

The Pearl River

F Statistics

Understanding the genetic diversity and population structure of fish species is crucial for the sustainable use and protection of fish germplasm resources. Hemibagrus guttatus (Bagridae, Siluriformes) is widely distributed in the large subtropical Pearl River (China) and is commercially important, but its populations have been declining. The genetic diversity of wild H. guttatus is not well understood, despite the species' ecological importance. In this paper, the mitochondrial genes cytochrome c oxidase subunit I (COI) and cytochrome b (Cyt b) were used to analyze the genetic structure of H. guttatus collected from six geographical populations in the main streams of the Pearl River. The results showed that the nucleotide diversity (π) and haplotype diversity (Hd) of wild H. guttatus were low (π < 0.005; Hd < 0.5). In addition, neighbor-joining tree analysis showed that H. guttatus haplotypes did not cluster into clades according to geographical distribution. Analysis of molecular variance (AMOVA) and F-statistics (Fst) values showed high homogeneity among wild H. guttatus populations. Our results suggest that the germplasm resources of H. guttatus are degrading, which could destabilize the sustainable use of this species, and that there is an urgent need to conserve this species in South China.
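
The two diversity indices quoted in the abstract can be computed directly from aligned haplotypes. Here is a minimal sketch, assuming equal-length aligned sequences without gaps or ambiguity codes and uncorrected pairwise differences (no substitution model); the toy sequences are hypothetical and the function names illustrative.

```python
from collections import Counter
from itertools import combinations
from typing import List


def nucleotide_diversity(seqs: List[str]) -> float:
    """pi: mean number of pairwise differences per site over all sequence pairs."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(x != y for x, y in zip(s, t)) for s, t in pairs)
    return diffs / len(pairs) / length


def haplotype_diversity(seqs: List[str]) -> float:
    """Hd = n / (n - 1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1.0 - sum(f * f for f in freqs))


if __name__ == "__main__":
    toy = ["ACGTAC", "ACGTAC", "ACGTTC", "ACGTAC"]  # hypothetical haplotypes
    print(nucleotide_diversity(toy), haplotype_diversity(toy))
```

Low values of both indices, as reported for H. guttatus, mean that most sampled individuals carry the same or nearly identical haplotypes.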


Network Analysis of Linkage Disequilibrium Reveals Genome Architecture in Chum Salmon

G3 Genes|Genome|Genetics

10.1534/g3.119.400972

2020

Vol 10 (5)

pp. 1553-1561

Cited By ~ 8

Author(s):

Garrett McKinney

Megan V. McPhee

Carita Pascal

James E. Seeb

Lisa W. Seeb

Keyword(s):

Population Structure

Linkage Disequilibrium

Network Analysis

Chum Salmon

Sex Chromosome

Natural Populations

Population Substructure

Chromosome Inversion

Genomic Features

Chromosome Inversions

Many studies exclude loci that exhibit linkage disequilibrium (LD); however, high LD can signal reduced recombination around genomic features such as chromosome inversions or sex-determining regions. Chromosome inversions and sex-determining regions are often involved in adaptation, allowing for the inheritance of co-adapted gene complexes and for the resolution of sexually antagonistic selection through sex-specific partitioning of genetic variants. Genomic features such as these can escape detection when loci in LD are removed; in addition, failing to account for these features can bias analyses. We examined patterns of LD using network analysis to identify an overlapping chromosome inversion and sex-determining region in chum salmon. The signal of the inversion was strong enough to appear as false population substructure when the entire dataset was analyzed, whereas the effect of the sex-determining region on population structure was only obvious after restricting analysis to the sex chromosome. Understanding the extent and geographic distribution of inversions is now a critically important part of genetic analyses of natural populations. Our results highlight the importance of analyzing and understanding patterns of LD in genomic datasets and the perils of excluding or ignoring loci exhibiting LD. Blindly excluding loci in LD would have prevented detection of the sex-determining region and chromosome inversion, while failing to understand the genomic features leading to high LD could have resulted in false interpretations of population structure.
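
The network idea can be illustrated in a few lines: treat loci as nodes, connect pairs whose LD exceeds a threshold, and read clusters of tightly linked loci (such as those spanning an inversion) off the connected components. This is a hedged sketch, not the authors' pipeline. It assumes genotypes coded as 0/1/2 allele dosages, uses the squared Pearson correlation of dosages as the LD measure, and uses plain connected components in place of whatever clustering the study applied; the 0.5 threshold and the locus names are arbitrary.

```python
from itertools import combinations
from typing import Dict, List

Dosages = List[int]  # per-individual allele counts (0, 1 or 2) at one locus


def r_squared(x: Dosages, y: Dosages) -> float:
    """Squared Pearson correlation between two loci's dosage vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic locus carries no LD signal
    return sxy * sxy / (sxx * syy)


def ld_clusters(genotypes: Dict[str, Dosages], threshold: float = 0.5) -> List[List[str]]:
    """Connected components (size >= 2) of the graph linking loci with r^2 >= threshold."""
    parent = {locus: locus for locus in genotypes}

    def find(locus: str) -> str:
        while parent[locus] != locus:
            parent[locus] = parent[parent[locus]]  # path halving
            locus = parent[locus]
        return locus

    for a, b in combinations(genotypes, 2):
        if r_squared(genotypes[a], genotypes[b]) >= threshold:
            parent[find(a)] = find(b)  # union the two components

    components: Dict[str, List[str]] = {}
    for locus in genotypes:
        components.setdefault(find(locus), []).append(locus)
    return [sorted(c) for c in components.values() if len(c) > 1]


if __name__ == "__main__":
    toy = {
        "L1": [0, 1, 2, 0, 1, 2],
        "L2": [0, 1, 2, 0, 1, 2],
        "L3": [2, 1, 0, 2, 1, 0],  # perfectly anti-correlated with L1, so still in LD
        "L4": [0, 0, 1, 1, 2, 2],
    }
    print(ld_clusters(toy))
```

Keeping such clusters visible, rather than pruning every locus in LD, is what allows features like an inversion to be recognized instead of being mistaken for population substructure.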


Population structure of five native sheep breeds of Sweden estimated with high density SNP genotypes

BMC Genetics

10.1186/s12863-020-0827-8

2020

Vol 21 (1)

Author(s):

Christina Marie Rochus

Elisabeth Jonas

Anna M. Johansson

Keyword(s):

Population Structure

Snp Array

Principal Component

Northern Sweden

The Baltic Sea

Sheep Breeds

The North

South West

Branch Lengths

The Baltic

Abstract Background: Native Swedish sheep breeds are part of the North European short-tailed sheep group and are characterized in part by their genetic uniqueness. Our objective was to study the population structure of native Swedish sheep. Five breeds were genotyped using the 600 K SNP array. Dalapäls and Klövsjö sheep are from central Sweden; Gotland and Gute sheep are from Gotland, an island in the Baltic Sea; and Fjällnäs sheep are from northern Sweden. We studied population structure using principal component analysis (PCA), cluster-based analysis of admixture, and an estimated population tree.

Results: The analyses showed that the five Swedish breeds are distinct from one another, although Gute and Gotland are more closely related to each other in all analyses. All breeds had long branch lengths in the population tree, indicating that they have been subjected to drift. We repeated our analyses using 39 K SNPs, this time including 50 K SNP genotypes from other European and southwestern Asian breeds from the Sheep HapMap project and 600 K SNP genotypes from a dataset of French sheep. The results arranged the breeds into five groups: south-west Asia, south-west Europe, central Europe, north Europe and north European short-tailed sheep. Within this last group, Norwegian and Icelandic breeds, Finn and Romanov sheep, Scottish breeds, and Gute and Gotland sheep were more closely related, while the remaining Swedish breeds and Ouessant sheep were distinct from all breeds and had longer branches in the population tree.

Conclusions: We described the population structure of five Swedish breeds and their place within European and southwestern Asian breeds. Swedish breeds are unique, distinct breeds that have been subjected to drift but group with other north European short-tailed sheep.
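
For readers unfamiliar with genotype PCA, here is a minimal sketch: center each SNP by twice its allele frequency, scale it by sqrt(2p(1 - p)) (one common normalization; the abstract does not state which one the authors used), and take the leading singular vectors. It assumes an individuals-by-SNPs matrix of 0/1/2 allele counts with no missing data; the simulated "breeds" are hypothetical.

```python
import numpy as np


def genotype_pca(G: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Principal component scores for an individuals x SNPs matrix of 0/1/2 counts."""
    p = G.mean(axis=0) / 2.0                   # allele frequency of each SNP
    keep = (p > 0) & (p < 1)                   # monomorphic SNPs cannot be scaled
    G, p = G[:, keep], p[keep]
    X = (G - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Two simulated groups with very different allele frequencies at 200 SNPs.
    breed_a = rng.binomial(2, 0.05, size=(10, 200))
    breed_b = rng.binomial(2, 0.95, size=(10, 200))
    scores = genotype_pca(np.vstack([breed_a, breed_b]).astype(float))
    print(scores[:, 0].round(2))  # PC1 separates the two groups
```

Individuals from the same simulated group share the sign of their PC1 score; in real data, closely related breeds such as Gute and Gotland would be expected to overlap more.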


In Silico Identification of Functional Protein Interfaces

Comparative and Functional Genomics

10.1002/cfg.309

2003

Vol 4 (4)

pp. 420-423

Cited By ~ 14

Author(s):

Rachel E. Bell

Nir Ben-Tal

Keyword(s):

Phylogenetic Trees

Large Scale

Hypothetical Protein

Evolutionary Information

Structural Constraints

Functional Protein

Homologous Proteins

Geometrical Properties

Protein Interfaces

Branch Lengths

Proteins perform many of their biological roles through protein–protein, protein–DNA or protein–ligand interfaces. Identifying the amino acids that make up these interfaces often improves our understanding of the proteins' biological function. Many methods for detecting functional interfaces have been developed, and large-scale analyses have assessed their accuracy. Some of these methods consider the size of the protein interface, its amino acid composition, and its physicochemical and geometrical properties. Others use statistical potentials of pairwise interactions, or evolutionary information. The rationale of the evolutionary approach is that functional and structural constraints impose selective pressure; hence, biologically important interfaces often evolve more slowly than other surface regions of the protein. Recently, an algorithm, Rate4Site, and a web server, ConSurf (http://consurf.tau.ac.il/), were developed in our laboratory to identify functional interfaces based on the evolutionary relations among homologous proteins as reflected in phylogenetic trees. The explicit use of tree topology and branch lengths makes the method remarkably accurate and sensitive. Here we demonstrate its power in identifying the functional interfaces of a hypothetical protein whose structure was determined as part of the international structural genomics effort. Finally, we propose combining complementary procedures to enhance the overall performance of methods for identifying functional interfaces in proteins.
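
The rationale that important positions evolve slowly can be illustrated with a much simpler, tree-free score: the Shannon entropy of each alignment column, where 0 means fully conserved. This is explicitly not Rate4Site, which infers per-site rates using the tree topology and branch lengths, the feature the abstract credits for its accuracy. The sketch only shows the baseline idea, treats gap characters as ordinary symbols, and uses a hypothetical toy alignment and illustrative function names.

```python
from collections import Counter
from math import log2
from typing import List


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of the residues in one alignment column; 0 = fully conserved."""
    counts = Counter(column)
    n = len(column)
    return sum(-(c / n) * log2(c / n) for c in counts.values())


def conservation_profile(alignment: List[str]) -> List[float]:
    """Entropy for each column of an alignment of equal-length sequences."""
    return [column_entropy("".join(seq[i] for seq in alignment))
            for i in range(len(alignment[0]))]


if __name__ == "__main__":
    toy = ["MKVLA", "MKILA", "MRVLS", "MKVLG"]  # hypothetical homologous segments
    for position, h in enumerate(conservation_profile(toy), start=1):
        print(position, round(h, 3))
```

Because every sequence counts equally here, a cluster of near-identical close relatives inflates apparent conservation; weighting sites by the phylogeny, as Rate4Site does, is meant to avoid exactly that bias.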
