Inferring the ancestry of everyone

2018 ◽  
Author(s):  
Jerome Kelleher ◽  
Yan Wong ◽  
Patrick K. Albers ◽  
Anthony W. Wohns ◽  
Gil McVean

Abstract A central problem in evolutionary biology is to infer the full genealogical history of a set of DNA sequences. This history contains rich information about the forces that have influenced a sexually reproducing species. However, existing methods are limited: the most accurate is unable to cope with more than a few dozen samples. With modern genetic data sets rapidly approaching millions of genomes, there is an urgent need for efficient inference methods to exploit such rich resources. We introduce an algorithm to infer whole-genome history which has comparable accuracy to the state-of-the-art but can process around four orders of magnitude more sequences. Additionally, our method results in an “evolutionary encoding” of the original sequence data, enabling efficient access to genealogies and calculation of genetic statistics over the data. We apply this technique to human data from the 1000 Genomes Project, Simons Genome Diversity Project and UK Biobank, showing that the genealogies we estimate are both rich in biological signal and efficient to process.

2017 ◽  
Author(s):  
B. Schaeffer ◽  
V. Nicolas ◽  
F. Austerlitz ◽  
C. Larédo

Abstract Several classes of methods have been proposed for inferring the history of populations from genetic polymorphism data. As connectivity is a key factor in explaining the structure of populations, several graph-based methods have been developed for this purpose, using population genetics data. Here we propose an original method based on graphical models that uses DNA sequences to provide relationships between populations. We tested our method on various simulated data sets, describing typical demographic scenarios, for different parameter values. We found that our method behaved noticeably well for realistic demographic evolutionary processes and suitably recovered the migration processes. Our method thus provides a complementary tool for investigating population history based on genetic material.


2017 ◽  
Author(s):  
Raúl Amado Cattáneo ◽  
Luis Diambra ◽  
Andrés Norman McCarthy

Phylogenetics and population genetics are central disciplines in evolutionary biology. Both are based on the comparison of single DNA sequences, or a concatenation of a number of these. However, with the advent of next-generation DNA sequencing technologies, the approaches that consider large genomic data sets are of growing importance for the elucidation of evolutionary relationships among species. Among these approaches, the assembly and alignment-free methods which allow an efficient distance computation and phylogeny reconstruction are of great importance. However, it is not yet clear under what quality conditions and abundance of genomic data such methods are able to infer phylogenies accurately. In the present study we assess the method originally proposed by Fan et al. for whole genome data, in the elucidation of tomato chloroplast phylogenetics using short read sequences. We find that this assembly and alignment-free method is capable of reproducing previous results under conditions of high coverage, given that low-frequency k-mers (i.e. error-prone data) are effectively filtered out. Finally, we present a complete chloroplast phylogeny for the best data quality candidates of the recently published 360 tomato genomes.


2016 ◽  
Vol 6 (12) ◽  
pp. 3927-3939 ◽  
Author(s):  
Xing-Xing Shen ◽  
Xiaofan Zhou ◽  
Jacek Kominek ◽  
Cletus P Kurtzman ◽  
Chris Todd Hittinger ◽  
...  

Abstract Understanding the phylogenetic relationships among the yeasts of the subphylum Saccharomycotina is a prerequisite for understanding the evolution of their metabolisms and ecological lifestyles. In the last two decades, the use of rDNA and multilocus data sets has greatly advanced our understanding of the yeast phylogeny, but many deep relationships remain unsupported. In contrast, phylogenomic analyses have involved relatively few taxa and lineages that were often selected with limited considerations for covering the breadth of yeast biodiversity. Here we used genome sequence data from 86 publicly available yeast genomes representing nine of the 11 known major lineages and 10 nonyeast fungal outgroups to generate a 1233-gene, 96-taxon data matrix. Species phylogenies reconstructed using two different methods (concatenation and coalescence) and two data matrices (amino acids or the first two codon positions) yielded identical and highly supported relationships between the nine major lineages. Aside from the lineage comprised by the family Pichiaceae, all other lineages were monophyletic. Most interrelationships among yeast species were robust across the two methods and data matrices. However, eight of the 93 internodes conflicted between analyses or data sets, including the placements of: the clade defined by species that have reassigned the CUG codon to encode serine, instead of leucine; the clade defined by a whole genome duplication; and the species Ascoidea rubescens. These phylogenomic analyses provide a robust roadmap for future comparative work across the yeast subphylum in the disciplines of taxonomy, molecular genetics, evolutionary biology, ecology, and biotechnology. To further this end, we have also provided a BLAST server to query the 86 Saccharomycotina genomes, which can be found at http://y1000plus.org/blast.


Author(s):  
Haige Han ◽  
Kenneth Bryan ◽  
Wunierfu Shiraigol ◽  
Dongyi Bai ◽  
Yiping Zhao ◽  
...  

Abstract The Mongolian horse is one of the oldest extant horse populations and although domesticated, most animals are free-ranging and experience minimal human intervention. As an ancient population originating in one of the key domestication centers, the Mongolian horse may play a key role in understanding the origins and recent evolutionary history of horses. Here we describe an analysis of high-density genome-wide single-nucleotide polymorphism (SNP) data in 40 globally dispersed horse populations (n = 895). In particular, we have focused on new results from Chinese Mongolian horses (n = 100) that represent 5 distinct populations. These animals were genotyped for 670K SNPs and the data were analyzed in conjunction with 35K SNP data for 35 distinct breeds. Analyses of these integrated SNP data sets demonstrated that the Chinese Mongolian populations were genetically distinct from other modern horse populations. In addition, compared to other domestic horse breeds, the Chinese Mongolian horse populations exhibited relatively high genomic diversity. These results suggest that, in genetic terms, extant Chinese Mongolian horses may be the most similar modern populations to the animals originally domesticated in this region of Asia. Chinese Mongolian horse populations may therefore retain ancestral genetic variants from the earliest domesticates. Further genomic characterization of these populations in conjunction with archaeogenetic sequence data should be prioritized for understanding recent horse evolution and the domestication process that has led to the wealth of diversity observed in modern global horse breeds.


Lankesteriana ◽  
2013 ◽  
Author(s):  
Rafael Arévalo ◽  
Kenneth M. Cameron

The Neotropical orchid genus Mormolyca Fenzl, as currently circumscribed, encompasses a diverse group of ca. 27 species. Many of these were included traditionally in Maxillaria sect. Rufescens, when similarity of floral morphology was considered foremost in their classification rather than the evolutionary history of the taxa. In order to begin revising species delimitation and clarifying the evolution and biology of the genus, we present a phylogenetic hypothesis using sequence data from five plastid loci (rpoC1, matK gene and flanking trnK intron, atpB-rbcL intergenic spacer, and the 3’ portion of ycf1) and the nuclear ribosomal internal and external transcribed spacers (ITS, ETS). Resulting trees using both Bayesian and parsimony inference are congruent with each other, and generally well resolved. Based on the current level of sampling across Maxillariinae, these molecular data support the monophyly of Mormolyca and shed light on the interspecific phylogenetic patterns within the genus. These include an early divergent paraphyletic grade of Mormolyca species successively sister to a clade with at least two definable subclades within. The latter are characterized by two different flower morphologies that are likely related to their pollination systems. Although not all relationships within the genus are fully resolved or supported, these results offer a first glimpse into the phylogeny of a small group of epiphytic orchids characterized by an unusually high level of variable vegetative characters, floral fragrance profiles, and pollination systems.


2016 ◽  
Author(s):  
Xing-Xing Shen ◽  
Xiaofan Zhou ◽  
Jacek Kominek ◽  
Cletus P. Kurtzman ◽  
Chris Todd Hittinger ◽  
...  

Abstract Understanding the phylogenetic relationships among the yeasts of the subphylum Saccharomycotina is a prerequisite for understanding the evolution of their metabolisms and ecological lifestyles. In the last two decades, the use of rDNA and multi-locus data sets has greatly advanced our understanding of the yeast phylogeny, but many deep relationships remain unsupported. In contrast, phylogenomic analyses have involved relatively few taxa and lineages that were often selected with limited considerations for covering the breadth of yeast biodiversity. Here we used genome sequence data from 86 publicly available yeast genomes representing 9 of the 11 major lineages and 10 non-yeast fungal outgroups to generate a 1,233-gene, 96-taxon data matrix. Species phylogenies reconstructed using two different methods (concatenation and coalescence) and two data matrices (amino acids or the first two codon positions) yielded identical and highly supported relationships between the 9 major lineages. Aside from the lineage comprised by the family Pichiaceae, all other lineages were monophyletic. Most interrelationships among yeast species were robust across the two methods and data matrices. However, 8 of the 93 internodes conflicted between analyses or data sets, including the placements of: the clade defined by species that have reassigned the CUG codon to encode serine, instead of leucine; the clade defined by a whole genome duplication; and Ascoidea rubescens. These phylogenomic analyses provide a robust roadmap for future comparative work across the yeast subphylum in the disciplines of taxonomy, molecular genetics, evolutionary biology, ecology, and biotechnology. To further this end, we have also provided a BLAST server to query the 86 Saccharomycotina genomes, which can be found at http://y1000plus.org/blast.


2019 ◽  
Vol 4 (2) ◽  
pp. 108-123 ◽  
Author(s):  
Andrew M Ritchie ◽  
Simon Y W Ho

Abstract Bayesian phylogenetic methods derived from evolutionary biology can be used to reconstruct the history of human languages using databases of cognate words. These analyses have produced exciting results regarding the origins and dispersal of linguistic and cultural groups through prehistory. Bayesian lexical dating requires the specification of priors on all model parameters. This includes the use of a prior on divergence times, often combined with a prior on tree topology and referred to as a tree prior. Violation of the underlying assumptions of the tree prior can lead to an erroneous estimate of the timescale of language evolution. To investigate these impacts, we tested the sensitivity of Bayesian dating to the tree prior in analyses of four lexical data sets. Our results show that estimates of the origin times of language families are robust to the choice of tree prior for lexical data, though less so than when Bayesian phylogenetic methods are used to analyse genetic data sets. We also used the relative fit of speciation and coalescent tree priors to determine the ability of speciation models to describe language diversification at four different taxonomic levels. We found that speciation priors were preferred over a constant-size coalescent prior regardless of taxonomic scale. However, data sets with narrower taxonomic and geographic sampling exhibited a poorer fit to ideal birth–death model expectations. Our results encourage further investigation into the nature of language diversification at different sampling scales.


Phytotaxa ◽  
2019 ◽  
Vol 392 (1) ◽  
pp. 1
Author(s):  
Gabriel F. Gonçalves ◽  
Anna Victoria S. R. Mauad ◽  
Giuliana Taques ◽  
Eric C. Smidt ◽  
Fábio de Barros

In order to evaluate the monophyly of the genus Orleanesia (Orchidaceae) and to assess its position within Laeliinae, a phylogenetic analysis was performed using molecular (nuclear ITS and plastid matK DNA sequences) and morphological data. A taxonomic revision of Orleanesia was also performed, with a description of the genus and its species using fresh living plants and 115 exsiccates from 31 herbaria. All phylogenetic analyses were highly congruent, and thus the sequence data from all three data sets were combined. The resulting phylogeny corroborated the monophyly of Orleanesia, with two strongly supported clades, and confirmed Caularthron as its sister group. Character analysis was not very informative due to a high degree of homoplasy. Two lectotypifications and three new synonyms were proposed for the genus, thereby reducing the number of accepted species to six. Although none of the species of Orleanesia are considered endangered, it is clear that some populations are threatened with deforestation and habitat reduction.


2017 ◽  
Author(s):  
Zvi Rosen ◽  
Anand Bhaskar ◽  
Sebastien Roch ◽  
Yun S. Song

Abstract The sample frequency spectrum (SFS), which describes the distribution of mutant alleles in a sample of DNA sequences, is a widely used summary statistic in population genetics. The expected SFS has a strong dependence on the historical population demography and this property is exploited by popular statistical methods to infer complex demographic histories from DNA sequence data. Most, if not all, of these inference methods exhibit pathological behavior, however. Specifically, they often display runaway behavior in optimization, where the inferred population sizes and epoch durations can degenerate to 0 or diverge to infinity, and show undesirable sensitivity of the inferred demography to perturbations in the data. The goal of this paper is to provide theoretical insights into why such problems arise. To this end, we characterize the geometry of the expected SFS for piecewise-constant demographic histories and use our results to show that the aforementioned pathological behavior of popular inference methods is intrinsic to the geometry of the expected SFS. We provide explicit descriptions and visualizations for a toy model with sample size 4, and generalize our intuition to arbitrary sample sizes n using tools from convex and algebraic geometry. We also develop a universal characterization result which shows that the expected SFS of a sample of size n under an arbitrary population history can be recapitulated by a piecewise-constant demography with only κ_n epochs, where κ_n is between n/2 and 2n – 1. The set of expected SFS for piecewise-constant demographies with fewer than κ_n epochs is open and non-convex, which causes the above phenomena for inference from data.

