evolutionary inference
Recently Published Documents


Total documents: 74 (five years: 20)
H-index: 20 (five years: 4)

2021
Author(s): Tom W Ouellette, Philip Awadalla

Variant allele frequencies (VAF) encode ongoing evolution and subclonal selection in growing tumours. However, existing methods that utilize VAF information for cancer evolutionary inference are compressive, slow, or incorrectly specify the underlying cancer evolutionary dynamics. Here, we provide a proof-of-principle synthetic supervised learning method, TumE, that integrates simulated models of cancer evolution with Bayesian neural networks to infer ongoing selection in bulk-sequenced single tumour biopsies. Analyses in synthetic and patient tumours show that TumE significantly improves both accuracy and inference time per sample when detecting positive selection, deconvoluting selected subclonal populations, and estimating subclone frequency. Importantly, we show how transfer learning can leverage stored knowledge within TumE models for related evolutionary inference tasks, substantially reducing the data and computation time required for further model development and providing a library of recyclable deep learning models for the cancer evolution community. This extensible framework provides a foundation and future directions for harnessing progressive computational methods for the benefit of cancer genomics and, in turn, the cancer patient. TumE is publicly available for use at https://github.com/tomouellette/TumE.
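TumE's own models are Bayesian neural networks trained on simulated tumours and are not reproduced here. To make "VAF information" concrete, the minimal Python sketch below computes VAFs from read counts and applies a simpler, classic heuristic that is explicitly *not* TumE: under a neutral model of tumour growth, the cumulative number of subclonal mutations M(f) with frequency at least f is expected to grow linearly in 1/f, so a high R² for that linear fit is consistent with neutrality. The function names, the frequency window, and the use of R² are illustrative assumptions.

```python
from typing import List, Sequence


def vaf(alt_reads: Sequence[int], depth: Sequence[int]) -> List[float]:
    """Variant allele frequency per mutation: alt-supporting reads / total depth.
    Sites with zero depth are skipped."""
    return [a / d for a, d in zip(alt_reads, depth) if d > 0]


def neutral_fit_r2(vafs: Sequence[float], f_min: float = 0.1, f_max: float = 0.25) -> float:
    """R^2 of the linear fit M(f) ~ a + b * (1/f) within [f_min, f_max].

    Hypothetical helper: M(f) is the number of mutations with VAF >= f,
    evaluated at each observed VAF inside the window.
    """
    window = sorted((v for v in vafs if f_min <= v <= f_max), reverse=True)
    if len(window) < 3:
        raise ValueError("need at least 3 mutations in the frequency window")
    # With no ties, the i-th largest VAF has exactly i mutations at or above it
    # (ties would need extra handling in real data).
    xs = [1.0 / f for f in window]
    ys = [float(i + 1) for i in range(len(window))]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


# Synthetic VAFs spaced so that 1/f rises in equal steps: a perfectly "neutral" tail.
synthetic = [1.0 / (4.0 + 0.5 * i) for i in range(1, 12)]
print(round(neutral_fit_r2(synthetic), 6))  # 1.0
```

Real bulk data would also need purity and ploidy corrections before VAFs are comparable across samples; this sketch ignores both.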


2021, Vol 13 (9)
Author(s): George Sangster, Jolanda A Luksenburg

Authentic DNA sequences are crucial for reliable evolutionary inference. Concerns about the identification of DNA sequences have been voiced several times in the past, but few quantitative studies exist. Mitogenomes play important roles in phylogenetics, phylogeography, population genetics, and DNA identification. However, the large number of mitogenomes being published routinely, often in brief data papers, has raised questions about their authenticity. In this study, we quantify problematic mitogenomes of birds and their reuse in other papers. Of 1,876 complete or partial mitogenomes of birds published until January 1, 2020, the authenticity of 1,559 could be assessed with sequences of conspecifics. Of these, 78 (5.0%) were found to be problematic, including 45 curated reference sequences. Problems were due to misidentification (33), chimeras of two or three species (23), sequencing errors/numts (18), incorrect sequence assembly (1), mislabeling at GenBank but not in the final paper (2), or vice versa (1). The number of problematic mitogenomes has increased sharply since 2012. Worryingly, these problematic sequences have been reused 436 times in other papers, including 385 times in phylogenies. No less than 53% of all mitogenomic phylogenies/networks published until January 1, 2020 included at least one problematic mitogenome. Problematic mitogenomes have resulted in incorrect phylogenetic hypotheses and proposals for unwarranted taxonomic revision, and may have compromised comparative analyses and measurements of divergence times. Our results indicate that a major upgrade of quality control measures is warranted. We propose a comprehensive set of measures that may serve as a new standard for publishing mitogenome sequences.
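The abstract says authenticity was assessed against sequences of conspecifics but does not give the procedure. As a minimal sketch of that kind of check, assuming the sequences are already aligned to equal length, the Python below computes the uncorrected p-distance between a query mitogenome and each conspecific reference and flags the query if even its closest conspecific exceeds a cutoff. The 2% cutoff and the function names are hypothetical, not values from the study.

```python
from typing import Sequence


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance: proportion of differing sites among sites
    where neither aligned sequence has a gap ('-') or an ambiguous base ('N')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue  # skip gaps and ambiguous positions
        compared += 1
        diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def flag_problematic(query: str, conspecifics: Sequence[str], threshold: float = 0.02) -> bool:
    """Hypothetical rule: flag the query if its closest conspecific
    differs by more than `threshold` (an illustrative cutoff)."""
    return min(p_distance(query, ref) for ref in conspecifics) > threshold


print(p_distance("ACGT", "ACGA"))  # 0.25
print(flag_problematic("ACGTACGTAC", ["ACGTACGTAC", "ACGAACGTAC"]))  # False
```

A distance check alone would not separate the problem classes listed above (for example, a chimera may match conspecifics over part of its length), so it is at best a first-pass screen.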


2021
Author(s): Austin Alves Varela, Sammy Cheng, John Haynes Werren

Angiotensin-converting enzyme 2 (ACE2) is the human cell receptor that the coronavirus SARS-CoV-2 binds to and uses to enter and infect human cells. COVID-19, the pandemic disease caused by the coronavirus, involves diverse pathologies beyond those of a respiratory disease, including micro-thrombosis (micro-clotting), cytokine storms, and inflammatory responses affecting many organ systems. Longer-term chronic illness can persist for many months, often well after the pathogen is no longer detected. A better understanding of the proteins that ACE2 interacts with can reveal information relevant to these disease manifestations and possible avenues for treatment. We have taken a different approach to predicting candidate ACE2-interacting proteins, one that uses evolutionary inference to identify a set of mammalian proteins that "coevolve" with ACE2. The approach, called evolutionary rate correlation (ERC), detects proteins that show highly correlated evolutionary rates during mammalian evolution. Such proteins are candidates for biological interactions with the ACE2 receptor. The approach has uncovered a number of key ACE2 protein interactions of potential relevance to COVID-19 pathologies. Some proteins have previously been reported to be associated with severe COVID-19 but are not currently known to interact directly with ACE2, while additional predicted novel interactors with ACE2 are of potential relevance to the disease. Using reciprocal rankings of protein ERCs, we have identified strongly interconnected ACE2-associated protein networks relevant to COVID-19 pathologies. ACE2 has clear connections to coagulation pathway proteins, such as coagulation factor V and the fibrinogen components FGG, FGB, and FGA, the latter possibly mediated through ACE2 connections to Clusterin (which clears misfolded extracellular proteins) and GPR141 (whose functions are relatively unknown).
Additionally, ACE2 has connections to proteins involved in cytokine signaling and immune response (e.g. IFNAR2, XCR1, and TLR8) and to the androgen receptor (AR). The ERC prescreening approach has also elucidated possible functions for previously uncharacterized proteins and possible additional functions for well-characterized ones. Suggested validation approaches for ERC-predicted ACE2-interacting proteins are discussed. We propose that ACE2 has novel protein interactions that are disrupted during SARS-CoV-2 infection, contributing to the spectrum of COVID-19 pathologies.
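At its core, ERC as described here is a correlation between two proteins' evolutionary rates across the branches of a shared species tree. The minimal Python sketch below assumes branch lengths for each protein have already been estimated on the same fixed topology, converts each to a relative rate by dividing by a genome-wide reference length for that branch, and reports a Pearson correlation. The normalization, function names, and toy numbers are illustrative assumptions, not the authors' pipeline, which may normalize rates or compute the correlation differently.

```python
import math
from typing import List, Sequence


def relative_rates(branch_lengths: Sequence[float], reference: Sequence[float]) -> List[float]:
    """Divide each protein-specific branch length by a genome-wide reference
    length for the same branch (illustrative normalization)."""
    return [b / r for b, r in zip(branch_lengths, reference)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def erc(protein_a: Sequence[float], protein_b: Sequence[float],
        reference: Sequence[float]) -> float:
    """Correlation of relative evolutionary rates across shared branches."""
    return pearson(relative_rates(protein_a, reference),
                   relative_rates(protein_b, reference))


# Made-up branch lengths on five shared branches.
reference = [0.10, 0.20, 0.05, 0.30, 0.15]
protein_x = [0.12, 0.18, 0.09, 0.45, 0.12]
protein_y = [0.06, 0.09, 0.045, 0.225, 0.06]  # exactly half of protein_x
print(round(erc(protein_x, protein_y, reference), 6))  # 1.0
```

Normalizing by a genome-wide reference matters because branches that are simply long (more time elapsed) would otherwise make almost every pair of proteins look correlated.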


Author(s): David Gil, Yeshayahu Shen

Metaphors, a ubiquitous feature of human language, reflect mappings from one conceptual domain onto another. Although founded on bidirectional relations of similarity, their linguistic expression is typically unidirectional, governed by conceptual hierarchies pertaining to abstractness, animacy and prototypicality. The unidirectional nature of metaphors is a product of various asymmetries characteristic of grammatical structure, in particular, those related to thematic role assignment. This paper argues that contemporary metaphor unidirectionality is the outcome of an evolutionary journey whose origin lies in an earlier bidirectionality. Invoking the Complexity Covariance Hypothesis governing the correlation of linguistic and socio-political complexity, the Evolutionary Inference Principle suggests that simpler linguistic structures are evolutionarily prior to more complex ones, and accordingly that bidirectional metaphors evolved at an earlier stage than unidirectional ones. This paper presents the results of an experiment comparing the degree of metaphor unidirectionality in two languages: Hebrew and Abui (spoken by some 16 000 people on the island of Alor in Indonesia). The results of the experiment show that metaphor unidirectionality is significantly higher in Hebrew than in Abui. Whereas Hebrew is a national language, Abui is a regional language of relatively low socio-political complexity. In accordance with the Evolutionary Inference Principle, the lower degree of metaphor unidirectionality of Abui may be reconstructed to an earlier stage in the evolution of language. The evolutionary journey from bidirectionality to unidirectionality in metaphors argued for here may be viewed as part of a larger package, whereby the development of grammatical complexity in various domains is driven by the incremental increases in socio-political complexity that characterize the course of human prehistory.
This article is part of the theme issue ‘Reconstructing prehistoric languages’.


Author(s): David Gil

This paper proposes a Complexity Covariance Hypothesis, whereby linguistic complexity covaries with cultural and socio-political complexity, and argues for an Evolutionary Inference Principle, in accordance with which, in domains where linguistic complexity correlates positively with cultural/socio-political complexity, simpler linguistic structures are evolutionarily prior to their more complex counterparts. Applying this methodology in a case study, the covariance of linguistic and cultural/socio-political complexity is examined by means of a cross-linguistic survey of tense–aspect–mood (TAM) marking in a worldwide sample of 868 languages. A novel empirical finding emerges: all else being equal, languages from small language families tend to have optional TAM marking, while languages from large language families are more likely to exhibit obligatory TAM marking. Since optional TAM marking is simpler than obligatory TAM marking, it can, therefore, be inferred that optional TAM marking is evolutionarily prior to obligatory TAM marking: a living fossil. In conclusion, it is argued that the presence of obligatory TAM marking, correlated with the more highly grammaticalized expression of thematic-role assignment, is a reflection of a deeper property of grammatical organization, namely, the grammaticalization of predication. Thus, it is suggested that the development of agriculture and the demographic expansions it triggered, which led to the emergence of large language families, are a driving force in the evolution of predication in human language. This article is part of the theme issue ‘Reconstructing prehistoric languages’.


2021
Author(s): Kylie Chen, David Welch, Alexei J. Drummond

Single-cell sequencing provides a new way to explore the evolutionary history of cancers. Compared to traditional bulk sequencing, which samples multiple heterogeneous cells, single-cell sequencing isolates and amplifies genetic material from a single cell. This ability to isolate a single cell makes it ideal for evolutionary inference. However, single-cell data is more error-prone due to the limited genomic material available per cell. Previous work using single-cell data to reconstruct the evolutionary history of cancers has not been integrated with standard evolutionary models. Here, we present error and mutation models for evolutionary inference of single-cell data within a mature and extensible Bayesian framework, BEAST2. Our framework enables integration with biologically informative models such as relaxed molecular clocks and population dynamic models. We reconstruct the phylogenetic history for a myeloproliferative cancer patient and two colorectal cancer patients. We find that the estimated times of terminal splitting events are shifted forward in time compared to models that ignore errors. Furthermore, we estimate that 50% to 70% of the evolutionary distance between samples can be explained by sequencing error. Our simulation studies show that ignoring errors leads to inaccurate estimates of divergence times, mutation parameters and population parameters. Our work opens the potential for integrative Bayesian models capable of combining multiple sources of data.
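The paper's error models live inside BEAST2 and are not reproduced here. To illustrate the general idea of folding single-cell errors into a likelihood, the Python sketch below uses a common simplification that is our assumption, not the paper's model: binary mutation calls with a false-positive rate `alpha` and a false-negative (dropout) rate `beta`, independent across sites. It scores a cell's observed calls against a hypothesised true genotype vector.

```python
import math
from typing import Sequence


def p_observed(observed: int, true: int, alpha: float, beta: float) -> float:
    """P(observed call | true genotype) for binary calls (1 = mutated).

    alpha: false-positive rate, P(observe 1 | true 0)
    beta:  false-negative rate, P(observe 0 | true 1), e.g. allelic dropout
    """
    if true == 0:
        return alpha if observed == 1 else 1.0 - alpha
    return 1.0 - beta if observed == 1 else beta


def cell_log_likelihood(observed: Sequence[int], true: Sequence[int],
                        alpha: float, beta: float) -> float:
    """Sum of per-site log-probabilities, assuming independent errors across sites."""
    return sum(math.log(p_observed(o, t, alpha, beta)) for o, t in zip(observed, true))


obs = [1, 0, 1, 0]
truth = [1, 1, 1, 0]  # the true mutation at site 2 was missed (a dropout)
# A realistic dropout rate explains the missed call far better than a near-zero one.
print(cell_log_likelihood(obs, truth, alpha=0.01, beta=0.2))
print(cell_log_likelihood(obs, truth, alpha=0.01, beta=0.001))
```

An error-free model corresponds to `alpha = beta = 0`, under which any mismatch has probability zero; the tree would then have to add extra mutations or branch length to explain it, which is consistent with the abstract's point that ignoring errors distorts divergence-time and mutation estimates.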


2021
Author(s): Aparna Prasad, Eline D Lorenzen, Michael V Westbury

When a high-quality genome assembly of a target species is unavailable, one way to avoid the costly de novo assembly process is a mapping-based assembly. However, mapping shotgun data to a distant relative may lead to biased or erroneous evolutionary inference. Here, we used short-read data from a mammal and a bird species (beluga and rowi kiwi) to evaluate whether reference genome phylogenetic distance can impact downstream demographic (PSMC) and genetic diversity (heterozygosity, runs of homozygosity) analyses. We mapped to assemblies of species of varying phylogenetic distance (conspecific to genome-wide divergence of >7%), and to de novo assemblies created using cross-species scaffolding. We show that while reference genome phylogenetic distance has an impact on demographic analyses, it is not pronounced until using a reference genome with >3% divergence from the target species. When mapping to cross-species scaffolded assemblies, we are unable to replicate the original beluga demographic analyses, but we can for the rowi kiwi, presumably reflecting the more fragmented nature of the beluga assemblies. As for genetic diversity estimates, we find that increased phylogenetic distance has a pronounced impact; heterozygosity estimates deviate incrementally as phylogenetic distance increases. Moreover, runs of homozygosity are removed when mapping to any non-conspecific assembly. However, these biases can be reduced when mapping to a cross-species scaffolded assembly. Taken together, our results show that caution should be exercised when selecting the reference genome for mapping assemblies. Cross-species scaffolding may offer a way to avoid a costly, traditional de novo assembly, while still producing robust evolutionary inference.
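Two of the diversity summaries mentioned above can be stated compactly. The Python sketch below assumes genotypes have already been called against some reference and coded per callable site as 0 (homozygous) or 1 (heterozygous). It computes genome-wide heterozygosity as the fraction of heterozygous sites, and reports runs of homozygosity as stretches of consecutive windows whose heterozygosity falls at or below a cutoff. The window size, cutoff, minimum run length, and function names are hypothetical; dedicated tools use more elaborate models.

```python
from typing import List, Optional, Sequence, Tuple


def heterozygosity(het_calls: Sequence[int]) -> float:
    """Fraction of callable sites that are heterozygous (1 = het, 0 = hom)."""
    return sum(het_calls) / len(het_calls)


def runs_of_homozygosity(het_calls: Sequence[int], window: int = 1000,
                         max_het: float = 0.0005,
                         min_windows: int = 3) -> List[Tuple[int, int]]:
    """Return (start_site, end_site) for runs of >= min_windows consecutive
    windows whose heterozygosity is <= max_het. end_site is exclusive."""
    n_windows = len(het_calls) // window  # a trailing partial window is ignored
    low = [heterozygosity(het_calls[i * window:(i + 1) * window]) <= max_het
           for i in range(n_windows)]
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, is_low in enumerate(low + [False]):  # sentinel closes a final run
        if is_low and start is None:
            start = i
        elif not is_low and start is not None:
            if i - start >= min_windows:
                runs.append((start * window, i * window))
            start = None
    return runs


# Toy genome: 10 windows of 1,000 sites; windows 3 to 6 carry no heterozygous sites.
calls: List[int] = []
for w in range(10):
    block = [0] * 1000
    if not 3 <= w <= 6:
        block[::100] = [1] * 10  # 1% heterozygosity outside the run
    calls.extend(block)
print(heterozygosity(calls))        # 0.006
print(runs_of_homozygosity(calls))  # [(3000, 7000)]
```

The abstract's finding maps directly onto this sketch: spurious heterozygous calls from reads mapped to a divergent reference raise window heterozygosity above the cutoff, so genuine runs of homozygosity disappear.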


2021
Author(s): Tom W. Ouellette, Jim Shaw, Philip Awadalla

Quantifying evolutionary change among viral genomes is an important clinical tool for tracking critical adaptations geographically and temporally. We built image-based haplotype-guided evolutionary inference (ImHapE) to quantify adaptations in expanding populations of non-recombining SARS-CoV-2 genomes. By combining classic population genetic summaries with image-based deep learning methods, we show that different rates of positive selection are driving evolutionary fitness and dispersal of SARS-CoV-2 globally. A 1.35-fold increase in evolutionary fitness is observed within the UK, associated with expansion of both the B.1.177 and B.1.1.7 SARS-CoV-2 lineages.
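ImHapE's image-based deep learning component is not reproduced here, and the abstract does not say which population genetic summaries it uses. As one illustrative example of a classic summary, the Python sketch below computes haplotype frequencies and Nei's unbiased haplotype diversity, H = n/(n−1) · (1 − Σ pᵢ²), treating each aligned non-recombining genome as a single haplotype. The choice of statistic and the function names are assumptions.

```python
from collections import Counter
from typing import Dict, Sequence


def haplotype_frequencies(sequences: Sequence[str]) -> Dict[str, float]:
    """Relative frequency of each distinct haplotype (identical aligned sequence)."""
    counts = Counter(sequences)
    n = len(sequences)
    return {hap: c / n for hap, c in counts.items()}


def haplotype_diversity(sequences: Sequence[str]) -> float:
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum p_i^2)."""
    n = len(sequences)
    if n < 2:
        raise ValueError("need at least two sequences")
    freqs = haplotype_frequencies(sequences)
    return n / (n - 1) * (1.0 - sum(p * p for p in freqs.values()))


sample = ["ACGT", "ACGT", "ACTT", "ACTT"]
print(round(haplotype_diversity(sample), 4))  # 0.6667
```

A lineage sweeping under positive selection tends to pull diversity down, since one haplotype's frequency rises toward 1; that is the kind of signal such summaries can feed into a downstream model.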


2020
Author(s): Mackenzie M. Johnson, Claus O. Wilke

In many applications of evolutionary inference, a model of protein evolution needs to be fitted to the amino acid variation at individual sites in a multiple sequence alignment. Most existing models fall into one of two extremes: either they provide a coarse-grained description that lacks biophysical realism (e.g. dN/dS models), or they require a large number of parameters to be fitted (e.g. mutation–selection models). Here, we ask whether a middle ground is possible: Can we obtain a realistic description of site-specific amino acid frequencies while severely restricting the number of free parameters in the model? We show that a distribution with a single free parameter can accurately capture the variation in amino acid frequency at most sites in an alignment, as long as we are willing to restrict our analysis to predicting amino acid frequencies by rank rather than by amino acid identity. This result holds equally well for alignments of empirical protein sequences and for sequences evolved under a biophysically realistic all-atom force field. Our analysis reveals a near-universal shape of the frequency distributions of amino acids. This insight has the potential to lead to new models of evolution that have both increased realism and a limited number of free parameters.
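The abstract does not name the one-parameter distribution. Purely for illustration, the Python sketch below assumes frequencies decay exponentially with rank, p_k ∝ exp(−λ(k−1)) for ranks k = 1 to 20, and fits λ by maximum likelihood to the rank-ordered amino acid counts at one alignment column. For this assumed family the likelihood is maximized where the model's expected rank equals the observed mean rank, and because the expected rank decreases as λ grows, bisection finds it. The exponential form and all function names are assumptions, not the paper's model.

```python
import math
from typing import List, Sequence

N_RANKS = 20  # one rank per standard amino acid


def rank_probabilities(lam: float) -> List[float]:
    """Assumed model: p_k proportional to exp(-lam * (k - 1)), k = 1..20."""
    w = [math.exp(-lam * (k - 1)) for k in range(1, N_RANKS + 1)]
    z = sum(w)
    return [x / z for x in w]


def expected_rank(lam: float) -> float:
    return sum(k * p for k, p in enumerate(rank_probabilities(lam), start=1))


def fit_lambda(counts: Sequence[float], hi: float = 50.0, iters: int = 200) -> float:
    """Maximum-likelihood lam from amino acid counts (at most 20) at one site.

    Counts are sorted in decreasing order, so the most common residue is rank 1.
    lam = 0 is the uniform distribution, which has the largest possible mean rank.
    """
    ranked = sorted(counts, reverse=True) + [0.0] * (N_RANKS - len(counts))
    mean_rank = sum(k * c for k, c in enumerate(ranked, start=1)) / sum(ranked)
    lo = 0.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if expected_rank(mid) > mean_rank:
            lo = mid  # model still too flat: increase the decay rate
        else:
            hi = mid
    return (lo + hi) / 2


# Recover lam from counts generated exactly by the assumed model with lam = 0.5.
counts = [1000 * p for p in rank_probabilities(0.5)]
print(round(fit_lambda(counts), 4))  # 0.5
```

Working with ranks rather than identities is what makes a single parameter plausible here: which amino acid is most common varies from site to site, but how steeply frequencies fall off with rank can be summarized by one number per site.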

