The Interaction between Base Compositional Heterogeneity and Among-Site Rate Variation in Models of Molecular Evolution

ISRN Evolutionary Biology ◽

10.5402/2013/391561 ◽

2013 ◽

Vol 2013 ◽

pp. 1-8 ◽

Cited By ~ 3

Author(s):

Nathan C. Sheffield

Keyword(s):

Bayesian Inference ◽

Molecular Evolution ◽

Phylogenetic Inference ◽

Heterogeneous Data ◽

Rate Variation ◽

Compositional Heterogeneity ◽

Heterogeneous Datasets ◽

Compositional Homogeneity

Many commonly used models of molecular evolution assume homogeneous nucleotide frequencies. A deviation from this assumption has been shown to cause problems for phylogenetic inference. However, some claim that only extreme heterogeneity affects phylogenetic accuracy and suggest that violations of other model assumptions, such as variable rates among sites, are more problematic. In order to explore the interaction between compositional heterogeneity and variable rates among sites, I reanalyzed 3 real heterogeneous datasets using several models. My Bayesian inference recovers accurate topologies under variable rates-among-sites models, but fails under some models that account for compositional heterogeneity. I also ran simulations and found that accounting for rates among sites improves topology accuracy in compositionally heterogeneous data. This indicates that in some cases, models accounting for among-site rate variation can improve outcomes for data that violates the assumption of compositional homogeneity.
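As a rough illustration of what the homogeneity assumption means in practice, the sketch below tabulates base counts per taxon and computes a chi-square statistic for equal composition across taxa. It is not the method used in the paper; the function names and the toy alignments are hypothetical, and the statistic ignores the phylogenetic non-independence of the sequences, so it should be read as a screening heuristic only.

```python
from collections import Counter

BASES = "ACGT"


def base_counts(seq):
    """Count A, C, G, T in one sequence, ignoring gaps and ambiguity codes."""
    counts = Counter(ch for ch in seq.upper() if ch in BASES)
    return [counts[b] for b in BASES]


def chi_square_homogeneity(alignment):
    """Chi-square statistic for equal base composition across taxa.

    alignment: dict mapping a (hypothetical) taxon name to its sequence.
    Returns (statistic, degrees_of_freedom).
    """
    table = {name: base_counts(seq) for name, seq in alignment.items()}
    col_totals = [sum(row[j] for row in table.values()) for j in range(len(BASES))]
    grand_total = sum(col_totals)
    if grand_total == 0:
        raise ValueError("alignment contains no unambiguous bases")

    stat = 0.0
    for row in table.values():
        row_total = sum(row)
        for j in range(len(BASES)):
            # Expected count if every taxon shared the pooled composition.
            expected = row_total * col_totals[j] / grand_total
            if expected > 0:
                stat += (row[j] - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(BASES) - 1)
    return stat, dof


# Toy data, invented for illustration only.
homogeneous = {"taxon1": "ACGT" * 5, "taxon2": "ACGT" * 5}
skewed = {"taxon1": "AAAA", "taxon2": "TTTT"}
print(chi_square_homogeneity(homogeneous))
print(chi_square_homogeneity(skewed))
```

A large statistic relative to the degrees of freedom suggests that the taxa differ in composition, which is the situation the abstract describes as a violation of compositional homogeneity.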

When phylogenetic assumptions are violated: base compositional heterogeneity and among-site rate variation in beetle mitochondrial phylogenomics

Systematic Entomology ◽

10.1111/j.1365-3113.2009.00517.x ◽

2010 ◽

Vol 35 (3) ◽

pp. 429-448 ◽

Cited By ~ 92

Author(s):

Hojun Song ◽

Nathan C. Sheffield ◽

Stephen L. Cameron ◽

Kelly B. Miller ◽

Michael F. Whiting

Keyword(s):

Rate Variation ◽

Compositional Heterogeneity

Ribosomal DNA: Molecular Evolution and Phylogenetic Inference

The Quarterly Review of Biology ◽

10.1086/417338 ◽

1991 ◽

Vol 66 (4) ◽

pp. 411-453 ◽

Cited By ~ 1486

Author(s):

David M. Hillis ◽

Michael T. Dixon

Keyword(s):

Molecular Evolution ◽

Ribosomal DNA ◽

Phylogenetic Inference

Rates and patterns of molecular evolution in marine animals following the Isthmian emergence

The Paleontological Society Special Publications ◽

10.1017/s2475262200006286 ◽

1992 ◽

Vol 6 ◽

pp. 68-68

Author(s):

Timothy Collins

Keyword(s):

Molecular Evolution ◽

Time Scale ◽

Published Data ◽

Rate Variation ◽

Marine Animals ◽

Sequence Comparisons ◽

Western Atlantic ◽

Taxonomic Groups ◽

A Site ◽

Rates Of Molecular Evolution

The marine vicariant event resulting from the Pliocene emergence of the Central American Isthmus presents a unique opportunity for calibrating rates of molecular evolution. The synchronous fragmentation of the ranges of previously widespread taxa into Western Atlantic and Eastern Pacific components (geminates) enables one to make comparisons of rates among higher taxa on the same time scale and to evaluate the regularity of rates of molecular evolution among all species sampled. Other advantages of this approach are that the time scale (approximately 3 Ma) is one of particular interest for evolutionary biologists concerned with speciation and one that minimizes the ambiguities associated with augmentation of divergence values to account for multiple hits at a site. The divergence values derived for geminate pairs are independent, allowing statistical evaluation of variance in rates. The current popularity of the relative rates test as the final arbiter of questions regarding rates and rate variation is primarily a matter of convenience and not a reflection of methodological superiority. A review of the commonly used techniques for calibrating rates of molecular evolution shows that each approach has limitations. Temporally based calibrations of rates are necessary complements to time-independent comparisons. Interpretation of transisthmian molecular comparisons in the literature has in many cases been unduly influenced and confused by molecular clock assumptions and the restriction of studies to single higher-level taxa. Analysis of the apparently contradictory published data as well as new results from sequence comparisons of fishes, urchins and snails suggests a synthesis: taxon-specific rates of molecular evolution, with reduced variance within taxonomic groups and great variance among all groups sampled.
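The calibration logic described above can be sketched numerically. The example below assumes the Jukes-Cantor (JC69) correction as one common way of augmenting an observed divergence for multiple hits; the abstract does not say which correction is meant, and the function names and the input value of 0.10 are hypothetical. The 3 Ma split time is the approximate figure given in the abstract.

```python
import math


def jc69_distance(p):
    """Jukes-Cantor corrected distance from an observed proportion of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("JC69 is defined only for 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def per_lineage_rate(distance, split_time_myr):
    """Substitutions per site per million years along each lineage.

    The pairwise distance accumulates along both lineages since the split,
    hence the factor of two.
    """
    return distance / (2.0 * split_time_myr)


observed_p = 0.10   # hypothetical observed divergence for one geminate pair
split_time = 3.0    # approximate age of the Isthmus in Ma, from the abstract

d = jc69_distance(observed_p)
print(d, per_lineage_rate(d, split_time))
```

At small divergences the corrected and observed values are close, which is consistent with the abstract's point that a recent calibration minimizes ambiguity from multiple-hit corrections.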

Bayesian inference for finite mixtures of univariate and multivariate skew-normal and skew-t distributions

Biostatistics ◽

10.1093/biostatistics/kxp062 ◽

2010 ◽

Vol 11 (2) ◽

pp. 317-336 ◽

Cited By ~ 97

Author(s):

Sylvia Frühwirth-Schnatter ◽

Saumyadipta Pyne

Keyword(s):

Bayesian Inference ◽

Mixture Models ◽

Random Effects ◽

Data Augmentation ◽

Heterogeneous Data ◽

High Dimensional ◽

Finite Mixtures ◽

Truncated Normal ◽

T Distribution ◽

Skew Normal

Skew-normal and skew-t distributions have proved to be useful for capturing skewness and kurtosis in data directly without transformation. Recently, finite mixtures of such distributions have been considered as a more general tool for handling heterogeneous data involving asymmetric behaviors across subpopulations. We consider such mixture models for both univariate as well as multivariate data. This allows robust modeling of high-dimensional multimodal and asymmetric data generated by popular biotechnological platforms such as flow cytometry. We develop Bayesian inference based on data augmentation and Markov chain Monte Carlo (MCMC) sampling. In addition to the latent allocations, data augmentation is based on a stochastic representation of the skew-normal distribution in terms of a random-effects model with truncated normal random effects. For finite mixtures of skew normals, this leads to a Gibbs sampling scheme that draws from standard densities only. This MCMC scheme is extended to mixtures of skew-t distributions based on representing the skew-t distribution as a scale mixture of skew normals. As an important application of our new method, we demonstrate how it provides a new computational framework for automated analysis of high-dimensional flow cytometric data. Using multivariate skew-normal and skew-t mixture models, we could model non-Gaussian cell populations rigorously and directly without transformation or projection to lower dimensions.
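The stochastic representation mentioned above can be illustrated for the univariate case. The sketch below draws skew-normal variates as a location plus a scaled sum of a half-normal (truncated normal) random effect and an independent normal error, which is the standard representation of the univariate skew-normal distribution. It shows only this sampling identity, not the paper's Gibbs sampler; the parameter names `xi`, `omega`, and `alpha` and the function name are assumptions made for this example.

```python
import math
import random


def sample_skew_normal(xi, omega, alpha, rng):
    """Draw one skew-normal variate with location xi, scale omega, shape alpha.

    Uses X = xi + omega * (delta * |Z0| + sqrt(1 - delta^2) * Z1), where
    Z0 and Z1 are independent standard normals and
    delta = alpha / sqrt(1 + alpha^2).
    """
    delta = alpha / math.sqrt(1.0 + alpha * alpha)
    random_effect = abs(rng.gauss(0.0, 1.0))  # normal truncated to [0, inf)
    error = rng.gauss(0.0, 1.0)
    return xi + omega * (delta * random_effect + math.sqrt(1.0 - delta * delta) * error)


rng = random.Random(0)
draws = [sample_skew_normal(0.0, 1.0, 5.0, rng) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)

# Theoretical mean of the skew-normal: xi + omega * delta * sqrt(2 / pi).
delta = 5.0 / math.sqrt(26.0)
print(sample_mean, delta * math.sqrt(2.0 / math.pi))
```

Conditioning on the half-normal random effect leaves an ordinary normal model, which is why a data-augmentation scheme built on this representation can draw from standard densities only.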

The genome as a life-history character: why rate of molecular evolution varies between mammal species

Philosophical Transactions of the Royal Society B Biological Sciences ◽

10.1098/rstb.2011.0014 ◽

2011 ◽

Vol 366 (1577) ◽

pp. 2503-2513 ◽

Cited By ~ 121

Author(s):

Lindell Bromham

Keyword(s):

Life History ◽

Molecular Evolution ◽

DNA Sequences ◽

Small Body ◽

Rate Variation ◽

Mammal Species ◽

Range Restriction ◽

High Fecundity ◽

Rate Of Molecular Evolution ◽

The Impact

DNA sequences evolve at different rates in different species. This rate variation has been most closely examined in mammals, revealing a large number of characteristics that can shape the rate of molecular evolution. Many of these traits are part of the mammalian life-history continuum: species with small body size, rapid generation turnover, high fecundity and short lifespans tend to have faster rates of molecular evolution. In addition, rate of molecular evolution in mammals might be influenced by behaviour (such as mating system), ecological factors (such as range restriction) and evolutionary history (such as diversification rate). I discuss the evidence for these patterns of rate variation, and the possible explanations of these correlations. I also consider the impact of these systematic patterns of rate variation on the reliability of the molecular date estimates that have been used to suggest a Cretaceous radiation of modern mammals, before the final extinction of the dinosaurs.

Rate variation during molecular evolution: creationism and the cytochrome c molecular clock

Evolution Education and Outreach ◽

10.1186/s12052-017-0064-4 ◽

2017 ◽

Vol 10 (1) ◽

Cited By ~ 1

Author(s):

James R. Hofmann

Keyword(s):

Molecular Evolution ◽

Cytochrome C ◽

Molecular Clock ◽

Rate Variation

Diffusion enables integration of heterogeneous data and user-driven learning in a desktop knowledge-base

PLoS Computational Biology ◽

10.1371/journal.pcbi.1009283 ◽

2021 ◽

Vol 17 (8) ◽

pp. e1009283

Author(s):

Tomasz Konopka ◽

Sandra Ng ◽

Damian Smedley

Keyword(s):

Knowledge Base ◽

Nearest Neighbor ◽

Heterogeneous Data ◽

Data Types ◽

Specific Knowledge ◽

Domain Specific ◽

Heterogeneous Datasets ◽

Domain Specific Knowledge ◽

High Throughput Experiments ◽

Gene Symbols

Integrating reference datasets (e.g. from high-throughput experiments) with unstructured and manually-assembled information (e.g. notes or comments from individual researchers) has the potential to tailor bioinformatic analyses to specific needs and to lead to new insights. However, developing bespoke analysis pipelines from scratch is time-consuming, and general tools for exploring such heterogeneous data are not available. We argue that by treating all data as text, a knowledge-base can accommodate a range of bioinformatic data types and applications. We show that a database coupled to nearest-neighbor algorithms can address common tasks such as gene-set analysis as well as specific tasks such as ontology translation. We further show that a mathematical transformation motivated by diffusion can be effective for exploration across heterogeneous datasets. Diffusion enables the knowledge-base to begin with a sparse query, impute more features, and find matches that would otherwise remain hidden. This can be used, for example, to map multi-modal queries consisting of gene symbols and phenotypes to descriptions of diseases. Diffusion also enables user-driven learning: when the knowledge-base cannot provide satisfactory search results in the first instance, users can improve the results in real-time by adding domain-specific knowledge. User-driven learning has implications for data management, integration, and curation.

On the shoulder of giants: Mitogenome recovery from non‐targeted genome projects for phylogenetic inference and molecular evolution studies

Journal of Zoological Systematics & Evolutionary Research ◽

10.1111/jzs.12415 ◽

2020 ◽

Vol 59 (1) ◽

pp. 5-30

Author(s):

Silvia Adrián‐Serrano ◽

Jesus Lozano‐Fernandez ◽

Joan Pons ◽

Julio Rozas ◽

Miquel A. Arnedo

Keyword(s):

Molecular Evolution ◽

Phylogenetic Inference ◽

Genome Projects

Evaluation of k-nearest neighbour classifier performance for heterogeneous data sets

SN Applied Sciences ◽

10.1007/s42452-019-1356-9 ◽

2019 ◽

Vol 1 (12) ◽

Cited By ~ 10

Author(s):

Najat Ali ◽

Daniel Neagu ◽

Paul Trundle

Keyword(s):

Binary Data ◽

Similarity Measures ◽

Numerical Data ◽

Test Sample ◽

Heterogeneous Data ◽

Data Sets ◽

Nearest Neighbour ◽

Classification Problems ◽

Heterogeneous Datasets ◽

Nearest Neighbour Classifier

Distance-based algorithms are widely used for data classification problems. The k-nearest neighbour classification (k-NN) is one of the most popular distance-based algorithms. This classification is based on measuring the distances between the test sample and the training samples to determine the final classification output. The traditional k-NN classifier works naturally with numerical data. The main objective of this paper is to investigate the performance of k-NN on heterogeneous datasets, where data can be described as a mixture of numerical and categorical features. For the sake of simplicity, this work considers only one type of categorical data, which is binary data. In this paper, several similarity measures are defined by combining well-known distances for numerical and binary data, and they are used to investigate k-NN performance in classifying such heterogeneous data sets. The experiments used six heterogeneous datasets from different domains and two categories of measures. Experimental results showed that the proposed measures performed better for heterogeneous data than Euclidean distance, and that the challenges raised by the nature of heterogeneous data need personalised similarity measures adapted to the data characteristics.
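The idea of combining a numerical distance with a binary one can be sketched as follows. This example uses a weighted sum of Euclidean distance (numerical part) and normalised Hamming distance (binary part) inside a plain majority-vote k-NN. The weighting scheme, the function names, and the toy data are assumptions for illustration and are not the specific measures proposed in the paper; numerical features are assumed to be scaled to a comparable range beforehand.

```python
import math
from collections import Counter


def mixed_distance(a, b, w=0.5):
    """Weighted sum of Euclidean (numerical) and Hamming (binary) distances.

    Each sample is a pair (numerical_features, binary_features).
    """
    num_a, bin_a = a
    num_b, bin_b = b
    euclid = math.sqrt(sum((x - y) ** 2 for x, y in zip(num_a, num_b)))
    hamming = sum(x != y for x, y in zip(bin_a, bin_b)) / max(len(bin_a), 1)
    return w * euclid + (1.0 - w) * hamming


def knn_predict(train, query, k=3, w=0.5):
    """Majority vote among the k training samples closest to the query."""
    neighbours = sorted(train, key=lambda item: mixed_distance(item[0], query, w))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Toy training set, invented for illustration: ((numerical, binary), label).
train = [
    (([0.0, 0.1], [0, 0]), "a"),
    (([0.1, 0.0], [0, 1]), "a"),
    (([1.0, 1.0], [1, 1]), "b"),
    (([0.9, 1.1], [1, 1]), "b"),
    (([1.1, 0.9], [1, 0]), "b"),
]
print(knn_predict(train, ([0.05, 0.05], [0, 0])))
print(knn_predict(train, ([1.0, 1.0], [1, 1])))
```

The weight `w` controls how much the numerical part dominates; choosing it, or replacing the weighted sum with another combination, is exactly the kind of data-dependent decision the abstract argues for.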

Unusually Expanded SSU Ribosomal DNA of Primary Osmotrophic Euglenids: Molecular Evolution and Phylogenetic Inference

Journal of Molecular Evolution ◽

10.1007/s00239-002-2371-8 ◽

2002 ◽

Vol 55 (6) ◽

pp. 757-767 ◽

Cited By ~ 15

Author(s):

Ingo Busse ◽

Angelika Preisfeld

Keyword(s):

Molecular Evolution ◽

Ribosomal DNA ◽

Phylogenetic Inference
