Neglecting model selection alters phylogenetic inference

2019 ◽  
Author(s):  
Michael Gerth

ABSTRACT Molecular phylogenetics is a standard tool in modern biology that informs the evolutionary history of genes, organisms, and traits, and as such is important in a wide range of disciplines from medicine to palaeontology. Maximum likelihood phylogenetic reconstruction involves assumptions about the evolutionary processes that underlie the dataset to be analysed. These assumptions must be specified in the form of an evolutionary model, and a number of criteria may be used to identify the best-fitting model from a plethora of available models of DNA evolution. Using many empirical and simulated nucleotide sequence alignments, Abadi et al.1 have recently found that phylogenetic inferences using best models identified by six different model selection criteria are, on average, very similar to each other. They further claimed that using the model GTR+I+G4 without prior model-fitting results in similarly accurate phylogenetic estimates, and consequently that skipping model selection entirely has no negative impact on many phylogenetic applications. Focussing on this claim, I here revisit and re-analyse some of the data put forward by Abadi et al. I argue that while the presented analyses are sound, the results are misrepresented and in fact, in line with previous work, demonstrate that model selection consistently leads to different phylogenetic estimates compared with using fixed models.

2021 ◽  
Author(s):  
Stephanie J Spielman ◽  
Molly Miraglia

Multiple sequence alignments (MSAs) represent the fundamental unit of data inputted to most comparative sequence analyses. In phylogenetic analyses in particular, errors in MSA construction have the potential to induce further errors in downstream analyses such as phylogenetic reconstruction itself, ancestral state reconstruction, and divergence estimation. In addition to providing phylogenetic methods with an MSA to analyze, researchers must also specify a suitable evolutionary model for the given analysis. Most commonly, researchers apply relative model selection to select a model from a candidate set and then provide both the MSA and the selected model as input to subsequent analyses. While the influence of MSA errors has been explored for most stages of phylogenetics pipelines, the potential effects of MSA uncertainty on the relative model selection procedure itself have not been explored. In this study, we assessed the consistency of relative model selection when presented with multiple perturbed versions of a given MSA. We find that while relative model selection is mostly robust to MSA uncertainty, in a substantial proportion of circumstances, relative model selection identifies distinct best-fitting models from different MSAs created from the same set of sequences. We find that this issue is more pervasive for nucleotide data compared to amino-acid data. However, we also find that it is challenging to predict whether relative model selection will be robust or sensitive to uncertainty in a given MSA. We find that MSA uncertainty can affect virtually all steps of phylogenetic analysis pipelines to a greater extent than has previously been recognized, including relative model selection.


Insects ◽  
2021 ◽  
Vol 12 (6) ◽  
pp. 518
Author(s):  
Bronwyn Egan ◽  
Zwannda Nethavhani ◽  
Barbara van Asch

Macrotermes termites play important ecological roles and are consumed by many communities as a delicacy and dietary complement throughout Africa. However, lack of reliable morphological characters has hampered studies of Macrotermes diversity in a wide range of scientific fields including ecology, phylogenetics and food science. In order to place our preliminary assessment of the diversity of Macrotermes in South Africa in context, we analysed a comprehensive dataset of COI sequences for African species including new and publicly available data. Phylogenetic reconstruction and estimates of genetic divergence showed a high level of incongruity between species names and genetic groups, as well as several instances of cryptic diversity. We identified three main clades and 17 genetic groups in the dataset. We propose that this structure be used as a background for future surveys of Macrotermes diversity in Africa, thus mitigating the negative impact of the present taxonomic uncertainties in the genus. The new specimens collected in Limpopo fell into four distinct genetic groups, suggesting that the region harbours remarkable Macrotermes diversity relative to other African regions surveyed in previous studies. This work shows that African Macrotermes have been understudied across the continent, and that the genus contains cryptic diversity undetectable by classic taxonomy. Furthermore, these results may inform future taxonomic revisions in Macrotermes, thus contributing to advances in termitology.


2021 ◽  
Author(s):  
Caitlin Cherryh ◽  
Bui Quang Minh ◽  
Rob Lanfear

Abstract Most phylogenetic analyses assume that the evolutionary history of an alignment (either that of a single locus, or of multiple concatenated loci) can be described by a single bifurcating tree, the so-called treelikeness assumption. Treelikeness can be violated by biological events such as recombination, introgression, or incomplete lineage sorting, and by systematic errors in phylogenetic analyses. The incorrect assumption of treelikeness may then mislead phylogenetic inferences. To quantify and test for treelikeness in alignments, we develop a test statistic which we call the tree proportion. This statistic quantifies the proportion of the edge weights in a phylogenetic network that are represented in a bifurcating phylogenetic tree of the same alignment. We extend this statistic to a statistical test of treelikeness using a parametric bootstrap. We use extensive simulations to compare tree proportion to a range of related approaches. We show that tree proportion successfully identifies non-treelikeness in a wide range of simulation scenarios, and discuss its strengths and weaknesses compared to other approaches. The power of the tree-proportion test to reject non-treelike alignments can be lower than some other approaches, but these approaches tend to be limited in their scope and/or the ease with which they can be interpreted. Our recommendation is to test treelikeness of sequence alignments with both tree proportion and mosaic methods such as 3Seq. The scripts necessary to replicate this study are available at https://github.com/caitlinch/treelikeness
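The definition of the tree proportion given in the abstract above can be illustrated with a minimal Python sketch. It assumes the network has already been summarised as a set of weighted splits and that the splits of the bifurcating tree are known; how the authors obtain either is not described here, and the function name, the split encoding, and the example weights are all hypothetical.

```python
from typing import Dict, FrozenSet, Set

# Assumption: each split is encoded by one side of the bipartition of taxa,
# using the same side consistently for the network and for the tree.
Split = FrozenSet[str]


def tree_proportion(network_split_weights: Dict[Split, float],
                    tree_splits: Set[Split]) -> float:
    """Share of the network's total split weight carried by splits in the tree."""
    total = sum(network_split_weights.values())
    if total <= 0:
        raise ValueError("network must have positive total split weight")
    in_tree = sum(weight for split, weight in network_split_weights.items()
                  if split in tree_splits)
    return in_tree / total


# Made-up example: two conflicting splits, only one of which fits the tree.
weights = {frozenset({"A", "B"}): 3.0, frozenset({"A", "C"}): 1.0}
tree = {frozenset({"A", "B"})}
print(tree_proportion(weights, tree))
```

Under this reading, a value near 1 means almost all of the signal in the network is compatible with the tree, and lower values indicate more conflicting signal. The parametric bootstrap test mentioned in the abstract is not sketched here.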


Author(s):  
Mohamed Saleh

This chapter investigates a long-standing puzzle in the economic history of the Middle East and North Africa (MENA) region: why do MENA’s native non-Muslim minorities have better socioeconomic (SES) outcomes than the Muslim majority, both historically and today? Focusing on the case of Coptic Christians in Egypt, the largest non-Muslim minority in absolute number in the region, and employing a wide range of novel archival data sources, the chapter argues that Copts’ superior SES can be explained neither by Islam’s negative impact on Muslims’ SES (where Islam is defined as a set of beliefs or institutions) nor by colonization’s preferential treatment of Copts. Instead, the chapter traces the phenomenon to self-selection on SES during Egypt’s historical conversion from Coptic Christianity to Islam in the aftermath of the Arab Conquest of the then-Coptic Egypt in 641 CE. The argument is that the regressivity-in-income of the poll tax on non-Muslims (initially all Egyptians) that was imposed continuously from 641 to 1856 led to the shrinkage of (non-convert) Copts into a better-off minority. The Coptic-Muslim SES gap then persisted due to group restrictions on access to white-collar and artisanal skills. The chapter opens new areas of research on non-Muslim minorities in the MENA region and beyond.


2019 ◽  
Author(s):  
Stephanie J. Spielman

Abstract It is regarded as best practice in phylogenetic reconstruction to perform relative model selection to determine an appropriate evolutionary model for the data. This procedure ranks a set of candidate models according to their goodness-of-fit to the data, commonly using an information theoretic criterion. Users then specify the best-ranking model for inference. While it is often assumed that better-fitting models translate to increased accuracy, recent studies have shown that the specific model employed may not substantially affect inferences. We examine whether there is a systematic relationship between relative model fit and topological inference accuracy in protein phylogenetics, using simulations and real sequences. Simulations employed site-heterogeneous mechanistic codon models that are distinct from protein-level phylogenetic inference models. This strategy allows us to investigate how protein models perform when they are mis-specified to the data, as will be the case for any real sequence analysis. We broadly find that phylogenies inferred across models with vastly different fits to the data produce highly consistent topologies. We additionally find that all models infer similar proportions of false positive splits, raising the possibility that all available models of protein evolution are similarly misspecified. Moreover, we find that the parameter-rich GTR model, whose amino-acid exchangeabilities are free parameters, performs similarly to models with fixed exchangeabilities, although the inference precision associated with GTR models was not examined. We conclude that, while relative model selection may not hinder phylogenetic analysis on protein data, it may not offer specific predictable improvements and is not a reliable proxy for accuracy.
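The ranking step described in the abstract above can be sketched in Python using the two most common information criteria, AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where k is the number of free parameters, ln L the maximised log-likelihood, and n the number of alignment sites. This is only an illustration: the function names are hypothetical, the log-likelihoods are made-up numbers rather than results from any study, and the parameter counts omit branch lengths on the assumption that they are shared by all candidate models.

```python
import math
from typing import Dict, List, Tuple


def information_criterion(log_lik: float, n_params: int, n_sites: int,
                          criterion: str = "aic") -> float:
    """Return the AIC or BIC score for one fitted model (lower is better)."""
    if criterion == "aic":
        return 2 * n_params - 2 * log_lik
    if criterion == "bic":
        return n_params * math.log(n_sites) - 2 * log_lik
    raise ValueError(f"unknown criterion: {criterion}")


def rank_models(fits: Dict[str, Tuple[float, int]], n_sites: int,
                criterion: str = "aic") -> List[Tuple[str, float]]:
    """Sort candidate models from best (lowest score) to worst."""
    scores = [(name, information_criterion(log_lik, k, n_sites, criterion))
              for name, (log_lik, k) in fits.items()]
    return sorted(scores, key=lambda item: item[1])


# Illustrative values only: model name -> (log-likelihood, free parameters).
fits = {"JC": (-1050.0, 0), "HKY": (-1000.0, 4), "GTR": (-998.0, 8)}
for name, score in rank_models(fits, n_sites=500, criterion="aic"):
    print(name, score)
```

In this toy example the richest model has the highest likelihood but is not ranked first, because both criteria penalise the extra parameters. Which criterion to use, and whether the top-ranked model yields a more accurate tree, is exactly what the abstract calls into question.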


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Stephanie J. Spielman ◽  
Molly L. Miraglia

Abstract Background: Multiple sequence alignments (MSAs) represent the fundamental unit of data inputted to most comparative sequence analyses. In phylogenetic analyses in particular, errors in MSA construction have the potential to induce further errors in downstream analyses such as phylogenetic reconstruction itself, ancestral state reconstruction, and divergence time estimation. In addition to providing phylogenetic methods with an MSA to analyze, researchers must also specify a suitable evolutionary model for the given analysis. Most commonly, researchers apply relative model selection to select a model from a candidate set and then provide both the MSA and the selected model as input to subsequent analyses. While the influence of MSA errors has been explored for most stages of phylogenetics pipelines, the potential effects of MSA uncertainty on the relative model selection procedure itself have not been explored. Results: We assessed the consistency of relative model selection when presented with multiple perturbed versions of a given MSA. We find that while relative model selection is mostly robust to MSA uncertainty, in a substantial proportion of circumstances, relative model selection identifies distinct best-fitting models from different MSAs created from the same set of sequences. We find that this issue is more pervasive for nucleotide data compared to amino-acid data. However, we also find that it is challenging to predict whether relative model selection will be robust or sensitive to uncertainty in a given MSA. Conclusions: We find that MSA uncertainty can affect virtually all steps of phylogenetic analysis pipelines to a greater extent than has previously been recognized, including relative model selection.
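One simple way to express the consistency described in the abstract above is the fraction of perturbed MSAs for which model selection returns the most frequently chosen model. The Python sketch below uses that measure as an assumption for illustration; it is not necessarily the measure used in the study, and the function name and example model labels are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def selection_consistency(best_models: List[str]) -> Tuple[str, float]:
    """Return the modal best-fitting model and the share of MSAs selecting it.

    `best_models` holds one selected model name per perturbed MSA variant.
    Ties are broken by first appearance, following Counter.most_common.
    """
    if not best_models:
        raise ValueError("need at least one model selection result")
    modal_model, count = Counter(best_models).most_common(1)[0]
    return modal_model, count / len(best_models)


# Made-up example: four perturbed alignments built from one set of sequences.
print(selection_consistency(["GTR+G", "GTR+G", "HKY+G", "GTR+G"]))
```

A share of 1.0 would mean that model selection is fully robust to the perturbations for that set of sequences, while lower values indicate that different alignments of the same sequences lead to different best-fitting models.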


2021 ◽  
Vol 15 (02) ◽  
pp. 280-288
Author(s):  
Silvia Angeletti ◽  
Domenico Benvenuto ◽  
Marta Fogolari ◽  
Cecilia De Flora ◽  
Giancarlo Ceccarelli ◽  
...  

Introduction: Salivirus (SalV) represents an emerging problem in public health, especially in recent years. In this study, the Bayesian evolutionary history and the spread of the virus through different countries are reported. Methodology: a dataset of 81 sequences of the SalV structural VP1 fragment was downloaded from GenBank, aligned and manually edited with Bioedit Software. ModelTest v. 3.7 software was used to estimate the simplest evolutionary model fitting the sequence dataset. A Maximum-Likelihood tree was generated using MEGA-X to test the “clockliness” signal using TempEst 1.5.1. The Bayesian phylogenetic tree was built with BEAST. Homology modelling was performed with SWISS-Model and protein variability was evaluated with the ConSurf server. Results: the phylogenetic tree showed a clade of SalV A2 and three main clades of SalV A1, revealing several infections in humans in South Korea, India, Tunisia, China, Nigeria, Ethiopia and the USA. The Bayesian maximum clade credibility tree and the time of the most recent common ancestor dated the root of the tree to the year 1788, with a probable origin in the USA. Selective pressure analysis revealed two positively selected sites, His at position 100 and Leu at position 116, which homology modelling indicated to be important for protein stability and variability. This could contribute to the development of new mutations modifying the clinical features of this evolving virus. Conclusions: Bayesian phylogenetics and phylodynamics represent a useful tool to follow the transmission dynamics of SalV and to prevent new epidemics worldwide.


2020 ◽  
Vol 37 (7) ◽  
pp. 2110-2123 ◽  
Author(s):  
Stephanie J Spielman

Abstract It is regarded as best practice in phylogenetic reconstruction to perform relative model selection to determine an appropriate evolutionary model for the data. This procedure ranks a set of candidate models according to their goodness of fit to the data, commonly using an information theoretic criterion. Users then specify the best-ranking model for inference. Although it is often assumed that better-fitting models translate to increased accuracy, recent studies have shown that the specific model employed may not substantially affect inferences. We examine whether there is a systematic relationship between relative model fit and topological inference accuracy in protein phylogenetics, using simulations and real sequences. Simulations employed site-heterogeneous mechanistic codon models that are distinct from protein-level phylogenetic inference models, allowing us to investigate how protein models perform when they are misspecified to the data, as will be the case for any real sequence analysis. We broadly find that phylogenies inferred across models with vastly different fits to the data produce highly consistent topologies. We additionally find that all models infer similar proportions of false-positive splits, raising the possibility that all available models of protein evolution are similarly misspecified. Moreover, we find that the parameter-rich GTR (general time reversible) model, whose amino acid exchangeabilities are free parameters, performs similarly to models with fixed exchangeabilities, although the inference precision associated with GTR models was not examined. We conclude that, although relative model selection may not hinder phylogenetic analysis on protein data, it may not offer specific predictable improvements and is not a reliable proxy for accuracy.


2021 ◽  
Author(s):  
Sebastian Burgstaller-Muehlbacher ◽  
Stephen M Crotty ◽  
Heiko A Schmidt ◽  
Tamara Drucks ◽  
Arndt von Haeseler

Selecting the best model of sequence evolution for a multiple sequence alignment (MSA) constitutes the first step of phylogenetic tree reconstruction. Common approaches for inferring nucleotide models typically apply maximum likelihood (ML) methods, with discrimination between models determined by one of several information criteria. This requires tree reconstruction and optimisation, which can be computationally expensive. We demonstrate that neural networks can be used to perform model selection, without the need to reconstruct trees, optimise parameters, or calculate likelihoods. We introduce ModelRevelator, a model selection tool underpinned by two deep neural networks. The first neural network, NNmodelfind, recommends one of six commonly used models of sequence evolution, ranging in complexity from JC to GTR. The second, NNalphafind, recommends whether or not a Γ-distributed rate heterogeneous model should be incorporated, and if so, provides an estimate of the shape parameter, ɑ. Users can simply input an MSA into ModelRevelator, and swiftly receive output recommending the evolutionary model, inclusive of the presence or absence of rate heterogeneity, and an estimate of ɑ. We show that ModelRevelator performs comparably with likelihood-based methods over a wide range of parameter settings, with significant potential savings in computational effort. Further, we show that this performance is not restricted to the alignments on which the networks were trained, but is maintained even on unseen empirical data. ModelRevelator will be made freely available in the forthcoming version of IQ-Tree (http://www.iqtree.org), and we expect it will provide a valuable alternative for phylogeneticists, especially where traditional methods of model selection are computationally prohibitive.


Author(s):  
Nina Ochirova ◽  
◽  
Nadmidyn Sukhebaatar ◽  

Introduction. Within the framework of Russian-Mongolian relations, this article examines the regional aspect of cultural cooperation between Mongolia and Kalmykia in the Soviet and post-Soviet periods. The authors investigate a wide range of problems related to the place and role of Kalmykia in the history of Russian-Mongolian relations and study the development of multifaceted interaction between two kindred peoples. Methods and materials. Methodologically, this study attempts to build a comprehensive view of the problem. An interdisciplinary, comprehensive approach makes it possible to synthesize all relevant historical and cultural aspects of regional cooperation between Kalmykia and Mongolia within the framework of Russian-Mongolian relations. Analysis. 2021 marks the 100th anniversary of the establishment of official Soviet-Mongolian diplomatic relations. The Agreement between Mongolia and Russia signed on November 5, 1921 strengthened military-political cooperation between the two countries, brought broad international recognition of Mongolia as a sovereign state, and played an important stabilizing role in the difficult situation in the Far East. In the 1990s, Russia and Mongolia underwent profound transformations. The scale of this work required both countries to shift their attention to internal problems, which undoubtedly had a negative impact on the level of relations between the two states. Later, having dealt with the radical transformation of their societies, Russia and Mongolia began to restore relations, but on completely new principles. Under these conditions, cultural interaction between Russia and Mongolia and the development of regional cooperation become significant, along with other spheres. One of the Russian regions is Kalmykia, which is linked with Mongolia by ancient historical roots and by a shared culture, religion, language and tradition. These factors play an important role in the further strengthening of good neighborly relations between Russia and Mongolia and in the development of regional cultural cooperation. Results. Studying the history of interaction between the two fraternal peoples, past and present, in the context of Russian-Mongolian relations provides rich material for an objective assessment of events in specific historical conditions. Kalmykia, like the border regions of Russia, makes its own contribution to the strengthening of Russian-Mongolian relations.

