Triplet-based similarity score for fully multi-labeled trees with poly-occurring labels

Author(s):  
Simone Ciccolella ◽  
Giulia Bernardini ◽  
Luca Denti ◽  
Paola Bonizzoni ◽  
Marco Previtali ◽  
...  

Abstract The latest advances in cancer sequencing, and the availability of a wide range of methods to infer the evolutionary history of tumors, have made it important to evaluate, reconcile and cluster different tumor phylogenies. Recently, several notions of distance or similarities have been proposed in the literature, but none of them has emerged as the golden standard. Moreover, none of the known similarity measures is able to manage mutations occurring multiple times in the tree, a circumstance often occurring in real cases. To overcome these limitations, in this paper we propose MP3, the first similarity measure for tumor phylogenies able to effectively manage cases where multiple mutations can occur at the same time and mutations can occur multiple times. Moreover, a comparison of MP3 with other measures shows that it is able to classify correctly similar and dissimilar trees, both on simulated and on real data.

Author(s):  
Simone Ciccolella ◽  
Giulia Bernardini ◽  
Luca Denti ◽  
Paola Bonizzoni ◽  
Marco Previtali ◽  
...  

Abstract Motivation The latest advances in cancer sequencing, and the availability of a wide range of methods to infer the evolutionary history of tumors, have made it important to evaluate, reconcile and cluster different tumor phylogenies. Recently, several notions of distance or similarities have been proposed in the literature, but none of them has emerged as the golden standard. Moreover, none of the known similarity measures is able to manage mutations occurring multiple times in the tree, a circumstance often occurring in real cases. Results To overcome these limitations, in this article, we propose MP3, the first similarity measure for tumor phylogenies able to effectively manage cases where multiple mutations can occur at the same time and mutations can occur multiple times. Moreover, a comparison of MP3 with other measures shows that it is able to classify correctly similar and dissimilar trees, both on simulated and on real data. Availability and implementation An open source implementation of MP3 is publicly available at https://github.com/AlgoLab/mp3treesim. Supplementary information Supplementary data are available at Bioinformatics online.


2021 ◽  
Author(s):  
Caitlin Cherryh ◽  
Bui Quang Minh ◽  
Rob Lanfear

Abstract Most phylogenetic analyses assume that the evolutionary history of an alignment (either that of a single locus, or of multiple concatenated loci) can be described by a single bifurcating tree, the so-called treelikeness assumption. Treelikeness can be violated by biological events such as recombination, introgression, or incomplete lineage sorting, and by systematic errors in phylogenetic analyses. The incorrect assumption of treelikeness may then mislead phylogenetic inferences. To quantify and test for treelikeness in alignments, we develop a test statistic which we call the tree proportion. This statistic quantifies the proportion of the edge weights in a phylogenetic network that are represented in a bifurcating phylogenetic tree of the same alignment. We extend this statistic to a statistical test of treelikeness using a parametric bootstrap. We use extensive simulations to compare tree proportion to a range of related approaches. We show that tree proportion successfully identifies non-treelikeness in a wide range of simulation scenarios, and discuss its strengths and weaknesses compared to other approaches. The power of the tree-proportion test to reject non-treelike alignments can be lower than some other approaches, but these approaches tend to be limited in their scope and/or the ease with which they can be interpreted. Our recommendation is to test treelikeness of sequence alignments with both tree proportion and mosaic methods such as 3Seq. The scripts necessary to replicate this study are available at https://github.com/caitlinch/treelikeness
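The ratio described in the abstract above can be illustrated with a minimal sketch. It assumes the network is given as a set of weighted splits and the tree as the subset of those splits it contains. The representation, the function name `tree_proportion` and the toy weights are all hypothetical and are not taken from the authors' scripts, which may compute the statistic differently.

```python
def tree_proportion(network_splits, tree_splits):
    """Share of the network's total split weight that is also present in the tree.

    network_splits: dict mapping a split (frozenset of the taxa on one side,
                    chosen consistently) to its weight in the network.
    tree_splits:    set of splits displayed by the bifurcating tree.
    Both structures are illustrative assumptions, not the authors' data model.
    """
    total = sum(network_splits.values())
    if total == 0:
        raise ValueError("network has no weighted splits")
    in_tree = sum(w for split, w in network_splits.items() if split in tree_splits)
    return in_tree / total


# Toy network on taxa A-D: four trivial splits plus two conflicting internal splits.
network = {
    frozenset("A"): 1.0,
    frozenset("B"): 1.0,
    frozenset("C"): 1.0,
    frozenset("D"): 1.0,
    frozenset("AB"): 3.0,  # supported by the tree
    frozenset("AC"): 1.0,  # conflicting signal, absent from the tree
}
tree = {frozenset("A"), frozenset("B"), frozenset("C"), frozenset("D"), frozenset("AB")}

tp = tree_proportion(network, tree)
print(tp)  # 0.875
```

In this toy case a value of 1.0 would mean every weighted split in the network is also in the tree, and lower values indicate more conflicting signal.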


2020 ◽  
Vol 66 (3-4) ◽  
pp. 142-150
Author(s):  
Jessica Worthington Wilmer ◽  
Andrew P. Amey ◽  
Carmel McDougall ◽  
Melanie Venz ◽  
Stephen Peck ◽  
...  

Sclerophyll woodlands and open forests once covered vast areas of eastern Australia, but have been greatly fragmented and reduced in extent since European settlement. The biogeographic and evolutionary history of the biota of eastern Australia’s woodlands also remains poorly known, especially when compared to rainforests to the east, or the arid biome to the west. Here we present an analysis of patterns of mitochondrial genetic diversity in two species of Pygopodid geckos with distributions centred on the Brigalow Belt Bioregion of eastern Queensland. One moderately large and semi-arboreal species, Paradelma orientalis, shows low genetic diversity and no clear geographic structuring across its wide range. In contrast, a small and semi-fossorial species, Delma torquata, consists of two moderately divergent clades, one from the ranges and uplands of coastal areas of south-east Queensland, and the other centred in upland areas further inland. These data point to varying histories of gene flow and refugial persistence in eastern Australia’s vast but now fragmented open woodlands. The Carnarvon Ranges of central Queensland are also highlighted as a zone of persistence for cool and/or wet-adapted taxa; however, the evolutionary history and divergence of most outlying populations in these mountains remain unstudied.


2019 ◽  
Vol 8 (3) ◽  
pp. 6756-6762

A recommendation algorithm comprises two important steps: 1) predicting rates, and 2) recommendation. Rate prediction is a cumulative function of the similarity score between two movies and the rate history of those movies by other users. There are various methods for rate prediction, such as the weighted sum method, regression, and deviation-based methods. All these methods rely on finding items similar to those previously viewed or rated by the target user, on the assumption that a user tends to give similar ratings to similar items. Similarities can be computed using various measures, such as Euclidean Distance, Cosine Similarity, Adjusted Cosine Similarity, Pearson Correlation, and Jaccard Similarity. All of these well-known approaches calculate the similarity score between two movies using simple rating-based data. Hence, such similarity measures cannot accurately model the rating behavior of a user. In this paper, we show that the accuracy of rate prediction can be enhanced by incorporating ontological domain knowledge in the similarity computation. This paper introduces a new ontological semantic similarity measure between two movies. For experimental evaluation, the performance of the proposed approach is compared with two existing approaches, 1) Adjusted Cosine Similarity (ACS) and 2) the Weighted Slope One (WSO) algorithm, in terms of two performance measures: 1) execution time and 2) Mean Absolute Error (MAE). The open-source Movielens (ml-1m) dataset is used for experimental evaluation. As our results show, the ontological semantic similarity measure enhances the performance of rate prediction as compared to the existing well-known approaches.
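For readers unfamiliar with the baseline and the error metric named above, the following is a minimal sketch of the standard Adjusted Cosine Similarity between two items and of Mean Absolute Error. The toy ratings, the variable names and the choice to return 0.0 when a similarity is undefined are assumptions made for illustration. The sketch does not reproduce the paper's ontological measure, which the abstract does not specify.

```python
from math import sqrt


def adjusted_cosine(ratings, item_a, item_b):
    """Adjusted cosine similarity between two items.

    ratings: dict user -> dict item -> rating (hypothetical toy structure).
    Each rating is centred on that user's mean rating, and only users
    who rated both items contribute.
    """
    num = den_a = den_b = 0.0
    for user_ratings in ratings.values():
        if item_a in user_ratings and item_b in user_ratings:
            mean = sum(user_ratings.values()) / len(user_ratings)
            da = user_ratings[item_a] - mean
            db = user_ratings[item_b] - mean
            num += da * db
            den_a += da * da
            den_b += db * db
    if den_a == 0.0 or den_b == 0.0:
        return 0.0  # undefined similarity; treated as "no evidence" in this sketch
    return num / (sqrt(den_a) * sqrt(den_b))


def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and true ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


ratings = {
    "u1": {"m1": 5, "m2": 4, "m3": 1},
    "u2": {"m1": 4, "m2": 5, "m3": 2},
    "u3": {"m1": 1, "m2": 2, "m3": 5},
}
sim_12 = adjusted_cosine(ratings, "m1", "m2")  # positive: rated alike
sim_13 = adjusted_cosine(ratings, "m1", "m3")  # negative: rated oppositely
mae = mean_absolute_error([4, 3, 5], [5, 3, 3])
print(sim_12, sim_13, mae)
```

Centring on each user's mean is what distinguishes the adjusted variant from plain cosine similarity: it removes the effect of users who rate everything high or everything low.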


2019 ◽  
Vol 35 (20) ◽  
pp. 4072-4080 ◽  
Author(s):  
Timo M Deist ◽  
Andrew Patti ◽  
Zhaoqi Wang ◽  
David Krane ◽  
Taylor Sorenson ◽  
...  

Abstract Motivation In a predictive modeling setting, if sufficient details of the system behavior are known, one can build and use a simulation for making predictions. When sufficient system details are not known, one typically turns to machine learning, which builds a black-box model of the system using a large dataset of input sample features and outputs. We consider a setting which is between these two extremes: some details of the system mechanics are known but not enough for creating simulations that can be used to make high quality predictions. In this context we propose using approximate simulations to build a kernel for use in kernelized machine learning methods, such as support vector machines. The results of multiple simulations (under various uncertainty scenarios) are used to compute similarity measures between every pair of samples: sample pairs are given a high similarity score if they behave similarly under a wide range of simulation parameters. These similarity values, rather than the original high dimensional feature data, are used to build the kernel. Results We demonstrate and explore the simulation-based kernel (SimKern) concept using four synthetic complex systems—three biologically inspired models and one network flow optimization model. We show that, when the number of training samples is small compared to the number of features, the SimKern approach dominates over no-prior-knowledge methods. This approach should be applicable in all disciplines where predictive models are sought and informative yet approximate simulations are available. Availability and implementation The Python SimKern software, the demonstration models (in MATLAB, R), and the datasets are available at https://github.com/davidcraft/SimKern. Supplementary information Supplementary data are available at Bioinformatics online.
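The idea of turning simulation outcomes into a kernel can be sketched as follows. This is a toy illustration under stated assumptions, not the SimKern implementation: `toy_simulation` is a hypothetical stand-in for an approximate simulator, and the similarity used here (the fraction of parameter draws under which two samples produce the same outcome) is one simple choice that the abstract does not confirm.

```python
import random


def toy_simulation(sample, param):
    """Hypothetical approximate simulator returning a binary outcome."""
    return 1 if sample[0] * param + sample[1] > 1.0 else 0


def simulation_kernel(samples, params, simulate):
    """Pairwise similarity: share of parameter draws on which two samples agree."""
    outcomes = [[simulate(s, p) for p in params] for s in samples]
    n = len(samples)
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            agree = sum(a == b for a, b in zip(outcomes[i], outcomes[j]))
            kernel[i][j] = agree / len(params)
    return kernel


rng = random.Random(0)  # fixed seed so the sketch is reproducible
params = [rng.uniform(0.5, 1.5) for _ in range(200)]  # uncertain simulation parameter
samples = [(0.9, 0.5), (1.0, 0.45), (0.1, 0.2)]  # first two behave alike, third differs

K = simulation_kernel(samples, params, toy_simulation)
for row in K:
    print([round(v, 3) for v in row])
```

A matrix like `K` could then be handed to a kernel method that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`, in place of the original high-dimensional features.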


2012 ◽  
Vol 2012 ◽  
pp. 1-17 ◽  
Author(s):  
Anton Novikov ◽  
Georgiy Smyshlyaev ◽  
Olga Novikova

Chromodomain-containing LTR retrotransposons are one of the most successful groups of mobile elements in plant genomes. Previously, we demonstrated that two types of chromodomains (CHDs) are carried by plant LTR retrotransposons. Chromodomains from group I (CHD_I) were detected only in Tcn1-like LTR retrotransposons from nonseed plants such as mosses (including the model moss species Physcomitrella) and lycophytes (the Selaginella species). LTR retrotransposon chromodomains from group II (CHD_II) have been described from a wide range of higher plants. In the present study, we performed computer-based mining of plant LTR retrotransposon CHDs from diverse plants with an emphasis on spike-moss Selaginella. Our extended comparative and phylogenetic analysis demonstrated that two types of CHDs are present only in the Selaginella genome, which puts this species in a unique position among plants. It appears that a transition from CHD_I to CHD_II and further diversification occurred in the evolutionary history of plant LTR retrotransposons at approximately 400 MYA and most probably was associated with the evolution of chromatin organization.


2019 ◽  
Vol 36 (11) ◽  
pp. 2604-2619 ◽  
Author(s):  
Elodie Laine ◽  
Yasaman Karami ◽  
Alessandra Carbone

Abstract The systematic and accurate description of protein mutational landscapes is a question of utmost importance in biology, bioengineering, and medicine. Recent progress has been achieved by leveraging the increasing wealth of genomic data and by modeling intersite dependencies within biological sequences. However, state-of-the-art methods remain time consuming. Here, we present Global Epistatic Model for predicting Mutational Effects (GEMME) (www.lcqb.upmc.fr/GEMME), an original and fast method that predicts mutational outcomes by explicitly modeling the evolutionary history of natural sequences. This allows accounting for all positions in a sequence when estimating the effect of a given mutation. GEMME uses only a few biologically meaningful and interpretable parameters. Assessed against 50 high- and low-throughput mutational experiments, it overall performs similarly to or better than existing methods. It accurately predicts the mutational landscapes of a wide range of protein families, including viral ones and, more generally, of much conserved families. Given an input alignment, it generates the full mutational landscape of a protein in a matter of minutes. It is freely available as a package and a webserver at www.lcqb.upmc.fr/GEMME/.


GigaScience ◽  
2021 ◽  
Vol 10 (5) ◽  
Author(s):  
Mengni Liu ◽  
Jianyu Chen ◽  
Xin Wang ◽  
Chengwei Wang ◽  
Xiaolong Zhang ◽  
...  

Abstract Background Multi-region sequencing (MRS) has been widely used to analyze intra-tumor heterogeneity (ITH) and cancer evolution. However, comprehensive analysis of mutational data from MRS is still challenging, necessitating complicated integration of a plethora of computational and statistical approaches. Findings Here, we present MesKit, an R/Bioconductor package that can assist in characterizing genetic ITH and tracing the evolutionary history of tumors based on somatic alterations detected by MRS. MesKit provides a wide range of analysis and visualization modules, including ITH evaluation, metastatic route inference, and mutational signature identification. In addition, MesKit implements an auto-layout algorithm to generate phylogenetic trees based on somatic mutations. The application of MesKit for 2 reported MRS datasets of hepatocellular carcinoma and colorectal cancer identified known heterogeneous features and evolutionary patterns, together with potential driver events during cancer evolution. Conclusions In summary, MesKit is useful for interpreting ITH and tracing evolutionary trajectory based on MRS data. MesKit is implemented in R and available at https://bioconductor.org/packages/MesKit under the GPL v3 license.


BMC Biology ◽  
2022 ◽  
Vol 20 (1) ◽  
Author(s):  
Diego Angosto-Bazarra ◽  
Cristina Alarcón-Vila ◽  
Laura Hurtado-Navarro ◽  
María C. Baños ◽  
Jack Rivers-Auty ◽  
...  

Abstract Background Gasdermins are ancient (>500 million years old) proteins, constituting a family of pore-forming proteins that allow the release of intracellular content including proinflammatory cytokines. Despite their importance in the immune response, and although gasdermin and gasdermin-like genes have been identified across a wide range of animal and non-animal species, there is limited information about the evolutionary history of the gasdermin family, and their functional roles after infection. In this study, we assess the lytic functions of different gasdermins across Metazoa species, and use a mouse model of sepsis to evaluate the expression of the different gasdermins during infection. Results We show that the majority of gasdermin family members from distantly related animal clades are pore-forming, in line with the function of the ancestral proto-gasdermin and gasdermin-like proteins of Bacteria. We demonstrate the first expansion of this family occurred through a duplication of the ancestral gasdermin gene which formed gasdermin E and pejvakin prior to the divergence of cartilaginous fish and bony fish ~475 mya. We show that pejvakin from cartilaginous fish and mammals lost the pore-forming functionality and thus its role in cell lysis. We describe that the pore-forming gasdermin A formed ~320 mya as a duplication of gasdermin E prior to the divergence of the Sauropsida clade (the ancestral lineage of reptiles, turtles, and birds) and the Synapsid clade (the ancestral lineage of mammals). We then demonstrate that the gasdermin A gene duplicated to form the rest of the gasdermin family including gasdermins B, C, and D: pore-forming proteins that present a high variation of the exons in the linker sequence, which in turn allows for diverse activation pathways. Finally, we describe expression of murine gasdermin family members in different tissues in a mouse sepsis model, indicating function during infection response.
Conclusions In this study we explored the evolutionary history of the gasdermin proteins in animals and demonstrated that the pore-formation functionality has been conserved from the ancient proto-gasdermin protein. We also showed that one gasdermin family member, pejvakin, lost its pore-forming functionality, but that all gasdermin family members, including pejvakin, likely retained a role in inflammation and the physiological response to infection.


2013 ◽  
Vol 94 (4) ◽  
pp. 738-748 ◽  
Author(s):  
Ying Tao ◽  
Mang Shi ◽  
Christina Conrardy ◽  
Ivan V. Kuzmin ◽  
Sergio Recuenco ◽  
...  

Polyomaviruses (PyVs) have been identified in a wide range of avian and mammalian species. However, little is known about their occurrence, genetic diversity and evolutionary history in bats, even though bats are important reservoirs for many emerging viral pathogens. This study screened 380 specimens from 35 bat species from Kenya and Guatemala for the presence of PyVs by semi-nested pan-PyV PCR assays. PyV DNA was detected in 24 of the 380 bat specimens. Phylogenetic analysis revealed that the bat PyV sequences formed 12 distinct lineages. Full-genome sequences were obtained for seven representative lineages and possessed similar genomic features to known PyVs. Strikingly, this evolutionary analysis revealed that the bat PyVs were paraphyletic, suggestive of multiple species jumps between bats and other mammalian species, such that the theory of virus–host co-divergence for mammalian PyVs as a whole could be rejected. In addition, evidence was found for strong heterogeneity in evolutionary rate and potential recombination in a number of PyV complete genomes, which complicates both phylogenetic analysis and virus classification. In summary, this study revealed that bats are important reservoirs of PyVs and that these viruses have a complex evolutionary history.

