ConTreeDP: A consensus method of tumor trees based on maximum directed partition support problem

10.1101/2021.10.13.463978 ◽

2021 ◽

Author(s):

Russell Schwartz ◽

Xuecong Fu

Keyword(s):

Genomic Data ◽

Phylogenetic Inference ◽

Heterogeneous Data ◽

Consensus Method ◽

Evolutionary Time ◽

Consensus Tree ◽

Consensus Trees ◽

Tree Algorithms ◽

New Algorithms ◽

Tree Method

Phylogenetic inference has become a crucial tool for interpreting cancer genomic data, but continuing advances in our understanding of somatic mutability in cancer, genomic technologies for profiling it, and the scale of data available have created a persistent need for new algorithms able to deal with these challenges. One particular need has been for new forms of consensus tree algorithms, which present special challenges in the cancer space for dealing with heterogeneous data, short evolutionary time scales, and rapid mutation by a wide variety of somatic mutability mechanisms. We develop a new consensus tree method for clonal phylogenetics, ConTreeDP, based on a formulation of the Maximum Directed Partition Support Consensus Tree (MDPSCT) problem. We demonstrate theoretically and empirically that our approach can efficiently and accurately compute clonal consensus trees from cancer genomic data.

Summarizing the solution space in tumor phylogeny inference by multiple consensus trees

Bioinformatics ◽

10.1093/bioinformatics/btz312 ◽

2019 ◽

Vol 35 (14) ◽

pp. i408-i416 ◽

Cited By ~ 12

Author(s):

Nuraini Aguse ◽

Yuanyuan Qi ◽

Mohammed El-Kebir

Keyword(s):

Solution Space ◽

Simulated Data ◽

Exact Algorithm ◽

Real Data ◽

Supplementary Information ◽

Mixed Integer ◽

Consensus Tree ◽

Large Solution ◽

Consensus Trees ◽

Topological Features

Motivation: Cancer phylogenies are key to studying tumorigenesis and have clinical implications. Due to the heterogeneous nature of cancer and limitations in current sequencing technology, current cancer phylogeny inference methods identify a large solution space of plausible phylogenies. To facilitate further downstream analyses, methods that accurately summarize such a set T of cancer phylogenies are imperative. However, current summary methods are limited to a single consensus tree or graph and may miss important topological features that are present in different subsets of candidate trees. Results: We introduce the Multiple Consensus Tree (MCT) problem to simultaneously cluster T and infer a consensus tree for each cluster. We show that MCT is NP-hard, and present an exact algorithm based on mixed integer linear programming (MILP). In addition, we introduce a heuristic algorithm that efficiently identifies high-quality consensus trees, recovering all optimal solutions identified by the MILP in simulated data at a fraction of the time. We demonstrate the applicability of our methods on both simulated and real data, showing that our approach selects the number of clusters depending on the complexity of the solution space T. Availability and implementation: https://github.com/elkebir-group/MCT. Supplementary information: Supplementary data are available at Bioinformatics online.
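To make the clustering idea in this abstract concrete, the following is a minimal Python sketch of one step only: assigning each candidate tree to the closest of several given consensus trees. It assumes that all trees are on the same mutation set, that each tree is stored as a set of (parent, child) edges, and that trees are compared by the size of the symmetric difference of their edge sets. The function names and the toy trees are hypothetical; this is not the MILP or the heuristic from the paper, and it does not choose the consensus trees themselves.

```python
from typing import FrozenSet, List, Sequence, Tuple

Edge = Tuple[str, str]          # (parent, child)
EdgeSet = FrozenSet[Edge]       # one rooted tree, stored as its edge set


def parent_child_distance(a: EdgeSet, b: EdgeSet) -> int:
    """Number of edges present in exactly one of the two trees."""
    return len(a ^ b)


def assign_to_clusters(trees: Sequence[EdgeSet],
                       consensus_trees: Sequence[EdgeSet]) -> List[int]:
    """For each tree, return the index of its closest consensus tree.

    Ties are broken in favour of the lowest index.
    """
    return [
        min(range(len(consensus_trees)),
            key=lambda k: parent_child_distance(tree, consensus_trees[k]))
        for tree in trees
    ]


def total_cost(trees: Sequence[EdgeSet],
               consensus_trees: Sequence[EdgeSet]) -> int:
    """Sum of distances from each tree to its assigned consensus tree."""
    labels = assign_to_clusters(trees, consensus_trees)
    return sum(parent_child_distance(tree, consensus_trees[k])
               for tree, k in zip(trees, labels))


# Toy example with three trees on mutations r, a, b (hypothetical data).
chain_ab = frozenset({("r", "a"), ("a", "b")})
star = frozenset({("r", "a"), ("r", "b")})
chain_ba = frozenset({("r", "b"), ("b", "a")})

labels = assign_to_clusters([chain_ab, star, chain_ba], [chain_ab, chain_ba])
cost = total_cost([chain_ab, star, chain_ba], [chain_ab, chain_ba])
```

A full solver would alternate this assignment step with re-estimating each cluster's consensus tree, or solve both jointly as the abstract describes.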

Phylogeny and reclassification of the tribe Inuleae (Asteraceae)

Canadian Journal of Botany ◽

10.1139/b89-292 ◽

1989 ◽

Vol 67 (8) ◽

pp. 2277-2296 ◽

Cited By ~ 58

Author(s):

Arne A. Anderberg

Keyword(s):

Monophyletic Group ◽

Consensus Tree ◽

Consensus Trees ◽

Taxonomic Implication ◽

New Tribe ◽

The Many ◽

Critical Investigation

The tribe Inuleae Cass. has been subject to a critical investigation. The many technical characters that are traditionally used in classification of the Inuleae are scrutinized, discussed, and analysed by means of a computerized parsimony program (PAUP). With one representative from each of the tribes Vernonieae, Liabeae, and Lactuceae as outgroups, three different analyses have been performed. Strict consensus trees for the three separate analyses are presented and discussed. A consensus tree based on the cladogram topologies obtained from all three analyses is also presented. The taxonomic implication of the analyses is that the tribe Inuleae is an unnatural, non-monophyletic group, which must be divided into better-defined monophyletic tribes. Hence, three tribes are recognized and the majority of the described genera of the Inuleae are tentatively referred to one of these tribes. The tribes Gnaphalieae Rydb. (comprising the Inuleae–Gnaphaliinae and the Inuleae–Athrixiinae sensu Merxmüller et al.) and the Inuleae s.str. are accepted. Furthermore, the former subtribe Inuleae–Plucheinae Benth. is recognized and described as the new tribe Plucheae (Benth.) A. Anderb.
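As a rough illustration of what a strict consensus retains, the Python sketch below keeps only the clades (leaf sets under an internal node) that occur in every input tree. It assumes rooted trees written as nested tuples of taxon names; the representation, the function names, and the taxa A to D are hypothetical and are not taken from PAUP or from the paper.

```python
from typing import FrozenSet, Sequence, Set, Tuple, Union

# A tree is either a leaf name or a tuple of subtrees.
Tree = Union[str, Tuple["Tree", ...]]


def clades(tree: Tree) -> Set[FrozenSet[str]]:
    """Return the leaf set under every internal node of a rooted tree."""
    found: Set[FrozenSet[str]] = set()

    def leaves(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        found.add(below)
        return below

    leaves(tree)
    return found


def strict_consensus_clades(trees: Sequence[Tree]) -> Set[FrozenSet[str]]:
    """Clades shared by all input trees; these define the strict consensus."""
    shared = clades(trees[0])
    for tree in trees[1:]:
        shared &= clades(tree)
    return shared


# Two trees that agree on (A, B) but disagree on where C and D attach.
tree_1 = ((("A", "B"), "C"), "D")
tree_2 = ((("A", "B"), "D"), "C")
shared = strict_consensus_clades([tree_1, tree_2])
```

In this toy case only the clade {A, B} and the full taxon set survive, so the strict consensus collapses the conflicting placement of C and D into a polytomy.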

cognac: rapid generation of concatenated gene alignments for phylogenetic inference from large whole genome sequencing datasets

10.1101/2020.10.15.340901 ◽

2020 ◽

Author(s):

Ryan D. Crawford ◽

Evan S. Snitkin

Keyword(s):

Phylogenetic Analysis ◽

Genome Sequencing ◽

Software Package ◽

Genomic Data ◽

Phylogenetic Inference ◽

R Package ◽

Core Gene ◽

Whole Genome ◽

Rapid Generation ◽

User Friendly

The quantity of genomic data is expanding at an increasing rate. Tools for phylogenetic analysis which scale to the quantity of available data are required. We present cognac, a user-friendly software package to rapidly generate concatenated gene alignments for phylogenetic analysis. We applied this tool to generate core gene alignments for very large genomic datasets, including a dataset of over 11,000 genomes from the genus Escherichia containing 1,353 genes, which was constructed in less than 17 hours. We have released cognac as an R package (https://github.com/rdcrawford/cognac) with customizable parameters for adaptation to diverse applications.
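The concatenation step itself can be sketched in a few lines of Python. This is not cognac's interface (cognac is an R package and its functions are not reproduced here); it is a minimal illustration that assumes each gene has already been aligned and is stored as a mapping from genome name to aligned sequence. Padding a genome that lacks a gene with gap characters is also an assumption; a strict core gene alignment would only include genes present in every genome.

```python
from typing import Dict, List


def concatenate_alignments(gene_alignments: List[Dict[str, str]],
                           gap: str = "-") -> Dict[str, str]:
    """Join per-gene alignments into one sequence per genome.

    Each element of gene_alignments maps genome name -> aligned sequence
    for a single gene. Genomes missing from a gene are padded with gaps.
    """
    genomes = sorted({name for aln in gene_alignments for name in aln})
    parts: Dict[str, List[str]] = {name: [] for name in genomes}

    for aln in gene_alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(
                "sequences within one gene alignment must have equal length")
        width = lengths.pop()
        for name in genomes:
            parts[name].append(aln.get(name, gap * width))

    return {name: "".join(chunks) for name, chunks in parts.items()}


# Hypothetical toy input: two genes, three genomes.
genes = [
    {"g1": "ATG", "g2": "A-G"},
    {"g1": "CC", "g3": "CT"},
]
supermatrix = concatenate_alignments(genes)
```

Every genome ends up with a sequence of the same total length, which is what downstream tree-building tools expect from a concatenated alignment.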

Allele-specific multi-sample copy number segmentation

10.1101/166017 ◽

2017 ◽

Author(s):

Edith M. Ross ◽

Kerstin Haase ◽

Peter Van Loo ◽

Florian Markowetz

Keyword(s):

Copy Number ◽

Genomic Data ◽

Phylogenetic Inference ◽

R Package ◽

Copy Number Alterations ◽

Allele Specific ◽

Multiple Samples

Motivation: Allele-specific copy number alterations are commonly used to trace the evolution of tumours. A key step of the analysis is to segment genomic data into regions of constant copy number. For precise phylogenetic inference, breakpoints shared between samples need to be aligned to each other. Results: Here we present asmultipcf, an algorithm for allele-specific segmentation of multiple samples that infers private and shared segment boundaries of phylogenetically related samples. The output of this algorithm can directly be used for allele-specific copy number calling using ASCAT. Availability: asmultipcf is available as part of the ASCAT R package (version 2.5) from github.com/Crick-CancerGenomics/ascat

IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era

Molecular Biology and Evolution ◽

10.1093/molbev/msaa015 ◽

2020 ◽

Vol 37 (5) ◽

pp. 1530-1534 ◽

Cited By ~ 236

Author(s):

Bui Quang Minh ◽

Heiko A Schmidt ◽

Olga Chernomor ◽

Dominik Schrempf ◽

Michael D Woodhams ◽

...

Keyword(s):

Maximum Likelihood ◽

Software Package ◽

Genomic Data ◽

Phylogenetic Inference ◽

Sequence Evolution ◽

Computational Approaches ◽

New Models ◽

User Friendly

IQ-TREE (http://www.iqtree.org, last accessed February 6, 2020) is a user-friendly and widely used software package for phylogenetic inference using maximum likelihood. Since the release of version 1 in 2014, we have continuously expanded IQ-TREE to integrate a plethora of new models of sequence evolution and efficient computational approaches of phylogenetic inference to deal with genomic data. Here, we describe notable features of IQ-TREE version 2 and highlight the key advantages over other software.

The first chromosome-level gecko genome reveals dynamic sex chromosomes in Neotropical leaf-litter geckos (Sphaerodactylidae: Sphaerodactylus)

10.1101/2021.08.13.456260 ◽

2021 ◽

Author(s):

Brendan J. Pinto ◽

Shannon E Keating ◽

Stuart V Nielsen ◽

Daniel P Scantlebury ◽

Juan D Daza ◽

...

Keyword(s):

Sex Chromosomes ◽

Sex Chromosome ◽

Genomic Data ◽

Linkage Groups ◽

Evolutionary Time ◽

Future Studies ◽

Linkage Information ◽

Sex Chromosome System ◽

Time Periods ◽

Chromosome Level

Sex chromosomes have evolved many times across eukaryotes, indicating both their importance and their evolutionary flexibility. Some vertebrate groups, such as mammals and birds, have maintained a single, conserved sex chromosome system across long evolutionary time periods. By contrast, many reptiles, amphibians, and fish have undergone frequent sex chromosome transitions, most of which remain to be catalogued. Among reptiles, gecko lizards (infraorder Gekkota) have shown an exceptional lability with regard to sex chromosome transitions and may possess the majority of transitions within squamates (lizards and snakes). However, across geckos, information about sex chromosome linkage is expressly lacking, leaving large gaps in our understanding of the evolutionary processes at play in this system. To address this gap, we assembled the first chromosome-level genome for a gecko and used this linkage information to survey six Sphaerodactylus species using a variety of genomic data, including whole-genome re-sequencing, RADseq, and RNAseq. Previous work has identified XY systems in two species of Sphaerodactylus geckos. We expand upon that work to identify between two and four sex chromosome cis-transitions (XY to XY) within the genus. Interestingly, we confirmed two linkage groups as XY sex chromosome systems that were previously unknown to act as sex chromosomes in tetrapods (syntenic with Gallus 3 and Gallus 18/30/33). We highlight the increasing evidence that most (if not all) linkage groups will likely be identified as a sex chromosome in future studies given thorough enough sampling.

Opening the Black Box: Interpretable Machine Learning for Geneticists

10.20944/preprints202002.0239.v1 ◽

2020 ◽

Cited By ~ 1

Author(s):

Christina B. Azodi ◽

Jiliang Tang ◽

Shin-Han Shiu

Keyword(s):

Machine Learning ◽

Genomic Data ◽

Heterogeneous Data ◽

Black Box ◽

High Dimensional ◽

Future Directions ◽

Making Sense ◽

Interpretable Machine Learning ◽

Genetics And Genomics ◽

Complex Patterns

Machine learning (ML) has emerged as a critical tool for making sense of the growing amount of genetic and genomic data available because of its ability to find complex patterns in high dimensional and heterogeneous data. While the complexity of ML models is what makes them powerful, it also makes them difficult to interpret. Fortunately, recent efforts to develop approaches that make the inner workings of ML models understandable to humans have improved our ability to make novel biological insights using ML. Here we discuss the importance of interpretable ML, different strategies for interpreting ML models, and examples of how these strategies have been applied. Finally, we identify challenges and promising future directions for interpretable ML in genetics and genomics.

IQ-TREE 2: New models and efficient methods for phylogenetic inference in the genomic era

10.1101/849372 ◽

2019 ◽

Cited By ~ 8

Author(s):

Bui Quang Minh ◽

Heiko Schmidt ◽

Olga Chernomor ◽

Dominik Schrempf ◽

Michael Woodhams ◽

...

Keyword(s):

Maximum Likelihood ◽

Software Package ◽

Genomic Data ◽

Phylogenetic Inference ◽

Sequence Evolution ◽

Computational Approaches ◽

Link Type ◽

New Models ◽

User Friendly

IQ-TREE (http://www.iqtree.org) is a user-friendly and widely used software package for phylogenetic inference using maximum likelihood. Since the release of version 1 in 2014, we have continuously expanded IQ-TREE to integrate a plethora of new models of sequence evolution and efficient computational approaches of phylogenetic inference to deal with genomic data. Here, we describe notable features of IQ-TREE version 2 and highlight the key advantages over other software.

Multispecies coalescent analysis unravels the non-monophyly and controversial relationships of Hexapoda

10.1101/187997 ◽

2017 ◽

Author(s):

Lucas A. Freitas ◽

Beatriz Mello ◽

Carlos G. Schrago

Keyword(s):

Phylogenetic Relationships ◽

Gene Tree ◽

Genomic Data ◽

Population Level ◽

Phylogenetic Inference ◽

Evolutionary Relationships ◽

Effective Population ◽

Multispecies Coalescent ◽

Population Sizes ◽

Phylogenomic Analyses

With the increase in the availability of genomic data, sequences from different loci are usually concatenated in a supermatrix for phylogenetic inference. However, as an alternative to the supermatrix approach, several implementations of the multispecies coalescent (MSC) have been increasingly used in phylogenomic analyses due to their advantages in accommodating gene tree topological heterogeneity by taking into account population-level processes. Moreover, the development of faster algorithms under the MSC is enabling the analysis of thousands of loci/taxa. Here, we explored the MSC approach for a phylogenomic dataset of Insecta. Even with the challenges posed by insects, due to large effective population sizes coupled with short deep internal branches, our MSC analysis could recover several orders and evolutionary relationships in agreement with current insect systematics. However, some phylogenetic relationships were not recovered by MSC methods. Most noticeably, a remipede crustacean was positioned within the Insecta. Additionally, the interordinal relationships within Polyneoptera and Neuropteroidea contradicted recent works, by suggesting the non-monophyly of Neuroptera. We note, however, that these phylogenetic arrangements were also poorly supported by previous analyses and that they were sensitive to gene sampling.

Estimating Bifurcating Consensus Phylogenetic Trees Using Evolutionary Imperialist Competitive Algorithm

Current Bioinformatics ◽

10.2174/1574893614666190225145620 ◽

2019 ◽

Vol 14 (8) ◽

pp. 728-739

Author(s):

Vageehe Nikkhah ◽

Seyed M. Babamir ◽

Seyed S. Arab

Keyword(s):

Evolutionary Algorithms ◽

Phylogenetic Tree ◽

Phylogenetic Trees ◽

Imperialist Competitive Algorithm ◽

Species Level ◽

Consensus Method ◽

Gene Trees ◽

Consensus Tree ◽

Large Space ◽

Competitive Algorithm

Background: One of the important goals of phylogenetic studies is the estimation of species-level phylogeny. A phylogenetic tree is an evolutionary classification of different species. There are several methods to generate such trees, and each method may produce a number of different trees for the same species. Even when the same proteins are chosen for all species, the topology and arrangement of the resulting trees may differ. Objective: Biologists use consensus methods to summarize different phylogenetic trees into a single tree, called a consensus tree. A consensus method combines gene trees to estimate a species tree. As phylogenetic trees grow in size and number, estimating a consensus tree from the species-level phylogenetic trees becomes a challenge. Methods: The current study uses the Imperialist Competitive Algorithm (ICA) to estimate bifurcating consensus trees. Evolutionary algorithms like ICA are suited to problems with a large space of candidate solutions. Results: The obtained consensus tree is more similar to the native phylogenetic tree than those of related studies. Conclusion: The proposed method offers mechanisms and policies that allow more tuning than other evolutionary algorithms. Thanks to these policies and mechanisms, the algorithm was efficient in obtaining the optimum consensus tree. The algorithm increased the possibility of selecting an optimum solution by imposing some changes in its parameters.
