Network science inspires novel tree shape statistics

PLoS ONE ◽  
2021 ◽  
Vol 16 (12) ◽  
pp. e0259877
Author(s):  
Leonid Chindelevitch ◽  
Maryam Hayati ◽  
Art F. Y. Poon ◽  
Caroline Colijn

The shape of phylogenetic trees can be used to gain evolutionary insights. A tree’s shape specifies the connectivity of a tree, while its branch lengths reflect either the time or genetic distance between branching events; well-known measures of tree shape include the Colless and Sackin imbalance, which describe the asymmetry of a tree. In other contexts, network science has become an important paradigm for describing structural features of networks and using them to understand complex systems, ranging from protein interactions to social systems. Network science is thus a potential source of many novel ways to characterize tree shape, as trees are also networks. Here, we tailor tools from network science, including diameter, average path length, and betweenness, closeness, and eigenvector centrality, to summarize phylogenetic tree shapes. We thereby propose tree shape summaries that are complementary to both asymmetry and the frequencies of small configurations. These new statistics can be computed in linear time and scale well to describe the shapes of large trees. We apply these statistics, alongside some conventional tree statistics, to phylogenetic trees from three very different viruses (HIV, dengue fever and measles), from the same virus in different epidemiological scenarios (influenza A and HIV) and from simulation models known to produce trees with different shapes. Using mutual information and supervised learning algorithms, we find that the statistics adapted from network science perform as well as or better than conventional statistics. We describe their distributions and prove some basic results about their extreme values in a tree. We conclude that network science-based tree shape summaries are a promising addition to the toolkit of tree shape features. All our shape summaries, as well as functions to select the most discriminating ones for two sets of trees, are freely available as an R package at http://github.com/Leonardini/treeCentrality.
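To make the statistics named in the abstract concrete, the following is a minimal sketch that computes the Sackin index, the Colless index, and the diameter of a small rooted binary tree in a single traversal. It is not taken from the treeCentrality package: the nested-tuple tree encoding and the function name `shape_stats` are assumptions made for illustration, the Sackin index is taken as the sum of leaf depths, the Colless index as the sum over internal nodes of the absolute difference in leaf counts between the two subtrees, and the diameter as the longest path counted in edges (branch lengths ignored).

```python
def shape_stats(tree):
    """Return (sackin, colless, diameter) for a rooted binary tree.

    Hypothetical encoding: a leaf is any non-tuple value and an
    internal node is a 2-tuple (left_subtree, right_subtree).
    """
    def walk(node, depth):
        # Returns (n_leaves, height, sackin, colless, diameter) of the subtree.
        if not isinstance(node, tuple):
            return 1, 0, depth, 0, 0
        ln, lh, ls, lc, ld = walk(node[0], depth + 1)
        rn, rh, rs, rc, rd = walk(node[1], depth + 1)
        # Longest path passing through this node, counted in edges.
        through = lh + rh + 2
        return (ln + rn,
                1 + max(lh, rh),
                ls + rs,
                lc + rc + abs(ln - rn),
                max(ld, rd, through))

    _, _, sackin, colless, diameter = walk(tree, 0)
    return sackin, colless, diameter


caterpillar = ((("a", "b"), "c"), "d")   # maximally unbalanced on 4 leaves
balanced = (("a", "b"), ("c", "d"))      # fully balanced on 4 leaves

print(shape_stats(caterpillar))
print(shape_stats(balanced))
```

Each node is visited once, which is consistent with the linear-time claim for this kind of summary. On four leaves the two imbalance indices separate the caterpillar from the balanced tree, while the diameter happens to be the same for both shapes.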

2019 ◽  
Author(s):  
Leonid Chindelevitch ◽  
Maryam Hayati ◽  
Art F. Y. Poon ◽  
Caroline Colijn

Abstract The shape of phylogenetic trees can be used to gain evolutionary insights. A tree’s shape specifies the connectivity of a tree, while its branch lengths reflect either the time or genetic distance between branching events; well-known measures of tree shape include the Colless and Sackin imbalance, which describe the asymmetry of a tree. In other contexts, network science has become an important paradigm for describing structural features of networks and using them to understand complex systems, ranging from protein interactions to social systems. Network science is thus a potential source of many novel ways to characterize tree shape, as trees are also networks. Here, we tailor tools from network science, including diameter, average path length, and betweenness, closeness, and eigenvector centrality, to summarize phylogenetic tree shapes. We thereby propose tree shape summaries that are complementary to both asymmetry and the frequencies of small configurations. These new statistics can be computed in linear time and scale well to describe the shapes of large trees. We apply these statistics, alongside some conventional tree statistics, to phylogenetic trees from three very different viruses (HIV, dengue fever and measles), from the same virus in different epidemiological scenarios (influenza A and HIV) and from simulation models known to produce trees with different shapes. Using mutual information and supervised learning algorithms, we find that the statistics adapted from network science perform as well as or better than conventional statistics. We describe their distributions and prove some basic results about their extreme values in a tree. We conclude that network science-based tree shape summaries are a promising addition to the toolkit of tree shape features. All our shape summaries, as well as functions to select the most discriminating ones for two sets of trees, are freely available as an R package at http://github.com/Leonardini/treeCentrality.


2018 ◽  
Vol 7 (4) ◽  
pp. 515-528 ◽  
Author(s):  
Desmond J Higham

Abstract The friendship paradox states that, on average, our friends have more friends than we do. In network terms, the average degree over the nodes can never exceed the average degree over the neighbours of nodes. This effect, which is a classic example of sampling bias, has attracted much attention in the social science and network science literature, with variations and extensions of the paradox being defined, tested and interpreted. Here, we show that a version of the paradox holds rigorously for eigenvector centrality: on average, our friends are more important than us. We then consider general matrix-function centrality, including Katz centrality, and give sufficient conditions for the paradox to hold. We also discuss which results can be generalized to the cases of directed and weighted edges. In this way, we add theoretical support for a field that has largely been evolving through empirical testing.
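The degree version of the paradox stated above can be checked numerically on a small graph. The sketch below is illustrative only and covers the classical degree statement, not the eigenvector-centrality or matrix-function results of the paper; the adjacency-dictionary encoding and the function name `friendship_paradox` are assumptions. It compares the mean degree over nodes with the mean degree over neighbours, where every (node, neighbour) pair contributes the neighbour's degree once.

```python
from fractions import Fraction


def friendship_paradox(adj):
    """Return (mean degree over nodes, mean degree over neighbours).

    `adj` is a hypothetical adjacency dict for an undirected graph:
    node -> list of neighbouring nodes, with every edge listed in both
    directions.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    mean_degree = Fraction(sum(degree.values()), len(adj))
    # One entry per (node, neighbour) pair, holding the neighbour's degree.
    friend_degrees = [degree[u] for nbrs in adj.values() for u in nbrs]
    mean_friend_degree = Fraction(sum(friend_degrees), len(friend_degrees))
    return mean_degree, mean_friend_degree


# A star graph: one hub connected to three leaves.
star = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
mean_deg, mean_friend_deg = friendship_paradox(star)
print(mean_deg, mean_friend_deg)
```

For the star graph the mean degree is 3/2 while the mean degree over neighbours is 2. The gap closes only when all degrees are equal, for example on a triangle.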


2019 ◽  
Vol 1 (1) ◽  
Author(s):  
D C Blackburn ◽  
G Giribet ◽  
D E Soltis ◽  
E L Stanley

Abstract Although our inventory of Earth’s biodiversity remains incomplete, we still require analyses using the Tree of Life to understand evolutionary and ecological patterns. Because incomplete sampling may bias our inferences, we must evaluate how future additions of newly discovered species might impact analyses performed today. We describe an approach that uses taxonomic history and phylogenetic trees to characterize the impact of past species discoveries on phylogenetic knowledge using patterns of branch-length variation, tree shape, and phylogenetic diversity. This provides a framework for assessing the relative completeness of taxonomic knowledge of lineages within a phylogeny. To demonstrate this approach, we use recent large phylogenies for amphibians, reptiles, flowering plants, and invertebrates. Well-known clades exhibit a decline in the mean and range of branch lengths that are added each year as new species are described. With increased taxonomic knowledge over time, deep lineages of well-known clades become known such that most recently described new species are added close to the tips of the tree, reflecting changing tree shape over the course of taxonomic history. The same analyses reveal other clades to be candidates for future discoveries that could dramatically impact our phylogenetic knowledge. Our work reveals that species are often added non-randomly to the phylogeny over multiyear time-scales in a predictable pattern of taxonomic maturation. Our results suggest that we can make informed predictions about how new species will be added across the phylogeny of a given clade, thus providing a framework for accommodating unsampled undescribed species in evolutionary analyses.


2012 ◽  
Vol 93 (9) ◽  
pp. 1996-2007 ◽  
Author(s):  
Kim B. Westgeest ◽  
Miranda de Graaf ◽  
Mathieu Fourment ◽  
Theo M. Bestebroer ◽  
Ruud van Beek ◽  
...  

Each year, influenza viruses cause epidemics by evading pre-existing humoral immunity through mutations in the major glycoproteins: the haemagglutinin (HA) and the neuraminidase (NA). In 2004, the antigenic evolution of HA of human influenza A (H3N2) viruses was mapped (Smith et al., Science 305, 371–376, 2004) from its introduction in humans in 1968 until 2003. The current study focused on the genetic evolution of NA and compared it with HA using the dataset of Smith and colleagues, updated to the epidemic of the 2009/2010 season. Phylogenetic trees and genetic maps were constructed to visualize the genetic evolution of NA and HA. The results revealed multiple reassortment events over the years. Overall rates of evolutionary change were lower for NA than for HA1 at the nucleotide level. Selection pressures were estimated, revealing an abundance of negatively selected sites and sparse positively selected sites. The differences found between the evolution of NA and HA1 warrant further analysis of the evolution of NA at the phenotypic level, as has been done previously for HA.


Author(s):  
O. Smutko ◽  
L. Radchenko ◽  
A. Mironenko

The aim of the present study was to identify molecular and genetic changes in the hemagglutinin (HA), neuraminidase (NA) and non-structural protein (NS1) genes of pandemic influenza A(H1N1)pdm09 strains that circulated in Ukraine during the 2015-2016 epidemic season. Samples (nasopharyngeal swabs from patients) were analyzed using real-time polymerase chain reaction (RT-PCR). Phylogenetic trees were constructed using MEGA 7 software. 3D structures were constructed in Chimera 1.11.2rc software. Viruses collected in the 2015-2016 season fell into genetic group 6B and into two emerging subgroups, 6B.1 and 6B.2, based on the HA and NA genes. Subgroups 6B.1 and 6B.2 are defined by the following amino acid substitutions. New amino acid substitutions D2E, N48S, and E125D were identified in the NS1 protein in the 2015-2016 epidemic season. Specific changes were observed in HA protein antigenic sites, but the viruses remained similar to the vaccine strain. The NS1 protein acquired a substitution associated with increased virulence of the influenza virus.


2021 ◽  
Author(s):  
Muhsen Hammoud ◽  
Charles Morphy Santos ◽  
Joao Paulo Gois

Current frameworks for side-by-side comparison of phylogenetic trees face two issues: (1) they accept only binary trees as input, and (2) they assume that the input trees have identical or highly overlapping taxa. We present a task abstraction of the problem of side-by-side comparison of two phylogenetic trees and propose a set-based measure for detailed structural comparison between two phylogenetic trees, which can be non-binary and not highly overlapping. iPhyloC is an interactive web-based framework that offers automatic identification of the taxa common to both trees, several modes for comparing input trees, an intuitive design, high usability, scalability to large trees, and cross-platform support. iPhyloC was tested on hypothetical and real biological examples.


Pneumologia ◽  
2021 ◽  
Vol 69 (3) ◽  
pp. 151-158
Author(s):  
Raluca Ioana Dospinescu Arcana ◽  
Radu Crișan-Dabija ◽  
Anda Tesloianu ◽  
Daniela Robu Popa ◽  
Oana-Elena Rohozneanu ◽  
...  

Abstract Considering the increased prevalence of influenza infections in the cold season and the pandemic evolution of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), medical staff are facing potential viral co-infection with SARS-CoV-2 and influenza virus. Both viruses are ribonucleic acid (RNA) viruses that share structural features, cause a similar immune response, have a related mode of transmission and produce both respiratory and general symptoms. SARS-CoV-2 and influenza viruses cause contagious infections and the protective measures against them are the same: wearing masks in crowded spaces, proper hand hygiene and avoiding crowded places. Co-infections with influenza A and B viruses and SARS-CoV-2 virus involve additional precautions regarding the therapeutic approach and disease evolution. Studies show that patients who have been vaccinated against influenza have developed milder forms of confirmed SARS-CoV-2 infection. In elderly patients, increased influenza vaccination coverage has been shown to be associated with a decrease in mortality rate and with a reduced impact of double infection. The influenza vaccine can trigger early immune mechanisms that facilitate early detection of SARS-CoV-2 as well as its clearance. Influenza vaccination should now be seen, more than ever, as a strategy to combat the growing SARS-CoV-2 pandemic, especially in vulnerable populations (elderly and people with associated comorbidities).


BMC Genomics ◽  
2020 ◽  
Vol 21 (S10) ◽  
Author(s):  
Samuel Briand ◽  
Christophe Dessimoz ◽  
Nadia El-Mabrouk ◽  
Manuel Lafond ◽  
Gabriela Lobinska

Abstract Background The Robinson-Foulds (RF) distance is a well-established measure between phylogenetic trees. Despite a lack of biological justification, it has the advantages of being a proper metric and being computable in linear time. For phylogenetic applications involving genes, however, a crucial aspect of the trees ignored by the RF metric is the type of the branching event (e.g. speciation, duplication, transfer, etc). Results We extend RF to trees with labeled internal nodes by including a node flip operation, alongside edge contractions and extensions. We explore properties of this extended RF distance in the case of a binary labeling. In particular, we show that contrary to the unlabeled case, an optimal edit path may require contracting “good” edges, i.e. edges shared between the two trees. Conclusions We provide a 2-approximation algorithm which is shown to perform well empirically. Looking ahead, computing distances between labeled trees opens up a variety of new algorithmic directions. Implementation and simulations available at https://github.com/DessimozLab/pylabeledrf.
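For readers unfamiliar with the unlabeled RF distance that this work extends, here is a minimal sketch of a rooted, cluster-based variant: the distance is the number of non-trivial clusters (leaf sets below an internal node) found in exactly one of the two trees. It is not the pylabeledrf implementation and does not handle labeled internal nodes or node flips. The nested-tuple encoding and the names `clusters` and `rf_distance` are assumptions, and this set-based version favours clarity over the linear-time bound mentioned in the abstract.

```python
def clusters(tree):
    """Return the non-trivial clusters of a rooted tree.

    Hypothetical encoding: a leaf is any non-tuple value and an
    internal node is a tuple of child subtrees.
    """
    found = set()

    def leaves(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        found.add(below)
        return below

    all_leaves = leaves(tree)
    found.discard(all_leaves)  # the root cluster is shared by every tree
    return found


def rf_distance(t1, t2):
    """Count the clusters present in exactly one of the two trees."""
    return len(clusters(t1) ^ clusters(t2))


t1 = ((("a", "b"), "c"), "d")
t2 = (("a", "b"), ("c", "d"))
print(rf_distance(t1, t2))
```

Here `t1` and `t2` share the cluster {a, b} but differ in {a, b, c} and {c, d}, so the sketch reports a distance of 2.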


2020 ◽  
Vol 21 (S6) ◽  
Author(s):  
Sriram P. Chockalingam ◽  
Jodh Pannu ◽  
Sahar Hooshmand ◽  
Sharma V. Thankachan ◽  
Srinivas Aluru

Abstract Background Alignment-free methods for sequence comparisons have become popular in many bioinformatics applications, specifically in the estimation of sequence similarity measures to construct phylogenetic trees. Recently, the average common substring measure, ACS, and its k-mismatch counterpart, ACSk, have been shown to produce results as effective as multiple-sequence alignment based methods for reconstruction of phylogeny trees. Since computing ACSk takes O(n log^k n) time and is hence impractical for large datasets, multiple heuristics that can approximate ACSk have been introduced. Results In this paper, we present a novel linear-time heuristic to approximate ACSk, which is faster than computing the exact ACSk while being closer to the exact ACSk values compared to previously published linear-time greedy heuristics. Using four real datasets, containing both DNA and protein sequences, we evaluate our algorithm in terms of accuracy and runtime and demonstrate its applicability for phylogeny reconstruction. Our algorithm provides better accuracy than previously published heuristic methods, while being comparable in its applications to phylogeny reconstruction. Conclusions Our method produces a better approximation for ACSk and is applicable for the alignment-free comparison of biological sequences at highly competitive speed. The algorithm is implemented in the Rust programming language and the source code is available at https://github.com/srirampc/adyar-rs.
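As a point of reference for the measure being approximated, the sketch below computes the exact-match average common substring (the k = 0 case) by brute force: for every start position in the first sequence it finds the longest substring that also occurs somewhere in the second sequence, then averages those lengths. This is neither the heuristic presented in the paper nor the adyar-rs code; the function name `acs` is an assumption, mismatches are not supported, and the running time is far from linear.

```python
def acs(x, y):
    """Average, over start positions i in x, of the length of the
    longest substring x[i:i+l] that occurs anywhere in y (exact match)."""
    if not x:
        raise ValueError("x must be non-empty")
    total = 0
    for i in range(len(x)):
        length = 0
        # Extend the match one character at a time while it still occurs in y.
        while i + length < len(x) and x[i:i + length + 1] in y:
            length += 1
        total += length
    return total / len(x)


print(acs("ACGT", "CGTA"))
print(acs("ACGT", "ACGT"))
```

Longer average matches indicate more similar sequences. Note that the measure is not symmetric in its two arguments, so a distance built on it would need to combine both directions.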


2017 ◽  
Vol 91 (12) ◽  
Author(s):  
Netanel Tzarum ◽  
Ryan McBride ◽  
Corwin M. Nycholat ◽  
Wenjie Peng ◽  
James C. Paulson ◽  
...  

ABSTRACT Influenza A H15 viruses are members of a subgroup (H7-H10-H15) of group 2 hemagglutinin (HA) subtypes that include H7N9 and H10N8 viruses that were isolated from humans during 2013. The isolation of avian H15 viruses is, however, quite rare and, until recently, geographically restricted to wild shorebirds and waterfowl in Australia. The HAs of H15 viruses contain an insertion in the 150-loop (loop beginning at position 150) of the receptor-binding site common to this subgroup and a unique insertion in the 260-loop compared to any other subtype. Here, we show that the H15 HA has a high preference for avian receptor analogs by glycan array analyses. The H15 HA crystal structure reveals that it is structurally closest to H7N9 HA, but the head domain of the H15 trimer is wider than all other HAs due to a tilt and opening of the HA1 subunits of the head domain. The extended 150-loop of the H15 HA retains the conserved conformation as in H7 and H10 HAs. Furthermore, the elongated 260-loop increases the exposed HA surface and can contribute to antigenic variation in H15 HAs. Since avian-origin H15 HA viruses have been shown to cause enhanced disease in mammalian models, further characterization and immune surveillance of H15 viruses are warranted. IMPORTANCE In the last 2 decades, an apparent increase has been reported for cases of human infection by emerging avian influenza A virus subtypes, including H7N9 and H10N8 viruses isolated during 2013. H15 is the other member of the subgroup of influenza A virus group 2 hemagglutinins (HAs) that also include H7 and H10. H15 viruses have been restricted to Australia, but recent isolation of H15 viruses in western Siberia suggests that they could be spread more globally via the avian flyways that converge and emanate from this region. 
Here we report on characterization of the three-dimensional structure and receptor specificity of the H15 hemagglutinin, revealing distinct features and specificities that can aid in global surveillance of such viruses for potential spread and emerging threat to the human population.

