patristic distance
Recently Published Documents


2021
Author(s): Christian Julian Villabona Arenas, Stephane Hue, James Baxter, Matthew Hall, Katrina A Lythgoe, ...

Inferring the direction of transmission between linked individuals living with HIV provides unparalleled power to understand the epidemiology that determines transmission. State-of-the-art approaches to infer directionality use phylogenetic ancestral state reconstruction to identify the individual in whom the most recent common ancestor of the virus populations originated. However, these methods vary in accuracy when applied to different datasets, and it is currently unclear under what circumstances inferring directionality is inaccurate and when bias is more likely. To evaluate the performance of phylogenetic ancestral state reconstruction, we first inferred directionality for 112 HIV transmission pairs for which the direction of transmission was known and detailed additional information was available. Second, we fit a statistical model to evaluate the extent to which epidemiological, sampling, genetic and phylogenetic factors influenced the outcome of the inference. Third, we repeated the analysis under real-life conditions, when only routinely collected data are available. We found that the inference of directionality depends principally on the topology class and branch length characteristics of the phylogeny. Specifically, directionality is most likely to be inferred correctly when the phylogenetic diversity and the minimum root-to-tip distance in the transmitter are greater than those of the recipient partner, and when the minimum inter-host patristic distance is large. Similarly, under real-life conditions, the probability of identifying the correct transmitter increases from 52% to 93%. The lower value applies when a monophyletic-monophyletic or paraphyletic-polyphyletic tree topology is observed, when the sample size in both partners is small, and when the tip closest to the root does not agree with the state at the root; the higher value applies when a paraphyletic-monophyletic topology is observed, when the sample size is large, and when the tip closest to the root agrees with the state at the root.
Our results support two conclusions: first, that discordance between previous studies in inferring transmission direction can be explained by differences in key phylogenetic properties that arise from different evolutionary, epidemiological and sampling processes; and second, that easily calculated metrics from the phylogenetic tree of the transmission pair can be used to evaluate the accuracy of inferring directionality under real-life conditions in population-wide studies. However, because these methods entail considerable uncertainty, we strongly advise against using them for analyses at the level of individual pairs.
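The branch-length quantities named above can be read off a rooted tree of the pair. The following minimal Python sketch assumes a hypothetical toy tree stored as a child-to-parent map with branch lengths, and tip labels that encode the host (`A_1`, `B_2`, ...). It computes each host's minimum root-to-tip distance, the minimum inter-host patristic distance, and a root state from Fitch parsimony, then checks whether the tip closest to the root agrees with the root state. Fitch parsimony is used only as a simple stand-in; it is not necessarily the ancestral state reconstruction method used in the study.

```python
from itertools import product

# Hypothetical transmission-pair tree: child -> (parent, branch length in
# substitutions/site). Tip labels encode the host as "<host>_<index>".
TREE = {
    "n1": ("root", 0.01),
    "A_1": ("root", 0.02),
    "A_2": ("n1", 0.03),
    "n2": ("n1", 0.02),
    "B_1": ("n2", 0.01),
    "B_2": ("n2", 0.015),
}

def host(tip):
    return tip.split("_")[0]

def ancestors(node):
    """Nodes on the path from `node` up to and including the root."""
    path = [node]
    while path[-1] in TREE:
        path.append(TREE[path[-1]][0])
    return path

def root_distance(node):
    """Sum of branch lengths from `node` to the root."""
    return sum(TREE[n][1] for n in ancestors(node)[:-1])

def patristic(a, b):
    """Path length between two nodes through their most recent common ancestor."""
    b_anc = set(ancestors(b))
    mrca = next(n for n in ancestors(a) if n in b_anc)
    return root_distance(a) + root_distance(b) - 2 * root_distance(mrca)

def fitch_root_state(node="root"):
    """Bottom-up Fitch parsimony pass for the host state; assumes a binary tree."""
    children = [c for c, (p, _) in TREE.items() if p == node]
    if not children:
        return {host(node)}
    left, right = (fitch_root_state(c) for c in children)
    return (left & right) or (left | right)

parents = {p for p, _ in TREE.values()}
tips = [n for n in TREE if n not in parents]
by_host = {h: [t for t in tips if host(t) == h] for h in {host(t) for t in tips}}

min_rtt = {h: min(root_distance(t) for t in ts) for h, ts in by_host.items()}
min_inter = min(patristic(a, b) for a, b in product(by_host["A"], by_host["B"]))
root_state = fitch_root_state()
closest_tip = min(tips, key=root_distance)

print("root state:", root_state)
print("min root-to-tip:", min_rtt)
print("min inter-host patristic distance:", round(min_inter, 4))
print("closest tip agrees with root:", root_state == {host(closest_tip)})
```

In this toy tree host A is paraphyletic and host B monophyletic, so the root is assigned to A; a paraphyletic-monophyletic topology is one of the conditions the abstract links to a higher probability of identifying the correct transmitter.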


2020, Vol 6 (2)
Author(s): Kaat Ramaekers, Annabel Rector, Lize Cuypers, Philippe Lemey, Els Keyaerts, ...

Since the first human respiratory syncytial virus (HRSV) genotype classification in 1998, inconsistent conclusions have been drawn regarding the criteria that define HRSV genotypes and their nomenclature, challenging data comparisons between research groups. In this study, we aim to unify the field of HRSV genotype classification by reviewing the different methods that have been used in the past to define HRSV genotypes and by proposing a new classification procedure, based on well-established phylogenetic methods. All available complete HRSV genomes (>12,000 bp) were downloaded from GenBank and divided into the two subgroups: HRSV-A and HRSV-B. From whole-genome alignments, the regions that correspond to the open reading frame of the glycoprotein G and the second hypervariable region (HVR2) of the ectodomain were extracted. In the resulting partial alignments, the phylogenetic signal within each fragment was assessed. Maximum likelihood phylogenetic trees were reconstructed using the complete genome alignments. Patristic distances were calculated between all pairs of tips in the phylogenetic tree and summarized as a density plot in order to determine a cutoff value at the lowest point following the major distance peak. Our data show that neither the HVR2 fragment nor the G gene contains sufficient phylogenetic signal to perform reliable phylogenetic reconstruction. Therefore, whole-genome alignments were used to determine HRSV genotypes. We define a genotype using the following criteria: a bootstrap support of ≥70 per cent for the respective clade and a maximum patristic distance between all members of the clade of ≤0.018 substitutions per site for HRSV-A or ≤0.026 substitutions per site for HRSV-B. By applying this definition, we distinguish twenty-three genotypes within subtype HRSV-A and six genotypes within subtype HRSV-B. Applying the genotype criteria to subsampled data sets confirmed the robustness of the method.
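The two quantitative steps described above, choosing a cutoff at the lowest point after the main peak of the pairwise patristic distance distribution and testing a clade against the genotype criteria, can be sketched in Python as follows. The sketch assumes the pairwise patristic distances have already been extracted from the maximum likelihood tree, replaces the density plot with a simple histogram (the bin count is an arbitrary assumption), and uses made-up example distances; only the thresholds of 70 per cent bootstrap support, 0.018 and 0.026 substitutions per site come from the text.

```python
from collections import Counter

# Criteria stated above: bootstrap support >= 70% and a maximum within-clade
# patristic distance (substitutions/site) that depends on the subgroup.
MAX_PATRISTIC = {"HRSV-A": 0.018, "HRSV-B": 0.026}
MIN_BOOTSTRAP = 70

def valley_after_main_peak(distances, n_bins=20):
    """Cutoff at the first trough after the tallest histogram bin.

    A plain histogram stands in for the density plot (an assumption);
    the value returned is the centre of the trough bin.
    """
    lo, hi = min(distances), max(distances)
    width = (hi - lo) / n_bins
    counts = Counter(min(int((d - lo) / width), n_bins - 1) for d in distances)
    heights = [counts[i] for i in range(n_bins)]
    i = heights.index(max(heights))  # bin holding the major distance peak
    # Walk right while the histogram keeps falling.
    while i + 1 < n_bins and heights[i + 1] < heights[i]:
        i += 1
    return lo + (i + 0.5) * width

def is_genotype(clade_distances, bootstrap, subgroup):
    """True if a clade satisfies both genotype criteria for its subgroup."""
    return bootstrap >= MIN_BOOTSTRAP and max(clade_distances) <= MAX_PATRISTIC[subgroup]

# Made-up pairwise distances with a major peak near 0.015 and a second mode near 0.05.
pairs = [0.0, 0.005, 0.005] + [0.015] * 5 + [0.025] * 2 + [0.045] * 2 + [0.055] * 3 + [0.065, 0.1]
print(round(valley_after_main_peak(pairs, n_bins=10), 3))  # 0.035
print(is_genotype([0.010, 0.020], bootstrap=85, subgroup="HRSV-A"))  # False
print(is_genotype([0.010, 0.020], bootstrap=85, subgroup="HRSV-B"))  # True
```

Walking right from the tallest bin until the counts stop falling picks the first trough after the major peak rather than the global minimum, which could otherwise land in an empty tail bin.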


2019, Vol 9 (1)
Author(s): Kimia Kamelian, Vincent Montoya, Andrea Olmstead, Winnie Dong, Richard Harrigan, ...

In 2018, the World Health Organization identified the Zika virus (ZIKV) as a pathogen that should be prioritized for public health research due to its epidemic potential. In this study, whole-genome sequencing (WGS) of travel-acquired ZIKV infections was used to examine the limitations of phylogenetic analysis. WGS and phylogenetic analysis were performed to investigate geographic clustering of samples from five Canadians with travel-acquired ZIKV infections and to assess the limitations of phylogenetic analysis of ZIKV sequences using a phylogenetic cluster approach. Genomic variability of ZIKV samples was assessed and, for context, compared with that of hepatitis C virus (HCV) samples. Phylogenetic analysis confirmed the suspected region of ZIKV infection for one of five samples, and one sample failed to cluster with sequences from its suspected country of infection. Travel-acquired ZIKV samples showed low genomic variability relative to HCV samples. A floating patristic distance threshold classified all pre-2000 ZIKV sequences into separate clusters, while only Cambodian, Peruvian, Malaysian, and South Korean sequences were similarly classifiable. While phylogenetic analysis of ZIKV data can identify the broad geographical region of ZIKV infection, ZIKV’s low genomic variability is likely to limit precise interpretations of phylogenetic analysis of the origins of travel-related cases.


2019
Author(s): Luna L. Sánchez-Reyes, Brian C. O’Meara

The combination of new analytical techniques, the availability of more fossil and molecular data, and better practices in data sharing has resulted in a steady accumulation over the last few decades of chronograms for a large number and diversity of organisms in public, open databases such as TreeBASE, Dryad, and Open Tree of Life. However, obtaining a tree with branch lengths proportional to time remains difficult for many biologists and for the non-academic community, despite its importance in many areas of research, education, and science communication. datelife is a service implemented via an R package and a website (http://www.datelife.org/) for efficient reuse, summary and reanalysis of published data on lineage divergence times. The main workflow starts with at least two taxon names as input, either as tip labels on a tree or as a simple comma-separated character string. A name search is then performed across the chronogram database, and positively identified source trees are pruned to retain only the queried taxa and stored as a named list of patristic distance matrices. Source chronogram data can be summarised using branch length summary statistics or variance-minimizing approaches to generate a single summary chronogram. Source chronogram data can also be used as calibration points to date a tree containing some or all names from the initial query. If no information is available for any of the queried taxa, data can be simulated. All source and summary chronograms can be saved in formats that permit easy reuse and reanalysis. Summary and newly generated trees are potentially useful for evaluating evolutionary hypotheses in different areas of biological research. How well these trees work for this purpose still needs to be tested.
datelife will be useful for increasing awareness of the existing variation in expert divergence time data, and might foster exploration of the effect of alternative divergence time hypotheses on the results of analyses, nurturing a culture of more cautious interpretation of evolutionary results.
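The pruning and summary steps described above can be illustrated with plain patristic distance matrices. The Python sketch below is not datelife's R interface: the source names, taxon names and distances are hypothetical, the median is used as one possible branch length summary statistic, and it assumes ultrametric chronograms, on which the divergence time of a pair is half their patristic distance.

```python
from itertools import combinations
from statistics import median

# Hypothetical source chronograms, each already converted to a patristic
# distance matrix (time units, e.g. Myr) keyed by an alphabetically sorted taxon pair.
SOURCES = {
    "source_1": {("sp_a", "sp_b"): 20.0, ("sp_a", "sp_c"): 60.0, ("sp_b", "sp_c"): 60.0},
    "source_2": {("sp_a", "sp_b"): 24.0, ("sp_a", "sp_c"): 50.0, ("sp_b", "sp_c"): 50.0,
                 ("sp_a", "sp_d"): 90.0},
    "source_3": {("sp_a", "sp_b"): 30.0},
}

def prune(matrix, query):
    """Keep only entries whose two taxa are both in the query."""
    return {pair: d for pair, d in matrix.items() if set(pair) <= query}

def summarise(sources, query):
    """Median patristic distance per queried pair across the sources that contain it."""
    pruned = [prune(m, query) for m in sources.values()]
    summary = {}
    for pair in combinations(sorted(query), 2):
        values = [m[pair] for m in pruned if pair in m]
        if values:  # pairs absent from every source are left out
            summary[pair] = median(values)
    return summary

query = {"sp_a", "sp_b", "sp_c"}
summary = summarise(SOURCES, query)
# On an ultrametric chronogram, divergence time = patristic distance / 2.
ages = {pair: d / 2 for pair, d in summary.items()}
print(ages)  # {('sp_a', 'sp_b'): 12.0, ('sp_a', 'sp_c'): 27.5, ('sp_b', 'sp_c'): 27.5}
```

The median is robust to a single outlying source; a mean or a variance-minimizing method could be swapped in at the same point.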


2018
Author(s): Alvin X. Han, Edyth Parker, Frits Scholer, Sebastian Maurer-Stroh, Colin A. Russell

Sub-species nomenclature systems of pathogens are increasingly based on sequence data. The use of phylogenetics to identify and differentiate between clusters of genetically similar pathogens is particularly prevalent in virology, from the nomenclature of human papillomaviruses to that of highly pathogenic avian influenza (HPAI) H5Nx viruses. These nomenclature systems rely on absolute genetic distance thresholds to define the maximum genetic divergence tolerated between viruses designated as closely related. However, the phylogenetic clustering methods used in these nomenclature systems are limited by the arbitrariness of setting intra- and inter-cluster diversity thresholds. The lack of a consensus ground truth to define well-delineated, meaningful phylogenetic subpopulations amplifies the difficulty of identifying an informative distance threshold. Consequently, phylogenetic clustering often becomes an exploratory, ad hoc exercise.

Phylogenetic Clustering by Linear Integer Programming (PhyCLIP) was developed to provide a statistically principled phylogenetic clustering framework that removes the need for an arbitrarily defined distance threshold. Using the pairwise patristic distance distributions of an input phylogeny, PhyCLIP parameterises the intra- and inter-cluster divergence limits as statistical bounds in an integer linear programming model, which is then optimised to cluster as many sequences as possible. When applied to the haemagglutinin phylogeny of HPAI H5Nx viruses, PhyCLIP not only recapitulated the current WHO/OIE/FAO H5 nomenclature system but also delineated informative, higher-resolution clusters that capture geographically distinct subpopulations of viruses. PhyCLIP is pathogen-agnostic and can be generalised to a wide variety of research questions concerning the identification of biologically informative clusters in pathogen phylogenies. PhyCLIP is freely available at http://github.com/alvinxhan/PhyCLIP.
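The idea of replacing an arbitrary distance threshold with a statistical bound derived from the tree's own pairwise patristic distance distribution can be sketched as below. This is a simplified illustration, not PhyCLIP's implementation: the within-cluster limit is assumed to take the form median + γ × MAD of all pairwise distances, the candidate clades and distances are hypothetical, and a greedy largest-clade-first pass stands in for the integer linear programming optimisation (so it does not guarantee the maximum number of clustered sequences).

```python
from itertools import combinations
from statistics import mean, median

def mad(values):
    """Median absolute deviation from the median."""
    values = list(values)
    m = median(values)
    return median([abs(v - m) for v in values])

def within_cluster_limit(all_distances, gamma):
    """Assumed bound: global median + gamma * MAD of pairwise patristic distances."""
    all_distances = list(all_distances)
    return median(all_distances) + gamma * mad(all_distances)

def greedy_cluster(clades, dist, limit, min_size=2):
    """Choose non-overlapping clades whose mean pairwise distance is within `limit`.

    Larger clades are tried first; this greedy pass is a stand-in for the
    integer linear program and is not guaranteed to be optimal.
    """
    chosen, used = [], set()
    for clade in sorted(clades, key=len, reverse=True):
        if len(clade) < min_size or clade & used:
            continue
        pair_d = [dist[frozenset(p)] for p in combinations(sorted(clade), 2)]
        if mean(pair_d) <= limit:
            chosen.append(clade)
            used |= clade
    return chosen

# Hypothetical pairwise patristic distances (substitutions/site) for five tips.
DIST = {frozenset(p): d for p, d in [
    (("a", "b"), 0.01), (("c", "d"), 0.02),
    (("a", "c"), 0.10), (("a", "d"), 0.10), (("b", "c"), 0.10), (("b", "d"), 0.10),
    (("a", "e"), 0.50), (("b", "e"), 0.50), (("c", "e"), 0.50), (("d", "e"), 0.50),
]}
# Hypothetical clades (sets of descendant tips) read off an input tree.
CLADES = [frozenset("ab"), frozenset("cd"), frozenset("abcd"), frozenset("abcde")]

limit = within_cluster_limit(DIST.values(), gamma=1.0)
clusters = greedy_cluster(CLADES, DIST, limit)
print(round(limit, 3), [sorted(c) for c in clusters])
```

With these numbers the whole tree exceeds the bound because of the distant tip `e`, while the clade {a, b, c, d} falls within it; a tighter limit would instead split it into {a, b} and {c, d}.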

