Analysis of The Nucleotide Sequence Diversity of the Lassa Virus and Augmenting its Phylogenetic Tree

Lassa virus (LASV) is the etiological agent of Lassa fever, an acute hemorrhagic disease with a mortality rate of 15%. Many aspects of the Lassa virus are not understood, such as the cause of deafness in one third of surviving patients or why symptoms are benign for 80% of those infected with the virus. Ambiguities like these suggest that there may be genomic heterogeneity among infecting viruses and demonstrate a need to quantify and analyze polymorphisms within LASV. Patterns that emerge from phylogenetic trees can be used to assess the structure of a population while also providing insight into its genetic makeup. The purpose of this investigation was to develop a more streamlined means of calculating nucleotide diversity within a subpopulation of Lassa virus strains and to augment a phylogenetic tree of the Lassa virus glycoprotein precursor (GPC) segment. A total of 25 partial and complete sequences of LASV strains were obtained from GenBank. During phase one of this investigation, the sequence data were entered into the MEGA analysis software and the sequence diversity was derived at the nucleotide level. Data from the individual strand sequences were used to augment a phylogenetic tree using TreeView X software. In phase two of this investigation, an algorithm was created in RStudio with the BSgenome and Biostrings packages. The sequence diversity derived from the statistical analyses in MEGA was compared with that of the algorithm. A p-value of 0.08 was found, which falls outside the accepted range for non-medical studies of 0.00 to 0.05. It is suggested that future research focus on a revised version of the algorithm that calculates nucleotide diversity within a percent error of 5%.
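
The nucleotide diversity discussed above can be illustrated with a minimal Python sketch. It assumes the sequences are already aligned to equal length and defines diversity as the mean proportion of differing sites over all sequence pairs; the function names and the toy sequences are hypothetical, and this is not the R algorithm or the MEGA procedure used in the study.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            differences += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def nucleotide_diversity(sequences):
    """Mean pairwise p-distance over all pairs of aligned sequences."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


# Toy alignment (hypothetical data, not LASV sequences).
toy_alignment = ["ACGTACGT", "ACGTACGA", "ACGAACGT"]
toy_pi = nucleotide_diversity(toy_alignment)
```

With real data, the aligned GPC sequences would replace `toy_alignment`; the averaging step stays the same.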

Identification of Protective Lassa Virus Epitopes That Are Restricted by HLA-A2

Journal of Virology ◽

10.1128/jvi.00896-06 ◽

2006 ◽

Vol 80 (17) ◽

pp. 8351-8361 ◽

Cited By ~ 44

Author(s):

Jason Botten ◽

Jeff Alexander ◽

Valerie Pasquetto ◽

John Sidney ◽

Polly Barrowman ◽

...

Keyword(s):

Neutralizing Antibodies ◽

Sequence Data ◽

Recombinant Vaccinia Virus ◽

Target Cells ◽

Lassa Virus ◽

Primary Role ◽

Emerging Pathogens ◽

Glycoprotein Precursor ◽

Predictive Algorithm ◽

Positive Target

ABSTRACT Recovery from Lassa virus (LASV) infection usually precedes the appearance of neutralizing antibodies, indicating that cellular immunity plays a primary role in viral clearance. To date, the role of LASV-specific CD8+ T cells has not been evaluated in humans. To facilitate such studies, we utilized a predictive algorithm to identify candidate HLA-A2 supertype epitopes from the LASV nucleoprotein and glycoprotein precursor (GPC) genes. We identified three peptides (GPC42-50, GLVGLVTFL; GPC60-68, SLYKGVYEL; and GPC441-449, YLISIFLHL) that displayed high-affinity binding (≤98 nM) to HLA-A*0201, induced CD8+ T-cell responses of high functional avidity in HLA-A*0201 transgenic mice, and were naturally processed from native LASV GPC in human HLA-A*0201-positive target cells. HLA-A*0201 mice immunized with either GPC42-50 or GPC60-68 were protected against challenge with a recombinant vaccinia virus that expressed LASV GPC. The epitopes identified in this study represent potential diagnostic reagents and candidates for inclusion in epitope-based vaccine constructs. Our approach is applicable to any pathogen with existing sequence data, does not require manipulation of the actual pathogen or access to immune human donors, and should therefore be generally applicable to category A through C agents and other emerging pathogens.
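
The peptide names in the abstract (for example GPC42-50) denote nine-residue windows numbered from position 1 of the protein. A rough Python sketch of how such windows can be enumerated and pre-filtered follows. The anchor filter (L or M at position 2, V, L, or I at the C terminus) is a commonly cited HLA-A*0201 preference that is assumed here for illustration; it is not the predictive algorithm the authors used, which the abstract does not specify, and the toy protein is hypothetical.

```python
def candidate_nonamers(protein, label="GPC", p2="LM", c_term="VLI"):
    """Return (name, peptide) for 9-mers whose anchor residues fit the filter.

    p2 and c_term are ASSUMED anchor preferences, not values from the paper.
    Names use 1-based inclusive positions, e.g. GPC42-50.
    """
    hits = []
    for start in range(len(protein) - 8):
        peptide = protein[start:start + 9]
        if peptide[1] in p2 and peptide[8] in c_term:
            hits.append((f"{label}{start + 1}-{start + 9}", peptide))
    return hits


# Hypothetical toy protein with one reported peptide embedded at positions 4-12.
toy_protein = "MGQ" + "GLVGLVTFL" + "AAA"
toy_hits = candidate_nonamers(toy_protein)
```

A filter like this only narrows the candidate list; binding affinity and immunogenicity still have to be measured, as the study did.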

Techniques for the verification of minimal phylogenetic trees illustrated with ten mammalian haemoglobin sequences

Biochemical Journal ◽

10.1042/bj1870065 ◽

1980 ◽

Vol 187 (1) ◽

pp. 65-74 ◽

Cited By ~ 12

Author(s):

D Penny ◽

M D Hendy ◽

L R Foulds

Keyword(s):

Amino Acid ◽

Phylogenetic Tree ◽

Protein Sequence ◽

Phylogenetic Trees ◽

Sequence Data ◽

Protein Sequences ◽

Nucleotide Sequences ◽

Amino Acid Sequences ◽

Minimal Tree ◽

Protein Sequence Data

We have recently reported a method to identify the shortest possible phylogenetic tree for a set of protein sequences [Foulds, Hendy & Penny (1979) J. Mol. Evol. 13, 127–150; Foulds, Penny & Hendy (1979) J. Mol. Evol. 13, 151–166]. The present paper discusses issues that arise during the construction of minimal phylogenetic trees from protein-sequence data. The conversion of the data from amino acid sequences into nucleotide sequences is shown to be advantageous. A new variation of a method for constructing a minimal tree is presented. Our previous methods involved first constructing a tree and then either proving that it is minimal or transforming it into a minimal tree. The approach presented here builds up a tree progressively, taxon by taxon. We illustrate this approach by using it to construct a minimal tree for ten mammalian haemoglobin alpha-chain sequences. Finally, we define a measure of the complexity of the data and illustrate a method to derive a directed phylogenetic tree from the minimal tree.
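
A "minimal" tree here is one that requires the fewest substitutions. As a small illustration of how the length of a given tree can be scored, the Python sketch below uses Fitch's small-parsimony counting on a rooted binary tree written as nested tuples. This is a swapped-in standard technique, not the construction method of the paper, and the tree encoding and toy alignment are assumptions.

```python
def fitch_site_cost(tree, states):
    """Minimum number of changes at one site (Fitch's algorithm).

    tree: a leaf name (str) or a 2-tuple of subtrees.
    states: dict mapping each leaf name to its character at this site.
    """
    def visit(node):
        if isinstance(node, str):
            return {states[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        shared = left_set & right_set
        if shared:
            return shared, left_cost + right_cost
        # No shared state: one substitution is needed on this split.
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


def tree_length(tree, alignment):
    """Total parsimony length of a tree over all alignment columns."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_site_cost(tree, {name: seq[i] for name, seq in alignment.items()})
        for i in range(n_sites)
    )


# Hypothetical four-taxon alignment and two candidate topologies.
toy = {"A": "AC", "B": "AC", "C": "GC", "D": "GT"}
tree_ab_cd = (("A", "B"), ("C", "D"))
tree_ac_bd = (("A", "C"), ("B", "D"))
lengths = (tree_length(tree_ab_cd, toy), tree_length(tree_ac_bd, toy))
```

Comparing the two lengths shows which topology is shorter for the toy data; finding and proving a global minimum over all topologies is the harder problem the paper addresses.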

Ghost-tree: creating hybrid-gene phylogenetic trees for diversity analyses

10.7287/peerj.preprints.1106v1 ◽

2015 ◽

Author(s):

Jennifer Fouquier ◽

Jai R Rideout ◽

Evan Bolyen ◽

John H Chase ◽

Arron Shiffer ◽

...

Keyword(s):

Phylogenetic Tree ◽

Genetic Marker ◽

Phylogenetic Trees ◽

Phylogenetic Diversity ◽

Sequence Data ◽

Fungal Species ◽

Bioinformatics Tool ◽

Hybrid Gene ◽

Fungal Database ◽

Taxonomic Groups

Ghost-tree is a bioinformatics tool that integrates sequence data from two genetic markers into a single phylogenetic tree that can be used for diversity analyses. Our approach uses one genetic marker whose sequences can be aligned across organisms spanning divergent taxonomic groups (e.g., fungal families) as a “foundation” phylogeny. A second, more rapidly evolving genetic marker is then used to build “extension” phylogenies for more closely related organisms (e.g., fungal species or strains) that are then grafted on to the foundation tree by mapping taxonomic names. We apply ghost-tree to graft fungal extension phylogenies derived from ITS sequences onto a foundation phylogeny derived from fungal 18S sequences. The result is a phylogenetic tree, compatible with the commonly used UNITE fungal database, that supports phylogenetic diversity analysis (e.g., UniFrac) of fungal communities profiled using ITS markers. Availability: ghost-tree is pip-installable. All source code, documentation, and test code are available under the BSD license at https://github.com/JTFouquier/ghost-tree.

Phylogenetic tree shapes resolve disease transmission patterns

10.1101/003194 ◽

2014 ◽

Cited By ~ 1

Author(s):

Caroline Colijn ◽

Jennifer Gardy

Keyword(s):

Phylogenetic Tree ◽

Real World ◽

Disease Transmission ◽

Phylogenetic Trees ◽

Sequence Data ◽

Communicable Disease ◽

Disease Outbreaks ◽

Transmission Dynamics ◽

Topological Features ◽

Computationally Intensive

Whole genome sequencing is becoming popular as a tool for understanding outbreaks of communicable diseases, with phylogenetic trees being used to identify individual transmission events or to characterize outbreak-level overall transmission dynamics. Existing methods to infer transmission dynamics from sequence data rely on well-characterised infectious periods and on epidemiological and clinical meta-data, which may not always be available, and typically require computationally intensive analysis focussing on the branch lengths in phylogenetic trees. We sought to determine whether the topological structures of phylogenetic trees contain signatures of the overall transmission patterns underlying an outbreak. Here we use simulated outbreaks to train and then test computational classifiers. We test the method on data from two real-world outbreaks. We find that different transmission patterns result in quantitatively different phylogenetic tree shapes. We describe five topological features that summarize a phylogeny’s structure and find that computational classifiers based on these are capable of predicting an outbreak’s transmission dynamics. The method is robust to variations in the transmission parameters and network types, and recapitulates known epidemiology of previously characterized real-world outbreaks. We conclude that there are simple structural properties of phylogenetic trees which, when combined, can distinguish communicable disease outbreaks with a super-spreader, homogeneous transmission, and chains of transmission. This is possible using genome data alone, and can be done during an outbreak. We discuss the implications for management of outbreaks.
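
To make "topological features" concrete, the Python sketch below computes two standard tree-shape summaries, the Colless imbalance and the number of cherries, for rooted binary trees written as nested tuples. The abstract does not name its five features, so these two are assumptions chosen for illustration, as are the toy trees.

```python
def leaf_count(node):
    """Number of leaves below a node (a leaf is a str, an internal node a 2-tuple)."""
    if isinstance(node, str):
        return 1
    return leaf_count(node[0]) + leaf_count(node[1])


def colless_imbalance(node):
    """Sum over internal nodes of |leaves on left - leaves on right|."""
    if isinstance(node, str):
        return 0
    left, right = node
    here = abs(leaf_count(left) - leaf_count(right))
    return here + colless_imbalance(left) + colless_imbalance(right)


def cherry_count(node):
    """Number of internal nodes whose two children are both leaves."""
    if isinstance(node, str):
        return 0
    left, right = node
    if isinstance(left, str) and isinstance(right, str):
        return 1
    return cherry_count(left) + cherry_count(right)


# Hypothetical four-leaf trees: one balanced, one ladder-like (caterpillar).
balanced = (("a", "b"), ("c", "d"))
caterpillar = ((("a", "b"), "c"), "d")
features = {
    "balanced": (colless_imbalance(balanced), cherry_count(balanced)),
    "caterpillar": (colless_imbalance(caterpillar), cherry_count(caterpillar)),
}
```

Feature tuples like these are the kind of input a classifier could be trained on, with simulated outbreaks supplying the labels.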

Determining a novel feature-space for SARS-CoV-2 sequence data

10.37044/osf.io/xt7gw ◽

2020 ◽

Author(s):

Francesco Ballesio ◽

Ali Haider Bangash ◽

Didier Barradas-Bautista ◽

Justin Barton ◽

Andrea Guarracino ◽

...

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

MHC Class I ◽

Phylogenetic Trees ◽

Data Science ◽

Sequence Data ◽

Protein Sequences ◽

Feature Space ◽

Future Research ◽

Alignment Free

The pandemic spread of SARS-CoV-2 and its ability to reinfect a recovered subject, among its other damaging characteristics, took everybody by surprise. A global collaborative scientific effort was urgently required to bring together experts from different areas of medicine and data science. Such a platform was provided by the COVID19 Virtual BioHackathon, organized from the 5th to the 11th of April, 2020, to consider the related pressing issues, which ranged from text mining to genomics. Under the "Machine learning" track, we determined the optimal k-mer length for feature extraction, constructed continuous distributed representations for protein sequences to create phylogenetic trees in an alignment-free manner, and clustered predicted MHC class I and II binding affinity to aid in vaccine design. All the related work is available in a GitHub repository under an MIT license for future research.
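
The k-mer feature extraction mentioned above can be sketched in a few lines of Python: a sequence is turned into a fixed-length vector of k-mer counts. The function name, the default nucleotide alphabet, and the toy sequence are assumptions for illustration; the hackathon's actual pipeline is in its repository and is not reproduced here.

```python
from collections import Counter
from itertools import product


def kmer_vector(sequence, k, alphabet="ACGT"):
    """Count every overlapping k-mer and return counts in a fixed order.

    The vector has len(alphabet) ** k entries, ordered like
    itertools.product(alphabet, repeat=k). K-mers containing symbols
    outside the alphabet are ignored.
    """
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    return [counts["".join(kmer)] for kmer in product(alphabet, repeat=k)]


# Hypothetical toy sequence; its 2-mers are AC, CG, GT, TA, AC.
toy_vector = kmer_vector("ACGTAC", 2)
```

The choice of k trades resolution against sparsity: the vector length grows as the alphabet size to the power k, which is one reason an optimal k has to be determined empirically.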

Compressing Streams of Phylogenetic Trees

10.1101/440644 ◽

2018 ◽

Author(s):

Axel Trefzer ◽

Alexandros Stamatakis

Keyword(s):

Phylogenetic Tree ◽

Phylogenetic Trees ◽

Sequence Data ◽

Branch Length ◽

Distinct Species ◽

MCMC Methods ◽

Molecular Sequence Data ◽

Molecular Sequence ◽

Posterior Probability Distribution ◽

Tree Compression

Bayesian Markov-Chain Monte Carlo (MCMC) methods for phylogenetic tree inference, that is, inference of the evolutionary history of distinct species using their molecular sequence data, typically generate large sets of phylogenetic trees. The trees generated by the MCMC procedure are samples of the posterior probability distribution that MCMC methods approximate. Thus, they generate a stream of correlated binary trees that need to be stored. Here, we adapt state-of-the-art algorithms for binary tree compression to phylogenetic tree data streams and extend them to also store the required meta-data. On a phylogenetic tree stream containing 1,000 trees with 500 leaves including branch length values, we achieve a compression rate of 5.4 compared to the uncompressed tree files and of 1.8 compared to bzip2-compressed tree files. For compressing the same trees, but without branch length values, our compression method is approximately an order of magnitude better than bzip2. A prototype implementation is available at https://github.com/axeltref/tree-compression.git.

SeqDistK: a Novel Tool for Alignment-free Phylogenetic Analysis

10.1101/2021.08.16.456500 ◽

2021 ◽

Author(s):

Xuemei Liu ◽

Wen Li ◽

Guanda Huang ◽

Tianlai Huang ◽

Qingang Xiong ◽

...

Keyword(s):

Phylogenetic Analysis ◽

16S rRNA ◽

Phylogenetic Tree ◽

Phylogenetic Trees ◽

Large Scale ◽

Sequence Data ◽

Ground Truth ◽

Group Method ◽

Metagenomic Sequence ◽

Alignment Free

Algorithms for constructing phylogenetic trees are fundamental to studying the evolution of viruses, bacteria, and other microbes. Established multiple alignment-based algorithms are inefficient for large-scale metagenomic sequence data because of their high requirement of inter-sequence correlation and high computational complexity. In this paper, we present SeqDistK, a novel tool for alignment-free phylogenetic analysis. SeqDistK computes the dissimilarity matrix for phylogenetic analysis, incorporating seven k-mer based dissimilarity measures, namely d2, d2S, d2star, Euclidean, Manhattan, CVTree, and Chebyshev. Based on these dissimilarities, SeqDistK constructs a phylogenetic tree using the Unweighted Pair Group Method with Arithmetic Mean algorithm. Using a gold standard dataset of 16S rRNA and its associated phylogenetic tree, we compared SeqDistK to Muscle, a multiple sequence aligner. We found SeqDistK was not only 38 times faster than Muscle in computational efficiency but also more accurate. SeqDistK achieved the smallest symmetric difference between the inferred and ground truth trees, ranging from 13 to 18, while that of Muscle was 62. When the measures d2, d2star, d2S, and Euclidean were used with k-mer size k=5, SeqDistK consistently inferred a phylogenetic tree almost identical to the ground truth tree. We also performed clustering of 16S rRNA sequences using SeqDistK and found the clustering was highly consistent with known biological taxonomy. Among all the measures, d2S (k=5, M=2) showed the best accuracy, as it correctly clustered and classified all sample sequences. In summary, SeqDistK is a novel, fast, and accurate alignment-free tool for large-scale phylogenetic analysis. SeqDistK software is freely available at https://github.com/htczero/SeqDistK.
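
Three of the seven measures named above (Euclidean, Manhattan, and Chebyshev) have unambiguous textbook definitions on k-mer count vectors, so they can be sketched briefly in Python. This is an independent illustration, not SeqDistK's code; the function names and toy sequences are hypothetical, and the d2 family and CVTree measures are left out because their exact normalisations are not given in the abstract.

```python
import math
from collections import Counter


def kmer_counts(sequence, k):
    """Counts of all overlapping k-mers in a sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def _differences(a, b):
    """Per-k-mer count differences over the union of observed k-mers."""
    return [a[key] - b[key] for key in set(a) | set(b)]


def euclidean(a, b):
    return math.sqrt(sum(d * d for d in _differences(a, b)))


def manhattan(a, b):
    return sum(abs(d) for d in _differences(a, b))


def chebyshev(a, b):
    return max(abs(d) for d in _differences(a, b))


def dissimilarity_matrix(sequences, k, measure):
    """Square matrix of pairwise dissimilarities between k-mer profiles."""
    profiles = [kmer_counts(s, k) for s in sequences]
    return [[measure(p, q) for q in profiles] for p in profiles]


# Hypothetical toy sequences compared with k = 2.
toy_sequences = ["ACGTAC", "ACGTTT"]
toy_matrix = dissimilarity_matrix(toy_sequences, 2, manhattan)
```

A matrix like `toy_matrix` is what a clustering step such as UPGMA would take as input to build the tree.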

Ghost-tree: creating hybrid-gene phylogenetic trees for diversity analyses

10.7287/peerj.preprints.1106 ◽

2015 ◽

Author(s):

Jennifer Fouquier ◽

Jai R Rideout ◽

Evan Bolyen ◽

John H Chase ◽

Arron Shiffer ◽

...

Keyword(s):

Phylogenetic Tree ◽

Genetic Marker ◽

Phylogenetic Trees ◽

Phylogenetic Diversity ◽

Sequence Data ◽

Fungal Species ◽

Bioinformatics Tool ◽

Hybrid Gene ◽

Fungal Database ◽

Taxonomic Groups

Computing nearest neighbour interchange distances between ranked phylogenetic trees

Journal of Mathematical Biology ◽

10.1007/s00285-021-01567-5 ◽

2021 ◽

Vol 82 (1-2) ◽

Author(s):

Lena Collienne ◽

Alex Gavryushkin

Keyword(s):

Cancer Research ◽

Computational Complexity ◽

Phylogenetic Tree ◽

Shortest Path ◽

Phylogenetic Trees ◽

Shortest Paths ◽

Nearest Neighbour ◽

Tree Inference ◽

Subtree Prune And Regraft ◽

Comparison Algorithms

Many popular algorithms for searching the space of leaf-labelled (phylogenetic) trees are based on tree rearrangement operations. Under any such operation, the problem is reduced to searching a graph where vertices are trees and (undirected) edges are given by pairs of trees connected by one rearrangement operation (sometimes called a move). Most popular are the classical nearest neighbour interchange, subtree prune and regraft, and tree bisection and reconnection moves. The problem of computing distances, however, is NP-hard in each of these graphs, making tree inference and comparison algorithms challenging to design in practice. Although ranked phylogenetic trees are one of the central objects of interest in applications such as cancer research, immunology, and epidemiology, the computational complexity of the shortest path problem for these trees remained unsolved for decades. In this paper, we settle this problem for the ranked nearest neighbour interchange operation by establishing that the complexity depends on the weight difference between the two types of tree rearrangements (rank moves and edge moves), and varies from quadratic, which is the lowest possible complexity for this problem, to NP-hard, which is the highest. In particular, our result provides the first example of a phylogenetic tree rearrangement operation for which shortest paths, and hence the distance, can be computed efficiently. Specifically, our algorithm scales to trees with tens of thousands of leaves (and likely hundreds of thousands if implemented efficiently).

Contracture Severity at Hospital Discharge in Children: A Burn Model System Database Study

Journal of Burn Care & Research ◽

10.1093/jbcr/iraa169 ◽

2020 ◽

Author(s):

Miranda Yelvington ◽

Matthew Godleski ◽

Austin F Lee ◽

Jeremy Goverman ◽

Ingrid Parry ◽

...

Keyword(s):

Length Of Stay ◽

Range Of Motion ◽

Hospital Discharge ◽

Injury Severity ◽

Model System ◽

Future Research ◽

P Value ◽

Local Practices ◽

Improvement Measures ◽

Joint Motions

Abstract Contractures can complicate burn recovery. There are limited studies examining the prevalence of contractures following burns in pediatrics. This study investigates contracture outcomes by location, injury, severity, length of stay, and developmental stage. Data were obtained from the Burn Model System between 1994 and 2003. All patients younger than the age of 18 with at least one joint contracture at hospital discharge were included. Sixteen areas of impaired movement from the shoulder, elbow, wrist, hand, hip, knee, and ankle joints were examined. Analysis of variance was used to assess the association between contracture severity, burn size, and length of stay. Age groupings were evaluated for developmental patterns. A P value of less than .05 was considered statistically significant. Data from 225 patients yielded 1597 contractures (758 in the hand) with a mean of 7.1 contractures (median 4) per patient. Mean contracture severity ranged from 17% (elbow extension) to 41% (ankle plantarflexion) loss of movement. Statistically significant associations were found between active range of motion loss and burn size, length of stay, and age groupings. The data illustrate quantitative assessment of burn contractures in pediatric patients at discharge in a multicenter database. Size of injury correlates with range of motion loss for many joint motions, reflecting the anticipated morbidity of contracture for pediatric burn survivors. These results serve as a potential reference for range of motion outcomes in the pediatric burn population, which could serve as a comparison for local practices, quality improvement measures, and future research.
