Analysis of The Nucleotide Sequence Diversity of the Lassa Virus and Augmenting its Phylogenetic Tree

2018 ◽  
Vol 4 (1) ◽  
pp. 21-26
Author(s):  
Sean Oddoye

Lassa virus (LASV) is the etiological agent of Lassa fever, an acute hemorrhagic disease with a mortality rate of 15%. Many aspects of the Lassa virus are not understood, such as the cause of deafness in one-third of surviving patients or why symptoms are mild for 80% of those infected with the virus. Ambiguities like these suggest that there may be genomic heterogeneity among infecting viruses and demonstrate a need to quantify and analyze polymorphisms within LASV. Patterns that emerge from phylogenetic trees can be used to assess the structure of a population while also providing insights into its genetic makeup. The purpose of this investigation was to develop a more streamlined means of calculating nucleotide diversity within a subpopulation of Lassa virus strains and to augment a phylogenetic tree of the Lassa virus glycoprotein precursor (GPC) segment. A total of 25 partial and complete sequences of LASV strains were obtained from GenBank. During phase one of this investigation, the sequence data were entered into MEGA analytical software and the sequence diversity was derived at the nucleotide level. Data from the individual strand sequences were used to augment a phylogenetic tree using TreeView X software. In phase two of this investigation, an algorithm was created in RStudio with the BSgenome and Biostrings packages. The sequence diversity derived from the statistical analyses in MEGA was compared with that obtained from the algorithm. A p-value of 0.08 was found, which falls outside the range of 0.00 to 0.05 accepted for non-medical studies. It is suggested that future research focus on refining the algorithm so that it calculates nucleotide diversity within a percent error of 5%.
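
As an illustration of the quantity being compared, nucleotide diversity is commonly computed as the average proportion of differing sites over all pairs of aligned sequences. The Python sketch below is a minimal example under stated assumptions: every sequence is weighted equally, sites containing a gap or an ambiguous base are skipped per pair, and the toy alignment is invented. It is not the MEGA procedure or the R algorithm described in the abstract.

```python
from itertools import combinations


def pairwise_difference(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are
    skipped (an assumption made for this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = "ACGT"
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in valid and b in valid:
            compared += 1
            if a != b:
                differing += 1
    return differing / compared if compared else 0.0


def nucleotide_diversity(sequences):
    """Average pairwise difference per site, with equal sequence weights."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(pairwise_difference(a, b) for a, b in pairs) / len(pairs)


# Hypothetical three-sequence alignment, for illustration only.
toy_alignment = ["ATGCATGC", "ATGCATGA", "ATGTATGA"]
pi = nucleotide_diversity(toy_alignment)
print(round(pi, 4))
```

For real data, the sequences would first need to be aligned to a common length, for example with a multiple sequence aligner.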

2006 ◽  
Vol 80 (17) ◽  
pp. 8351-8361 ◽  
Author(s):  
Jason Botten ◽  
Jeff Alexander ◽  
Valerie Pasquetto ◽  
John Sidney ◽  
Polly Barrowman ◽  
...  

ABSTRACT Recovery from Lassa virus (LASV) infection usually precedes the appearance of neutralizing antibodies, indicating that cellular immunity plays a primary role in viral clearance. To date, the role of LASV-specific CD8+ T cells has not been evaluated in humans. To facilitate such studies, we utilized a predictive algorithm to identify candidate HLA-A2 supertype epitopes from the LASV nucleoprotein and glycoprotein precursor (GPC) genes. We identified three peptides (GPC42-50, GLVGLVTFL; GPC60-68, SLYKGVYEL; and GPC441-449, YLISIFLHL) that displayed high-affinity binding (≤98 nM) to HLA-A*0201, induced CD8+ T-cell responses of high functional avidity in HLA-A*0201 transgenic mice, and were naturally processed from native LASV GPC in human HLA-A*0201-positive target cells. HLA-A*0201 mice immunized with either GPC42-50 or GPC60-68 were protected against challenge with a recombinant vaccinia virus that expressed LASV GPC. The epitopes identified in this study represent potential diagnostic reagents and candidates for inclusion in epitope-based vaccine constructs. Our approach is applicable to any pathogen with existing sequence data, does not require manipulation of the actual pathogen or access to immune human donors, and should therefore be generally applicable to category A through C agents and other emerging pathogens.


1980 ◽  
Vol 187 (1) ◽  
pp. 65-74 ◽  
Author(s):  
D Penny ◽  
M D Hendy ◽  
L R Foulds

We have recently reported a method to identify the shortest possible phylogenetic tree for a set of protein sequences [Foulds, Hendy & Penny (1979) J. Mol. Evol. 13, 127–150; Foulds, Penny & Hendy (1979) J. Mol. Evol. 13, 151–166]. The present paper discusses issues that arise during the construction of minimal phylogenetic trees from protein-sequence data. The conversion of the data from amino acid sequences into nucleotide sequences is shown to be advantageous. A new variation of a method for constructing a minimal tree is presented. Our previous methods have involved first constructing a tree and then either proving that it is minimal or transforming it into a minimal tree. The approach presented in the present paper progressively builds up a tree, taxon by taxon. We illustrate this approach by using it to construct a minimal tree for ten mammalian haemoglobin alpha-chain sequences. Finally, we define a measure of the complexity of the data and illustrate a method to derive a directed phylogenetic tree from the minimal tree.
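
To make "shortest possible tree" concrete, the length of a tree can be taken as the minimum number of nucleotide changes needed to explain the sequences at its leaves. The Python sketch below scores a fixed rooted binary tree with Fitch's small-parsimony algorithm. This is a swapped-in, standard technique used only to illustrate tree length, not the taxon-by-taxon construction method of the paper. The nested-tuple tree encoding and the four toy sequences are assumptions.

```python
def fitch_site(tree, states):
    """Return (candidate_state_set, change_count) for one alignment site.

    tree   -- a leaf name (str) or a 2-tuple (left_subtree, right_subtree)
    states -- dict mapping each leaf name to its character at this site
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no extra change needed.
        return common, left_cost + right_cost
    # Children disagree: one change is required on this part of the tree.
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree, alignment):
    """Total number of changes over all sites of an aligned dataset."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        site = {name: seq[i] for name, seq in alignment.items()}
        total += fitch_site(tree, site)[1]
    return total


# Hypothetical data: taxa A/B share one pattern, C/D another.
toy = {"A": "AAG", "B": "AAG", "C": "CAT", "D": "CAT"}
good_tree = (("A", "B"), ("C", "D"))
poor_tree = (("A", "C"), ("B", "D"))
print(tree_length(good_tree, toy), tree_length(poor_tree, toy))
```

A minimal tree is one whose length cannot be lowered by any other topology; here the grouping that keeps identical sequences together needs fewer changes than the one that separates them.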


2015 ◽  
Author(s):  
Jennifer Fouquier ◽  
Jai R Rideout ◽  
Evan Bolyen ◽  
John H Chase ◽  
Arron Shiffer ◽  
...  

Ghost-tree is a bioinformatics tool that integrates sequence data from two genetic markers into a single phylogenetic tree that can be used for diversity analyses. Our approach uses one genetic marker whose sequences can be aligned across organisms spanning divergent taxonomic groups (e.g., fungal families) as a “foundation” phylogeny. A second, more rapidly evolving genetic marker is then used to build “extension” phylogenies for more closely related organisms (e.g., fungal species or strains) that are then grafted on to the foundation tree by mapping taxonomic names. We apply ghost-tree to graft fungal extension phylogenies derived from ITS sequences onto a foundation phylogeny derived from fungal 18S sequences. The result is a phylogenetic tree, compatible with the commonly used UNITE fungal database, that supports phylogenetic diversity analysis (e.g., UniFrac) of fungal communities profiled using ITS markers. Availability: ghost-tree is pip-installable. All source code, documentation, and test code are available under the BSD license at https://github.com/JTFouquier/ghost-tree.


2014 ◽  
Author(s):  
Caroline Colijn ◽  
Jennifer Gardy

Whole genome sequencing is becoming popular as a tool for understanding outbreaks of communicable diseases, with phylogenetic trees being used to identify individual transmission events or to characterize outbreak-level overall transmission dynamics. Existing methods to infer transmission dynamics from sequence data rely on well-characterised infectious periods, epidemiological and clinical meta-data which may not always be available, and typically require computationally intensive analysis focussing on the branch lengths in phylogenetic trees. We sought to determine whether the topological structures of phylogenetic trees contain signatures of the overall transmission patterns underlying an outbreak. Here we use simulated outbreaks to train and then test computational classifiers. We test the method on data from two real-world outbreaks. We find that different transmission patterns result in quantitatively different phylogenetic tree shapes. We describe five topological features that summarize a phylogeny’s structure and find that computational classifiers based on these are capable of predicting an outbreak’s transmission dynamics. The method is robust to variations in the transmission parameters and network types, and recapitulates known epidemiology of previously characterized real-world outbreaks. We conclude that there are simple structural properties of phylogenetic trees which, when combined, can distinguish communicable disease outbreaks with a super-spreader, homogeneous transmission, and chains of transmission. This is possible using genome data alone, and can be done during an outbreak. We discuss the implications for management of outbreaks.
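
The abstract does not name its five topological features, so the Python sketch below uses two widely known tree-shape statistics purely as stand-ins: the number of cherries (internal nodes with two leaf children) and the Sackin index (sum of leaf depths, larger for more imbalanced trees). The nested-tuple tree encoding and the two example trees are assumptions made for illustration.

```python
def count_cherries(tree):
    """Number of internal nodes whose two children are both leaves."""
    if isinstance(tree, str):
        return 0
    left, right = tree
    if isinstance(left, str) and isinstance(right, str):
        return 1
    return count_cherries(left) + count_cherries(right)


def sackin_index(tree, depth=0):
    """Sum over leaves of the number of edges between root and leaf."""
    if isinstance(tree, str):
        return depth
    return sum(sackin_index(child, depth + 1) for child in tree)


# Two hypothetical four-leaf topologies with different shapes.
balanced = (("a", "b"), ("c", "d"))
caterpillar = ((("a", "b"), "c"), "d")

for name, tree in [("balanced", balanced), ("caterpillar", caterpillar)]:
    print(name, count_cherries(tree), sackin_index(tree))
```

Statistics like these depend only on topology, not branch lengths, which is the kind of input a shape-based classifier would use.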


2020 ◽  
Author(s):  
Francesco Ballesio ◽  
Ali Haider Bangash ◽  
Didier Barradas-Bautista ◽  
Justin Barton ◽  
Andrea Guarracino ◽  
...  

The pandemic potential of SARS-CoV-2 and its ability to reinfect a recovered subject, among other damaging characteristics, took everybody by surprise. A global collaborative scientific effort was urgently required to bring together experts from different areas of medicine and data science. Such a platform was provided by the COVID19 Virtual BioHackathon, organized from the 5th to the 11th of April 2020, to address pressing issues ranging from text mining to genomics. Under the "Machine learning" track, we determined the optimal k-mer length for feature extraction, constructed continuous distributed representations of protein sequences to create phylogenetic trees in an alignment-free manner, and clustered predicted MHC class I and II binding affinities to aid in vaccine design. All the related work is available in a GitHub repository under an MIT license for future research.


2018 ◽  
Author(s):  
Axel Trefzer ◽  
Alexandros Stamatakis

Bayesian Markov-Chain Monte Carlo (MCMC) methods for phylogenetic tree inference, that is, inference of the evolutionary history of distinct species using their molecular sequence data, typically generate large sets of phylogenetic trees. The trees generated by the MCMC procedure are samples of the posterior probability distribution that MCMC methods approximate. Thus, they generate a stream of correlated binary trees that need to be stored. Here, we adapt state-of-the-art algorithms for binary tree compression to phylogenetic tree data streams and extend them to also store the required meta-data. On a phylogenetic tree stream containing 1,000 trees with 500 leaves including branch length values, we achieve a compression rate of 5.4 compared to the uncompressed tree files and of 1.8 compared to bzip2-compressed tree files. For compressing the same trees, but without branch length values, our compression method is approximately an order of magnitude better than bzip2. A prototype implementation is available at https://github.com/axeltref/tree-compression.git.


2021 ◽  
Author(s):  
Xuemei Liu ◽  
Wen Li ◽  
Guanda Huang ◽  
Tianlai Huang ◽  
Qingang Xiong ◽  
...  

Algorithms for constructing phylogenetic trees are fundamental to studying the evolution of viruses, bacteria, and other microbes. Established multiple alignment-based algorithms are inefficient for large-scale metagenomic sequence data because they require high inter-sequence correlation and have high computational complexity. In this paper, we present SeqDistK, a novel tool for alignment-free phylogenetic analysis. SeqDistK computes the dissimilarity matrix for phylogenetic analysis, incorporating seven k-mer based dissimilarity measures, namely d2, d2S, d2star, Euclidean, Manhattan, CVTree, and Chebyshev. Based on these dissimilarities, SeqDistK constructs a phylogenetic tree using the Unweighted Pair Group Method with Arithmetic Mean algorithm. Using a gold standard dataset of 16S rRNA and its associated phylogenetic tree, we compared SeqDistK to Muscle, a multiple sequence aligner. We found SeqDistK was not only 38 times faster than Muscle but also more accurate. SeqDistK achieved the smallest symmetric difference between the inferred and ground truth trees, ranging from 13 to 18, while that of Muscle was 62. When the measures d2, d2star, d2S, and Euclidean were used with k-mer size k=5, SeqDistK consistently inferred a phylogenetic tree almost identical to the ground truth tree. We also performed clustering of 16S rRNA sequences using SeqDistK and found the clustering was highly consistent with known biological taxonomy. Among all the measures, d2S (k=5, M=2) showed the best accuracy, as it correctly clustered and classified all sample sequences. In summary, SeqDistK is a novel, fast, and accurate alignment-free tool for large-scale phylogenetic analysis. SeqDistK software is freely available at https://github.com/htczero/SeqDistK.
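
The general idea of a k-mer based dissimilarity can be sketched in a few lines of Python. The example below covers only the three measures from the list whose definitions are unambiguous (Euclidean, Manhattan, Chebyshev) and applies them to raw k-mer count vectors. Whether SeqDistK uses raw counts or normalized frequencies is not stated in the abstract, so raw counts are an assumption here, as are the function names and the two toy sequences. This is not SeqDistK's code.

```python
from collections import Counter
from itertools import product


def kmer_counts(sequence, k):
    """Count every overlapping substring of length k in the sequence."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def count_vector(sequence, k, alphabet="ACGT"):
    """Counts for all possible k-mers, in a fixed lexicographic order."""
    counts = kmer_counts(sequence, k)
    return [counts["".join(p)] for p in product(alphabet, repeat=k)]


def euclidean(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


def manhattan(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))


def chebyshev(u, v):
    return max(abs(a - b) for a, b in zip(u, v))


# Hypothetical sequences; real inputs would be, e.g., 16S rRNA genes.
seq_x = "ACGTACGT"
seq_y = "ACGTTTTT"
k = 2
vec_x = count_vector(seq_x, k)
vec_y = count_vector(seq_y, k)
print(manhattan(vec_x, vec_y), chebyshev(vec_x, vec_y))
```

Computing such a value for every pair of sequences yields the dissimilarity matrix that a clustering method such as UPGMA takes as input.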


2021 ◽  
Vol 82 (1-2) ◽  
Author(s):  
Lena Collienne ◽  
Alex Gavryushkin

Many popular algorithms for searching the space of leaf-labelled (phylogenetic) trees are based on tree rearrangement operations. Under any such operation, the problem is reduced to searching a graph where vertices are trees and (undirected) edges are given by pairs of trees connected by one rearrangement operation (sometimes called a move). Most popular are the classical nearest neighbour interchange, subtree prune and regraft, and tree bisection and reconnection moves. The problem of computing distances, however, is NP-hard in each of these graphs, making tree inference and comparison algorithms challenging to design in practice. Although ranked phylogenetic trees are one of the central objects of interest in applications such as cancer research, immunology, and epidemiology, the computational complexity of the shortest path problem for these trees remained unsolved for decades. In this paper, we settle this problem for the ranked nearest neighbour interchange operation by establishing that the complexity depends on the weight difference between the two types of tree rearrangements (rank moves and edge moves), and varies from quadratic, which is the lowest possible complexity for this problem, to NP-hard, which is the highest. In particular, our result provides the first example of a phylogenetic tree rearrangement operation for which shortest paths, and hence the distance, can be computed efficiently. Specifically, our algorithm scales to trees with tens of thousands of leaves (and likely hundreds of thousands if implemented efficiently).


Author(s):  
Miranda Yelvington ◽  
Matthew Godleski ◽  
Austin F Lee ◽  
Jeremy Goverman ◽  
Ingrid Parry ◽  
...  

Abstract Contractures can complicate burn recovery. There are limited studies examining the prevalence of contractures following burns in pediatrics. This study investigates contracture outcomes by location, injury, severity, length of stay, and developmental stage. Data were obtained from the Burn Model System between 1994 and 2003. All patients younger than the age of 18 with at least one joint contracture at hospital discharge were included. Sixteen areas of impaired movement from the shoulder, elbow, wrist, hand, hip, knee, and ankle joints were examined. Analysis of variance was used to assess the association between contracture severity, burn size, and length of stay. Age groupings were evaluated for developmental patterns. A P value of less than .05 was considered statistically significant. Data from 225 patients yielded 1597 contractures (758 in the hand) with a mean of 7.1 contractures (median 4) per patient. Mean contracture severity ranged from 17% (elbow extension) to 41% (ankle plantarflexion) loss of movement. Statistically significant associations were found between active range of motion loss and burn size, length of stay, and age groupings. The data illustrate quantitative assessment of burn contractures in pediatric patients at discharge in a multicenter database. Size of injury correlates with range of motion loss for many joint motions, reflecting the anticipated morbidity of contracture for pediatric burn survivors. These results serve as a potential reference for range of motion outcomes in the pediatric burn population, which could serve as a comparison for local practices, quality improvement measures, and future research.

