Fast Whole-Genome Phylogeny of the COVID-19 Virus SARS-CoV-2 by Compression

2020 ◽  
Author(s):  
Rudi L. Cilibrasi ◽  
Paul M.B. Vitányi

Abstract: We analyze the whole-genome phylogeny and taxonomy of the SARS-CoV-2 virus using compression. This is a new, fast, alignment-free method called the “normalized compression distance” (NCD) method. It discovers all effective similarities based on Kolmogorov complexity. Since the latter is incomputable, we approximate it with a good compressor such as the modern zpaq. The results show that the SARS-CoV-2 virus is closest to the RaTG13 virus and similar to two bat SARS-like coronaviruses, bat-SL-CoVZXC21 and bat-SL-CoVZC4. The similarity is quantified and compared with the same quantified similarities among the mtDNA of certain species. We address the question of whether pangolins are involved in the SARS-CoV-2 virus. The compression method is simpler and possibly faster than any other whole-genome method, which makes it the ideal tool to explore phylogeny.
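As a rough illustration of the idea in this abstract, the NCD of two sequences x and y is commonly written as (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(.) is the compressed length in bytes. The sketch below is not the authors' pipeline: it assumes Python's standard `lzma` module as a stand-in for zpaq, and the function names and toy sequences are hypothetical.

```python
import lzma
import random


def compressed_size(data: bytes) -> int:
    """Compressed length in bytes; lzma stands in for zpaq (an assumption)."""
    return len(lzma.compress(data))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)  # compress the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


# Toy data only (hypothetical sequences, not real genomes).
rng = random.Random(0)
base = bytes(rng.choice(b"ACGT") for _ in range(2000))

# A close relative: the same sequence with 20 point substitutions.
mutated = bytearray(base)
for pos in rng.sample(range(len(base)), 20):
    current = mutated[pos]
    mutated[pos] = rng.choice([b for b in b"ACGT" if b != current])
close = bytes(mutated)

# An unrelated sequence drawn independently.
unrelated = bytes(rng.choice(b"ACGT") for _ in range(2000))

print("NCD(base, close)     =", ncd(base, close))
print("NCD(base, unrelated) =", ncd(base, unrelated))
```

A smaller value indicates greater similarity, so the mutated copy should score well below the unrelated sequence. Real genomes are much longer than these toy strings, and the choice of compressor affects the absolute values.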

2021 ◽  
Author(s):  
Paul Vitanyi ◽  
Rudi Cilibrasi

We analyze the whole-genome phylogeny and taxonomy of the SARS-CoV-2 virus, which causes the COVID-19 disease, using compression in the form of the alignment-free NCD (Normalized Compression Distance) method to assess similarity. We compare the SARS-CoV-2 virus with a database of over 6,500 viruses. The results show that the SARS-CoV-2 virus is closest in that database to the RaTG13 virus and rather close to the bat SARS-like coronaviruses bat-SL-CoVZXC21 and bat-SL-CoVZC45. Over 6,500 viruses, identified by their registration codes, have larger NCDs. The NCDs are compared with the NCDs between the mtDNAs of familiar species. We address the question of whether pangolins are involved in the SARS-CoV-2 virus. The NCD method, or in short the compression method, is simpler and possibly faster than any other whole-genome method, which makes it the ideal tool to explore phylogeny. Here we use it for the complex case of determining the similarity between the COVID-19 virus SARS-CoV-2 and many other viruses. The resulting phylogeny and taxonomy closely match earlier efforts by alignment-based methods and a machine-learning method, providing the most compelling evidence to date for the compression method and showing that one can achieve equivalent results both simply and quickly.


2021 ◽  
Author(s):  
Rudi L. Cilibrasi ◽  
Paul M.B. Vitanyi

Abstract: We analyze the whole-genome phylogeny and taxonomy of the SARS-CoV-2 virus, which causes the COVID-19 disease, using compression in the form of the alignment-free NCD (Normalized Compression Distance) method to assess similarity. We compare the SARS-CoV-2 virus with a database of over 6,500 viruses. The results show that the SARS-CoV-2 virus is closest in that database to the RaTG13 virus and rather close to the bat SARS-like coronaviruses bat-SL-CoVZXC21 and bat-SL-CoVZC45. Over 6,500 viruses, identified by their registration codes, have larger NCDs. The NCDs are compared with the NCDs between the mtDNAs of familiar species. We address the question of whether pangolins are involved in the SARS-CoV-2 virus. The NCD method, or in short the compression method, is simpler and possibly faster than any other whole-genome method, which makes it the ideal tool to explore phylogeny. Here we use it for the complex case of determining the similarity between the COVID-19 virus SARS-CoV-2 and many other viruses. The resulting phylogeny and taxonomy closely match earlier efforts by alignment-based methods and a machine-learning method, providing the most compelling evidence to date for the compression method and showing that one can achieve equivalent results both simply and quickly.


2013 ◽  
Vol 30 (5) ◽  
pp. 1032-1037 ◽  
Author(s):  
Jinkui Cheng ◽  
Fuliang Cao ◽  
Zhihua Liu

Abstract: Phylogenetic analysis based on alignment methods faces major challenges when dealing with whole-genome sequences, for example, recombination, shuffling, and rearrangement of sequences. Various alignment-free methods for phylogeny construction have therefore been proposed. However, most of these methods have not been implemented as tools or web servers, so researchers cannot easily apply them to their own data sets. To facilitate the use of alignment-free methods, we implemented most of the popular ones and constructed a user-friendly web server for alignment-free genome phylogeny (AGP). AGP integrates phylogenetic tree construction, visualization, and comparison functions. Both AGP and all source code of the methods are available at http://www.herbbol.org:8000/agp (last accessed February 26, 2013). AGP will facilitate research in the field of whole-genome phylogeny and comparison.


2010 ◽  
Vol 192 (7) ◽  
pp. 1751-1760 ◽  
Author(s):  
Esther Julián ◽  
Mónica Roldán ◽  
Alejandro Sánchez-Chardi ◽  
Oihane Astola ◽  
Gemma Agustí ◽  
...  

Abstract: The aggregation of mycobacterial cells in a definite order, forming microscopic structures that resemble cords, is known as cord formation, or cording, and is considered a virulence factor in the Mycobacterium tuberculosis complex and the species Mycobacterium marinum. In the 1950s, cording was related to a trehalose dimycolate lipid that, consequently, was named the cord factor. However, modern techniques of microbial genetics have revealed that cording can be affected by mutations in genes not directly involved in trehalose dimycolate biosynthesis. Therefore, questions such as “How does mycobacterial cord formation occur?” and “Which molecular factors play a role in cord formation?” remain unanswered. At present, one of the problems in cording studies is the correct interpretation of cording morphology. Using optical microscopy, it is sometimes difficult to distinguish between cording and clumping, which is a general property of mycobacteria due to their hydrophobic surfaces. In this work, we provide a new way to visualize cords in great detail using scanning electron microscopy, and we show the first scanning electron microscopy images of the ultrastructure of mycobacterial cords, making this technique the ideal tool for cording studies. This technique has enabled us to affirm that nonpathogenic mycobacteria also form microscopic cords. Finally, we demonstrate that a strong correlation exists between microscopic cords, rough colonial morphology, and increased persistence of mycobacteria inside macrophages.

