A daily-updated database and tools for comprehensive SARS-CoV-2 mutation-annotated trees

Abstract The vast scale of SARS-CoV-2 sequencing data has made it increasingly challenging to comprehensively analyze all available data using existing tools and file formats. To address this, we present a database of SARS-CoV-2 phylogenetic trees inferred with unrestricted public sequences, which we update daily to incorporate new sequences. Our database uses the recently-proposed mutation-annotated tree (MAT) format to efficiently encode the tree with branches labeled with parsimony-inferred mutations, as well as Nextstrain clade and Pango lineage labels at clade roots. As of June 9, 2021, our SARS-CoV-2 MAT consists of 834,521 sequences and provides a comprehensive view of the virus' evolutionary history using public data. We also present matUtils—a command-line utility for rapidly querying, interpreting and manipulating the MATs. Our daily-updated SARS-CoV-2 MAT database and matUtils software are available at http://hgdownload.soe.ucsc.edu/goldenPath/wuhCor1/UShER_SARS-CoV-2/ and https://github.com/yatisht/usher, respectively.
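The idea behind a mutation-annotated tree can be illustrated with a small sketch. The Python below is not the MAT file format and not matUtils code; the `Node` class, the `mutations_from_root` helper, and the sample names are hypothetical, and the mutation strings are only illustrative labels. It shows how storing mutations on branches lets a sample's substitutions be recovered by walking from the root to that sample.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """One node of a toy mutation-annotated tree (hypothetical structure)."""
    name: str
    # Mutations on the branch leading to this node, written as <ref><position><derived>.
    mutations: List[str] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    children: List["Node"] = field(default_factory=list, repr=False, compare=False)

    def add_child(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def mutations_from_root(node: Optional[Node]) -> Dict[int, str]:
    """Return {position: derived base} accumulated along the root-to-node path."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    state: Dict[int, str] = {}
    for ancestor in reversed(path):  # apply root-side branches first
        for mut in ancestor.mutations:
            position, derived = int(mut[1:-1]), mut[-1]
            state[position] = derived  # a later change at the same site overwrites
    return state


# Toy tree: root -> clade_A -> sample_1
root = Node("root")
clade = root.add_child(Node("clade_A", ["C241T", "A23403G"]))
sample = clade.add_child(Node("sample_1", ["G28881A"]))
genotype = mutations_from_root(sample)
```

Because each mutation is stored once on the branch where it was inferred, samples that share ancestry also share storage, which is the property that keeps very large trees compact.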


Iroki: automatic customization and visualization of phylogenetic trees

10.1101/106138 ◽

2017 ◽

Cited By ~ 7

Author(s):

Ryan M. Moore ◽

Amelia O. Harrison ◽

Sean M. McAllister ◽

Shawn W. Polson ◽

K. Eric Wommack

Keyword(s):

Web Application ◽

Phylogenetic Trees ◽

Evolutionary History ◽

Phylogenetic Analyses ◽

Community Diversity ◽

Sequencing Data ◽

Link Type ◽

Large Trees ◽

History Of ◽

Microbial Groups

Abstract Phylogenetic trees are an important analytical tool for evaluating community diversity and evolutionary history. In the case of microorganisms, the decreasing cost of sequencing has enabled researchers to generate ever-larger sequence datasets, which in turn have begun to fill gaps in the evolutionary history of microbial groups. However, phylogenetic analyses of these types of datasets create complex trees that can be challenging to interpret. Scientific inferences made by visual inspection of phylogenetic trees can be simplified and enhanced by customizing various parts of the tree. Yet, manual customization is time-consuming and error-prone, and programs designed to assist in batch tree customization often require programming experience or complicated file formats for annotation. Iroki, a user-friendly web interface for tree visualization, addresses these issues by providing automatic customization of large trees based on metadata contained in tab-separated text files. Iroki’s utility for exploring biological and ecological trends in sequencing data was demonstrated through a variety of microbial ecology applications in which trees with hundreds to thousands of leaf nodes were customized according to extensive collections of metadata. The Iroki web application and documentation are available at https://www.iroki.net or through the VIROME portal (http://virome.dbi.udel.edu). Iroki’s source code is released under the MIT license and is available at https://github.com/mooreryan/iroki.


Iroki: automatic customization and visualization of phylogenetic trees

PeerJ ◽

10.7717/peerj.8584 ◽

2020 ◽

Vol 8 ◽

pp. e8584 ◽

Cited By ~ 5

Author(s):

Ryan M. Moore ◽

Amelia O. Harrison ◽

Sean M. McAllister ◽

Shawn W. Polson ◽

K. Eric Wommack

Keyword(s):

Web Application ◽

Phylogenetic Trees ◽

Evolutionary History ◽

Phylogenetic Analyses ◽

Community Diversity ◽

Sequencing Data ◽

Link Type ◽

Large Trees ◽

History Of ◽

Microbial Groups

Phylogenetic trees are an important analytical tool for evaluating community diversity and evolutionary history. In the case of microorganisms, the decreasing cost of sequencing has enabled researchers to generate ever-larger sequence datasets, which in turn have begun to fill gaps in the evolutionary history of microbial groups. However, phylogenetic analyses of these types of datasets create complex trees that can be challenging to interpret. Scientific inferences made by visual inspection of phylogenetic trees can be simplified and enhanced by customizing various parts of the tree. Yet, manual customization is time-consuming and error prone, and programs designed to assist in batch tree customization often require programming experience or complicated file formats for annotation. Iroki, a user-friendly web interface for tree visualization, addresses these issues by providing automatic customization of large trees based on metadata contained in tab-separated text files. Iroki’s utility for exploring biological and ecological trends in sequencing data was demonstrated through a variety of microbial ecology applications in which trees with hundreds to thousands of leaf nodes were customized according to extensive collections of metadata. The Iroki web application and documentation are available at https://www.iroki.net or through the VIROME portal http://virome.dbi.udel.edu. Iroki’s source code is released under the MIT license and is available at https://github.com/mooreryan/iroki.
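As a rough illustration of metadata-driven customization, the sketch below reads a tab-separated mapping and attaches styling options to leaf names. It is not Iroki's code; the column names (`name`, `branch_color`, `leaf_label_color`), the sample rows, and both helper functions are assumptions made for this example.

```python
import csv
import io

# Hypothetical mapping file: the first column names a leaf,
# the remaining columns hold styling options for that leaf.
MAPPING_TSV = (
    "name\tbranch_color\tleaf_label_color\n"
    "seq_1\tblue\tblue\n"
    "seq_2\tred\tblack\n"
)


def read_mapping(handle):
    """Parse a tab-separated mapping into {leaf name: {option: value}}."""
    styles = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        name = row.pop("name")
        styles[name] = row
    return styles


def style_leaves(leaf_names, styles):
    """Pair every leaf with its styling options; unlisted leaves get none."""
    return {leaf: styles.get(leaf, {}) for leaf in leaf_names}


styles = read_mapping(io.StringIO(MAPPING_TSV))
styled = style_leaves(["seq_1", "seq_2", "seq_3"], styles)
```

Here `seq_3` has no row in the mapping, so it keeps an empty style and would be drawn with whatever defaults the viewer uses.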


Phylommand - a command line software package for phylogenetics

F1000Research ◽

10.12688/f1000research.10446.1 ◽

2016 ◽

Vol 5 ◽

pp. 2903

Author(s):

Martin Ryberg

Keyword(s):

Software Package ◽

Evolutionary Biology ◽

Phylogenetic Trees ◽

Phylogenetic Analyses ◽

Command Line ◽

File Formats

Phylogenetics is an intrinsic part of many analyses in evolutionary biology and ecology, and as the amount of data available for these analyses increases rapidly, so does the need for automated pipelines to deal with the data. Phylommand is a package of four programs to create, manipulate, and/or analyze phylogenetic trees or pairwise alignments. It is built to be easily integrated into software workflows, whether run directly at the command prompt or executed from scripts. Inputs can be taken from standard input or a file, and the behavior of the programs can be changed through switches. By using standard file formats for phylogenetic analyses, such as newick, nexus, phylip, and fasta, phylommand is widely compatible with other software.
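The "file or standard input, behavior set by switches" pattern described above can be sketched in a few lines of Python. This is not phylommand itself; the `run` function, the `--count` switch, and the regex-based `tip_names` helper are hypothetical, and the helper assumes a Newick tree with at least two tips and unquoted labels.

```python
import argparse
import io
import re
import sys


def tip_names(newick):
    """Return tip labels: unquoted labels that directly follow '(' or ','."""
    found = re.findall(r"[(,]\s*([^(),:;]+)", newick)
    return [name.strip() for name in found if name.strip()]


def run(argv=None, stdin=sys.stdin):
    """List tip names, or count them when the --count switch is given."""
    parser = argparse.ArgumentParser(description="Toy Newick tip lister")
    parser.add_argument("file", nargs="?", help="Newick file; standard input if omitted")
    parser.add_argument("--count", action="store_true", help="print the number of tips")
    args = parser.parse_args(argv)
    if args.file:
        with open(args.file) as handle:
            text = handle.read()
    else:
        text = stdin.read()
    names = tip_names(text)
    return str(len(names)) if args.count else "\n".join(names)


# Simulate piping a tree on standard input with the --count switch set.
output = run(["--count"], stdin=io.StringIO("((A:0.1,B:0.2):0.3,C:0.4);"))
```

Accepting either a file name or piped input is what lets a tool of this kind sit in the middle of a shell pipeline without intermediate files.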


BuddySuite: Command-line toolkits for manipulating sequences, alignments, and phylogenetic trees

10.1101/040675 ◽

2016 ◽

Author(s):

Stephen R. Bond ◽

Karl E. Keat ◽

Sofia N. Barreira ◽

Andreas D. Baxevanis

Keyword(s):

Sequence Alignment ◽

Phylogenetic Trees ◽

Phylogenetic Reconstruction ◽

General Purpose ◽

Command Line ◽

Link Type ◽

File Formats ◽

Downstream Analysis ◽

Python Package ◽

Common Sequence

Abstract The ability to manipulate sequence, alignment, and phylogenetic tree files has become an increasingly important skill in the life sciences, whether to generate summary information or to prepare data for further downstream analysis. The command line can be an extremely powerful environment for interacting with these resources, but only if the user has the appropriate general-purpose tools on hand. BuddySuite is a collection of four independent yet interrelated command-line toolkits that facilitate each step in the workflow of sequence discovery, curation, alignment, and phylogenetic reconstruction. Most common sequence, alignment, and tree file formats are automatically detected and parsed, and over 100 tools have been implemented for manipulating these data. The project has been engineered to easily accommodate the addition of new tools, it is written in the popular programming language Python, and is hosted on the Python Package Index and GitHub to maximize accessibility. Documentation for each BuddySuite tool, including usage examples, is available at http://tiny.cc/buddysuite_wiki. All software is open source and freely available through http://research.nhgri.nih.gov/software/BuddySuite.


sangeranalyseR: Simple and Interactive Processing of Sanger Sequencing Data in R

Genome Biology and Evolution ◽

10.1093/gbe/evab028 ◽

2021 ◽

Author(s):

Kuan-Hao Chao ◽

Kirston Barton ◽

Sarah Palmer ◽

Robert Lanfear

Keyword(s):

Open Source ◽

Phylogenetic Trees ◽

Input Data ◽

Sanger Sequencing ◽

R Package ◽

Command Line ◽

Sequencing Data ◽

Fasta Format ◽

Online Documentation ◽

Wide Range

Abstract sangeranalyseR is a feature-rich, free, and open-source R package for processing Sanger sequencing data. It allows users to go from loading reads to saving aligned contigs in a few lines of R code by using sensible defaults for most actions. It also provides complete flexibility for determining how individual reads and contigs are processed, both at the command line in R and via interactive Shiny applications. sangeranalyseR provides a wide range of options for all steps in Sanger processing pipelines including trimming reads, detecting secondary peaks, viewing chromatograms, detecting indels and stop codons, aligning contigs, estimating phylogenetic trees, and more. Input data can be in either ABIF or FASTA format. sangeranalyseR comes with extensive online documentation and outputs aligned and unaligned reads and contigs in FASTA format, along with detailed interactive HTML reports. sangeranalyseR supports the use of colourblind-friendly palettes for viewing alignments and chromatograms. It is released under an MIT licence and available for all platforms on Bioconductor (https://bioconductor.org/packages/sangeranalyseR) and on GitHub (https://github.com/roblanf/sangeranalyseR).


Improvement, identification, and target prediction for miRNAs in the porcine genome by using massive, public high-throughput sequencing data

Journal of Animal Science ◽

10.1093/jas/skab018 ◽

2021 ◽

Vol 99 (2) ◽

Author(s):

Yuhua Fu ◽

Pengyu Fan ◽

Lu Wang ◽

Ziqiang Shu ◽

Shilin Zhu ◽

...

Keyword(s):

High Throughput Sequencing ◽

Target Genes ◽

Target Prediction ◽

Large Data ◽

Sequencing Data ◽

Regulate Gene Expression ◽

High Throughput Sequencing Data ◽

Annotation Information ◽

Public Data ◽

Broad Variety

Abstract Despite the broad variety of available microRNA (miRNA) research tools and methods, their application to the identification, annotation, and target prediction of miRNAs in nonmodel organisms is still limited. In this study, we collected nearly all public sRNA-seq data to improve the annotation for known miRNAs and identify novel miRNAs that have not been annotated in pigs (Sus scrofa). We newly annotated 210 mature sequences in known miRNAs and found that 43 of the known miRNA precursors were problematic due to redundant/missing annotations or incorrect sequences. We also predicted 811 novel miRNAs with high confidence, which was twice the current number of known miRNAs for pigs in miRBase. In addition, we proposed a correlation-based strategy to predict target genes for miRNAs by using a large amount of sRNA-seq and RNA-seq data. We found that the correlation-based strategy provided additional evidence of expression compared with traditional target prediction methods. The correlation-based strategy also identified the regulatory pairs that were controlled by nonbinding sites with a particular pattern, which provided abundant complementarity for studying the mechanism of miRNAs that regulate gene expression. In summary, our study improved the annotation of known miRNAs, identified a large number of novel miRNAs, and predicted target genes for all pig miRNAs by using massive public data. This large data-based strategy is also applicable for other nonmodel organisms with incomplete annotation information.
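A correlation-based screen of this general kind can be sketched as follows. This is not the authors' pipeline: the abstract does not state the statistic or cutoff used, so the Pearson coefficient, the negative-correlation criterion, the `threshold` of -0.8, and all expression values below are assumptions chosen only to make the idea concrete.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def candidate_targets(mirna_expr, gene_expr, threshold=-0.8):
    """Return (gene, r) pairs whose expression is strongly anti-correlated
    with the miRNA across samples (assumed criterion), strongest first."""
    hits = []
    for gene, values in gene_expr.items():
        r = pearson(mirna_expr, values)
        if r <= threshold:
            hits.append((gene, r))
    return sorted(hits, key=lambda hit: hit[1])


# Made-up expression values across five matched samples.
mirna = [1, 2, 3, 4, 5]
genes = {
    "gene_a": [10, 8, 6, 4, 2],  # falls as the miRNA rises
    "gene_b": [1, 2, 3, 4, 5],   # rises with the miRNA
    "gene_c": [3, 1, 4, 1, 5],   # weak positive relationship
}
hits = candidate_targets(mirna, genes)
```

In this toy data only `gene_a` passes the cutoff; a real analysis would also need matched sRNA-seq and RNA-seq samples and a correction for testing many miRNA-gene pairs.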


BiSulfite Bolt: A bisulfite sequencing analysis platform

GigaScience ◽

10.1093/gigascience/giab033 ◽

2021 ◽

Vol 10 (5) ◽

Author(s):

Colin Farrell ◽

Michael Thompson ◽

Anela Tosevska ◽

Adewale Oyetunde ◽

Matteo Pellegrini

Keyword(s):

Data Aggregation ◽

Bisulfite Sequencing ◽

Low Complexity ◽

Sequencing Analysis ◽

Command Line ◽

Sequencing Data ◽

Bisulfite Sequencing Data ◽

Analysis Platform ◽

Python Package ◽

Bisulfite Sequencing Analysis

Abstract Background: Bisulfite sequencing is commonly used to measure DNA methylation. Processing bisulfite sequencing data is often challenging owing to the computational demands of mapping a low-complexity, asymmetrical library and the lack of a unified processing toolset to produce an analysis-ready methylation matrix from read alignments. To address these shortcomings, we have developed BiSulfite Bolt (BSBolt), a fast and scalable bisulfite sequencing analysis platform. BSBolt performs a pre-alignment sequencing read assessment step to improve efficiency when handling asymmetrical bisulfite sequencing libraries. Findings: We evaluated BSBolt against simulated and real bisulfite sequencing libraries. We found that BSBolt provides accurate and fast bisulfite sequencing alignments and methylation calls. We also compared BSBolt to several existing bisulfite alignment tools and found BSBolt outperforms Bismark, BSSeeker2, BISCUIT, and BWA-Meth based on alignment accuracy and methylation calling accuracy. Conclusion: BSBolt offers streamlined processing of bisulfite sequencing data through an integrated toolset that offers support for simulation, alignment, methylation calling, and data aggregation. BSBolt is implemented as a Python package and command line utility for flexibility when building informatics pipelines. BSBolt is available at https://github.com/NuttyLogic/BSBolt under an MIT license.
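The principle behind methylation calling from bisulfite reads can be shown with a toy example. This is not BSBolt's algorithm or API; the `call_methylation` function is hypothetical and assumes ungapped reads already aligned to the forward strand. Bisulfite treatment converts unmethylated cytosines so they are read as T, so at a reference C a read showing C counts as methylated and a read showing T counts as unmethylated.

```python
def call_methylation(reference, reads):
    """Estimate methylation levels at reference cytosines.

    reads is a list of (start, sequence) pairs: ungapped, forward-strand
    alignments with 0-based start positions (a simplifying assumption).
    Returns {position: methylated / (methylated + unmethylated)}.
    """
    counts = {}
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if pos >= len(reference) or reference[pos] != "C":
                continue  # only reference cytosines are informative here
            meth, unmeth = counts.get(pos, (0, 0))
            if base == "C":
                meth += 1      # cytosine survived conversion: methylated
            elif base == "T":
                unmeth += 1    # cytosine converted: unmethylated
            else:
                continue       # mismatch or sequencing error: ignore
            counts[pos] = (meth, unmeth)
    return {pos: m / (m + u) for pos, (m, u) in sorted(counts.items())}


reference = "ACGTCGA"  # cytosines at positions 1 and 4
reads = [(0, "ACGTTGA"), (0, "ATGTTGA"), (1, "CGTCG")]
levels = call_methylation(reference, reads)
```

Production tools must also handle the reverse strand (where conversion appears as G to A), gapped alignments, and base quality, none of which this sketch attempts.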


Alview: Portable Software for Viewing Sequence Reads in BAM Formatted Files

Cancer Informatics ◽

10.4137/cin.s26470 ◽

2015 ◽

Vol 14 ◽

pp. CIN.S26470 ◽

Cited By ~ 2

Author(s):

Richard P. Finney ◽

Qing-Rong Chen ◽

Cu V. Nguyen ◽

Chih Hao Hsu ◽

Chunhua Yan ◽

...

Keyword(s):

Graphical User Interface ◽

Reference Genome ◽

Source Code ◽

Software Tool ◽

Command Line ◽

Sequencing Data ◽

Genome Data ◽

Command Line Tool ◽

Portable Software ◽

Microsoft Windows

The name Alview is a contraction of the term Alignment Viewer. Alview is a software tool, compiled to native architecture, for visualizing the alignment of sequencing data. Inputs are files of short-read sequences aligned to a reference genome in the SAM/BAM format and files containing reference genome data. Outputs are visualizations of these aligned short reads. Alview is written in portable C with optional graphical user interface (GUI) code written in C, C++, and Objective-C. The application can run in three different ways: as a web server, as a command line tool, or as a native GUI program. Alview is compatible with Microsoft Windows, Linux, and Apple OS X. It is available as a web demo at https://cgwb.nci.nih.gov/cgi-bin/alview. The source code and Windows/Mac/Linux executables are available via https://github.com/NCIP/alview.


An emergent clade of SARS-CoV-2 linked to returned travellers from Iran

10.1101/2020.03.15.992818 ◽

2020 ◽

Cited By ~ 20

Author(s):

John-Sebastian Eden ◽

Rebecca Rockett ◽

Ian Carter ◽

Hossinur Rahman ◽

Joep de Ligt ◽

...

Keyword(s):

New Zealand ◽

Infectious Diseases ◽

Genome Sequencing ◽

Phylogenetic Analyses ◽

Emerging Infectious Diseases ◽

Whole Genome Sequencing Data ◽

Viral Diversity ◽

Whole Genome ◽

Sequencing Data ◽

Public Data

Abstract The SARS-CoV-2 epidemic has rapidly spread outside China with major outbreaks occurring in Italy, South Korea and Iran. Phylogenetic analyses of whole genome sequencing data identified a distinct SARS-CoV-2 clade linked to travellers returning from Iran to Australia and New Zealand. This study highlights potential viral diversity driving the epidemic in Iran, and underscores the power of rapid genome sequencing and public data sharing to improve the detection and management of emerging infectious diseases.


UCEasy: A software package for automating and simplifying the analysis of ultraconserved elements (UCEs)

Biodiversity Data Journal ◽

10.3897/bdj.9.e78132 ◽

2021 ◽

Vol 9 ◽

Author(s):

Caio Ribeiro ◽

Lucas Oliveira ◽

Romina Batista ◽

Marcos De Sousa

Keyword(s):

Best Practices ◽

Software Package ◽

Phylogenetic Trees ◽

Computational Analysis ◽

Data Matrix ◽

Command Line ◽

Command Line Interface ◽

Ultraconserved Elements ◽

Research Software ◽

Different Levels

The use of Ultraconserved Elements (UCEs) as genetic markers in phylogenomics has become popular and has provided promising results. Although UCE data can be easily obtained from targeted enriched sequencing, the protocol for in silico analysis of UCEs consists of the execution of heterogeneous and complex tools, a challenge for scientists without training in bioinformatics. Developing tools with the adoption of best practices in research software can lessen this problem by improving the execution of computational experiments, thus promoting better reproducibility. We present UCEasy, an easy-to-install and easy-to-use software package with a simple command line interface that facilitates the computational analysis of UCEs from sequencing samples, following the best practices of research software. UCEasy is a wrapper that standardises, automates and simplifies the quality control of raw reads, assembly, and extraction and alignment of UCEs, generating at the end a data matrix with different levels of completeness that can be used to infer phylogenetic trees. We demonstrate the functionalities of UCEasy by reproducing the published results of phylogenomic studies of the bird genus Turdus (Aves) and of Adephaga families (Coleoptera) containing genomic datasets to efficiently extract UCEs.
