sangeranalyseR: Simple and Interactive Processing of Sanger Sequencing Data in R

Abstract: sangeranalyseR is a feature-rich, free, and open-source R package for processing Sanger sequencing data. It allows users to go from loading reads to saving aligned contigs in a few lines of R code by using sensible defaults for most actions. It also provides complete flexibility for determining how individual reads and contigs are processed, both at the command line in R and via interactive Shiny applications. sangeranalyseR provides a wide range of options for all steps in Sanger processing pipelines, including trimming reads, detecting secondary peaks, viewing chromatograms, detecting indels and stop codons, aligning contigs, and estimating phylogenetic trees. Input data can be in either ABIF or FASTA format. sangeranalyseR comes with extensive online documentation and outputs aligned and unaligned reads and contigs in FASTA format, along with detailed interactive HTML reports. It supports the use of colourblind-friendly palettes for viewing alignments and chromatograms. It is released under an MIT licence and is available for all platforms on Bioconductor (https://bioconductor.org/packages/sangeranalyseR) and on GitHub (https://github.com/roblanf/sangeranalyseR).

sangeranalyseR: simple and interactive analysis of Sanger sequencing data in R

10.1101/2020.05.18.102459 ◽ 2020

Author(s): Kuan-Hao Chao ◽ Kirston Barton ◽ Sarah Palmer ◽ Robert Lanfear

Keyword(s): Sanger Sequencing ◽ Reference Sequence ◽ Supplementary Information ◽ File Format ◽ Bioconductor Package ◽ Sequencing Data ◽ Interactive Analysis ◽ Link Type ◽ Online Documentation ◽ Wide Range

Summary: sangeranalyseR is an interactive R/Bioconductor package with two associated Shiny applications, designed for analysing Sanger sequencing data from the ABIF file format in R. It allows users to go from loading reads to saving aligned contigs in a few lines of R code. sangeranalyseR provides a wide range of options for a number of commonly performed actions, including read trimming, detecting secondary peaks, viewing chromatograms, and detecting indels using a reference sequence. All parameters can be adjusted interactively either in R or in the associated Shiny applications. sangeranalyseR comes with extensive online documentation and outputs detailed interactive HTML reports. Availability and implementation: sangeranalyseR is implemented in R and released under an MIT license. It is available for all platforms on Bioconductor (https://bioconductor.org/packages/sangeranalyseR) and on GitHub (https://github.com/roblanf/sangeranalyseR). Contact: [email protected] Supplementary information: Documentation at https://sangeranalyser.readthedocs.io/.

An open-source R-package and web application for high-quality probabilistic predictions in hydrology

10.5194/egusphere-egu21-8549 ◽ 2021

Author(s): Jason Hunter ◽ Mark Thyer ◽ Dmitri Kavetski ◽ David McInerney

Keyword(s): Open Source ◽ Web Application ◽ R Package ◽ Error Model ◽ Objective Functions ◽ High Quality ◽ Wide Range ◽ Probabilistic Error

Probabilistic predictions provide crucial information regarding the uncertainty of hydrological predictions, which are a key input for risk-based decision-making. However, they are often excluded from hydrological modelling applications because suitable probabilistic error models can be challenging both to construct and to interpret, and the quality of results is often reliant on the objective function used to calibrate the hydrological model.

We present an open-source R package and an online web application that achieve the following two aims. Firstly, these resources are easy to use and accessible, so that users need not have specialised knowledge in probabilistic modelling to apply them. Secondly, the probabilistic error model that we describe provides high-quality probabilistic predictions for a wide range of commonly used hydrological objective functions. This is possible only because the model includes an innovation that resolves a long-standing issue relating to model assumptions, which previously prevented this broad application.

We demonstrate our methods by comparing our new probabilistic error model with an existing reference error model in an empirical case study that uses 54 perennial Australian catchments, the hydrological model GR4J, 8 common objective functions and 4 performance metrics (reliability, precision, volumetric bias and errors in the flow duration curve). The existing reference error model introduces additional flow dependencies into the residual error structure when it is used with most of the study objective functions, which in turn leads to poor-quality probabilistic predictions. In contrast, the new probabilistic error model achieves high-quality probabilistic predictions for all objective functions used in this case study.

The new probabilistic error model, the open-source software and the web application aim to facilitate the adoption of probabilistic predictions in the hydrological modelling community, and to improve the quality of predictions and of the decisions that are made using those predictions. In particular, our methods can be used to achieve high-quality probabilistic predictions from hydrological models that are calibrated with a wide range of common objective functions.

motif: an open-source R tool for pattern-based spatial analysis

10.32942/osf.io/kj7fu ◽ 2020

Author(s): Jakub Nowosad

Keyword(s): Spatial Analysis ◽ Land Cover ◽ Open Source ◽ Spatial Patterns ◽ Forest Cover ◽ R Package ◽ Growth Monitoring ◽ Forest Cover Change ◽ Land Cover Data ◽ Wide Range

*Context* Pattern-based spatial analysis provides methods to describe and quantitatively compare spatial patterns for categorical raster datasets. It allows for spatial search, change detection, and clustering of areas with similar patterns. *Objectives* We developed an R package, **motif**, as a set of open-source tools for pattern-based spatial analysis. *Methods* This package provides most of the functionality of existing software (except spatial segmentation), and also extends the existing ideas through support for multi-layer raster datasets. It accepts larger-than-RAM datasets and works across all of the major operating systems. *Results* In this study, we describe the software design of the tool and its capabilities, and present four case studies. They include calculation of spatial signatures based on land cover data for regular and irregular areas, search for regions with similar patterns of geomorphons, detection of changes in land cover patterns, and clustering of areas with similar spatial patterns of land cover and landforms. *Conclusions* The methods implemented in **motif** should be useful in a wide range of applications, including land management, sustainable development, environmental protection, forest cover change and urban growth monitoring, and agriculture expansion studies. The **motif** package homepage is https://nowosad.github.io/motif.

Real-time monitoring and analysis of SARS-CoV-2 nanopore sequencing with minoTour

10.1101/2021.09.13.459777 ◽ 2021

Author(s): Rory James Munro ◽ Nadine Holmes ◽ Christopher Moore ◽ Matthew Carlile ◽ Alex Payne ◽ ...

Keyword(s): Real Time ◽ Phylogenetic Trees ◽ Sequencing Data ◽ Time Analysis ◽ Real Time Analysis ◽ Oxford Nanopore ◽ Individual SNPs ◽ Wide Range ◽ Time Required ◽ Viral Sequencing

Motivation: The ongoing SARS-CoV-2 pandemic has demonstrated the utility of real-time analysis of sequencing data, with a wide range of databases and resources for analysis now available. Here we show how the real-time nature of Oxford Nanopore Technologies sequencers can accelerate consensus generation and the assignment of lineage and variant status. We exploit the fact that multiplexed viral sequencing libraries quickly generate sufficient data for the majority of samples, with diminishing returns on remaining samples as the sequencing run progresses. We demonstrate methods to determine when a sequencing run has passed this point in order to reduce the time and cost of sequencing. Results: We extended minoTour, our real-time analysis and monitoring platform for nanopore sequencers, to provide SARS-CoV-2 analysis using ARTIC network pipelines. We additionally developed an algorithm to predict which samples will achieve sufficient coverage, automatically running the ARTIC medaka informatics pipeline once specific coverage thresholds have been reached on these samples. After testing on run data, we find that significant run time savings are possible, enabling flow cells to be used more efficiently and enabling higher-throughput data analysis. The resultant consensus genomes are assigned both PANGO lineage and variant status as defined by Public Health England. Samples from within individual runs are used to generate phylogenetic trees incorporating optional background samples, as well as summaries of individual SNPs. As minoTour uses ARTIC pipelines, new primer schemes and pathogens can be added to allow minoTour to aid in real-time analysis of pathogens in the future.

iTALK: an R Package to Characterize and Illustrate Intercellular Communication

10.1101/507871 ◽ 2019 ◽ Cited By ~ 26

Author(s): Yuanxin Wang ◽ Ruiping Wang ◽ Shaojun Zhang ◽ Shumei Song ◽ Changying Jiang ◽ ...

Keyword(s): Intercellular Communication ◽ Cell Communication ◽ R Package ◽ Therapy Resistance ◽ Computational Approach ◽ Sequencing Data ◽ Communication Signals ◽ Cellular Processes ◽ Wide Range ◽ Single Cell RNA Sequencing

Abstract: Crosstalk between tumor cells and other cells within the tumor microenvironment (TME) plays a crucial role in tumor progression, metastases, and therapy resistance. We present iTALK, a computational approach to characterize and illustrate intercellular communication signals in the multicellular tumor ecosystem using single-cell RNA sequencing data. iTALK can in principle be used to dissect the complexity, diversity, and dynamics of cell-cell communication from a wide range of cellular processes.

PhySortR: a fast, flexible tool for sorting phylogenetic trees in R

10.7287/peerj.preprints.1609v1 ◽ 2015

Author(s): Timothy G Stephens ◽ Debashish Bhattacharya ◽ Mark A Ragan ◽ Cheong Xin Chan

Keyword(s): Phylogenetic Trees ◽ R Package ◽ Command Line ◽ Flexible Tool ◽ Command Line Tool ◽ Whole Tree

A frequent bottleneck in interpreting phylogenomic output is the need to screen often thousands of trees for features of interest, such as robust clades of specific taxa, as evidence of monophyletic relationships and/or reticulated evolution. Here we present PhySortR, a fast, flexible R package for sorting phylogenetic trees. Unlike existing utilities, PhySortR allows for identification of both exclusive and non-exclusive clades uniting the target taxa, with customisable options to assess clades within the context of the whole tree. PhySortR is a command-line tool that is freely available, highly scalable, and easily automatable.

animalcules: interactive microbiome analytics and visualization in R

Microbiome ◽ 10.1186/s40168-021-01013-0 ◽ 2021 ◽ Vol 9 (1)

Author(s): Yue Zhao ◽ Anthony Federico ◽ Tyler Faits ◽ Solaiappan Manimaran ◽ Daniel Segrè ◽ ...

Keyword(s): 16S rRNA ◽ Microbial Communities ◽ R Package ◽ Command Line ◽ Data Generation ◽ Sequencing Data ◽ Shotgun Metagenomics ◽ Microbiome Analysis ◽ Link Type ◽ R Shiny

Background: Microbial communities that live in and on the human body play a vital role in health and disease. Recent advances in sequencing technologies have enabled the study of microbial communities at unprecedented resolution. However, these advances in data generation have presented novel challenges to researchers attempting to analyze and visualize these data. Results: To address some of these challenges, we have developed animalcules, an easy-to-use interactive microbiome analysis toolkit for 16S rRNA sequencing data, shotgun DNA metagenomics data, and RNA-based metatranscriptomics profiling data. This toolkit combines novel and existing analytics, visualization methods, and machine learning models. For example, the toolkit features traditional microbiome analyses such as alpha/beta diversity and differential abundance analysis, combined with new methods for biomarker identification. In addition, animalcules provides interactive and dynamic figures that enable users to understand their data and discover new insights. animalcules can be used as a standalone command-line R package, or users can explore their data with the accompanying interactive R Shiny interface. Conclusions: We present animalcules, an R package for interactive microbiome analysis through either an interactive interface facilitated by R Shiny or various command-line functions. It is the first microbiome analysis toolkit that supports the analysis of all 16S rRNA, DNA-based shotgun metagenomics, and RNA-sequencing-based metatranscriptomics datasets. animalcules can be freely downloaded from GitHub at https://github.com/compbiomed/animalcules or installed through Bioconductor at https://www.bioconductor.org/packages/release/bioc/html/animalcules.html.

Making WAVES in Breedbase: An Integrated Spectral Data Storage and Analysis Pipeline for Plant Breeding Programs

10.1101/2020.09.18.278549 ◽ 2020

Author(s): Jenna Hershberger ◽ Nicolas Morales ◽ Christiano C. Simoes ◽ Bryan Ellerbrock ◽ Guillaume Bauchet ◽ ...

Keyword(s): Plant Breeding ◽ Open Source ◽ Spectral Data ◽ Data Storage ◽ Cross Validation ◽ R Package ◽ List Type ◽ Breeding Programs ◽ Wide Range ◽ Prediction Functions

Abstract: Visible and near-infrared spectroscopy (vis-NIRS) is a promising tool for increasing phenotyping throughput in plant breeding programs, but existing analysis software packages are not optimized for a breeding context. Additionally, commercial software options often exceed the budgets of breeding and research programs. To address this, we developed an open-source R package, waves, for the streamlined analysis of spectral data with several cross-validation schemes to assess prediction accuracy. waves is compatible with a wide range of spectrometer models and performs visualization, filtering, aggregation, cross-validation set formation, model training, and prediction functions for the association of vis-NIRS spectra with reference measurements. Furthermore, we have integrated this package into the Breedbase family of open-source databases, expanding the analysis capabilities of this growing digital ecosystem to a number of crop species. Taken together, the standalone and Breedbase versions of waves enhance the accessibility of tools for the analysis of spectral data during the plant breeding process.

Core ideas: waves is an open-source R package for spectral data analysis in plant breeding; breeding-relevant cross-validation schemes to evaluate predictive accuracy of models; extension of Breedbase, an open-source database, to support spectral data storage; graphical user interface developed for implementation of waves in Breedbase.

Bayesian inference of ancestral dates on bacterial phylogenetic trees

10.1101/347385 ◽ 2018

Author(s): Xavier Didelot ◽ Nicholas J Croucher ◽ Stephen D Bentley ◽ Simon R Harris ◽ Daniel J Wilson

Keyword(s): Phylogenetic Trees ◽ Single Species ◽ R Package ◽ Bacterial Genomes ◽ Phylogenetic Methods ◽ Bacterial Genomics ◽ Wide Range ◽ Genomic Studies ◽ Dated Phylogeny ◽ Phylogenetic Method

Abstract: The sequencing and comparative analysis of a collection of bacterial genomes from a single species or lineage of interest can lead to key insights into its evolution, ecology or epidemiology. The tool of choice for such a study is often a phylogenetic tree and, more specifically when possible, a dated phylogeny, in which the dates of all common ancestors are estimated. Here we propose a new Bayesian methodology to construct dated phylogenies which is specifically designed for bacterial genomics. Unlike previous Bayesian methods aimed at building dated phylogenies, we consider that the phylogenetic relationships between the genomes have been previously evaluated using a standard phylogenetic method, which makes our methodology much faster and more scalable. This two-step approach also allows us to directly exploit existing phylogenetic methods that detect bacterial recombination, and therefore to account for the effect of recombination in the construction of a dated phylogeny. We analysed many simulated datasets in order to benchmark the performance of our approach in a wide range of situations. Furthermore, we present applications to three different real datasets from recent bacterial genomic studies. Our methodology is implemented in an R package called BactDating, which is freely available for download at https://github.com/xavierdidelot/BactDating.

A daily-updated database and tools for comprehensive SARS-CoV-2 mutation-annotated trees

Molecular Biology and Evolution ◽ 10.1093/molbev/msab264 ◽ 2021

Author(s): Jakob McBroome ◽ Bryan Thornlow ◽ Angie S Hinrichs ◽ Alexander Kramer ◽ Nicola De Maio ◽ ...

Keyword(s): Phylogenetic Trees ◽ Evolutionary History ◽ Command Line ◽ Sequencing Data ◽ Comprehensive View ◽ File Formats ◽ Public Data

Abstract: The vast scale of SARS-CoV-2 sequencing data has made it increasingly challenging to comprehensively analyze all available data using existing tools and file formats. To address this, we present a database of SARS-CoV-2 phylogenetic trees inferred with unrestricted public sequences, which we update daily to incorporate new sequences. Our database uses the recently proposed mutation-annotated tree (MAT) format to efficiently encode the tree with branches labeled with parsimony-inferred mutations, as well as Nextstrain clade and Pango lineage labels at clade roots. As of June 9, 2021, our SARS-CoV-2 MAT consists of 834,521 sequences and provides a comprehensive view of the virus' evolutionary history using public data. We also present matUtils, a command-line utility for rapidly querying, interpreting and manipulating the MATs. Our daily-updated SARS-CoV-2 MAT database and matUtils software are available at http://hgdownload.soe.ucsc.edu/goldenPath/wuhCor1/UShER_SARS-CoV-2/ and https://github.com/yatisht/usher, respectively.
