viromeBrowser: A Shiny App for Browsing Virome Sequencing Analysis Results

Viruses ◽  
2021 ◽  
Vol 13 (3) ◽  
pp. 437
Author(s):  
David F. Nieuwenhuijse ◽  
Bas B. Oude Munnink ◽  
Marion P. G. Koopmans

Experiments that generate complex virome sequencing data remain difficult to explore and unpack for scientists without a background in data science. Processing raw sequencing data with high-throughput sequencing workflows usually yields contigs in FASTA format, together with an annotation file that links each contig to a reference sequence or taxonomic identifier. The next step is to compare the viromes of different samples based on the metadata of the experimental setup and to extract sequences of interest for subsequent analyses. The viromeBrowser is an application written in the open-source R Shiny framework; it was developed in collaboration with end-users and focuses on three common data analysis steps. First, the application allows interactive filtering of annotations using default or custom quality thresholds. Next, multiple samples can be visualized to facilitate comparison of contig annotations based on sample-specific metadata values. Last, the application makes it easy for users to extract sequences of interest in FASTA format. With the interactive features of the viromeBrowser, we aim to enable scientists without a data science background to compare and extract annotation data and sequences from virome sequencing analysis results.
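The three steps described above (threshold filtering, metadata-based comparison, FASTA extraction) can be sketched in a few lines. This is a language-neutral illustration of the logic in Python, not the viromeBrowser's R/Shiny code; the `Annotation` fields (`identity`, `alignment_length`), the metadata key and the default thresholds are hypothetical assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    contig: str
    sample: str
    taxon: str
    identity: float        # percent identity to the reference (assumed field)
    alignment_length: int  # aligned length in nucleotides (assumed field)


def filter_annotations(annotations, min_identity=90.0, min_length=200):
    """Step 1: keep annotations that pass both (hypothetical) quality thresholds."""
    return [a for a in annotations
            if a.identity >= min_identity and a.alignment_length >= min_length]


def taxa_by_group(annotations, metadata, key):
    """Step 2: collect the set of annotated taxa per metadata value (e.g. per host)."""
    groups = {}
    for a in annotations:
        groups.setdefault(metadata[a.sample][key], set()).add(a.taxon)
    return groups


def to_fasta(contig_seqs, contig_ids, width=60):
    """Step 3: write the selected contigs as FASTA text, wrapping sequence lines."""
    lines = []
    for cid in contig_ids:
        seq = contig_seqs[cid]
        lines.append(f">{cid}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```

A typical flow filters first, groups the surviving annotations by a metadata column, and then passes the contig IDs of interest to `to_fasta`.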

2020 ◽  
Vol 11 ◽  
Author(s):  
Paul E. Smith ◽  
Sinead M. Waters ◽  
Ruth Gómez Expósito ◽  
Hauke Smidt ◽  
Ciara A. Carberry ◽  
...  

Our understanding of complex microbial communities, such as those residing in the rumen, has drastically advanced through the use of high-throughput sequencing (HTS) technologies. Indeed, with the use of barcoded amplicon sequencing, it is now cost-effective and computationally feasible to identify individual rumen microbial genera associated with ruminant livestock nutrition, genetics, performance and greenhouse gas production. However, across all disciplines of microbial ecology, there is currently little reporting of the use of internal controls for validating HTS results. Furthermore, there is little consensus on the most appropriate reference database for analyzing rumen microbiota amplicon sequencing data. Therefore, in this study, a synthetic rumen-specific sequencing standard was used to assess the effects of database choice on results obtained from rumen microbial amplicon sequencing. Four DADA2 reference training sets (RDP, SILVA, GTDB, and RefSeq + RDP) were compared to assess their ability to correctly classify sequences included in the rumen-specific sequencing standard. In addition, two phylogenetic bootstrapping thresholds, 50 and 80, were applied to investigate the effect of increasing stringency. Sequence classification differences were apparent amongst the databases. For example, the classification of Clostridium differed among all databases, highlighting the need for a consistent approach to nomenclature across reference databases. It is hoped that the effect of database choice on taxonomic classification observed in this study will encourage research groups across various microbial disciplines to develop and routinely use their own microbiome-specific reference standard to validate analysis pipelines and database choice.
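The bootstrap threshold (50 vs. 80) acts rank by rank: a taxonomic rank is reported only if its bootstrap confidence reaches the threshold, and every lower rank is then left unassigned. This is how the `minBoot` argument of DADA2's `assignTaxonomy` is usually described; the Python sketch below reproduces that truncation rule under that assumption, with a hypothetical assignment and hypothetical bootstrap values.

```python
RANKS = ["Kingdom", "Phylum", "Class", "Order", "Family", "Genus"]


def truncate_by_bootstrap(assignment, bootstraps, min_boot=50):
    """Keep ranks while bootstrap confidence >= min_boot; from the first
    low-confidence rank downward, set the assignment to None."""
    result = {}
    confident = True
    for rank in RANKS:
        if confident and bootstraps[rank] >= min_boot:
            result[rank] = assignment[rank]
        else:
            confident = False  # once a rank fails, all lower ranks are dropped too
            result[rank] = None
    return result
```

With a genus-level bootstrap of 65 and a family-level bootstrap of 72 (hypothetical values), the genus is retained at a threshold of 50 but the read is only classified to order at a threshold of 80, which is the kind of stringency effect the study compares.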


2016 ◽  
Vol 32 (10) ◽  
pp. 1486-1492 ◽  
Author(s):  
Gabriel H. Murillo ◽  
Na You ◽  
Xiaoquan Su ◽  
Wei Cui ◽  
Muredach P. Reilly ◽  
...  

2015 ◽  
Author(s):  
Manolis Maragkakis ◽  
Panagiotis Alexiou ◽  
Zissimos Mourelatos

Background: High-throughput sequencing (HTS) has become one of the primary experimental tools used to extract genomic information from biological samples. Bioinformatics tools are continuously being developed for the analysis of HTS data. Beyond some well-defined core analyses, such as quality control or genomic alignment, the consistent development of custom tools and the representation of sequencing data in organized computational structures and entities remain a challenging effort for bioinformaticians. Results: In this work, we present GenOO [jee-noo], an open-source, object-oriented (OO) Perl framework specifically developed for the design and implementation of HTS analysis tools. GenOO models biological entities such as genes and transcripts as Perl objects, and includes relevant modules, attributes and methods that allow for the manipulation of high-throughput sequencing data. GenOO integrates these elements in a simple and transparent way, which allows for the creation of complex analysis pipelines while minimizing the overhead for the researcher. GenOO has been designed with flexibility in mind, and has an easily extendable modular structure with minimal requirements for external tools and libraries. As an example of the framework’s capabilities and usability, we present a short and simple walkthrough of a custom use case in HTS analysis. Conclusions: GenOO is a tool of high software quality which can be efficiently used for advanced HTS analyses. It has been used to develop several custom analysis tools, leading to a number of published works. Using GenOO as a core development module can greatly benefit users by reducing the overhead and complexity of managing HTS data and biological entities at hand.
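The object-oriented idea behind GenOO (biological entities as objects with attributes and methods) can be illustrated outside Perl. The Python sketch below is not GenOO's API; the classes `GenomicRegion` and `Transcript`, their methods, and the 0-based half-open coordinate convention are hypothetical choices made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class GenomicRegion:
    chrom: str
    strand: str  # "+" or "-"
    start: int   # 0-based, inclusive
    stop: int    # 0-based, exclusive

    def length(self):
        return self.stop - self.start

    def overlaps(self, other, strand_specific=True):
        """True if the two half-open intervals share at least one base."""
        if self.chrom != other.chrom:
            return False
        if strand_specific and self.strand != other.strand:
            return False
        return self.start < other.stop and other.start < self.stop


@dataclass
class Transcript:
    name: str
    exons: list = field(default_factory=list)  # list of GenomicRegion

    def exonic_length(self):
        return sum(e.length() for e in self.exons)

    def count_overlapping_reads(self, reads):
        """Count aligned reads (GenomicRegion objects) overlapping any exon."""
        return sum(1 for r in reads if any(r.overlaps(e) for e in self.exons))
```

Encapsulating coordinates and overlap logic in the entity classes keeps analysis scripts short: counting reads on a transcript becomes one method call rather than ad hoc interval arithmetic.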


2019 ◽  
Vol 5 ◽  
Author(s):  
Álvaro Briz-Redón

Spatial statistics is an important field of data science with applications in many different areas of study, such as epidemiology, criminology, seismology, astronomy and econometrics. In particular, spatial statistics has frequently been used to analyze traffic accident datasets with explanatory and preventive objectives. Traditionally, these studies have applied spatial statistics techniques at some level of areal aggregation, usually related to administrative units. However, the last decade has brought an increasing number of works on the spatial incidence and distribution of traffic accidents at the road level, using the spatial structure known as a linear network. This change seems positive because it could provide deeper and more accurate investigations than previous studies based on areal spatial units. Working at the road level raises some technical difficulties due to the high complexity of these structures, especially in terms of manipulation and rectification. The R Shiny app SpNetPrep, which is available online and via an R package of the same name, aims to provide functionalities that could be useful for users interested in performing a spatial analysis over a road network structure.
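Analysing accidents on a linear network requires each event to sit on a road segment rather than at a free position in the plane. A minimal sketch of that step, assuming planar (projected) coordinates and a network stored as a hypothetical dictionary of straight segments, snaps each point to its nearest segment by orthogonal projection; this illustrates the concept and is not SpNetPrep's implementation.

```python
import math


def project_onto_segment(p, a, b):
    """Return (distance, t): t in [0, 1] locates the point on segment a-b
    closest to p, and distance is the Euclidean distance from p to it."""
    (ax, ay), (bx, by), (px, py) = a, b, p
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        t = 0.0  # degenerate segment: both endpoints coincide
    else:
        t = ((px - ax) * dx + (py - ay) * dy) / seg_len2
        t = max(0.0, min(1.0, t))  # clamp to the segment
    cx, cy = ax + t * dx, ay + t * dy
    return math.hypot(px - cx, py - cy), t


def snap_to_network(point, segments):
    """segments: {segment_id: (a, b)}. Return (segment_id, t, distance)
    for the segment nearest to the point."""
    best = None
    for sid, (a, b) in segments.items():
        d, t = project_onto_segment(point, a, b)
        if best is None or d < best[2]:
            best = (sid, t, d)
    return best
```

For large networks a spatial index would replace the linear scan, but the projection step stays the same.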


2020 ◽  
Author(s):  
Dileep Kishore ◽  
Gabriel Birzu ◽  
Zhenjun Hu ◽  
Charles DeLisi ◽  
Kirill S. Korolev ◽  
...  

Microbes tend to organize into communities consisting of hundreds of species involved in complex interactions with each other. 16S ribosomal RNA (16S rRNA) amplicon profiling provides snapshots that reveal the phylogenies and abundance profiles of these microbial communities. These snapshots, when collected from multiple samples, have the potential to reveal which microbes co-occur, providing a glimpse into the network of associations in these communities. The inference of networks from 16S data is prone to statistical artifacts. There are many tools for performing each step of the 16S analysis workflow, but the extent to which these steps affect the final network is still unclear. In this study, we perform a meticulous analysis of each step of a pipeline that can convert 16S sequencing data into a network of microbial associations. Through this process, we map how different choices of algorithms and parameters affect the co-occurrence network and estimate which steps contribute most to the variance. We further determine the tools and parameters that generate the most accurate and robust co-occurrence networks based on comparison with mock and synthetic datasets. Ultimately, we develop a standardized pipeline (available at https://github.com/segrelab/MiCoNE) that follows these default tools and parameters, but that can also help explore the outcome of any other combination of choices. We envisage that this pipeline could be used for integrating multiple datasets, and for generating comparative analyses and consensus networks that can help understand and control microbial community assembly in different biomes.
Importance: To understand and control the mechanisms that determine the structure and function of microbial communities, it is important to map the interrelationships between their constituent microbial species. The surge in the high-throughput sequencing of microbial communities has led to the creation of thousands of datasets containing information about microbial abundances. These abundances can be transformed into networks of co-occurrences across multiple samples, providing a glimpse into the structure of microbiomes. However, processing these datasets to obtain co-occurrence information relies on several complex steps, each of which involves multiple choices of tools and corresponding parameters. These multiple options raise questions about the accuracy and uniqueness of the inferred networks. In this study, we address this workflow and provide a systematic analysis of how these choices of tools and parameters affect the final network, and of how to select those that are most appropriate for a particular dataset.
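To make "transforming abundances into a co-occurrence network" concrete, the sketch below shows one simple combination of choices: a centred log-ratio (CLR) transform with a pseudocount, pairwise Pearson correlation across samples, and a hard correlation threshold for drawing edges. These specific choices, the pseudocount of 1 and the 0.6 threshold are assumptions for illustration; they are not claimed to be MiCoNE's defaults, which the study selects by benchmarking many alternatives.

```python
import math
from itertools import combinations


def clr(counts, pseudocount=1.0):
    """Centred log-ratio transform of one sample's taxon counts."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean = sum(logs) / len(logs)
    return [x - mean for x in logs]


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant profile: correlation undefined, treat as no association
    return cov / (sx * sy)


def cooccurrence_network(table, taxa, threshold=0.6):
    """table: list of samples, each a list of counts ordered like `taxa`.
    Returns edges (taxon_a, taxon_b, r) with |r| >= threshold."""
    transformed = [clr(sample) for sample in table]
    columns = list(zip(*transformed))  # one column per taxon across samples
    edges = []
    for i, j in combinations(range(len(taxa)), 2):
        r = pearson(columns[i], columns[j])
        if abs(r) >= threshold:
            edges.append((taxa[i], taxa[j], r))
    return edges
```

With samples `[[10, 20, 50], [100, 200, 50], [1000, 2000, 50]]`, taxa A and B receive a strong positive edge, but C, whose raw counts never change, also receives strong negative edges, because its relative abundance falls as A and B grow. This illustrates how each choice of transform and association measure shapes the final network.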


2020 ◽  
Vol 8 (5) ◽  
pp. 684
Author(s):  
Nathanael J. Bangayan ◽  
Baochen Shi ◽  
Jerry Trinh ◽  
Emma Barnard ◽  
Gabriela Kasimatis ◽  
...  

The microbiome plays an important role in human physiology. The composition of the human microbiome has been described at the phylum, class, genus, and species levels; however, it is largely unknown at the strain level. The importance of strain-level differences in microbial communities has been increasingly recognized in understanding disease associations. Current methods for identifying strain populations often require deep metagenomic sequencing and a comprehensive set of reference genomes. In this study, we developed a method, metagenomic multi-locus sequence typing (MG-MLST), to determine strain-level composition in a microbial community by combining high-throughput sequencing with multi-locus sequence typing (MLST). We used a commensal bacterium, Propionibacterium acnes, as an example to test the ability of MG-MLST to identify strain composition. Using simulated communities, MG-MLST accurately predicted the strain populations in all samples. We further validated the method using MLST gene amplicon libraries and metagenomic shotgun sequencing data of clinical skin samples. The strain compositions obtained with MG-MLST were consistent with those obtained from nearly full-length 16S rRNA clone libraries and metagenomic shotgun sequencing analysis. When comparing strain-level differences between acne and healthy skin microbiomes, we demonstrated that strains of RT2/6 were highly associated with healthy skin, consistent with previous findings. In summary, MG-MLST provides a quantitative analysis of strain populations in the microbiome, including their diversity and richness. It can be applied to microbiome studies to reveal strain-level differences between groups, which are critical in many microorganism-related diseases.
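The abstract does not spell out how MG-MLST turns per-locus allele abundances into strain proportions. One standard way to pose the problem is as a mixture: each sequence type (ST) is a known allele profile across loci, and reads at each locus are shared among the STs carrying the observed allele. The sketch below estimates ST proportions with expectation-maximization (EM) under that assumption; it is a generic technique, not necessarily the authors' method, and the loci, alleles and STs used are hypothetical.

```python
def estimate_st_proportions(allele_counts, profiles, n_iter=1000):
    """allele_counts: {locus: {allele: read_count}}
    profiles: {sequence_type: {locus: allele}}
    Returns {sequence_type: estimated proportion} via EM."""
    sts = list(profiles)
    pi = {s: 1.0 / len(sts) for s in sts}  # start from uniform proportions
    for _ in range(n_iter):
        expected = {s: 0.0 for s in sts}
        # E-step: split each allele's reads among the STs that carry it,
        # in proportion to the current estimates.
        for locus, counts in allele_counts.items():
            for allele, n in counts.items():
                carriers = [s for s in sts if profiles[s].get(locus) == allele]
                weight = sum(pi[s] for s in carriers)
                if weight == 0:
                    continue  # allele not explained by any known ST
                for s in carriers:
                    expected[s] += n * pi[s] / weight
        # M-step: renormalize expected read counts into proportions.
        norm = sum(expected.values())
        pi = {s: expected[s] / norm for s in sts}
    return pi
```

For example, with three hypothetical STs typed at two loci and allele counts of 80/20 at the first locus and 70/30 at the second, the estimate converges to proportions of 0.5, 0.3 and 0.2, the unique mixture that reproduces both loci.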


2018 ◽  
Author(s):  
Christoffer Flensburg ◽  
Tobias Sargeant ◽  
Alicia Oshlack ◽  
Ian Majewski

Analysing multiple cancer samples from an individual patient can provide insight into the way the disease evolves. Monitoring the expansion and contraction of distinct clones helps to reveal the mutations that initiate the disease and those that drive progression. Existing approaches for clonal tracking from sequencing data typically require the user to combine multiple tools that are not purpose-built for this task. Furthermore, most methods require a matched normal (non-tumour) sample, which limits the scope of application. We developed SuperFreq, a cancer exome sequencing analysis pipeline that integrates identification of somatic single nucleotide variants (SNVs) and copy number alterations (CNAs) with clonal tracking for both. SuperFreq does not require a matched normal and instead relies on unrelated controls. When analysing multiple samples from a single patient, SuperFreq cross-checks variant calls to improve clonal tracking, which helps to separate somatic from germline variants and to resolve overlapping CNA calls. To demonstrate our software, we analysed 304 cancer-normal exome samples across 33 cancer types in The Cancer Genome Atlas (TCGA) and evaluated the quality of the SNV and CNA calls. We simulated clonal evolution through in silico mixing of cancer and normal samples in known proportions. We found that SuperFreq identified 93% of clones with a cellular fraction of at least 50%, and mutations were assigned to the correct clone with high recall and precision. In addition, SuperFreq maintained a similar level of performance for most aspects of the analysis when run without a matched normal. SuperFreq is highly versatile and can be applied in many different experimental settings for the analysis of exomes and other capture libraries. We demonstrate an application of SuperFreq to leukaemia patients with diagnosis and relapse samples. SuperFreq is implemented in R and available on GitHub at https://github.com/ChristofferFlensburg/SuperFreq.
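The in silico mixing benchmark rests on simple arithmetic linking clone size to variant allele frequency (VAF). The sketch below assumes a heterozygous somatic SNV, diploid normal cells, and a single copy number shared by all cancer cells; it shows the expected VAF after diluting a tumour sample with normal DNA, and the inverse calculation for a diploid region. SuperFreq's actual model is more elaborate, and the function and parameter names here are hypothetical.

```python
def expected_vaf(clone_fraction, purity, mix_fraction=1.0,
                 tumour_copies=2, mutant_copies=1):
    """Expected VAF of a somatic SNV in a mixture where `mix_fraction` of the
    DNA comes from a tumour sample of the given purity and the rest from
    normal (diploid) cells. `clone_fraction` is the share of cancer cells
    carrying the mutation."""
    cancer = mix_fraction * purity  # effective cancer-cell fraction of the mixture
    mutant_alleles = cancer * clone_fraction * mutant_copies
    total_alleles = cancer * tumour_copies + (1 - cancer) * 2
    return mutant_alleles / total_alleles


def cellular_fraction_diploid(vaf, purity, mix_fraction=1.0):
    """Invert expected_vaf for a heterozygous SNV in a diploid region."""
    return 2 * vaf / (mix_fraction * purity)
```

For example, a clone present in 60% of cancer cells, in a tumour of purity 0.8 mixed 1:1 with normal DNA, gives an expected VAF of 0.12; inverting that VAF with the known mixing proportion recovers the cellular fraction of 0.6, which is what makes mixtures in known proportions a usable ground truth.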


2018 ◽  
Author(s):  
Yuna Blum ◽  
Aurélien de Reyniès ◽  
Nelson Dusetti ◽  
Juan Iovanna ◽  
Laetitia Marisa ◽  
...  

Background: Patient-derived xenografts are the reference model in oncology for drug response analyses. Xenograft samples are composed of cells from both the graft and the host species. Sequencing analysis of xenograft samples therefore requires specific processing methods to properly reconstruct the genomic profiles of both the host and graft compartments. Results: We propose a novel xenograft sequencing processing pipeline termed SMAP, for Simultaneous mapping. SMAP integrates the distinction between host and graft sequencing reads into the mapping process by aligning simultaneously to both genome references. We show that SMAP increases the accuracy of species assignment while reducing the number of discarded ambiguous reads compared to other existing methods. Moreover, SMAP includes a module called SMAP-fuz to improve the detection of chimeric fusion transcripts in xenograft RNA-seq data. Finally, we apply SMAP to a real dataset and show the relevance of pathway and cell population analysis of the tumoral and stromal compartments. Conclusions: In high-throughput sequencing analysis of xenografts, our results show that: (i) the use of ad hoc sequence processing methods is essential, (ii) high sequence homology does not introduce a significant bias when proper methods are used, and (iii) the detection of fusion transcripts can be improved using our approach. SMAP is available on GitHub: cit-bioinfo.github.io/SMAP.
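Simultaneous mapping means aligning each read against a combined host-plus-graft reference, so that species assignment falls out of comparing the best hits. The sketch below assumes reference sequence names carry a species prefix (the `hs_`/`mm_` prefixes are hypothetical) and uses an assumed score-margin rule to call a read ambiguous; SMAP's actual decision rules are not described in the abstract.

```python
def assign_species(alignments, min_margin=1):
    """alignments: {read_id: [(reference_name, alignment_score), ...]} where
    reference names carry a species prefix such as 'hs_' or 'mm_'.
    A read goes to the species of its best hit if that score beats the best
    score on any other species by at least min_margin; otherwise it is
    reported as 'ambiguous'. Reads with no hits are 'unmapped'."""
    result = {}
    for read, hits in alignments.items():
        # Best alignment score per species, taken from the reference-name prefix.
        best = {}
        for ref, score in hits:
            species = ref.split("_", 1)[0]
            best[species] = max(score, best.get(species, float("-inf")))
        if not best:
            result[read] = "unmapped"
            continue
        ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
        if len(ranked) == 1 or ranked[0][1] - ranked[1][1] >= min_margin:
            result[read] = ranked[0][0]
        else:
            result[read] = "ambiguous"
    return result
```

Because both genomes compete in the same alignment step, a read with a clearly better hit on one species is assigned directly, and only reads with tied or near-tied scores need to be set aside.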


Viruses ◽  
2021 ◽  
Vol 13 (12) ◽  
pp. 2541
Author(s):  
Izabela Fabiańska ◽  
Stefan Borutzki ◽  
Benjamin Richter ◽  
Hon Q. Tran ◽  
Andreas Neubert ◽  
...  

High-throughput sequencing (HTS) allows detection of known and unknown viruses in samples of broad origin. This makes HTS a well-suited technology for determining whether biological products, such as vaccines, are free from adventitious agents, and it could support or replace extensive testing using various in vitro and in vivo assays. Due to bioinformatics complexities, there is a need for standardized and reliable methods to manage HTS-generated data in this field. We therefore developed LABRADOR, an analysis pipeline for adventitious virus detection. The pipeline consists of several third-party programs and is divided into two major parts: (i) direct read classification based on the comparison of characteristic profiles between reads and sequences deposited in the database, supported by alignment to the best-matching reference sequence, and (ii) de novo assembly of contigs and their classification at the nucleotide and amino acid levels. To meet the requirements published in guidelines for the safety of biologicals, we generated a custom nucleotide database of viral sequences. We tested our pipeline on publicly available HTS datasets and showed that LABRADOR can reliably detect viruses in mixtures of model viruses, vaccines and clinical samples.
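"Comparison of characteristic profiles between reads and database sequences" is commonly implemented with k-mers. The abstract does not name the third-party classifier LABRADOR uses, so the sketch below only illustrates the general k-mer idea: each read is assigned to the reference sharing the largest fraction of its k-mers, provided that fraction reaches a minimum. The value k = 5, the 0.2 cut-off and the toy sequences are assumptions; real classifiers use much larger k and precomputed indexes.

```python
def kmer_set(seq, k=5):
    """Set of all overlapping k-mers in a sequence (case-insensitive)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(references, k=5):
    """Precompute the k-mer set of every reference sequence once."""
    return {name: kmer_set(seq, k) for name, seq in references.items()}


def classify_read(read, index, k=5, min_shared=0.2):
    """Return the reference sharing the largest fraction of the read's k-mers,
    or None if no reference reaches min_shared."""
    read_kmers = kmer_set(read, k)
    if not read_kmers:
        return None  # read shorter than k
    best_name, best_frac = None, 0.0
    for name, ref_kmers in index.items():
        frac = len(read_kmers & ref_kmers) / len(read_kmers)
        if frac > best_frac:
            best_name, best_frac = name, frac
    return best_name if best_frac >= min_shared else None
```

In a pipeline like the one described, reads classified this way would then be aligned to the best-matching reference to confirm the hit, while unclassified reads can be passed to the assembly branch.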

