VirusLab: A Tool for Customized SARS-CoV-2 Data Analysis

BioTech ◽  
2021 ◽  
Vol 10 (4) ◽  
pp. 27
Author(s):  
Pietro Pinoli ◽  
Anna Bernasconi ◽  
Anna Sandionigi ◽  
Stefano Ceri

Since the beginning of 2020, the COVID-19 pandemic has posed unprecedented challenges to viral data analysis and connected host disease diagnostic methods. We propose VirusLab, a flexible system for analysing SARS-CoV-2 viral sequences and relating them to metadata or clinical information about the host. VirusLab capitalizes on two existing resources: ViruSurf, a database of public SARS-CoV-2 sequences supporting metadata-driven search, and VirusViz, a tool for visual analysis of search results. VirusLab is designed for taking advantage of these resources within a server-side architecture that: (i) covers pipelines based on approaches already in use (ARTIC, Galaxy) but entirely customizable upon user request; (ii) predigests analysis of raw sequencing data from different platforms (Oxford Nanopore and Illumina); (iii) gives access to public archives datasets; (iv) supplies user-friendly reporting – making it a tool that can also be integrated into a business environment. VirusLab can be installed and hosted within the premises of any organization where information about SARS-CoV-2 sequences can be safely integrated with information about hosts (e.g., clinical metadata). A system such as VirusLab is not currently available in the landscape of similar providers: our results show that VirusLab is a powerful tool to generate tabular/graphical and machine readable reports that can be integrated in more complex pipelines. We foresee that the proposed system can support many research-oriented and therapeutic scenarios within hospitals or the tracing of viral sequences and their mutational processes within organizations for viral surveillance.

2019 ◽  
Author(s):  
Rumen Manolov

The lack of consensus regarding the most appropriate analytical techniques for single-case experimental designs data requires justifying the choice of any specific analytical option. The current text mentions some of the arguments, provided by methodologists and statisticians, in favor of several analytical techniques. Additionally, a small-scale literature review is performed in order to explore if and how applied researchers justify the analytical choices that they make. The review suggests that certain practices are not sufficiently explained. In order to improve the reporting regarding the data analytical decisions, it is proposed to choose and justify the data analytical approach prior to gathering the data. As a possible justification for the data analysis plan, we propose using as a basis the expected data pattern (specifically, the expectation about an improving baseline trend and about the immediate or progressive nature of the intervention effect). Although there are multiple alternatives for single-case data analysis, the current text focuses on visual analysis and multilevel models and illustrates an application of these analytical options with real data. User-friendly software is also developed.


2021 ◽  
Vol 15 (1) ◽  
Author(s):  
Zeeshan Ahmed ◽  
Eduard Gibert Renart ◽  
Saman Zeeshan ◽  
XinQi Dong

Abstract Background Genetic disposition is considered critical for identifying subjects at high risk for disease development. Investigating disease-causing and high and low expressed genes can support finding the root causes of uncertainties in patient care. However, independent and timely high-throughput next-generation sequencing data analysis is still a challenge for non-computational biologists and geneticists. Results In this manuscript, we present a findable, accessible, interactive, and reusable (FAIR) bioinformatics platform, i.e., GVViZ (visualizing genes with disease-causing variants). GVViZ is a user-friendly, cross-platform, and database application for RNA-seq-driven variable and complex gene-disease data annotation and expression analysis with a dynamic heat map visualization. GVViZ has the potential to find patterns across millions of features and extract actionable information, which can support the early detection of complex disorders and the development of new therapies for personalized patient care. The execution of GVViZ is based on a set of simple instructions that users without a computational background can follow to design and perform customized data analysis. It can assimilate patients’ transcriptomics data with the public, proprietary, and our in-house developed gene-disease databases to query, easily explore, and access information on gene annotation and classified disease phenotypes with greater visibility and customization. To test its performance and understand the clinical and scientific impact of GVViZ, we present GVViZ analysis for different chronic diseases and conditions, including Alzheimer’s disease, arthritis, asthma, diabetes mellitus, heart failure, hypertension, obesity, osteoporosis, and multiple cancer disorders. 
The results are visualized using GVViZ and can be exported as image (PNG/TIFF) and text (CSV) files that include gene names, Ensembl (ENSG) IDs, quantified abundances, expressed transcript lengths, and annotated oncology and non-oncology diseases. Conclusions We emphasize that automated and interactive visualization should be an indispensable component of modern RNA-seq analysis, which is currently not the case. However, experts in clinics and researchers in life sciences can use GVViZ to visualize and interpret the transcriptomics data, making it a powerful tool to study the dynamics of gene expression and regulation. Furthermore, with successful deployment in clinical settings, GVViZ has the potential to enable high-throughput correlations between patient diagnoses based on clinical and transcriptomics data.


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Ratanond Koonchanok ◽  
Swapna Vidhur Daulatabad ◽  
Quoseena Mir ◽  
Khairi Reda ◽  
Sarath Chandra Janga

Abstract Background Direct-sequencing technologies, such as Oxford Nanopore’s, are delivering long RNA reads with great efficacy and convenience. These technologies afford an ability to detect post-transcriptional modifications at a single-molecule resolution, promising new insights into the functional roles of RNA. However, realizing this potential requires new tools to analyze and explore this type of data. Result Here, we present Sequoia, a visual analytics tool that allows users to interactively explore nanopore sequences. Sequoia combines a Python-based backend with a multi-view visualization interface, enabling users to import raw nanopore sequencing data in a Fast5 format, cluster sequences based on electric-current similarities, and drill down into signals to identify properties of interest. We demonstrate the application of Sequoia by generating and analyzing ~500k reads from direct RNA sequencing data of the human HeLa cell line. We focus on comparing signal features from m6A and m5C RNA modifications as the first step towards building automated classifiers. We show how, through iterative visual exploration and tuning of dimensionality reduction parameters, we can separate modified RNA sequences from their unmodified counterparts. We also document new, qualitative signal signatures that characterize these modifications from otherwise normal RNA bases, which we were able to discover from the visualization. Conclusions Sequoia’s interactive features complement existing computational approaches in nanopore-based RNA workflows. The insights gleaned through visual analysis should help users in developing rationales, hypotheses, and insights into the dynamic nature of RNA. Sequoia is available at https://github.com/dnonatar/Sequoia.
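
The idea of grouping reads by electric-current similarity can be illustrated with a small sketch. This is not Sequoia's implementation: the abstract does not specify its distance measure or clustering method, so the sketch swaps in linear resampling to a fixed length, Euclidean distance, and plain k-means. The function names and the toy current values are hypothetical, and reading Fast5 files is omitted.

```python
import math

def resample(signal, n):
    """Linearly interpolate a current trace to n evenly spaced points (n >= 2)."""
    if len(signal) == 1:
        return [float(signal[0])] * n
    out = []
    for i in range(n):
        pos = i * (len(signal) - 1) / (n - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, len(signal) - 1)
        frac = pos - lo
        out.append(signal[lo] * (1 - frac) + signal[hi] * frac)
    return out

def distance(a, b):
    """Euclidean distance between two equal-length traces."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iterations=20):
    """Lloyd's algorithm, seeded with the first k points so runs are repeatable."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each trace to its nearest center.
        labels = [min(range(k), key=lambda c: distance(p, centers[c]))
                  for p in points]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical toy traces of different lengths (illustrative values only).
traces = [
    [80, 82, 81, 79],
    [120, 118, 121, 119, 122, 120],
    [81, 80, 83, 80, 82],
    [119, 121, 120],
]
points = [resample(t, 8) for t in traces]
labels = kmeans(points, k=2)
```

Resampling is needed here only because Euclidean distance requires equal-length inputs; a real workflow on raw nanopore signals would likely use a distance that tolerates time shifts.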


2020 ◽  
Vol 49 (D1) ◽  
pp. D877-D883
Author(s):  
Fangzhou Xie ◽  
Shurong Liu ◽  
Junhao Wang ◽  
Jiajia Xuan ◽  
Xiaoqin Zhang ◽  
...  

Abstract Eukaryotic genomes encode thousands of small and large non-coding RNAs (ncRNAs). However, the expression, functions and evolution of these ncRNAs are still largely unknown. In this study, we have updated deepBase to version 3.0 (deepBase v3.0, http://rna.sysu.edu.cn/deepbase3/index.html), an increasingly popular and openly licensed resource that facilitates integrative and interactive display and analysis of the expression, evolution, and functions of various ncRNAs by deeply mining thousands of high-throughput sequencing data from tissue, tumor and exosome samples. We updated deepBase v3.0 to provide the most comprehensive expression atlas of small RNAs and lncRNAs by integrating ∼67 620 data from 80 normal tissues and ∼50 cancer tissues. The extracellular patterns of various ncRNAs were profiled to explore their applications for discovery of noninvasive biomarkers. Moreover, we constructed survival maps of tRNA-derived RNA Fragments (tRFs), miRNAs, snoRNAs and lncRNAs by analyzing >45 000 cancer sample data and corresponding clinical information. We also developed interactive webs to analyze the differential expression and biological functions of various ncRNAs in ∼50 types of cancers. This update is expected to provide a variety of new modules and graphic visualizations to facilitate analyses and explorations of the functions and mechanisms of various types of ncRNAs.


Genomics ◽  
2017 ◽  
Vol 109 (2) ◽  
pp. 83-90 ◽  
Author(s):  
Yan Guo ◽  
Yulin Dai ◽  
Hui Yu ◽  
Shilin Zhao ◽  
David C. Samuels ◽  
...  

Biometrika ◽  
2021 ◽  
Author(s):  
Pixu Shi ◽  
Yuchen Zhou ◽  
Anru R Zhang

Abstract In microbiome and genomic studies, the regression of compositional data has been a crucial tool for identifying microbial taxa or genes that are associated with clinical phenotypes. To account for the variation in sequencing depth, the classic log-contrast model is often used where read counts are normalized into compositions. However, zero read counts and the randomness in covariates remain critical issues. In this article, we introduce a surprisingly simple, interpretable, and efficient method for the estimation of compositional data regression through the lens of a novel high-dimensional log-error-in-variable regression model. The proposed method provides both corrections on sequencing data with possible overdispersion and simultaneously avoids any subjective imputation of zero read counts. We provide theoretical justifications with matching upper and lower bounds for the estimation error. The merit of the procedure is illustrated through real data analysis and simulation studies.
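
As background for the log-contrast model mentioned above: in its classic form the linear predictor is a sum of coefficients times log-proportions, with the coefficients constrained to sum to zero. The sketch below illustrates only that classic model, not the error-in-variable method the article proposes. Function names and numbers are hypothetical. It shows why the zero-sum constraint removes the effect of sequencing depth, and why a zero read count is a problem.

```python
import math

def to_composition(counts):
    """Normalize raw read counts to proportions that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]

def log_contrast(values, beta):
    """Linear predictor sum_j beta_j * log(values_j) with sum(beta) == 0."""
    if abs(sum(beta)) > 1e-12:
        raise ValueError("log-contrast coefficients must sum to zero")
    return sum(b * math.log(v) for b, v in zip(beta, values))

# Hypothetical read counts for three taxa and zero-sum coefficients.
counts = [120, 30, 50]
beta = [1.0, -0.5, -0.5]

# The same sample sequenced ten times deeper gives the same predictor,
# because the zero-sum constraint cancels the total count.
shallow = log_contrast(to_composition(counts), beta)
deep = log_contrast(to_composition([10 * c for c in counts]), beta)
```

A zero count makes `math.log` raise `ValueError`, which is the practical reason zero read counts are usually imputed before fitting this model.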


2019 ◽  
Author(s):  
Ajay Chatrath ◽  
Roza Przanowska ◽  
Shashi Kiran ◽  
Zhangli Su ◽  
Shekhar Saha ◽  
...  

Abstract While clinical data provides physicians with information about patient prognosis, genomic data can further improve these predictions. We analyzed sequencing data from over 10,000 cancer patients and identified hundreds of prognostic germline variants using multivariate Cox regression models. These variants provide information about patient outcomes beyond clinical information currently in use and may augment clinical decisions based on expected tumor aggressiveness. Molecularly, at least twelve of the germline variants are likely associated with patient outcome through perturbation of protein structure and at least five through association with gene expression differences. About half of these germline variants are in previously reported tumor suppressors or oncogenes, with the other half pointing to loci of previously unstudied genes in the literature that should be further investigated for roles in cancers. Our results suggest that germline variation contributes to tumor progression across most cancers and contains patient outcome information not captured by clinical factors.
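
For readers unfamiliar with the model class named above, a multivariate Cox regression has the standard proportional-hazards form below. The symbols are illustrative assumptions rather than the authors' notation: $g$ is the genotype at one germline variant, $\mathbf{c}$ is a vector of clinical covariates, and $h_0(t)$ is the unspecified baseline hazard. The abstract does not state which covariates were included.

```latex
h(t \mid g, \mathbf{c}) = h_0(t)\,\exp\!\left(\beta_g\, g + \boldsymbol{\gamma}^{\top}\mathbf{c}\right)
```

Under this form, a variant carries prognostic information beyond the clinical factors when $\beta_g$ differs from zero with $\mathbf{c}$ included in the model, and $\exp(\beta_g)$ is the hazard ratio per unit of $g$.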

