GDCRNATools: an R/Bioconductor package for integrative analysis of lncRNA, miRNA, and mRNA data in GDC

2017 ◽  
Author(s):  
Ruidong Li ◽  
Han Qu ◽  
Shibo Wang ◽  
Julong Wei ◽  
Le Zhang ◽  
...  

Abstract: The large-scale multidimensional omics data in the Genomic Data Commons (GDC) provide opportunities to investigate the crosstalk among different RNA species and their regulatory mechanisms in cancers. Easy-to-use bioinformatics pipelines are needed to facilitate such studies. We have developed a user-friendly R/Bioconductor package, named GDCRNATools, to facilitate downloading, organizing, and analyzing RNA data in GDC, with an emphasis on deciphering the lncRNA-mRNA related competing endogenous RNA (ceRNA) regulatory network in cancers. Many widely used bioinformatics tools and databases are utilized in our package. Users can easily pack preferred downstream analysis pipelines or integrate their own pipelines into the workflow. Interactive shiny web apps built into GDCRNATools greatly improve visualization of results from the analysis. Availability: GDCRNATools is an R/Bioconductor package that is freely available at https://github.com/Jialab-UCR/GDCRNATools

2020 ◽  
Vol 19 ◽  
pp. 153303382090911
Author(s):  
Qi-en He ◽  
Yi-fan Tong ◽  
Zhou Ye ◽  
Li-xia Gao ◽  
Yi-zhi Zhang ◽  
...  

Radiotherapy is one of the most important cancer treatments, but its response varies greatly among individual patients. Therefore, the prediction of radiosensitivity, identification of potential signature genes, and inference of their regulatory networks are important for clinical and oncological reasons. Here, we propose a novel multiple genomic fused partial least squares deep regression method to simultaneously analyze multi-genomic data. Using 60 National Cancer Institute cell lines as examples, we aimed to identify signature genes by optimizing the radiosensitivity prediction model and uncovering regulatory relationships. A total of 113 signature genes were selected from more than 20,000 genes. The root mean square error of the model was only 0.0025, which was much lower than previously published results, suggesting that our method predicts radiosensitivity more accurately than earlier approaches. Additionally, our regulatory network analysis identified 24 highly important ‘hub’ genes. The data analysis workflow we propose provides a unified computational framework to harness the full potential of large-scale integrated cancer genomic data for integrative signature discovery. Furthermore, the regression model, signature genes, and their regulatory network should provide a reliable quantitative reference for optimizing personalized treatment options, and may aid our understanding of cancer progression mechanisms.


Horticulturae ◽  
2021 ◽  
Vol 8 (1) ◽  
pp. 41
Author(s):  
Yuhan Zhou ◽  
Yushan Qiao ◽  
Zhiyou Ni ◽  
Jianke Du ◽  
Jinsong Xiong ◽  
...  

Strawberry species (Fragaria spp.) are known as the “queen of fruits” and are cultivated around the world. Over the past few years, eight strawberry genome sequences have been released. Reusing this large amount of genomic data and performing larger-scale comparative analyses are very challenging for both plant biologists and strawberry breeders. To promote the reuse and exploration of strawberry genomic data and enable extensive analyses using various bioinformatics tools, we have developed the Genome Database for Strawberry (GDS). This platform integrates the collection, storage, integration, analysis, and dissemination of large amounts of genomic data for researchers engaged in the study of strawberry. We collected and formatted the eight published strawberry genomes. We constructed the GDS based on Linux, Apache, PHP and MySQL, and integrated various bioinformatics software tools. The GDS contains data from eight strawberry species, as well as multiple tools such as BLAST, JBrowse, synteny analysis, and gene search. It provides an interface and user-friendly tools that perform a variety of query tasks with a few simple operations. In the future, we hope that the GDS will serve as a community resource for the study of strawberries.


F1000Research ◽  
2019 ◽  
Vol 8 ◽  
pp. 752
Author(s):  
Shian Su ◽  
Vincent J. Carey ◽  
Lori Shepherd ◽  
Matthew Ritchie ◽  
Martin T. Morgan ◽  
...  

Motivation: The Bioconductor project, a large collection of open source software for the comprehension of large-scale biological data, continues to grow with new packages added each week, motivating the development of software tools focused on exposing package metadata to developers and users. The resulting BiocPkgTools package facilitates access to extensive metadata in computable form covering the Bioconductor package ecosystem, facilitating downstream applications such as custom reporting, data and text mining of Bioconductor package text descriptions, graph analytics over package dependencies, and custom search approaches. Results: The BiocPkgTools package has been incorporated into the Bioconductor project, installs using standard procedures, and runs on any system supporting R. It provides functions to load detailed package metadata, longitudinal package download statistics, package dependencies, and Bioconductor build reports, all in "tidy data" form. BiocPkgTools can convert from tidy data structures to graph structures, enabling graph-based analytics and visualization. An end-user-friendly graphical package explorer aids in task-centric package discovery. Full documentation and example use cases are included. Availability: The BiocPkgTools software and complete documentation are available from Bioconductor (https://bioconductor.org/packages/BiocPkgTools).


2019 ◽  
Author(s):  
Melissa Y Yan ◽  
Betsy Ferguson ◽  
Benjamin N Bimber

Abstract Summary: Large-scale genomic studies produce millions of sequence variants, generating datasets far too massive for manual inspection. To ensure variant and genotype data are consistent and accurate, it is necessary to evaluate variants prior to downstream analysis using quality control (QC) reports. Variant call format (VCF) files are the standard format for representing variant data; however, generating summary statistics from these files is not always straightforward. While tools to summarize variant data exist, they generally produce simple text file tables, which still require additional processing and interpretation. VariantQC fills this gap as a user-friendly, interactive visual QC report that generates and concisely summarizes statistics from VCF files. The report aggregates and summarizes variants by dataset, chromosome, sample and filter type. The VariantQC report is useful for high-level dataset summaries and quality control, and it helps flag outliers. Furthermore, VariantQC operates on VCF files, so it can be easily integrated into many existing variant pipelines. Availability and implementation: DISCVRSeq's VariantQC tool is freely available as a Java program, with the compiled JAR and source code available from https://github.com/BimberLab/DISCVRSeq/. Documentation and example reports are available at https://bimberlab.github.io/DISCVRSeq/.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Weize Xu ◽  
Quan Zhong ◽  
Da Lin ◽  
Ya Zuo ◽  
Jinxia Dai ◽  
...  

Abstract Background: Data visualization, especially genome track plots, is crucial for genomics researchers to discover patterns in large-scale sequencing datasets. Although existing tools work well for producing a standard view of the input data, they are not convenient when users want to create customized data representations. This gap between visualization and data processing prevents users from uncovering more of the hidden structure of a dataset. Results: We developed CoolBox, an open-source toolkit for visual analysis of genomics data. This user-friendly toolkit is highly compatible with the Python ecosystem and customizable with a well-designed user interface. It can be used in various visualization situations like a Swiss army knife. For example, it can produce high-quality genome track plots or fetch commonly used genomic data files with a Python script or the command line, and it can be used to explore genomic data interactively within a Jupyter environment or web browser. Moreover, owing to the highly extensible Application Programming Interface design, users can customize their own tracks without difficulty, which greatly facilitates analytical, comparative genomic data visualization tasks. Conclusions: CoolBox allows users to produce high-quality visualization plots and explore their data in a flexible, programmable and user-friendly way.


2021 ◽  
Author(s):  
Alex M Mawla ◽  
Mark O Huising

epiRomics is an R package designed to integrate multi-omics data in order to identify and visualize enhancer regions alongside gene expression and other epigenomic modifications. Regulatory network analysis can be done using combinatory approaches to infer regions of significance such as enhancers, when combining ChIP and histone data. Downstream analysis can identify co-occurrence of these regions of interest with other user-supplied data, such as chromatin availability or gene expression. Finally, this package allows for results to be visualized at high resolution in a stand-alone browser.


Author(s):  
Dehe Wang ◽  
Weiliang Fan ◽  
Xiaolong Guo ◽  
Kai Wu ◽  
Siyu Zhou ◽  
...  

Abstract: Malvaceae is a family of flowering plants containing many economically important plant species including cotton, cacao and durian. Recently, the genomes of several Malvaceae species have been decoded, and a large amount of omics data has been generated for individual species. However, no integrative database of multiple species, enabling users to jointly compare and analyse relevant data, is available for Malvaceae. Thus, we developed a user-friendly database named MaGenDB (http://magen.whu.edu.cn) as a functional genomics hub for the plant community. We collected the genomes of 13 Malvaceae species, and comprehensively annotated genes from different perspectives including functional RNA/protein elements, gene ontology, KEGG orthology, and gene family. We processed 374 sets of diverse omics data with the ENCODE pipelines and integrated them into a customised genome browser, and designed multiple dynamic charts to present gene/RNA/protein-level knowledge such as dynamic expression profiles and functional elements. We also implemented a smart search system for efficiently mining genes. In addition, we constructed a functional comparison system to support comparative analysis of genes on multiple features within one species or across closely related species. This database and associated tools will allow users to quickly retrieve large-scale functional information for biological discovery.


2018 ◽  
Author(s):  
Mohamed Mounir ◽  
Tiago C. Silva ◽  
Marta Lucchetta ◽  
Catharina Olsen ◽  
Gianluca Bontempi ◽  
...  

Abstract: The advent of Next Generation Sequencing (NGS) technologies has opened new perspectives in deciphering the genetic mechanisms underlying complex diseases. Nowadays, the amount of genomic data is massive, and substantial efforts and new tools are required to unveil the information hidden in the data. The Genomic Data Commons (GDC) Data Portal is a large data collection platform that includes different genomic studies, including those from The Cancer Genome Atlas (TCGA) and the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) initiatives, accounting for more than 40 tumor types originating from nearly 30,000 patients. Such platforms, although very attractive, must make sure the stored data are easily accessible and adequately harmonized. Moreover, they focus primarily on storing the data in a single place, and they do not provide a comprehensive toolkit for analysis and interpretation of the data. Addressing this need requires comprehensive but easily accessible computational methods for integrative analysis of genomic data that retain a robust statistical and theoretical framework. In this context, the R/Bioconductor package TCGAbiolinks was developed, offering a variety of bioinformatics functionalities. Here we introduce new features and enhancements of TCGAbiolinks in terms of i) more accurate and flexible pipelines for differential expression analyses, ii) different methods for tumor purity estimation and filtering, iii) integration of normal samples from the Genotype-Tissue Expression (GTEx) platform, and iv) support for other genomics datasets, here exemplified by the TARGET data. Evidence has shown that accounting for tumor purity is essential in the study of tumorigenesis, as purity confounds differential expression analysis. Hence, we implemented these filtering procedures in TCGAbiolinks. Moreover, a limitation of some of the TCGA datasets is the unavailability or paucity of corresponding normal samples. We thus integrated into TCGAbiolinks the possibility to use normal samples from the Genotype-Tissue Expression (GTEx) project, which is another large-scale repository cataloging gene expression from healthy individuals. The new functionalities are available in TCGAbiolinks v2.8 and higher, released in Bioconductor version 3.7.


PeerJ ◽  
2018 ◽  
Vol 6 ◽  
pp. e4234 ◽  
Author(s):  
T. Jeffrey Cole ◽  
Michael S. Brewer

Background: The recent proliferation of large amounts of biodiversity transcriptomic data has resulted in an ever-expanding need for scalable and user-friendly tools capable of answering large-scale molecular evolution questions. FUSTr identifies gene families involved in the process of adaptation. The tool finds genes under strong positive selection in transcriptomic datasets and automatically detects isoform designation patterns in transcriptome assemblies to maximize phylogenetic independence in downstream analysis. Results: When applied to previously studied spider transcriptomic data as well as simulated data, FUSTr successfully grouped coding sequences into proper gene families and correctly identified those under strong positive selection in relatively little time. Conclusions: FUSTr provides a useful tool for novice bioinformaticians to characterize the molecular evolution of organisms throughout the tree of life using large transcriptomic biodiversity datasets, and it can utilize multi-processor high-performance computational facilities.


2019 ◽  
Author(s):  
Shian Su ◽  
Vincent J. Carey ◽  
Lori Shepherd ◽  
Matthew Ritchie ◽  
Martin T. Morgan ◽  
...  

Abstract Motivation: The Bioconductor project, a large collection of open source software for the comprehension of large-scale biological data, continues to grow with new packages added each week, motivating the development of software tools focused on exposing package metadata to developers and users. The resulting BiocPkgTools package facilitates access to extensive metadata in computable form covering the Bioconductor package ecosystem, facilitating downstream applications such as custom reporting, data and text mining of Bioconductor package text descriptions, graph analytics over package dependencies, and custom search approaches. Results: The BiocPkgTools package has been incorporated into the Bioconductor project, installs using standard procedures, and runs on any system supporting R. It provides functions to load detailed package metadata, longitudinal package download statistics, package dependencies, and Bioconductor build reports, all in “tidy data” form. BiocPkgTools can convert from tidy data structures to graph structures, enabling graph-based analytics and visualization. An end-user-friendly graphical package explorer aids in task-centric package discovery. Full documentation and example use cases are included. Availability: The BiocPkgTools software and complete documentation are available from Bioconductor (https://bioconductor.org/packages/BiocPkgTools).

