NASA GeneLab RNA-Seq Consensus Pipeline: Standardized Processing of Short-Read RNA-Seq Data

2020 ◽  
Author(s):  
Eliah G. Overbey ◽  
Amanda M. Saravia-Butler ◽  
Zhe Zhang ◽  
Komal S. Rathi ◽  
Homer Fogle ◽  
...  

Summary With the development of transcriptomic technologies, we are able to quantify precise changes in gene expression profiles from astronauts and other organisms exposed to spaceflight. Members of NASA GeneLab and GeneLab-associated analysis working groups (AWGs) have developed a consensus pipeline for analyzing short-read RNA-sequencing data from spaceflight-associated experiments. The pipeline includes quality control, read trimming, mapping, and gene quantification steps, culminating in the detection of differentially expressed genes. This data analysis pipeline and the results of its execution using data submitted to GeneLab are now all publicly available through the GeneLab database. We present here the full details and rationale for the construction of this pipeline in order to promote transparency, reproducibility and reusability of pipeline data, to provide a template for data processing of future spaceflight-relevant datasets, and to encourage cross-analysis of data from other databases with the data available in GeneLab.

2020 ◽  
Author(s):  
Marmar Moussa ◽  
Ion I. Măndoiu

Abstract The variation in gene expression profiles of cells captured in different phases of the cell cycle can interfere with cell type identification and functional analysis of single cell RNA-Seq (scRNA-Seq) data. In this paper, we introduce SC1CC (SC1 Cell Cycle analysis tool), a computational approach for clustering and ordering single cell transcriptional profiles according to their progression along cell cycle phases. We also introduce a new robust metric, Gene Smoothness Score (GSS), for assessing the cell cycle-based order of the cells. SC1CC is available as part of the SC1 web-based scRNA-Seq analysis pipeline, publicly accessible at https://sc1.engr.uconn.edu/.


2019 ◽  
Author(s):  
Morten Seirup ◽  
Li-Fang Chu ◽  
Srikumar Sengupta ◽  
Ning Leng ◽  
Hadley Browder ◽  
...  

Abstract As newer single-cell protocols generate ever more cells at reduced sequencing depths, the value of a higher read depth may be overlooked. Using data from three different single-cell RNA-seq protocols that lend themselves to having either higher read depth (Smart-seq) or many cells (MARS-seq and 10X), we evaluate their ability to recapitulate biological signals in the context of pseudo-spatial reconstruction. Overall, we find that gene expression profiles after spatial-reconstruction analysis are highly reproducible between datasets despite being generated by different protocols and using different computational algorithms. While UMI-based protocols such as 10X and MARS-seq allow for capturing more cells, Smart-seq’s higher sensitivity and read depth allow for analysis of lower expressed genes and isoforms. Additionally, we evaluate trade-offs for each protocol by performing subsampling analyses, and find that optimizing the balance between sequencing depth and number of cells within a protocol is important for efficient use of resources. Our analysis emphasizes the importance of selecting a protocol based on the biological questions and features of interest.


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Yanan Ren ◽  
Ting-You Wang ◽  
Leah C. Anderton ◽  
Qi Cao ◽  
Rendong Yang

Abstract Background: Long non-coding RNAs (lncRNAs) are a growing focus in cancer research. Deciphering pathways influenced by lncRNAs is important to understand their role in cancer. Although knock-down or overexpression of lncRNAs followed by gene expression profiling in cancer cell lines are established approaches to address this problem, these experimental data are not available for a majority of the annotated lncRNAs. Results: As a surrogate, we present lncGSEA, a convenient tool to predict the lncRNA-associated pathways through Gene Set Enrichment Analysis of gene expression profiles from large-scale cancer patient samples. We demonstrate that lncGSEA is able to recapitulate lncRNA-associated pathways supported by literature and experimental validations in multiple cancer types. Conclusions: LncGSEA allows researchers to infer lncRNA regulatory pathways directly from clinical samples in oncology. LncGSEA is written in R, and is freely accessible at https://github.com/ylab-hi/lncGSEA.


2021 ◽  
Vol 3 (2) ◽  
Author(s):  
Xueyi Dong ◽  
Luyi Tian ◽  
Quentin Gouil ◽  
Hasaru Kariyawasam ◽  
Shian Su ◽  
...  

Abstract Application of Oxford Nanopore Technologies’ long-read sequencing platform to transcriptomic analysis is increasing in popularity. However, such analysis can be challenging due to the high sequence error and small library sizes, which decrease quantification accuracy and reduce power for statistical testing. Here, we report the analysis of two nanopore RNA-seq datasets with the goal of obtaining gene- and isoform-level differential expression information. A dataset of synthetic, spliced, spike-in RNAs (‘sequins’) as well as a mouse neural stem cell dataset from samples with a null mutation of the epigenetic regulator Smchd1 were analysed using a mix of long-read specific tools for preprocessing together with established short-read RNA-seq methods for downstream analysis. We used limma-voom to perform differential gene expression analysis, and the novel FLAMES pipeline to perform isoform identification and quantification, followed by DRIMSeq and limma-diffSplice (with stageR) to perform differential transcript usage analysis. We compared results from the sequins dataset to the ground truth, and results of the mouse dataset to a previous short-read study on equivalent samples. Overall, our work shows that transcriptomic analysis of long-read nanopore data using long-read specific preprocessing methods together with short-read differential expression methods and software that are already in wide use can yield meaningful results.


2021 ◽  
Author(s):  
Taguchi Y-h. ◽  
Turki Turki

Abstract The integrated analysis of multiple gene expression profiles measured in distinct studies is always problematic. In particular, the lack of sample matching and of common labeling between distinct studies prevents the integration of multiple studies in a fully data-driven and unsupervised manner. In this study, we propose a strategy enabling the integration of multiple gene expression profiles among multiple independent studies without either labeling or sample matching, using tensor decomposition-based unsupervised feature extraction. As an example, we applied this strategy to Alzheimer’s disease (AD)-related gene expression profiles that lack exact correspondence among samples as well as AD single-cell RNA-seq (scRNA-seq) data. We found that we could select biologically reasonable genes with integrated analysis. Overall, integrated gene expression profiles can function analogously to prior learning and/or transfer learning strategies in other machine learning applications. For scRNA-seq, the proposed approach was able to drastically reduce the required computational memory.


2017 ◽  
Author(s):  
Irina Mohorianu

Abstract Background: RNA sequencing (RNA-seq) is widely used for RNA quantification across environmental, biological and medical sciences; it enables the description of genome-wide patterns of expression and the deduction of regulatory interactions and networks. The aim of computational analyses is to achieve an accurate output, i.e. rigorous quantification of genes/transcripts to allow a reliable prediction of differential expression (DE), despite the variable levels of noise and biases present in sequencing data. The evaluation of sequencing quality and normalization are essential components of this process. Results: We investigate the discriminative power of existing approaches for the quality checking of mRNA-seq data and also propose additional, quantitative, quality checks. To accommodate the analysis of a nested, multi-level design using data on D. melanogaster, we incorporated the sample layout into the analysis. We describe a “subsampling without replacement”-based normalization and identification of DE that accounts for the experimental design, i.e. the hierarchy and amplitude of effect sizes within samples. We also evaluate the differential expression call in comparison to existing approaches. To assess the broader applicability of these methods, we applied this series of steps to a published set of H. sapiens mRNA-seq samples. Conclusions: The dataset-tailored methods improved sample comparability and delivered a robust prediction of subtle gene expression changes. Overall, the proposed approach offers the potential to improve key steps in the analysis of RNA-seq data by incorporating the structure and characteristics of biological experiments into the data analysis.
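
The idea of normalizing by subsampling without replacement, mentioned in the abstract above, can be illustrated with a minimal Python sketch. This is not the author's implementation: the function name `subsample_counts`, the gene names, and the fixed seed are hypothetical, and the sketch only shows the generic operation of drawing a fixed number of reads from a sample's gene counts so that samples with different sequencing depths end up with the same total.

```python
import random
from collections import Counter


def subsample_counts(counts, target_total, seed=0):
    """Draw target_total reads without replacement from per-gene read counts.

    counts: dict mapping gene name -> integer read count (hypothetical input).
    Returns a dict with the same genes and the subsampled counts.
    """
    total = sum(counts.values())
    if target_total > total:
        raise ValueError("cannot draw more reads than the sample contains")

    rng = random.Random(seed)
    # Expand the counts into one label per read, then sample reads directly.
    pool = [gene for gene, n in counts.items() for _ in range(n)]
    drawn = Counter(rng.sample(pool, target_total))
    # Keep every gene in the output, including those that received no reads.
    return {gene: drawn.get(gene, 0) for gene in counts}


# Illustrative data only: bring a 100-read sample down to 40 reads.
example = {"geneA": 50, "geneB": 30, "geneC": 20}
downsampled = subsample_counts(example, 40)
print(downsampled)
```

Because reads are drawn without replacement, no gene can end up with more reads than it started with, and drawing the full total returns the original counts unchanged. Expanding counts into a per-read list is simple but memory-hungry; for realistic library sizes a multivariate hypergeometric draw would be the more practical choice.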


2019 ◽  
Author(s):  
Christina Huan Shi ◽  
Kevin Y. Yip

Abstract K-mer counting has many applications in sequencing data processing and analysis. However, sequencing errors can produce many false k-mers that substantially increase the memory requirement during counting. We propose a fast k-mer counting method, CQF-deNoise, which has a novel component for dynamically identifying and removing false k-mers while preserving counting accuracy. Compared with four state-of-the-art k-mer counting methods, CQF-deNoise consumed 49-76% less memory than the second best method, but still ran competitively fast. The k-mer counts from CQF-deNoise produced cell clusters from single-cell RNA-seq data highly consistent with CellRanger but required only 5% of the running time at the same memory consumption, suggesting that CQF-deNoise can be used for a preview of cell clusters for an early detection of potential data problems, before running a much more time-consuming full analysis pipeline.
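
To make the problem in the abstract above concrete, here is a minimal Python sketch of plain k-mer counting followed by a simple low-count filter. It is not CQF-deNoise, which the abstract describes as removing false k-mers dynamically during counting; this sketch uses an ordinary hash-based counter and filters only afterwards. The function names, the `min_count` threshold, and the toy reads are all assumptions made for illustration.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring (k-mer) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def drop_rare_kmers(counts, min_count=2):
    """Keep only k-mers seen at least min_count times.

    Rare k-mers are treated here as likely products of sequencing errors;
    the threshold is an illustrative assumption, not a recommended value.
    """
    return {kmer: n for kmer, n in counts.items() if n >= min_count}


# Toy reads: the third read differs from the first two by one base,
# which introduces k-mers that appear only once.
reads = ["ACGTAC", "ACGTAC", "ACGTTC"]
counts = count_kmers(reads, k=4)
filtered = drop_rare_kmers(counts, min_count=2)
print(len(counts), len(filtered))
```

In this toy input a single mismatched base adds two singleton k-mers to the table, which shows how errors inflate the number of distinct k-mers that must be held in memory before any filtering can happen.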


2022 ◽  
Author(s):  
Andreas B Diendorfer ◽  
Kseniya Khamina ◽  
Marianne Pultar

miND is an NGS data analysis pipeline for small RNA sequencing data. In this protocol, the pipeline is set up and run on an AWS EC2 instance with example data from a public repository. Please see the publication on F1000 for more details on the pipeline and how to use it.


Author(s):  
Haowei Zhang ◽  
Yujin Ding ◽  
Qin Zeng ◽  
Dandan Wang ◽  
Ganglei Liu ◽  
...  

Background: Mesenteric adipose tissue (MAT) plays a critical role in the intestinal physiological ecosystems. The small and large intestines evidently have intrinsic and distinct characteristics. However, whether the mesenteric adipose tissues adjacent to the small and large intestines (SMAT and LMAT) differ has not been properly characterized. We studied the important facets of these differences, such as morphology, gene expression, cell components and immune regulation of MATs, to characterize the mesenteric differences. Methods: The SMAT and LMAT of mice were utilized for comparison of tissue morphology. Paired mesenteric samples were analyzed by RNA-seq to clarify gene expression profiles. MAT partial excision models were constructed to illustrate the immune regulation roles of MATs, and 16S-seq was applied to detect the subsequent effect on microbiota. Results: Our data show that different segments of mesenteries have different morphological structures. SMAT not only has smaller adipocytes but also contains more fat-associated lymphoid clusters than LMAT. The gene expression profile also differs between these two MATs in mice. B-cell markers were abundantly expressed in SMAT, while development-related genes were highly expressed in LMAT. Adipose-derived stem cells of LMAT exhibited higher adipogenic potential and lower proliferation rates than those of SMAT. In addition, SMAT and LMAT play different roles in immune regulation and subsequently affect microbiota components. Finally, our data clarified the described differences between SMAT and LMAT in humans. Conclusions: There were significant differences in cell morphology, gene expression profiles, cell components, biological characteristics, and immune and microbiota regulation roles between regional MATs.

