Blind exploration of the unreferenced transcriptome reveals novel RNAs for prostate cancer diagnosis

Abstract: The broad use of RNA-sequencing technologies held the promise of improved diagnostic tools based on comprehensive transcript sets. However, mining human transcriptome data for disease biomarkers in clinical specimens is restricted by the limited power of conventional reference-based protocols, which rely on uniquely mapped reads and transcript annotations. Here, we implemented a blind reference-free computational protocol, DE-kupl, to directly infer RNA variations of any origin, including as yet unreferenced RNAs, from high-coverage total stranded RNA-sequencing datasets of tissue origin. As a bench test, this protocol was powered for detection of RNA subsequences embedded in unannotated putative long noncoding (lnc)RNAs expressed in prostate cancer tissues. Through filtering and visual inspection of 1,179 candidates, we defined 21 lncRNA probes that were further validated for robust tumor-specific expression by NanoString single molecule-based RNA measurements in 144 tissue specimens. Predictive modeling yielded a restricted probe panel that detected more than 90% of true-positive cancer cases in an independent dataset from The Cancer Genome Atlas. Remarkably, this clinical signature, made of only 9 unannotated lncRNAs, largely outperformed PCA3, the only RNA biomarker approved by the Food and Drug Administration, specifically in the detection of high-risk prostate tumors. The proposed reference-free computational workflow is modular, highly sensitive and robust, and can be applied to any pathology and any clinical application.

Reference-free transcriptome exploration reveals novel RNAs for prostate cancer diagnosis

Life Science Alliance ◽

10.26508/lsa.201900449 ◽

2019 ◽

Vol 2 (6) ◽

pp. e201900449 ◽

Cited By ~ 2

Author(s):

Marina Pinskaya ◽

Zohra Saci ◽

Mélina Gallopin ◽

Marc Gabriel ◽

Ha TN Nguyen ◽

...

Keyword(s):

Prostate Cancer ◽

RNA Sequencing ◽

The Cancer Genome Atlas ◽

Diagnostic Tools ◽

Bench Test ◽

Specific Expression ◽

Sequencing Technologies ◽

Cancer Genome Atlas ◽

Limited Power ◽

Tissue Specimens

The use of RNA-sequencing technologies held the promise of improved diagnostic tools based on comprehensive transcript sets. However, mining human transcriptome data for disease biomarkers in clinical specimens is restricted by the limited power of conventional reference-based protocols, which rely on unique and annotated transcripts. Here, we implemented a blind reference-free computational protocol, DE-kupl, to infer as yet unreferenced RNA variations from total stranded RNA-sequencing datasets of tissue origin. As a bench test, this protocol was powered for detection of RNA subsequences embedded in putative long noncoding (lnc)RNAs expressed in prostate cancer. Through filtering of 1,179 candidates, we defined 21 lncRNAs that were further validated by NanoString for robust tumor-specific expression in 144 tissue specimens. Predictive modeling yielded a restricted probe panel enabling more than 90% of true-positive detections of cancer in an independent cohort from The Cancer Genome Atlas. Remarkably, this clinical signature, made of only nine unannotated lncRNAs, largely outperformed PCA3, the only prostate cancer lncRNA biomarker in use, in the detection of high-risk tumors. This modular workflow is highly sensitive and can be applied to any pathology or clinical application.
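To make the reference-free idea concrete, the sketch below counts fixed-length subsequences (k-mers) directly in reads from two conditions and keeps those enriched in one of them, with no genome or annotation involved. It is a toy illustration only, not the DE-kupl implementation: the function names, the pseudocount ratio, and the thresholds `min_count` and `min_ratio` are assumptions made for this example, and a real workflow would apply proper statistical testing across replicates.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring across a list of read sequences."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def enriched_kmers(tumor_reads, normal_reads, k=5, min_count=2, min_ratio=2.0):
    """Return k-mers over-represented in tumor reads (hypothetical thresholds).

    The ratio uses a pseudocount of 1 so that k-mers absent from the
    normal reads do not cause a division by zero.
    """
    tumor = count_kmers(tumor_reads, k)
    normal = count_kmers(normal_reads, k)
    hits = {}
    for kmer, t in tumor.items():
        n = normal.get(kmer, 0)
        ratio = (t + 1) / (n + 1)
        if t >= min_count and ratio >= min_ratio:
            hits[kmer] = ratio
    return hits


if __name__ == "__main__":
    # Made-up reads, purely for illustration.
    tumor = ["ACGTACGT", "ACGTACGT"]
    normal = ["TTTTTTTT"]
    print(enriched_kmers(tumor, normal, k=4))
```

Because the comparison is made on raw subsequences, a k-mer from an unannotated transcript is treated exactly like one from a known gene, which is the property the abstract relies on.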

Attomole-level Genomics with Single-molecule Direct DNA, cDNA and RNA Sequencing Technologies

Current Issues in Molecular Biology ◽

10.21775/cimb.018.043 ◽

2016 ◽

Keyword(s):

RNA Sequencing ◽

Single Molecule ◽

Sequencing Technologies

A comparison between single cell RNA sequencing and single molecule RNA FISH for rare cell analysis

10.1101/138289 ◽

2017 ◽

Cited By ~ 6

Author(s):

Eduardo Torre ◽

Hannah Dueck ◽

Sydney Shaffer ◽

Janko Gospocic ◽

Rohit Gupte ◽

...

Keyword(s):

Single Cell ◽

RNA Sequencing ◽

Single Molecule ◽

Single Cells ◽

Cycle Phase ◽

Expression Variability ◽

Sequencing Technologies ◽

RNA FISH ◽

Single Cell RNA Sequencing ◽

Sequencing Platforms

Abstract: Single cell RNA sequencing has emerged as a powerful means of profiling the transcriptional behavior of single cells, leveraging the breadth of sequencing measurements to make inferences about cell type. However, there is still little understanding of how well these methods perform at measuring single cell variability for small sets of genes and what “transcriptome coverage” (e.g. genes detected per cell) is needed for accurate measurements. Here, we use single molecule RNA FISH measurements of 26 genes in thousands of melanoma cells to provide an independent reference dataset to assess the performance of the DropSeq and Fluidigm single cell RNA sequencing platforms. We quantify the Gini coefficient, a measure of rare-cell expression variability, and find that the correspondence between RNA FISH and single cell RNA sequencing for Gini, unlike for mean, increases markedly with per-cell library complexity up to a threshold of ∼2000 genes detected. A similar complexity threshold also allows for robust assignment of multi-genic cell states such as cell cycle phase. Our results provide guidelines for selecting sequencing depth and complexity thresholds for single cell RNA sequencing. More generally, our results suggest that if the number of genes whose expression levels are required to answer a given biological question is small, then greater transcriptome complexity per cell is likely more important than obtaining very large numbers of cells.
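The Gini coefficient mentioned above can be computed from per-cell counts of a single gene. The sketch below uses the standard rank-based formula for sorted values; it is a minimal illustration with made-up counts, and the function name and the convention of returning 0 for empty or all-zero input are assumptions of this example rather than details taken from the study.

```python
def gini(counts):
    """Gini coefficient of non-negative per-cell counts for one gene.

    Uses G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n with x sorted
    in ascending order and ranks i starting at 1. A value of 0 means all
    cells express the gene equally; values near 1 mean expression is
    concentrated in a few rare cells.
    """
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0  # assumed convention for degenerate input
    weighted = sum(rank * v for rank, v in enumerate(values, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


if __name__ == "__main__":
    print(gini([5, 5, 5, 5]))    # uniform expression across cells
    print(gini([0, 0, 0, 10]))   # one rare high-expressing cell
```

With only four cells the maximum attainable value is 0.75; the upper bound (n - 1) / n approaches 1 as the number of cells grows.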

Trans-NanoSim characterizes and simulates nanopore RNA-sequencing data

GigaScience ◽

10.1093/gigascience/giaa061 ◽

2020 ◽

Vol 9 (6) ◽

Cited By ~ 1

Author(s):

Saber Hafezqorani ◽

Chen Yang ◽

Theodora Lo ◽

Ka Ming Nip ◽

René L Warren ◽

...

Keyword(s):

RNA Sequencing ◽

Single Molecule ◽

Rapid Development ◽

Cost Effective ◽

Third Generation ◽

Sequencing Data ◽

Complementary DNA ◽

Sequencing Technologies ◽

Analytical Tools ◽

Generation Sequencing

Background: Compared with second-generation sequencing technologies, third-generation single-molecule RNA sequencing has unprecedented advantages; the long reads it generates facilitate isoform-level transcript characterization. In particular, the Oxford Nanopore Technologies sequencing platforms have become more popular in recent years owing to their relatively high affordability and portability compared with other third-generation sequencing technologies. To aid the development of analytical tools that leverage the power of this technology, simulated data provide a cost-effective solution with ground truth. However, a nanopore sequence simulator targeting transcriptomic data is not yet available. Findings: We introduce Trans-NanoSim, a tool that simulates reads with technical and transcriptome-specific features learnt from nanopore RNA-sequencing data. We comprehensively benchmarked Trans-NanoSim on direct RNA and complementary DNA datasets describing human and mouse transcriptomes. Through comparison against other nanopore read simulators, we show the unique advantage and robustness of Trans-NanoSim in capturing the characteristics of nanopore complementary DNA and direct RNA reads. Conclusions: As a cost-effective alternative to sequencing real transcriptomes, Trans-NanoSim will facilitate the rapid development of analytical tools for nanopore RNA-sequencing data. Trans-NanoSim and its pre-trained models are freely accessible at https://github.com/bcgsc/NanoSim.

Deficiency of NEIL3 Enhances the Chemotherapy Resistance of Prostate Cancer

International Journal of Molecular Sciences ◽

10.3390/ijms22084098 ◽

2021 ◽

Vol 22 (8) ◽

pp. 4098

Author(s):

Yiwei Wang ◽

Liuyue Xu ◽

Shanshan Shi ◽

Sha Wu ◽

Ruijie Meng ◽

...

Keyword(s):

Prostate Cancer ◽

Cell Cycle ◽

Flow Cytometry ◽

RNA Sequencing ◽

Western Blotting ◽

Target Genes ◽

Chemotherapy Resistance ◽

The Cancer Genome Atlas ◽

Serine Threonine Kinase ◽

Threonine Kinase

Acquired treatment resistance is an important cause of death in prostate cancer, and this study aimed to explore the mechanisms of chemotherapy resistance in prostate cancer. We employed castration-resistant prostate cancer (CRPC), neuroendocrine prostate cancer (NEPC), and chemotherapy-resistant prostate cancer datasets to screen for potential target genes. The Cancer Genome Atlas (TCGA) was used to detect the correlation between the target genes and prognosis and clinical characteristics. Nei endonuclease VIII-like 3 (NEIL3) knockdown cell lines were constructed with RNA interference. Prostate cancer cells were treated with enzalutamide for the androgen deprivation therapy (ADT) model, and with docetaxel and cisplatin for the chemotherapy model. Apoptosis and the cell cycle were examined using flow cytometry. RNA sequencing and western blotting were performed in the knockdown Duke University 145 (DU145) cell line to explore the possible mechanisms. The TCGA dataset demonstrated that high NEIL3 was associated with a high T stage and Gleason score, and indicated a possibility of lymph node metastasis, but a good prognosis. The cell therapy models showed that the loss of NEIL3 could promote the chemotherapy resistance (but not ADT resistance) of prostate cancer (PCa). Flow cytometry revealed that the loss of NEIL3 in PCa could inhibit cell apoptosis and cell cycle arrest under cisplatin treatment. RNA sequencing showed that the knockdown of NEIL3 changes the expression of neuroendocrine-related genes. Further western blotting revealed that the loss of NEIL3 could significantly promote the phosphorylation of ATR serine/threonine kinase (ATR) and ATM serine/threonine kinase (ATM) under chemotherapy, thus initiating downstream pathways related to DNA repair. In summary, the loss of NEIL3 promotes chemotherapy resistance in prostate cancer, and NEIL3 may serve as a diagnostic marker for chemotherapy-resistant patients.

Comparative Transcriptome Profiling of Disruptive Technology, Single-Molecule Direct RNA Sequencing

Current Bioinformatics ◽

10.2174/1574893614666191017154427 ◽

2020 ◽

Vol 15 (2) ◽

pp. 165-172

Author(s):

Chaithra Pradeep ◽

Dharam Nandan ◽

Arya A. Das ◽

Dinesh Velayutham

Keyword(s):

RNA Sequencing ◽

Single Molecule ◽

Transcriptome Assembly ◽

Transcriptome Profiling ◽

Read Length ◽

Complex Nature ◽

Disruptive Technology ◽

Sequencing Technology ◽

Sequencing Technologies ◽

Long Read

Background: The standard approach for transcriptomic profiling involves high-throughput short-read sequencing technology, mainly dominated by Illumina. However, short reads have limitations in transcriptome assembly and in obtaining full-length transcripts, owing to the complex nature of transcriptomes with variable lengths and multiple alternatively spliced isoforms. Recent advances in long-read sequencing by Oxford Nanopore Technologies (ONT) offer both cDNA and direct RNA sequencing and have brought a paradigm change in sequencing technology, greatly improving assembly and expression estimates. ONT enables molecules to be sequenced without fragmentation, resulting in ultra-long reads that allow entire genes and transcripts to be fully characterized. The direct RNA sequencing method, in addition, circumvents the reverse transcription and amplification steps. Objective: In this study, RNA sequencing methods were assessed by comparing data from Illumina (ILM), ONT cDNA (OCD) and ONT direct RNA (ODR). Methods: The sensitivity and specificity of isoform detection were determined from the data generated by Illumina, ONT cDNA and ONT direct RNA sequencing technologies using Saccharomyces cerevisiae as a model. Comparative studies were conducted with two pipelines to detect isoforms, novel genes and variable gene length. Results: Mapping metrics and qualitative profiles for the different pipelines are presented to understand these disruptive technologies. The variability in sequencing technology and in the analysis pipeline was studied.

Identification of miRNA-Mediated Subpathways as Prostate Cancer Biomarkers Based on Topological Inference in a Machine Learning Process Using Integrated Gene and miRNA Expression Data

Frontiers in Genetics ◽

10.3389/fgene.2021.656526 ◽

2021 ◽

Vol 12 ◽

Author(s):

Ziyu Ning ◽

Shuang Yu ◽

Yanqiao Zhao ◽

Xiaoming Sun ◽

Haibin Wu ◽

...

Keyword(s):

Prostate Cancer ◽

Machine Learning ◽

Single Molecule ◽

Target Genes ◽

Gene Expression Omnibus ◽

The Cancer Genome Atlas ◽

Differentially Expressed ◽

Support Vector ◽

Normal Prostate ◽

Pathway Network

Accurately identifying classification biomarkers for distinguishing between normal and cancer samples is challenging. Additionally, the reproducibility of single-molecule biomarkers is limited by the existence of heterogeneous patient subgroups and differences in the sequencing techniques used to collect patient data. In this study, we developed a method to identify robust biomarkers (i.e., miRNA-mediated subpathways) associated with prostate cancer based on normal prostate samples and cancer samples from a dataset from The Cancer Genome Atlas (TCGA; n = 546) and datasets from the Gene Expression Omnibus (GEO) database (n = 139 and n = 90, with the latter being a cell line dataset). We also obtained 10 other cancer datasets to evaluate the performance of the method. We propose a multi-omics data integration strategy for identifying classification biomarkers using a machine learning method that involves reassigning topological weights to the genes using a directed random walk (DRW)-based method. A global directed pathway network (GDPN) was constructed based on the significantly differentially expressed target genes of the significantly differentially expressed miRNAs, which allowed us to identify the robust biomarkers in the form of miRNA-mediated subpathways (miRNAs). The activity value of each miRNA-mediated subpathway was calculated by integrating multiple types of data, which included the expression of the miRNA and the miRNAs’ target genes and GDPN topological information. Finally, we identified the high-frequency miRNA-mediated subpathways involved in prostate cancer using a support vector machine (SVM) model. The results demonstrated that we obtained robust biomarkers of prostate cancer, which could classify prostate cancer and normal samples. Our method outperformed seven other methods, and many of the identified biomarkers were associated with known clinical treatments.

Optimal design of single-cell RNA sequencing experiments for cell-type-specific eQTL analysis

10.1101/766972 ◽

2019 ◽

Cited By ~ 2

Author(s):

Igor Mandric ◽

Tommer Schwarz ◽

Arunabha Majumdar ◽

Richard Perez ◽

Meena Subramaniam ◽

...

Keyword(s):

Single Cell ◽

RNA Sequencing ◽

Statistical Power ◽

Cost Savings ◽

eQTL Analysis ◽

Cell Type ◽

Specific Expression ◽

High Coverage ◽

Single Cell RNA Sequencing ◽

Cell Type Specific

Abstract: Single-cell RNA-sequencing (scRNA-Seq) is a compelling approach to simultaneously measure cellular composition and state, which is impossible with bulk profiling approaches. However, it has not yet become a widely used tool in population-scale analyses because of its prohibitively high cost. Here we show that, given the same budget, the statistical power of cell-type-specific expression quantitative trait loci (eQTL) mapping can be increased through low-coverage per-cell sequencing of more samples rather than high-coverage sequencing of fewer samples. We also show that multiple experimental designs with different numbers of samples, cells per sample and reads per cell could have similar statistical power, and that choosing an appropriate design can yield large cost savings, especially when multiplexed workflows are considered. Finally, we provide a practical approach for selecting cost-effective designs that maximize cell-type-specific eQTL power.

Transposable element expression at unique loci in single cells with CELLO-seq

10.1101/2020.10.02.322073 ◽

2020 ◽

Author(s):

Rebecca V Berrens ◽

Andrian Yang ◽

Christopher E Laumer ◽

Aaron TL Lun ◽

Florian Bieberich ◽

...

Keyword(s):

Single Cell ◽

RNA Sequencing ◽

Single Cells ◽

Biological Processes ◽

Specific Expression ◽

Protein Coding ◽

Sequencing Technologies ◽

Repetitive Nature ◽

Long Read ◽

Induced Pluripotent

Abstract: The role of Transposable Elements (TEs) in regulating diverse biological processes, from early development to cancer, is becoming increasingly appreciated. However, unlike other biological processes, next generation single-cell sequencing technologies are ill-suited for assaying TE expression: in particular, their highly repetitive nature means that short cDNA reads cannot be unambiguously mapped to a specific locus. Consequently, it is extremely challenging to understand the mechanisms by which TE expression is regulated and how TEs might themselves regulate other protein coding genes. To resolve this, we introduce CELLO-seq, a novel method and computational framework for performing long-read RNA sequencing at single cell resolution. CELLO-seq allows for full-length RNA sequencing and enables measurement of allelic, isoform and TE expression at unique loci. We use CELLO-seq to assess the widespread expression of TEs in 2-cell mouse blastomeres as well as human induced pluripotent stem cells (hiPSCs). Across both species, old and young TEs showed evidence of locus-specific expression, with simulations demonstrating that only a small number of very young elements in the mouse could not be mapped to their locus of origin with high confidence. Exploring the relationship between the expression of individual elements and putative regulators revealed surprising heterogeneity, with TEs within a class showing different patterns of correlation, suggesting distinct regulatory mechanisms.

Upgrading the Repertoire of miRNAs in Gastric Adenocarcinoma to Provide a New Resource for Biomarker Discovery

International Journal of Molecular Sciences ◽

10.3390/ijms20225697 ◽

2019 ◽

Vol 20 (22) ◽

pp. 5697 ◽

Cited By ~ 2

Author(s):

Michelle E. Pewarchuk ◽

Mateus C. Barros-Filho ◽

Brenda C. Minatel ◽

David E. Cohn ◽

Florian Guisier ◽

...

Keyword(s):

Gastric Adenocarcinoma ◽

Biomarker Discovery ◽

The Cancer Genome Atlas ◽

Small RNA Sequencing ◽

Sequencing Data ◽

Specific Expression ◽

Novel miRNA ◽

Cancer Genome Atlas ◽

Context Specific ◽

Independent Cohort

Recent studies have uncovered microRNAs (miRNAs) that were overlooked in early genomic explorations and that show remarkable tissue- and context-specific expression. Here, we aim to identify and characterize previously unannotated miRNAs expressed in gastric adenocarcinoma (GA). Raw small RNA-sequencing data were analyzed using the miRMaster platform to predict and quantify previously unannotated miRNAs. A discovery cohort of 475 gastric samples (434 GA and 41 adjacent nonmalignant samples), collected by The Cancer Genome Atlas (TCGA), was evaluated. Candidate miRNAs were similarly assessed in an independent cohort of 25 gastric samples. We discovered 170 previously unannotated miRNA candidates expressed in gastric tissues. The expression of these novel miRNAs was highly specific to the gastric samples, and 143 of them were significantly deregulated between tumor and nonmalignant contexts (p-adjusted < 0.05; fold change > 1.5). Multivariate survival analyses showed that the combined expression of one previously annotated miRNA and two novel miRNA candidates was significantly predictive of patient outcome. Further, the expression of these three miRNAs was able to stratify patients into three distinct prognostic groups (p = 0.00003). These novel miRNAs were also present in the independent cohort (43 sequences detected in both cohorts). Our findings uncover novel miRNA transcripts in gastric tissues that may have implications for the biology and management of gastric adenocarcinoma.
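The deregulation criterion quoted above (p-adjusted < 0.05; fold change > 1.5) can be expressed as a simple filter. The sketch below is a hedged illustration: it assumes the fold change is counted in either direction (up or down in tumor), which the abstract does not state explicitly, and the function name, argument names and example values are hypothetical.

```python
def is_deregulated(tumor_mean, normal_mean, p_adjusted,
                   min_fold=1.5, alpha=0.05):
    """Apply the two thresholds quoted in the abstract to one candidate.

    Assumption: the fold change is symmetric, so a candidate 1.5x lower
    in tumor counts as deregulated just like one 1.5x higher.
    """
    if tumor_mean <= 0 or normal_mean <= 0:
        return False  # ratio undefined; a real pipeline would handle zeros
    ratio = tumor_mean / normal_mean
    fold = max(ratio, 1.0 / ratio)
    return p_adjusted < alpha and fold > min_fold


if __name__ == "__main__":
    # Made-up expression means and adjusted p-values.
    print(is_deregulated(30.0, 10.0, 0.01))  # up in tumor
    print(is_deregulated(10.0, 30.0, 0.01))  # down in tumor
    print(is_deregulated(12.0, 10.0, 0.01))  # fold change too small
```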