Prime-seq, efficient and powerful bulk RNA-sequencing

2021 ◽  
Author(s):  
Aleksandar Janjic ◽  
Lucas Esteban Wange ◽  
Johannes Walter Bagnoli ◽  
Johanna Geuder ◽  
Phong Nguyen ◽  
...  

With the advent of Next Generation Sequencing, RNA-sequencing (RNA-seq) has become the major method for quantitative gene expression analysis. Reducing library costs by early barcoding has propelled single-cell RNA-seq, but has not yet caught on for bulk RNA-seq. Here, we optimized and validated a bulk RNA-seq method we call prime-seq. We show that with respect to library complexity, measurement accuracy, and statistical power it performs equivalently to TruSeq, a standard bulk RNA-seq method, but is four-fold more cost-efficient due to almost 50-fold cheaper library costs. We also validate a direct RNA isolation step that further improves cost and time-efficiency, show that intronic reads are derived from RNA, validate that prime-seq performs optimally with only 1,000 cells as input, and calculate that prime-seq is the most cost-efficient bulk RNA-seq method currently available. We discuss why many labs would profit from a cost-efficient early barcoding RNA-seq protocol and argue that prime-seq is well suited for setting up such a protocol as it is well validated, well documented, and requires no specialized equipment.

Author(s):  
Alemu Takele Assefa ◽  
Jo Vandesompele ◽  
Olivier Thas

Abstract Background: In gene expression studies, RNA sample pooling is sometimes considered because of budget constraints or lack of sufficient input material. Using microarray technology, RNA sample pooling strategies have been reported to optimize both the cost of data generation as well as the statistical power for differential gene expression (DGE) analysis. For RNA sequencing, with its different quantitative output in terms of counts and tunable dynamic range, the adequacy and empirical validation of RNA sample pooling strategies have not yet been evaluated. In this study, we comprehensively assessed the utility of pooling strategies in RNA-seq experiments using empirical and simulated RNA-seq datasets. Results: The data generating model in pooled experiments is defined mathematically to evaluate the mean and variability of gene expression estimates. The model is further used to examine the trade-off between the statistical power of testing for DGE and the data generating costs. Empirical assessment of pooling strategies is done through analysis of RNA-seq datasets under various pooling and non-pooling experimental settings. A simulation study is also used to rank experimental scenarios with respect to the rate of false and true discoveries in DGE analysis. The results demonstrate that pooling strategies in RNA-seq studies can be both cost-effective and powerful when the number of pools, pool size and sequencing depth are optimally defined. Conclusion: For high within-group gene expression variability, small RNA sample pools are effective to reduce the variability and compensate for the loss of the number of replicates. Unlike the typical cost-saving strategies, such as reducing sequencing depth or number of RNA samples (replicates), an adequate pooling strategy is effective in maintaining the power of testing DGE for genes with low to medium abundance levels, along with a substantial reduction of the total cost of the experiment. In general, pooling RNA samples or pooling RNA samples in conjunction with moderate reduction of the sequencing depth can be good options to optimize the cost and maintain the power.
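The trade-off described in this abstract (fewer libraries, each averaging several samples) can be illustrated with a small sketch. The variance model below is an assumption chosen for illustration, not the data generating model defined in the paper: pooling is taken to average biological variation across the pooled samples, while each sequenced library contributes its own independent technical variance. The function name and all numeric values are hypothetical.

```python
def variance_of_group_mean(bio_var, tech_var, pool_size, n_pools):
    """Variance of a group's mean expression estimate for one gene.

    Assumed model (not taken from the paper): pooling `pool_size` RNA samples
    averages their biological variation, and every sequenced library adds
    independent technical variance `tech_var`.
    """
    if pool_size < 1 or n_pools < 1:
        raise ValueError("pool_size and n_pools must be at least 1")
    per_library_var = bio_var / pool_size + tech_var
    return per_library_var / n_pools


# Hypothetical gene with high biological and low technical variability.
unpooled = variance_of_group_mean(bio_var=1.0, tech_var=0.1, pool_size=1, n_pools=6)
pooled = variance_of_group_mean(bio_var=1.0, tech_var=0.1, pool_size=3, n_pools=3)
print(f"6 individual libraries: {unpooled:.3f}")
print(f"3 pools of 3 samples:   {pooled:.3f}")
```

Under these assumed values, three libraries built from three-sample pools give a lower variance than six individually sequenced samples, which matches the abstract's statement that small pools can compensate for fewer replicates when within-group variability is high. If the values are swapped so that technical variance dominates, the same function shows the advantage disappearing.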


Life ◽  
2022 ◽  
Vol 12 (1) ◽  
pp. 69
Author(s):  
Davide Vacca ◽  
Antonino Fiannaca ◽  
Fabio Tramuto ◽  
Valeria Cancila ◽  
Laura La Paglia ◽  
...  

In consideration of the increasing prevalence of COVID-19 cases in several countries and the resulting demand for unbiased sequencing approaches, we performed a direct RNA sequencing (direct RNA seq.) experiment using critical oropharyngeal swab samples collected from Italian patients infected with SARS-CoV-2 from the Palermo region in Sicily. Here, we identified SARS-CoV-2 sequences directly in RNA extracted from critical samples using the Oxford Nanopore MinION technology without prior cDNA retrotranscription. Using an appropriate bioinformatics pipeline, we could identify mutations in the nucleocapsid (N) gene, which have been reported previously in studies conducted in other countries. In conclusion, to the best of our knowledge, the technique used in this study has not been used for SARS-CoV-2 detection previously owing to the difficulties in the extraction of RNA of sufficient quantity and quality from routine oropharyngeal swabs. Despite these limitations, this approach provides the advantages of true native RNA sequencing and does not include amplification steps that could introduce systematic errors. This study can provide novel information relevant to the current strategies adopted in SARS-CoV-2 next-generation sequencing.


2015 ◽  
Vol 9s1 ◽  
pp. BBI.S28992
Author(s):  
Xin Li ◽  
Shaolei Teng

Schizophrenia (SCZ) is a serious psychiatric disorder that affects 1% of the general population and places a heavy burden worldwide. The underlying genetic mechanism of SCZ remains unknown, but studies indicate that the disease is associated with a global gene expression disturbance across many genes. Next-generation sequencing, particularly RNA sequencing (RNA-Seq), provides a powerful genome-scale technology to investigate the pathological processes of SCZ. RNA-Seq has been used to analyze gene expression and identify novel splice isoforms and rare transcripts associated with SCZ. This paper provides an overview of the genetics of SCZ, the advantages of RNA-Seq for transcriptome analysis, the accomplishments of RNA-Seq in SCZ cohorts, and the applications of induced pluripotent stem cells and RNA-Seq in SCZ research.


Author(s):  
Isaac Raplee ◽  
Alexei Evsikov ◽  
Caralina Marín de Evsikova

The rapid expansion of transcriptomics from increased affordability of next-generation sequencing (NGS) technologies generates rocketing amounts of gene expression data across biology and medicine, and notably in cancer research. Concomitantly, many bioinformatics tools were developed to streamline gene expression analysis and quantification. We tested the concordance of NGS RNA sequencing (RNA-seq) analysis outcomes between the two predominant programs for read alignment, HISAT2 and STAR, and the two most popular programs for quantifying gene expression in NGS experiments, edgeR and DESeq2, using RNA-seq data from a series of breast cancer progression specimens, which include histologically confirmed normal, early neoplasia, ductal carcinoma in situ and infiltrating ductal carcinoma samples microdissected from formalin-fixed, paraffin-embedded (FFPE) breast tissue blocks. We identified significant differences in aligners’ performance: HISAT2 was prone to misalign reads to retrogene genomic loci, whereas STAR generated more precise alignments, especially for early neoplasia samples. edgeR and DESeq2 produced similar lists of differentially expressed genes in stage comparisons, with edgeR producing more conservative, though shorter, lists of genes. However, Gene Ontology (GO) enrichment analysis revealed no skewness in significant GO categories identified among differentially expressed genes by edgeR vs DESeq2. As transcriptome analysis of archived FFPE samples becomes a vanguard of precision medicine, identification and fine-tuning of bioinformatics tools becomes critical for clinical research. Our results indicate that STAR and edgeR are well-suited tools for differential gene expression analysis from FFPE samples.


Author(s):  
Afzal Hussain

Next-generation sequencing, or massively parallel sequencing, encompasses DNA sequencing, RNA sequencing, and methylation sequencing, and has had a great impact on the life sciences. Recent advances in these parallel sequencing methods allow the generation of huge amounts of data in a very short period of time while reducing the associated computing cost. It plays a major role in gene expression profiling, chromosome counting, detecting epigenetic changes, and enabling the future of personalized medicine. Here the authors describe NGS technologies and their applications, as well as the use of tools such as TopHat, Bowtie, Cufflinks, Cuffmerge, and Cuffdiff for analyzing high-throughput RNA sequencing (RNA-Seq) data.


2012 ◽  
Vol 30 (15_suppl) ◽  
pp. 3065-3065
Author(s):  
Lorenza Mittempergher ◽  
Iris de Rink ◽  
Marja Nieuwland ◽  
Ron M Kerkhoven ◽  
Annuska Glas ◽  
...  

3065 Background: The development of new biomarkers often requires fresh frozen (FF) samples. Recently we showed that microarray gene expression data generated from FFPE material are comparable to data extracted from the FF counterpart, including known signatures such as the 70-gene prognosis signature (Mittempergher L et al., 2011). As described by Luo et al (2010) RNA profiling using next generation sequencing (RNA-Seq) is now applicable to archival FFPE specimens. Methods: Technical performance and the comparison between the RNA-Seq 70-gene read-out and the MammaPrint test (Glas et al., 2006) were evaluated in a series of 15 patients (11/15 with matched FFPE/FF material). RNA-Seq was carried out using minor adjustments of the Illumina TruSeq RNA preparation method. RNA sequencing libraries were prepared starting from 100 ng of total RNA. Next, the DSN (Duplex-Specific Nuclease) normalization process was used to remove ribosomal RNA and other abundant transcripts (Luo et al, 2010). The libraries were paired-end sequenced on the Illumina HiSeq 2000 instrument with multiplexing of 4 libraries per lane. The resulting sequences were mapped to the human reference genome (build 37) using TopHat 1.3.1 (Trapnell et al., 2009). The HTSeq-count tool was used to generate the total number of uniquely mapped reads for each gene. Results: Between 14% and 45% of the total number of reads were assigned to protein-coding genes. The minimum coverage per 1000 bp of CDS was 38 reads. The 70 MammaPrint genes were successfully mapped to the RNA-Seq transcripts. We calculated the Pearson correlation coefficient between the centroids of the original good prognosis template (van’t Veer et al., 2002) and the 70-gene read count determined by RNA-Seq of each sample. Predictions based on the 70-gene RNA-Seq data showed a high agreement with the actual MammaPrint test predictions (>90%), irrespective of whether the RNA-seq was performed on FF or FFPE tissue. Conclusions: New generation RNA-sequencing is a feasible technology to assess diagnostic signatures.
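The template-correlation step mentioned in the Results can be sketched as follows. This is a minimal illustration, not the MammaPrint procedure: the five-gene vectors stand in for the 70-gene profile, the sample profile is assumed to be already normalized to the same scale as the template, and the `threshold` value and the `classify` helper are hypothetical placeholders.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def classify(sample_profile, good_template, threshold):
    """Label a sample by its correlation to the good-prognosis template.

    `threshold` is a hypothetical cutoff supplied by the caller; it is not
    the cutoff used by the actual diagnostic test.
    """
    r = pearson(sample_profile, good_template)
    label = "good prognosis" if r > threshold else "poor prognosis"
    return label, r


# Toy five-gene example (invented values, for illustration only).
template = [1.0, -0.5, 0.8, -1.2, 0.3]
sample = [0.9, -0.4, 1.1, -1.0, 0.1]
label, r = classify(sample, template, threshold=0.5)
print(label, round(r, 3))
```

Agreement between RNA-Seq and array-based predictions, as reported in the abstract, would then be the share of samples for which both read-outs receive the same label.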


GigaScience ◽  
2021 ◽  
Vol 10 (3) ◽  
Author(s):  
Holly C Beale ◽  
Jacquelyn M Roger ◽  
Matthew A Cattle ◽  
Liam T McKay ◽  
Drew K A Thompson ◽  
...  

Abstract Background: The reproducibility of gene expression measured by RNA sequencing (RNA-Seq) is dependent on the sequencing depth. While unmapped or non-exonic reads do not contribute to gene expression quantification, duplicate reads contribute to the quantification but are not informative for reproducibility. We show that mapped, exonic, non-duplicate (MEND) reads are a useful measure of reproducibility of RNA-Seq datasets used for gene expression analysis. Findings: In bulk RNA-Seq datasets from 2,179 tumors in 48 cohorts, the fraction of reads that contribute to the reproducibility of gene expression analysis varies greatly. Unmapped reads constitute 1–77% of all reads (median [IQR], 3% [3–6%]); duplicate reads constitute 3–100% of mapped reads (median [IQR], 27% [13–43%]); and non-exonic reads constitute 4–97% of mapped, non-duplicate reads (median [IQR], 25% [16–37%]). MEND reads constitute 0–79% of total reads (median [IQR], 50% [30–61%]). Conclusions: Because not all reads in an RNA-Seq dataset are informative for reproducibility of gene expression measurements and the fraction of reads that are informative varies, we propose reporting a dataset's sequencing depth in MEND reads, which definitively inform the reproducibility of gene expression, rather than total, mapped, or exonic reads. We provide a Docker image containing (i) the existing required tools (RSeQC, sambamba, and samblaster) and (ii) a custom script to calculate MEND reads from RNA-Seq data files. We recommend that all RNA-Seq gene expression experiments, sensitivity studies, and depth recommendations use MEND units for sequencing depth.
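Because each percentage in the Findings is expressed relative to the reads surviving the previous filter, the MEND fraction of a single sample is the product of the three retained fractions. The sketch below shows that arithmetic; it is not the authors' script, and the function names and the 50-million-read sample are hypothetical.

```python
def mend_fraction(unmapped_frac, duplicate_frac_of_mapped, non_exonic_frac_of_mapped_nondup):
    """Fraction of total reads that are mapped, exonic, and non-duplicate.

    Each argument uses the same denominator as in the abstract:
    unmapped as a share of all reads, duplicates as a share of mapped reads,
    non-exonic as a share of mapped, non-duplicate reads.
    """
    for value in (unmapped_frac, duplicate_frac_of_mapped, non_exonic_frac_of_mapped_nondup):
        if not 0.0 <= value <= 1.0:
            raise ValueError("fractions must lie between 0 and 1")
    mapped = 1.0 - unmapped_frac
    mapped_non_duplicate = mapped * (1.0 - duplicate_frac_of_mapped)
    return mapped_non_duplicate * (1.0 - non_exonic_frac_of_mapped_nondup)


def mend_depth(total_reads, unmapped_frac, duplicate_frac_of_mapped, non_exonic_frac_of_mapped_nondup):
    """Sequencing depth expressed in MEND reads for one sample."""
    return total_reads * mend_fraction(
        unmapped_frac, duplicate_frac_of_mapped, non_exonic_frac_of_mapped_nondup
    )


# Hypothetical sample: 50 million reads, 3% unmapped, 27% duplicates, 25% non-exonic.
fraction = mend_fraction(0.03, 0.27, 0.25)
depth = mend_depth(50_000_000, 0.03, 0.27, 0.25)
print(f"MEND fraction: {fraction:.3f}; MEND depth: {depth:,.0f} reads")
```

The example plugs in the cohort medians only for convenience. A product of medians is not expected to equal the reported median MEND fraction, since the three rates vary independently across samples.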


2020 ◽  
Author(s):  
Alemu Takele Assefa ◽  
Jo Vandesompele ◽  
Olivier Thas

Abstract In gene expression studies, RNA sample pooling is sometimes considered because of budget constraints or lack of sufficient input material. Using microarray technology, RNA sample pooling strategies have been reported to optimize both the cost of data generation as well as the statistical power for differential gene expression (DGE) analysis. For RNA sequencing, with its different quantitative output in terms of counts and tunable dynamic range, the adequacy and empirical validation of RNA sample pooling strategies have not yet been evaluated. In this study, we comprehensively assessed the utility of pooling strategies in RNA-seq experiments using empirical and simulated data. Mathematical descriptions of the data generating mechanism in pooled experiments are used to reinforce our interpretations from the empirical and simulation studies. The results demonstrate that pooling strategies in RNA-seq studies can be both cost-effective and powerful when the number of pools, pool size and sequencing depth are optimally defined. For high within-group gene expression variability, small RNA sample pools are effective to reduce the variability and compensate for the loss of the number of replicates. Unlike the typical cost-saving strategies, such as reducing sequencing depth or number of biological samples, an adequate pooling strategy is effective in maintaining the power of testing for DGE for low to medium abundance levels, along with a substantial reduction of the total cost of the experiment.


2021 ◽  
Author(s):  
Tommer Schwarz ◽  
Toni Boltz ◽  
Kangcheng Hou ◽  
Merel Bot ◽  
Chenda Duan ◽  
...  

Mapping genetic variants that regulate gene expression (eQTLs) in large-scale RNA sequencing (RNA-seq) studies is often employed to understand functional consequences of regulatory variants. However, the high cost of RNA-Seq limits sample size, sequencing depth, and therefore, discovery power. In this work, we demonstrate that, given a fixed budget, eQTL discovery power can be increased by lowering the sequencing depth per sample and increasing the number of individuals sequenced in the assay. We perform RNA-Seq of whole blood tissue across 1490 individuals at low coverage (5.9 million reads/sample) and show that the effective power is higher than an RNA-Seq study of 570 individuals at high coverage (13.9 million reads/sample). Next, we leverage synthetic datasets derived from real RNA-Seq data to explore the interplay of coverage and number of individuals in eQTL studies, and show that a 10-fold reduction in coverage leads to only a 2.5-fold reduction in statistical power. Our study suggests that lowering coverage while increasing the number of individuals is an effective approach to increase discovery power in RNA-Seq studies.
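To see why the two designs in this abstract are roughly comparable in sequencing effort, one can multiply individuals by reads per sample. The sketch below does only that arithmetic with the figures quoted above; treating total reads as a proxy for budget is an assumption made here, since it ignores per-sample library preparation and collection costs, and the function name is hypothetical.

```python
def total_reads_millions(n_individuals, reads_per_sample_millions):
    """Total sequencing output of a design, in millions of reads."""
    return n_individuals * reads_per_sample_millions


# Figures quoted in the abstract.
low_coverage_total = total_reads_millions(1490, 5.9)
high_coverage_total = total_reads_millions(570, 13.9)

print(f"low coverage:  {low_coverage_total:,.0f} million reads")
print(f"high coverage: {high_coverage_total:,.0f} million reads")
print(f"ratio of totals: {low_coverage_total / high_coverage_total:.2f}")
print(f"ratio of individuals: {1490 / 570:.2f}")
```

The totals come to about 8.8 billion and 7.9 billion reads, so the low-coverage design sequences roughly 2.6 times as many individuals for only about 11% more reads overall.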


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Barry Slaff ◽  
Caleb M. Radens ◽  
Paul Jewell ◽  
Anupama Jha ◽  
Nicholas F. Lahens ◽  
...  

Abstract The effects of confounding factors on gene expression analysis have been extensively studied following the introduction of high-throughput microarrays and subsequently RNA sequencing. In contrast, there is a lack of equivalent analysis and tools for RNA splicing. Here we first assess the effect of confounders on both expression and splicing quantifications in two large public RNA-Seq datasets (TARGET, ENCODE). We show that quantifications of splicing variations are affected at least as much as those of gene expression, revealing unwanted sources of variations in both datasets. Next, we develop MOCCASIN, a method to correct the effect of both known and unknown confounders on RNA splicing quantification and demonstrate MOCCASIN’s effectiveness on both synthetic and real data. Code, synthetic and corrected datasets are all made available as resources.

