Bias in RNA-seq Library Preparation: Current Challenges and Solutions

BioMed Research International ◽

10.1155/2021/6647597 ◽

2021 ◽

Vol 2021 ◽

pp. 1-11

Author(s):

Huajuan Shi ◽

Ying Zhou ◽

Erteng Jia ◽

Min Pan ◽

Yunfei Bai ◽

...

Keyword(s):

Rna Sequencing ◽

Transcriptome Analysis ◽

Advanced Technology ◽

Library Preparation ◽

Rna Seq ◽

Detailed Understanding ◽

Experimental Bias ◽

Bioinformatics Tools ◽

Sequencing Result

Although RNA sequencing (RNA-seq) has become the most advanced technology for transcriptome analysis, it also confronts various challenges. The RNA-seq workflow is complicated and prone to bias, which can degrade the quality of an RNA-seq dataset and lead to incorrect interpretation of the sequencing results. A detailed understanding of the source and nature of these biases is therefore essential for interpreting RNA-seq data, for finding methods to improve the quality of RNA-seq experiments, and for developing bioinformatics tools that compensate for these biases. Here, we discuss the sources of experimental bias in RNA-seq and, for each type of bias, methods for improvement, in order to provide useful suggestions for researchers running RNA-seq experiments.

Methyltransferase-directed orthogonal tagging and sequencing of miRNAs and bacterial small RNAs

BMC Biology ◽

10.1186/s12915-021-01053-w ◽

2021 ◽

Vol 19 (1) ◽

Author(s):

Milda Mickutė ◽

Kotryna Kvederavičiūtė ◽

Aleksandr Osipenko ◽

Raminta Mineikaitė ◽

Saulius Klimašauskas ◽

...

Keyword(s):

Rna Sequencing ◽

Regulatory Networks ◽

Library Preparation ◽

Rna Seq ◽

Basic Principles ◽

Cofactor Binding ◽

Sequencing Library ◽

Sequencing Library Preparation ◽

Target Rna

Abstract Background: Targeted installation of designer chemical moieties on biopolymers provides an orthogonal means for their visualisation, manipulation and sequence analysis. Although high-throughput RNA sequencing is a widely used method for transcriptome analysis, certain steps, such as 3′ adapter ligation in strand-specific RNA sequencing, remain challenging due to structure- and sequence-related biases introduced by RNA ligases, leading to misrepresentation of particular RNA species. Here, we remedy this limitation by adapting two RNA 2′-O-methyltransferases from the Hen1 family for orthogonal chemo-enzymatic click tethering of a 3′ sequencing adapter that supports cDNA production by reverse transcription of the tagged RNA. Results: We showed that the ssRNA-specific DmHen1 and dsRNA-specific AtHEN1 can be used to efficiently append an oligonucleotide adapter to the 3′ end of target RNA for sequencing library preparation. Using this new chemo-enzymatic approach, we identified miRNAs and prokaryotic small non-coding sRNAs in probiotic Lactobacillus casei BL23. We found that compared to a reference conventional RNA library preparation, methyltransferase-Directed Orthogonal Tagging and RNA sequencing, mDOT-seq, avoids misdetection of unspecific highly-structured RNA species, thus providing better accuracy in identifying the groups of transcripts analysed. Our results suggest that mDOT-seq has the potential to advance analysis of eukaryotic and prokaryotic ssRNAs. Conclusions: Our findings provide a valuable resource for studies of the RNA-centred regulatory networks in Lactobacilli and pave the way to developing novel transcriptome and epitranscriptome profiling approaches in vitro and inside living cells. As RNA methyltransferases share the structure of the AdoMet-binding domain and several specific cofactor binding features, the basic principles of our approach could be easily translated to other AdoMet-dependent enzymes for the development of modification-specific RNA-seq techniques.

Multiple freeze-thaw cycles lead to a loss of consistency in poly(A)-enriched RNA sequencing

10.21203/rs.3.rs-67621/v2 ◽

2020 ◽

Author(s):

Benjamin Kellman ◽

Hratch Baghdassarian ◽

Tiziano Pramparo ◽

Isaac Shamie ◽

Vahid Gazestani ◽

...

Keyword(s):

Differential Expression ◽

Rna Sequencing ◽

Ribosomal Rna ◽

Rna Degradation ◽

Library Preparation ◽

Rna Seq ◽

Rna Integrity ◽

Freeze Thaw ◽

The Impact ◽

Do So

Abstract Background: Both RNA-Seq and sample freeze-thaw are ubiquitous. However, knowledge about the impact of freeze-thaw on downstream analyses is limited. The lack of common quality metrics that are sufficiently sensitive to freeze-thaw and RNA degradation, e.g. the RNA Integrity Score, makes such assessments challenging. Results: Here we quantify the impact of repeated freeze-thaw cycles on the reliability of RNA-Seq by examining poly(A)-enriched and ribosomal RNA depleted RNA-seq from frozen leukocytes drawn from a toddler autism cohort. To do so, we estimate the relative noise, or percentage of random counts, separating technical replicates. Using this approach we measured noise associated with RIN and freeze-thaw cycles. As expected, RIN does not fully capture sample degradation due to freeze-thaw. We further examined differential expression results and found that three freeze-thaws should extinguish the differential expression reproducibility of similar experiments. Freeze-thaw also resulted in a 3’ shift in the read coverage distribution along the gene body of poly(A)-enriched samples compared to ribosomal RNA depleted samples, suggesting that library preparation may exacerbate freeze-thaw-induced sample degradation. Conclusion: The use of poly(A)-enrichment for RNA sequencing is pervasive in library preparation of frozen tissue, and thus, it is important during experimental design and data analysis to consider the impact of repeated freeze-thaw cycles on reproducibility.
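
The 3’ shift in gene-body coverage mentioned in this abstract can be summarised with a single number per transcript. The following is a minimal sketch, not the authors' method: the function name `coverage_center_of_mass` and the input format (a list of per-position read depths ordered 5’ to 3’) are assumptions made for illustration. It returns the coverage-weighted mean position scaled to [0, 1], so uniform coverage gives 0.5 and a shift toward the 3’ end gives a larger value.

```python
def coverage_center_of_mass(coverage):
    """Coverage-weighted mean position along a transcript, scaled to [0, 1].

    `coverage` is a hypothetical input: read depth per position, ordered
    from the 5' end (index 0) to the 3' end (last index).
    """
    total = sum(coverage)
    if total == 0:
        raise ValueError("transcript has no coverage")
    n = len(coverage)
    if n == 1:
        return 0.5  # a single position has no 5'/3' direction
    weighted = sum(i * depth for i, depth in enumerate(coverage))
    return weighted / total / (n - 1)


uniform = coverage_center_of_mass([5, 5, 5, 5, 5])
shifted = coverage_center_of_mass([1, 1, 2, 4, 8])  # depth piles up at the 3' end
print(uniform, shifted)
```

Comparing this value between libraries prepared from the same sample is one simple way to see whether one preparation is more 3’-biased than another; it does not replace the noise estimate described in the abstract.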

RNA sequencing by direct tagmentation of RNA/DNA hybrids

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1919800117 ◽

2020 ◽

Vol 117 (6) ◽

pp. 2886-2893 ◽

Cited By ~ 15

Author(s):

Lin Di ◽

Yusi Fu ◽

Yue Sun ◽

Jie Li ◽

Lu Liu ◽

...

Keyword(s):

Rna Sequencing ◽

Single Cells ◽

Transcriptome Profiling ◽

Initial Material ◽

Library Preparation ◽

Rna Seq ◽

Cdna Synthesis ◽

Complementary Dna ◽

Dna Analyses ◽

Wide Range

Transcriptome profiling by RNA sequencing (RNA-seq) has been widely used to characterize cellular status, but it relies on second-strand complementary DNA (cDNA) synthesis to generate initial material for library preparation. Here we use bacterial transposase Tn5, which has been increasingly used in various high-throughput DNA analyses, to construct RNA-seq libraries without second-strand synthesis. We show that Tn5 transposome can randomly bind RNA/DNA heteroduplexes and add sequencing adapters onto RNA directly after reverse transcription. This method, Sequencing HEteRo RNA-DNA-hYbrid (SHERRY), is versatile and scalable. SHERRY accepts a wide range of starting materials, from bulk RNA to single cells. SHERRY offers a greatly simplified protocol and produces results with higher reproducibility and GC uniformity compared with prevailing RNA-seq methods.

A systematic evaluation of single cell RNA-seq analysis pipelines

Nature Communications ◽

10.1038/s41467-019-12266-7 ◽

2019 ◽

Vol 10 (1) ◽

Cited By ~ 47

Author(s):

Beate Vieth ◽

Swati Parekh ◽

Christoph Ziegenhain ◽

Wolfgang Enard ◽

Ines Hellmann

Keyword(s):

Best Practices ◽

Sample Size ◽

Single Cell ◽

Differential Expression ◽

Rna Sequencing ◽

Systematic Evaluation ◽

Library Preparation ◽

Rna Seq ◽

Rapid Spread ◽

Single Cell Rna Sequencing

Abstract The recent rapid spread of single cell RNA sequencing (scRNA-seq) methods has created a large variety of experimental and computational pipelines for which best practices have not yet been established. Here, we use simulations based on five scRNA-seq library protocols in combination with nine realistic differential expression (DE) setups to systematically evaluate three mapping, four imputation, seven normalisation and four differential expression testing approaches resulting in ~3000 pipelines, allowing us to also assess interactions among pipeline steps. We find that choices of normalisation and library preparation protocols have the biggest impact on scRNA-seq analyses. Specifically, we find that library preparation determines the ability to detect symmetric expression differences, while normalisation dominates pipeline performance in asymmetric DE-setups. Finally, we illustrate the importance of informed choices by showing that a good scRNA-seq pipeline can have the same impact on detecting a biological signal as quadrupling the sample size.
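
To make the combinatorics of such an evaluation concrete, the sketch below enumerates the full cross of the four analysis steps named in the abstract (three mapping, four imputation, seven normalisation and four DE testing approaches). The step labels are placeholders, not the tools the authors used, and the sketch counts only these four steps; how the abstract's figure of ~3000 pipelines is assembled from protocols and DE setups is not reproduced here.

```python
from itertools import product

# Placeholder labels; the real method names are not given in the abstract.
steps = {
    "mapping": [f"mapper_{i}" for i in range(1, 4)],              # three approaches
    "imputation": [f"imputation_{i}" for i in range(1, 5)],       # four approaches
    "normalisation": [f"normalisation_{i}" for i in range(1, 8)], # seven approaches
    "de_testing": [f"de_test_{i}" for i in range(1, 5)],          # four approaches
}

# Every pipeline is one choice per step, so the full cross is a Cartesian product.
pipelines = [dict(zip(steps, combo)) for combo in product(*steps.values())]

print(len(pipelines))  # 3 * 4 * 7 * 4 = 336 combinations of analysis steps
print(pipelines[0])
```

Holding each pipeline as a dictionary keyed by step makes it straightforward to group results by one step and look at interactions with another.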

Deploying new generation sequencing for the study of flesh color depletion in Atlantic Salmon (Salmo salar)

BMC Genomics ◽

10.1186/s12864-021-07884-9 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Thu Thi Minh Vo ◽

Tuan Viet Nguyen ◽

Gianluca Amoroso ◽

Tomer Ventura ◽

Abigail Elizur

Keyword(s):

Lipid Metabolism ◽

Atlantic Salmon ◽

Rna Sequencing ◽

Metal Ion ◽

Splice Variants ◽

Library Preparation ◽

Rna Seq ◽

Flesh Color ◽

Fatty Acid Binding ◽

Farmed Atlantic Salmon

Abstract Background: The flesh pigmentation of farmed Atlantic salmon is formed by accumulation of carotenoids derived from commercial diets. In the salmon gastrointestinal system, the hindgut is considered critical in the processes of carotenoid uptake and metabolism. In Tasmania, flesh color depletion can noticeably affect farmed Atlantic salmon at different levels of severity following extremely hot summers. In this study, RNA sequencing (RNA-Seq) was performed to investigate the reduction in flesh pigmentation. Library preparation is a key step that significantly impacts the effectiveness of RNA-Seq experiments. Besides the commonly used whole transcript RNA-Seq method, the 3’ mRNA-Seq method is being applied widely, owing to its reduced cost, enabling more repeats to be sequenced at the expense of lower resolution. Therefore, the output of the Illumina TruSeq kit (whole transcript RNA-Seq) and the Lexogen QuantSeq kit (3’ mRNA-Seq) was analyzed to identify differentially expressed genes (DEGs) in the Atlantic salmon hindgut between two flesh color phenotypes. Results: In both methods, DEGs between the two color phenotypes were associated with metal ion transport, oxidation-reduction processes, and immune responses. We also found DEGs related to lipid metabolism in the QuantSeq method. In the TruSeq method, a missense mutation was detected in DEGs in different flesh color traits. The number of DEGs found in the TruSeq libraries was much higher than in the QuantSeq libraries; however, the trend of DEGs in both library methods was similar and validated by qPCR. Conclusions: Flesh coloration in Atlantic salmon is related to lipid metabolism in which apolipoproteins, serum albumin and fatty acid-binding protein genes are hypothesized to be linked to the absorption, transport and deposition of carotenoids. Our findings suggest that Grp could inhibit the feeding behavior of low color-banded fish, resulting in the dietary carotenoid shortage. Several SNPs in genes involved in carotenoid-binding cholesterol and oxidative stress were detected in both flesh color phenotypes. Regarding the choice of the library preparation method, the selection criteria depend on the research design and purpose. The 3’ mRNA-Seq method is ideal for targeted identification of highly expressed genes, while the whole transcript RNA-Seq method is recommended for identification of unknown genes, enabling the identification of splice variants and trait-associated SNPs, as we have found for duox2 and duoxa1.

Chromatin conformation capture (Hi-C) sequencing of patient-derived xenografts: analysis guidelines

10.1101/2020.10.17.343814 ◽

2020 ◽

Author(s):

Mikhail G. Dozmorov ◽

Katarzyna M. Tyc ◽

Nathan C. Sheffield ◽

David C. Boyd ◽

Amy L. Olex ◽

...

Keyword(s):

Molecular Mechanisms ◽

Genetic Material ◽

Human Tumor ◽

Chromatin Conformation ◽

Library Preparation ◽

Rna Seq ◽

Chromatin Conformation Capture ◽

Alignment Strategy ◽

Preparation Strategy

Abstract: Sequencing of patient-derived xenograft (PDX) mouse models allows investigation of the molecular mechanisms of human tumor samples engrafted in a mouse host. Thus, both human and mouse genetic material is sequenced. Several methods have been developed to remove mouse sequencing reads from RNA-seq or exome sequencing PDX data and improve the downstream signal. However, for more recent chromatin conformation capture technologies (Hi-C), the effect of mouse reads remains undefined. We evaluated the effect of mouse read removal on the quality of Hi-C data using in silico created PDX Hi-C data with 10% and 30% mouse reads. Additionally, we generated two experimental PDX Hi-C datasets using different library preparation strategies. We evaluated three alignment strategies (Direct, Xenome, Combined) and three processing pipelines (Juicer, HiC-Pro, HiCExplorer) on the quality of Hi-C data. Removal of mouse reads had little-to-no effect on data quality compared with the results obtained with the Direct alignment strategy. The Juicer pipeline extracted the most useful information from PDX Hi-C data. However, library preparation strategy had the largest effect on all quality metrics. Together, our study presents comprehensive guidelines on PDX Hi-C data processing.
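
An in silico mixture with a fixed share of mouse reads, such as the 10% and 30% levels mentioned in this abstract, comes down to a small piece of arithmetic. The sketch below is an illustration only, not the authors' procedure; `mouse_reads_needed` is a hypothetical helper. If a dataset has h human reads and the mouse share of the final mixture should be f, the number of mouse reads to add is m = h * f / (1 - f), because m / (m + h) = f.

```python
def mouse_reads_needed(human_reads, mouse_fraction):
    """Mouse reads to add so they make up `mouse_fraction` of the mixture.

    Hypothetical helper: solves m / (m + human_reads) = mouse_fraction for m.
    """
    if not 0 <= mouse_fraction < 1:
        raise ValueError("mouse_fraction must be in [0, 1)")
    return round(human_reads * mouse_fraction / (1 - mouse_fraction))


for human, fraction in [(900, 0.10), (700, 0.30)]:
    mouse = mouse_reads_needed(human, fraction)
    achieved = mouse / (mouse + human)
    print(human, mouse, round(achieved, 3))
```

Note that the count is taken relative to the final mixture, not to the human reads alone: adding 10% as many mouse reads as human reads would give a mouse share of about 9.1%, not 10%.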

Validation of methods for Low-volume RNA-seq

10.1101/006130 ◽

2014 ◽

Author(s):

Peter Acuña Combs ◽

Michael B Eisen

Keyword(s):

Single Cell ◽

Rna Sequencing ◽

Critical Evaluation ◽

Standard Protocol ◽

Library Preparation ◽

Rna Seq ◽

Simple Modification ◽

Input Sample ◽

Low Volume ◽

The Cost

Recently, a number of protocols extending RNA-sequencing to the single-cell regime have been published. However, we were concerned that the additional steps to deal with such minute quantities of input sample would introduce serious biases that would make analysis of the data using existing approaches invalid. In this study, we performed a critical evaluation of several of these low-volume RNA-seq protocols, and found that they performed slightly less well in metrics of interest to us than a more standard protocol, but with at least two orders of magnitude less sample required. We also explored a simple modification to one of these protocols that, for many samples, reduced the cost of library preparation to approximately $20/sample.

Comprehensive analysis of RNA-seq kits for standard, low and ultra-low quantity samples

10.1101/524439 ◽

2019 ◽

Author(s):

Marie-Ange Palomares ◽

Cyril Dalmasso ◽

Eric Bonnet ◽

Céline Derbois ◽

Solène Brohard-Julien ◽

...

Keyword(s):

Gene Expression ◽

Rna Sequencing ◽

Alternative Transcript ◽

Library Preparation ◽

Rna Seq ◽

Reference Tissue ◽

Gold Standard Method ◽

Differential Gene ◽

Commercial Kits ◽

Whole Transcriptome

Abstract: High-throughput RNA-sequencing has become the gold standard method for whole-transcriptome gene expression analysis, and is widely used in numerous applications to study cell and tissue transcriptomes. It is also being increasingly used in a number of clinical applications, including expression profiling for diagnostics and alternative transcript detection. However, despite its many advantages, RNA sequencing can be challenging in some situations, for instance in cases of low input amounts or degraded RNA samples. Several protocols have been proposed to overcome these challenges, and many are available as commercial kits. In this study, we comprehensively test three recent commercial technologies for RNA-seq library preparation (TruSeq, SMARTer and SMARTer Ultra-Low) on human reference tissue preparations, using standard (1 μg), low (100 and 10 ng) and ultra-low (< 1 ng) input amounts, and for mRNA and total RNA, stranded or unstranded. The results are analyzed using read quality and alignment metrics, gene detection and differential gene expression metrics. Overall, we show that the TruSeq kit performs well with an input amount of 100 ng, while the SMARTer kit shows degraded performance for inputs of 100 and 10 ng, and the SMARTer Ultra-Low kit performs relatively well for input amounts < 1 ng. All the results are discussed in detail, and we provide guidelines for biologists for the selection of an RNA-seq library preparation kit.

Deploying new generation sequencing for the study of flesh color depletion in Atlantic Salmon (Salmo salar)

10.21203/rs.3.rs-273128/v1 ◽

2021 ◽

Author(s):

Thu Thi Minh Vo ◽

Tuan Viet Nguyen ◽

Gianluca Amoroso ◽

Tomer Ventura ◽

Abigail Elizur

Keyword(s):

Lipid Metabolism ◽

Atlantic Salmon ◽

Rna Sequencing ◽

Metal Ion ◽

Splice Variants ◽

Library Preparation ◽

Rna Seq ◽

Flesh Color ◽

Fatty Acid Binding ◽

Farmed Atlantic Salmon

Abstract Background: The flesh pigmentation of farmed Atlantic salmon is formed by accumulation of carotenoids derived from commercial diets. In the salmon gastrointestinal system, the hindgut is considered critical in the processes of carotenoid uptake and metabolism. In Tasmania, flesh color depletion can noticeably affect farmed Atlantic salmon at different levels of severity following extremely hot summers. In this study, RNA sequencing (RNA-Seq) was performed to investigate the reduction in flesh pigmentation. Library preparation is a key step that significantly impacts the effectiveness of RNA-Seq experiments. Besides the commonly used whole transcript RNA-Seq method, the 3’ mRNA-Seq method is being applied widely, owing to its reduced cost, enabling more repeats to be sequenced at the expense of lower resolution. Therefore, the output of the Illumina TruSeq kit (whole transcript RNA-Seq) and the Lexogen QuantSeq kit (3’ mRNA-Seq) was analyzed to identify differentially expressed genes (DEGs) in the Atlantic salmon hindgut between two flesh color phenotypes. Results: In both methods, DEGs between the two color phenotypes were associated with metal ion transport, oxidation-reduction processes, and immune responses. We also found DEGs related to lipid metabolism in the QuantSeq method. In the TruSeq method, a missense mutation was detected in DEGs in different flesh color traits. The number of DEGs found in the TruSeq libraries was much higher than in the QuantSeq libraries; however, the trend of DEGs in both library methods was similar and validated by qPCR. Conclusion: Flesh coloration in Atlantic salmon is related to lipid metabolism in which apolipoproteins, serum albumin and fatty acid-binding protein genes are hypothesized to be linked to the absorption, transport and deposition of carotenoids. Our findings suggest that Grp could inhibit the feeding behavior of low color-banded fish, resulting in the dietary carotenoid shortage. Several SNPs in genes involved in carotenoid-binding cholesterol and oxidative stress were detected in both flesh color phenotypes. Regarding the choice of the library preparation method, the selection criteria depend on the research design and purpose. The 3’ mRNA-Seq method is ideal for targeted identification of highly expressed genes, while the whole transcript RNA-Seq method is recommended for identification of unknown genes, enabling the identification of splice variants and trait-associated SNPs, as we have found for Duox2 and DuoxA1.

Identification and removal of sequencing artifacts produced by mispriming during reverse transcription in multiple RNA-seq technologies

10.1101/339887 ◽

2018 ◽

Cited By ~ 1

Author(s):

Haridha Shivram ◽

Vishwanath R. Iyer

Keyword(s):

Reverse Transcriptase ◽

Rna Sequencing ◽

Reverse Transcription ◽

Group Ii Intron ◽

Computational Pipeline ◽

Rna Seq ◽

Sequencing Data ◽

Analysis Pipeline ◽

Specific Priming

Abstract: The quality of RNA sequencing data relies on specific priming by the primer used for reverse transcription (RT-primer). Non-specific annealing of the RT-primer to the RNA template can generate reads with incorrect cDNA ends and can cause misinterpretation of data (RT mispriming). This kind of artifact in RNA-seq based technologies is underappreciated and currently no adequate tools exist to computationally remove them from published datasets. We show that mispriming can occur with as little as 2 bases of complementarity at the 3’ end of the primer followed by intermittent regions of complementarity. We also provide a computational pipeline that identifies cDNA reads produced from RT mispriming, allowing users to filter them out from any aligned dataset. Using this analysis pipeline, we identify thousands of mispriming events in a dozen published datasets from diverse technologies including short RNA-seq, total/mRNA-seq, HITS-CLIP and GRO-seq. We further show how RT-mispriming can lead to misinterpretation of data. In addition to providing a solution to computationally remove RT-misprimed reads, we also propose an experimental solution to avoid RT-mispriming by performing RNA-seq using thermostable group II intron derived reverse transcriptase (TGIRT-seq).
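
The core test behind flagging a possible mispriming site, complementarity between the 3’ end of the RT-primer and the template, can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, sequences are written in the DNA alphabet 5’ to 3’, `template_site` is assumed to be the template-strand sequence where the primer would sit (how it is extracted from an alignment is not shown), and the check for the intermittent upstream complementarity mentioned in the abstract is left out.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def primer_3prime_can_anneal(primer, template_site, k=2):
    """True if the last `k` primer bases are complementary to the site.

    Strands pair antiparallel, so the 3'-terminal `k` bases of the primer
    pair with the first `k` bases (5' end) of `template_site`.
    Hypothetical helper; `k=2` follows the two-base figure in the abstract.
    """
    if k < 1 or k > len(primer) or k > len(template_site):
        raise ValueError("k must fit within both sequences")
    return reverse_complement(primer[-k:]) == template_site[:k].upper()


primer = "GATCGTCGGACT"  # made-up primer sequence for illustration
print(primer_3prime_can_anneal(primer, "AGTTTTTTTTTT"))  # site starts with AG
print(primer_3prime_can_anneal(primer, "TTAGTTTTTTTT"))  # site starts with TT
```

A hit from a check this short is only a candidate: with k = 2, one site in sixteen matches by chance, which is why additional evidence would be needed before filtering a read.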