Bias in RNA-seq Library Preparation: Current Challenges and Solutions

2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Huajuan Shi ◽  
Ying Zhou ◽  
Erteng Jia ◽  
Min Pan ◽  
Yunfei Bai ◽  
...  

Although RNA sequencing (RNA-seq) has become the most advanced technology for transcriptome analysis, it still faces various challenges. The RNA-seq workflow is highly complex and prone to bias, which can compromise the quality of RNA-seq datasets and lead to incorrect interpretation of sequencing results. A detailed understanding of the source and nature of these biases is therefore essential for interpreting RNA-seq data, for improving the quality of RNA-seq experiments, and for developing bioinformatics tools that compensate for these biases. Here, we discuss the sources of experimental bias in RNA-seq and, for each type of bias, methods for improvement, in order to provide useful suggestions for researchers performing RNA-seq experiments.

BMC Biology ◽  
2021 ◽  
Vol 19 (1) ◽  
Author(s):  
Milda Mickutė ◽  
Kotryna Kvederavičiūtė ◽  
Aleksandr Osipenko ◽  
Raminta Mineikaitė ◽  
Saulius Klimašauskas ◽  
...  

Abstract Background: Targeted installation of designer chemical moieties on biopolymers provides an orthogonal means for their visualisation, manipulation and sequence analysis. Although high-throughput RNA sequencing is a widely used method for transcriptome analysis, certain steps, such as 3′ adapter ligation in strand-specific RNA sequencing, remain challenging due to structure- and sequence-related biases introduced by RNA ligases, leading to misrepresentation of particular RNA species. Here, we remedy this limitation by adapting two RNA 2′-O-methyltransferases from the Hen1 family for orthogonal chemo-enzymatic click tethering of a 3′ sequencing adapter that supports cDNA production by reverse transcription of the tagged RNA. Results: We showed that the ssRNA-specific DmHen1 and dsRNA-specific AtHEN1 can be used to efficiently append an oligonucleotide adapter to the 3′ end of target RNA for sequencing library preparation. Using this new chemo-enzymatic approach, we identified miRNAs and prokaryotic small non-coding sRNAs in probiotic Lactobacillus casei BL23. We found that compared to a reference conventional RNA library preparation, methyltransferase-Directed Orthogonal Tagging and RNA sequencing, mDOT-seq, avoids misdetection of unspecific highly-structured RNA species, thus providing better accuracy in identifying the groups of transcripts analysed. Our results suggest that mDOT-seq has the potential to advance analysis of eukaryotic and prokaryotic ssRNAs. Conclusions: Our findings provide a valuable resource for studies of the RNA-centred regulatory networks in Lactobacilli and pave the way to developing novel transcriptome and epitranscriptome profiling approaches in vitro and inside living cells. As RNA methyltransferases share the structure of the AdoMet-binding domain and several specific cofactor binding features, the basic principles of our approach could be easily translated to other AdoMet-dependent enzymes for the development of modification-specific RNA-seq techniques.


2020 ◽  
Vol 117 (6) ◽  
pp. 2886-2893 ◽  
Author(s):  
Lin Di ◽  
Yusi Fu ◽  
Yue Sun ◽  
Jie Li ◽  
Lu Liu ◽  
...  

Transcriptome profiling by RNA sequencing (RNA-seq) has been widely used to characterize cellular status, but it relies on second-strand complementary DNA (cDNA) synthesis to generate initial material for library preparation. Here we use bacterial transposase Tn5, which has been increasingly used in various high-throughput DNA analyses, to construct RNA-seq libraries without second-strand synthesis. We show that Tn5 transposome can randomly bind RNA/DNA heteroduplexes and add sequencing adapters onto RNA directly after reverse transcription. This method, Sequencing HEteRo RNA-DNA-hYbrid (SHERRY), is versatile and scalable. SHERRY accepts a wide range of starting materials, from bulk RNA to single cells. SHERRY offers a greatly simplified protocol and produces results with higher reproducibility and GC uniformity compared with prevailing RNA-seq methods.


2019 ◽  
Vol 10 (1) ◽  
Author(s):  
Beate Vieth ◽  
Swati Parekh ◽  
Christoph Ziegenhain ◽  
Wolfgang Enard ◽  
Ines Hellmann

Abstract The recent rapid spread of single cell RNA sequencing (scRNA-seq) methods has created a large variety of experimental and computational pipelines for which best practices have not yet been established. Here, we use simulations based on five scRNA-seq library protocols in combination with nine realistic differential expression (DE) setups to systematically evaluate three mapping, four imputation, seven normalisation and four differential expression testing approaches resulting in ~3000 pipelines, allowing us to also assess interactions among pipeline steps. We find that choices of normalisation and library preparation protocols have the biggest impact on scRNA-seq analyses. Specifically, we find that library preparation determines the ability to detect symmetric expression differences, while normalisation dominates pipeline performance in asymmetric DE-setups. Finally, we illustrate the importance of informed choices by showing that a good scRNA-seq pipeline can have the same impact on detecting a biological signal as quadrupling the sample size.


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Thu Thi Minh Vo ◽  
Tuan Viet Nguyen ◽  
Gianluca Amoroso ◽  
Tomer Ventura ◽  
Abigail Elizur

Abstract Background: The flesh pigmentation of farmed Atlantic salmon is formed by accumulation of carotenoids derived from commercial diets. In the salmon gastrointestinal system, the hindgut is considered critical in the processes of carotenoid uptake and metabolism. In Tasmania, flesh color depletion can noticeably affect farmed Atlantic salmon at different levels of severity following extremely hot summers. In this study, RNA sequencing (RNA-Seq) was performed to investigate the reduction in flesh pigmentation. Library preparation is a key step that significantly impacts the effectiveness of RNA-Seq experiments. Besides the commonly used whole transcript RNA-Seq method, the 3’ mRNA-Seq method is being applied widely, owing to its reduced cost, enabling more repeats to be sequenced at the expense of lower resolution. Therefore, the output of the Illumina TruSeq kit (whole transcript RNA-Seq) and the Lexogen QuantSeq kit (3’ mRNA-Seq) was analyzed to identify genes in the Atlantic salmon hindgut that are differentially expressed (DEGs) between two flesh color phenotypes. Results: In both methods, DEGs between the two color phenotypes were associated with metal ion transport, oxidation-reduction processes, and immune responses. We also found DEGs related to lipid metabolism in the QuantSeq method. In the TruSeq method, a missense mutation was detected in DEGs in different flesh color traits. The number of DEGs found in the TruSeq libraries was much higher than in the QuantSeq libraries; however, the trend of DEGs in both library methods was similar and validated by qPCR. Conclusions: Flesh coloration in Atlantic salmon is related to lipid metabolism in which apolipoproteins, serum albumin and fatty acid-binding protein genes are hypothesized to be linked to the absorption, transport and deposition of carotenoids. Our findings suggest that Grp could inhibit the feeding behavior of low color-banded fish, resulting in the dietary carotenoid shortage. Several SNPs in genes involved in carotenoid-binding cholesterol and oxidative stress were detected in both flesh color phenotypes. Regarding the choice of the library preparation method, the selection criteria depend on the research design and purpose. The 3’ mRNA-Seq method is ideal for targeted identification of highly expressed genes, while the whole RNA-Seq method is recommended for identification of unknown genes, enabling the identification of splice variants and trait-associated SNPs, as we have found for duox2 and duoxa1.


2020 ◽  
Author(s):  
Mikhail G. Dozmorov ◽  
Katarzyna M. Tyc ◽  
Nathan C. Sheffield ◽  
David C. Boyd ◽  
Amy L. Olex ◽  
...  

Abstract Sequencing of patient-derived xenograft (PDX) mouse models allows investigation of the molecular mechanisms of human tumor samples engrafted in a mouse host. Thus, both human and mouse genetic material is sequenced. Several methods have been developed to remove mouse sequencing reads from RNA-seq or exome sequencing PDX data and improve the downstream signal. However, for more recent chromatin conformation capture technologies (Hi-C), the effect of mouse reads remains undefined. We evaluated the effect of mouse read removal on the quality of Hi-C data using in silico created PDX Hi-C data with 10% and 30% mouse reads. Additionally, we generated two experimental PDX Hi-C datasets using different library preparation strategies. We evaluated three alignment strategies (Direct, Xenome, Combined) and three processing pipelines (Juicer, HiC-Pro, HiCExplorer) on the quality of Hi-C data. Removal of mouse reads had little-to-no effect on data quality compared with the results obtained with the Direct alignment strategy. The Juicer pipeline extracted the most useful information from PDX Hi-C data. However, library preparation strategy had the largest effect on all quality metrics. Together, our study presents comprehensive guidelines on PDX Hi-C data processing.


2014 ◽  
Author(s):  
Peter Acuña Combs ◽  
Michael B Eisen

Recently, a number of protocols extending RNA-sequencing to the single-cell regime have been published. However, we were concerned that the additional steps to deal with such minute quantities of input sample would introduce serious biases that would make analysis of the data using existing approaches invalid. In this study, we performed a critical evaluation of several of these low-volume RNA-seq protocols, and found that they performed slightly less well in metrics of interest to us than a more standard protocol, but with at least two orders of magnitude less sample required. We also explored a simple modification to one of these protocols that, for many samples, reduced the cost of library preparation to approximately $20/sample.


2019 ◽  
Author(s):  
Marie-Ange Palomares ◽  
Cyril Dalmasso ◽  
Eric Bonnet ◽  
Céline Derbois ◽  
Solène Brohard-Julien ◽  
...  

Abstract High-throughput RNA-sequencing has become the gold standard method for whole-transcriptome gene expression analysis, and is widely used in numerous applications to study cell and tissue transcriptomes. It is also being increasingly used in a number of clinical applications, including expression profiling for diagnostics and alternative transcript detection. However, despite its many advantages, RNA sequencing can be challenging in some situations, for instance in cases of low input amounts or degraded RNA samples. Several protocols have been proposed to overcome these challenges, and many are available as commercial kits. In this study, we comprehensively test three recent commercial technologies for RNA-seq library preparation (TruSeq, SMARTer and SMARTer Ultra-Low) on human reference tissue preparations, using standard (1 μg), low (100 and 10 ng) and ultra-low (< 1 ng) input amounts, and for mRNA and total RNA, stranded or unstranded. The results are analyzed using read quality and alignment metrics, gene detection and differential gene expression metrics. Overall, we show that the TruSeq kit performs well with an input amount of 100 ng, while the SMARTer kit shows degraded performance for inputs of 100 and 10 ng, and the SMARTer Ultra-Low kit performs relatively well for input amounts < 1 ng. All the results are discussed in detail, and we provide guidelines for biologists for the selection of an RNA-seq library preparation kit.


2020 ◽  
Author(s):  
Benjamin Kellman ◽  
Hratch Baghdassarian ◽  
Tiziano Pramparo ◽  
Isaac Shamie ◽  
Vahid Gazestani ◽  
...  

Abstract Background: Both RNA-Seq and sample freeze-thaw are ubiquitous. However, knowledge about the impact of freeze-thaw on downstream analyses is limited. The lack of common quality metrics that are sufficiently sensitive to freeze-thaw and RNA degradation, e.g. the RNA Integrity Number (RIN), makes such assessments challenging. Results: Here we quantify the impact of repeated freeze-thaw cycles on the reliability of RNA-Seq by examining poly(A)-enriched and ribosomal RNA depleted RNA-seq from frozen leukocytes drawn from a toddler Autism cohort. To do so, we estimate the relative noise, or percentage of random counts, separating technical replicates. Using this approach we measured noise associated with RIN and freeze-thaw cycles. As expected, RIN does not fully capture sample degradation due to freeze-thaw. We further examined differential expression results and found that three freeze-thaws should extinguish the differential expression reproducibility of similar experiments. Freeze-thaw also resulted in a 3’ shift in the read coverage distribution along the gene body of poly(A)-enriched samples compared to ribosomal RNA depleted samples, suggesting that library preparation may exacerbate freeze-thaw-induced sample degradation. Conclusion: The use of poly(A)-enrichment for RNA sequencing is pervasive in library preparation of frozen tissue, and thus, it is important during experimental design and data analysis to consider the impact of repeated freeze-thaw cycles on reproducibility.
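The 3’ shift in gene-body coverage described in this abstract can be illustrated with a simple summary statistic. The sketch below is not the authors' metric; it is a minimal illustration that assumes coverage has already been binned along the gene body from 5’ to 3’. The function name `coverage_centroid` and the example values are hypothetical.

```python
def coverage_centroid(coverage):
    """Return the coverage-weighted mean position along the gene body.

    `coverage` is a sequence of non-negative read depths in equal-width
    bins ordered 5' to 3'. Each bin is represented by its centre, scaled
    to the interval [0, 1]. Perfectly uniform coverage gives 0.5; values
    closer to 1 mean coverage is concentrated toward the 3' end.
    """
    total = sum(coverage)
    if total == 0:
        raise ValueError("coverage must contain at least one non-zero bin")
    n_bins = len(coverage)
    weighted = sum(((i + 0.5) / n_bins) * depth for i, depth in enumerate(coverage))
    return weighted / total


# Hypothetical binned coverage for one gene in two samples.
uniform_sample = [10, 10, 10, 10]
three_prime_shifted_sample = [2, 4, 10, 24]

print(coverage_centroid(uniform_sample))
print(coverage_centroid(three_prime_shifted_sample))
```

Comparing this value between poly(A)-enriched and ribosomal RNA depleted libraries of the same sample would be one rough way to see a shift of the kind reported; a real analysis would aggregate over many genes and control for gene length.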


2018 ◽  
Author(s):  
Haridha Shivram ◽  
Vishwanath R. Iyer

Abstract The quality of RNA sequencing data relies on specific priming by the primer used for reverse transcription (RT-primer). Non-specific annealing of the RT-primer to the RNA template can generate reads with incorrect cDNA ends and can cause misinterpretation of data (RT mispriming). This kind of artifact in RNA-seq based technologies is underappreciated and currently no adequate tools exist to computationally remove them from published datasets. We show that mispriming can occur with as little as 2 bases of complementarity at the 3’ end of the primer followed by intermittent regions of complementarity. We also provide a computational pipeline that identifies cDNA reads produced from RT mispriming, allowing users to filter them out from any aligned dataset. Using this analysis pipeline, we identify thousands of mispriming events in a dozen published datasets from diverse technologies including short RNA-seq, total/mRNA-seq, HITS-CLIP and GRO-seq. We further show how RT-mispriming can lead to misinterpretation of data. In addition to providing a solution to computationally remove RT-misprimed reads, we also propose an experimental solution to avoid RT-mispriming by performing RNA-seq using thermostable group II intron derived reverse transcriptase (TGIRT-seq).
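The core idea of flagging reads whose cDNA end sits next to template sequence complementary to the 3’ end of the RT-primer can be sketched as follows. This is not the authors' pipeline: it is a simplified illustration that checks only a contiguous match at the primer's 3’ end (the "intermittent regions of complementarity" are ignored), treats sequences as DNA (U written as T), and uses hypothetical names (`is_candidate_mispriming`, `downstream_flank`) and made-up sequences.

```python
# Complement table for the DNA alphabet (assumption: U is written as T).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_candidate_mispriming(primer, downstream_flank, min_match=2):
    """Flag a read end as a candidate RT mispriming site.

    `primer` is the RT-primer written 5'->3'. `downstream_flank` is the
    template sequence (sense strand, 5'->3') immediately 3' of the
    position where the cDNA ends, i.e. where the primer would have
    annealed. The site is flagged when the first `min_match` bases of the
    flank are complementary to the last `min_match` bases of the primer.
    """
    if min_match < 1 or len(primer) < min_match or len(downstream_flank) < min_match:
        return False
    expected = reverse_complement(primer[-min_match:])
    return downstream_flank[:min_match].upper() == expected


# Hypothetical primer and flanking sequences, for illustration only.
primer = "GATCGTCGGACT"
print(is_candidate_mispriming(primer, "AGTTCA"))  # flank starts with revcomp("CT")
print(is_candidate_mispriming(primer, "TTAGCA"))  # flank does not match
```

With `min_match=2`, about one in sixteen random sites would match by chance, so a check this permissive is only a first filter; it would need to be combined with further evidence before reads are discarded.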

