Baiting out a full length sequence from unmapped RNA-seq data

BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Dongwei Li ◽  
Qitong Huang ◽  
Lei Huang ◽  
Jikai Wen ◽  
Jing Luo ◽  
...  

Abstract Background As a powerful tool, RNA-Seq has been widely used in various studies. Unmapped RNA-seq reads have usually been considered useless and have been discarded or ignored. Results We develop a strategy for mining the full-length sequence from unmapped reads by combining specific reverse transcription primer design with high-throughput sequencing. In this study, we salvage 36 unmapped reads from standard RNA-Seq data and randomly select one 149 bp read as a model. Specific reverse transcription primers are designed to amplify both of its ends, followed by next-generation sequencing. We then design a statistical model based on a power law distribution to estimate its integrity and significance. Further, we validate it by Sanger sequencing. The result shows that the full length is 1556 bp, with insertion mutations in a microsatellite structure. Conclusion We believe this method would be a useful strategy for extracting sequence information from unmapped RNA-seq data. Further, it is an alternative way to obtain the full-length sequence of an unknown cDNA.

2021 ◽  
Vol 118 (13) ◽  
pp. e2025595118
Author(s):  
Hao Hu ◽  
Nora Flynn ◽  
Hailei Zhang ◽  
Chenjiang You ◽  
Runlai Hang ◽  
...  

Nicotinamide adenine dinucleotide (NAD+) is a novel messenger RNA 5′ cap in Escherichia coli, yeast, mammals, and Arabidopsis. Transcriptome-wide identification of NAD+-capped RNAs (NAD-RNAs) was accomplished through NAD captureSeq, which combines chemoenzymatic RNA enrichment with high-throughput sequencing. NAD-RNAs are enzymatically converted to alkyne-RNAs that are then biotinylated using a copper-catalyzed azide–alkyne cycloaddition (CuAAC) reaction. Originally applied to E. coli RNA, which lacks the m7G cap, NAD captureSeq was then applied to eukaryotes without extensive verification of its specificity for NAD-RNAs vs. m7G-capped RNAs (m7G-RNAs). In addition, the Cu2+ ion in the CuAAC reaction causes RNA fragmentation, leading to greatly reduced yield and loss of full-length sequence information. We developed an NAD-RNA capture scheme utilizing the copper-free, strain-promoted azide–alkyne cycloaddition reaction (SPAAC). We examined the specificity of CuAAC and SPAAC reactions toward NAD-RNAs and m7G-RNAs and found that both prefer the former, but also act on the latter. We demonstrated that SPAAC-NAD sequencing (SPAAC-NAD-seq), when combined with immunodepletion of m7G-RNAs, enables NAD-RNA identification with accuracy and sensitivity, leading to the discovery of new NAD-RNA profiles in Arabidopsis. Furthermore, SPAAC-NAD-seq retained full-length sequence information. Therefore, SPAAC-NAD-seq would enable specific and efficient discovery of NAD-RNAs in prokaryotes and, when combined with m7G-RNA depletion, in eukaryotes.


2019 ◽  
Vol 21 (Supplement_6) ◽  
pp. vi101-vi101
Author(s):  
Piroon Jejaroenpun ◽  
Thidathip Wongsurawat ◽  
Annick DeLoose ◽  
David Ussery ◽  
Intawat Nookaew ◽  
...  

Abstract The RNA sequencing (RNA-Seq) technique is now routinely used to quantitatively explore genome-wide expression in various research fields, including cancer research. The most common RNA-seq methodology produces billions of short reads in the range of 100–600 base pairs, from which it is often difficult to reconstruct the isoform-level transcriptome and fusion genes. The limitations of short reads can be overcome by using third-generation sequencing technologies, such as Oxford Nanopore Technologies (ONT). This study aims to perform full-length cDNA sequencing using the ONT platform and investigate the abilities of ONT in (1) identifying differential gene expression, (2) detecting differential transcript isoform usage, and (3) detecting fusion genes. To this end, CNS-1 cells were implanted into the frontal lobes of three Lewis rats. The CNS-1 model is a histocompatible astrocytoma cell line with an invasive pattern mimicking glioblastoma (GBM). Two weeks after transplantation, the transplanted tumors and the normal brain on the other side were collected as matched normal-tumor pairs. Total RNA extracted from the samples was subjected to full-length cDNA sequencing on a portable MinION sequencer. In tumor samples, 615 genes involved in the cell cycle were upregulated, whereas 1067 genes involved in neurological functions were downregulated. Finally, we could identify differential transcript isoform expression and fusion genes from the matched normal-tumor pairs. Overall, full-length sequencing of the cDNA molecules permitted a detailed characterization of the differential gene expression, the isoform complexity, and fusion genes. In the near future, we will use these methods on human samples.


2019 ◽  
Author(s):  
Camille Sessegolo ◽  
Corinne Cruaud ◽  
Corinne Da Silva ◽  
Audric Cologne ◽  
Marion Dubarry ◽  
...  

Abstract Our vision of DNA transcription and splicing has changed dramatically with the introduction of short-read sequencing. These high-throughput sequencing technologies promised to unravel the complexity of any transcriptome. Generally gene expression levels are well-captured using these technologies, but there are still remaining caveats due to the limited read length and the fact that RNA molecules had to be reverse transcribed before sequencing. Oxford Nanopore Technologies has recently launched a portable sequencer which offers the possibility of sequencing long reads and most importantly RNA molecules. Here we generated a full mouse transcriptome from brain and liver using the Oxford Nanopore device. As a comparison, we sequenced RNA (RNA-Seq) and cDNA (cDNA-Seq) molecules using both long and short reads technologies and tested the TeloPrime preparation kit, dedicated to the enrichment of full-length transcripts. Using spike-in data, we confirmed that expression levels are efficiently captured by cDNA-Seq using short reads. More importantly, Oxford Nanopore RNA-Seq tends to be more efficient, while cDNA-Seq appears to be more biased. We further show that the cDNA library preparation of the Nanopore protocol induces read truncation for transcripts containing internal runs of T’s. This bias is marked for runs of at least 15 T’s, but is already detectable for runs of at least 9 T’s and therefore concerns more than 20% of expressed transcripts in mouse brain and liver. Finally, we outline that bioinformatics challenges remain ahead for quantifying at the transcript level, especially when reads are not full-length. Accurate quantification of repeat-associated genes such as processed pseudogenes also remains difficult, and we show that current mapping protocols which map reads to the genome largely over-estimate their expression, at the expense of their parent gene. The entire dataset is available from http://www.genoscope.cns.fr/externe/ONT_mouse_RNA.


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Camille Sessegolo ◽  
Corinne Cruaud ◽  
Corinne Da Silva ◽  
Audric Cologne ◽  
Marion Dubarry ◽  
...  

Abstract Our vision of DNA transcription and splicing has changed dramatically with the introduction of short-read sequencing. These high-throughput sequencing technologies promised to unravel the complexity of any transcriptome. Generally gene expression levels are well-captured using these technologies, but there are still remaining caveats due to the limited read length and the fact that RNA molecules had to be reverse transcribed before sequencing. Oxford Nanopore Technologies has recently launched a portable sequencer which offers the possibility of sequencing long reads and most importantly RNA molecules. Here we generated a full mouse transcriptome from brain and liver using the Oxford Nanopore device. As a comparison, we sequenced RNA (RNA-Seq) and cDNA (cDNA-Seq) molecules using both long and short reads technologies and tested the TeloPrime preparation kit, dedicated to the enrichment of full-length transcripts. Using spike-in data, we confirmed that expression levels are efficiently captured by cDNA-Seq using short reads. More importantly, Oxford Nanopore RNA-Seq tends to be more efficient, while cDNA-Seq appears to be more biased. We further show that the cDNA library preparation of the Nanopore protocol induces read truncation for transcripts containing internal runs of T’s. This bias is marked for runs of at least 15 T’s, but is already detectable for runs of at least 9 T’s and therefore concerns more than 20% of expressed transcripts in mouse brain and liver. Finally, we outline that bioinformatics challenges remain ahead for quantifying at the transcript level, especially when reads are not full-length. Accurate quantification of repeat-associated genes such as processed pseudogenes also remains difficult, and we show that current mapping protocols which map reads to the genome largely over-estimate their expression, at the expense of their parent gene.
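The T-run criterion described in the abstract above can be illustrated with a small scan over a transcript sequence. This is only a minimal sketch and not the authors' pipeline: the function name `find_t_runs` and the example sequence are hypothetical, and the thresholds of 9 and 15 are taken from the abstract's wording ("runs of at least 9 T's", "runs of at least 15 T's").

```python
import re

def find_t_runs(transcript, min_len=9):
    """Return (start, length) for every run of at least min_len T's.

    Positions are 0-based. U is treated as T so RNA-alphabet input also works.
    (Hypothetical helper for illustration; not taken from the paper.)
    """
    seq = transcript.upper().replace("U", "T")
    pattern = re.compile("T{%d,}" % min_len)
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(seq)]


# Made-up example sequence containing one run of 9 T's and one run of 15 T's.
example = "ACG" + "T" * 9 + "GCA" + "T" * 15 + "A"
print(find_t_runs(example))              # runs of at least 9 T's
print(find_t_runs(example, min_len=15))  # runs of at least 15 T's
```

A transcript for which the function returns a non-empty list at `min_len=9` would fall into the class the abstract reports as already showing detectable truncation bias.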


2019 ◽  
Author(s):  
Jessica M. Warren ◽  
Thalia Salinas-Giegé ◽  
Guillaume Hummel ◽  
Nicole L. Coots ◽  
Joshua M. Svendsen ◽  
...  

Abstract Differences in tRNA expression have been implicated in a remarkable number of biological processes. There is growing evidence that tRNA genes can play dramatically different roles depending on both expression and post-transcriptional modification, yet sequencing tRNAs to measure abundance and detect modifications remains challenging. Their secondary structure and extensive post-transcriptional modifications interfere with RNA-seq library preparation methods and have limited the utility of high-throughput sequencing technologies. Here, we combine two modifications to standard RNA-seq methods by treating with the demethylating enzyme AlkB and ligating with tRNA-specific adapters in order to sequence tRNAs from four species of flowering plants, a group that has been shown to have some of the most extensive rates of post-transcriptional tRNA modifications. This protocol has the advantage of detecting full-length tRNAs and sequence variants that can be used to infer many post-transcriptional modifications. We used the resulting data to produce a modification index of almost all unique reference tRNAs in Arabidopsis thaliana, which exhibited many anciently conserved similarities with humans but also positions that appear to be “hot spots” for modifications in angiosperm tRNAs. We also found evidence based on northern blot analysis and droplet digital PCR that, even after demethylation treatment, tRNA-seq can produce highly biased estimates of absolute expression levels, most likely due to biased reverse transcription. Nevertheless, the generation of full-length tRNA sequences with modification data is still promising for assessing differences in relative tRNA expression across treatments, tissues or subcellular fractions and may help elucidate the functional roles of tRNA modifications.


Animals ◽  
2021 ◽  
Vol 11 (5) ◽  
pp. 1423
Author(s):  
André Albuquerque ◽  
Cristina Óvilo ◽  
Yolanda Núñez ◽  
Rita Benítez ◽  
Adrián López-Garcia ◽  
...  

Gene expression is one of the main factors influencing meat quality by modulating fatty acid metabolism, composition, and deposition rates in muscle tissue. This study aimed to explore the transcriptomics of the Longissimus lumborum muscle in two local pig breeds with distinct genetic backgrounds using next-generation sequencing technology and Real-Time qPCR. RNA-seq yielded 49 differentially expressed genes between breeds, 34 overexpressed in the Alentejano (AL) and 15 in the Bísaro (BI) breed. Specific slow type myosin heavy chain components were associated with AL (MYH7) and BI (MYH3) pigs, while an overexpression of MAP3K14 in AL may be associated with their lower loin proportion, induced insulin resistance, and increased inflammatory response via NFkB activation. Overexpression of RUFY1 in AL pigs may explain the higher intramuscular fat (IMF) content via higher GLUT4 recruitment and consequently higher uptake of glucose that can be stored as fat. Several candidate genes for lipid metabolism, excluded in the RNA-seq analysis due to low counts, such as ACLY, ADIPOQ, ELOVL6, LEP and ME1, were identified by qPCR as main gene factors defining the processes that influence meat composition and quality. These results agree with the fatter profile of the AL pig breed, and adiponectin resistance can be postulated as responsible for the overexpression of MAP3K14's coding product NIK, failing to restore insulin sensitivity.


Genes ◽  
2021 ◽  
Vol 12 (6) ◽  
pp. 794
Author(s):  
Cullen Horstmann ◽  
Victoria Davenport ◽  
Min Zhang ◽  
Alyse Peters ◽  
Kyoungtae Kim

Next-generation sequencing (NGS) technology has revolutionized sequence-based research. In recent years, high-throughput sequencing has become the method of choice in studying the toxicity of chemical agents through observing and measuring changes in transcript levels. Engineered nanomaterial (ENM) toxicity has become a major field of research and has adopted microarray and newer RNA-Seq methods. Recently, nanotechnology has become a promising tool in the diagnosis and treatment of several diseases in humans. However, due to their high stability, nanomaterials are likely capable of remaining in the body and environment for long periods of time. Their mechanisms of toxicity and long-lasting effects on our health are still poorly understood. This review explores the effects of three ENMs, carbon nanotubes (CNTs), quantum dots (QDs), and Ag nanoparticles (AgNPs), by cross-examining publications on transcriptomic changes induced by these nanomaterials.


Pathogens ◽  
2021 ◽  
Vol 10 (4) ◽  
pp. 405
Author(s):  
Anna Matysiak ◽  
Michal Kabza ◽  
Justyna A. Karolak ◽  
Marcelina M. Jaworska ◽  
Malgorzata Rydzanicz ◽  
...  

The ocular microbiome composition has only been partially characterized. Here, we used RNA-sequencing (RNA-Seq) data to assess microbial diversity in human corneal tissue. Additionally, conjunctival swab samples were examined to characterize ocular surface microbiota. Short RNA-Seq reads, obtained from a previous transcriptome study of 50 corneal tissues, were mapped to the human reference genome GRCh38 to remove sequences of human origin. The unmapped reads were then used for taxonomic classification by comparing them with known bacterial, archaeal, and viral sequences from public databases. The components of microbial communities were identified and characterized using both conventional microbiology and polymerase chain reaction (PCR) techniques in 36 conjunctival swabs. The majority of ocular samples examined by conventional and molecular techniques showed very similar microbial taxonomic profiles, with most of the microorganisms being classified into Proteobacteria, Firmicutes, and Actinobacteria phyla. Only 50% of conjunctival samples exhibited bacterial growth. The PCR detection provided a broader overview of positive results for conjunctival materials. The RNA-Seq assessment revealed significant variability of the corneal microbial communities, including fastidious bacteria and viruses. The use of the combined techniques allowed for a comprehensive characterization of the eye microbiome’s elements, especially in aspects of microbiota diversity.
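The host-read removal step described above (map to GRCh38, keep what does not map) can be sketched as a filter over alignment records. This is an illustrative sketch under assumptions, not the study's workflow: the abstract does not name an aligner or file format, so SAM text output is assumed here, and `unmapped_reads` and the two example records are hypothetical. The only format fact relied on is that bit 0x4 of the SAM FLAG field marks an unmapped segment.

```python
def unmapped_reads(sam_lines):
    """Yield (read_name, sequence) for SAM records whose FLAG has bit 0x4 set.

    Assumes tab-separated SAM text; header lines (starting with '@') and
    records with fewer than the 11 mandatory fields are skipped.
    (Hypothetical helper for illustration; not taken from the paper.)
    """
    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue
        flag = int(fields[1])
        if flag & 0x4:  # 0x4 = segment unmapped
            yield fields[0], fields[9]


# Made-up records: r1 aligned to the reference, r2 did not.
records = [
    "@HD\tVN:1.6\tSO:unsorted",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tGGCA\tIIII",
]
for name, seq in unmapped_reads(records):
    print(name, seq)
```

The sequences yielded this way would then be the input to whatever taxonomic classifier is used downstream; that step is outside this sketch.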

