SLIDR and SLOPPR: Flexible identification of spliced leader trans-splicing and prediction of eukaryotic operons from RNA-Seq data

2020 ◽  
Author(s):  
Marius A. Wenzel ◽  
Berndt Mueller ◽  
Jonathan Pettitt

Abstract Background Spliced leader (SL) trans-splicing replaces the 5’ ends of pre-mRNAs with the spliced leader, an exon derived from a specialised non-coding RNA originating from a different genomic location. This process is essential for resolving polycistronic pre-mRNAs produced by eukaryotic operons into monocistronic transcripts. SL trans-splicing and operons have independently evolved multiple times throughout Eukarya, but our understanding of these phenomena is limited to only a few well-characterised organisms, most notably C. elegans and trypanosomes. The primary barrier to systematic discovery and characterisation of SL trans-splicing and operons is the lack of computational tools for exploiting the surge of transcriptomic and genomic resources for a wide range of eukaryotes. Results Here we present two novel pipelines that automate the discovery of SLs and the prediction of operons in eukaryotic genomes from RNA-Seq data. SLIDR assembles putative SLs from 5’ read tails present after read alignment to a reference genome or transcriptome, which are then verified by interrogation of sequence motifs expected in bona fide SL RNA molecules. SLOPPR identifies RNA-Seq reads that contain a given 5’ SL sequence, quantifies genome-wide SL trans-splicing events and predicts operons via distinct patterns of SL trans-splicing events across adjacent genes. We tested both pipelines with organisms known to carry out SL trans-splicing and organise their genes into operons, and demonstrate that 1) SLIDR correctly identifies known SLs and often discovers novel SL variants; 2) SLOPPR correctly identifies functionally specialised SLs, correctly predicts known operons and detects plausible novel operons. Conclusions SLIDR and SLOPPR are flexible tools that will accelerate research into the evolutionary dynamics of SL trans-splicing and operons throughout Eukarya, and improve gene discovery and annotation for a wide range of eukaryotic genomes.
Both pipelines are implemented in Bash and R and are built upon readily available software commonly installed on most bioinformatics servers. Biological insight can be gleaned even from sparse, low-coverage datasets, implying that an untapped wealth of information can be derived from existing RNA-Seq datasets as well as from novel full-isoform sequencing protocols as they become more widely available.

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Marius A. Wenzel ◽  
Berndt Müller ◽  
Jonathan Pettitt

Abstract Background Spliced leader (SL) trans-splicing replaces the 5′ end of pre-mRNAs with the spliced leader, an exon derived from a specialised non-coding RNA originating from elsewhere in the genome. This process is essential for resolving polycistronic pre-mRNAs produced by eukaryotic operons into monocistronic transcripts. SL trans-splicing and operons may have independently evolved multiple times throughout Eukarya, yet our understanding of these phenomena is limited to only a few well-characterised organisms, most notably C. elegans and trypanosomes. The primary barrier to systematic discovery and characterisation of SL trans-splicing and operons is the lack of computational tools for exploiting the surge of transcriptomic and genomic resources for a wide range of eukaryotes. Results Here we present two novel pipelines that automate the discovery of SLs and the prediction of operons in eukaryotic genomes from RNA-Seq data. SLIDR assembles putative SLs from 5′ read tails present after read alignment to a reference genome or transcriptome, which are then verified by interrogating corresponding SL RNA genes for sequence motifs expected in bona fide SL RNA molecules. SLOPPR identifies RNA-Seq reads that contain a given 5′ SL sequence, quantifies genome-wide SL trans-splicing events and predicts operons via distinct patterns of SL trans-splicing events across adjacent genes. We tested both pipelines with organisms known to carry out SL trans-splicing and organise their genes into operons, and demonstrate that (1) SLIDR correctly detects expected SLs and often discovers novel SL variants; (2) SLOPPR correctly identifies functionally specialised SLs, correctly predicts known operons and detects plausible novel operons. Conclusions SLIDR and SLOPPR are flexible tools that will accelerate research into the evolutionary dynamics of SL trans-splicing and operons throughout Eukarya and improve gene discovery and annotation for a wide range of eukaryotic genomes. 
Both pipelines are implemented in Bash and R and are built upon readily available software commonly installed on most bioinformatics servers. Biological insight can be gleaned even from sparse, low-coverage datasets, implying that an untapped wealth of information can be retrieved from existing RNA-Seq datasets as well as from novel full-isoform sequencing protocols as they become more widely available.
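
The read-screening idea described for SLOPPR (finding reads that carry a given SL sequence at their 5′ end and tallying them per gene) can be illustrated with a minimal Python sketch. This is not the pipelines' actual implementation, which the abstract describes as Bash and R built on existing software. The SL sequence, the function names and the `min_overlap` threshold below are all hypothetical, and the sketch assumes exact matching on the sense strand only. Because a read may start partway through the leader, the code looks for the longest 3′ portion (suffix) of the SL that forms the start of the read.

```python
from collections import Counter

# Hypothetical spliced leader used only for illustration; substitute the SL
# sequence(s) known or discovered for the organism under study.
EXAMPLE_SL = "GGTACCTTAACCCAAGTTTGAG"


def sl_tail_length(read: str, sl: str, min_overlap: int = 8) -> int:
    """Return the length of the longest suffix of `sl` that is a prefix of
    `read`, or 0 if no suffix of at least `min_overlap` bases matches."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    longest = min(len(read), len(sl))
    # Try the longest possible overlap first and shrink until one matches.
    for k in range(longest, min_overlap - 1, -1):
        if read.startswith(sl[-k:]):
            return k
    return 0


def count_sl_reads(reads_by_gene, sl: str, min_overlap: int = 8) -> dict:
    """Count, per gene, the reads whose 5' end carries an SL tail.

    `reads_by_gene` is an iterable of (gene_id, read_sequence) pairs, i.e.
    reads that have already been assigned to genes by some other step.
    """
    counts = Counter()
    for gene, read in reads_by_gene:
        if sl_tail_length(read, sl, min_overlap) > 0:
            counts[gene] += 1
    return dict(counts)
```

A real analysis would also need to tolerate sequencing errors, handle strandedness and paired-end layouts, and distinguish between SL types, none of which this sketch attempts.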


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Celine Everaert ◽  
Hetty Helsmoortel ◽  
Anneleen Decock ◽  
Eva Hulstaert ◽  
Ruben Van Paemel ◽  
...  

Abstract RNA profiling has emerged as a powerful tool to investigate the biomarker potential of human biofluids. However, despite enormous interest in extracellular nucleic acids, RNA sequencing methods to quantify the total RNA content outside cells are rare. Here, we evaluate the performance of the SMARTer Stranded Total RNA-Seq method in human platelet-rich plasma, platelet-free plasma, urine, conditioned medium, and extracellular vesicles (EVs) from these biofluids. We found the method to be accurate, precise, compatible with low-input volumes and able to quantify a few thousand genes. We picked up distinct classes of RNA molecules, including mRNA, lncRNA, circRNA, miscRNA and pseudogenes. Notably, the read distribution and gene content drastically differ among biofluids. In conclusion, we are the first to show that the SMARTer method can be used for unbiased unraveling of the complete transcriptome of a wide range of biofluids and their extracellular vesicles.


2020 ◽  
Author(s):  
Ainara Elorza ◽  
Yamile Márquez ◽  
Jorge R. Cabrera ◽  
José Luis Sánchez-Trincado ◽  
María Santos-Galindo ◽  
...  

Abstract Deregulated alternative splicing has been implicated in a wide range of pathologies. Deep RNA-sequencing has revealed global mis-splicing signatures in multiple human diseases; however, for neurodegenerative diseases, these analyses are intrinsically hampered by neuronal loss and neuroinflammation in post-mortem brains. To infer splicing alterations relevant to Huntington’s disease (HD) pathogenesis, here we performed intersect-RNA-seq analyses of human post-mortem striatal tissue and of an early symptomatic mouse model in which neuronal loss and gliosis are not yet present. Together with a human/mouse parallel motif scan analysis, this approach allowed us to identify the shared mis-splicing signature triggered by the HD-causing mutation in both species and to infer upstream deregulated splicing factors. Moreover, we identified a plethora of downstream neurodegeneration-linked effector genes, whose aberrant splicing is associated with decreased protein levels in HD patients and mice. In summary, our intersect-RNA-seq approach unveiled the pathogenic contribution of mis-splicing to HD and could be readily applied to other neurodegenerative diseases for which bona fide animal models are available.


Genes ◽  
2021 ◽  
Vol 12 (2) ◽  
pp. 311
Author(s):  
Zhenqiu Liu

Single-cell RNA-seq (scRNA-seq) is a powerful tool to measure the expression patterns of individual cells and discover heterogeneity and functional diversity among cell populations. Due to variability, it is challenging to analyze such data efficiently. Many clustering methods have been developed using at least one free parameter. Different choices for free parameters may lead to substantially different visualizations and clusters. Tuning free parameters is also time consuming. Thus, there is a need for a simple, robust, and efficient clustering method. In this paper, we propose a new regularized Gaussian graphical clustering (RGGC) method for scRNA-seq data. RGGC is based on high-order (partial) correlations and subspace learning, and is robust over a wide range of the regularization parameter λ. Therefore, we can simply set λ=2 or λ=log(p), corresponding to AIC (Akaike information criterion) or BIC (Bayesian information criterion) respectively, without cross-validation. Cell subpopulations are discovered by the Louvain community detection algorithm, which determines the number of clusters automatically. There is no free parameter to be tuned with RGGC. When evaluated with simulated and benchmark scRNA-seq data sets against widely used methods, RGGC is computationally efficient and one of the top performers. It can detect inter-sample cell heterogeneity when applied to glioblastoma scRNA-seq data.


2021 ◽  
Vol 9 (5) ◽  
pp. 890
Author(s):  
Pietro Tedesco ◽  
Fortunato Palma Esposito ◽  
Antonio Masino ◽  
Giovanni Andrea Vitale ◽  
Emiliana Tortorella ◽  
...  

Extremophilic microorganisms represent a unique source of novel natural products. Among them, cold-adapted bacteria and particularly alpine microorganisms are still underexplored. Here, we describe the isolation and characterization of a novel Gram-positive, aerobic rod-shaped alpine bacterium (KRL4), isolated from sediments from the Karuola glacier in Tibet, China. Complete phenotypic analysis was performed, revealing the great adaptability of the strain to a wide range of temperatures (5–40 °C), pHs (5.5–8.5), and salinities (0–15% w/v NaCl). Genome sequencing identified KRL4 as a member of the placeholder genus Exiguobacterium_A, and annotation revealed that only half of the protein-encoding genes (1522 of 3079) could be assigned a putative function. An analysis of the secondary metabolite clusters revealed the presence of two uncharacterized phytoene synthase-containing pathways and a novel siderophore pathway. Biological assays confirmed that the strain produces molecules with antioxidant and siderophore activities. Furthermore, intracellular extracts showed nematocidal activity towards C. elegans, suggesting that strain KRL4 is a source of anthelmintic compounds.


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Yance Feng ◽  
Lei M. Li

Abstract Background Normalization of RNA-seq data aims at identifying biological expression differentiation between samples by removing the effects of unwanted confounding factors. Explicitly or implicitly, the justification of normalization requires a set of housekeeping genes. However, the existence of housekeeping genes common to a very large collection of samples, especially under a wide range of conditions, is questionable. Results We propose to carry out pairwise normalization with respect to multiple references, selected from representative samples. The pairwise intermediates are then integrated based on a linear model that adjusts for the reference effects. Motivated by the notion of housekeeping genes and their statistical counterparts, we adopt robust least trimmed squares regression in pairwise normalization. The proposed method (MUREN) is compared with other existing tools on some standard data sets. The evaluation of normalization emphasizes preserving possible asymmetric differentiation, whose biological significance is exemplified by a single-cell data set of the cell cycle. MUREN is implemented as an R package. The code is available under the GPL-3 license on GitHub (github.com/hippo-yf/MUREN) and on conda (anaconda.org/hippo-yf/r-muren). Conclusions MUREN performs RNA-seq normalization using a two-step statistical regression induced from a general principle. We propose that the densities of pairwise differentiations be used to evaluate the goodness of normalization. MUREN adjusts the mode of differentiation toward zero while preserving the skewness due to biological asymmetric differentiation. Moreover, by robustly integrating pre-normalized counts with respect to multiple references, MUREN is immune to individual outlier samples.
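
To make the role of least trimmed squares (LTS) in pairwise normalization more concrete, here is a minimal Python sketch. It is not MUREN's implementation (MUREN is an R package). It assumes a deliberately simplified model in which a sample differs from a reference by a single constant shift on the log2 scale. The function names, the `keep_fraction` parameter and the `pseudocount` are illustrative assumptions. For a one-parameter location model, the LTS estimate is the mean of the contiguous block of sorted values with the smallest sum of squared deviations, so genes that are strongly differentially expressed in one direction are trimmed away rather than pulling the estimate toward them.

```python
import math


def lts_location(values, keep_fraction: float = 0.5) -> float:
    """Least trimmed squares estimate of a location (constant) parameter.

    Keeps the h = ceil(keep_fraction * n) sorted values that form the
    tightest contiguous block and returns their mean.
    """
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("at least one value is required")
    h = max(1, math.ceil(keep_fraction * n))
    best_mean, best_ss = xs[0], float("inf")
    for start in range(n - h + 1):
        window = xs[start:start + h]
        mean = sum(window) / h
        ss = sum((x - mean) ** 2 for x in window)
        if ss < best_ss:
            best_mean, best_ss = mean, ss
    return best_mean


def pairwise_log_shift(sample_counts, reference_counts,
                       pseudocount: float = 1.0,
                       keep_fraction: float = 0.5) -> float:
    """Robust log2 shift of a sample relative to one reference sample.

    Assumes both lists hold counts for the same genes in the same order.
    """
    diffs = [
        math.log2(s + pseudocount) - math.log2(r + pseudocount)
        for s, r in zip(sample_counts, reference_counts)
    ]
    return lts_location(diffs, keep_fraction)
```

In this toy setting, subtracting the returned shift from the sample's log2 counts would align it with the reference. Combining shifts obtained against several references, as the abstract describes, is a separate integration step that the sketch does not cover.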


Diagnostics ◽  
2021 ◽  
Vol 11 (6) ◽  
pp. 964
Author(s):  
Sarka Benesova ◽  
Mikael Kubista ◽  
Lukas Valihrach

MicroRNAs (miRNAs) are a class of small RNA molecules that have an important regulatory role in multiple physiological and pathological processes. Their disease-specific profiles and presence in biofluids are properties that enable miRNAs to be employed as non-invasive biomarkers. In the past decades, several methods have been developed for miRNA analysis, including small RNA sequencing (RNA-seq). Small RNA-seq enables genome-wide profiling and analysis of known, as well as novel, miRNA variants. Moreover, its high sensitivity allows for profiling of low-input samples such as liquid biopsies, which have now found applications in diagnostics and prognostics. Still, due to technical bias and the limited ability to capture the true miRNA representation, its potential remains unfulfilled. The introduction of many new small RNA-seq approaches that tried to minimize this bias has led to the many small RNA-seq protocols seen today. Here, we review all current approaches to cDNA library construction used during the small RNA-seq workflow, with particular focus on their implementation in commercially available protocols. We provide an overview of each protocol and discuss their applicability. We also review recent benchmarking studies comparing each protocol’s performance and summarize the major conclusions that can be drawn from their usage. The results document the variable performance of the protocols and highlight their different applications in miRNA research. Taken together, our review provides a comprehensive overview of all the current small RNA-seq approaches, summarizes their strengths and weaknesses, and provides guidelines for their applications in miRNA research.


Author(s):  
Yuedan Fan ◽  
Wenjuan Zou ◽  
Jia Liu ◽  
Umar Al-Sheikh ◽  
Hankui Cheng ◽  
...  

Abstract Sensory modalities are important for survival, but their molecular mechanisms remain challenging to resolve due to the polymodal functionality of sensory neurons. Here, we report that the C. elegans outer labial lateral (OLL) sensilla sensory neurons respond to touch and cold. Mechanosensation of OLL neurons resulted in cell-autonomous mechanically-evoked Ca2+ transients and rapidly-adapting mechanoreceptor currents with a very short latency. Mechanotransduction of OLL neurons might be carried by a novel Na+ conductance channel, which is insensitive to amiloride. The bona fide mechano-gated Na+-selective degenerin/epithelial Na+ channels, TRP-4, TMC, and Piezo proteins are not involved in this mechanosensation. Interestingly, OLL neurons also mediated cold but not warm responses in a cell-autonomous manner. We further showed that the cold response of OLL neurons is not mediated by the cold receptor TRPA-1 or the temperature-sensitive glutamate receptor GLR-3. Thus, we propose the polymodal functionality of OLL neurons in mechanosensation and cold sensation.


Genetics ◽  
2003 ◽  
Vol 163 (2) ◽  
pp. 759-770 ◽  
Author(s):  
Kiyotaka Nagaki ◽  
Junqi Song ◽  
Robert M Stupar ◽  
Alexander S Parokonny ◽  
Qiaoping Yuan ◽  
...  

Abstract We sequenced two maize bacterial artificial chromosome (BAC) clones anchored by the centromere-specific satellite repeat CentC. The two BACs, consisting of ∼200 kb of cytologically defined centromeric DNA, are composed exclusively of satellite sequences and retrotransposons that can be classified as centromere specific or noncentromere specific on the basis of their distribution in the maize genome. Sequence analysis suggests that the original maize sequences were composed of CentC arrays that were expanded by retrotransposon invasions. Seven centromere-specific retrotransposons of maize (CRM) were found in BAC 16H10. The CRM elements inserted randomly into either CentC monomers or other retrotransposons. Sequence comparisons of the long terminal repeats (LTRs) of individual CRM elements indicated that these elements transposed within the last 1.22 million years. We observed that all of the previously reported centromere-specific retrotransposons in rice and barley, which belong to the same family as the CRM elements, also recently transposed with the oldest element having transposed ∼3.8 million years ago. Highly conserved sequence motifs were found in the LTRs of the centromere-specific retrotransposons in the grass species, suggesting that the LTRs may be important for the centromere specificity of this retrotransposon family.
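
As background to the LTR-based dating mentioned above, the sketch below shows one common way such ages are estimated. It is not necessarily the procedure used in the study. The two LTRs of an element are identical at insertion and then diverge independently, so an insertion age can be approximated as T = K / (2r), where K is the substitutions per site between the two LTRs and r is the substitution rate per site per year. The abstract states neither the distance model nor the rate, so the Jukes-Cantor correction and the caller-supplied rate here are assumptions, and the function names are hypothetical.

```python
import math


def jukes_cantor_distance(seq_a: str, seq_b: str) -> float:
    """Jukes-Cantor distance between two aligned, equal-length sequences.

    Assumes a gap-free alignment; every position counts as a comparable site.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    p = mismatches / len(seq_a)
    if p >= 0.75:
        raise ValueError("sequences too divergent for the Jukes-Cantor model")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def ltr_insertion_age(ltr_5prime: str, ltr_3prime: str,
                      substitutions_per_site_per_year: float) -> float:
    """Approximate insertion age in years as K / (2r).

    The factor of two reflects that both LTRs accumulate substitutions
    independently after insertion. The rate must be supplied by the caller.
    """
    k = jukes_cantor_distance(ltr_5prime, ltr_3prime)
    return k / (2.0 * substitutions_per_site_per_year)
```

The estimate scales inversely with the chosen rate, so ages computed this way are only as reliable as that assumption.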

