SLIDR and SLOPPR: Flexible identification of spliced leader trans-splicing and prediction of eukaryotic operons from RNA-Seq data

Abstract Background Spliced leader (SL) trans-splicing replaces the 5’ ends of pre-mRNAs with the spliced leader, an exon derived from a specialised non-coding RNA originating from a different genomic location. This process is essential for resolving polycistronic pre-mRNAs produced by eukaryotic operons into monocistronic transcripts. SL trans-splicing and operons have independently evolved multiple times throughout Eukarya, but our understanding of these phenomena is limited to only a few well-characterised organisms, most notably C. elegans and trypanosomes. The primary barrier to systematic discovery and characterisation of SL trans-splicing and operons is the lack of computational tools for exploiting the surge of transcriptomic and genomic resources for a wide range of eukaryotes. Results Here we present two novel pipelines that automate the discovery of SLs and the prediction of operons in eukaryotic genomes from RNA-Seq data. SLIDR assembles putative SLs from 5’ read tails present after read alignment to a reference genome or transcriptome, which are then verified by interrogation of sequence motifs expected in bona fide SL RNA molecules. SLOPPR identifies RNA-Seq reads that contain a given 5’ SL sequence, quantifies genome-wide SL trans-splicing events and predicts operons via distinct patterns of SL trans-splicing events across adjacent genes. We tested both pipelines with organisms known to carry out SL trans-splicing and organise their genes into operons, and demonstrate that 1) SLIDR correctly identifies known SLs and often discovers novel SL variants; 2) SLOPPR correctly identifies functionally specialised SLs, correctly predicts known operons and detects plausible novel operons. Conclusions SLIDR and SLOPPR are flexible tools that will accelerate research into the evolutionary dynamics of SL trans-splicing and operons throughout Eukarya, and improve gene discovery and annotation for a wide range of eukaryotic genomes.
Both pipelines are implemented in Bash and R and are built upon readily available software commonly installed on most bioinformatics servers. Biological insight can be gleaned even from sparse, low-coverage datasets, implying that an untapped wealth of information can be derived from existing RNA-Seq datasets as well as from novel full-isoform sequencing protocols as they become more widely available.

SLIDR and SLOPPR: flexible identification of spliced leader trans-splicing and prediction of eukaryotic operons from RNA-Seq data

BMC Bioinformatics ◽

10.1186/s12859-021-04009-7 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Marius A. Wenzel ◽

Berndt Müller ◽

Jonathan Pettitt

Keyword(s):

Evolutionary Dynamics ◽

Sequence Motifs ◽

Rna Seq ◽

Rna Molecules ◽

Spliced Leader ◽

C Elegans ◽

Wide Range ◽

Biological Insight ◽

Bona Fide ◽

Eukaryotic Genomes

Abstract Background Spliced leader (SL) trans-splicing replaces the 5′ end of pre-mRNAs with the spliced leader, an exon derived from a specialised non-coding RNA originating from elsewhere in the genome. This process is essential for resolving polycistronic pre-mRNAs produced by eukaryotic operons into monocistronic transcripts. SL trans-splicing and operons may have independently evolved multiple times throughout Eukarya, yet our understanding of these phenomena is limited to only a few well-characterised organisms, most notably C. elegans and trypanosomes. The primary barrier to systematic discovery and characterisation of SL trans-splicing and operons is the lack of computational tools for exploiting the surge of transcriptomic and genomic resources for a wide range of eukaryotes. Results Here we present two novel pipelines that automate the discovery of SLs and the prediction of operons in eukaryotic genomes from RNA-Seq data. SLIDR assembles putative SLs from 5′ read tails present after read alignment to a reference genome or transcriptome, which are then verified by interrogating corresponding SL RNA genes for sequence motifs expected in bona fide SL RNA molecules. SLOPPR identifies RNA-Seq reads that contain a given 5′ SL sequence, quantifies genome-wide SL trans-splicing events and predicts operons via distinct patterns of SL trans-splicing events across adjacent genes. We tested both pipelines with organisms known to carry out SL trans-splicing and organise their genes into operons, and demonstrate that (1) SLIDR correctly detects expected SLs and often discovers novel SL variants; (2) SLOPPR correctly identifies functionally specialised SLs, correctly predicts known operons and detects plausible novel operons. Conclusions SLIDR and SLOPPR are flexible tools that will accelerate research into the evolutionary dynamics of SL trans-splicing and operons throughout Eukarya and improve gene discovery and annotation for a wide range of eukaryotic genomes. 
Both pipelines are implemented in Bash and R and are built upon readily available software commonly installed on most bioinformatics servers. Biological insight can be gleaned even from sparse, low-coverage datasets, implying that an untapped wealth of information can be retrieved from existing RNA-Seq datasets as well as from novel full-isoform sequencing protocols as they become more widely available.
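
The idea of finding reads that carry a 5′ SL sequence can be illustrated with a minimal Python sketch. It is not the pipelines' actual implementation (the abstract states they are built in Bash and R on existing software); the function names, the `min_len` threshold and the example sequences are all assumptions made for illustration. The sketch reports the longest suffix of a given SL that occurs at the very start of a read, since a trans-spliced read may retain only the 3′ portion of the SL.

```python
def sl_tail_length(read: str, sl: str, min_len: int = 8) -> int:
    """Length of the longest SL suffix found at the 5' end of the read.

    Returns 0 if no suffix of at least `min_len` bases matches.
    `min_len` is a hypothetical threshold chosen for this illustration.
    """
    read, sl = read.upper(), sl.upper()
    # Try the longest possible overlap first, then shorten it.
    for k in range(min(len(sl), len(read)), min_len - 1, -1):
        if read.startswith(sl[-k:]):
            return k
    return 0


def count_sl_reads(reads, sl, min_len=8):
    """Count reads whose 5' end carries an SL tail (exact matching only)."""
    return sum(1 for r in reads if sl_tail_length(r, sl, min_len) > 0)


# Example sequences used purely for illustration.
example_sl = "GGTTTAATTACCCAAGTTTGAG"
example_reads = [
    "CCAAGTTTGAGATGGCTTCA",  # starts with the last 11 bases of example_sl
    "ATGGCTTCAGGATCC",       # no SL tail
]
tail_lengths = [sl_tail_length(r, example_sl) for r in example_reads]
n_sl_reads = count_sl_reads(example_reads, example_sl)
```

Exact prefix matching keeps the sketch short; a practical tool would likely also need to tolerate sequencing errors and to handle read orientation, neither of which is attempted here.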

Performance assessment of total RNA sequencing of human biofluids and extracellular vesicles

10.1101/701524 ◽

2019 ◽

Author(s):

Celine Everaert ◽

Hetty Helsmoortel ◽

Anneleen Decock ◽

Eva Hulstaert ◽

Ruben Van Paemel ◽

...

Keyword(s):

Rna Sequencing ◽

Extracellular Vesicles ◽

Platelet Rich Plasma ◽

Rna Seq ◽

Total Rna ◽

Rna Molecules ◽

Rna Profiling ◽

Wide Range ◽

Read Distribution ◽

Free Plasma

Abstract RNA profiling has emerged as a powerful tool to investigate the biomarker potential of human biofluids. However, despite enormous interest in extracellular nucleic acids, RNA sequencing methods to quantify the total RNA content outside cells are rare. Here, we evaluate the performance of the SMARTer Stranded Total RNA-Seq method in human platelet-rich plasma, platelet-free plasma, urine, conditioned medium, and extracellular vesicles (EVs) from these biofluids. We found the method to be accurate, precise, compatible with low-input volumes and able to quantify a few thousand genes. We picked up distinct classes of RNA molecules, including mRNA, lncRNA, circRNA, miscRNA and pseudogenes. Notably, the read distribution and gene content drastically differ among biofluids. In conclusion, we are the first to show that the SMARTer method can be used for unbiased unraveling of the complete transcriptome of a wide range of biofluids and their extracellular vesicles.

Performance assessment of total RNA sequencing of human biofluids and extracellular vesicles

Scientific Reports ◽

10.1038/s41598-019-53892-x ◽

2019 ◽

Vol 9 (1) ◽

Cited By ~ 11

Author(s):

Celine Everaert ◽

Hetty Helsmoortel ◽

Anneleen Decock ◽

Eva Hulstaert ◽

Ruben Van Paemel ◽

...

Keyword(s):

Rna Sequencing ◽

Extracellular Vesicles ◽

Platelet Rich Plasma ◽

Rna Seq ◽

Total Rna ◽

Rna Molecules ◽

Rna Profiling ◽

Wide Range ◽

Read Distribution ◽

Free Plasma

Huntington’s disease-specific mis-splicing captured by human-mouse intersect-RNA-seq unveils pathogenic effectors and reduced splicing factors

10.1101/2020.05.11.086017 ◽

2020 ◽

Author(s):

Ainara Elorza ◽

Yamile Márquez ◽

Jorge R. Cabrera ◽

José Luis Sánchez-Trincado ◽

María Santos-Galindo ◽

...

Keyword(s):

Neurodegenerative Diseases ◽

Huntington's Disease ◽

Neuronal Loss ◽

Splicing Factors ◽

Post Mortem ◽

Rna Seq ◽

Protein Levels ◽

Wide Range ◽

Bona Fide

Abstract Deregulated alternative splicing has been implicated in a wide range of pathologies. Deep RNA-sequencing has revealed global mis-splicing signatures in multiple human diseases; however, for neurodegenerative diseases, these analyses are intrinsically hampered by neuronal loss and neuroinflammation in post-mortem brains. To infer splicing alterations relevant to Huntington’s disease (HD) pathogenesis, here we performed intersect-RNA-seq analyses of human post-mortem striatal tissue and of an early symptomatic mouse model in which neuronal loss and gliosis are not yet present. Together with a human/mouse parallel motif scan analysis, this approach allowed us to identify the shared mis-splicing signature triggered by the HD-causing mutation in both species and to infer upstream deregulated splicing factors. Moreover, we identified a plethora of downstream neurodegeneration-linked effector genes, whose aberrant splicing is associated with decreased protein levels in HD patients and mice. In summary, our intersect-RNA-seq approach unveiled the pathogenic contribution of mis-splicing to HD and could be readily applied to other neurodegenerative diseases for which bona fide animal models are available.

Clustering Single-Cell RNA-Seq Data with Regularized Gaussian Graphical Model

Genes ◽

10.3390/genes12020311 ◽

2021 ◽

Vol 12 (2) ◽

pp. 311

Author(s):

Zhenqiu Liu

Keyword(s):

Single Cell ◽

Free Parameter ◽

Graphical Model ◽

Expression Patterns ◽

Information Criterion ◽

Log P ◽

Rna Seq ◽

Clustering Methods ◽

Wide Range ◽

Free Parameters

Single-cell RNA-seq (scRNA-seq) is a powerful tool to measure the expression patterns of individual cells and discover heterogeneity and functional diversity among cell populations. Due to variability, it is challenging to analyze such data efficiently. Many clustering methods have been developed using at least one free parameter. Different choices for free parameters may lead to substantially different visualizations and clusters. Tuning free parameters is also time-consuming. Thus, there is a need for a simple, robust, and efficient clustering method. In this paper, we propose a new regularized Gaussian graphical clustering (RGGC) method for scRNA-seq data. RGGC is based on high-order (partial) correlations and subspace learning, and is robust over a wide range of values of the regularization parameter λ. Therefore, we can simply set λ=2 or λ=log(p) for AIC (Akaike information criterion) or BIC (Bayesian information criterion), respectively, without cross-validation. Cell subpopulations are discovered by the Louvain community detection algorithm that determines the number of clusters automatically. There is no free parameter to be tuned with RGGC. When evaluated with simulated and benchmark scRNA-seq data sets against widely used methods, RGGC is computationally efficient and one of the top performers. It can detect inter-sample cell heterogeneity when applied to glioblastoma scRNA-seq data.

Isolation and Characterization of Strain Exiguobacterium sp. KRL4, a Producer of Bioactive Secondary Metabolites from a Tibetan Glacier

Microorganisms ◽

10.3390/microorganisms9050890 ◽

2021 ◽

Vol 9 (5) ◽

pp. 890

Author(s):

Pietro Tedesco ◽

Fortunato Palma Esposito ◽

Antonio Masino ◽

Giovanni Andrea Vitale ◽

Emiliana Tortorella ◽

...

Keyword(s):

Phenotypic Analysis ◽

Biological Assays ◽

C Elegans ◽

Isolation And Characterization ◽

Wide Range ◽

Protein Encoding ◽

Nematocidal Activity ◽

Extremophilic Microorganisms ◽

Unique Source

Extremophilic microorganisms represent a unique source of novel natural products. Among them, cold-adapted bacteria and particularly alpine microorganisms are still underexplored. Here, we describe the isolation and characterization of a novel Gram-positive, aerobic rod-shaped alpine bacterium (KRL4), isolated from sediments from the Karuola glacier in Tibet, China. A complete phenotypic analysis was performed, revealing the great adaptability of the strain to a wide range of temperatures (5–40 °C), pHs (5.5–8.5), and salinities (0–15% w/v NaCl). Genome sequencing identified KRL4 as a member of the placeholder genus Exiguobacterium_A and annotation revealed that only half of the protein-encoding genes (1522 of 3079) could be assigned a putative function. An analysis of the secondary metabolite clusters revealed the presence of two uncharacterized phytoene synthase-containing pathways and a novel siderophore pathway. Biological assays confirmed that the strain produces molecules with antioxidant and siderophore activities. Furthermore, intracellular extracts showed nematocidal activity towards C. elegans, suggesting that strain KRL4 is a source of anthelmintic compounds.

MUREN: a robust and multi-reference approach of RNA-seq transcript normalization

BMC Bioinformatics ◽

10.1186/s12859-021-04288-0 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Yance Feng ◽

Lei M. Li

Keyword(s):

Biological Significance ◽

Housekeeping Genes ◽

R Package ◽

Data Sets ◽

Statistical Regression ◽

Rna Seq ◽

Least Trimmed Squares ◽

Standard Data ◽

Wide Range ◽

Multiple References

Abstract Background Normalization of RNA-seq data aims at identifying biological expression differentiation between samples by removing the effects of unwanted confounding factors. Explicitly or implicitly, the justification of normalization requires a set of housekeeping genes. However, the existence of housekeeping genes common for a very large collection of samples, especially under a wide range of conditions, is questionable. Results We propose to carry out pairwise normalization with respect to multiple references, selected from representative samples. Then the pairwise intermediates are integrated based on a linear model that adjusts the reference effects. Motivated by the notion of housekeeping genes and their statistical counterparts, we adopt the robust least trimmed squares regression in pairwise normalization. The proposed method (MUREN) is compared with other existing tools on some standard data sets. The assessment of normalization quality emphasizes preserving possible asymmetric differentiation, whose biological significance is exemplified by a single-cell data set on the cell cycle. MUREN is implemented as an R package. The code is available under the GPL-3 license on GitHub (github.com/hippo-yf/MUREN) and on conda (anaconda.org/hippo-yf/r-muren). Conclusions MUREN performs the RNA-seq normalization using a two-step statistical regression induced from a general principle. We propose that the densities of pairwise differentiations are used to evaluate the goodness of normalization. MUREN adjusts the mode of differentiation toward zero while preserving the skewness due to biological asymmetric differentiation. Moreover, by robustly integrating pre-normalized counts with respect to multiple references, MUREN is immune to individual outlier samples.

Small RNA-Sequencing: Approaches and Considerations for miRNA Analysis

Diagnostics ◽

10.3390/diagnostics11060964 ◽

2021 ◽

Vol 11 (6) ◽

pp. 964

Author(s):

Sarka Benesova ◽

Mikael Kubista ◽

Lukas Valihrach

Keyword(s):

Rna Sequencing ◽

Small Rna ◽

High Sensitivity ◽

Small Rna Sequencing ◽

Rna Seq ◽

Liquid Biopsies ◽

Comprehensive Overview ◽

Rna Molecules ◽

Novel Mirna ◽

The Many

MicroRNAs (miRNAs) are a class of small RNA molecules that have an important regulatory role in multiple physiological and pathological processes. Their disease-specific profiles and presence in biofluids are properties that enable miRNAs to be employed as non-invasive biomarkers. In the past decades, several methods have been developed for miRNA analysis, including small RNA sequencing (RNA-seq). Small RNA-seq enables genome-wide profiling and analysis of known, as well as novel, miRNA variants. Moreover, its high sensitivity allows for profiling of low-input samples such as liquid biopsies, which have now found applications in diagnostics and prognostics. Still, due to technical bias and the limited ability to capture the true miRNA representation, its potential remains unfulfilled. The introduction of many new small RNA-seq approaches that tried to minimize this bias has led to the many small RNA-seq protocols seen today. Here, we review all current approaches to cDNA library construction used during the small RNA-seq workflow, with particular focus on their implementation in commercially available protocols. We provide an overview of each protocol and discuss their applicability. We also review recent benchmarking studies comparing each protocol’s performance and summarize the major conclusions that can be gathered from their usage. The result documents variable performance of the protocols and highlights their different applications in miRNA research. Taken together, our review provides a comprehensive overview of all the current small RNA-seq approaches, summarizes their strengths and weaknesses, and provides guidelines for their applications in miRNA research.

Polymodal Functionality of C. elegans OLL Neurons in Mechanosensation and Thermosensation

Neuroscience Bulletin ◽

10.1007/s12264-021-00629-4 ◽

2021 ◽

Author(s):

Yuedan Fan ◽

Wenjuan Zou ◽

Jia Liu ◽

Umar Al-Sheikh ◽

Hankui Cheng ◽

...

Keyword(s):

Glutamate Receptor ◽

Sensory Neurons ◽

Molecular Mechanisms ◽

Temperature Sensitive ◽

Cold Sensation ◽

Cold Response ◽

C Elegans ◽

Sensory Modalities ◽

A Cell ◽

Bona Fide

Abstract Sensory modalities are important for survival, but their molecular mechanisms remain challenging to dissect due to the polymodal functionality of sensory neurons. Here, we report that the C. elegans outer labial lateral (OLL) sensilla sensory neurons respond to touch and cold. Mechanosensation of OLL neurons resulted in cell-autonomous mechanically-evoked Ca2+ transients and rapidly-adapting mechanoreceptor currents with a very short latency. Mechanotransduction of OLL neurons might be carried by a novel Na+ conductance channel, which is insensitive to amiloride. The bona fide mechano-gated Na+-selective degenerin/epithelial Na+ channels, TRP-4, TMC, and Piezo proteins are not involved in this mechanosensation. Interestingly, OLL neurons also mediated cold but not warm responses in a cell-autonomous manner. We further showed that the cold response of OLL neurons is not mediated by the cold receptor TRPA-1 or the temperature-sensitive glutamate receptor GLR-3. Thus, we propose the polymodal functionality of OLL neurons in mechanosensation and cold sensation.

Molecular and Cytological Analyses of Large Tracks of Centromeric DNA Reveal the Structure and Evolutionary Dynamics of Maize Centromeres

Genetics ◽

10.1093/genetics/163.2.759 ◽

2003 ◽

Vol 163 (2) ◽

pp. 759-770 ◽

Cited By ~ 5

Author(s):

Kiyotaka Nagaki ◽

Junqi Song ◽

Robert M Stupar ◽

Alexander S Parokonny ◽

Qiaoping Yuan ◽

...

Keyword(s):

Evolutionary Dynamics ◽

Artificial Chromosome ◽

Grass Species ◽

Sequence Motifs ◽

Long Terminal Repeats ◽

Satellite Repeat ◽

Sequence Comparisons ◽

Centromeric Dna ◽

Conserved Sequence ◽

Satellite Sequences

Abstract We sequenced two maize bacterial artificial chromosome (BAC) clones anchored by the centromere-specific satellite repeat CentC. The two BACs, consisting of ∼200 kb of cytologically defined centromeric DNA, are composed exclusively of satellite sequences and retrotransposons that can be classified as centromere specific or noncentromere specific on the basis of their distribution in the maize genome. Sequence analysis suggests that the original maize sequences were composed of CentC arrays that were expanded by retrotransposon invasions. Seven centromere-specific retrotransposons of maize (CRM) were found in BAC 16H10. The CRM elements inserted randomly into either CentC monomers or other retrotransposons. Sequence comparisons of the long terminal repeats (LTRs) of individual CRM elements indicated that these elements transposed within the last 1.22 million years. We observed that all of the previously reported centromere-specific retrotransposons in rice and barley, which belong to the same family as the CRM elements, also recently transposed with the oldest element having transposed ∼3.8 million years ago. Highly conserved sequence motifs were found in the LTRs of the centromere-specific retrotransposons in the grass species, suggesting that the LTRs may be important for the centromere specificity of this retrotransposon family.
