Sequence Features of Drosha and Dicer Cleavage Sites Affect the Complexity of IsomiRs

2015 ◽  
Vol 16 (12) ◽  
pp. 8110-8127 ◽  
Author(s):  
Julia Starega-Roslan ◽  
Tomasz Witkos ◽  
Paulina Galka-Marciniak ◽  
Wlodzimierz Krzyzosiak
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Pengyu Liu ◽  
Jiangning Song ◽  
Chun-Yu Lin ◽  
Tatsuya Akutsu

Abstract Background: Human dicer is an enzyme that cleaves pre-miRNAs into miRNAs. Several models have been developed to predict human dicer cleavage sites, including PHDCleav and LBSizeCleav. Given an input sequence, these models can predict whether the sequence contains a cleavage site. However, these models only consider each sequence independently and lack interpretability. Therefore, it is necessary to develop an accurate and explainable predictor, which employs relations between different sequences, to enhance the understanding of the mechanism by which human dicer cleaves pre-miRNA. Results: In this study, we develop an accurate and explainable predictor for human dicer cleavage sites, ReCGBM. We design relational features and class features as inputs to a lightGBM model. Computational experiments show that ReCGBM achieves the best performance compared to the existing methods. Further, we find that features in close proximity to the center of pre-miRNA are more important and make a significant contribution to the performance improvement of the developed method. Conclusions: The results of this study show that ReCGBM is an interpretable and accurate predictor. Besides, the analyses of feature importance show that it might be of particular interest to consider more informative features close to the center of the pre-miRNA in future predictors.
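As a rough illustration of how a sequence can be turned into numeric input for a tree-based model such as the one mentioned above, the following sketch one-hot encodes an RNA sequence. It is a generic encoding chosen for illustration only; it is not the relational or class feature design of ReCGBM, and the function name `one_hot_encode` is hypothetical.

```python
# Illustrative only: a generic one-hot encoding of an RNA sequence.
# This is NOT the relational/class feature design used by ReCGBM.
NUCLEOTIDES = "ACGU"


def one_hot_encode(sequence):
    """Return a flat list of 0/1 values, four per nucleotide (order A, C, G, U)."""
    encoded = []
    # Accept DNA-style input by mapping T to U.
    for base in sequence.upper().replace("T", "U"):
        if base not in NUCLEOTIDES:
            raise ValueError(f"unexpected nucleotide: {base!r}")
        encoded.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return encoded


if __name__ == "__main__":
    # A made-up two-nucleotide example, not data from the study.
    print(one_hot_encode("GA"))
```

A vector built this way has four entries per position, so positions near the center of a pre-miRNA would correspond to the middle block of columns.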


2021 ◽  
Author(s):  
Daishin Ueno ◽  
Shotaro Yamasaki ◽  
Yuta Sadakiyo ◽  
Takumi Teruyama ◽  
Taku Demura ◽  
...  

ABSTRACT RNA degradation is critical for control of gene expression, and endonucleolytic cleavage-dependent RNA degradation is conserved among eukaryotes. Some cleavage sites are secondarily capped in the cytoplasm and identified using the CAGE method. Although uncapped cleavage sites are widespread in eukaryotes, comparatively little information has been obtained about these sites using CAGE-based degradome analysis. Previously, we developed the truncated RNA-end sequencing (TREseq) method in plant species and used it to acquire comprehensive information about uncapped cleavage sites; we observed G-rich sequences near cleavage sites. However, it remains unclear whether this finding is general to other eukaryotes. In this study, we conducted TREseq analyses in fruit flies (Drosophila melanogaster) and budding yeast (Saccharomyces cerevisiae). The results revealed specific sequence features related to RNA cleavage in D. melanogaster and S. cerevisiae that were similar to sequence patterns in Arabidopsis thaliana. Although previous studies suggest that ribosome movements are important for determining cleavage position, feature selection using a random forest classifier showed that sequences around cleavage sites were the major determinant of whether a site was cleaved or uncleaved. Together, our results suggest that sequence features around cleavage sites are critical for determining cleavage position, and that sequence-specific endonucleolytic cleavage-dependent RNA degradation is highly conserved across eukaryotes.
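The observation of G-rich sequences near cleavage sites can be made concrete with a small calculation. The sketch below computes the fraction of G nucleotides in a window around a given position. The function name, the window size, and the example sequence are assumptions made for illustration and are not taken from the TREseq analysis.

```python
def g_fraction_near_site(sequence, site, flank=5):
    """Fraction of G in the window [site - flank, site + flank), clipped to the sequence.

    `site` is a 0-based index marking the position immediately after the cut.
    The window definition is an assumption for this sketch.
    """
    start = max(0, site - flank)
    end = min(len(sequence), site + flank)
    window = sequence[start:end].upper()
    if not window:
        return 0.0
    return window.count("G") / len(window)


if __name__ == "__main__":
    # Made-up sequence, not data from the study.
    print(g_fraction_near_site("AAGGGGAA", site=4, flank=2))
```

Comparing this fraction between cleaved and uncleaved positions is one simple way to check for the kind of enrichment the abstract describes.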


2017 ◽  
Author(s):  
Igor I. Titov ◽  
Pavel S. Vorozheykin

Abstract Background: MicroRNA biogenesis proceeds through different canonical and non-canonical pathways; the most frequent of the non-canonical ones is the splicing-dependent biogenesis of mirtrons. We compare the mirtrons and non-mirtrons of human and mouse to explore how their maturation is reflected in the precursor structure around the miRNA. Results: We found coherence in the overhang lengths, which indicates a dependence between the cleavage sites. To explain this dependence, we suggest a 2-lever model of the Dicer structure that couples the imprecisions in Drosha and Dicer. Considering the secondary structure of all animal pre-miRNAs, we confirmed that single-stranded nucleotides tend to be located near the miRNA boundaries and in its center and are characterized by a higher mutation rate. The 5′ end of the canonical 5′ miRNA approaches the nearest single-stranded nucleotides, which suggests an extension of the loop-counting rule from the Dicer to the Drosha cleavage site. The typical structure of the annotated mirtron pre-miRNAs differs from the canonical pre-miRNA structure and possesses 1- and 2-nt hanging ends at the hairpin base. Together with the excessive variability of the mirtron Dicer cleavage site (which could be partially explained by guanine at its ends inherited from splicing), this is further evidence for the 2-lever model. In contrast with the canonical miRNAs, the mirtrons have higher SNP densities and their pre-miRNAs are inversely associated with diseases. Our results therefore support the view that mirtrons are under positive selection while canonical miRNAs are under negative selection, and we suggest that mirtrons are an intrinsic source of silencing variability which produces disease-promoting variants. Finally, we considered the interference between the pre-miRNA structure and the U2 snRNA:pre-mRNA base pairing.
We analyzed the location of the branchpoints and found that the mirtron structure tends to expose the branchpoint site, which suggests that mirtrons can readily evolve from occasional hairpins in the immediate neighbourhood of the 3′ splice site. Conclusion: miRNA biogenesis manifests itself in the footprints of the secondary structure. Close inspection of these structural properties can help to uncover new pathways of miRNA biogenesis and to refine the known miRNA data; in particular, new non-canonical miRNAs may be predicted or known miRNAs re-classified.


2020 ◽  
Vol 15 ◽  
Author(s):  
Dicle Yalcin ◽  
Hasan H. Otu

Background: Epigenetic repression mechanisms play an important role in gene regulation, specifically in cancer development. In many cases, a CpG island’s (CGI) susceptibility or resistance to methylation has been shown to be influenced by local DNA sequence features. Objective: To develop unbiased machine learning models, individual and combined across different biological features, that predict the methylation propensity of a CGI. Methods: We developed our model consisting of CGI sequence features on a dataset of 75 sequences (28 prone, 47 resistant) representing a genome-wide methylation structure. We tested our model on two independent datasets that are chromosome-specific (132 sequences) and disease-specific (70 sequences). Results: We improved prediction accuracy over previous models. Our results indicate that combined features better predict the methylation propensity of a CGI (area under the curve (AUC) ~0.81). Our global methylation classifier performs well on independent datasets, reaching an AUC of ~0.82 for the complete model and an AUC of ~0.88 for the model using select sequences that better represent their classes in the training set. We report certain de novo motifs and transcription factor binding site (TFBS) motifs that are consistently better at separating prone and resistant CGIs. Conclusion: Predictive models for the methylation propensity of CGIs lead to a better understanding of disease mechanisms and can be used to classify genes based on their tendency to contain methylation-prone CGIs, which may lead to preventative treatment strategies. MATLAB and Python™ scripts used for model building, prediction, and downstream analyses are available at https://github.com/dicleyalcin/methylProp_predictor.
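For readers unfamiliar with the AUC values quoted above, the sketch below shows one standard way to compute the area under the ROC curve: as the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. The labels and scores are made up for illustration, and the code is not taken from the authors' repository.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 1 (e.g. methylation prone) or 0 (e.g. resistant).
    scores: classifier scores, higher meaning more likely to be class 1.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Toy values only, not results from the study.
    print(auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]))
```

The pairwise form is quadratic in the number of examples, which is acceptable for datasets of the size described here (tens to low hundreds of sequences).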


1980 ◽  
Vol 44 (7) ◽  
pp. 1713-1715
Author(s):  
Yoshihiko SAKO ◽  
Aritsune UCHIDA ◽  
Hajime KADOTA
