Accurate prediction of human miRNA targets via graph modeling of the miRNA-target duplex

2018 ◽  
Vol 16 (04) ◽  
pp. 1850013 ◽  
Author(s):  
Mohammad Mohebbi ◽  
Liang Ding ◽  
Russell L. Malmberg ◽  
Cory Momany ◽  
Khaled Rasheed ◽  
...  

miRNAs are involved in many critical cellular activities through binding to their mRNA targets, e.g. in cell proliferation, differentiation, death, growth control, and developmental timing. Accurate prediction of miRNA targets can assist efficient experimental investigations on the functional roles of miRNAs. Their prediction, however, remains a challenging task due to the lack of experimental data about the tertiary structure of miRNA-target binding duplexes. In particular, correlations of nucleotides in the binding duplexes may not be limited to the canonical Watson-Crick base pairs (BPs), as has commonly been assumed; methods based on secondary structure prediction (typically minimum free energy (MFE)) have had only mixed success. In this work, we characterized miRNA binding duplexes with a graph model to capture the correlations between pairs of nucleotides of a miRNA and its target sequences. We developed machine learning algorithms to train the graph model to predict the target sites of miRNAs. In particular, because imbalance between positive and negative samples can significantly deteriorate the performance of machine learning methods, we designed a novel method to re-sample the available dataset to produce more informative data for the learning process. We evaluated our model and miRNA target prediction method on human miRNAs and target data obtained from miRTarBase, a database of experimentally verified miRNA-target interactions. In target prediction, our method achieved a sensitivity of 86% with a false positive rate below 13%. In comparison with the state-of-the-art methods miRanda and RNAhybrid on the test data, our method outperforms both of them by a significant margin. The source code, test sets and model files are all available at http://rna-informatics.uga.edu/?f=software&p=GraB-miTarget .
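The abstract reports sensitivity and false positive rate and mentions re-sampling to counter class imbalance, but does not describe the re-sampling scheme. The sketch below, assuming binary labels (1 = target site, 0 = non-site), shows how the two reported metrics are computed from predictions, plus plain random undersampling of the majority class as a generic baseline. The function names and toy labels are illustrative; the undersampling function is not the authors' GraB-miTarget re-sampling method.

```python
import random


def sensitivity_and_fpr(y_true, y_pred):
    """Return (sensitivity, false positive rate) for binary labels (1 = target site)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    sensitivity = tp / (tp + fn)  # share of true sites that were predicted
    fpr = fp / (fp + tn)          # share of non-sites wrongly predicted as sites
    return sensitivity, fpr


def undersample(samples, labels, seed=0):
    """Randomly drop majority-class samples until both classes are equally large.

    Generic baseline only; NOT the re-sampling method described in the paper.
    """
    rng = random.Random(seed)
    pos = [s for s, y in zip(samples, labels) if y == 1]
    neg = [s for s, y in zip(samples, labels) if y == 0]
    n = min(len(pos), len(neg))
    kept = rng.sample(pos, n) + rng.sample(neg, n)
    return kept, [1] * n + [0] * n


# Toy predictions: 4 true sites, 6 non-sites.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
print(sensitivity_and_fpr(y_true, y_pred))  # (0.75, 0.16666666666666666)
```

Undersampling discards data; it is shown only to make the imbalance problem concrete, and the paper's point is precisely that a more informative re-sampling can do better.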

2021 ◽  
Vol 11 (10) ◽  
pp. 4443
Author(s):  
Rokas Štrimaitis ◽  
Pavel Stefanovič ◽  
Simona Ramanauskaitė ◽  
Asta Slotkienė

Financial area analysis is not limited to enterprise performance analysis. It is worth analyzing as wide an area as possible to obtain a full impression of a specific enterprise. News website content is a data source that expresses the public’s opinion on enterprise operations, status, etc. Therefore, it is worth analyzing the text of news portal articles. Sentiment analysis methods for English texts and for financial texts exist and are accurate, but, owing to the complexity of the Lithuanian language, sentiment analysis of Lithuanian has mostly concentrated on comment texts and does not provide high accuracy. Therefore, in this paper, a supervised machine learning model was implemented to perform sentiment analysis on financial news gathered from Lithuanian-language websites. The analysis was made using three classification algorithms commonly used in the field of sentiment analysis. Hyperparameter optimization using grid search was performed to find the best parameters for each classifier. All experimental investigations were made using newly collected datasets from four Lithuanian news websites. The results show that the highest accuracy (71.1%) is obtained with the multinomial Naive Bayes algorithm on a non-balanced dataset. The other algorithms' accuracies were slightly lower: long short-term memory (LSTM, 71%) and a support vector machine (70.4%).
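To illustrate the best-performing setup, the sketch below implements multinomial Naive Bayes with additive smoothing and a one-parameter grid search over the smoothing strength `alpha`, selected by validation accuracy. It is a minimal sketch, assuming whitespace-tokenized text and a single validation split; the function names, the toy headlines and the choice of `alpha` as the tuned hyperparameter are hypothetical and not taken from the paper, which may have tuned other parameters or used a library implementation.

```python
import math
from collections import Counter, defaultdict


def train_mnb(docs, labels, alpha):
    """Fit multinomial Naive Bayes with additive smoothing (alpha > 0)."""
    vocab = {w for d in docs for w in d.split()}
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> word -> count
    for d, y in zip(docs, labels):
        word_counts[y].update(d.split())
    model = {}
    for y, n_docs in class_counts.items():
        denom = sum(word_counts[y].values()) + alpha * len(vocab)
        log_lik = {w: math.log((word_counts[y][w] + alpha) / denom) for w in vocab}
        model[y] = (math.log(n_docs / len(labels)), log_lik)
    return model


def predict_mnb(model, doc):
    """Return the class with the highest log posterior; unseen words are ignored."""
    scores = {
        y: log_prior + sum(log_lik[w] for w in doc.split() if w in log_lik)
        for y, (log_prior, log_lik) in model.items()
    }
    return max(scores, key=scores.get)


def grid_search_alpha(train, val, alphas):
    """Try each alpha and keep the one with the best validation accuracy."""
    best_alpha, best_acc = None, -1.0
    val_docs, val_labels = val
    for a in alphas:
        model = train_mnb(*train, alpha=a)
        hits = sum(predict_mnb(model, d) == y for d, y in zip(val_docs, val_labels))
        acc = hits / len(val_labels)
        if acc > best_acc:
            best_alpha, best_acc = a, acc
    return best_alpha, best_acc


# Hypothetical toy headlines (English stand-ins for Lithuanian news text).
train = (["profit grew strongly", "shares rose profit",
          "losses widened sharply", "shares fell losses"],
         ["pos", "pos", "neg", "neg"])
val = (["profit rose", "losses fell"], ["pos", "neg"])
print(grid_search_alpha(train, val, [0.5, 1.0]))  # (0.5, 1.0)
```

On ties the first `alpha` in the grid wins; a fuller study would usually score each setting with cross-validation rather than one split.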


2021 ◽  
Vol 22 (10) ◽  
pp. 5118
Author(s):  
Matthieu Najm ◽  
Chloé-Agathe Azencott ◽  
Benoit Playe ◽  
Véronique Stoven

Identification of the protein targets of hit molecules is essential in the drug discovery process. Target prediction with machine learning algorithms can help accelerate this search, limiting the number of required experiments. However, drug-target interaction databases used for training present high statistical bias, leading to a high number of false positives, thus increasing the time and cost of experimental validation campaigns. To minimize the number of false positives among predicted targets, we propose a new scheme for choosing negative examples, so that each protein and each drug appears an equal number of times in positive and negative examples. We artificially reproduce the process of target identification for three specific drugs, and more globally for 200 approved drugs. For the three detailed drug examples, and for the larger set of 200 drugs, training with the proposed scheme for the choice of negative examples improved target prediction results: the average number of false positives among the top-ranked predicted targets decreased, and overall, the rank of the true targets was improved. Our method corrects databases’ statistical bias and reduces the number of false positive predictions, and therefore the number of useless experiments potentially undertaken.
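The balance constraint in the proposed scheme (each drug and each protein appears equally often among positive and negative examples) can be expressed as a bipartite flow problem. The sketch below is our own illustration, not the authors' procedure: source-to-drug and protein-to-sink capacities equal each entity's number of positives, each pair that is not a known positive is an edge of capacity 1, and an Edmonds-Karp max flow selects the negatives. When a perfectly balanced choice exists, the maximum flow finds one; otherwise it returns as many negatives as the constraints allow. Drug and protein names are hypothetical.

```python
from collections import Counter, defaultdict, deque


def balanced_negatives(positives):
    """Pick non-positive (drug, protein) pairs as negatives so that every drug and
    protein occurs as often among negatives as among positives, if possible.

    Max-flow formulation: source -> drug (capacity = the drug's positive count),
    drug -> protein (capacity 1, only for pairs that are not known positives),
    protein -> sink (capacity = the protein's positive count).
    """
    pos_set = set(positives)
    drug_deg = Counter(d for d, _ in positives)
    prot_deg = Counter(p for _, p in positives)
    src, sink = ("src",), ("sink",)
    cap = defaultdict(lambda: defaultdict(int))  # residual capacities
    for d, k in drug_deg.items():
        cap[src][("d", d)] = k
    for p, k in prot_deg.items():
        cap[("p", p)][sink] = k
    for d in drug_deg:
        for p in prot_deg:
            if (d, p) not in pos_set:
                cap[("d", d)][("p", p)] = 1

    # Edmonds-Karp: augment along shortest paths found by BFS until none remain.
    while True:
        parent = {src: None}
        queue = deque([src])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            break
        flow, v = float("inf"), sink
        while parent[v] is not None:  # find the bottleneck capacity
            flow = min(flow, cap[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:  # update the residual graph
            u = parent[v]
            cap[u][v] -= flow
            cap[v][u] += flow
            v = u

    # A drug -> protein edge whose residual capacity dropped to 0 carries flow.
    return sorted((d, p) for d in drug_deg for p in prot_deg
                  if (d, p) not in pos_set and cap[("d", d)][("p", p)] == 0)


positives = [("drugA", "P1"), ("drugB", "P2"), ("drugC", "P3")]
print(balanced_negatives(positives))
```

A greedy pass that always pairs the drug and protein with the most remaining quota can get stuck on inputs like the one above; the flow formulation avoids that because augmenting paths can reroute earlier choices.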


2019 ◽  
Vol 14 (5) ◽  
pp. 432-445 ◽  
Author(s):  
Muniba Faiza ◽  
Khushnuma Tanveer ◽  
Saman Fatihi ◽  
Yonghua Wang ◽  
Khalid Raza

Background: MicroRNAs (miRNAs) are small non-coding RNAs that control gene expression at the post-transcriptional level through complementary base pairing with the target mRNA, leading to mRNA degradation or blocking of the translation process. Many dysfunctions of these small regulatory molecules have been linked to the development and progression of several diseases. Therefore, it is necessary to reliably predict potential miRNA targets. Objective: A large number of computational prediction tools have been developed that provide a faster way to find putative miRNA targets, but their results are often inconsistent. Hence, finding a reliable, functional miRNA target is still a challenging task. Also, each tool is equipped with different algorithms, and it is difficult for biologists to know which tool is the best choice for their study. Methods: We analyzed eleven miRNA target predictors on Drosophila melanogaster and Homo sapiens, applying empirical methods to evaluate their accuracy and performance using experimentally validated, high-confidence mature miRNAs and their targets. In addition, this paper describes miRNA target prediction algorithms and discusses common features of frequently used target prediction tools. Results: The results show that MicroT, microRNA and CoMir are the best-performing tools on Drosophila melanogaster, while TargetScan and miRmap perform well for Homo sapiens. The predicted results of each tool were combined in order to improve performance on both datasets, but no significant improvement was observed in terms of true positives. Conclusion: The currently available miRNA target prediction tools suffer greatly from a large number of false positives. Therefore, computational prediction of significant targets with high statistical confidence is still an open challenge.
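The abstract does not state how the tools' outputs were combined. A common approach is voting over predicted (miRNA, gene) pairs; the sketch below, assuming each tool's output is a set of such pairs and that a set of experimentally validated pairs is available, shows how the union (one vote) and the intersection (all votes) trade true positives against false positives. Tool names and pairs are hypothetical.

```python
from collections import Counter


def combine(tool_predictions, min_votes):
    """Keep (miRNA, gene) pairs predicted by at least `min_votes` tools.

    min_votes=1 gives the union; min_votes=len(tool_predictions) the intersection.
    """
    votes = Counter(pair for preds in tool_predictions.values() for pair in preds)
    return {pair for pair, n in votes.items() if n >= min_votes}


def evaluate(predicted, validated):
    """Return (true positives, false positives, precision, recall)."""
    tp = len(predicted & validated)
    fp = len(predicted - validated)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(validated) if validated else 0.0
    return tp, fp, precision, recall


# Hypothetical outputs of two tools and a set of validated interactions.
tools = {
    "toolA": {("miR-1", "g1"), ("miR-1", "g2"), ("miR-2", "g3")},
    "toolB": {("miR-1", "g1"), ("miR-2", "g4")},
}
validated = {("miR-1", "g1"), ("miR-2", "g3")}
print(evaluate(combine(tools, 1), validated))  # (2, 2, 0.5, 1.0)
print(evaluate(combine(tools, 2), validated))  # (1, 0, 1.0, 0.5)
```

The union recovers more validated targets but doubles the false positives; the intersection removes them at the cost of recall, which is consistent with combination not yielding a clear gain in true positives.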


2020 ◽  
Vol 6 ◽  
pp. e253
Author(s):  
Nafees Sadique ◽  
Al Amin Neaz Ahmed ◽  
Md Tajul Islam ◽  
Md. Nawshad Pervage ◽  
Swakkhar Shatabda

Proteins are the building blocks of all cells, in humans and in all other living creatures. Most of the work in a living organism is performed by proteins. Proteins are macromolecules: polymers of amino acid monomers. The tertiary structure of a protein is its three-dimensional shape, and it governs the protein's function, classification and binding sites. If two protein structures are alike, the two proteins are likely to be of the same kind, implying a similar structural class and similar ligand binding properties. In this paper, we have used the protein tertiary structure to generate effective features that exploit structural similarity to detect structural class and ligand binding. First, we have analyzed the effectiveness of a group of image-based features for predicting the structural class of a protein. These features are derived from the image generated by the distance matrix of the tertiary structure of a given protein. They include the local binary pattern (LBP) histogram, the Gabor-filtered LBP histogram, a separate row multiplication matrix with uniform LBP histogram, a neighbor block subtraction matrix with uniform LBP histogram, and atom bond. The separate row multiplication matrix and neighbor block subtraction matrix filters, as well as atom bond, are novel to this work. The experiments were done on a standard benchmark dataset. We have demonstrated the effectiveness of these features over a large variety of supervised machine learning algorithms. Experiments suggest that a support vector machine is the best-performing classifier on the selected dataset using this set of features. We believe the excellent accuracy of Hybrid LBP will motivate researchers and practitioners to use it to identify protein structural class. To facilitate that, a classification model using Hybrid LBP is available at http://brl.uiu.ac.bd/PL/.
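The sketch below shows the two basic steps behind the image-based features: turning residue coordinates into a pairwise distance matrix, treated as a grayscale image, and computing a basic 256-bin local binary pattern (LBP) histogram over it. It assumes one 3D coordinate per residue (for example the C-alpha atom) and the plain 3x3 LBP; the uniform-LBP, Gabor-filtered and novel filter variants described in the abstract are not reproduced, and the function names are illustrative.

```python
import math

# The 8 neighbours of a pixel, visited clockwise from the top-left one.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def distance_matrix(coords):
    """Pairwise Euclidean distances between residue coordinates (one 3D point per residue)."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def lbp_histogram(img):
    """Basic 3x3 local binary pattern histogram (256 bins) over interior pixels.

    Each neighbour >= the centre value sets one bit of an 8-bit code.
    """
    rows, cols = len(img), len(img[0])
    hist = [0] * 256
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            code = 0
            for bit, (di, dj) in enumerate(OFFSETS):
                if img[i + di][j + dj] >= img[i][j]:
                    code |= 1 << bit
            hist[code] += 1
    return hist


# Toy structure: three residues on a line.
dm = distance_matrix([(0, 0, 0), (3, 4, 0), (6, 8, 0)])
print(dm[0][1], dm[0][2])          # 5.0 10.0
print(lbp_histogram(dm).index(1))  # 255: the centre (0.0 on the diagonal) is <= all neighbours
```

Because LBP codes depend only on comparisons between neighbouring values, raw distances can be used without rescaling them to pixel intensities.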
Protein-ligand binding is responsible for controlling the activity of biological receptors, which is important for curing diseases, among many other applications. Therefore, predicting binding between a protein and a ligand is important for understanding a protein's activity or for accelerating docking computations in virtual screening-based drug design. Protein-ligand binding prediction requires the three-dimensional tertiary structure of the target protein to be searched for ligand binding sites. In this paper, we have proposed a supervised learning algorithm for predicting protein-ligand binding: a similarity-based clustering approach that uses the same set of features. Our algorithm works better than the most popular and widely used machine learning algorithms.
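The abstract describes the binding predictor only as a similarity-based clustering approach over the same features. One simple reading, shown here as a hypothetical sketch rather than the authors' algorithm, is nearest-centroid classification: average the feature histograms of each class, normalize, and assign a query to the class whose centroid is most similar under histogram intersection. The labels and 3-bin feature values are toy data.

```python
def normalize(hist):
    """Scale a histogram so its bins sum to 1."""
    total = sum(hist)
    return [x / total for x in hist]


def histogram_intersection(h1, h2):
    """Similarity of two normalized histograms: sum of bin-wise minima (1.0 = identical)."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def nearest_centroid(train, query):
    """train maps label -> list of feature histograms; return the most similar class."""
    best_label, best_sim = None, -1.0
    q = normalize(query)
    for label, hists in train.items():
        centroid = [sum(col) / len(hists) for col in zip(*hists)]
        sim = histogram_intersection(normalize(centroid), q)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label


# Hypothetical 3-bin feature histograms for two classes.
train = {"binds": [[4, 0, 1], [3, 1, 1]], "no_bind": [[0, 4, 1], [1, 3, 1]]}
print(nearest_centroid(train, [5, 0, 0]))  # binds
print(nearest_centroid(train, [0, 5, 1]))  # no_bind
```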


Author(s):  
Sudhanshu Akarshe ◽  
Rohit Khade ◽  
Nikhil Bankar ◽  
Prashant Khedkar ◽  
Prashant Ahire

Cricket is the most popular sport in India. It has huge spectator support, and the masses show great interest in predicting the outcome of Test, One Day International and T20 matches. The game has a number of rules and a scoring system. Numerous parameters, such as cricketing skills and performances and match venues, have a significant effect on the outcome of a game. These parameters, along with their interdependence, make accurate prediction of a game challenging. In this project, we build a robust prediction system that takes in historical match data and player performance and predicts future match events, such as whether a match ends in a victory or a loss. Our system performs this prediction using various machine learning algorithms. We describe our system and algorithms and present quantitative results for the best-suited algorithm, the one with the highest accuracy. The system also predicts the winning team before the match starts and suggests the best-suited squad for both teams.


PeerJ ◽  
2020 ◽  
Vol 8 ◽  
pp. e8359
Author(s):  
Fakiha Ashraf ◽  
Muhammad Aleem Ashraf ◽  
Xiaowen Hu ◽  
Shuzhen Zhang

Sugarcane Bacilliform Guadeloupe A Virus (SCBGAV, genus Badnavirus, family Caulimoviridae) is an emerging, deleterious pathogen of sugarcane which presents a substantial barrier to high sugarcane earnings. Sugarcane bacilliform viruses (SCBVs) are among the main viruses that infect sugarcane. During the last 30 years, significant genetic changes in SCBV strains have been observed, with a high risk of disease incidence associated with crop damage. SCBV infection may lead to significant losses in biomass production in susceptible sugarcane cultivars. The circular, double-stranded (ds) DNA genome of SCBGAV (7.4 kb) contains three open reading frames (ORFs) on the positive strand and is replicated by a reverse transcriptase. SCBGAV can infect sugarcane in a semipersistent manner via insect vectors, namely sugarcane mealybug species. In the current study, we used miRNA target prediction algorithms to identify and comprehensively analyze genome-wide sugarcane (Saccharum officinarum L.)-encoded microRNA (miRNA) targets in SCBGAV. Mature miRNA sequences were retrieved from miRBase (the miRNA database) and further analyzed for hybridization to the SCBGAV genome. Multiple computational approaches, including miRNA-target seed pairing, multiple target positions, minimum free energy, target site accessibility, maximum complementarity, pattern recognition and minimum folding energy for attachments, were considered by all algorithms. Among them, sof-miR396 was identified as the most effective candidate, capable of targeting the vital ORF3 of the SCBGAV genome. The miRanda, RNA22 and RNAhybrid algorithms predicted hybridization of sof-miR396 at the common locus position 3394. The sugarcane miRNAs predicted against viral mRNA targets possess antiviral activity, leading to translational inhibition through mRNA cleavage. An interaction network of sugarcane-encoded miRNAs with SCBGAV genes, created using Circos, allows new targets to be analyzed.
The findings of the present study act as a first step towards the creation of SCBGAV-resistant sugarcane through the expression of the identified miRNAs.
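Of the criteria listed, seed pairing is the simplest to illustrate. The sketch below scans a DNA genome sequence for exact matches to the reverse complement of a miRNA's seed region (nucleotides 2 to 8), the canonical seed criterion. It is a simplification assuming exact Watson-Crick seed matches on the given strand; tools such as miRanda, RNA22 and RNAhybrid additionally score mismatches, G:U wobble pairs and hybridization energy. The miRNA and genome sequences are toy examples, not sof-miR396 or the SCBGAV genome.

```python
# RNA base -> complementary DNA base (the genome is DNA: RNA U pairs with A, RNA A with T).
COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def seed_sites(mirna, genome, seed_start=2, seed_end=8):
    """Return 0-based positions where the genome contains the reverse complement
    of the miRNA seed (1-based nucleotides seed_start..seed_end, inclusive)."""
    seed = mirna[seed_start - 1:seed_end]
    site = "".join(COMPLEMENT[b] for b in reversed(seed))
    k = len(site)
    return [i for i in range(len(genome) - k + 1) if genome[i:i + k] == site]


# Toy sequences (hypothetical).
mirna = "UAGCUUAGGCAUCGAUCGAUCG"
genome = "GGCTAAGCTTT"
print(seed_sites(mirna, genome))  # [2]
```

Here the seed is `AGCUUAG`, so the scanner looks for `CTAAGCT` in the genome; reporting positions lets hits from several tools be compared at a common locus, as in the abstract.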


2016 ◽  
Author(s):  
Azim Dehghani Amirabad ◽  
Marcel H Schulz

Deregulation of miRNAs is implicated in many diseases, in particular cancer, where miRNAs can act as tumour suppressors or oncogenes. As sequence-based miRNA target predictions do not provide condition-specific information, many algorithms combine expression data for miRNAs and genes to prioritize miRNA targets. However, common strategies prioritize miRNA-gene associations, although a miRNA may target only a subset of the alternative transcripts produced by a gene. Thus, current approaches are suboptimal. Here we address the problem of transcript-based, rather than gene-based, miRNA target prioritization. We show how methods developed for gene-expression-based miRNA target prioritization can be leveraged for transcripts. In addition, we introduce a new multi-task learning (MTL) method that uses structured-sparsity-inducing regularization to improve the accuracy of learning. As shown with simulated data, the new MTL approach performs especially favorably in small-sample settings, for genes with many transcripts, and with noisy transcript expression estimates. In an analysis of real liver cancer RNA-seq data, we show that the MTL approach better predicts transcript expression and outperforms simpler approaches for miRNA target prediction.
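The abstract does not name its structured-sparsity penalty. A standard example of such regularization is the group lasso, whose proximal operator (group soft-thresholding) is the core step of proximal-gradient solvers. The sketch below assumes coefficients are partitioned into groups, for example one miRNA's coefficients across all transcripts (tasks) of a gene, so that the miRNA is kept or dropped jointly; this grouping and the function name are illustrative assumptions, not the authors' formulation.

```python
import math


def group_soft_threshold(weights, groups, lam):
    """Proximal operator of the group-lasso penalty lam * sum_g ||w_g||_2.

    All coefficients in a group are shrunk together; a group whose
    Euclidean norm is at most lam is set exactly to zero.
    """
    out = list(weights)
    for g in groups:
        norm = math.sqrt(sum(weights[i] ** 2 for i in g))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        for i in g:
            out[i] = scale * weights[i]
    return out


# Two hypothetical groups, e.g. two miRNAs' coefficients over a gene's two transcripts.
w = [3.0, 4.0, 0.3, 0.4]
print(group_soft_threshold(w, [[0, 1], [2, 3]], lam=1.0))
```

The first group (norm 5) is scaled by 1 - 1/5 = 0.8, while the second (norm 0.5) falls below `lam` and is zeroed, removing that miRNA from both transcripts at once. In a proximal-gradient loop, this step follows each gradient step on the data-fit loss, with `lam` multiplied by the step size.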


2009 ◽  
Vol 2009 ◽  
pp. 1-9 ◽  
Author(s):  
Christian Barbato ◽  
Ivan Arisi ◽  
Marcos E. Frizzo ◽  
Rossella Brandi ◽  
Letizia Da Sacco ◽  
...  

All microRNA (miRNA) target-finder algorithms return lists of candidate target genes. How valid is that output in a biological setting? Transcriptome analysis has proven to be a useful approach to determine mRNA targets. Time-course mRNA microarray experiments may reliably identify genes that are downregulated in response to overexpression of a specific miRNA. The approach may miss some miRNA targets that are principally downregulated at the protein level. However, the high-throughput capacity of the assay makes it an effective tool to rapidly identify a large number of promising miRNA targets. Finally, loss- and gain-of-function miRNA genetics clearly have the potential to be critical in evaluating the biological relevance of the thousands of target genes predicted by bioinformatic studies, and in testing the degree to which miRNA-mediated regulation of any “validated” target functionally matters to the animal or plant.
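A minimal sketch of flagging downregulated genes in a time-course overexpression experiment: compute the log2 fold change of treated versus control expression at each time point and keep genes that stay at or below a threshold throughout. The threshold of -1 (at least a two-fold decrease), the dictionary layout and the gene names are assumptions for illustration; real microarray analyses would also normalize the data and apply a statistical test across replicates.

```python
import math


def downregulated(control, treated, threshold=-1.0):
    """Return genes whose log2(treated / control) is <= threshold at every time point.

    control, treated: dict gene -> list of expression values, one per time point.
    threshold=-1.0 means at least a two-fold decrease.
    """
    hits = []
    for gene, ctrl in control.items():
        lfc = [math.log2(t / c) for t, c in zip(treated[gene], ctrl)]
        if all(x <= threshold for x in lfc):
            hits.append(gene)
    return hits


# Hypothetical expression values at three time points after miRNA overexpression.
control = {"g1": [100, 100, 100], "g2": [50, 50, 50], "g3": [80, 80, 80]}
treated = {"g1": [40, 30, 20], "g2": [50, 60, 55], "g3": [40, 90, 30]}
print(downregulated(control, treated))  # ['g1']
```

Gene `g3` drops two-fold at the first time point but recovers at the second, so requiring a consistent decrease excludes it; as the text notes, targets repressed mainly at the protein level would not show up in such mRNA data at all.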


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Tongjun Gu ◽  
Xiwu Zhao ◽  
William Bradley Barbazuk ◽  
Ji-Hyun Lee

Background: microRNAs (miRNAs) have been shown to play essential roles in a wide range of biological processes. Many computational methods have been developed to identify targets of miRNAs. However, the majority of these methods depend on pre-defined features that require considerable effort and resources to compute and often prove suboptimal at predicting miRNA targets. Results: We developed a novel hybrid deep learning-based (DL-based) approach that is capable of predicting miRNA targets with higher accuracy. This approach integrates convolutional neural networks (CNNs), which excel at learning spatial features, and recurrent neural networks (RNNs), which discern sequential features. Our approach therefore has the advantage of learning both the intrinsic spatial and sequential features of miRNA:target pairs. The inputs for our approach are raw sequences of miRNAs and genes, which can be obtained effortlessly. We applied our approach to two human datasets from recent miRNA target prediction studies and trained two models. We demonstrated that the two models consistently outperform the previous methods according to evaluation metrics on test datasets. Comparing our approach with currently available alternatives on independent datasets shows that our approach delivers substantial improvements in performance. We also show, with multiple lines of evidence, that our approach is more robust than other methods on small datasets. Ours is the first study to perform comparisons across multiple existing DL-based approaches to miRNA target prediction. Furthermore, we examined the contribution of a max pooling layer between the CNN and RNN and demonstrated that it improves the performance of all our models. Finally, a unified model was developed that is robust in fitting different input datasets. Conclusions: We present a new DL-based approach for predicting miRNA targets and demonstrate that our approach outperforms the current alternatives.
We provide an easy-to-use tool, miTAR, at https://github.com/tjgu/miTAR. Furthermore, our analysis results suggest that max pooling generally benefits the hybrid models and potentially prevents overfitting in them.
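Because the inputs are raw sequences, a hybrid CNN-RNN model needs a numeric encoding of them, and a max pooling layer shortens the convolutional feature map before it reaches the RNN. The sketch below shows one-hot encoding of an RNA sequence and non-overlapping 1D max pooling in plain Python. One-hot encoding, the padding length and the pool size are assumptions for illustration; miTAR's actual preprocessing and layer settings may differ, and a real model would use a deep learning framework.

```python
ALPHABET = "ACGU"


def one_hot(seq, length):
    """Encode an RNA/DNA sequence as `length` rows of 4 indicator values (A, C, G, U).

    Sequences are truncated or zero-padded to `length`; T is treated as U and
    unknown characters become all-zero rows.
    """
    seq = seq.upper().replace("T", "U")
    rows = []
    for i in range(length):
        row = [0] * len(ALPHABET)
        if i < len(seq) and seq[i] in ALPHABET:
            row[ALPHABET.index(seq[i])] = 1
        rows.append(row)
    return rows


def max_pool_1d(feature_map, size=2):
    """Non-overlapping max pooling along the sequence axis, separately per channel.

    feature_map: list of positions, each a list of channel values. A trailing
    window shorter than `size` is dropped.
    """
    pooled = []
    for start in range(0, len(feature_map) - size + 1, size):
        window = feature_map[start:start + size]
        pooled.append([max(col) for col in zip(*window)])
    return pooled


print(one_hot("ACGT", 5))
# [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 0, 0]]
print(max_pool_1d([[1, 5], [3, 2], [0, 7], [4, 4]]))  # [[3, 5], [4, 7]]
```

Pooling with `size=2` halves the number of positions the RNN must process and keeps only the strongest local response per channel, which is one plausible reason it can reduce overfitting.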

