A hybrid CNN-LSTM model for pre-miRNA classification

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Abdulkadir Tasdelen ◽  
Baha Sen

Abstract. miRNAs (or microRNAs) are small, endogenous, noncoding RNAs of about 22 nucleotides. Cumulative evidence from biological experiments shows that miRNAs play a fundamental role in various biological processes. Therefore, the classification of miRNAs is a critical problem in computational biology. Because mature miRNAs are short, many researchers work on precursor miRNAs (pre-miRNAs), which have longer sequences and more structural features. Pre-miRNAs can be divided into two groups, mirtrons and canonical miRNAs, according to differences in their biogenesis. Compared to mirtrons, canonical miRNAs are more conserved and easier to identify. Many existing pre-miRNA classification methods rely on manual feature extraction. Moreover, these methods focus on either the sequential structure or the spatial structure of pre-miRNAs. To overcome the limitations of previous models, we propose a nucleotide-level hybrid deep learning method that combines a CNN and an LSTM network. The model achieved an accuracy of 0.943 (95% CI ± 0.014), a sensitivity of 0.935 (95% CI ± 0.016), a specificity of 0.948 (95% CI ± 0.029), an F1 score of 0.925 (95% CI ± 0.016), and a Matthews correlation coefficient (MCC) of 0.880 (95% CI ± 0.028). Compared with the closest competing results, the proposed method gave the best accuracy, F1 score, and MCC, which were 2.51%, 1.00%, and 2.43% higher than the closest ones, respectively. Its mean sensitivity ranked first, together with that of linear discriminant analysis. The results indicate that hybrid CNN and LSTM networks can be employed to achieve better performance for pre-miRNA classification. In future work, we will investigate new classification models that deliver better performance on all the evaluation criteria.

2021 ◽  
Author(s):  
Sandali Lokuge ◽  
Shyaman Jayasundara ◽  
Puwasuru Ihalagedara ◽  
Damayanthi Herath ◽  
Indika Kahanda

microRNAs (miRNAs) are small non-coding RNA molecules that control gene expression at the RNA level. They typically range from 20 to 24 nucleotides in length and are found in plants, animals, and some viruses. Computational approaches have overcome limitations of the experimental methods and have performed well in identifying miRNAs. Compared to mature miRNAs, precursor miRNAs (pre-miRNAs) are longer and have a hairpin loop structure that provides structural features. Therefore, most in-silico tools are implemented for pre-miRNA identification. This study presents a multilayer perceptron (MLP) based classifier that uses 180 features from the sequential, structural, and thermodynamic feature categories for plant pre-miRNA identification. The classifier achieves 92% accuracy, 94% specificity, and 90% sensitivity. We further tested the model on other small non-coding RNA types and obtained 78% accuracy. Furthermore, we introduce a novel dataset for training and testing machine learning models, which addresses the overlap between the positive training and testing datasets of PlantMiRNAPred, a study by Xuan et al. on the classification of real and pseudo plant pre-miRNAs. The new dataset and the classifier are deployed on a web server that is freely accessible via http://mirnafinder.shyaman.me/.
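The abstract does not list the 180 features, so the sketch below only illustrates what a few features in the sequential category might look like: sequence length, GC content, and dinucleotide frequencies. The function name `sequence_features` and the example sequence are hypothetical; the structural and thermodynamic categories would need an RNA folding tool and are not covered here.

```python
from itertools import product


def sequence_features(seq):
    """Illustrative sequence-level features for an RNA sequence (assumed, not the study's set)."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    n = len(seq)
    feats = {
        "length": n,
        "gc_content": (seq.count("G") + seq.count("C")) / n,
    }
    # Count overlapping dinucleotides over the 16 possible pairs.
    dinucs = ["".join(p) for p in product("ACGU", repeat=2)]
    counts = {d: 0 for d in dinucs}
    for i in range(n - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing ambiguous bases such as N
            counts[pair] += 1
    for d in dinucs:
        feats["freq_" + d] = counts[d] / (n - 1)
    return feats


# Hypothetical toy sequence, far shorter than a real pre-miRNA.
feats = sequence_features("GGCUAGCC")
print(feats["gc_content"], feats["freq_GC"])
```

A feature vector of this kind, extended with structural and thermodynamic descriptors, is the sort of fixed-length input an MLP classifier expects.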


Sensors ◽  
2019 ◽  
Vol 19 (11) ◽  
pp. 2547 ◽  
Author(s):  
Tuo Gao ◽  
Yongchen Wang ◽  
Chengwu Zhang ◽  
Zachariah A. Pittman ◽  
Alexandra M. Oliveira ◽  
...  

Nanoparticle-based chemical sensor arrays with four types of organo-functionalized gold nanoparticles (AuNPs) were introduced to classify 35 different teas, including black teas, green teas, and herbal teas. Integrated sensor arrays were made using microfabrication methods including photolithography and lift-off processing. Different types of nanoparticle solutions were drop-cast on separate active regions of each sensor chip. Sensor responses, expressed as the ratio of resistance change to baseline resistance (ΔR/R0), were used as input data to discriminate different aromas by statistical analysis using multivariate techniques and machine learning algorithms. With five-fold cross validation, linear discriminant analysis (LDA) gave 99% accuracy for classification of all 35 teas, and 98% and 100% accuracy for the separate datasets of herbal teas and of black and green teas, respectively. We find that classification accuracy improves significantly when multiple types of nanoparticles are used instead of single-type nanoparticle arrays. The results suggest a promising approach to monitoring the freshness and quality of tea products.
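The response measure ΔR/R0 described above is the change in resistance divided by the baseline resistance, computed per sensing element. The following is a minimal sketch of that arithmetic; the function name `relative_response` and the resistance values are hypothetical and are not data from the study.

```python
def relative_response(baseline, exposed):
    """Return ΔR/R0 = (R - R0) / R0 for each sensing element."""
    return [(r - r0) / r0 for r0, r in zip(baseline, exposed)]


# Hypothetical readings (ohms) for four nanoparticle types on one chip.
baseline_ohms = [1000.0, 2000.0, 1500.0, 800.0]
exposed_ohms = [1050.0, 1900.0, 1500.0, 880.0]

features = relative_response(baseline_ohms, exposed_ohms)
print(features)
```

Normalizing by R0 makes responses comparable across elements with different baseline resistances, so the four values can be used together as one input vector for a classifier such as LDA.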


Author(s):  
Pavan Kumar ◽  
Poornima B. ◽  
Nagendraswamy H. S. ◽  
Manjunath C.

The proposed abstraction framework manipulates the visual features of low-light and underexposed images: it retains the prominent structural features, medium-scale details, and tonal information while suppressing superfluous details such as noise, complexity, and irregular gradients. The significant image features are refined at every stage of the work by integrating a series of AnshuTMO and NPR filters, as established through rigorous experiments. The work effectively preserves the structural features in the foreground of an image and diminishes its background content. The effectiveness of the work has been validated by experiments on standard datasets such as Mould and Wang, among other datasets, and the results are compared with similar contemporary work cited in the literature. In addition, user visual feedback and quality assessment techniques were used to evaluate the work. Applications of image abstraction and stylization, along with constraints, challenges, and future work in the NPR domain, are also discussed in this paper.


2018 ◽  
Vol 61 (5) ◽  
pp. 1497-1504
Author(s):  
Zhenjie Wang ◽  
Ke Sun ◽  
Lihui Du ◽  
Jian Yuan ◽  
Kang Tu ◽  
...  

Abstract. In this study, computer vision was used for the identification and classification of fungi on moldy paddy. To develop a rapid and efficient method for the classification of common fungal species found in stored paddy, computer vision was used to acquire images of individual colonies of growing fungi for three consecutive days. After image processing, the color, shape, and texture features were acquired and used in a subsequent discriminant analysis. Both linear (i.e., linear discriminant analysis and partial least squares discriminant analysis) and nonlinear (i.e., random forest and support vector machine [SVM]) pattern recognition models were employed for the classification of fungal colonies, and the results were compared. The results indicate that when all of the features for the three consecutive days were used, the performance of the nonlinear tools was superior to that of the linear tools, especially in the case of the SVM models, which achieved an accuracy of 100% on the calibration sets and an accuracy of 93.2% to 97.6% on the prediction sets. After feature selection with the sequential selection of projection algorithm, ten common features were retained for building the classification models. The results showed that the SVM model achieved an overall accuracy of 95.6%, 98.3%, and 99.0% on the prediction sets on days 2, 3, and 4, respectively. This work demonstrated that computer vision with several features is suitable for the identification and classification of fungi on moldy paddy based on the form of the individual colonies at an early growth stage during paddy storage. Keywords: Classification, Computer vision, Fungal colony, Feature selection, SVM.


2017 ◽  
Author(s):  
Gokmen Zararsiz ◽  
Dinçer Göksülük ◽  
Selçuk Korkmaz ◽  
Vahap Eldem ◽  
Gözde Ertürk Zararsız ◽  
...  

RNA sequencing (RNA-Seq) is a powerful technique for the gene-expression profiling of organisms that uses the capabilities of next-generation sequencing technologies. Developing gene-expression-based classification algorithms is an emerging and powerful method for diagnosis, disease classification, and monitoring at the molecular level, as well as for providing potential markers of diseases. Most of the statistical methods proposed for the classification of gene-expression data are either based on a continuous scale (e.g., microarray data) or require a normal distribution assumption. Hence, these methods cannot be directly applied to RNA-Seq data, since doing so violates both the data structure and the distributional assumptions. However, it is possible to apply these algorithms to RNA-Seq data with appropriate modifications. One way is to develop count-based classifiers, such as Poisson linear discriminant analysis (PLDA) and negative binomial linear discriminant analysis (NBLDA). Another way is to bring the data hierarchically closer to microarrays and apply microarray-based classifiers. In this study, we compared several classifiers including PLDA with and without power transformation, NBLDA, single SVM, bagging SVM (bagSVM), classification and regression trees (CART), and random forests (RF). We also examined the effect of several parameters, such as overdispersion, sample size, number of genes, number of classes, differential-expression rate, and the transformation method, on model performance. A comprehensive simulation study was conducted, and the results were compared with the results of two miRNA and two mRNA experimental datasets. The results revealed that increasing the sample size, differential-expression rate, and number of genes, and decreasing the dispersion parameter and number of groups, lead to an increase in classification accuracy. As in differential-expression studies, the classification of RNA-Seq data requires careful attention when handling data overdispersion. We conclude that, as a count-based classifier, the power-transformed PLDA and, as microarray-based classifiers, vst- or rlog-transformed RF and SVM classifiers may be a good choice for classification. An R/BIOCONDUCTOR package, MLSeq, is freely available at https://www.bioconductor.org/packages/release/bioc/html/MLSeq.html.
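Overdispersion, which the abstract singles out, means that the variance of a gene's counts exceeds its mean, contrary to the Poisson assumption. The sketch below assumes the common negative binomial parameterization var = μ + φμ² and estimates φ by the method of moments; it also shows an elementwise power transform of the kind applied before PLDA. The function names, the example counts, and the exponent are hypothetical, this is not the MLSeq implementation, and in practice the exponent is estimated from the data rather than fixed by hand.

```python
from statistics import mean, variance


def nb_dispersion(counts):
    """Method-of-moments dispersion phi, assuming var = mu + phi * mu**2."""
    mu = mean(counts)
    var = variance(counts)  # sample variance (n - 1 denominator)
    # Clamp at 0: var <= mu gives no evidence of overdispersion.
    return max((var - mu) / mu ** 2, 0.0)


def power_transform(counts, alpha):
    """Raise each count to the power alpha (0 < alpha <= 1)."""
    return [c ** alpha for c in counts]


# Hypothetical read counts for one gene across five samples.
gene_counts = [10, 50, 5, 100, 35]
phi = nb_dispersion(gene_counts)
transformed = power_transform(gene_counts, alpha=0.5)  # alpha assumed for illustration
print(f"phi = {phi:.4f}")
print(transformed)
```

A φ near zero suggests a Poisson-based classifier may be adequate, whereas a clearly positive φ is the situation where the abstract's negative binomial or transformation-based alternatives become relevant.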

