Novel PSSM-Based Approaches for Gene Identification Using Support Vector Machine

2021 ◽  
Vol 14 (2) ◽  
pp. 152-173
Author(s):  
Heena Farooq Bhat ◽  
M. Arif Wani

Understanding the function of each protein encoded in a genome makes it possible to recognize the molecular mechanisms of the cell. In the field of genome annotation, several methods have been developed to locate or predict gene patterns in a genome sequence. However, recognizing the gene that corresponds to a given protein sequence using conventional tools is inherently complicated and error-prone. This paper first focuses on the problem of gene prediction and its challenges. The authors then present a novel method for identifying genes that involves a two-step process. First, new features are extracted from protein sequences using a position-specific scoring matrix (PSSM), and the PSSM profiles are converted into a uniform numeric representation. Then, a new structured approach is applied to the PSSM vector, using a decision tree-based technique to obtain rules. Finally, the rules of a single class are joined together to form a matrix, which is given as input to a support vector machine (SVM) for classification. The rules derived from the algorithm correspond to genes. The authors also introduce another approach for predicting genes based on PSSM using an SVM. Both methods have been implemented on the genome DNAset dataset. Empirical evaluation shows that the PSSM-based SAFARI approach produces better results.
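The abstract does not say how the PSSM profiles are converted into a uniform numeric representation. As a minimal sketch of one common choice (an assumption, not necessarily the authors' method), each score can be squashed into (0, 1) with a logistic function and the columns averaged over all sequence positions, so that proteins of any length yield a vector of the same size. The function name `pssm_to_feature_vector` and the toy scores are hypothetical.

```python
import math


def pssm_to_feature_vector(pssm):
    """Collapse an L x N PSSM (list of rows) into a fixed N-dimensional vector.

    Assumed scheme: logistic scaling of every score, then a column-wise mean
    over the L sequence positions.
    """
    if not pssm:
        raise ValueError("PSSM must contain at least one row")
    n_cols = len(pssm[0])
    totals = [0.0] * n_cols
    for row in pssm:
        if len(row) != n_cols:
            raise ValueError("all PSSM rows must have the same length")
        for j, score in enumerate(row):
            totals[j] += 1.0 / (1.0 + math.exp(-score))
    return [total / len(pssm) for total in totals]


# Toy 3-residue profile with 20 columns (made-up scores, not real PSI-BLAST output).
toy_pssm = [[0] * 20, [2] + [0] * 19, [-2] + [0] * 19]
features = pssm_to_feature_vector(toy_pssm)
print(len(features))
```

The resulting fixed-length vectors could then be passed to any classifier, including an SVM.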

2019 ◽  
Vol 16 (4) ◽  
pp. 317-324
Author(s):  
Liang Kong ◽  
Lichao Zhang ◽  
Xiaodong Han ◽  
Jinfeng Lv

Protein structural class prediction is beneficial to protein structure and function analysis. Exploring good feature representations is a key step for this prediction task. Prior works have demonstrated the effectiveness of secondary structure-based feature extraction methods, especially for low-similarity protein sequences. However, prediction accuracies remain limited. To explore the potential of secondary structure information, a novel feature extraction method based on a generalized chaos game representation of predicted secondary structure is proposed. Each protein sequence is converted into a 20-dimensional distance-related statistical feature vector that characterizes the distribution of secondary structure elements and segments. The feature vectors are then fed into a support vector machine classifier to predict the protein structural class. Our experiments on three widely used low-similarity benchmark datasets (25PDB, 1189 and 640) show that the proposed method achieves superior performance to the state-of-the-art methods. It is anticipated that our method could be extended to other graphical representations of protein sequences and be helpful in future protein research.
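To illustrate the underlying idea, here is a minimal sketch of the classical chaos game representation applied to a secondary structure string. It assumes a three-letter alphabet (H for helix, E for strand, C for coil) placed at the corners of a triangle. The vertex coordinates, the starting point and the function name are assumptions; the paper's generalized variant and its 20 distance-related features are not reproduced here.

```python
# Assumed triangle vertices for the three secondary structure states.
VERTICES = {"H": (0.0, 0.0), "E": (1.0, 0.0), "C": (0.5, 1.0)}


def chaos_game_points(ss_string, start=(0.5, 0.5)):
    """Return the chaos game trajectory for a secondary structure string.

    Each symbol moves the current point halfway toward that symbol's vertex.
    """
    x, y = start
    points = []
    for symbol in ss_string:
        vx, vy = VERTICES[symbol]  # raises KeyError for unknown symbols
        x, y = (x + vx) / 2.0, (y + vy) / 2.0
        points.append((x, y))
    return points


points = chaos_game_points("HHEC")
print(points)
```

Statistics computed over such a point set (for example, distances between points or to the vertices) are one way to obtain a fixed-length feature vector for a classifier.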


2019 ◽  
Vol 15 (3) ◽  
pp. 206-211 ◽  
Author(s):  
Jihui Tang ◽  
Jie Ning ◽  
Xiaoyan Liu ◽  
Baoming Wu ◽  
Rongfeng Hu

Introduction: Machine learning is a useful tool for predicting cell-penetrating compounds as drug candidates. Materials and Methods: In this study, we developed a novel method for predicting the membrane-penetrating capability of Cell-Penetrating Peptides (CPPs). For this, we used orthogonal encoding to encode the amino acids and treated each amino acid position as one variable. IBM SPSS Modeler software and a dataset of 533 CPPs were then used for model screening. Results: The results indicated that the Support Vector Machine (SVM) model was suitable for predicting membrane-penetrating capability. For improvement, the three CPPs with the greatest lengths were used to predict CPPs. The penetration capability can be predicted with an accuracy of close to 95%. Conclusion: The results indicated that using amino acid position as a variable can be a promising method for predicting the membrane-penetrating capability of CPPs.
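Orthogonal encoding is commonly implemented as one-hot encoding: every position in the peptide becomes a block of 20 binary values, one per standard amino acid. The sketch below assumes zero padding up to a fixed maximum length and truncation beyond it; the padding rule, `max_len` and the function name are illustrative assumptions rather than details taken from the study.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot_encode(peptide, max_len):
    """Return a flat 0/1 vector of length max_len * 20.

    Positions past the end of the peptide are all zeros; residues beyond
    max_len are ignored. Non-standard residues raise ValueError.
    """
    vector = []
    for pos in range(max_len):
        block = [0] * len(AMINO_ACIDS)
        if pos < len(peptide):
            block[AMINO_ACIDS.index(peptide[pos])] = 1
        vector.extend(block)
    return vector


encoded = one_hot_encode("RKK", max_len=5)
print(len(encoded), sum(encoded))
```

With this layout, each amino acid position contributes its own group of input variables, which is what lets a classifier weigh residues differently depending on where they occur.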


2021 ◽  
Vol 11 (2) ◽  
pp. 796
Author(s):  
Alhanoof Althnian ◽  
Duaa AlSaeed ◽  
Heyam Al-Baity ◽  
Amani Samha ◽  
Alanoud Bin Dris ◽  
...  

Dataset size is considered a major concern in the medical domain, where lack of data is a common occurrence. This study aims to investigate the impact of dataset size on the overall performance of supervised classification models. We examined the performance of six models widely used in the medical field, namely support vector machine (SVM), neural networks (NN), C4.5 decision tree (DT), random forest (RF), AdaBoost (AB), and naïve Bayes (NB), on eighteen small medical UCI datasets. We further implemented three dataset size reduction scenarios on two large datasets and analyzed the performance of the models when trained on each resulting dataset with respect to accuracy, precision, recall, F-score, specificity, and area under the ROC curve (AUC). Our results indicated that the overall performance of classifiers depends on how well a dataset represents the original distribution rather than on its size. Moreover, we found that the most robust models for limited medical data are AB and NB, followed by SVM, and then RF and NN, while the least robust model is DT. Furthermore, an interesting observation is that a machine learning model being robust to a limited dataset does not necessarily imply that it provides the best performance compared to other models.
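For reference, most of the evaluation measures listed above follow directly from the four cells of a binary confusion matrix. The sketch below uses the standard textbook definitions with made-up counts. AUC is omitted because it requires ranked scores rather than counts, and the function name is hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary classification metrics from confusion counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0        # also called sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    if precision + recall:
        f_score = 2 * precision * recall / (precision + recall)
    else:
        f_score = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f_score": f_score,
    }


# Made-up counts for illustration only.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
print(metrics)
```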


2021 ◽  
pp. 016173462199809
Author(s):  
Dhurgham Al-karawi ◽  
Hisham Al-Assam ◽  
Hongbo Du ◽  
Ahmad Sayasneh ◽  
Chiara Landolfo ◽  
...  

Significant successes in machine learning approaches to image analysis for various applications have energized strong interest in automated diagnostic support systems for medical images. The evolving in-depth understanding of the way carcinogenesis changes the texture of cellular networks of a mass/tumor has been informing such diagnostic systems through the use of more suitable image texture features and extraction methods. Several texture features have recently been applied, with different levels of performance, to discriminating malignant from benign ovarian masses by analysing B-mode images from ultrasound scans of the ovary. However, comparative performance evaluation of these reported features using common sets of clinically approved images is lacking. This paper presents an empirical evaluation of seven commonly used texture features (histograms, moments of histogram, local binary patterns [256-bin and 59-bin], histograms of oriented gradients, fractal dimensions, and Gabor filter), using a collection of 242 ultrasound scan images of ovarian masses of various pathological characteristics. The evaluation examines not only the effectiveness of classification schemes based on the individual texture features but also the effectiveness of various combinations of these schemes using simple majority-rule decision-level fusion. Support vector machine classifiers trained on the individual texture features, without any specific pre-processing, achieve accuracy levels between 75% and 85%, with the seven moments and the 256-bin LBP at the lower end and the Gabor filter at the upper end. Combining the classification results of the top k (k = 3, 5, 7) best-performing features further improves the overall accuracy to between 86% and 90%. These evaluation results demonstrate that each of the investigated image-based texture features provides informative support in distinguishing between benign and malignant ovarian masses.
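Majority-rule decision-level fusion can be sketched in a few lines: each classifier casts one vote per image, and the most frequent label wins. The predictions and the classifier names below are invented for illustration and do not come from the study. An odd number of classifiers, as in k = 3, 5, 7, avoids ties for two-class problems.

```python
from collections import Counter


def majority_vote(votes):
    """Return the most frequent label among one image's votes."""
    label, _count = Counter(votes).most_common(1)[0]
    return label


def fuse(per_classifier_predictions):
    """Fuse equal-length prediction lists, one list per classifier."""
    return [majority_vote(votes) for votes in zip(*per_classifier_predictions)]


# Hypothetical outputs of three texture-feature classifiers on four images.
gabor = ["malignant", "benign", "benign", "malignant"]
hog = ["malignant", "malignant", "benign", "benign"]
lbp = ["benign", "benign", "benign", "malignant"]

fused = fuse([gabor, hog, lbp])
print(fused)
```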


2014 ◽  
Vol 26 (01) ◽  
pp. 1450002 ◽  
Author(s):  
Hanguang Xiao

Early detection of, and intervention in, artery stenosis is very important for reducing mortality from cardiovascular disease. A novel method for predicting artery stenosis is proposed that uses the input impedance of the systemic arterial tree and a support vector machine (SVM). Based on a transmission line model of a 55-segment systemic arterial tree, the input impedance of the arterial tree was calculated using a recursive algorithm. A sample database of the input impedance was established by specifying different positions and degrees of artery stenosis. An SVM prediction model was trained on the sample database, and 10-fold cross-validation was used to evaluate its performance. The effects of stenosis position and degree on the prediction accuracy were discussed. The results showed that the mean specificity, sensitivity and overall accuracy of the SVM are 80.2%, 98.2% and 89.2%, respectively, for the 50% threshold of stenosis degree. Increasing the threshold of the stenosis degree from 10% to 90% increases the overall accuracy from 82.2% to 97.4%. Increasing the distance of the stenosed artery from the heart gradually decreases the overall accuracy from 97.1% to 58%. When the stenosis degree deteriorates to 90%, the prediction accuracy of the SVM rises above 90% for stenosis of the peripheral arteries. The simulation demonstrated theoretically the feasibility of the proposed method for predicting artery stenosis via the input impedance of the systemic arterial tree and an SVM.
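The 10-fold cross-validation protocol mentioned above can be sketched as an index splitter: the samples are shuffled once, divided into ten folds, and each fold serves as the test set exactly once while the remaining nine form the training set. The function name, the fixed seed and the sample count are assumptions for illustration; the study's own splitting procedure is not described in the abstract.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding spreads the samples so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_indices(25, k=10))
for train, test in splits[:2]:
    print(len(train), len(test))
```

Averaging a metric such as sensitivity or specificity over the ten test folds gives the kind of mean value reported in the abstract.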


2006 ◽  
Vol 35 (2) ◽  
pp. 540-549 ◽  
Author(s):  
L. Krause ◽  
A. C. McHardy ◽  
T. W. Nattkemper ◽  
A. Puhler ◽  
J. Stoye ◽  
...  

2018 ◽  
Vol 45 (1) ◽  
pp. 117-135 ◽  
Author(s):  
Amna Sarwar ◽  
Zahid Mehmood ◽  
Tanzila Saba ◽  
Khurram Ashfaq Qazi ◽  
Ahmed Adnan ◽  
...  

Advancements in multimedia technologies have led to the growth of image databases. Retrieving images from such databases using their visual attributes is a challenging task because of the close visual similarity among those attributes, which also introduces the issue of the semantic gap. In this article, we propose a novel method based on the bag-of-words (BoW) model, which performs visual-word integration of the local intensity order pattern (LIOP) feature and the local binary pattern variance (LBPV) feature to reduce the semantic gap and enhance the performance of content-based image retrieval (CBIR). The proposed method uses the LIOP and LBPV features to build two smaller visual vocabularies (one from each feature), which are integrated to build a larger visual vocabulary that contains complementary features of both descriptors. This matters for efficient CBIR because a smaller visual vocabulary improves recall, while a larger visual vocabulary improves precision or accuracy. The comparative analysis of the proposed method is performed on three image databases, namely WANG-1K, WANG-1.5K and Holidays. The experimental analysis on these image databases demonstrates its robust performance compared with recent CBIR methods.
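One simple way to picture the integration of two visual vocabularies is as histogram concatenation. Each descriptor is assigned to its nearest visual word in the vocabulary built for its own feature type, and the two word-count histograms are joined into a single image signature. This is a sketch under that assumption, with toy low-dimensional vectors standing in for real LIOP and LBPV descriptors; the paper's exact integration step may differ, and all names below are hypothetical.

```python
def nearest_word(descriptor, vocabulary):
    """Index of the visual word closest to the descriptor (squared Euclidean)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(vocabulary)), key=lambda i: sq_dist(descriptor, vocabulary[i]))


def bow_histogram(descriptors, vocabulary):
    """Count how many descriptors fall on each visual word."""
    histogram = [0] * len(vocabulary)
    for descriptor in descriptors:
        histogram[nearest_word(descriptor, vocabulary)] += 1
    return histogram


def integrated_histogram(desc_a, vocab_a, desc_b, vocab_b):
    """Concatenate the two per-feature histograms into one signature."""
    return bow_histogram(desc_a, vocab_a) + bow_histogram(desc_b, vocab_b)


# Toy stand-ins for the two vocabularies and one image's descriptors.
liop_vocab = [(0.0, 0.0), (1.0, 1.0)]
lbpv_vocab = [(0.0,), (5.0,), (10.0,)]
liop_desc = [(0.1, 0.2), (0.9, 0.8), (1.1, 1.0)]
lbpv_desc = [(4.0,), (9.0,)]

signature = integrated_histogram(liop_desc, liop_vocab, lbpv_desc, lbpv_vocab)
print(signature)
```

The signature length equals the sum of the two vocabulary sizes, which mirrors the idea of building a larger vocabulary from two smaller ones.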

