N-MyristoylG-PseAAC: Sequence-based Prediction of N-Myristoyl Glycine Sites in Proteins by Integration of PseAAC and Statistical Moments

2019 ◽  
Vol 16 (3) ◽  
pp. 226-234 ◽  
Author(s):  
Sher Afzal Khan ◽  
Yaser Daanial Khan ◽  
Shakeel Ahmad ◽  
Khalid H. Allehaibi

N-Myristoylation, an irreversible protein modification, occurs by the covalent attachment of myristate to the N-terminal glycine of eukaryotic and viral proteins, and is associated with a variety of pathogens and disease-related proteins. Identifying myristoylation sites experimentally can be costly, labour-intensive and time-consuming. Because N-myristoylation is associated with various diseases, its timely prediction can help in diagnosing and controlling the associated fatal diseases. Herein, we present a method named N-MyristoylG-PseAAC, in which we have incorporated PseAAC with statistical moments for the prediction of N-Myristoyl Glycine (NMG) sites. A benchmark dataset of 893 positive and 1093 negative samples was collected and used in this study. For the feature vector, various position and composition relative features were calculated along with the statistical moments. A back propagation neural network was then trained on the feature vectors, with scaled conjugate gradient descent with adaptive learning as the optimizer. Self-consistency testing and 10-fold cross-validation were performed to evaluate the performance of N-MyristoylG-PseAAC using accuracy metrics. For self-consistency testing, 99.80% Acc, 99.78% Sp, 99.81% Sn and 0.99 MCC were observed, whereas for 10-fold cross-validation, 97.18% Acc, 98.54% Sp, 96.07% Sn and 0.94 MCC were observed. Thus, the proposed predictor can help in predicting myristoylation sites efficiently and accurately.

Technologies ◽  
2019 ◽  
Vol 7 (2) ◽  
pp. 30 ◽  
Author(s):  
Muhammad Fayaz ◽  
Habib Shah ◽  
Ali Aseere ◽  
Wali Mashwani ◽  
Abdul Shah

Energy is considered the most costly and scarce resource, and demand for it is increasing daily. Globally, a significant amount of energy is consumed in residential buildings, i.e., 30–40% of total energy consumption. An active energy prediction system is highly desirable for efficient energy production and utilization. In this paper, we propose a methodology to predict short-term energy consumption in a residential building. The proposed methodology consists of four layers, namely data acquisition, preprocessing, prediction, and performance evaluation. For experimental analysis, real data collected from four multi-storied buildings situated in Seoul, South Korea, has been used. The collected data is provided as input to the data acquisition layer. In the preprocessing layer, several data cleaning and preprocessing schemes are then applied to the input data to remove abnormalities. Preprocessing further consists of two processes, namely the computation of statistical moments (mean, variance, skewness, and kurtosis) and data normalization. In the prediction layer, a feed forward back propagation neural network is applied to normalized data and to data with statistical moments. In the performance evaluation layer, the mean absolute error (MAE), mean absolute percentage error (MAPE), and root mean squared error (RMSE) are used to measure the performance of the proposed approach. The average values of MAE, MAPE, and RMSE for data with statistical moments are 4.3266, 11.9617, and 5.4625, respectively. These values are lower than those for simple data and normalized data, which indicates that the feed forward back propagation neural network (FFBPNN) performs better on data with statistical moments than on simple data or normalized data.
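The four statistical moments and the three error measures named above have standard textbook definitions, sketched below. This is an illustration only: the abstract does not say whether population or sample estimators were used, so the population forms are an assumption, and the function names `moments` and `errors` are hypothetical. The sketch also assumes non-constant input (non-zero variance) and non-zero actual values for MAPE.

```python
import math


def moments(xs):
    """Return mean, population variance, skewness and kurtosis of a sequence."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    std = math.sqrt(var)
    # Standardized third and fourth central moments (kurtosis is not excess kurtosis).
    skew = sum((x - mean) ** 3 for x in xs) / n / std ** 3
    kurt = sum((x - mean) ** 4 for x in xs) / n / std ** 4
    return mean, var, skew, kurt


def errors(actual, predicted):
    """Return MAE, MAPE (in percent) and RMSE for paired observations."""
    diffs = [a - p for a, p in zip(actual, predicted)]
    n = len(diffs)
    mae = sum(abs(d) for d in diffs) / n
    mape = 100 * sum(abs(d / a) for d, a in zip(diffs, actual)) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    return mae, mape, rmse


# Hypothetical readings for illustration only.
print(moments([1, 2, 3, 4, 5]))
print(errors([10, 20], [12, 18]))
```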


2020 ◽  
Vol 15 (5) ◽  
pp. 396-407 ◽  
Author(s):  
Saba Amanat ◽  
Adeel Ashraf ◽  
Waqar Hussain ◽  
Nouman Rasool ◽  
Yaser D. Khan

Background: Carboxylation is one of the most biologically important post-translational modifications and occurs on lysine, arginine, and glutamine residues of a protein. Among these three, the covalent attachment of the carboxyl group to the lysine side chain is the most frequent and biologically important type of carboxylation. For studying such biological functions, it is essential to correctly determine the lysine sites sensitive to carboxylation. Objective: Herein, we present a machine learning-based computational model for the prediction of carboxylysine sites. Methods: Various position and composition relative features have been incorporated into the PseAAC for construction of feature vectors, and a neural network is employed as a classifier. The model is validated by jackknife, cross-validation, self-consistency, and independent testing. Results: The self-consistency test showed that the model has 99.76% Acc, 99.76% Sp, 99.76% Sn, and 0.99 MCC. Using the jackknife method, prediction model validation gave 97.07% Acc, while 10-fold cross-validation gave 95.16% Acc. Conclusion: Independent dataset testing gave 94.3%, which shows that the proposed model performs better than the existing model PreLysCar; however, the accuracy can be improved further in the future as the number of known carboxylysine sites in proteins increases.
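The validation schemes named above differ only in how samples are split into training and test sets. The sketch below shows k-fold splitting and treats the jackknife as its leave-one-out special case (k equal to the number of samples). The function names are hypothetical, and the contiguous, unshuffled folds are a simplifying assumption; in practice the samples are usually shuffled first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    # The first (n_samples % k) folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def jackknife_indices(n_samples):
    """Leave-one-out splits: each sample is the test set exactly once."""
    return k_fold_indices(n_samples, n_samples)


for train, test in k_fold_indices(10, 3):
    print(len(train), test)
```

Self-consistency testing, by contrast, trains and tests on the same samples, which is why it typically yields the highest figures of the four schemes.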


2021 ◽  
Vol 18 ◽  
Author(s):  
Wajdi Alghamdi ◽  
Yaser Daanial Khan ◽  
Ebraheem Alzahrani ◽  
Malik Zaka Ullah

Background: Chaperones are a group of proteins that have functional similarities and support protein folding. They can prevent non-specific aggregation by binding to non-native proteins, and are mainly linked with folding and assembly, which are important biological processes in molecular biology. Not only are chaperones important stress proteins for maintaining the survival of other proteins and cells, but their therapeutic applications are also increasing dramatically. Objectives: Herein, we report the first predictor for the identification of chaperone proteins. Methods: The predictor is developed using Chou’s pseudo amino acid composition (PseAAC), statistical moments and various position-based features. Results: The predictor is validated through 10-fold cross-validation and jackknife testing, which gave accuracies of 94.04% and 96.62%, respectively. Conclusion: Thus, the proposed predictor can help predict chaperone proteins efficiently and accurately and provide baseline data for the discovery of new drugs and biomarkers.


2020 ◽  
Vol 23 (8) ◽  
pp. 797-804
Author(s):  
Waqar Hussain ◽  
Nouman Rasool ◽  
Yaser D. Khan

Background: ZIKV has been a well-known global threat, which hit almost all countries in the Americas and posed a serious threat to the entire globe in 2016. The first outbreak of ZIKV was reported in 2007 in the Pacific area, followed by another severe outbreak in 2013/2014, after which ZIKV spread to all other Pacific islands. A broad spectrum of ZIKV-associated neurological malformations in neonates and adults has driven this deadly virus into the limelight. Though tremendous efforts have been focused on understanding the molecular basis of ZIKV, the viral proteins of ZIKV have still not been studied extensively. Objectives: Herein, we report the first predictor for the identification of ZIKV proteins. Methods: We have employed Chou’s pseudo amino acid composition (PseAAC), statistical moments and various position-based features. Results: The predictor is validated through 10-fold cross-validation and jackknife testing. In 10-fold cross-validation, 94.09% accuracy, 93.48% specificity, 94.20% sensitivity and 0.80 MCC were achieved, while in jackknife testing, 96.62% accuracy, 94.57% specificity, 97.00% sensitivity and 0.88 MCC were achieved. Conclusion: Thus, ZIKVPred-PseAAC can help in predicting ZIKV proteins efficiently and accurately and can provide baseline data for the discovery of new drugs and biomarkers against ZIKV.


Author(s):  
Bonpagna Kann ◽  
Thodsaporn Chay-intr ◽  
Hour Kaing ◽  
Thanaruk Theeramunkong

Although a number of studies have addressed the Khmer language in the field of Natural Language Processing, and some resources exist for word segmentation and POS tagging, high-level resources for syntax, such as treebanks and grammars, are still lacking. This paper presents a semi-automatic framework for constructing a Khmer treebank and for extracting Khmer grammar rules from a set of sentences taken from Khmer grammar books. Initially, these sentences are manually annotated; once the treebank is obtained, they are processed to generate a number of grammar rules with their probabilities. In our experiments, the annotated trees and the extracted grammar rules are analyzed both quantitatively and qualitatively. Finally, the results are evaluated with three evaluation processes, namely Self-Consistency, 5-Fold Cross-Validation and Leave-One-Out Cross-Validation, using three measures: Precision, Recall and F1-Measure. According to the results of the three validations, Self-Consistency gives the best result, with more than 92%, followed by Leave-One-Out Cross-Validation and 5-Fold Cross-Validation, with averages of 88% and 75%, respectively. On the other hand, the crossing bracket data shows that Leave-One-Out Cross-Validation holds the highest average, with 96%, while the other two are 85% and 89%, respectively.
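Precision, Recall and F1-Measure for parse trees are commonly computed by comparing the labeled constituents of a predicted tree against those of the gold tree. The sketch below assumes that convention, which the abstract does not spell out; the `(label, start, end)` bracket representation, the function name `precision_recall_f1` and the example trees are all hypothetical.

```python
def precision_recall_f1(gold, predicted):
    """Compare two collections of labeled brackets given as (label, start, end)."""
    gold, predicted = set(gold), set(predicted)
    matched = len(gold & predicted)                    # brackets present in both trees
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical brackets for a four-word sentence (illustration only).
gold = {("S", 0, 4), ("NP", 0, 1), ("VP", 1, 4), ("NP", 2, 4)}
pred = {("S", 0, 4), ("NP", 0, 1), ("VP", 1, 4), ("NP", 3, 4)}
print(precision_recall_f1(gold, pred))
```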


2015 ◽  
Vol 738-739 ◽  
pp. 578-581 ◽  
Author(s):  
Xiong Yang ◽  
Xin Yu Jin ◽  
Jian Feng Shen

Computer-aided diagnosis of Premature Ventricular Contraction (PVC) plays an important role in the timely detection and treatment of arrhythmias. Conventional identification methods based on the back propagation neural network (BPNN) suffer from overlong training time and convergence to local optima. This paper applies an improved BPNN to PVC identification; the improvements are based on a self-adaptive learning rate and momentum in training. ECG signals obtained from the MIT-BIH arrhythmia database are first denoised and subjected to feature extraction. A comparison between the standard BPNN and the improved BPNN shows that the latter requires less training time and achieves better accuracy.
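The abstract does not give the update rule, so the following is only a sketch of one common way to combine momentum with a self-adaptive learning rate: the rate grows after a step that lowers the error and shrinks (with the step rejected) after one that does not. The growth and decay factors, the function name `train` and the one-dimensional quadratic loss are assumptions chosen for illustration, not the authors' method.

```python
def train(grad, loss, w, lr=0.1, momentum=0.9, inc=1.05, dec=0.7, epochs=200):
    """Gradient descent with momentum and an error-driven learning rate."""
    velocity = 0.0
    current = loss(w)
    for _ in range(epochs):
        # Momentum blends the previous step with the new gradient step.
        new_velocity = momentum * velocity - lr * grad(w)
        candidate = w + new_velocity
        candidate_loss = loss(candidate)
        if candidate_loss < current:
            # Error decreased: accept the step and speed up.
            w, velocity, current = candidate, new_velocity, candidate_loss
            lr *= inc
        else:
            # Error did not decrease: reject the step, drop momentum, slow down.
            velocity = 0.0
            lr *= dec
    return w, current


# Toy problem (hypothetical): minimize (w - 3)^2 starting from w = 0.
w_final, final_loss = train(grad=lambda w: 2 * (w - 3), loss=lambda w: (w - 3) ** 2, w=0.0)
print(w_final, final_loss)
```

Because a step is accepted only when the error falls, the error never increases from one epoch to the next, which addresses the divergence that a fixed, overly large learning rate can cause.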


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Muhammad Adeel Ashraf ◽  
Yaser Daanial Khan ◽  
Bilal Shoaib ◽  
Muhammad Adnan Khan ◽  
Faheem Khan ◽  
...  

Beta-lactamase (β-lactamase) produced by different bacteria confers resistance against β-lactam-containing drugs. The gene encoding β-lactamase is plasmid-borne and can easily be transferred from one bacterium to another during conjugation. Through such transfer, the recipient also acquires resistance against drugs of the β-lactam family. β-Lactam antibiotics play a vital role in the clinical treatment of serious conditions such as soft tissue infections, gonorrhoea, skin infections, urinary tract infections, and bronchitis. Herein, we report a prediction classifier named βLact-Pred for the identification of β-lactamase proteins. The computational model uses the primary amino acid sequence as its input. Various metrics are derived from the primary structure to form a feature vector. Experimentally determined data of positive and negative beta-lactamases are collected and transformed into feature vectors. An algorithm based on the artificial neural network is trained by integrating the position relative features and sequence statistical moments into PseAAC. The results of the proposed computational model were validated using several approaches, i.e., self-consistency testing, jackknife testing, cross-validation, and independent testing. The overall accuracy of the predictor for self-consistency, jackknife testing, cross-validation, and independent testing is 99.76%, 96.07%, 94.20%, and 91.65%, respectively. These experimental results demonstrate that the proposed predictor “βLact-Pred” surpasses the existing methods.
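As a simple illustration of deriving a fixed-length feature vector from a primary sequence, the sketch below computes the plain amino acid composition: the relative frequency of each of the 20 standard residues. This is only the most basic composition feature, not the position relative features or statistical moments used by βLact-Pred; the function name and the example sequence are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def amino_acid_composition(sequence):
    """Return a 20-element vector of relative residue frequencies."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for residue in sequence.upper():
        if residue in counts:          # non-standard symbols (e.g. X) are skipped
            counts[residue] += 1
    total = sum(counts.values())
    return [counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]


# Hypothetical four-residue peptide (illustration only).
vector = amino_acid_composition("MKKA")
print(vector)
```

Composition alone discards residue order, which is the motivation for adding sequence-order information such as position relative features on top of it.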


2019 ◽  
Vol 7 (1) ◽  
pp. 200-222
Author(s):  
Azzad Bader Saeed ◽  
Sabah Abdul-Hassan Gitaffa

In this paper, a simulation of an artificial intelligence system has been designed for processing the incoming data of sensor units and then presenting a proper decision. A back-propagation neural network (BPNN) has been used as the proposed intelligent system, where the BPNN is treated as a trained network in conjunction with an optimization method for changing the weights and biases of the overall network. The two main features of the BPNN are high-speed processing and reaching the lowest mean squared error (MSE, the cost function) in few iterations. The proposed BPNN uses the linear activation functions 'Satlins' and 'Satline' for the hidden and output layers respectively, and the training function 'Traingda' (gradient descent with adaptive learning rate) as a powerful learning method. It is worth mentioning that no previous research has used these three functions together for such analysis. The MATLAB software package has been used for designing and testing the proposed system. An optimal result has been obtained in this work: the mean squared error reached zero in 87 epochs, and the real and desired outputs were fitted. In fact, no previous work has reached this optimal result. The proposed BPNN has been implemented on an FPGA, which is a fast, low-power tool.
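For readers unfamiliar with the saturating linear transfer functions, the sketch below reproduces their behaviour in Python: MATLAB's `satlin` clips its input to the range [0, 1], and `satlins` clips it to [-1, 1]. Reading the abstract's 'Satline' as `satlin` is an assumption, and the code is an illustration rather than the authors' MATLAB or FPGA implementation.

```python
def satlin(n):
    """Saturating linear transfer function: clips n to the range [0, 1]."""
    return min(1.0, max(0.0, n))


def satlins(n):
    """Symmetric saturating linear transfer function: clips n to [-1, 1]."""
    return min(1.0, max(-1.0, n))


for n in (-2.0, -0.5, 0.3, 2.0):
    print(n, satlin(n), satlins(n))
```

Both functions are piecewise linear, so they need only comparisons rather than exponentials, which makes them inexpensive to realize in hardware.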

