Prediction of Citrullination Sites on the Basis of mRMR Method and SNN

2020 ◽  
Vol 22 (10) ◽  
pp. 705-715 ◽  
Author(s):  
Min Liu ◽  
Guangzhong Liu

Background: Citrullination, an important post-translational modification of proteins, alters the molecular weight and electrostatic charge of protein side chains. The conversion of arginine to citrulline in protein sequences is catalyzed by a class of enzymes called peptidylarginine deiminases (PADs). The Ca2+-dependent PADs comprise five isozymes: PAD 1, 2, 3, 4/5, and 6. Citrullinated proteins have been identified in many biological and pathological processes, and abnormal protein citrullination can lead to serious human diseases, including multiple sclerosis and rheumatoid arthritis. Objective: It is important to identify the citrullination sites in protein sequences. Accurate identification of these sites may contribute to studies on the molecular functions and pathological mechanisms of related diseases. Methods and Results: In this study, after the training set (116 positive and 348 negative samples) was encoded into a feature matrix, the mRMR method was used to rank the 941-dimensional features by importance. Then, a predictive model based on a self-normalizing neural network (SNN) was proposed to predict the citrullination sites in protein sequences. Incremental Feature Selection (IFS) and 10-fold cross-validation were used to evaluate the model. Three classical machine learning models, namely random forest, support vector machine, and k-nearest neighbor, were compared with the SNN prediction model using the same evaluation methods. SNN may be the best tool for citrullination site prediction: the maximum Matthews Correlation Coefficient (MCC) reached 0.672404 with the optimal SNN classifier. Conclusion: The results showed that the SNN-based prediction method performed better on common metrics such as MCC, accuracy, and F1-measure. The SNN prediction model also achieved a better balance in the classification and recognition of positive and negative samples compared with the other three models.
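The three metrics named in the abstract can all be computed from confusion-matrix counts. The sketch below is a minimal illustration only: the function name `classification_metrics` and the example counts are hypothetical (chosen to sum to the 116 positive and 348 negative samples mentioned above) and are not results from the paper.

```python
import math

def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, F1-measure, and MCC from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total

    # F1 is the harmonic mean of precision and recall; defined as 0 when undefined.
    f1_den = 2 * tp + fp + fn
    f1 = 2 * tp / f1_den if f1_den else 0.0

    # MCC uses all four cells, so it stays informative on imbalanced data.
    # By convention it is set to 0 when the denominator vanishes.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}

# Hypothetical counts (NOT from the paper): 116 positives, 348 negatives.
print(classification_metrics(tp=80, tn=330, fp=18, fn=36))

# A trivial "always negative" predictor on the same class split.
print(classification_metrics(tp=0, tn=348, fp=0, fn=116))
```

With a 1:3 class ratio, the "always negative" predictor still scores 75% accuracy while its F1 and MCC are both 0, which is one reason MCC is a common headline metric for imbalanced site-prediction tasks.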

Author(s):  
Tssehay Admassu Assegie

In this study, the author proposes a k-nearest neighbor (KNN) based heart disease prediction model and conducts an experiment to evaluate its predictive performance. The heart disease data were obtained from the Kaggle machine learning data repository. The dataset consists of 1025 observations, of which 499 (48.68%) are heart disease negative and 526 (51.32%) are heart disease positive. The performance of the KNN algorithm is analyzed on the test set, where the model reaches an accuracy of 91.99%.
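For readers unfamiliar with KNN, the following is a minimal sketch of majority-vote classification with Euclidean distance. The abstract does not state the value of k, the distance metric, or the features used, so `k=3`, the two-dimensional toy points, and the function names are all assumptions made for illustration.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbors."""
    # Pair each training point's distance to the query with its label.
    neighbors = sorted(
        ((math.dist(x, query), label) for x, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]

def accuracy(train_X, train_y, test_X, test_y, k=3):
    """Fraction of test samples whose predicted label matches the true label."""
    hits = sum(knn_predict(train_X, train_y, x, k) == y for x, y in zip(test_X, test_y))
    return hits / len(test_y)

# Hypothetical, already-scaled two-feature samples (0 = negative, 1 = positive).
train_X = [(0.10, 0.20), (0.20, 0.10), (0.15, 0.25), (0.80, 0.90), (0.90, 0.80), (0.85, 0.75)]
train_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_X, train_y, (0.2, 0.2)))
print(accuracy(train_X, train_y, [(0.12, 0.18), (0.88, 0.82)], [0, 1]))
```

Because KNN relies on raw distances, features on different scales are usually normalized first; the toy points above are assumed to be scaled already.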


Floods are among the most damaging natural disasters. A flood can result in a huge loss of human life and property. It can also affect agricultural land and destroy cultivated crops and trees. Floods can occur as a result of surface runoff from melting snow, prolonged rain, inadequate drainage of rainwater, or the collapse of dams. Today, people have destroyed rivers and lakes and converted natural water storage areas into buildings and construction land. Unlike regular floods, flash floods can develop within a few hours. Research on flood prediction has advanced in order to reduce the loss of human life, property damage, and other flood-related problems. Machine learning methods are widely used to build efficient prediction models for weather forecasting, providing cost-effective solutions and better performance. In this paper, a prediction model is constructed from rainfall data to predict the occurrence of floods due to rainfall. The model predicts whether “flood may happen or not” based on the rainfall range for particular locations. Indian district rainfall data is used to build the prediction model. The model is trained with several algorithms: Linear Regression, K-Nearest Neighbor, Support Vector Machine, and Multilayer Perceptron (MLP). Among these, the MLP algorithm performed best, with the highest accuracy of 97.40%. The MLP flash flood prediction model can help climate scientists predict floods during heavy downpours.


2018 ◽  
Vol 14 (2) ◽  
pp. 261
Author(s):  
Lila Dini Utami

Today it is very easy to express opinions about almost anything, both orally and in writing. Business people, especially service providers such as hotels, can use these opinions to make decisions, which is very useful for developing the hotel business. However, the review data must be processed with the right algorithm. This study was therefore conducted to find out which algorithm is most suitable for obtaining the highest accuracy. The methods used are Naïve Bayes (NB), Support Vector Machine (SVM), and k-Nearest Neighbor (k-NN). Naïve Bayes achieved an accuracy of 71.50% with an AUC of 0.500, Support Vector Machine achieved 72.50% with an AUC of 0.936, and k-Nearest Neighbor achieved 75.00% with an AUC of 0.500. The k-Nearest Neighbor algorithm can therefore help in making more appropriate decisions based on hotel reviews.
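Accuracy and AUC measure different things, which is why they can diverge as they do in the figures above. The sketch below computes both on toy data; it uses the pairwise (Mann-Whitney) definition of AUC. The function names and the data are hypothetical, and the example is not an attempt to reproduce or explain the study's numbers.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    # Each positive/negative pair scores 1 for a correct ranking, 0.5 for a tie.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def accuracy(labels, predictions):
    """Fraction of predictions that equal the true label."""
    return sum(y == p for y, p in zip(labels, predictions)) / len(labels)

# Toy labels with a 3:1 class split (hypothetical data).
labels = [1, 1, 1, 0]

# A model that gives every sample the same score cannot rank anything,
# yet thresholding it to "always positive" still matches the majority class.
constant_scores = [0.7, 0.7, 0.7, 0.7]
print(roc_auc(labels, constant_scores), accuracy(labels, [1, 1, 1, 1]))

# A model whose scores separate the classes ranks every pair correctly.
print(roc_auc(labels, [0.9, 0.8, 0.6, 0.2]))
```

In general, an AUC near 0.5 indicates scores that carry little ranking information, whatever the accuracy at a particular threshold happens to be.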


2008 ◽  
Vol 02 (03) ◽  
pp. 403-423 ◽  
Author(s):  
NICOLA FANIZZI ◽  
CLAUDIA D'AMATO ◽  
FLORIANA ESPOSITO

This work concerns non-parametric approaches to statistical learning applied to the standard knowledge representation languages adopted in the Semantic Web context. We present methods based on epistemic inference that are able to elicit and exploit the semantic similarity of individuals in OWL knowledge bases. Specifically, a fully semantic and language-independent semi-distance function is introduced, from which an epistemic kernel function for Semantic Web representations is also derived. Both the measure and the kernel function are embedded in non-parametric statistical learning algorithms customized for Semantic Web representations: the measure is embedded in a k-Nearest Neighbor algorithm and the kernel function in a Support Vector Machine. The implemented algorithms are used to perform inductive concept retrieval and query answering. Experiments on real ontologies show that the methods can be effectively employed for the target tasks and, moreover, that it is possible to induce new assertions that are not logically derivable.


2014 ◽  
Vol 2014 ◽  
pp. 1-11 ◽  
Author(s):  
Xiangrong Zhang ◽  
Licheng Jiao ◽  
Anand Paul ◽  
Yongfu Yuan ◽  
Zhengli Wei ◽  
...  

A semisupervised classification method based on particle swarm optimization (PSO) is proposed. The semisupervised PSO simultaneously uses limited labeled samples and large amounts of unlabeled samples to find a collection of prototypes (or centroids) that are considered to precisely represent the patterns of the whole data; then, following the “nearest neighbor” principle, the unlabeled data can be classified with the obtained prototypes. To validate the performance of the proposed method, we compare the classification accuracy of the PSO classifier, the k-nearest neighbor algorithm, and the support vector machine on six UCI datasets, four typical artificial datasets, and the USPS handwritten dataset. Experimental results demonstrate that the proposed method performs well even with very limited labeled samples, because it uses both the discriminant information provided by labeled samples and the structure information provided by unlabeled samples.
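The final classification step described above, assigning each sample the label of its nearest prototype, can be sketched as follows. This covers only that last step: the prototypes here are hard-coded, hypothetical values standing in for whatever the PSO search would return, and the Euclidean distance and function name are assumptions.

```python
import math

def classify_by_prototype(sample, prototypes):
    """Return the label of the prototype closest to `sample`.

    `prototypes` is a list of (vector, label) pairs.
    """
    _, label = min(prototypes, key=lambda proto: math.dist(proto[0], sample))
    return label

# Hypothetical prototypes; in the method above these would be found by PSO.
prototypes = [
    ((0.0, 0.0), "A"),
    ((5.0, 5.0), "B"),
    ((0.0, 5.0), "C"),
]

unlabeled = [(0.5, 0.2), (4.1, 4.8), (0.3, 4.4)]
print([classify_by_prototype(x, prototypes) for x in unlabeled])
```

Compared with plain k-nearest neighbor, classification cost depends on the number of prototypes rather than on the size of the training set.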


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Jie Pan ◽  
Li-Ping Li ◽  
Chang-Qing Yu ◽  
Zhu-Hong You ◽  
Zhong-Hao Ren ◽  
...  

Protein-protein interactions (PPIs) in plants are crucial for understanding biological processes. Although high-throughput techniques have produced valuable information for identifying PPIs in plants, they are usually expensive, inefficient, and extremely time-consuming. Hence, there is an urgent need to develop novel computational methods to predict PPIs in plants. In this article, we propose a novel approach that predicts PPIs in plants using only protein sequence information. Specifically, plant protein sequences are first converted into a position-specific scoring matrix (PSSM); then, the fast Walsh–Hadamard transform (FWHT) algorithm is used to extract feature vectors from the PSSM to capture the evolutionary information of plant proteins. Lastly, a rotation forest (RF) classifier is trained for prediction and evaluated. We named this approach FWHT-RF because FWHT and RF are used for feature extraction and classification, respectively. When FWHT-RF was applied to three plant PPI datasets, Maize, Rice, and Arabidopsis thaliana (Arabidopsis), its average accuracies under 5-fold cross-validation reached 95.20%, 94.42%, and 83.85%, respectively. To further evaluate the predictive power of FWHT-RF, we compared it with state-of-the-art support vector machine (SVM) and K-nearest neighbor (KNN) classifiers in different aspects. The experimental results demonstrate that FWHT-RF can be a useful supplementary method for predicting potential PPIs in plants.
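The transform at the core of the feature-extraction step can be written in a few lines. The sketch below shows only the unnormalized fast Walsh-Hadamard transform in natural (Hadamard) ordering on a single vector. The abstract does not say how the PSSM is padded, sliced, or normalized before the transform, so none of that is modeled here, and the function name `fwht` is just a label for this example.

```python
def fwht(values):
    """Unnormalized fast Walsh-Hadamard transform (natural/Hadamard ordering).

    Runs in O(n log n) and requires len(values) to be a power of two.
    """
    a = list(values)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("input length must be a power of two")
    h = 1
    while h < n:
        # Butterfly step: combine elements that are h positions apart.
        for start in range(0, n, 2 * h):
            for j in range(start, start + h):
                x, y = a[j], a[j + h]
                a[j], a[j + h] = x + y, x - y
        h *= 2
    return a

signal = [1, 0, 1, 0, 0, 1, 1, 0]
print(fwht(signal))        # [4, 2, 0, -2, 0, 2, 0, 2]
print(fwht(fwht(signal)))  # applying it twice returns n times the input
```

Because the transform uses only additions and subtractions, it is cheap to apply to every row or column of a matrix, which makes it a natural fit for fixed-length feature extraction.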

