Prediction of Citrullination Sites on the Basis of mRMR Method and SNN

Background: Citrullination, an important post-translational modification of proteins, alters the molecular weight and electrostatic charge of protein side chains. The conversion of arginine to citrulline in protein sequences is catalyzed by a class of Peptidyl Arginine Deiminases (PADs). PADs are Ca2+-dependent and comprise five isozymes: PAD 1, 2, 3, 4/5, and 6. Citrullinated proteins have been identified in many biological and pathological processes, and abnormal protein citrullination can lead to serious human diseases, including multiple sclerosis and rheumatoid arthritis. Objective: It is important to identify the citrullination sites in protein sequences. Accurate identification of citrullination sites may contribute to studies on the molecular functions and pathological mechanisms of related diseases. Methods and Results: In this study, a training set (containing 116 positive and 348 negative samples) was encoded into a feature matrix, and the mRMR (minimum redundancy maximum relevance) method was used to rank the 941-dimensional features by importance. Then, a predictive model based on a self-normalizing neural network (SNN) was proposed to predict the citrullination sites in protein sequences. Incremental Feature Selection (IFS) and 10-fold cross-validation were used for model evaluation. Three classical machine learning models, namely random forest, support vector machine, and k-nearest neighbor algorithm, were compared with the SNN prediction model using the same evaluation methods. Among these models, SNN may be the best tool for citrullination site prediction. The maximum Matthews Correlation Coefficient (MCC) reached 0.672404 with the optimal SNN classifier. Conclusion: The results showed that the SNN-based prediction method performed better when evaluated by common metrics such as MCC, accuracy, and F1-measure. The SNN prediction model also achieved a better balance in the classification and recognition of positive and negative samples compared with the other three models.
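The evaluation described in this abstract combines a ranked feature list, Incremental Feature Selection, and MCC. The following is a minimal sketch of that combination, not the authors' implementation: `incremental_feature_selection` and its `evaluate` callback are hypothetical names, and in a real pipeline `evaluate` would train a classifier on the given feature subset and return its cross-validated MCC.

```python
import math
from typing import Callable, List, Sequence, Tuple


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when any marginal sum is zero.
    return numerator / denominator if denominator else 0.0


def incremental_feature_selection(
    ranked: Sequence[str],
    evaluate: Callable[[Sequence[str]], float],
) -> Tuple[List[str], float]:
    """Score the nested prefixes top-1, top-2, ... of a ranked feature list.

    `evaluate` is a hypothetical callback that returns a score (for example
    a cross-validated MCC) for a feature subset. The smallest prefix with
    the best score is returned.
    """
    best_k, best_score = 0, float("-inf")
    for k in range(1, len(ranked) + 1):
        score = evaluate(ranked[:k])
        if score > best_score:  # strict '>' keeps the smaller subset on ties
            best_k, best_score = k, score
    return list(ranked[:best_k]), best_score
```

MCC uses all four confusion-matrix cells, which makes it a reasonable headline metric for an imbalanced set such as the 116 positive and 348 negative samples mentioned above.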

A feature weighted support vector machine and K-nearest neighbor algorithm for stock market indices prediction

Expert Systems with Applications ◽

10.1016/j.eswa.2017.02.044 ◽

2017 ◽

Vol 80 ◽

pp. 340-355 ◽

Cited By ~ 74

Author(s):

Yingjun Chen ◽

Yongtao Hao

Keyword(s):

Support Vector Machine ◽

Stock Market ◽

Nearest Neighbor ◽

Support Vector ◽

K Nearest Neighbor ◽

Nearest Neighbor Algorithm ◽

K Nearest Neighbor Algorithm

Comparison of accuracy level K-Nearest Neighbor algorithm and support vector machine algorithm in classification water quality status

2016 International Conference on Frontiers of Information Technology (FIT) ◽

10.1109/fit.2016.7857553 ◽

2016 ◽

Cited By ~ 1

Author(s):

Amri Danades ◽

Devie Pratama ◽

Dian Anggraini ◽

Diny Anggriani

Keyword(s):

Water Quality ◽

Support Vector Machine ◽

Nearest Neighbor ◽

Support Vector ◽

Support Vector Machine Algorithm ◽

K Nearest Neighbor ◽

Accuracy Level ◽

Water Quality Status ◽

K Nearest Neighbor Algorithm ◽

Quality Status

Heart disease prediction model with k-nearest neighbor algorithm

International Journal of Informatics and Communication Technology (IJ-ICT) ◽

10.11591/ijict.v10i3.pp225-230 ◽

2021 ◽

Vol 10 (3) ◽

pp. 225

Author(s):

Tssehay Admassu Assegie

Keyword(s):

Heart Disease ◽

Prediction Model ◽

Nearest Neighbor ◽

Predictive Performance ◽

Data Repository ◽

Disease Prediction ◽

K Nearest Neighbor ◽

Proposed Model ◽

K Nearest Neighbor Algorithm ◽

Learning Data

In this study, the author proposes a k-nearest neighbor (KNN) based heart disease prediction model and conducts an experiment to evaluate its predictive performance. The heart disease data were obtained from the Kaggle machine learning data repository. The dataset consists of 1025 observations, of which 499 (48.68%) are heart disease negative and 526 (51.32%) are heart disease positive. The performance of the KNN algorithm is analyzed on the test set, where the model reaches an accuracy of 91.99%.
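As an illustration of the technique named in this abstract, here is a minimal k-nearest neighbor classifier with majority voting, together with a check of the reported class proportions. The abstract does not state the value of k, the distance metric, or the features used, so Euclidean distance and `k=3` are assumptions, and `knn_predict` is a hypothetical helper rather than the author's code.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def knn_predict(
    train_x: Sequence[Sequence[float]],
    train_y: Sequence[Hashable],
    query: Sequence[float],
    k: int = 3,
) -> Hashable:
    """Return the majority label among the k training points nearest to query."""
    # Pair each training point's Euclidean distance to the query with its label.
    by_distance = sorted(
        zip((math.dist(x, query) for x in train_x), train_y),
        key=lambda pair: pair[0],
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


def class_percentages(counts: Sequence[int]) -> List[float]:
    """Percentage share of each class, rounded to two decimals."""
    total = sum(counts)
    return [round(100 * c / total, 2) for c in counts]


# Class sizes reported in the abstract: 499 negative, 526 positive.
shares = class_percentages([499, 526])
```

With an odd k and two classes, the vote cannot tie, which is one common reason to choose an odd value.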

Prediction of Flash Flood using Rainfall by MLP Classifier

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.f9880.059120 ◽

2020 ◽

Vol 9 (1) ◽

pp. 425-429

Keyword(s):

Prediction Model ◽

Nearest Neighbor ◽

Weather Forecasting ◽

Flash Flood ◽

Human Life ◽

Cost Effective ◽

Rainfall Data ◽

Support Vector ◽

K Nearest Neighbor ◽

Flood Prediction

Floods are among the most unfavorable natural disasters. A flood can result in a huge loss of human lives and property. It can also affect agricultural land and destroy cultivated crops and trees. Floods can occur as a result of surface runoff from melting snow, prolonged rain, inadequate drainage of rainwater, or the collapse of dams. Today, people have destroyed rivers and lakes and turned natural water storage pools into building and construction land. Unlike regular floods, flash floods can develop quickly, within a few hours. Research on flood prediction has advanced in order to reduce the loss of human life, property damage, and other flood-related problems. Machine learning methods are widely used to build efficient prediction models for weather forecasting. These advances in prediction systems provide cost-effective solutions and better performance. In this paper, a prediction model is constructed using rainfall data to predict the occurrence of floods due to rainfall. The model predicts whether a flood may happen or not based on the rainfall range for particular locations. Indian district rainfall data are used to build the prediction model. The model is trained on the dataset with several algorithms: Linear Regression, K-Nearest Neighbor, Support Vector Machine, and Multilayer Perceptron (MLP). Among these, the MLP algorithm performed best, with the highest accuracy of 97.40%. The MLP flash flood prediction model can help climate scientists predict floods during heavy downpours with high accuracy.

COMPARISON OF CLASSIFICATION ALGORITHMS IN HOTEL REVIEW ANALYSIS (KOMPARASI ALGORITMA KLASIFIKASI PADA ANALISIS REVIEW HOTEL)

Jurnal Pilar Nusa Mandiri ◽

10.33480/pilar.v14i2.1023 ◽

2018 ◽

Vol 14 (2) ◽

pp. 261

Author(s):

Lila Dini Utami

Keyword(s):

Support Vector Machine ◽

Nearest Neighbor ◽

Naive Bayes ◽

Service Providers ◽

Naïve Bayes ◽

Support Vector ◽

K Nearest Neighbor ◽

Nearest Neighbor Algorithm ◽

K Nearest Neighbor Algorithm ◽

Auc Value

Nowadays it is very easy to express opinions freely, in spoken and written form, about anything. Business people, especially service providers such as hotels, can use these opinions to make decisions, which is very useful for developing the hotel business. However, the review data must be processed with the right algorithm. This study was therefore conducted to find out which algorithm gives the highest accuracy. The methods used are Naïve Bayes (NB), Support Vector Machine (SVM), and k-Nearest Neighbor (k-NN). Naïve Bayes achieved an accuracy of 71.50% with an AUC value of 0.500, Support Vector Machine 72.50% with an AUC value of 0.936, and k-Nearest Neighbor 75.00% with an AUC value of 0.500. The k-Nearest Neighbor algorithm can therefore help in making more appropriate decisions based on hotel reviews.
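Accuracy and AUC are reported side by side in this abstract, and the two metrics measure different things: accuracy counts correct hard predictions, while AUC measures how well the scores rank positives above negatives. The sketch below computes both in plain Python; the helper names and the toy data in the usage note are illustrative assumptions and say nothing about how the study's own figures arose.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], predictions: Sequence[int]) -> float:
    """Fraction of predictions that equal the true label."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of scores.

    Equals the probability that a random positive (label 1) is scored
    above a random negative (label 0), with ties counted as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))
```

For example, on a toy set with three negatives and one positive, a model that always predicts the negative class with a constant score has an accuracy of 0.75 but an AUC of 0.5, because its scores carry no ranking information.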

INDUCTION OF CLASSIFIERS THROUGH NON-PARAMETRIC METHODS FOR APPROXIMATE CLASSIFICATION AND RETRIEVAL WITH ONTOLOGIES

International Journal of Semantic Computing ◽

10.1142/s1793351x0800049x ◽

2008 ◽

Vol 02 (03) ◽

pp. 403-423 ◽

Cited By ~ 8

Author(s):

NICOLA FANIZZI ◽

CLAUDIA D'AMATO ◽

FLORIANA ESPOSITO

Keyword(s):

Semantic Web ◽

Statistical Learning ◽

Kernel Function ◽

Nearest Neighbor ◽

Knowledge Bases ◽

Support Vector ◽

Parametric Methods ◽

K Nearest Neighbor ◽

K Nearest Neighbor Algorithm ◽

Non Parametric

This work concerns non-parametric approaches for statistical learning applied to the standard knowledge representation languages adopted in the Semantic Web context. We present methods based on epistemic inference that are able to elicit and exploit the semantic similarity of individuals in OWL knowledge bases. Specifically, a totally semantic and language-independent semi-distance function is introduced, from which an epistemic kernel function for Semantic Web representations is also derived. Both the measure and the kernel function are embedded in non-parametric statistical learning algorithms customized for coping with Semantic Web representations. In particular, the measure is embedded in a k-Nearest Neighbor algorithm and the kernel function is embedded in a Support Vector Machine. The implemented algorithms are used to perform inductive concept retrieval and query answering. Experiments on real ontologies show that the methods can be effectively employed for performing the target tasks, and moreover that it is possible to induce new assertions that are not logically derivable.

Classification of EMG signals by k-Nearest Neighbor algorithm and Support vector machine methods

2013 21st Signal Processing and Communications Applications Conference (SIU) ◽

10.1109/siu.2013.6531240 ◽

2013 ◽

Cited By ~ 2

Author(s):

H. Kucuk ◽

C. Tepe ◽

I. Eminoglu

Keyword(s):

Support Vector Machine ◽

Nearest Neighbor ◽

Support Vector ◽

K Nearest Neighbor ◽

Nearest Neighbor Algorithm ◽

K Nearest Neighbor Algorithm

Hybrid of Support Vector Machine Algorithm and K-Nearest Neighbor Algorithm to Optimize the Diagnosis of Eye Disease

2020 3rd International Conference on Mechanical, Electronics, Computer, and Industrial Technology (MECnIT) ◽

10.1109/mecnit48290.2020.9166599 ◽

2020 ◽

Author(s):

Sumita Wardani ◽

Sawaluddin ◽

Poltak Sihombing

Keyword(s):

Support Vector Machine ◽

Nearest Neighbor ◽

Eye Disease ◽

Support Vector ◽

Support Vector Machine Algorithm ◽

K Nearest Neighbor ◽

Nearest Neighbor Algorithm ◽

K Nearest Neighbor Algorithm

Semisupervised Particle Swarm Optimization for Classification

Mathematical Problems in Engineering ◽

10.1155/2014/832135 ◽

2014 ◽

Vol 2014 ◽

pp. 1-11 ◽

Cited By ~ 2

Author(s):

Xiangrong Zhang ◽

Licheng Jiao ◽

Anand Paul ◽

Yongfu Yuan ◽

Zhengli Wei ◽

...

Keyword(s):

Particle Swarm Optimization ◽

Nearest Neighbor ◽

Particle Swarm ◽

Support Vector ◽

K Nearest Neighbor ◽

Swarm Optimization ◽

Structure Information ◽

Semisupervised Classification ◽

K Nearest Neighbor Algorithm ◽

Nearest Neighborhood

A semisupervised classification method based on particle swarm optimization (PSO) is proposed. The semisupervised PSO simultaneously uses limited labeled samples and large amounts of unlabeled samples to find a collection of prototypes (or centroids) that are considered to precisely represent the patterns of the whole data; then, following the “nearest neighborhood” principle, the unlabeled data can be classified with the obtained prototypes. To validate the performance of the proposed method, we compare the classification accuracy of the PSO classifier, the k-nearest neighbor algorithm, and the support vector machine on six UCI datasets, four typical artificial datasets, and the USPS handwritten dataset. Experimental results demonstrate that the proposed method performs well even with very limited labeled samples, because it uses both the discriminant information provided by labeled samples and the structure information provided by unlabeled samples.
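The final classification step described in this abstract, assigning each sample the label of its nearest prototype, can be sketched as follows. The PSO search that produces the prototypes is not shown; the prototypes in the usage note are hypothetical values, and Euclidean distance is an assumption, since the abstract does not name the metric.

```python
import math
from typing import Hashable, Sequence, Tuple

Prototype = Tuple[Sequence[float], Hashable]  # (centroid vector, class label)


def classify_by_nearest_prototype(
    prototypes: Sequence[Prototype], sample: Sequence[float]
) -> Hashable:
    """Return the label of the prototype closest to sample (Euclidean distance)."""
    if not prototypes:
        raise ValueError("at least one prototype is required")
    _, label = min(prototypes, key=lambda proto: math.dist(proto[0], sample))
    return label
```

For example, with the hypothetical prototypes `[((0.0, 0.0), "a"), ((4.0, 4.0), "b")]`, the sample `(1.0, 0.5)` is assigned label `"a"`. Because only the prototypes are compared at prediction time, the cost per sample depends on the number of prototypes, not on the size of the training set.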

FWHT-RF: A Novel Computational Approach to Predict Plant Protein-Protein Interactions via an Ensemble Learning Method

Scientific Programming ◽

10.1155/2021/1607946 ◽

2021 ◽

Vol 2021 ◽

pp. 1-11

Author(s):

Jie Pan ◽

Li-Ping Li ◽

Chang-Qing Yu ◽

Zhu-Hong You ◽

Zhong-Hao Ren ◽

...

Keyword(s):

Protein Interactions ◽

Nearest Neighbor ◽

Protein Sequences ◽

Evolutionary Information ◽

Support Vector ◽

Protein Protein Interactions ◽

K Nearest Neighbor ◽

Novel Approach ◽

Knn Classifier ◽

Scoring Matrix

Protein-protein interactions (PPIs) in plants are crucial for understanding biological processes. Although high-throughput techniques have produced valuable information for identifying PPIs in plants, they are usually expensive, inefficient, and extremely time-consuming. Hence, there is an urgent need to develop novel computational methods to predict PPIs in plants. In this article, we propose a novel approach to predict PPIs in plants using only protein sequence information. Specifically, plant protein sequences are first converted into position-specific scoring matrices (PSSMs); then, the fast Walsh–Hadamard transform (FWHT) algorithm is used to extract feature vectors from the PSSMs to capture the evolutionary information of plant proteins. Lastly, a rotation forest (RF) classifier is trained for prediction and produces a series of evaluation results. We named this approach FWHT-RF because FWHT and RF are used for feature extraction and classification, respectively. When FWHT-RF was applied to three plant PPI datasets, Maize, Rice, and Arabidopsis thaliana (Arabidopsis), its average accuracies under 5-fold cross-validation reached 95.20%, 94.42%, and 83.85%, respectively. To further evaluate the predictive power of FWHT-RF, we compared it with state-of-the-art support vector machine (SVM) and K-nearest neighbor (KNN) classifiers in different aspects. The experimental results demonstrated that FWHT-RF can be a useful supplementary method to predict potential PPIs in plants.
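The fast Walsh–Hadamard transform used for feature extraction in this abstract can be sketched in a few lines. This is the standard unnormalized transform in natural (Hadamard) order, not the authors' code. How the paper applies it to a PSSM (row-wise, column-wise, padding, normalization) is not stated in the abstract, so the zero-padding helper `pad_to_power_of_two` is a hypothetical choice made only because the transform requires a power-of-two length.

```python
from typing import List, Sequence


def pad_to_power_of_two(values: Sequence[float], fill: float = 0.0) -> List[float]:
    """Right-pad values with fill up to the next power-of-two length (assumed scheme)."""
    n = 1
    while n < len(values):
        n *= 2
    return list(values) + [fill] * (n - len(values))


def fwht(values: Sequence[float]) -> List[float]:
    """Unnormalized fast Walsh-Hadamard transform, natural (Hadamard) order."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(values)
    h = 1
    while h < n:
        # Butterfly step: combine elements that are h positions apart.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out
```

The transform runs in O(n log n) additions and subtractions. Applying it twice returns the input scaled by n, which is a convenient sanity check.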
