Using Stratified Sample and Grid Search to Improve Disease Prediction Accuracy of SVM

2013 ◽  
Vol 295-298 ◽  
pp. 644-647
Author(s):  
Yu Kai Yao ◽  
Hong Mei Cui ◽  
Ming Wei Len ◽  
Xiao Yun Chen

SVM (Support Vector Machine) is a powerful data mining algorithm that is mainly used for classification and regression tasks. In this paper, SVM is used for disease prediction. We focus on combining stratified sampling and grid search to improve the classification accuracy of SVM, and accordingly propose an improved algorithm named SGSVM (Stratified sample and Grid search based SVM). To test the performance of SGSVM, we use the heart-disease data from the UCI repository in our experiment; the results show that SGSVM clearly improves classification accuracy, which is particularly valuable in disease prediction.
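
The abstract does not spell out how SGSVM combines its two ingredients, so the following is only a minimal Python sketch of each, under stated assumptions. `stratified_folds` splits sample indices into k folds that keep the class proportions of the full dataset, `cv_score` averages a score over those folds, and `grid_search` exhaustively scores every combination in a parameter grid. All function names, the round-robin dealing scheme, and the `evaluate` callback (which would train an SVM with, for example, hypothetical `C` and `gamma` values on the training indices and return accuracy on the test indices) are illustrative assumptions, not the authors' code.

```python
import random
from collections import defaultdict
from itertools import product

# Hypothetical helper names for illustration; not taken from the SGSVM paper.


def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds that preserve the class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        # Deal this class's indices round-robin, so fold shares differ by at most one.
        for j, i in enumerate(members):
            folds[j % k].append(i)
    return folds


def cv_score(folds, params, evaluate):
    """Average evaluate(train_idx, test_idx, params) over the folds."""
    scores = []
    for k, test_idx in enumerate(folds):
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        scores.append(evaluate(train_idx, test_idx, params))
    return sum(scores) / len(scores)


def grid_search(param_grid, score_fn):
    """Score every combination in param_grid; return (best_params, best_score)."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

In use, `score_fn` would typically be `lambda p: cv_score(folds, p, evaluate)`, so that every candidate parameter setting is judged on the same stratified folds.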

2021 ◽  
Vol 28 (5) ◽  
pp. 118-129
Author(s):  
Alabi Waheed Banjoko ◽  
Kawthar Opeyemi Abdulazeez

Background: The computerised classification and prediction of heart disease can help medical personnel make fast and accurate diagnoses. This study presents an efficient classification method for predicting heart disease using a data-mining algorithm. Methods: The algorithm uses a weighted support vector machine to classify heart disease efficiently, based on a binary response that indicates the presence or absence of heart disease as the result of an angiographic test. The optimal values of the support vector machine and Radial Basis Function kernel parameters for heart disease classification were determined via 10-fold cross-validation. The heart disease data were partitioned into training and testing sets using different splitting ratios. Each training set was used to train the classification method, while the predictive power of the method was evaluated on the corresponding test set using the Monte-Carlo cross-validation resampling technique. The effect of the splitting ratio on the method was also observed. Results: The misclassification error rate was used to compare the performance of the method with three selected machine learning methods, and the proposed method was observed to perform best in all cases considered. Conclusion: The results illustrate that the proposed classification algorithm can effectively predict the heart disease status of an individual based on the results of an angiographic test.
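
As a hedged illustration of the two ideas in the Methods section, the Python sketch below trains a class-weighted SVM and estimates its misclassification error rate by Monte-Carlo cross-validation (repeated random train/test splits at a fixed ratio). To stay short and dependency-free, it swaps in a linear kernel trained with Pegasos-style sub-gradient descent instead of the RBF kernel used in the study; the function names, `lam`, `epochs`, and the `class_weight` values are assumptions.

```python
import random

# Illustrative sketch: linear kernel (not the study's RBF kernel); names are hypothetical.


def _augment(x):
    # Append a constant 1.0 so the bias is learned as an ordinary weight.
    return list(x) + [1.0]


def train_weighted_svm(X, y, class_weight, lam=0.01, epochs=200, seed=0):
    """Pegasos-style SGD on the class-weighted hinge loss.

    X: list of feature vectors; y: labels in {-1, +1};
    class_weight: dict label -> weight applied to that class's hinge loss.
    """
    rng = random.Random(seed)
    data = [(_augment(x), label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)
            margin = label * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1.0:  # hinge loss active: step toward the violated point
                c = class_weight[label]
                w = [wj + eta * c * label * xj for wj, xj in zip(w, x)]
    return w


def predict(w, x):
    score = sum(wj * xj for wj, xj in zip(w, _augment(x)))
    return 1 if score >= 0 else -1


def monte_carlo_cv(X, y, train_ratio, repeats, class_weight, seed=0):
    """Mean misclassification error rate over repeated random train/test splits."""
    rng = random.Random(seed)
    n = len(X)
    n_train = int(round(train_ratio * n))
    errors = []
    for r in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        train, test = idx[:n_train], idx[n_train:]
        w = train_weighted_svm([X[i] for i in train], [y[i] for i in train],
                               class_weight, seed=r)
        wrong = sum(predict(w, X[i]) != y[i] for i in test)
        errors.append(wrong / len(test))
    return sum(errors) / len(errors)
```

Calling `monte_carlo_cv` with several values of `train_ratio` mirrors the study's comparison of splitting ratios; raising the weight of the minority class in `class_weight` penalizes its misclassifications more heavily.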


2016 ◽  
pp. 738-761
Author(s):  
Ahmad Al-Khasawneh

Many researchers in the health information systems field have been drawn to developing computer applications that assist the diagnostic process. Data mining algorithms play a vital role in all of these applications, and many contributions have been made in this area. There has always been debate about which algorithm gives the best classifier, which parameters to use, which dataset pre-processing steps to apply, and so on. In this paper, the author emphasizes that the best way to build a predictive model with relatively high classification accuracy is to build several predictive models and choose the one that gives the best results after parameter optimization. Diagnosing diabetes mellitus has gained considerable attention in the last few decades due to the increasing severity of the disease. In this research, the author reviews four predictive data mining approaches used in diagnosing diabetes. Four models were implemented to diagnose diabetes from the PIMA dataset: k-nearest neighbour, support vector machine, multilayer perceptron neural network, and naive Bayesian network. The support vector machine outperformed the others, giving the highest classification accuracy of 78.83%.
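
The selection strategy described above (train several models, tune each, keep the best) can be sketched in Python as follows. This is a minimal illustration, not the author's implementation: it uses two simple stand-in models, k-nearest neighbour and a nearest-centroid classifier, rather than the four models from the study, and it selects on accuracy over a held-out validation set. All names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical stand-in models; the study itself compared kNN, SVM, MLP and naive Bayes.


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training points nearest to x (Euclidean)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def centroid_predict(train_X, train_y, x):
    """Assign x to the class whose mean feature vector is closest."""
    sums = {}
    counts = Counter(train_y)
    for xi, yi in zip(train_X, train_y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
    centroids = {c: [s / counts[c] for s in acc] for c, acc in sums.items()}
    return min(centroids, key=lambda c: math.dist(centroids[c], x))


def select_best(candidates, train_X, train_y, valid_X, valid_y):
    """Tune every candidate over its grid and keep the most accurate setting.

    candidates: list of (name, predict_fn, list_of_param_dicts).
    """
    best = (None, None, -1.0)
    for name, predict_fn, grid in candidates:
        for params in grid:
            correct = sum(
                predict_fn(train_X, train_y, x, **params) == y
                for x, y in zip(valid_X, valid_y)
            )
            accuracy = correct / len(valid_y)
            if accuracy > best[2]:  # ties keep the earlier candidate
                best = (name, params, accuracy)
    return best
```

Parameter optimization and model choice happen in one loop here; in practice the validation score would usually come from cross-validation rather than a single split.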


2020 ◽  
Vol 8 (5) ◽  
pp. 3164-3167

Data mining is the extraction of hidden predictive information, as well as previously unknown data, patterns, relationships and knowledge, by exploring large datasets in which these are hard to discover and identify with traditional statistical techniques. The major issues in text categorization are classification accuracy and computation time. To overcome these issues, an efficient classification method is needed that achieves high classification accuracy while minimizing computation time. In this work, we propose classifying data with a support vector machine for text categorization, combined with principal component analysis. The Support Vector Machine is a supervised learning method with many attractive characteristics that make it a popular algorithm. Principal Component Analysis (PCA) is a feature extraction technique used to extract features from the text. Chi-square is a further selection technique, used to select features from among the extracted features. The proposed work improves both classification accuracy and computation time compared with other existing algorithms in many applications.
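
The chi-square feature-selection step can be illustrated with the standard 2x2 contingency-table statistic. With A, B, C and D the numbers of documents that are in or out of the target class and do or do not contain a term, and N their total, the score is N(AD - CB)^2 / ((A+C)(B+D)(A+B)(C+D)). In this minimal Python sketch, which is an illustration rather than the authors' pipeline, the statistic is computed directly on binary term-presence features instead of PCA-extracted features, and PCA itself is omitted; the function names are hypothetical.

```python
# Hypothetical helpers; chi-square is applied to raw term presence, not PCA output.


def chi_square(docs, labels, term, target):
    """Chi-square statistic for term presence vs. membership in the target class.

    docs: list of sets of terms; labels: class label of each document.
    """
    a = b = c = d = 0
    for doc, label in zip(docs, labels):
        has_term = term in doc
        in_class = label == target
        if has_term and in_class:
            a += 1  # in class, contains term
        elif has_term:
            b += 1  # other class, contains term
        elif in_class:
            c += 1  # in class, lacks term
        else:
            d += 1  # other class, lacks term
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    # A zero margin (e.g. a term present in every document) carries no information.
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def select_top_k(docs, labels, target, k):
    """Keep the k terms with the highest chi-square score (ties broken alphabetically)."""
    vocab = {term for doc in docs for term in doc}
    ranked = sorted(vocab, key=lambda t: (-chi_square(docs, labels, t, target), t))
    return ranked[:k]
```

Terms that appear almost exclusively in one class score highest, which is why the statistic is a common filter before training an SVM on text.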


2016 ◽  
pp. 426-449
Author(s):  
Ahmad Al-Khasawneh




2020 ◽  
pp. 127-150
Author(s):  
Ahmad Al-Khasawneh



Many lives are lost to cancer every year, and among women, breast cancer causes the most deaths. Numerous studies have applied data mining techniques to better predict breast cancer risk. In 2004, 1.1 million cases of breast cancer were reported, and over the years these numbers have been observed to increase with growing industrialization and urbanization. Breast cancer was earlier observed to affect mostly high-income countries such as America, but nowadays it is also a very serious issue in middle- and low-income regions such as Africa, Latin America and Asia. The main objective of this paper is to create a model that can more efficiently and accurately categorize a cancer as malignant or benign based on the numerical values of attributes of ultrasound images of breast cancer. In this paper, SVM (Support Vector Machine) is used for prediction and compared with several other algorithms, such as CART, Logistic Regression and KNN, in terms of training and test accuracy. The SVM algorithm gives the most accurate results of the algorithms compared.


Author(s):  
Ariesta Lestari ◽  
Elga Mariati ◽  
Widiatry Widiatry

Students are one of the stakeholders of a university. Therefore, students' perception of the quality of learning facilities and infrastructure is important for ensuring the university's performance. The Faculty of Engineering of the University of Palangka Raya has not comprehensively evaluated students' satisfaction with its learning facilities. In this research, data mining methods were used to classify whether students are satisfied with the quality of the learning facilities in the Faculty of Engineering. This research compared three data mining algorithms, Decision Tree C4.5, Support Vector Machine, and Naïve Bayes, to find the best algorithm for the prediction system. Of the 948 responses collected, 61% of respondents were satisfied with the quality of the learning facilities and infrastructure, while 39% were dissatisfied. Decision Tree C4.5 performed best, with an accuracy of 88% and a precision of 98%, compared with Naïve Bayes and the support vector machine.
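
Accuracy and precision measure different things, which is how a classifier can reach 88% accuracy but 98% precision. The Python sketch below computes both from a confusion matrix; the function names are hypothetical, and treating "satisfied" as the positive class is an assumption, since the abstract does not say which class its precision refers to.

```python
def confusion_counts(y_true, y_pred, positive):
    """Return (tp, fp, fn, tn), treating `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1  # predicted positive, actually positive
            else:
                fp += 1  # predicted positive, actually negative
        elif t == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return tp, fp, fn, tn


def accuracy(y_true, y_pred, positive):
    """Fraction of all predictions that are correct."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    return (tp + tn) / (tp + fp + fn + tn)


def precision(y_true, y_pred, positive):
    """Fraction of positive predictions that are correct (0.0 if none were made)."""
    tp, fp, _, _ = confusion_counts(y_true, y_pred, positive)
    return tp / (tp + fp) if tp + fp else 0.0
```

Because precision ignores false negatives, a 98% precision alongside 88% accuracy implies that most of the errors were false negatives, assuming the precision figure refers to the positive class.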


2019 ◽  
Vol 15 (2) ◽  
pp. 275-280
Author(s):  
Agus Setiyono ◽  
Hilman F Pardede

It is now common for a cellphone to receive spam messages. The large number of received messages makes it difficult for humans to classify them as spam or not spam. One way to overcome this problem is to use data mining for automatic classification. In this paper, we investigate three data mining techniques, namely Support Vector Machine, Multinomial Naïve Bayes and Decision Tree, for automatic spam detection. Our experimental results show that Support Vector Machine is the best of the three evaluated algorithms: Support Vector Machine achieves 98.33% accuracy, while Multinomial Naïve Bayes achieves 98.13% and Decision Tree 97.10%.
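
Multinomial Naïve Bayes, one of the three compared methods, is simple enough to sketch in full. The Python below is a minimal illustration with Laplace (add-alpha) smoothing that computes class scores in log space to avoid numeric underflow; the tokenization, the `alpha` default, and the function names are assumptions, and tokens unseen during training are simply ignored at prediction time.

```python
import math
from collections import Counter, defaultdict

# Hypothetical helper names; a textbook multinomial Naive Bayes, not the paper's code.


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit class log-priors and add-alpha smoothed token log-likelihoods.

    docs: list of token lists; labels: class label of each document.
    """
    vocab = {tok for doc in docs for tok in doc}
    doc_counts = Counter(labels)
    token_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        token_counts[label].update(doc)
    log_prior = {c: math.log(n / len(docs)) for c, n in doc_counts.items()}
    log_lik = {}
    for c in doc_counts:
        denom = sum(token_counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {tok: math.log((token_counts[c][tok] + alpha) / denom) for tok in vocab}
    return log_prior, log_lik


def predict_nb(model, doc):
    """Return the class with the highest log-posterior; unseen tokens are skipped."""
    log_prior, log_lik = model

    def score(c):
        return log_prior[c] + sum(log_lik[c][tok] for tok in doc if tok in log_lik[c])

    return max(log_prior, key=score)
```

For example, a model trained on a few labelled "spam" and "ham" messages assigns a new message to whichever class makes its tokens most probable under the smoothed word frequencies.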

