Classification of Diabetes by Kernel based SVM with PSO

Author(s):  
Dilip Kumar Choubey ◽  
Sudhakar Tripathi ◽  
Prabhat Kumar ◽  
Vaibhav Shukla ◽  
Vinay Kumar Dhandhania

Background: Classification methods are needed to reduce possible errors and to assist doctors. Such methods are used in many areas of our lives to make suitable decisions. Classification is well known as an efficient, effective and broadly used strategy in several applications, such as medical disease diagnosis. The prime objective of this research paper is to achieve an efficient and effective classification method for diabetes. Discussion: The proposed methodology comprises two phases. The first phase describes the Pima Indian Diabetes Dataset and the Localized Diabetes Dataset; in the second phase, the datasets are processed through two different approaches. The first approach entails classification with Polynomial Kernel, RBF Kernel, Sigmoid Function Kernel and Linear Kernel SVM on the Pima Indian Diabetes Dataset and the Localized Diabetes Dataset. In the second approach, PSO is used as a feature reduction method, followed by the same set of classification methods used in the first approach. PSO_Linear Kernel SVM provides the highest accuracy and ROC for both datasets. Conclusion: This research paper presents a comparative analysis of performance with and without PSO for the same set of classification methods. It concludes that PSO selects the relevant features, reducing expense and computation time while improving ROC and accuracy. The methodology may similarly be applied to other medical diseases.
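
The PSO-based feature reduction described above can be illustrated with a minimal binary PSO sketch. This is not the authors' implementation: the sigmoid transfer rule, the parameter values (w, c1, c2), the name binary_pso and the toy fitness function are all assumptions made for illustration. In a setting like the paper's, the fitness would more plausibly be the performance of an SVM trained on the selected features.

```python
import math
import random


def binary_pso(fitness, n_features, n_particles=10, n_iters=30, seed=0,
               w=0.7, c1=1.5, c2=1.5):
    """Maximise `fitness` over 0/1 feature masks (illustrative sketch only)."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                # Standard velocity update: inertia + cognitive + social terms.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Sigmoid transfer: velocity becomes the probability of bit = 1.
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob else 0
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


# Hypothetical toy problem: features 0, 2 and 5 are "useful"; every selected
# feature costs 0.1. This stands in for a classifier-based fitness.
RELEVANT = {0, 2, 5}


def toy_fitness(mask):
    return sum(mask[i] for i in RELEVANT) - 0.1 * sum(mask)


best_mask, best_fit = binary_pso(toy_fitness, n_features=8)
print(best_mask, best_fit)
```

The search is seeded, so repeated runs give the same mask. The per-feature penalty in the toy fitness mirrors the trade-off between accuracy and the number of selected features.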

Author(s):  
Dilip Kumar Choubey ◽  
Sanchita Paul

Modern society is prone to many life-threatening diseases which, if diagnosed early, can be easily controlled. The implementation of disease diagnostic systems has gained popularity over the years. The main aim of this research is to provide a better diagnosis of diabetes. Several existing methods have already been implemented for the diagnosis of diabetes. In this manuscript, Polynomial Kernel, RBF Kernel, Sigmoid Function Kernel and Linear Kernel SVM are first used for the classification of PIDD. Second, GA is used as an attribute selection method, and Polynomial Kernel, RBF Kernel, Sigmoid Function Kernel and Linear Kernel SVM are then applied to the selected attributes of PIDD for classification. The results with and without GA on PIDD are compared, and Linear Kernel proves the best among the classification methods noted above. The paper shows that GA removes insignificant features, reducing cost and computation time and improving the accuracy and ROC of classification. The proposed method can also be used for other kinds of medical diseases.
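
For reference, the four kernel families named above have standard textbook forms, sketched here in plain Python. The hyperparameter values (gamma, coef0, degree) are illustrative assumptions; the abstract does not state which settings were used.

```python
import math


def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def linear_kernel(x, y):
    # K(x, y) = <x, y>
    return dot(x, y)


def polynomial_kernel(x, y, gamma=1.0, coef0=1.0, degree=3):
    # K(x, y) = (gamma * <x, y> + coef0) ** degree
    return (gamma * dot(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=0.5):
    # K(x, y) = exp(-gamma * ||x - y||^2)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def sigmoid_kernel(x, y, gamma=0.1, coef0=0.0):
    # K(x, y) = tanh(gamma * <x, y> + coef0)
    return math.tanh(gamma * dot(x, y) + coef0)


x = [1.0, 2.0]
y = [3.0, 4.0]
for kernel in (linear_kernel, polynomial_kernel, rbf_kernel, sigmoid_kernel):
    print(kernel.__name__, kernel(x, y))
```

An SVM uses such a function in place of the plain inner product, so the choice of kernel decides whether the decision boundary is linear or non-linear in the original attributes.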


2020 ◽  
Vol 14 (3) ◽  
pp. 337-347
Author(s):  
Mohammad Mahdi Ershadi ◽  
Abbas Seifi

There are many useful data mining methods for the diagnosis of diseases and cancers. However, early diagnosis of a disease or cancer can significantly affect a patient's chance of survival in some cases. The objective of this study is to develop a method that supports accurate diagnosis of different diseases based on various classification methods. Knowledge collection from domain experts is challenging, inaccessible and time-consuming, so we design a multi-classifier using a dynamic classifier and clustering selection approach to take advantage of these methods based on data. We combine Forward-backward and Principal Component Analysis for feature reduction. The multi-classifier evaluates three clustering methods and ascertains the best classification methods in each cluster based on some training data. In this study, we use ten datasets taken from the Machine Learning Repository of the University of California at Irvine (UCI). The proposed multi-classifier improves both computation time and accuracy compared with all other classification methods. It achieves maximum accuracy with minimum standard deviation over the sampled datasets.


2020 ◽  
Vol 13 (1) ◽  
pp. 103-126 ◽  
Author(s):  
Mohammad Mahdi Ershadi ◽  
Abbas Seifi

Purpose: This study aims at the differential diagnosis of some diseases using classification methods to support effective medical treatment. For this purpose, different classification methods based on data, experts' knowledge and both are considered in some cases. Besides, feature reduction and some clustering methods are used to improve their performance. Design/methodology/approach: First, the performances of classification methods are evaluated for differential diagnosis of different diseases. Then, experts' knowledge is utilized to modify the Bayesian networks' structures. Analyses of the results show that using experts' knowledge is more effective than other algorithms for increasing the accuracy of Bayesian network classification. A total of ten different diseases are used for testing, taken from the Machine Learning Repository datasets of the University of California at Irvine (UCI). Findings: The proposed method improves both the computation time and accuracy of the classification methods used in this paper. Bayesian networks based on experts' knowledge achieve a maximum average accuracy of 87 percent, with a minimum standard deviation average of 0.04 over the sample datasets among all classification methods. Practical implications: The proposed methodology can be applied to perform disease differential diagnosis analysis. Originality/value: This study presents the usefulness of experts' knowledge in the diagnosis while proposing an adopted improvement method for classifications. Besides, the Bayesian network based on experts' knowledge is useful for different diseases neglected by previous papers.


2020 ◽  
Vol 16 (8) ◽  
pp. 833-850 ◽  
Author(s):  
Dilip Kumar Choubey ◽  
Manish Kumar ◽  
Vaibhav Shukla ◽  
Sudhakar Tripathi ◽  
Vinay Kumar Dhandhania

Background: Modern society is extremely prone to many life-threatening diseases, which can be easily controlled as well as cured if diagnosed at an early stage. The development and implementation of disease diagnostic systems have gained huge popularity over the years. In the current scenario, factors such as environment, sedentary lifestyle and genetics (heredity) are the major causes of life-threatening diseases such as diabetes. Moreover, diabetes has achieved the status of modern man's leading chronic disease. One of the prime needs of this generation is therefore to develop a state-of-the-art expert system that can predict diabetes at a very early stage with minimal complexity and in an expedited manner. The primary objective of this work is to develop an indigenous and efficient diagnostic technique for the detection of diabetes. Method & Discussion: The proposed methodology comprises two phases. In the first phase, the Pima Indian Diabetes Dataset (PIDD) was collected from the UCI machine learning repository and the Localized Diabetes Dataset (LDD) was gathered from Bombay Medical Hall, Upper Bazar Ranchi, Jharkhand, India. In the second phase, the datasets were processed through two different approaches. The first approach entails classification with Adaboost, Classification via Regression (CVR), Radial Basis Function Network (RBFN) and K-Nearest Neighbor (KNN) on the Pima Indian Diabetes Dataset and the Localized Diabetes Dataset. In the second approach, Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) were applied as feature reduction methods, followed by the same set of classification methods used in the first approach. Among all of the implemented classification methods, PCA_CVR achieves the maximum performance for both datasets.
Conclusion: This article presents a comparative analysis of performance with and without PCA and LDA for the same set of classification methods. It concludes that both PCA and LDA are useful for removing insignificant features, decreasing expense and computation time while improving ROC and accuracy. The methodology may similarly be applied to other medical diseases.


2021 ◽  
Vol 13 (3) ◽  
pp. 355
Author(s):  
Weixian Tan ◽  
Borong Sun ◽  
Chenyu Xiao ◽  
Pingping Huang ◽  
Wei Xu ◽  
...  

Classification based on polarimetric synthetic aperture radar (PolSAR) images is an emerging technology, and recent years have seen the introduction of various classification methods that have proven effective at identifying typical features of many terrain types. Among the many study regions, the Hunshandake Sandy Land in Inner Mongolia, China stands out for its vast area of sandy land, variety of ground objects and intricate structure, with more irregular characteristics than conventional land cover. To account for the particular surface features of the Hunshandake Sandy Land, this study proposes an unsupervised classification method based on new decomposition and large-scale spectral clustering with superpixels (ND-LSC). Firstly, the polarization scattering parameters are extracted through a new decomposition, rather than other decomposition approaches, which gives a more accurate feature vector estimate. Secondly, large-scale spectral clustering is applied to handle the massive land area and complex terrain. More specifically, this involves an initial sub-step of superpixel generation via the Adaptive Simple Linear Iterative Clustering (ASLIC) algorithm, with the feature vector combined with the spatial coordinate information as input, then a sub-step of representative point selection and bipartite graph formation, followed by the spectral clustering algorithm to complete the classification task. Finally, testing and analysis are conducted on the RADARSAT-2 fully PolSAR dataset acquired over the Hunshandake Sandy Land in 2016. Both qualitative and quantitative experiments comparing several classification methods show that the proposed method can significantly improve classification performance.


2013 ◽  
Vol 443 ◽  
pp. 741-745
Author(s):  
Hu Li ◽  
Peng Zou ◽  
Wei Hong Han ◽  
Rong Ze Xia

Much real-world data is imbalanced, i.e. one category contains significantly more samples than the other categories. Traditional classification methods treat different categories equally and are often ineffective. Based on a comprehensive analysis of existing research, we propose a new imbalanced data classification method based on clustering. The method first clusters both the majority class and the minority class. Then, the clustered minority class is over-sampled by SMOTE while the clustered majority class is under-sampled randomly. Through clustering, the proposed method can avoid the loss of useful information while resampling. Experiments on several UCI datasets show that the proposed method can effectively improve the classification results on imbalanced data.
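
The two resampling steps mentioned above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' method: it assumes the clusters have already been formed (the clustering step is omitted), and the function names, the neighbour count k and the toy data are hypothetical. The over-sampling follows the usual SMOTE idea of interpolating between a minority point and one of its nearest minority neighbours.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def smote_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by SMOTE-style interpolation."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point (excluding itself).
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: squared_distance(base, p))
        neighbour = rng.choice(others[:k])
        # New point lies on the segment between base and neighbour.
        u = rng.random()
        synthetic.append([a + u * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def random_undersample(majority, n_keep, seed=0):
    """Keep a random subset of the majority points, without replacement."""
    return random.Random(seed).sample(majority, n_keep)


# Hypothetical data standing in for one minority and one majority cluster.
minority_cluster = [[1.0, 1.0], [1.0, 2.0], [2.0, 1.0], [2.0, 2.0]]
majority_cluster = [[float(i), float(i % 3)] for i in range(20)]

new_points = smote_oversample(minority_cluster, n_new=6)
kept = random_undersample(majority_cluster, n_keep=10)
print(len(new_points), len(kept))
```

Applying both functions cluster by cluster, rather than to each class as a whole, is what would let the resampling respect the local structure found by clustering.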


Interest in computer-assisted image analysis is increasing among radiologists, as it provides them with additional information for decision making and for better disease diagnosis. Traditionally, MR images are examined manually by medical practitioners with the naked eye to detect and diagnose tumor location, size and intensity; this is difficult and not sufficient for accurate analysis and treatment. There is therefore a need for an additional automated analysis system for accurate detection of normal and abnormal tumor regions. This paper introduces a new semi-automated image processing method to identify the brain tumor region in Magnetic Resonance Image (MRI) using the c-means clustering technique along with meta-heuristic optimization based on the Jaya optimization algorithm. The performance of the proposed algorithm (FCM + JA) is examined with key evaluation parameters: Mean Square Error (MSE), Peak Signal to Noise Ratio (PSNR), Dice Overlap Index (DOI) and CPU memory utilization. The experimental results show a better and enhanced display of the tumor region in reduced computation time.
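
Three of the evaluation measures named above have widely used definitions, sketched below on flattened pixel lists. The exact variants used in the paper are not stated, so the 8-bit peak value of 255 and the convention that two empty masks give a Dice score of 1.0 are assumptions.

```python
import math


def mse(image_a, image_b):
    """Mean squared error between two equal-length flat pixel lists."""
    return sum((a - b) ** 2 for a, b in zip(image_a, image_b)) / len(image_a)


def psnr(image_a, image_b, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    error = mse(image_a, image_b)
    if error == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / error)


def dice(mask_a, mask_b):
    """Dice overlap 2|A and B| / (|A| + |B|) for flat binary (0/1) masks."""
    intersection = sum(a * b for a, b in zip(mask_a, mask_b))
    total = sum(mask_a) + sum(mask_b)
    # Assumed convention: two empty masks overlap perfectly.
    return 1.0 if total == 0 else 2.0 * intersection / total


reference = [0, 0, 255, 255]
segmented = [0, 10, 250, 255]
print(mse(reference, segmented), psnr(reference, segmented))
print(dice([1, 1, 0, 0], [1, 0, 0, 0]))
```

Lower MSE and higher PSNR indicate a closer match in pixel intensities, while Dice measures how well the segmented region overlaps a reference region.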


Author(s):  
Anand Joseph Daniel ◽  
M Janaki Meena

With the massive development of Internet and e-commerce technologies, people rely on the product reviews that users provide on the web. Sentiment analysis of online reviews has become a mainstream way for businesses on e-commerce platforms to satisfy their customers. This paper proposes a novel hybrid framework with a Black Widow Optimization (BWO) based feature reduction technique, which combines the merits of both machine learning and lexicon-based approaches to attain better scalability and accuracy. The scalability problem arises from the noisy, irrelevant and unique features among those extracted by the proposed approach, which can be eliminated by adopting an effective feature reduction technique. In the proposed BWO approach, the feature-set size is reduced by up to 43% without changing the accuracy (90%). The proposed feature selection technique outperforms the commonly used Particle Swarm Optimization (PSO) and Genetic Algorithm (GA) based feature selection techniques, with a reduced computation time of 21 sec. Moreover, the sentiment analysis approach is evaluated using performance metrics such as precision, recall, F-measure and computation time. Many organizations can use these online reviews to make well-informed decisions about users' interests and preferences, to enhance customer satisfaction and product quality, and to find aspects of their products to improve, thereby generating more profit.


2021 ◽  
Vol 23 (06) ◽  
pp. 36-46
Author(s):  
Vrunda Kusanur ◽  
Veena S Chakravarthi

Soil temperature and humidity directly influence plant growth and the availability of plant nutrients. In this work, we carried out experiments to identify the relationship between climatic parameters and plant nutrients. When the relative humidity was very high, deficiency symptoms appeared on plant leaves and fruits. Recognizing and managing these plant nutrients manually is difficult, and not much research has been done in this field. The main objective of this research was to propose a machine learning model to manage nutrient deficiencies in plants. The proposed research had two main phases. In the first phase, the humidity, temperature and soil moisture in the greenhouse environment were collected using WSN, and the influence of these parameters on plant growth was studied. The experiments showed that the transpiration rate decreased significantly and the macronutrient contents in the plant leaves decreased when the humidity was 95%. In the second phase, a machine learning model was developed to identify and classify nutrient deficiency symptoms in a tomato plant. A total of 880 images were collected from Bingo images to form a dataset. Of these images, 80% (704 images) were used to train the machine learning model and 20% (176 images) were used to test the model's performance. In this study, we selected K-means clustering for key point detection and SVM for classification and prediction of nutrient stress in the plant. SVM with a linear kernel performed better, with an accuracy of 89.77%, than SVM with a polynomial kernel.
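
The 80/20 split reported above (704 training and 176 test images out of 880) can be reproduced with a short sketch. The shuffling, the seed and the function name split_indices are assumptions; the abstract does not say how images were assigned to each subset.

```python
import random


def split_indices(n_items, train_fraction=0.8, seed=0):
    """Shuffle item indices and split them into train and test subsets."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_train = round(n_items * train_fraction)
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = split_indices(880)
print(len(train_idx), len(test_idx))
```

Keeping the two index sets disjoint is what makes the reported test accuracy an estimate of performance on images the model has not seen.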


2021 ◽  
Vol 22 (1) ◽  
pp. 53-66
Author(s):  
D. Anand Joseph Daniel ◽  
M. Janaki Meena

Sentiment analysis of online product reviews has become a mainstream way for businesses on e-commerce platforms to promote their products and improve user satisfaction. Hence, it is necessary to construct an automatic sentiment analyser that identifies the sentiment polarity of online product reviews. Traditional lexicon-based approaches to sentiment analysis suffer from several accuracy issues, while machine learning techniques require labelled training data. This paper introduces a hybrid sentiment analysis framework to bridge the gap between machine learning and lexicon-based approaches. A novel tunicate swarm algorithm (TSA) based feature reduction is integrated with the proposed hybrid method to solve the scalability issue that arises from a large feature set. It reduces the feature set size to 43% without changing the accuracy (93%). Besides, it improves scalability, reduces computation time and enhances the overall performance of the proposed framework. The experimental analysis shows that TSA outperforms existing feature selection techniques such as particle swarm optimization and genetic algorithm. Moreover, the proposed approach is analysed with performance metrics such as recall, precision, F1-score, feature size and computation time.

