Developing an Effective Classification Model for Medical Data Analysis

Advances in Medical Technologies and Clinical Practice - Advanced Classification Techniques for Healthcare Analysis

10.4018/978-1-5225-7796-6.ch001

2019

pp. 1-17

Author(s):

Naeem Ahmed Mahoto

Abdul Hafeez Babar

Keyword(s):

Data Analysis

Ensemble Learning

Ensemble Methods

Medical Data

Machine Learning Algorithms

Classification Model

Classification Models

Learning Methods

K Nearest Neighbors

Conjunctive Rule

The sparse nature of medical data makes knowledge discovery and prediction a complex analytical task. Machine learning algorithms have produced promising results on diverse data. This chapter constructs an effective classification model for medical data analysis. In particular, nine classification models, namely Naïve Bayes, decision trees (J48 and Random Forest), multilayer perceptron, radial basis function network, k-nearest neighbors, single conjunctive rule learner, support vector machine, and simple logistic, have been applied to develop an effective model. In addition, the classification models have been used in conjunction with ensemble learning methods, since ensemble methods significantly improve the predictive performance of classification models. The classification models are evaluated using accuracy, F-measure, precision, and recall. The empirical results reveal that combining ensemble learning methods with classification models produces better predictions on the medical data than a single classification model used alone.
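The four evaluation metrics named above can all be derived from the counts of true positives, false positives, and false negatives. The following is a minimal Python sketch for a binary task, assuming the positive class is labelled 1; the function name `binary_metrics` and the toy labels are hypothetical and are not taken from the chapter.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall, f_measure) for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure is the harmonic mean of precision and recall
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return accuracy, precision, recall, f_measure


# Hypothetical labels (1 = disease present), for illustration only
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(binary_metrics(y_true, y_pred))  # (0.75, 0.75, 0.75, 0.75)
```

Precision and recall are undefined when their denominators are zero; the sketch returns 0.0 in that case, which is one common convention rather than the only one.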

Overview and Comparison of Machine Learning Methods to Build Classification Model for Prediction of Categorical Outcome Based on Medical Data

Advances in Intelligent Systems and Computing - Cybernetics Approaches in Intelligent Systems

10.1007/978-3-319-67618-0_20

2017

pp. 216-224

Author(s):

Andrea Peterkova

German Michalconok

Allan Bohm

Keyword(s):

Machine Learning

Medical Data

Classification Model

Learning Methods

Machine Learning Methods

A Unified Classification Model Based on Robust Optimization

Neural Computation

10.1162/neco_a_00412

2013

Vol 25 (3)

pp. 759-804

Cited By ~ 6

Author(s):

Akiko Takeda

Hiroyuki Mitsugi

Takafumi Kanamori

Keyword(s):

Robust Optimization

Unified Model

Machine Learning Algorithms

Classification Model

Support Vector

Optimization Approach

Statistical Interpretation

Fisher Discriminant Analysis

Learning Methods

Worst Case

A wide variety of machine learning algorithms, such as the support vector machine (SVM), minimax probability machine (MPM), and Fisher discriminant analysis (FDA), exist for binary classification. The purpose of this letter is to provide a unified classification model that includes these models through a robust optimization approach. This unified model has several benefits. One is that extensions and improvements intended for SVMs become applicable to MPM and FDA, and vice versa. For example, we can obtain nonconvex variants of MPM and FDA by mimicking Perez-Cruz, Weston, Hermann, and Schölkopf's (2003) extension from convex ν-SVM to nonconvex Eν-SVM. Another benefit is that theoretical results concerning these learning methods can be obtained at once by analyzing the unified model. We give a statistical interpretation of the unified classification model and prove that the model is a good approximation of the worst-case minimization of an expected loss with respect to an uncertain probability distribution. We also propose a nonconvex optimization algorithm that can be applied to nonconvex variants of existing learning methods and show promising numerical results.

Predictive Models for the Binary Diffusion Coefficient at Infinite Dilution in Polar and Nonpolar Fluids

Materials

10.3390/ma14030542

2021

Vol 14 (3)

pp. 542

Author(s):

José P. S. Aniceto

Bruno Zêzere

Carlos M. Silva

Keyword(s):

Machine Learning

Molar Mass

Ensemble Methods

Machine Learning Algorithms

Multilinear Regression

K Nearest Neighbors

Average Deviation

Lennard Jones

Nonpolar Solvents

Two Parameters

Experimental diffusivities are scarce, although knowledge of them is essential for modeling rate-controlled processes. In this work, various machine learning models were developed to estimate diffusivities in polar and nonpolar solvents (except water and supercritical CO2). The models were trained on a database of 90 polar systems (1431 points) and 154 nonpolar systems (1129 points) with data on 20 properties. Five machine learning algorithms were evaluated: multilinear regression, k-nearest neighbors, decision tree, and two ensemble methods (random forest and gradient boosting). For both polar and nonpolar data, the best results were obtained with the gradient boosting algorithm. The model for polar systems uses 6 variables/parameters (temperature, solvent viscosity, solute molar mass, solute critical pressure, solvent molar mass, and solvent Lennard-Jones energy constant) and showed an average absolute relative deviation (AARD) of 5.07%. The nonpolar model requires five variables/parameters (the same as for polar systems, except the Lennard-Jones constant) and presents AARD = 5.86%. These results were compared with four classic models, including the 2-parameter correlation of Magalhães et al. (AARD = 5.19/6.19% for polar/nonpolar) and the predictive Wilke-Chang equation (AARD = 40.92/29.19%). However, the Magalhães et al. correlation requires two parameters per system that must first be fitted to data. The developed models are implemented and provided as a command-line program.
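AARD is conventionally computed as AARD (%) = (100/N) Σ |D_calc - D_exp| / D_exp over the N data points. A minimal sketch of that formula follows; the function name `aard` and the sample diffusivities are illustrative assumptions, not values from the paper.

```python
def aard(predicted, experimental):
    """Average absolute relative deviation, in percent:
    AARD = 100/N * sum(|pred - exp| / |exp|)."""
    if not experimental or len(predicted) != len(experimental):
        raise ValueError("expected two non-empty sequences of equal length")
    total = sum(abs(p - e) / abs(e) for p, e in zip(predicted, experimental))
    return 100.0 * total / len(experimental)


# Hypothetical diffusivities (arbitrary units), not data from the paper
experimental = [1.20, 2.50, 0.80]
predicted = [1.26, 2.40, 0.84]
print(f"AARD = {aard(predicted, experimental):.2f}%")  # AARD = 4.67%
```

Because each deviation is divided by the experimental value, AARD weights errors on small diffusivities as heavily as errors on large ones, which makes it comparable across systems with very different magnitudes.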

Improving the Accuracy for Analyzing Heart Diseases Prediction Based on the Ensemble Method

Complexity

10.1155/2021/6663455

2021

Vol 2021

pp. 1-10

Author(s):

Xiao-Yan Gao

Abdelmegeid Amin Ali

Hassan Shaban Hassan

Eman M. Anwar

Keyword(s):

Machine Learning

Heart Disease

Ensemble Learning

Heart Diseases

Principal Component

Extraction Methods

Machine Learning Algorithms

Learning Methods

Linear Discriminant

Bagging Ensemble

Heart disease is the deadliest disease and one of the leading causes of death worldwide. Machine learning plays an essential role in the medical domain. In this paper, ensemble learning methods are used to enhance the performance of heart disease prediction. Two feature extraction methods, linear discriminant analysis (LDA) and principal component analysis (PCA), are used to select essential features from the dataset. Machine learning algorithms and ensemble learning methods are then compared on the selected features. The models are evaluated using accuracy, recall, precision, F-measure, and ROC. The results show that the bagging ensemble learning method with a decision tree achieved the best performance.
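Bagging trains each base learner on a bootstrap resample of the training set (n draws with replacement) and combines the learners by majority vote. The sketch below assumes a one-level decision tree (a stump) as a simplified stand-in for the full decision tree used in the paper, and uses toy two-feature data; `fit_stump`, `bagging_fit`, and `bagging_predict` are hypothetical helper names.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Fit a one-level decision tree: the (feature, threshold) split with the
    fewest training errors; falls back to the majority class if no split helps."""
    majority = Counter(y).most_common(1)[0][0]
    best = (sum(label != majority for label in y), None, None, majority, majority)
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2  # midpoint between adjacent observed values
            left = [label for row, label in zip(X, y) if row[j] <= thr]
            right = [label for row, label in zip(X, y) if row[j] > thr]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = (sum(lab != left_label for lab in left)
                      + sum(lab != right_label for lab in right))
            if errors < best[0]:
                best = (errors, j, thr, left_label, right_label)
    _, j, thr, left_label, right_label = best
    if j is None:
        return lambda row: majority
    return lambda row: left_label if row[j] <= thr else right_label


def bagging_fit(X, y, n_estimators=25, seed=0):
    """Train each stump on its own bootstrap sample."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sampling with replacement
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def bagging_predict(models, row):
    """Majority vote over the ensemble's predictions."""
    return Counter(m(row) for m in models).most_common(1)[0][0]


# Hypothetical two-feature data with class labels 0/1
X = [[1, 5], [2, 4], [3, 6], [7, 1], [8, 2], [9, 3]]
y = [0, 0, 0, 1, 1, 1]
models = bagging_fit(X, y)
print([bagging_predict(models, row) for row in X])
```

An odd number of estimators avoids voting ties for two-class problems; a real decision tree would replace `fit_stump` without changing the bagging loop.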

Using Machine Learning to Build a Classification Model for IoT Networks to Detect Attack Signatures

International Journal of Computer Networks & Communications

10.5121/ijcnc.2020.12607

2020

Vol 12 (6)

pp. 99-116

Author(s):

Mousa Al-Akhras

Mohammed Alawairdhi

Ali Alkoudari

Samer Atawneh

Keyword(s):

Machine Learning

Naive Bayes

Denial Of Service

Learning Algorithms

Naïve Bayes

Machine Learning Algorithms

Classification Model

Security And Privacy

K Nearest Neighbors

Detection Model

The Internet of Things (IoT) has led to several security threats and challenges within society. Despite the benefits it has brought to society, IoT could compromise the security and privacy of individuals and companies at various levels. Denial of Service (DoS) and Distributed DoS (DDoS) attacks, among others, are the most common attack types facing IoT networks. To counter such attacks, companies should implement an efficient classification/detection model, which is not an easy task. This paper proposes a classification model to examine the effectiveness of several machine learning algorithms, namely Random Forest (RF), k-Nearest Neighbors (KNN), and Naïve Bayes. The machine learning algorithms are used to detect attacks in the UNSW-NB15 benchmark dataset, which contains instances of both normal and malicious network traffic. The experimental results reveal that the RF and KNN classifiers give the best performance, with an accuracy of 100% (without noise injection) and 99% (with 10% noise filtering), while the Naïve Bayes classifier gives the worst performance, with an accuracy of 95.35% and 82.77% without noise and with 10% noise, respectively. Other evaluation metrics, such as precision and recall, also show the effectiveness of the RF and KNN classifiers over Naïve Bayes.
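A k-nearest-neighbors classifier labels a new sample by majority vote among the k closest training samples. The sketch below uses Euclidean distance and two made-up, already-scaled traffic features; it does not use the actual UNSW-NB15 columns, and `knn_predict` is a hypothetical helper.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points,
    using Euclidean distance (math.dist, Python 3.8+)."""
    neighbours = sorted(
        zip((math.dist(x, query) for x in train_X), train_y),
        key=lambda pair: pair[0],
    )[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]


# Hypothetical, already-scaled features (e.g. packet rate, mean packet size)
train_X = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.25],
           [0.90, 0.80], [0.85, 0.95], [0.95, 0.90]]
train_y = ["normal", "normal", "normal", "attack", "attack", "attack"]
print(knn_predict(train_X, train_y, [0.20, 0.20]))  # normal
print(knn_predict(train_X, train_y, [0.90, 0.90]))  # attack
```

Because distances are computed on raw feature values, features on very different scales should be normalized first; otherwise the largest-valued feature dominates the neighbour search.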

Different Approaches to Reducing Bias in Classification of Medical Data by Ensemble Learning Methods

International Journal of Big Data and Analytics in Healthcare

10.4018/ijbdah.20210701.oa2

2021

Vol 6 (2)

pp. 15-30

Author(s):

Adem Doganer

Keyword(s):

Sample Size

Ensemble Learning

Classification Performance

Medical Data

Bias Error

Learning Method

Sample Sizes

Learning Methods

Learning Performances

In this study, different models were created to reduce bias using ensemble learning methods. Reducing the bias error improves classification performance. To increase classification performance, the most appropriate ensemble learning method and the ideal sample size were investigated. The bias values and learning performances of different ensemble learning methods were compared. The AdaBoost ensemble learning method provided the lowest bias value at a sample size of n = 250, while the Stacking ensemble learning method provided the lowest bias value at sample sizes of n = 500, 750, 1000, 2000, 4000, 6000, 8000, 10000, and 20000. When learning performances were compared, the AdaBoost ensemble learning method with the RBF classifier achieved the best performance at n = 250 (ACC = 0.956, AUC = 0.987), and the AdaBoost ensemble learning method with the REPTree classifier achieved the best performance at n = 20000 (ACC = 0.990, AUC = 0.999). In conclusion, for bias reduction, stacking-based methods performed better than the other methods.
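AdaBoost fits weak learners sequentially, reweighting the training samples so that later learners concentrate on previously misclassified points, and weights each learner by alpha = 0.5 ln((1 - err) / err). A minimal sketch of discrete AdaBoost with decision stumps as weak learners and labels in {-1, +1} follows; the study above paired AdaBoost with RBF and REPTree classifiers, which are not reproduced here, and all names and data are hypothetical.

```python
import math


def stump_predict(stump, row):
    """A stump predicts `polarity` when row[feature] <= threshold, else -polarity."""
    feature, threshold, polarity = stump
    return polarity if row[feature] <= threshold else -polarity


def best_weighted_stump(X, y, w):
    """Search all (feature, threshold, polarity) triples for the lowest weighted error."""
    best_err, best_stump = float("inf"), None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                stump = (j, thr, polarity)
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(stump, row) != yi)
                if err < best_err:
                    best_err, best_stump = err, stump
    return best_err, best_stump


def adaboost_fit(X, y, n_rounds=10):
    """Discrete AdaBoost with labels in {-1, +1}; returns [(alpha, stump), ...]."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        err, stump = best_weighted_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0) / division by 0
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Up-weight misclassified samples, down-weight correct ones, renormalise
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical 1-D data that no single stump can separate
X = [[1], [2], [3], [4], [5], [6]]
y = [1, 1, -1, -1, 1, 1]
ensemble = adaboost_fit(X, y, n_rounds=3)
print([adaboost_predict(ensemble, row) for row in X])  # [1, 1, -1, -1, 1, 1]
```

Each single stump misclassifies at least two of these points, but three weighted stumps together fit all six, which illustrates how boosting reduces the bias of a weak base learner.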

The Detection of Counterfeit Banknotes Using Ensemble Learning Techniques of AdaBoost and Voting

International Journal of Intelligent Engineering and Systems

10.22266/ijies2021.0228.31

2021

Vol 14 (1)

pp. 326-339

Author(s):

Rihab Khairy

Ameer Hussein

Haider ALRikabi

...

Keyword(s):

Financial Markets

Ensemble Learning

Economic Integration

Learning Algorithms

Ensemble Methods

Machine Learning Algorithms

Learning Techniques

Robust Techniques

Simulation Results

Electronic Channels

The movement of cash-flow transactions, whether through electronic channels or physically, has created openings for the influx of counterfeit banknotes into financial markets. With global economic integration and expanding international trade, attention must be directed toward robust techniques for recognizing and detecting counterfeit banknotes. This paper presents ensemble learning algorithms for banknote detection. AdaBoost and voting ensembles are deployed in combination with machine learning algorithms. The ensemble methods produce improved detection accuracies. Simulation results show that the AdaBoost and voting ensemble models achieved accuracies of up to 100% for counterfeit banknotes.
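A hard-voting ensemble assigns each sample the label predicted by most of its member classifiers. The sketch below assumes each member's predictions are already available as a list of labels; `hard_vote` and the three prediction lists are hypothetical and for illustration only.

```python
from collections import Counter


def hard_vote(*model_predictions):
    """Combine several models' label lists by per-sample majority vote.
    On a tie, Counter.most_common returns the label encountered first,
    i.e. the earliest model's vote among the tied labels."""
    return [Counter(votes).most_common(1)[0][0]
            for votes in zip(*model_predictions)]


# Hypothetical outputs of three classifiers on four banknotes
model_a = ["genuine", "forged", "forged", "genuine"]
model_b = ["genuine", "genuine", "forged", "forged"]
model_c = ["forged", "forged", "forged", "genuine"]
print(hard_vote(model_a, model_b, model_c))
# ['genuine', 'forged', 'forged', 'genuine']
```

Using an odd number of voters for a two-class problem avoids ties altogether, so the tie-breaking rule only matters for even-sized or multi-class ensembles.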

Co-Disease prediction in Diabetic Patients using Ensemble learning for Decision Support System

International Journal of Recent Technology and Engineering - 2

10.35940/ijrte.c4428.098319

2019

Vol 8 (3)

pp. 1638-1642

Keyword(s):

Data Mining

Ensemble Learning

Prediction Models

Research Work

Ensemble Methods

Ensemble Prediction

Diabetic Patients

Disease Prediction

Classification Models

Ensemble Prediction System

Classification methods from data mining, together with ensemble prediction methods from data mining and machine learning, help predict outcomes more accurately by building multiple classification models for future analysis. Ensemble learning algorithms build a combined classifier by taking a weighted vote of the individual classifiers to make predictions for new data points. In an Ensemble Prediction System, two or more different models are run to predict the results. In this paper, our research on diabetic medical data compares and analyzes classification models such as Naive Bayes, Random Forest, and ZeroR against ensemble prediction models, to demonstrate the efficiency of the method for predicting the possibility of diabetic syndrome in patients with various health symptoms. The voting algorithm used, and its application to such data for disease prediction, are discussed. The rules developed in this work can help predict co-diseases in diabetic patients to support decision making; these rules are ranked according to the final classifier to improve disease prediction. The proposed classification methods can predict effectively and accurately in various contexts of disease analysis by improving the accuracy of the classifiers.
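The weighted vote described above can be sketched as follows: each classifier's predicted label receives that classifier's weight, and the label with the largest total wins. The weights (for example, validation accuracies) and the three hypothetical classifiers are assumptions for illustration; the abstract does not specify the paper's actual weighting scheme, and `weighted_vote` is a hypothetical helper.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label whose supporting classifiers have the largest total weight.
    `predictions` holds one label per classifier for a single sample."""
    if len(predictions) != len(weights):
        raise ValueError("need one weight per classifier")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


# Hypothetical weights, e.g. validation accuracies of three classifiers
weights = [0.3, 0.9, 0.5]
print(weighted_vote(["no", "yes", "no"], weights))  # yes (0.9 > 0.3 + 0.5)
```

With equal weights this reduces to a plain majority vote; unequal weights let a single strong classifier overrule two weaker ones, as in the example.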

A New Hybrid Support Vector Machine Ensemble Classification Model for Credit Scoring

Journal of Information Technology Research

10.4018/jitr.2019010106

2019

Vol 12 (1)

pp. 77-88

Author(s):

Jian-Rong Yao

Jia-Rui Chen

Keyword(s):

Credit Scoring

Ensemble Methods

Ensemble Classification

Classification Model

Support Vector

Ensemble Model

Financial Industry

K Nearest Neighbors

Regression Methods

Vector Machines

Credit scoring plays an important role in the financial industry. Various methods are employed in credit scoring, including traditional approaches such as logistic regression, discriminant analysis, and linear regression, and machine learning methods such as neural networks, k-nearest neighbors, genetic algorithms, support vector machines (SVM), and decision trees. SVM has been shown to perform well in classification. This paper proposes a new hybrid RF-SVM ensemble model, which uses random forest to select important variables and employs ensemble methods (bagging and boosting) to aggregate single base models (SVM) into a robust classifier. The experimental results suggest that this new model could achieve effective improvement and has promising potential in the field of credit scoring.
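The bagging part of the RF-SVM idea can be sketched as training several linear SVMs on bootstrap resamples and combining them by majority vote. This sketch omits the random-forest variable selection and the boosting variant, uses a simple full-batch subgradient solver with a Pegasos-style step size instead of a library SVM, and assumes labels in {-1, +1}; all function names and data are hypothetical.

```python
import random
from collections import Counter


def train_linear_svm(X, y, lam=0.01, epochs=200):
    """Minimise lam/2*||w||^2 + mean(max(0, 1 - y * w.x)) by full-batch
    subgradient descent with step 1/(lam*t), as in Pegasos.
    A constant 1.0 is appended to each x so the last weight acts as a bias
    (regularised along with the other weights, for simplicity)."""
    Xa = [list(x) + [1.0] for x in X]
    n, d = len(Xa), len(Xa[0])
    w = [0.0] * d
    for t in range(1, epochs + 1):
        eta = 1.0 / (lam * t)
        grad = [lam * wj for wj in w]
        for xi, yi in zip(Xa, y):
            if yi * sum(wj * xj for wj, xj in zip(w, xi)) < 1:  # margin violated
                for j in range(d):
                    grad[j] -= yi * xi[j] / n
        w = [wj - eta * gj for wj, gj in zip(w, grad)]
    return w


def svm_predict(w, x):
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


def bagged_svm_fit(X, y, n_estimators=11, seed=0):
    """Train one linear SVM per bootstrap sample (n draws with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(train_linear_svm([X[i] for i in idx], [y[i] for i in idx]))
    return models


def bagged_svm_predict(models, x):
    """Majority vote of the bagged SVMs."""
    return Counter(svm_predict(w, x) for w in models).most_common(1)[0][0]


# Hypothetical two-feature applicant data: +1 = good credit, -1 = bad credit
X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
y = [1, 1, 1, -1, -1, -1]
models = bagged_svm_fit(X, y)
print([bagged_svm_predict(models, x) for x in X])  # [1, 1, 1, -1, -1, -1]
```

In a fuller version, a random forest's variable importances would first pick the columns passed to `bagged_svm_fit`, and a boosting loop could replace the bootstrap loop.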

Hybrid Thyroid Stage Prediction Models Combining Classification, Clustering and Ensemble Systems

International Journal of Engineering & Technology

10.14419/ijet.v7i4.7.20565

2018

Vol 7 (4.7)

pp. 297

Author(s):

K. Pavya

Dr. B. Srinivasan

Keyword(s):

Thyroid Disease

Prediction Models

Machine Learning Algorithms

Disease Classification

Classification Model

SVM Classifier

Classification Models

Efficient Manner

Clustering And Classification

Ensemble Systems

Early and correct detection of thyroid disease is very important for correct and timely treatment. The need to increase the accuracy of detecting and classifying thyroid disease poses a great challenge not only to the research community but also to the healthcare industry. The use of machine learning algorithms for thyroid disease classification is an area of research that has been gaining popularity over the past few years. An automatic computer-aided system for diagnosing thyroid disease requires sophisticated and effective algorithms that perform classification accurately and efficiently. To meet this demand, hybrid models that combine clustering and classification algorithms with ensemble techniques are proposed. Four categories of thyroid disease prediction systems are proposed: Clustering + Classification models, Classification + Classification models, Clustering + Clustering models, and Classification + Clustering models. Two types of ensembles, homogeneous and heterogeneous, are also considered and analyzed. Performance evaluation showed that the Classification + Classification model based on the combination of SVM and a heterogeneous KNN + SVM classifier produced the highest prediction accuracy.
