Developing an Effective Classification Model for Medical Data Analysis

Author(s):  
Naeem Ahmed Mahoto ◽  
Abdul Hafeez Babar

The sparse nature of medical data makes knowledge discovery and prediction a complex task. Machine learning algorithms have produced promising results for diverse data. This chapter constructs an effective classification model for medical data analysis. In particular, nine classification models, namely Naïve Bayes, decision trees (i.e., J48 and Random Forest), multilayer perceptron, radial basis function, k-nearest neighbors, single conjunctive rule learner, support vector machine, and simple logistic, have been applied to develop an effective model. In addition, the classification models have been used in conjunction with ensemble learning methods, since ensemble methods significantly improve the predictive performance of the classification models. The classification models have been evaluated using accuracy, F-measure, precision, and recall. The empirical results revealed that combining ensemble learning methods with classification models produces better predictions for the medical data than a single classification model alone.
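The four evaluation metrics named in this abstract can be computed directly from the counts of true and false positives and negatives. The following is a minimal sketch for the binary case; the function name `classification_metrics` and the example labels are hypothetical and are not taken from the chapter.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F-measure for one positive class."""
    pairs = list(zip(y_true, y_pred))
    # Count the cells of the binary confusion matrix.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure is the harmonic mean of precision and recall.
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical labels, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

For multi-class medical data, the same counts would typically be computed per class and then averaged; that choice is not specified in the abstract.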

2013 ◽  
Vol 25 (3) ◽  
pp. 759-804 ◽  
Author(s):  
Akiko Takeda ◽  
Hiroyuki Mitsugi ◽  
Takafumi Kanamori

A wide variety of machine learning algorithms such as the support vector machine (SVM), minimax probability machine (MPM), and Fisher discriminant analysis (FDA) exist for binary classification. The purpose of this letter is to provide a unified classification model that includes these models through a robust optimization approach. This unified model has several benefits. One is that the extensions and improvements intended for SVMs become applicable to MPM and FDA, and vice versa. For example, we can obtain nonconvex variants of MPM and FDA by mimicking Perez-Cruz, Weston, Hermann, and Schölkopf's (2003) extension from convex ν-SVM to nonconvex Eν-SVM. Another benefit is to provide theoretical results concerning these learning methods at once by dealing with the unified model. We give a statistical interpretation of the unified classification model and prove that the model is a good approximation for the worst-case minimization of an expected loss with respect to the uncertain probability distribution. We also propose a nonconvex optimization algorithm that can be applied to nonconvex variants of existing learning methods and show promising numerical results.


Materials ◽  
2021 ◽  
Vol 14 (3) ◽  
pp. 542
Author(s):  
José P. S. Aniceto ◽  
Bruno Zêzere ◽  
Carlos M. Silva

Experimental diffusivities are scarcely available, though their knowledge is essential to model rate-controlled processes. In this work, various machine learning models were developed to estimate diffusivities in polar and nonpolar solvents (except water and supercritical CO2). The models were trained on a database of 90 polar systems (1431 points) and 154 nonpolar systems (1129 points) with data on 20 properties. Five machine learning algorithms were evaluated: multilinear regression, k-nearest neighbors, decision tree, and two ensemble methods (random forest and gradient boosted). For both polar and nonpolar data, the best results were found using the gradient boosted algorithm. The model for polar systems contains six variables/parameters (temperature, solvent viscosity, solute molar mass, solute critical pressure, solvent molar mass, and solvent Lennard-Jones energy constant) and showed an average absolute relative deviation (AARD) of 5.07%. The nonpolar model requires five variables/parameters (the same as for polar systems except the Lennard-Jones constant) and presents AARD = 5.86%. These results were compared with four classic models, including the 2-parameter correlation of Magalhães et al. (AARD = 5.19/6.19% for polar/nonpolar) and the predictive Wilke-Chang equation (AARD = 40.92/29.19%). Nonetheless, the correlation of Magalhães et al. requires two parameters per system that must be previously fitted to data. The developed models are coded and provided as a command line program.
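The AARD figures quoted in this abstract are average absolute relative deviations. As a minimal sketch, assuming the usual definition (the mean of |calculated − experimental| / experimental, expressed as a percentage), the metric can be computed as follows; the function name `aard` and the sample values are hypothetical and are not data from the paper.

```python
def aard(calculated, experimental):
    """Average absolute relative deviation, in percent."""
    if len(calculated) != len(experimental) or not experimental:
        raise ValueError("inputs must be non-empty and of equal length")
    # Sum the relative deviation of each calculated value from its experimental value.
    total = sum(abs(c - e) / abs(e) for c, e in zip(calculated, experimental))
    return 100.0 * total / len(experimental)


# Hypothetical diffusivity values (arbitrary units), for illustration only.
d_exp = [1.0, 2.0, 4.0]
d_calc = [1.1, 1.9, 4.0]
result = aard(d_calc, d_exp)
print(f"AARD = {result:.2f}%")
```

Because each deviation is divided by the experimental value, the metric weights small and large diffusivities equally, which is why it is convenient for comparing models across systems.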


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Xiao-Yan Gao ◽  
Abdelmegeid Amin Ali ◽  
Hassan Shaban Hassan ◽  
Eman M. Anwar

Heart disease is the deadliest disease and one of the leading causes of death worldwide. Machine learning is playing an essential role in the medical field. In this paper, ensemble learning methods are used to enhance the performance of heart disease prediction. Two feature extraction methods, linear discriminant analysis (LDA) and principal component analysis (PCA), are used to select essential features from the dataset. Machine learning algorithms and ensemble learning methods are then compared on the selected features. The models are evaluated using accuracy, recall, precision, F-measure, and ROC. The results show that the bagging ensemble learning method with a decision tree achieved the best performance.
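Bagging, the ensemble method this abstract reports as performing best, trains each base model on a bootstrap sample of the training data and combines the models by majority vote. The sketch below illustrates the idea under strong simplifying assumptions: the base learner is a one-feature, depth-1 decision tree (a stump) rather than a full decision tree, and the data, function names, and parameters are hypothetical and are not from the paper.

```python
import random
from collections import Counter


def fit_stump(samples):
    """Fit a depth-1 decision tree on (x, label) pairs with binary labels 0/1."""
    best = None
    for threshold in sorted({x for x, _ in samples}):
        # Try both label orientations around the threshold.
        for left, right in ((0, 1), (1, 0)):
            errors = sum(
                1 for x, y in samples if (left if x <= threshold else right) != y
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, left, right)
    _, threshold, left, right = best
    return lambda x: left if x <= threshold else right


def bagging_fit(samples, n_estimators=25, seed=0):
    """Train one stump per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_estimators):
        bootstrap = [rng.choice(samples) for _ in samples]
        models.append(fit_stump(bootstrap))
    return models


def bagging_predict(models, x):
    """Combine the base models by majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical one-feature data set: low values are class 0, high values class 1.
data = [(0.1, 0), (0.4, 0), (0.8, 0), (1.2, 0),
        (2.9, 1), (3.3, 1), (3.7, 1), (4.1, 1)]
models = bagging_fit(data)
print(bagging_predict(models, 0.0), bagging_predict(models, 5.0))
```

In practice, the base learner would be a full decision tree trained on the LDA- or PCA-derived features, usually through an existing machine learning library rather than hand-written code.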


2020 ◽  
Vol 12 (6) ◽  
pp. 99-116
Author(s):  
Mousa Al-Akhras ◽  
Mohammed Alawairdhi ◽  
Ali Alkoudari ◽  
Samer Atawneh

The Internet of things (IoT) has led to several security threats and challenges within society. Regardless of the benefits it has brought to society, IoT could compromise the security and privacy of individuals and companies at various levels. Denial of Service (DoS) and Distributed DoS (DDoS) attacks, among others, are the most common attack types facing IoT networks. To counter such attacks, companies should implement an efficient classification/detection model, which is not an easy task. This paper proposes a classification model to examine the effectiveness of several machine learning algorithms, namely, Random Forest (RF), k-Nearest Neighbors (KNN), and Naïve Bayes. The machine learning algorithms are used to detect attacks on the UNSW-NB15 benchmark dataset, which contains instances of normal and malicious network traffic. The experimental results reveal that the RF and KNN classifiers give the best performance with an accuracy of 100% (without noise injection) and 99% (with 10% noise filtering), while the Naïve Bayes classifier gives the worst performance with an accuracy of 95.35% and 82.77% without noise and with 10% noise, respectively. Other evaluation metrics, such as precision and recall, also show the effectiveness of the RF and KNN classifiers over Naïve Bayes.


Author(s):  
Adem Doganer

In this study, different models were created to reduce bias by ensemble learning methods. Reducing the bias error will improve the classification performance. In order to increase the classification performance, the most appropriate ensemble learning method and ideal sample size were investigated. Bias values and learning performances of different ensemble learning methods were compared. The AdaBoost ensemble learning method provided the lowest bias value at a sample size of n = 250, while the Stacking ensemble learning method provided the lowest bias value at sample sizes of n = 500, 750, 1000, 2000, 4000, 6000, 8000, 10000, and 20000. When the learning performances were compared, the AdaBoost ensemble learning method with the RBF classifier achieved the best performance at n = 250 (ACC = 0.956, AUC = 0.987). The AdaBoost ensemble learning method with the REPTree classifier achieved the best performance at n = 20000 (ACC = 0.990, AUC = 0.999). In conclusion, for the reduction of bias, methods based on stacking displayed higher performance than the other methods.


2021 ◽  
Vol 14 (1) ◽  
pp. 326-339
Author(s):  
Rihab Khairy ◽  
Ameer Hussein ◽  
Haider ALRikabi ◽  
...  

The movement of cash through either electronic or physical channels has created openings for the influx of counterfeit banknotes in financial markets. With global economic integration and expanding international trade, attention must be directed toward robust techniques for the recognition and detection of counterfeit banknotes. This paper presents ensemble learning algorithms for counterfeit banknote detection. The AdaBoost and voting ensembles are deployed in combination with machine learning algorithms, and the ensemble methods produce improved detection accuracies. Simulation results show that the AdaBoost and voting ensemble models provided accuracies of up to 100% for counterfeit banknotes.


2019 ◽  
Vol 8 (3) ◽  
pp. 1638-1642

Classification methods from data mining, together with ensemble methods for prediction in data mining and machine learning, help to predict data more accurately by building various classification models for future analysis. Ensemble learning algorithms build a combined classifier by taking the weighted vote of the individual classifiers to produce predictions for new data points. In an ensemble prediction system, two or more different models are run to predict the results. In this paper, our research on diabetic medical data using various classification models, such as Naive Bayes, Random Forest, and ZeroR, is compared and analyzed against ensemble prediction models to demonstrate the efficiency of the method used to predict the possibility of diabetic syndrome in patients with various health symptoms. The algorithm used for voting, along with its use and application on such data to predict diseases, is discussed. The rules developed in this work can help to predict and find co-occurring diseases in patients with diabetes for decision making, and these rules have been ranked according to the final classifier for better disease prediction. The proposed classification methods can predict effectively and accurately across datasets in various contexts of disease analysis by improving the accuracy of the classifiers.
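The weighted vote mentioned in this abstract can be sketched as follows: each classifier casts a vote for its predicted label, each vote counts in proportion to that classifier's weight, and the label with the largest total wins. The function name `weighted_vote`, the labels, and the weights below are hypothetical; the abstract does not state how the weights were chosen.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Return the label with the largest total weight across classifiers."""
    if len(predictions) != len(weights) or not predictions:
        raise ValueError("need one weight per prediction, and at least one prediction")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    # Ties are broken in favor of the label that was seen first.
    return max(totals, key=totals.get)


# Hypothetical predictions from three classifiers for one patient record,
# with hypothetical weights (for example, each classifier's validation accuracy).
predictions = ["diabetic", "healthy", "healthy"]
weights = [0.9, 0.4, 0.3]
decision = weighted_vote(predictions, weights)
print(decision)
```

Here the single high-weight classifier outvotes the two low-weight ones (0.9 against 0.7), which is the behavior that distinguishes weighted voting from a simple majority vote.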


2019 ◽  
Vol 12 (1) ◽  
pp. 77-88
Author(s):  
Jian-Rong Yao ◽  
Jia-Rui Chen

Credit scoring plays an important role in the financial industry. Different approaches are employed in credit scoring, such as traditional logistic regression, discriminant analysis, and linear regression; machine learning methods include neural networks, k-nearest neighbors, genetic algorithms, support vector machines (SVM), decision trees, and so on. SVM has demonstrated good performance in classification. This paper proposes a new hybrid RF-SVM ensemble model, which uses random forest to select important variables and employs ensemble methods (bagging and boosting) to aggregate single base models (SVM) into a robust classifier. The experimental results suggest that this new model could achieve effective improvement and has promising potential in the field of credit scoring.


2018 ◽  
Vol 7 (4.7) ◽  
pp. 297
Author(s):  
K. Pavya ◽  
Dr. B.Srinivasan

Early and correct detection of thyroid disease is very important for correct and timely treatment. The need to increase the accuracy of detecting and classifying thyroid disease poses a great challenge not only to the research community but also to healthcare industries. The use of machine learning algorithms for thyroid disease classification is an area of research that has been gaining popularity over the past few years. An automatic computer-aided system for diagnosing thyroid disease requires sophisticated and effective algorithms to perform classification in an accurate and time-efficient manner. As a solution to this demand, hybrid models that combine clustering and classification algorithms along with ensemble technology are proposed. Four categories of thyroid disease prediction system are proposed: Clustering + Classification models, Classification + Classification models, Clustering + Clustering models, and Classification + Clustering models. Two types of ensembles, namely homogeneous and heterogeneous, are also considered and analyzed. Performance evaluation showed that the Classification + Classification model based on the combination of SVM and the heterogeneous KNN + SVM classifier produces the highest prediction accuracy.

