Improving Classification Performance for a Novel Imbalanced Medical Dataset using SMOTE Method

Author(s):  
Ahmed Jameel Mohammed
Author(s):  
Latifa Nass ◽  
Stephen Swift ◽  
Ammar Al Dallal

Most healthcare organizations and medical research institutions store their patients' data digitally for future reference and for planning future treatment. Such heterogeneous medical datasets are difficult to analyze because of their complexity and volume, and because missing values and noise make mining them a tedious task. Efficient classification of medical datasets remains a major data mining problem. Diagnosis, disease prediction, and the precision of results can be improved if relationships and patterns are extracted efficiently from these complex datasets. This paper analyses some of the major classification algorithms, namely C4.5 (J48), SMO, Naïve Bayes, KNN, and Random Forest, and compares their performance using WEKA. Performance evaluation is based on accuracy, sensitivity, specificity, and error rate. The medical datasets used in this study are the Heart-Statlog dataset, which holds data related to heart disease, and the Pima Diabetes dataset, which holds data related to diabetes. This study contributes to finding the most suitable algorithm for classifying medical data and also reveals the importance of preprocessing in improving classification performance. The performance of the machine learning algorithms is compared through graphical representation of the results. Keywords: Data Mining, Health Care, Classification Algorithms, Accuracy, Sensitivity, Specificity, Error Rate
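The four evaluation measures named in this abstract can all be derived from a binary confusion matrix. The following is a minimal sketch in plain Python, not the WEKA procedure used in the paper; the function name `binary_metrics` and the convention that label 1 marks the positive (diseased) class are assumptions made for illustration.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, sensitivity, specificity and error rate.

    `positive` is the label treated as the positive class (an assumption
    for this sketch; the abstract does not fix a label encoding).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1          # true positive
        elif truth == positive:
            fn += 1          # positive case missed by the classifier
        elif pred == positive:
            fp += 1          # negative case flagged as positive
        else:
            tn += 1          # true negative

    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    return {
        "accuracy": accuracy,
        # Sensitivity (recall): fraction of positive cases detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Specificity: fraction of negative cases correctly rejected.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        "error_rate": 1.0 - accuracy,
    }
```

On imbalanced medical data, accuracy alone can look high while sensitivity stays low, which is why the measures are usually reported together.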


Author(s):  
Seiki Ubukata ◽  
Keisuke Umado ◽  
Akira Notsu ◽  
Katsuhiro Honda

Hard C-means (HCM), which is one of the most popular clustering techniques, has been extended by using soft computing approaches such as fuzzy theory and rough set theory. Fuzzy C-means (FCM) and rough C-means (RCM) are respectively fuzzy and rough set extensions of HCM. RCM can detect the positive and the possible regions of clusters by using the lower and the upper areas which are respectively analogous to the lower and the upper approximations in rough set theory. RCM-type methods have the problem that the original definitions of the lower and the upper approximations are not actually used. In this paper, rough set C-means (RSCM), which is an extension of HCM based on the original rough set definition, is proposed as a rough set-based counterpart of RCM. Specifically, RSCM is proposed as a clustering model on an approximation space considering a space granulated by a binary relation and uses the lower and the upper approximations of temporal clusters. For this study, we investigated the characteristics of the proposed RSCM through basic considerations, visual demonstrations, and comparative experiments. We observed the geometric characteristics of the examined methods by using visualizations and numerical experiments conducted for the problem of classifying patients as having benign or malignant disease based on a medical dataset. We compared the classification performance by viewing the trade-off between the classification accuracy in the positive region and the fraction of objects classified as being in the positive region.
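The lower and upper approximations that this abstract refers to can be illustrated with the standard rough set definitions over a binary relation. The sketch below is not the RSCM algorithm itself. The helper names and the distance-threshold relation on one-dimensional points are assumptions chosen only to make the example concrete.

```python
def approximations(universe, neighborhood, target):
    """Return (lower, upper) approximations of `target`.

    neighborhood(x) is the set of objects related to x by the binary
    relation. Lower: objects whose whole neighborhood lies in target.
    Upper: objects whose neighborhood intersects target.
    """
    target = set(target)
    lower = {x for x in universe if neighborhood(x) <= target}
    upper = {x for x in universe if neighborhood(x) & target}
    return lower, upper


def threshold_neighborhood(universe, delta):
    """Hypothetical relation: x is related to y when |x - y| <= delta."""
    points = list(universe)
    return lambda x: {y for y in points if abs(x - y) <= delta}


universe = [0, 1, 2, 3, 4, 5]
cluster = {1, 2, 3}  # a temporary cluster, as HCM would assign it
lower, upper = approximations(universe, threshold_neighborhood(universe, 1), cluster)
boundary = upper - lower  # objects that only possibly belong to the cluster
```

Here the lower approximation plays the role of the positive region of the cluster, and the upper approximation the possible region.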


Author(s):  
Diane Pecher ◽  
Inge Boot ◽  
Saskia van Dantzig ◽  
Carol J. Madden ◽  
David E. Huber ◽  
...  

Previous studies (e.g., Pecher, Zeelenberg, & Wagenmakers, 2005) found that semantic classification performance is better for target words with orthographic neighbors that are mostly from the same semantic class (e.g., living) compared to target words with orthographic neighbors that are mostly from the opposite semantic class (e.g., nonliving). In the present study we investigated the contribution of phonology to orthographic neighborhood effects by comparing effects of phonologically congruent orthographic neighbors (book-hook) to phonologically incongruent orthographic neighbors (sand-wand). The prior presentation of a semantically congruent word produced larger effects on subsequent animacy decisions when the previously presented word was a phonologically congruent neighbor than when it was a phonologically incongruent neighbor. In a second experiment, performance differences between target words with versus without semantically congruent orthographic neighbors were larger if the orthographic neighbors were also phonologically congruent. These results support models of visual word recognition that assume an important role for phonology in cascaded access to meaning.


2012 ◽  
Vol 58 (4) ◽  
pp. 425-431 ◽  
Author(s):  
D. Selvathi ◽  
N. Emimal ◽  
Henry Selvaraj

The medical imaging field has grown significantly in recent years and demands high accuracy since it deals with human life. The idea is to reduce human error as much as possible by assisting physicians and radiologists with automatic techniques. Artificial intelligence techniques have shown great potential in this field. Hence, in this paper a neuro-fuzzy classifier is applied to the automated characterization of atheromatous plaque, identifying fibrotic, lipidic, and calcified tissues in intravascular ultrasound (IVUS) images. The classifier is designed with sixteen inputs, corresponding to the sixteen pixels of the instantaneous scanning matrix, and one output that indicates whether the pixel under consideration is fibrotic, lipidic, calcified, or normal. The classification performance was evaluated in terms of sensitivity, specificity, and accuracy, and the results confirmed that the proposed system has potential in detecting the respective plaque, with an average accuracy of 98.9%.


2020 ◽  
Author(s):  
Nalika Ulapane ◽  
Karthick Thiyagarajan ◽  
Sarath Kodagoda

Classification has become a vital task in modern machine learning and Artificial Intelligence applications, including smart sensing. Numerous machine learning techniques are available to perform classification. Similarly, numerous practices, such as feature selection (i.e., selection of a subset of descriptor variables that optimally describe the output), are available to improve classifier performance. In this paper, we consider the case of a given supervised learning classification task that has to be performed making use of continuous-valued features. It is assumed that an optimal subset of features has already been selected. Therefore, no further feature reduction, or feature addition, is to be carried out. Then, we attempt to improve the classification performance by passing the given feature set through a transformation that produces a new feature set which we have named the “Binary Spectrum”. Via a case study example done on some Pulsed Eddy Current sensor data captured from an infrastructure monitoring task, we demonstrate how the classification accuracy of a Support Vector Machine (SVM) classifier increases through the use of this Binary Spectrum feature, indicating the feature transformation’s potential for broader usage.


Author(s):  
Inzamam Mashood Nasir ◽  
Muhammad Rashid ◽  
Jamal Hussain Shah ◽  
Muhammad Sharif ◽  
Muhammad Yahiya Haider Awan ◽  
...  

Background: Breast cancer is considered the most perilous disease among females worldwide, and the rate of new cases is increasing yearly. Many researchers have proposed efficient algorithms to diagnose breast cancer at early stages, which have increased efficiency and performance by utilizing the learned features of gold-standard histopathological images. Objective: Most of these systems have used either traditional handcrafted features or deep features, which contain a lot of noise and redundancy that ultimately decrease system performance. Methods: A hybrid approach is proposed that fuses and optimizes the properties of handcrafted and deep features to classify breast cancer images. HOG and LBP features are serially fused with features from the pretrained models VGG19 and InceptionV3. PCR and ICR are used to evaluate the classification performance of the proposed method. Results: The method concentrates on histopathological images to classify breast cancer. The performance is compared with state-of-the-art techniques, where an overall patient-level accuracy of 97.2% and an image-level accuracy of 96.7% are recorded. Conclusion: The proposed hybrid method achieves the best performance compared to previous methods and can be used for intelligent healthcare systems and early breast cancer detection.
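Serial fusion of feature sets is commonly understood as concatenating the per-image feature vectors end to end. The sketch below illustrates that reading only. The abstract does not describe the fusion or the optimization step in detail, so the function name `serial_fuse` and the tiny feature vectors are hypothetical placeholders, not the authors' pipeline.

```python
def serial_fuse(*feature_sets):
    """Concatenate per-sample feature vectors from several extractors.

    Each argument is a list with one feature vector per image, and all
    arguments must describe the same images in the same order.
    """
    if not feature_sets:
        raise ValueError("at least one feature set is required")
    n_samples = len(feature_sets[0])
    if any(len(fs) != n_samples for fs in feature_sets):
        raise ValueError("all feature sets must cover the same samples")

    fused = []
    for rows in zip(*feature_sets):
        vector = []
        for row in rows:
            vector.extend(row)  # append this extractor's features in order
        fused.append(vector)
    return fused


# Hypothetical features for two images (real HOG, LBP and deep features
# would be far longer vectors).
hog = [[0.1, 0.2], [0.3, 0.4]]
lbp = [[1.0], [2.0]]
deep = [[5.0, 6.0, 7.0], [8.0, 9.0, 10.0]]
fused = serial_fuse(hog, lbp, deep)
```

The fused vector length is the sum of the individual lengths, which is one reason a later selection or optimization step is often used to remove redundancy.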

