Improving Classification Performance for a Novel Imbalanced Medical Dataset using SMOTE Method

Ahmed Jameel Mohammed

doi:10.30534/ijatcse/2020/104932020

Indepth Analysis of Medical Dataset Mining: A Comparitive Analysis on a Diabetes Dataset Before and After Preprocessing

KnE Social Sciences ◽

10.18502/kss.v3i25.5190 ◽

2019 ◽

Cited By ~ 1

Author(s):

Latifa Nass ◽

Stephen Swift ◽

Ammar Al Dallal

Keyword(s):

Data Mining ◽

Error Rate ◽

Missing Values ◽

Graphical Representation ◽

Classification Performance ◽

Medical Data ◽

Machine Learning Algorithms ◽

Classification Algorithms ◽

Data Set ◽

Medical Dataset

Most healthcare organizations and medical research institutions store their patients' data digitally for future reference and for planning future treatments. This heterogeneous medical data is difficult to analyze because of its complexity and volume, and because it contains missing values and noise, which make mining it a tedious task. Efficient classification of medical datasets remains a major data mining problem. Diagnosis, disease prediction, and the precision of results can be improved if relationships and patterns are extracted efficiently from these complex medical datasets. This paper analyses several major classification algorithms, namely C4.5 (J48), SMO, Naïve Bayes, KNN, and Random Forest, and compares their performance using WEKA. Performance is evaluated in terms of accuracy, sensitivity, specificity, and error rate. The medical datasets used in this study are the Heart-Statlog dataset, which holds data related to heart disease, and the Pima Diabetes dataset, which holds data related to diabetes. This study contributes to finding the most suitable algorithm for classifying medical data and also reveals the importance of preprocessing in improving classification performance. The performance of the machine learning algorithms is compared through graphical representation of the results. Keywords: Data Mining, Health Care, Classification Algorithms, Accuracy, Sensitivity, Specificity, Error Rate
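The four evaluation measures named in this abstract can all be derived from a binary confusion matrix. The following is a minimal sketch using the standard textbook definitions; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper or from WEKA.

```python
def binary_metrics(tp, fn, fp, tn):
    """Compute standard measures from binary confusion-matrix counts.

    tp/fn: positive cases classified correctly/incorrectly
    tn/fp: negative cases classified correctly/incorrectly
    """
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,      # share of all cases classified correctly
        "sensitivity": tp / (tp + fn),      # true positive rate
        "specificity": tn / (tn + fp),      # true negative rate
        "error_rate": (fp + fn) / total,    # share of all cases misclassified
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = binary_metrics(tp=40, fn=10, fp=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these made-up counts the accuracy is 0.85, the sensitivity 0.80, the specificity 0.90, and the error rate 0.15. Accuracy and error rate always sum to 1.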

Characteristics of Rough Set C-Means Clustering

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2018.p0551 ◽

2018 ◽

Vol 22 (4) ◽

pp. 551-564 ◽

Cited By ~ 3

Author(s):

Seiki Ubukata ◽

Keisuke Umado ◽

Akira Notsu ◽

Katsuhiro Honda

Keyword(s):

Set Theory ◽

Rough Set ◽

Malignant Disease ◽

Rough Set Theory ◽

Fuzzy Theory ◽

Classification Performance ◽

Approximation Space ◽

Clustering Model ◽

Positive Region ◽

Medical Dataset

Hard C-means (HCM), which is one of the most popular clustering techniques, has been extended by using soft computing approaches such as fuzzy theory and rough set theory. Fuzzy C-means (FCM) and rough C-means (RCM) are respectively fuzzy and rough set extensions of HCM. RCM can detect the positive and the possible regions of clusters by using the lower and the upper areas which are respectively analogous to the lower and the upper approximations in rough set theory. RCM-type methods have the problem that the original definitions of the lower and the upper approximations are not actually used. In this paper, rough set C-means (RSCM), which is an extension of HCM based on the original rough set definition, is proposed as a rough set-based counterpart of RCM. Specifically, RSCM is proposed as a clustering model on an approximation space considering a space granulated by a binary relation and uses the lower and the upper approximations of temporal clusters. For this study, we investigated the characteristics of the proposed RSCM through basic considerations, visual demonstrations, and comparative experiments. We observed the geometric characteristics of the examined methods by using visualizations and numerical experiments conducted for the problem of classifying patients as having benign or malignant disease based on a medical dataset. We compared the classification performance by viewing the trade-off between the classification accuracy in the positive region and the fraction of objects classified as being in the positive region.
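The lower and upper approximations this abstract refers to can be illustrated with the generic rough set definitions over a binary relation. The sketch below is not the RSCM algorithm. It only shows how an approximation space splits objects into those certainly inside a set and those possibly inside it. The function `approximations`, the toy universe, and the distance-based relation are all hypothetical.

```python
def approximations(universe, neighborhood, target):
    """Return (lower, upper) approximations of `target`.

    neighborhood(x) gives the set of objects related to x.
    lower: objects whose whole neighborhood lies inside the target
    upper: objects whose neighborhood overlaps the target at all
    """
    lower = {x for x in universe if neighborhood(x) <= target}
    upper = {x for x in universe if neighborhood(x) & target}
    return lower, upper


# Toy approximation space: six points on a line, related when at most 1 apart.
universe = set(range(6))

def neighborhood(x):
    return {y for y in universe if abs(x - y) <= 1}

cluster = {1, 2, 3}  # a hypothetical temporary cluster
lower, upper = approximations(universe, neighborhood, cluster)
print(sorted(lower))          # [2]
print(sorted(upper))          # [0, 1, 2, 3, 4]
print(sorted(upper - lower))  # boundary region: [0, 1, 3, 4]
```

In this toy setting the lower approximation plays the role of a positive region and the upper approximation that of a possible region. The difference between the two is the boundary, where membership is uncertain.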

The Sound of Enemies and Friends in the Neighborhood

Experimental Psychology (formerly Zeitschrift für Experimentelle Psychologie) ◽

10.1027/1618-3169/a000113 ◽

2011 ◽

Vol 58 (6) ◽

pp. 454-463 ◽

Cited By ~ 1

Author(s):

Diane Pecher ◽

Inge Boot ◽

Saskia van Dantzig ◽

Carol J. Madden ◽

David E. Huber ◽

...

Keyword(s):

Word Recognition ◽

Visual Word Recognition ◽

Neighborhood Effects ◽

Classification Performance ◽

Visual Word ◽

Orthographic Neighborhood ◽

Semantic Classification ◽

Semantic Class ◽

Orthographic Neighbors ◽

Target Words

Previous studies (e.g., Pecher, Zeelenberg, & Wagenmakers, 2005) found that semantic classification performance is better for target words with orthographic neighbors that are mostly from the same semantic class (e.g., living) compared to target words with orthographic neighbors that are mostly from the opposite semantic class (e.g., nonliving). In the present study we investigated the contribution of phonology to orthographic neighborhood effects by comparing effects of phonologically congruent orthographic neighbors (book-hook) to phonologically incongruent orthographic neighbors (sand-wand). The prior presentation of a semantically congruent word produced larger effects on subsequent animacy decisions when the previously presented word was a phonologically congruent neighbor than when it was a phonologically incongruent neighbor. In a second experiment, performance differences between target words with versus without semantically congruent orthographic neighbors were larger if the orthographic neighbors were also phonologically congruent. These results support models of visual word recognition that assume an important role for phonology in cascaded access to meaning.

Automated Characterization of Atheromatous Plaque in Intravascular Ultrasound Images Using Neuro Fuzzy Classifier

International Journal of Electronics and Telecommunications ◽

10.2478/v10177-012-0058-7 ◽

2012 ◽

Vol 58 (4) ◽

pp. 425-431 ◽

Cited By ~ 3

Author(s):

D. Selvathi ◽

N. Emimal ◽

Henry Selvaraj

Keyword(s):

Intravascular Ultrasound ◽

Human Error ◽

Human Life ◽

Classification Performance ◽

Atheromatous Plaque ◽

Ultrasound Images ◽

Fuzzy Classifier ◽

Neuro Fuzzy ◽

Calcified Tissues

The medical imaging field has grown significantly in recent years and demands high accuracy since it deals with human life. The idea is to reduce human error as much as possible by assisting physicians and radiologists with automatic techniques. The use of artificial intelligence techniques has shown great potential in this field. Hence, in this paper a neuro fuzzy classifier is applied to the automated characterization of atheromatous plaque, identifying fibrotic, lipidic and calcified tissues in intravascular ultrasound (IVUS) images. The classifier is designed with sixteen inputs, corresponding to the sixteen pixels of the instantaneous scanning matrix, and one output that indicates whether the pixel under consideration is a fibrotic, lipidic, calcified or normal pixel. The classification performance was evaluated in terms of sensitivity, specificity and accuracy, and the results confirmed that the proposed system has potential for detecting the respective plaque types, with an average accuracy of 98.9%.

A Survey of Classification Methods and Techniques for Improving Classification Performance

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v7i8.233240 ◽

2019 ◽

Vol 7 (8) ◽

pp. 233-240

Author(s):

M. Balasaraswathi ◽

A. Uthiramoorthy

Keyword(s):

Classification Performance ◽

Classification Methods ◽

Methods And Techniques

Predicting Heart-Diseases from Medical Dataset Through Frequent Itemsets Using Improved Algorithm

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v6i8.325331 ◽

2018 ◽

Vol 6 (8) ◽

pp. 325-331

Author(s):

V. Vijayalakshmi

Keyword(s):

Heart Diseases ◽

Frequent Itemsets ◽

Medical Dataset ◽

Improved Algorithm

Safety colours and safety signs. Classification, performance and durability of safety signs

10.3403/03105511u ◽

2015 ◽

Keyword(s):

Classification Performance

Safety colours and safety signs. Classification, performance and durability of safety signs

10.3403/03105511 ◽

2004 ◽

Keyword(s):

Classification Performance

Binary Spectrum Feature for Improved Classifier Performance

10.36227/techrxiv.12993122 ◽

2020 ◽

Author(s):

Nalika Ulapane ◽

Karthick Thiyagarajan ◽

Sarath Kodagoda

Keyword(s):

Machine Learning ◽

Classification Performance ◽

Feature Reduction ◽

Sensor Data ◽

Machine Learning Techniques ◽

Support Vector ◽

Svm Classifier ◽

Monitoring Task ◽

Classifier Performance ◽

Spectrum Feature

Classification has become a vital task in modern machine learning and Artificial Intelligence applications, including smart sensing. Numerous machine learning techniques are available to perform classification. Similarly, numerous practices, such as feature selection (i.e., selection of a subset of descriptor variables that optimally describe the output), are available to improve classifier performance. In this paper, we consider the case of a given supervised learning classification task that has to be performed making use of continuous-valued features. It is assumed that an optimal subset of features has already been selected. Therefore, no further feature reduction, or feature addition, is to be carried out. Then, we attempt to improve the classification performance by passing the given feature set through a transformation that produces a new feature set which we have named the “Binary Spectrum”. Via a case study example done on some Pulsed Eddy Current sensor data captured from an infrastructure monitoring task, we demonstrate how the classification accuracy of a Support Vector Machine (SVM) classifier increases through the use of this Binary Spectrum feature, indicating the feature transformation’s potential for broader usage.

An Optimized Approach for Breast Cancer Classification for Histopathological Images Based on Hybrid Feature Set

Current Medical Imaging Formerly Current Medical Imaging Reviews ◽

10.2174/1573405616666200423085826 ◽

2020 ◽

Vol 16 ◽

Cited By ~ 1

Author(s):

Inzamam Mashood Nasir ◽

Muhammad Rashid ◽

Jamal Hussain Shah ◽

Muhammad Sharif ◽

Muhammad Yahiya Haider Awan ◽

...

Keyword(s):

Breast Cancer ◽

Cancer Detection ◽

State Of The Art ◽

Hybrid Approach ◽

Classification Performance ◽

Diagnose Breast Cancer ◽

Histopathological Images ◽

And Performance ◽

Learned Features ◽

Intelligent Healthcare

Background: Breast cancer is considered the most perilous disease among females worldwide, and the number of new cases is increasing yearly. Many researchers have proposed efficient algorithms to diagnose breast cancer at early stages, which have increased efficiency and performance by utilizing the learned features of gold standard histopathological images. Objective: Most of these systems have used either traditional handcrafted features or deep features, which contain a lot of noise and redundancy that ultimately decreases the performance of the system. Methods: A hybrid approach is proposed that fuses and optimizes the properties of handcrafted and deep features to classify breast cancer images. HOG and LBP features are serially fused with the pretrained models VGG19 and InceptionV3. PCR and ICR are used to evaluate the classification performance of the proposed method. Results: The method concentrates on histopathological images to classify breast cancer. The performance is compared with state-of-the-art techniques, and an overall patient-level accuracy of 97.2% and an image-level accuracy of 96.7% are recorded. Conclusion: The proposed hybrid method achieves the best performance compared to previous methods, and it can be used for intelligent healthcare systems and early breast cancer detection.
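The abstract reports both a patient-level and an image-level accuracy but does not define either. The sketch below shows one convention that appears in histopathology benchmarks, and it is an assumption that the paper follows it: image-level accuracy pools all images, while patient-level accuracy first scores each patient by the share of their images classified correctly and then averages over patients. The function name and the example records are hypothetical.

```python
from collections import defaultdict


def image_and_patient_accuracy(records):
    """records: iterable of (patient_id, true_label, predicted_label).

    Returns (image_level, patient_level) accuracy under the assumed convention.
    """
    per_patient = defaultdict(lambda: [0, 0])  # patient_id -> [correct, total]
    for patient_id, y_true, y_pred in records:
        per_patient[patient_id][0] += int(y_true == y_pred)
        per_patient[patient_id][1] += 1

    correct = sum(c for c, _ in per_patient.values())
    total = sum(n for _, n in per_patient.values())
    image_level = correct / total  # every image counts equally
    # every patient counts equally, regardless of how many images they have
    patient_level = sum(c / n for c, n in per_patient.values()) / len(per_patient)
    return image_level, patient_level


# Hypothetical predictions: patient "p1" has 4 images, patient "p2" has 2.
records = [
    ("p1", "malignant", "malignant"),
    ("p1", "malignant", "malignant"),
    ("p1", "malignant", "benign"),
    ("p1", "malignant", "malignant"),
    ("p2", "benign", "benign"),
    ("p2", "benign", "benign"),
]
image_acc, patient_acc = image_and_patient_accuracy(records)
print(f"image-level: {image_acc:.3f}, patient-level: {patient_acc:.3f}")
```

The two figures differ whenever patients contribute unequal numbers of images. Here the image-level value is 5/6 and the patient-level value is (0.75 + 1.0) / 2 = 0.875.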
