Comparative Analysis of Machine Learning Techniques with Principal Component Analysis on Kidney and Heart Disease

Detecting disease at an early stage is a major challenge. Datasets for different diseases are available online, each with a different number of features. Many dimensionality reduction and feature extraction techniques are now used to reduce the number of features in a dataset and to find the most relevant ones. This paper explores how the performance of different machine learning models changes when Principal Component Analysis (PCA) is applied for dimensionality reduction on datasets of chronic kidney disease and cardiovascular disease. The authors apply Logistic Regression, K Nearest Neighbour (KNN), Naïve Bayes, Support Vector Machine and Random Forest models to the datasets and compare the performance of each model with and without PCA. A key challenge in the field of data mining and machine learning is building accurate and computationally efficient classifiers for medical applications. With an accuracy of 100% for chronic kidney disease and 85% for heart disease, the KNN classifier and logistic regression were found to be the best-performing prediction methods for kidney and heart disease, respectively.
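
The with-and-without-PCA comparison described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the data are synthetic (the kidney and heart datasets are not reproduced here), the helper names (`standardize`, `pca_fit`, `pca_transform`, `knn_predict`) are hypothetical, and the choices of 4 components and k = 5 are assumptions made only for the example.

```python
import numpy as np


def standardize(train, test):
    """Scale both sets using the training mean and standard deviation."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return (train - mean) / std, (test - mean) / std


def pca_fit(train, n_components):
    """Return the training mean and the top principal axes (as rows)."""
    mean = train.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:n_components]


def pca_transform(data, mean, axes):
    """Project data onto the principal axes learned from the training set."""
    return (data - mean) @ axes.T


def knn_predict(train_x, train_y, test_x, k=5):
    """Classify each test row by majority vote of its k nearest neighbours."""
    dists = np.linalg.norm(test_x[:, None, :] - train_x[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    votes = train_y[nearest]
    return (votes.mean(axis=1) > 0.5).astype(int)  # binary labels 0/1, odd k


def accuracy(y_true, y_pred):
    return float((y_true == y_pred).mean())


# Synthetic two-class data (an assumption; not the paper's datasets).
rng = np.random.default_rng(0)
n, d = 200, 12
y = rng.integers(0, 2, size=n)
x = rng.normal(size=(n, d))
x[:, :3] += 2.0 * y[:, None]  # only the first three features carry signal

split = 150
train_x, test_x = standardize(x[:split], x[split:])
train_y, test_y = y[:split], y[split:]

# Baseline: kNN on all standardized features.
acc_full = accuracy(test_y, knn_predict(train_x, train_y, test_x))

# Same classifier on the first four principal components.
mean, axes = pca_fit(train_x, n_components=4)
train_p = pca_transform(train_x, mean, axes)
test_p = pca_transform(test_x, mean, axes)
acc_pca = accuracy(test_y, knn_predict(train_p, train_y, test_p))

print(f"kNN accuracy without PCA: {acc_full:.2f}")
print(f"kNN accuracy with PCA (4 components): {acc_pca:.2f}")
```

The PCA axes are fitted on the training split only and then applied to the test split, so no information from the test data leaks into the projection.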

A Detailed Analysis on Kidney and Heart Disease Prediction using Machine Learning

Journal of Computing and Natural Science ◽

10.53759/181x/jcns202101003 ◽

2021 ◽

pp. 9-14

Author(s):

Claire Salkar

Keyword(s):

Machine Learning ◽

Chronic Kidney Disease ◽

Logistic Regression ◽

Heart Disease ◽

Kidney Disease ◽

Principal Component ◽

Support Vector ◽

Computationally Efficient ◽

Optimal Method ◽

The Difference

Detecting disease at an early stage is a major challenge. Datasets for different diseases are available online, each with a different number of features. Many dimensionality reduction and feature extraction techniques are now used to reduce the number of features in a dataset and to find the most relevant ones. This paper explores how the performance of different machine learning models changes when Principal Component Analysis (PCA) is applied for dimensionality reduction on datasets of chronic kidney disease and cardiovascular disease. The authors apply Logistic Regression, K Nearest Neighbour (KNN), Naïve Bayes, Support Vector Machine and Random Forest models to the datasets and compare the performance of each model with and without PCA. A key challenge in the field of data mining and machine learning is building accurate and computationally efficient classifiers for medical applications. With an accuracy of 100% for chronic kidney disease and 85% for heart disease, the KNN classifier and logistic regression were found to be the best-performing prediction methods for kidney and heart disease, respectively.

A Novel Integrated Principal Component Analysis and Support vector Machines based diagnostic system for detection of Chronic Kidney disease

International Journal of Data Analysis Techniques and Strategies ◽

10.1504/ijdats.2020.10018953 ◽

2020 ◽

Vol 12 (2) ◽

pp. 1

Author(s):

Babita Pandey ◽

Aditya Khamparia

Keyword(s):

Chronic Kidney Disease ◽

Principal Component Analysis ◽

Support Vector Machines ◽

Kidney Disease ◽

Diagnostic System ◽

Principal Component ◽

Component Analysis ◽

Support Vector ◽

Vector Machines

A novel integrated principal component analysis and support vector machines-based diagnostic system for detection of chronic kidney disease

International Journal of Data Analysis Techniques and Strategies ◽

10.1504/ijdats.2020.106641 ◽

2020 ◽

Vol 12 (2) ◽

pp. 99

Author(s):

Aditya Khamparia ◽

Babita Pandey

Keyword(s):

Chronic Kidney Disease ◽

Principal Component Analysis ◽

Support Vector Machines ◽

Kidney Disease ◽

Diagnostic System ◽

Principal Component ◽

Component Analysis ◽

Support Vector ◽

Vector Machines

Comparative Analysis of Machine Learning Techniques with Principal Component Analysis on Kidney and Heart Disease

10.1109/icesc51422.2021.9533011 ◽

2021 ◽

Author(s):

Reena Chandra ◽

Manoj Kapil ◽

Avinash Sharma

Keyword(s):

Machine Learning ◽

Principal Component Analysis ◽

Heart Disease ◽

Comparative Analysis ◽

Principal Component ◽

Component Analysis ◽

Machine Learning Techniques ◽

Learning Techniques

Physical-oriented and machine learning-based emission modeling in a diesel compression ignition engine: Dimensionality reduction and regression

International Journal of Engine Research ◽

10.1177/14680874211070736 ◽

2022 ◽

pp. 146808742110707

Author(s):

Aran Mohammad ◽

Reza Rezaei ◽

Christopher Hayduk ◽

Thaddaeus Delebinski ◽

Saeid Shahpouri ◽

...

Keyword(s):

Principal Component Analysis ◽

Support Vector Machine ◽

Factor Analysis ◽

Dimensionality Reduction ◽

Principal Component ◽

Component Analysis ◽

Data Driven ◽

Support Vector ◽

Emission Models ◽

Emission Modeling

The development of internal combustion engines is affected by exhaust gas emissions legislation and the drive to increase performance. This calls for engine-out emission models that can be used for engine optimization for real driving emission controls. The prediction capability of physically and data-driven engine-out emission models is influenced by the system inputs, which are specified by the user, and accuracy can improve as the number of inputs increases. At the same time, irrelevant inputs become more probable; these have a low functional relation to the emissions and can lead to overfitting. Alternatively, data-driven methods can be used to detect irrelevant and redundant inputs. In this work, thermodynamic states are modeled based on 772 stationary test bench measurements from a commercial vehicle diesel engine. Afterward, 37 measured and modeled variables are fed into a data-driven dimensionality reduction. For this purpose, supervised learning approaches, such as lasso regression and linear support vector machine, and unsupervised learning methods, such as principal component analysis and factor analysis, are applied to select and extract the relevant features. The selected and extracted features are used for regression by the support vector machine and the feedforward neural network to model the NOx, CO, HC, and soot emissions. This enables an evaluation of the modeling accuracy as a result of the dimensionality reduction. Using the methods in this work, the 37 variables are reduced to 25, 22, 11, and 16 inputs for NOx, CO, HC, and soot emission modeling, respectively, while maintaining the accuracy. The features selected using the lasso algorithm provide more accurate learning of the regression models than the features extracted through principal component analysis and factor analysis. This results in test errors (RMSETe) of 19.22 ppm, 6.46 ppm, 1.29 ppm, and 0.06 FSN for modeling NOx, CO, HC, and soot emissions, respectively.
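
The test errors quoted in this abstract are root-mean-square errors. As a minimal sketch of how such a figure is computed, assuming the standard RMSE definition and using made-up values rather than the paper's measurements (the function name `rmse` and the sample lists are hypothetical):

```python
import math


def rmse(measured, predicted):
    """Root-mean-square error between measured and predicted values."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Illustrative NOx values in ppm (invented for this example).
nox_measured = [100.0, 250.0, 400.0, 550.0]
nox_predicted = [110.0, 240.0, 420.0, 530.0]

nox_rmse = rmse(nox_measured, nox_predicted)
print(f"NOx test RMSE: {nox_rmse:.2f} ppm")
```

Because the errors are squared before averaging, the result has the same unit as the target (ppm or FSN) and weights large deviations more heavily than a mean absolute error would.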

Dimensionality Reduction using PCA and K-Means Clustering for Breast Cancer Prediction

Lontar Komputer Jurnal Ilmiah Teknologi Informasi ◽

10.24843/lkjiti.2018.v09.i03.p08 ◽

2018 ◽

pp. 192 ◽

Cited By ~ 2

Author(s):

Ade Jamal ◽

Annisa Handayani ◽

Ali Akbar Septiandri ◽

Endang Ripmiatin ◽

Yunus Effendi

Keyword(s):

Breast Cancer ◽

Principal Component Analysis ◽

Dimensionality Reduction ◽

Principal Component ◽

Component Analysis ◽

Gradient Boosting ◽

Support Vector ◽

Breast Cancer Dataset ◽

Cancer Prediction ◽

Extreme Gradient Boosting

Breast cancer is one of the leading causes of death among women. Predicting breast cancer at an early stage increases the chance of a cure. This requires a prediction tool that can classify a breast tumor as either a harmful malignant tumor or a harmless benign tumor. In this paper, two machine learning algorithms, namely Support Vector Machine and Extreme Gradient Boosting, are compared for classification. Prior to classification, the number of data attributes is reduced by extracting features from the raw data using Principal Component Analysis. A clustering method, namely K-Means, is also used for dimensionality reduction alongside Principal Component Analysis. This paper presents a comparison of four models, formed by combining two dimensionality reduction methods with two classifiers, applied to the Wisconsin Breast Cancer Dataset. The comparison is measured using accuracy, sensitivity and specificity metrics evaluated from the confusion matrices. The experimental results indicate that the K-Means method, which is not usually used for dimensionality reduction, can perform well compared to the popular Principal Component Analysis.
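
The three metrics named in this abstract follow directly from the confusion-matrix counts. The sketch below assumes binary labels with 1 for malignant and 0 for benign, and that both classes occur in the true labels; the function names and the toy label lists are hypothetical and do not come from the paper.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity (recall on positives), specificity (recall on negatives)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Toy labels: 1 = malignant, 0 = benign (invented for this example).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting sensitivity and specificity alongside accuracy matters in this setting because a missed malignant tumor (false negative) and a false alarm (false positive) carry very different costs.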

Multivariate Analysis and Machine Learning for Ripeness Classification of Cape Gooseberry Fruits

Processes ◽

10.3390/pr7120928 ◽

2019 ◽

Vol 7 (12) ◽

pp. 928 ◽

Cited By ~ 2

Author(s):

Miguel De-la-Torre ◽

Omar Zatarain ◽

Himer Avila-George ◽

Mirna Muñoz ◽

Jimy Oblitas ◽

...

Keyword(s):

Machine Learning ◽

Principal Component Analysis ◽

Feature Selection ◽

Principal Component ◽

Component Analysis ◽

Support Vector ◽

Color Spaces ◽

Combination Methods ◽

Fruit Samples ◽

Cape Gooseberry

This paper explores five multivariate techniques for information fusion in sorting the visual ripeness of Cape gooseberry fruits: principal component analysis, linear discriminant analysis, independent component analysis, eigenvector centrality feature selection, and multi-cluster feature selection. These techniques are applied to the concatenated channels of the red, green, and blue (RGB); hue, saturation, value (HSV); and lightness, red/green value, and blue/yellow value (L*a*b) color spaces (9 features in total). Machine learning techniques have been reported for sorting the Cape gooseberry fruits’ ripeness. Classifiers such as neural networks, support vector machines, and nearest neighbors discriminate on fruit samples using different color spaces. Despite the color spaces being equivalent up to a transformation, a few classifiers achieve better performance due to differences in the pixel distribution of samples. Experimental results show that selection and combination of color channels allow classifiers to reach similar levels of accuracy; however, combination methods still require higher computational complexity. The highest level of accuracy was obtained using the seven-dimensional principal component analysis feature space.

Application of a combination between Principal Component Analysis and Logistic Regression Based on Support Vector Machine on Educational Data Mining with Overlapping Data Problem

IOP Conference Series Materials Science and Engineering ◽

10.1088/1757-899x/874/1/012018 ◽

2020 ◽

Vol 874 ◽

pp. 012018

Author(s):

Siti Mutrofin ◽

Maisarah Maisarah ◽

Slamet Widodo ◽

Raden Venantius Hari Ginardi ◽

Chastine Fatichah

Keyword(s):

Data Mining ◽

Principal Component Analysis ◽

Support Vector Machine ◽

Logistic Regression ◽

Educational Data Mining ◽

Principal Component ◽

Component Analysis ◽

Support Vector ◽

Overlapping Data ◽

Data Problem

Exploration of machine learning methods for the classification of infrared limb spectra of polar stratospheric clouds

Atmospheric Measurement Techniques ◽

10.5194/amt-13-3661-2020 ◽

2020 ◽

Vol 13 (7) ◽

pp. 3661-3682

Author(s):

Rocco Sedona ◽

Lars Hoffmann ◽

Reinhold Spang ◽

Gabriele Cavallaro ◽

Sabine Griessbach ◽

...

Keyword(s):

Machine Learning ◽

Principal Component Analysis ◽

Infrared Spectra ◽

Principal Component ◽

Component Analysis ◽

Support Vector ◽

Polar Stratospheric Clouds ◽

Hemisphere Winter ◽

Stratospheric Clouds ◽

Polar Ozone

Polar stratospheric clouds (PSCs) play a key role in polar ozone depletion in the stratosphere. Improved observations and continuous monitoring of PSCs can help to validate and improve chemistry–climate models that are used to predict the evolution of the polar ozone hole. In this paper, we explore the potential of applying machine learning (ML) methods to classify PSC observations of infrared limb sounders. Two datasets were considered in this study. The first dataset is a collection of infrared spectra captured in Northern Hemisphere winter 2006/2007 and Southern Hemisphere winter 2009 by the Michelson Interferometer for Passive Atmospheric Sounding (MIPAS) instrument on board the European Space Agency's (ESA) Envisat satellite. The second dataset is the cloud scenario database (CSDB) of simulated MIPAS spectra. We first performed an initial analysis to assess the basic characteristics of the CSDB and to decide which features to extract from it. Here, we focused on an approach using brightness temperature differences (BTDs). From both the measured and the simulated infrared spectra, more than 10 000 BTD features were generated. Next, we assessed the use of ML methods for the reduction of the dimensionality of this large feature space using principal component analysis (PCA) and kernel principal component analysis (KPCA) followed by a classification with the support vector machine (SVM). The random forest (RF) technique, which embeds the feature selection step, has also been used as a classifier. All methods were found to be suitable to retrieve information on the composition of PSCs. Of these, RF seems to be the most promising method, being less prone to overfitting and producing results that agree well with established results based on conventional classification methods.

Quantification of Tumor Micro-Environment Acidity in Glioblastoma Using Principal Component Analysis of Dynamic Susceptibility Contrast-Enhanced MR Imaging and Machine Learning

10.21203/rs.3.rs-431537/v1 ◽

2021 ◽

Author(s):

Hamed Akbari ◽

Anahita Kazerooni ◽

Jeffery B. Ware ◽

Elizabeth Mamourian ◽

Hannah Anderson ◽

...

Keyword(s):

Machine Learning ◽

Principal Component Analysis ◽

Principal Component ◽

Component Analysis ◽

Support Vector ◽

Dynamic Susceptibility ◽

Dynamic Susceptibility Contrast ◽

Tumor Ph ◽

Contrast Enhanced ◽

Susceptibility Contrast

Glioblastoma (GBM) has high metabolic demands, which can lead to acidification of the tumor microenvironment. We hypothesize that a machine learning model built on temporal principal component analysis (PCA) of dynamic susceptibility contrast-enhanced (DSC) perfusion MRI can be used to estimate tumor acidity in GBM, as estimated by pH-sensitive amine chemical exchange saturation transfer echo-planar imaging (CEST-EPI). We analyzed 78 MRI scans in 32 treatment-naïve and post-treatment GBM patients. All patients were imaged with DSC-MRI and pH-weighted imaging, with pH-weighting quantified from the CEST-EPI estimate of the magnetization transfer ratio asymmetry (MTRasym) at 3 ppm. Enhancing tumor (ET), non-enhancing core (NC), and peritumoral T2 hyperintensity (namely, edema, ED) were used to extract principal components (PCs) and to build support vector machine regression (SVR) models to predict MTRasym values using PCs. Our predicted map correlated with MTRasym values with Spearman’s r equal to 0.66, 0.47, 0.67, and 0.71 in NC, ET, ED, and overall, respectively (p<0.006). The results of this study demonstrate that PCA of DSC imaging data can provide information about tumor pH in GBM patients, with the strongest association within the peritumoral regions.
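
The agreement figures in this abstract are Spearman rank correlations. A minimal sketch of that statistic, computed as the Pearson correlation of average ranks, is given below. The function names and the sample values are hypothetical and are not taken from the study.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(a), average_ranks(b))


# Invented values standing in for measured and predicted MTRasym.
measured = [0.8, 1.5, 2.1, 2.9, 3.4]
predicted = [1.0, 1.2, 2.6, 2.4, 3.9]

rho = spearman(measured, predicted)
print(f"Spearman correlation: {rho:.2f}")
```

Because it depends only on ranks, this measure captures monotonic agreement between predicted and measured values without assuming a linear relationship.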
