Multivariate Analysis and Machine Learning for Ripeness Classification of Cape Gooseberry Fruits

This paper explores five multivariate techniques for information fusion in sorting Cape gooseberry fruits by visual ripeness: principal component analysis, linear discriminant analysis, independent component analysis, eigenvector centrality feature selection, and multi-cluster feature selection. These techniques are applied to the concatenated channels of the red, green, blue (RGB); hue, saturation, value (HSV); and lightness, red/green value, blue/yellow value (L*a*b*) color spaces (nine features in total). Machine learning techniques have previously been reported for sorting Cape gooseberry fruits by ripeness: classifiers such as neural networks, support vector machines, and nearest neighbors discriminate between fruit samples using different color spaces. Although the color spaces are equivalent up to a transformation, a few classifiers perform better because of differences in the pixel distribution of the samples. Experimental results show that both selection and combination of color channels allow classifiers to reach similar levels of accuracy; however, combination methods have higher computational complexity. The highest accuracy was obtained using the seven-dimensional principal component analysis feature space.
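The nine-channel feature vector described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, and the L*a*b* conversion assumes sRGB input with a D65 reference white, which the abstract does not specify.

```python
import colorsys

# Assumed reference white (D65); the abstract does not state which one was used.
WHITE_D65 = (0.95047, 1.0, 1.08883)


def _srgb_to_linear(c):
    """Undo the sRGB gamma curve for one channel in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def _lab_f(t):
    """Piecewise cube-root function from the CIE L*a*b* definition."""
    delta = 6 / 29
    return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29


def rgb_to_lab(r, g, b):
    """Convert sRGB values in [0, 1] to CIE L*a*b* (D65 white point)."""
    rl, gl, bl = (_srgb_to_linear(c) for c in (r, g, b))
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl
    fx, fy, fz = (_lab_f(v / w) for v, w in zip((x, y, z), WHITE_D65))
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


def nine_channel_features(r, g, b):
    """Concatenate RGB, HSV, and L*a*b* channels for one pixel (hypothetical helper)."""
    hsv = colorsys.rgb_to_hsv(r, g, b)
    lab = rgb_to_lab(r, g, b)
    return [r, g, b, *hsv, *lab]


white = nine_channel_features(1.0, 1.0, 1.0)
black = nine_channel_features(0.0, 0.0, 0.0)
```

Because the HSV and L*a*b* channels are deterministic functions of RGB, the nine features are strongly redundant, which is the situation in which selection or projection methods are typically applied.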

Multiclass classification of leukemia cancer data using Fuzzy Support Vector Machine (FSVM) with feature selection using Principal Component Analysis (PCA)

Journal of Physics Conference Series ◽

10.1088/1742-6596/1725/1/012012 ◽

2021 ◽

Vol 1725 ◽

pp. 012012

Author(s):

I R Fauzi ◽

Z Rustam ◽

A Wibowo

Keyword(s):

Principal Component Analysis ◽

Support Vector Machine ◽

Feature Selection ◽

Principal Component ◽

Component Analysis ◽

Multiclass Classification ◽

Support Vector ◽

Fuzzy Support Vector Machine ◽

Cancer Data

Towards a software defect proneness model: feature selection

Applied Aspects of Information Technology ◽

10.15276/aait.04.2021.5 ◽

2021 ◽

Vol 4 (4) ◽

pp. 354-365

Author(s):

Vitaliy S. Yakovyna ◽

Ivan I. Symets

Keyword(s):

Principal Component Analysis ◽

Feature Selection ◽

Random Forest ◽

Software Reliability ◽

Principal Component ◽

Component Analysis ◽

Support Vector ◽

Tree Classifier ◽

Code Metrics ◽

Software Code

This article focuses on improving static models of software reliability by using machine learning methods to select the software code metrics that most strongly affect reliability. The study used a merged dataset from the PROMISE Software Engineering repository containing test data for the software modules of five programs and twenty-one code metrics. For the prepared sample, the features that most affect the quality of software code were selected using the following feature selection methods: Boruta, stepwise selection, Exhaustive Feature Selection, Random Forest importance, LightGBM importance, genetic algorithms, Principal Component Analysis, and the Xverse Python package. Based on a vote over the results of these feature selection methods, a static (deterministic) model of software reliability was built that relates the probability of a defect in a software module to the metrics of its code. This model includes such code metrics as the branch count of a program, McCabe’s lines of code and cyclomatic complexity, and Halstead’s total number of operators and operands, intelligence, volume, and effort. The effectiveness of the different feature selection methods was compared, in particular by studying the effect of the feature selection method on classification accuracy for the following classifiers: Random Forest, Support Vector Machine, k-Nearest Neighbors, Decision Tree, AdaBoost, and Gradient Boosting. Using any of the feature selection methods increased classification accuracy by at least ten percent compared with the original dataset, which confirms the importance of this procedure for predicting software defects from metric datasets that contain a significant number of highly correlated software code metrics. The best prediction accuracy for most classifiers was reached using the set of features obtained from the proposed static model of software reliability. In addition, individual methods such as Autoencoder, Exhaustive Feature Selection, and Principal Component Analysis can also be used with an insignificant loss of classification and prediction accuracy.
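The voting step over feature selection methods can be sketched as a simple count of how many methods picked each metric. This is only an illustration under stated assumptions: the abstract does not give the voting rule or threshold, and the per-method selections below are made-up placeholders, not results from the paper.

```python
from collections import Counter


def vote_on_features(selections, min_votes):
    """Keep features chosen by at least `min_votes` selection methods.

    `selections` maps a method name to the list of features it selected.
    The threshold rule is an assumption; the abstract does not specify one.
    """
    votes = Counter(f for chosen in selections.values() for f in set(chosen))
    return sorted(f for f, n in votes.items() if n >= min_votes)


# Hypothetical selections, for illustration only.
selections = {
    "boruta": ["branch_count", "cyclomatic_complexity", "halstead_volume"],
    "random_forest_importance": ["branch_count", "halstead_volume", "loc_comments"],
    "stepwise": ["branch_count", "cyclomatic_complexity"],
}

selected = vote_on_features(selections, min_votes=2)
```

With these placeholder inputs, a metric survives only if at least two of the three methods agree on it.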

Exploration of machine learning methods for the classification of infrared limb spectra of polar stratospheric clouds

Atmospheric Measurement Techniques ◽

10.5194/amt-13-3661-2020 ◽

2020 ◽

Vol 13 (7) ◽

pp. 3661-3682

Author(s):

Rocco Sedona ◽

Lars Hoffmann ◽

Reinhold Spang ◽

Gabriele Cavallaro ◽

Sabine Griessbach ◽

...

Keyword(s):

Machine Learning ◽

Principal Component Analysis ◽

Infrared Spectra ◽

Principal Component ◽

Component Analysis ◽

Support Vector ◽

Polar Stratospheric Clouds ◽

Hemisphere Winter ◽

Stratospheric Clouds ◽

Polar Ozone

Abstract. Polar stratospheric clouds (PSCs) play a key role in polar ozone depletion in the stratosphere. Improved observations and continuous monitoring of PSCs can help to validate and improve chemistry–climate models that are used to predict the evolution of the polar ozone hole. In this paper, we explore the potential of applying machine learning (ML) methods to classify PSC observations of infrared limb sounders. Two datasets were considered in this study. The first dataset is a collection of infrared spectra captured in Northern Hemisphere winter 2006/2007 and Southern Hemisphere winter 2009 by the Michelson Interferometer for Passive Atmospheric Sounding (MIPAS) instrument on board the European Space Agency's (ESA) Envisat satellite. The second dataset is the cloud scenario database (CSDB) of simulated MIPAS spectra. We first performed an initial analysis to assess the basic characteristics of the CSDB and to decide which features to extract from it. Here, we focused on an approach using brightness temperature differences (BTDs). From both the measured and the simulated infrared spectra, more than 10 000 BTD features were generated. Next, we assessed the use of ML methods for the reduction of the dimensionality of this large feature space using principal component analysis (PCA) and kernel principal component analysis (KPCA) followed by a classification with the support vector machine (SVM). The random forest (RF) technique, which embeds the feature selection step, has also been used as a classifier. All methods were found to be suitable to retrieve information on the composition of PSCs. Of these, RF seems to be the most promising method, being less prone to overfitting and producing results that agree well with established results based on conventional classification methods.
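The dimensionality reduction step can be sketched with a plain PCA computed from a singular value decomposition. This is a minimal sketch assuming NumPy is available; the synthetic matrix below merely stands in for a high-dimensional feature table and has nothing to do with the MIPAS or CSDB data, and `pca_project` is a hypothetical helper name.

```python
import numpy as np


def pca_project(X, n_components):
    """Project rows of X onto the leading principal components.

    Returns the projected data, the component vectors (one per row),
    and the fraction of total variance each kept component explains.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each feature
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]               # orthonormal directions
    explained = (S ** 2) / np.sum(S ** 2)        # variance ratios
    return Xc @ components.T, components, explained[:n_components]


# Synthetic stand-in: 50 correlated features driven by 3 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 50))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 50))

Z, components, ratio = pca_project(X, n_components=3)
```

The reduced matrix `Z` would then be passed to a classifier such as an SVM; that step is omitted here to keep the sketch free of further dependencies.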

Quantification of Tumor Micro-Environment Acidity in Glioblastoma Using Principal Component Analysis of Dynamic Susceptibility Contrast-Enhanced MR Imaging and Machine Learning

10.21203/rs.3.rs-431537/v1 ◽

2021 ◽

Author(s):

Hamed Akbari ◽

Anahita Kazerooni ◽

Jeffery B. Ware ◽

Elizabeth Mamourian ◽

Hannah Anderson ◽

...

Keyword(s):

Machine Learning ◽

Principal Component Analysis ◽

Principal Component ◽

Component Analysis ◽

Support Vector ◽

Dynamic Susceptibility ◽

Dynamic Susceptibility Contrast ◽

Tumor pH ◽

Contrast Enhanced ◽

Susceptibility Contrast

Abstract. Glioblastoma (GBM) has high metabolic demands, which can lead to acidification of the tumor microenvironment. We hypothesize that a machine learning model built on temporal principal component analysis (PCA) of dynamic susceptibility contrast-enhanced (DSC) perfusion MRI can be used to estimate tumor acidity in GBM, as estimated by pH-sensitive amine chemical exchange saturation transfer echo-planar imaging (CEST-EPI). We analyzed 78 MRI scans from 32 treatment-naïve and post-treatment GBM patients. All patients were imaged with DSC-MRI, and pH weighting was quantified from the CEST-EPI estimate of the magnetization transfer ratio asymmetry (MTRasym) at 3 ppm. Enhancing tumor (ET), non-enhancing core (NC), and peritumoral T2 hyperintensity (edema, ED) regions were used to extract principal components (PCs) and to build support vector regression (SVR) models that predict MTRasym values from the PCs. Our predicted map correlated with MTRasym values with Spearman’s r equal to 0.66, 0.47, 0.67, and 0.71 in NC, ET, ED, and overall, respectively (p<0.006). The results of this study demonstrate that PCA of DSC imaging data can provide information about tumor pH in GBM patients, with the strongest association within the peritumoral regions.

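Combination of Fisher Score Feature Selection and Principal Component Analysis (PCA) for Cervix Dysplasia Classification (original title: Kombinasi Feature Selection Fisher Score dan Principal Component Analysis (PCA) untuk Klasifikasi Cervix Dysplasia)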

Jurnal Teknologi Informasi dan Ilmu Komputer ◽

10.25126/jtiik.2020702987 ◽

2020 ◽

Vol 7 (3) ◽

pp. 565

Author(s):

Krisan Aprian Widagdo ◽

Kusworo Adi ◽

Rahmat Gernowo

Keyword(s):

Machine Learning ◽

Principal Component Analysis ◽

Feature Selection ◽

Cross Validation ◽

Pap Smear ◽

Principal Component ◽

Component Analysis ◽

Fisher Score ◽

Fold Cross Validation ◽

Cervix Dysplasia

Examining Pap smear images is an important step in the early diagnosis of cervix dysplasia, but it requires substantial resources. Machine learning can address this problem; however, its accuracy depends on the features used. Only relevant and discriminative features yield accurate classification results. This work combines Fisher Score (FScore) feature selection with Principal Component Analysis (PCA). First, FScore selects relevant features based on their ranking scores. PCA then transforms the candidate features into a new, uncorrelated dataset. A backpropagation artificial neural network is used to evaluate the performance of the FScore and PCA combination, and the model is evaluated with 5-fold cross-validation. The combination is also compared with a model using the original features and a model using the FScore-selected features. Experimental results show that the combination of FScore and PCA produced the best performance (accuracy 0.964±0.006, sensitivity 0.990±0.005, and specificity 0.889±0.009). In terms of computation time, the combination was relatively fast. This work shows that applying feature selection and feature extraction can improve classification performance within a relatively short time.
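The ranking step can be illustrated with one common definition of the Fisher score: the between-class scatter of a feature divided by its within-class scatter. The abstract does not give the exact formula the authors used, so the definition, the function names, and the toy data below are assumptions.

```python
def fisher_scores(X, y):
    """Fisher score per feature: between-class over within-class scatter.

    X is a list of rows (samples), y the matching class labels.
    Uses population variance within each class (an assumed convention).
    """
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        overall_mean = sum(column) / len(column)
        between = within = 0.0
        for c in classes:
            values = [v for v, label in zip(column, y) if label == c]
            mean_c = sum(values) / len(values)
            var_c = sum((v - mean_c) ** 2 for v in values) / len(values)
            between += len(values) * (mean_c - overall_mean) ** 2
            within += len(values) * var_c
        # A feature with zero within-class scatter separates perfectly.
        scores.append(between / within if within > 0 else float("inf"))
    return scores


def rank_features(scores):
    """Feature indices ordered from most to least discriminative."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


# Toy data: feature 0 separates the classes, feature 1 does not.
X = [[0.0, 5.0], [0.1, 1.0], [1.0, 5.1], [1.1, 0.9]]
y = [0, 0, 1, 1]

scores = fisher_scores(X, y)
ranking = rank_features(scores)
```

In the pipeline the abstract describes, the top-ranked features would then be passed to PCA to obtain uncorrelated inputs for the neural network.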

Detection of Knee Joint Disorders using SVM Classifier

International Journal of Scientific Research in Science and Technology ◽

10.32628/ijsrst218535 ◽

2021 ◽

pp. 261-271

Author(s):

Alphonsa Salu S. J. ◽

Jeraldin Auxillia D

Keyword(s):

Principal Component Analysis ◽

Feature Selection ◽

Knee Joint ◽

Principal Component ◽

Approximate Entropy ◽

Component Analysis ◽

Invasive Technique ◽

Support Vector ◽

SVM Classifier ◽

Entropy Measures

A non-invasive technique using knee joint vibroarthrographic (VAG) signals can be used for the early diagnosis of knee joint disorders. Among the algorithms devised for detecting knee joint disorders from VAG signals, those based on entropy measures can provide better performance. In this work, the VAG signal is preprocessed by wavelet decomposition into sub-band signals. Features of the decomposed sub-bands, such as approximate entropy, sample entropy, and wavelet energy, are extracted as quantified measures of the complexity of the signal. Feature selection based on Principal Component Analysis (PCA) is then performed to select the significant features. The selected features are used to classify VAG signals as normal or abnormal using a support vector machine. The classifier provides better accuracy when feature selection using principal component analysis is applied. The results show that the classifier was able to classify the signal with an accuracy of 82.6%, an error rate of 0.174, a sensitivity of 1.0, and a specificity of 0.888.
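One of the entropy features named above, approximate entropy, can be sketched directly from its standard definition. This is an illustrative sketch, not the authors' code: the embedding dimension `m=2` and the absolute tolerance `r=0.2` are assumed defaults (in practice `r` is often scaled by the signal's standard deviation), and the test signals are synthetic.

```python
import math


def approximate_entropy(signal, m=2, r=0.2):
    """Approximate entropy ApEn(m, r) of a 1-D sequence.

    Counts, for each window of length m (and m + 1), the fraction of
    windows within Chebyshev distance r, self-matches included.
    """
    n = len(signal)

    def phi(dim):
        windows = [signal[i:i + dim] for i in range(n - dim + 1)]
        total = 0.0
        for w in windows:
            matches = sum(
                1 for v in windows
                if max(abs(a - b) for a, b in zip(w, v)) <= r
            )
            total += math.log(matches / len(windows))
        return total / len(windows)

    return phi(m) - phi(m + 1)


constant = [1.0] * 20          # perfectly regular
alternating = [0.0, 1.0] * 10  # regular, two repeating patterns

apen_constant = approximate_entropy(constant)
apen_alternating = approximate_entropy(alternating)
```

Regular signals such as these give values at or near zero, while irregular signals give larger values, which is what makes the measure usable as a complexity feature.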

Exploration of machine learning methods for the classification of infrared limb spectra of polar stratospheric clouds

10.5194/amt-2019-481 ◽

2020 ◽

Author(s):

Rocco Sedona ◽

Lars Hoffmann ◽

Reinhold Spang ◽

Gabriele Cavallaro ◽

Sabine Griessbach ◽

...

Keyword(s):

Machine Learning ◽

Principal Component Analysis ◽

Infrared Spectra ◽

Principal Component ◽

Component Analysis ◽

Support Vector ◽

Polar Stratospheric Clouds ◽

Hemisphere Winter ◽

Stratospheric Clouds ◽

Polar Ozone

Abstract. Polar stratospheric clouds (PSCs) play a key role in polar ozone depletion in the stratosphere. Improved observations and continuous monitoring of PSCs can help to validate and enhance chemistry-climate models that are used to predict the evolution of the polar ozone hole. In this paper, we explore the potential of applying machine learning (ML) methods to classify PSC observations of infrared limb sounders. Two datasets have been considered in this study. The first dataset is a collection of infrared spectra captured in Northern Hemisphere winter 2006/2007 and Southern Hemisphere winter 2009 by the Michelson Interferometer for Passive Atmospheric Sounding (MIPAS) instrument onboard ESA's Envisat satellite. The second dataset is the cloud scenario database (CSDB) of simulated MIPAS spectra. We first performed an initial analysis to assess the basic characteristics of these datasets and to decide which features to extract from them. Here, we focused on an approach using brightness temperature differences (BTDs). From both the measured and the simulated infrared spectra, more than 10,000 BTD features have been generated. Next, we assessed the use of ML methods for the reduction of the dimensionality of this large feature space using principal component analysis (PCA) and kernel principal component analysis (KPCA) as well as the classification with the random forest (RF) and support vector machine (SVM) techniques. All methods were found to be suitable to retrieve information on the composition of PSCs. Of these, RF seems to be the most promising method, being less prone to overfitting and producing results that agree well with established results based on conventional classification methods.

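Data Mining Techniques for Classifying Tourist Destination Review Data Using Principal Component Analysis (PCA) Data Reduction (original title: TEKNIK DATA MINING UNTUK MENGKLASIFIKASIKAN DATA ULASAN DESTINASI WISATA MENGGUNAKAN REDUKSI DATA PRINCIPAL COMPONENT ANALYSIS (PCA))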

Edutic - Scientific Journal of Informatics Education ◽

10.21107/edutic.v7i2.9247 ◽

2021 ◽

Vol 7 (2) ◽

Author(s):

Alven Safik Ritonga ◽

Isnaini Muhandhis

Keyword(s):

Machine Learning ◽

Data Mining ◽

Principal Component Analysis ◽

Support Vector Machine ◽

Decision Trees ◽

Principal Component ◽

Component Analysis ◽

Support Vector

An increase in tourist visits to a destination is influenced by tourists' satisfaction during their visit. To determine whether a tourist destination meets tourists' expectations, their satisfaction needs to be evaluated. The aim of this study is to obtain a classification model with high accuracy in classifying destination satisfaction reviews and to produce a decision-support tool for developing tourist destinations. The data used in this study have a fairly high dimensionality, which lengthens the computation time for classification and makes the analysis impractical or infeasible; dimensionality reduction is therefore applied to obtain a much smaller data dimension while preserving the integrity of the original data. The method used to classify the satisfaction reviews combines Principal Component Analysis (PCA) as the dimensionality reduction method with three data mining methods: Support Vector Machine (SVM), artificial neural network (ANN), and decision trees. The study uses secondary data taken from the UCI Machine Learning Repository. The results of combining PCA with the three methods show that classification accuracy improves for some of the methods. Of the three methods, SVM-PCA has the best accuracy at 91.50%, followed by ANN-PCA at 89.46% and Decision-PCA at 88.78%.

Comparative Analysis of Machine Learning Techniques with Principal Component Analysis on Kidney and Heart Disease

Turkish Journal of Computer and Mathematics Education (TURCOMAT) ◽

10.17762/turcomat.v12i2.1433 ◽

2021 ◽

Vol 12 (2) ◽

pp. 1564-1572

Author(s):

Reena Chandra et al.

Keyword(s):

Machine Learning ◽

Chronic Kidney Disease ◽

Principal Component Analysis ◽

Logistic Regression ◽

Heart Disease ◽

Kidney Disease ◽

Dimensionality Reduction ◽

Principal Component ◽

Component Analysis ◽

Support Vector

Detecting disease at an early stage is especially challenging. Datasets for different diseases are available online, each with a different number of features describing a particular disease. Many dimensionality reduction and feature extraction techniques are now used to reduce the number of features in a dataset and to find the most appropriate ones. This paper explores the difference in performance of several machine learning models when Principal Component Analysis (PCA) is used for dimensionality reduction on datasets for chronic kidney disease and cardiovascular disease. The authors apply Logistic Regression, K Nearest Neighbour (KNN), Naïve Bayes, Support Vector Machine, and Random Forest models to the datasets and compare the performance of each model with and without PCA. A key challenge in data mining and machine learning is building accurate and computationally efficient classifiers for medical applications. With an accuracy of 100% for chronic kidney disease and 85% for heart disease, the KNN classifier and logistic regression were found to be the best prediction methods for kidney and heart disease, respectively.

Using machine learning method to classify polar stratospheric cloud types from Envisat MIPAS observations

10.5194/egusphere-egu2020-8103 ◽

2020 ◽

Author(s):

Rocco Sedona ◽

Lars Hoffmann ◽

Reinhold Spang ◽

Gabriele Cavallaro ◽

Sabine Griessbach ◽

...

Keyword(s):

Machine Learning ◽

Principal Component Analysis ◽

Infrared Spectra ◽

Michelson Interferometer ◽

Principal Component ◽

Component Analysis ◽

Kernel Principal Component Analysis ◽

Support Vector ◽

Hemisphere Winter ◽

Polar Ozone

Polar stratospheric clouds (PSCs) play a key role in polar ozone depletion in the stratosphere. Improved observations and continuous monitoring of PSCs can help to validate and enhance chemistry-climate models that are used to predict the evolution of the polar ozone hole. Here we present the results of our study in which we explored the potential of applying machine learning (ML) methods to classify PSC observations of infrared limb sounders. Two datasets have been considered. The first dataset is a collection of infrared spectra captured in Northern Hemisphere winter 2006/2007 and Southern Hemisphere winter 2009 by the Michelson Interferometer for Passive Atmospheric Sounding (MIPAS) instrument onboard ESA's Envisat satellite. The second dataset is the cloud scenario database (CSDB) of simulated MIPAS spectra. We first performed an initial analysis to assess the basic characteristics of these datasets and to decide which features to extract from them. More than 10,000 brightness temperature difference (BTD) features have been generated and fed as input to the ML methods instead of directly using the infrared spectra. Next, we assessed the use of ML methods for the reduction of the dimensionality of this large feature space using principal component analysis (PCA) and kernel principal component analysis (KPCA) as well as the classification with the random forest (RF) and support vector machine (SVM) techniques. All methods were found to be suitable to retrieve information on the composition of PSCs. Of these, RF seems to be the most promising method, being less prone to overfitting and producing results that agree well with established results based on conventional classification methods.
