Exploration of machine learning methods for the classification of infrared limb spectra of polar stratospheric clouds

Abstract. Polar stratospheric clouds (PSCs) play a key role in polar ozone depletion in the stratosphere. Improved observations and continuous monitoring of PSCs can help to validate and enhance chemistry-climate models that are used to predict the evolution of the polar ozone hole. In this paper, we explore the potential of applying machine learning (ML) methods to classify PSC observations of infrared limb sounders. Two datasets have been considered in this study. The first dataset is a collection of infrared spectra captured in Northern Hemisphere winter 2006/2007 and Southern Hemisphere winter 2009 by the Michelson Interferometer for Passive Atmospheric Sounding (MIPAS) instrument onboard ESA's Envisat satellite. The second dataset is the cloud scenario database (CSDB) of simulated MIPAS spectra. We first performed an initial analysis to assess the basic characteristics of these datasets and to decide which features to extract from them. Here, we focused on an approach using brightness temperature differences (BTDs). From both the measured and the simulated infrared spectra, more than 10,000 BTD features have been generated. Next, we assessed the use of ML methods for the reduction of the dimensionality of this large feature space using principal component analysis (PCA) and kernel principal component analysis (KPCA) as well as the classification with the random forest (RF) and support vector machine (SVM) techniques. All methods were found to be suitable to retrieve information on the composition of PSCs. Of these, RF seems to be the most promising method, being less prone to overfitting and producing results that agree well with established results based on conventional classification methods.


Exploration of machine learning methods for the classification of infrared limb spectra of polar stratospheric clouds

Atmospheric Measurement Techniques ◽

10.5194/amt-13-3661-2020 ◽

2020 ◽

Vol 13 (7) ◽

pp. 3661-3682

Author(s):

Rocco Sedona ◽

Lars Hoffmann ◽

Reinhold Spang ◽

Gabriele Cavallaro ◽

Sabine Griessbach ◽

...

Keyword(s):

Machine Learning ◽

Principal Component Analysis ◽

Infrared Spectra ◽

Principal Component ◽

Component Analysis ◽

Support Vector ◽

Polar Stratospheric Clouds ◽

Hemisphere Winter ◽

Stratospheric Clouds ◽

Polar Ozone

Abstract. Polar stratospheric clouds (PSCs) play a key role in polar ozone depletion in the stratosphere. Improved observations and continuous monitoring of PSCs can help to validate and improve chemistry–climate models that are used to predict the evolution of the polar ozone hole. In this paper, we explore the potential of applying machine learning (ML) methods to classify PSC observations of infrared limb sounders. Two datasets were considered in this study. The first dataset is a collection of infrared spectra captured in Northern Hemisphere winter 2006/2007 and Southern Hemisphere winter 2009 by the Michelson Interferometer for Passive Atmospheric Sounding (MIPAS) instrument on board the European Space Agency's (ESA) Envisat satellite. The second dataset is the cloud scenario database (CSDB) of simulated MIPAS spectra. We first performed an initial analysis to assess the basic characteristics of the CSDB and to decide which features to extract from it. Here, we focused on an approach using brightness temperature differences (BTDs). From both the measured and the simulated infrared spectra, more than 10 000 BTD features were generated. Next, we assessed the use of ML methods for the reduction of the dimensionality of this large feature space using principal component analysis (PCA) and kernel principal component analysis (KPCA) followed by a classification with the support vector machine (SVM). The random forest (RF) technique, which embeds the feature selection step, has also been used as a classifier. All methods were found to be suitable to retrieve information on the composition of PSCs. Of these, RF seems to be the most promising method, being less prone to overfitting and producing results that agree well with established results based on conventional classification methods.
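The feature-generation and dimensionality-reduction steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes that BTD features are formed as all pairwise differences between brightness temperatures in a set of spectral windows, it uses synthetic random values in place of MIPAS or CSDB spectra, and the helper names `btd_features` and `pca_reduce`, the 12 windows, and the 3 retained components are all hypothetical choices made for the example.

```python
import numpy as np


def btd_features(bt):
    """Return all pairwise brightness temperature differences (BTDs).

    bt has shape (n_spectra, n_windows); the result has shape
    (n_spectra, n_windows * (n_windows - 1) // 2).
    """
    # Index pairs (i, j) with i < j, so each window pair appears once.
    i, j = np.triu_indices(bt.shape[1], k=1)
    return bt[:, i] - bt[:, j]


def pca_reduce(x, n_components):
    """Project the rows of x onto the leading principal components."""
    centered = x - x.mean(axis=0)
    # SVD of the centered data: rows of vt are the principal axes.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return centered @ vt[:n_components].T, explained[:n_components]


# Synthetic stand-in data (assumption): 50 spectra, 12 windows, in kelvin.
rng = np.random.default_rng(0)
bt = 200.0 + 20.0 * rng.random((50, 12))

features = btd_features(bt)  # 12 windows give 66 BTD features
scores, explained = pca_reduce(features, n_components=3)
print(features.shape, scores.shape)
```

The number of pairwise features grows quadratically with the number of windows, which is why a reduction step such as PCA (or a classifier with built-in feature selection such as RF) becomes attractive for feature spaces of the size quoted in the abstract.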


Using machine learning method to classify polar stratospheric cloud types from Envisat MIPAS observations

10.5194/egusphere-egu2020-8103 ◽

2020 ◽

Author(s):

Rocco Sedona ◽

Lars Hoffmann ◽

Reinhold Spang ◽

Gabriele Cavallaro ◽

Sabine Griessbach ◽

...

Keyword(s):

Machine Learning ◽

Principal Component Analysis ◽

Infrared Spectra ◽

Michelson Interferometer ◽

Principal Component ◽

Component Analysis ◽

Kernel Principal Component Analysis ◽

Support Vector ◽

Hemisphere Winter ◽

Polar Ozone

Polar stratospheric clouds (PSCs) play a key role in polar ozone depletion in the stratosphere. Improved observations and continuous monitoring of PSCs can help to validate and enhance chemistry-climate models that are used to predict the evolution of the polar ozone hole. Here we present the results of our study in which we explored the potential of applying machine learning (ML) methods to classify PSC observations of infrared limb sounders. Two datasets have been considered. The first dataset is a collection of infrared spectra captured in Northern Hemisphere winter 2006/2007 and Southern Hemisphere winter 2009 by the Michelson Interferometer for Passive Atmospheric Sounding (MIPAS) instrument onboard ESA's Envisat satellite. The second dataset is the cloud scenario database (CSDB) of simulated MIPAS spectra. We first performed an initial analysis to assess the basic characteristics of these datasets and to decide which features to extract from them. More than 10,000 brightness temperature difference (BTD) features have been generated and fed as input to the ML methods instead of directly using the infrared spectra. Next, we assessed the use of ML methods for the reduction of the dimensionality of this large feature space using principal component analysis (PCA) and kernel principal component analysis (KPCA) as well as the classification with the random forest (RF) and support vector machine (SVM) techniques. All methods were found to be suitable to retrieve information on the composition of PSCs. Of these, RF seems to be the most promising method, being less prone to overfitting and producing results that agree well with established results based on conventional classification methods.


Multivariate Analysis and Machine Learning for Ripeness Classification of Cape Gooseberry Fruits

Processes ◽

10.3390/pr7120928 ◽

2019 ◽

Vol 7 (12) ◽

pp. 928 ◽

Cited By ~ 2

Author(s):

Miguel De-la-Torre ◽

Omar Zatarain ◽

Himer Avila-George ◽

Mirna Muñoz ◽

Jimy Oblitas ◽

...

Keyword(s):

Machine Learning ◽

Principal Component Analysis ◽

Feature Selection ◽

Principal Component ◽

Component Analysis ◽

Support Vector ◽

Color Spaces ◽

Combination Methods ◽

Fruit Samples ◽

Cape Gooseberry

This paper explores five multivariate techniques for information fusion in sorting Cape gooseberry fruits by visual ripeness: principal component analysis, linear discriminant analysis, independent component analysis, eigenvector centrality feature selection, and multi-cluster feature selection. These techniques are applied to the concatenated channels corresponding to red, green, and blue (RGB), hue, saturation, value (HSV), and lightness, red/green value, and blue/yellow value (L*a*b) color spaces (9 features in total). Machine learning techniques have been reported for sorting the Cape gooseberry fruits’ ripeness. Classifiers such as neural networks, support vector machines, and nearest neighbors discriminate on fruit samples using different color spaces. Despite the color spaces being equivalent up to a transformation, a few classifiers achieve better performance due to differences in the pixel distribution of samples. Experimental results show that selection and combination of color channels allow classifiers to reach similar levels of accuracy; however, combination methods still require higher computational complexity. The highest level of accuracy was obtained using the seven-dimensional principal component analysis feature space.


Quantification of Tumor Micro-Environment Acidity in Glioblastoma Using Principal Component Analysis of Dynamic Susceptibility Contrast-Enhanced MR Imaging and Machine Learning

10.21203/rs.3.rs-431537/v1 ◽

2021 ◽

Author(s):

Hamed Akbari ◽

Anahita Kazerooni ◽

Jeffery B. Ware ◽

Elizabeth Mamourian ◽

Hannah Anderson ◽

...

Keyword(s):

Machine Learning ◽

Principal Component Analysis ◽

Principal Component ◽

Component Analysis ◽

Support Vector ◽

Dynamic Susceptibility ◽

Dynamic Susceptibility Contrast ◽

Tumor pH ◽

Contrast Enhanced ◽

Susceptibility Contrast

Abstract. Glioblastoma (GBM) has high metabolic demands, which can lead to acidification of the tumor microenvironment. We hypothesize that a machine learning model built on temporal principal component analysis (PCA) of dynamic susceptibility contrast-enhanced (DSC) perfusion MRI can be used to estimate tumor acidity in GBM, as estimated by pH-sensitive amine chemical exchange saturation transfer echo-planar imaging (CEST-EPI). We analyzed 78 MRI scans in 32 treatment-naïve and post-treatment GBM patients. All patients were imaged with DSC-MRI, and pH weighting was quantified from CEST-EPI estimation of the magnetization transfer ratio asymmetry (MTRasym) at 3 ppm. Enhancing tumor (ET), non-enhancing core (NC), and peritumoral T2 hyperintensity (namely, edema, ED) were used to extract principal components (PCs) and to build support vector machine regression (SVR) models to predict MTRasym values using PCs. Our predicted map correlated with MTRasym values with Spearman’s r equal to 0.66, 0.47, 0.67, 0.71, in NC, ET, ED, and overall, respectively (p<0.006). The results of this study demonstrate that PCA of DSC imaging data can provide information about tumor pH in GBM patients, with the strongest association within the peritumoral regions.


DATA MINING TECHNIQUES FOR CLASSIFYING TOURIST DESTINATION REVIEW DATA USING PRINCIPAL COMPONENT ANALYSIS (PCA) DATA REDUCTION

Edutic - Scientific Journal of Informatics Education ◽

10.21107/edutic.v7i2.9247 ◽

2021 ◽

Vol 7 (2) ◽

Author(s):

Alven Safik Ritonga ◽

Isnaini Muhandhis

Keyword(s):

Machine Learning ◽

Data Mining ◽

Principal Component Analysis ◽

Support Vector Machine ◽

Decision Trees ◽

Principal Component ◽

Component Analysis ◽

Support Vector

The increase in tourist visits to a destination is influenced by tourists' satisfaction during their visit. To determine whether a tourism destination meets tourists' expectations, tourist satisfaction needs to be evaluated. The aim of this study is to obtain a classification model that achieves high accuracy in classifying tourist destination satisfaction reviews and to produce a decision-support tool for the development of tourist destinations. The data used in this study have a fairly large dimensionality, which lengthens the computation time for classification and makes the analysis impractical or infeasible; dimensionality reduction is therefore applied to obtain data of much smaller dimension while preserving the integrity of the original data. The method used to classify the tourist destination satisfaction reviews combines Principal Component Analysis (PCA), as the dimensionality reduction method, with the following three data mining methods: Support Vector Machine (SVM), Artificial Neural Network (ANN), and Decision Trees. This study uses secondary data taken from the UCI Machine Learning Repository. The results of combining PCA with the three methods show that classification accuracy improves for some of the methods. Of the three methods used, SVM-PCA has the best accuracy at 91.50%, followed by ANN-PCA at 89.46% and Decision-PCA at 88.78%.


Comparative Analysis of Machine Learning Techniques with Principal Component Analysis on Kidney and Heart Disease

Turkish Journal of Computer and Mathematics Education (TURCOMAT) ◽

10.17762/turcomat.v12i2.1433 ◽

2021 ◽

Vol 12 (2) ◽

pp. 1564-1572

Author(s):

Reena Chandra et al.

Keyword(s):

Machine Learning ◽

Chronic Kidney Disease ◽

Principal Component Analysis ◽

Logistic Regression ◽

Heart Disease ◽

Kidney Disease ◽

Dimensionality Reduction ◽

Principal Component ◽

Component Analysis ◽

Support Vector

Detecting disease at an early stage is highly challenging. Datasets for different diseases are available online, with different numbers of features corresponding to a particular disease. Many dimensionality reduction and feature extraction techniques are now used to reduce the number of features in a dataset and to find the most appropriate ones. This paper explores the difference in performance of different machine learning models using the Principal Component Analysis dimensionality reduction technique on datasets of chronic kidney disease and cardiovascular disease. Further, the authors apply Logistic Regression, K Nearest Neighbour, Naïve Bayes, Support Vector Machine and Random Forest models to the datasets and compare the performance of the models with and without PCA. A key challenge in the field of data mining and machine learning is building accurate and computationally efficient classifiers for medical applications. With an accuracy of 100% for chronic kidney disease and 85% for heart disease, the KNN classifier and logistic regression were found to be the best-performing prediction methods for kidney and heart disease, respectively.


Classification of Observations through Combination of the Dimension Reduction and the Cluster Analysis

International Journal of Advanced Research in Computer Science and Software Engineering ◽

10.23956/ijarcsse.v7i8.13 ◽

2017 ◽

Vol 7 (8) ◽

pp. 30

Author(s):

Hyeuk Kim

Keyword(s):

Machine Learning ◽

Principal Component Analysis ◽

Cluster Analysis ◽

Unsupervised Learning ◽

Principal Component ◽

Component Analysis ◽

Baseball Players ◽

Partitioning Around Medoids ◽

Different Characteristics

Unsupervised learning in machine learning divides data into several groups. Observations in the same group have similar characteristics, and observations in different groups have different characteristics. In the paper, we classify data by partitioning around medoids, which has some advantages over k-means clustering. We apply it to baseball players in Korea Baseball League. We also apply principal component analysis to the data and draw a graph using two components as the axes. We interpret the meaning of the clustering graphically through this procedure. The combination of partitioning around medoids and principal component analysis can be applied to any other data, and the approach makes it easy to identify the characteristics of the groups.
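For readers unfamiliar with partitioning around medoids, the clustering idea can be sketched as below. This is a simplified illustration under stated assumptions, not the paper's procedure: it implements only a greedy swap search (the SWAP phase of PAM), starts from the first `k` points instead of running PAM's BUILD phase, uses Euclidean distance, and runs on synthetic two-dimensional points. The function name `pam_swap` is hypothetical.

```python
import numpy as np


def pam_swap(points, k, max_iter=100):
    """k-medoids clustering by greedy medoid/non-medoid swaps.

    Returns (medoid indices, cluster label per point, total cost), where the
    cost is the sum of distances from each point to its nearest medoid.
    """
    # Full pairwise Euclidean distance matrix.
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    medoids = list(range(k))  # arbitrary start (assumption of this sketch)

    def cost(candidate):
        return dist[:, candidate].min(axis=1).sum()

    best = cost(medoids)
    for _ in range(max_iter):
        best_swap = None
        # Try replacing each medoid with each non-medoid; keep the best swap.
        for pos in range(k):
            for cand in range(len(points)):
                if cand in medoids:
                    continue
                trial = medoids.copy()
                trial[pos] = cand
                trial_cost = cost(trial)
                if trial_cost < best - 1e-12:
                    best, best_swap = trial_cost, trial
        if best_swap is None:  # no swap lowers the cost: converged
            break
        medoids = best_swap

    labels = dist[:, medoids].argmin(axis=1)
    return np.array(medoids), labels, best


# Synthetic data (assumption): two well-separated groups of 20 points each.
rng = np.random.default_rng(1)
points = np.vstack([
    rng.normal(0.0, 0.5, size=(20, 2)),
    rng.normal(10.0, 0.5, size=(20, 2)),
])
medoids, labels, total_cost = pam_swap(points, k=2)
print(medoids, total_cost)
```

Unlike k-means centroids, the medoids are actual observations, so each cluster is represented by a real data point (for example, a real player) that can be inspected directly.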


Analysis of the Bath Motion in the MM-SQC Dynamics Using Unsupervised Machine Learning Dimensionality Reduction Approaches: Principal Component Analysis

10.26434/chemrxiv.13332530 ◽

2020 ◽

Author(s):

Jiawei Peng ◽

Yu Xie ◽

Deping Hu ◽

Zhenggang Lan

Keyword(s):

Machine Learning ◽

Principal Component Analysis ◽

Collective Motion ◽

Principal Component ◽

Component Analysis ◽

Nonadiabatic Dynamics ◽

Trajectory Data ◽

Unsupervised Machine Learning ◽

Physical Knowledge ◽

Vibronic Couplings

The system-plus-bath model is an important tool to understand nonadiabatic dynamics for large molecular systems. The understanding of the collective motion of a huge number of bath modes is essential to reveal their key roles in the overall dynamics. We apply principal component analysis (PCA) to investigate the bath motion based on the massive data generated from the MM-SQC (symmetrical quasi-classical dynamics method based on the Meyer-Miller mapping Hamiltonian) nonadiabatic dynamics of the excited-state energy transfer dynamics of the Frenkel-exciton model. The PCA method clearly shows that two types of bath modes, which either display strong vibronic couplings or have frequencies close to the electronic transition, are very important to the nonadiabatic dynamics. These observations are fully consistent with the physical insights. This conclusion is obtained purely from the PCA of the trajectory data, without much involvement of pre-defined physical knowledge. The results show that the PCA approach, one of the simplest unsupervised machine learning methods, is very powerful for analyzing complicated nonadiabatic dynamics in the condensed phase involving many degrees of freedom.


Longitudinal Crack Detection Approach Based on Principal Component Analysis and Support Vector Machine for Slab Continuous Casting

steel research international ◽

10.1002/srin.202100168 ◽

2021 ◽

Author(s):

Haiyang Duan ◽

Jingjing Wei ◽

Lin Qi ◽

Xudong Wang ◽

Yu Liu ◽

...

Keyword(s):

Principal Component Analysis ◽

Support Vector Machine ◽

Continuous Casting ◽

Crack Detection ◽

Longitudinal Crack ◽

Principal Component ◽

Component Analysis ◽

Support Vector ◽

Slab Continuous Casting ◽

Detection Approach


Prediction of China’s Energy Consumption Based on Robust Principal Component Analysis and PSO-LSSVM Optimized by the Tabu Search Algorithm

Energies ◽

10.3390/en12010196 ◽

2019 ◽

Vol 12 (1) ◽

pp. 196 ◽

Cited By ~ 3

Author(s):

Lihui Zhang ◽

Riletu Ge ◽

Jianxue Chai

Keyword(s):

Principal Component Analysis ◽

Energy Consumption ◽

Tabu Search ◽

Industrial Structure ◽

Principal Component ◽

Component Analysis ◽

Support Vector ◽

Forecasting Model ◽

Robust Principal Component Analysis ◽

Consumption Structure

China’s energy consumption issues are closely associated with global climate issues, and the scale of energy consumption, peak energy consumption, and consumption investment are all the focus of national attention. In order to forecast the amount of energy consumption of China accurately, this article selected GDP, population, industrial structure and energy consumption structure, energy intensity, total imports and exports, fixed asset investment, energy efficiency, urbanization, the level of consumption, and fixed investment in the energy industry as a preliminary set of factors. Secondly, we corrected the traditional principal component analysis (PCA) algorithm from the perspective of eliminating “bad points” and then judged a “bad spot” sample based on signal reconstruction ideas. Based on the above content, we put forward a robust principal component analysis (RPCA) algorithm and chose the first five principal components as main factors affecting energy consumption, including: GDP, population, industrial structure and energy consumption structure, urbanization. Then, we applied the Tabu search (TS) algorithm to the least squares support vector machine (LSSVM) optimized by the particle swarm optimization (PSO) algorithm to forecast China’s energy consumption. We collected data from 1996 to 2010 as a training set and from 2010 to 2016 as the test set. For easy comparison, the sample data was input into the LSSVM algorithm and the PSO-LSSVM algorithm at the same time. We used statistical indicators including goodness of fit determination coefficient (R2), the root mean square error (RMSE), and the mean radial error (MRE) to compare the training results of the three forecasting models, which demonstrated that the proposed TS-PSO-LSSVM forecasting model had higher prediction accuracy, generalization ability, and higher training speed. Finally, the TS-PSO-LSSVM forecasting model was applied to forecast the energy consumption of China from 2017 to 2030.
According to the predictions, China’s energy consumption shows a gradually increasing trend from 2017 to 2030 and will break through 6000 million tons in 2030. However, the growth rate is gradually tightening and China’s energy consumption economy will transfer to a state of diminishing returns around 2026, which guides China to put more emphasis on the field of energy investment.
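Two of the evaluation indicators named in the abstract, the coefficient of determination (R2) and the root mean square error (RMSE), have standard definitions that can be written down in a few lines. The sketch below uses those standard formulas with made-up numbers; the values are purely illustrative and are not taken from the paper, and MRE is left out because the abstract does not define it.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)


def rmse(y_true, y_pred):
    """Root mean square error of the predictions."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Illustrative values only (assumption): every prediction is off by exactly 1.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([2.0, 3.0, 4.0, 5.0])

r2_value = r_squared(y_true, y_pred)
rmse_value = rmse(y_true, y_pred)
print(r2_value, rmse_value)
```

A constant offset of 1 gives an RMSE of 1 while R2 drops well below 1, which shows why the two indicators are usually reported together when comparing forecasting models.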
