A Strategy for Dimensionality Reduction and Data Analysis Applied to Microstructure–Property Relationships of Nanoporous Metals

Nanoporous metals, with their complex microstructure, represent an ideal candidate for the development of methods that combine physics, data, and machine learning. The preparation of nanporous metals via dealloying allows for tuning of the microstructure and macroscopic mechanical properties within a large design space, dependent on the chosen dealloying conditions. Specifically, it is possible to define the solid fraction, ligament size, and connectivity density within a large range. These microstructural parameters have a large impact on the macroscopic mechanical behavior. This makes this class of materials an ideal science case for the development of strategies for dimensionality reduction, supporting the analysis and visualization of the underlying structure–property relationships. Efficient finite element beam modeling techniques were used to generate ~200 data sets for macroscopic compression and nanoindentation of open pore nanofoams. A strategy consisting of dimensional analysis, principal component analysis, and machine learning allowed for data mining of the microstructure–property relationships. It turned out that the scaling law of the work hardening rate has the same exponent as the Young’s modulus. Simple linear relationships are derived for the normalized work hardening rate and hardness. The hardness to yield stress ratio is not limited to 1, as commonly assumed for foams, but spreads over a large range of values from 0.5 to 3.

Download Full-text

Early Detection of Diabetic Retinopathy Using PCA-Firefly Based Deep Learning Model

Electronics ◽

10.3390/electronics9020274 ◽

2020 ◽

Vol 9 (2) ◽

pp. 274 ◽

Cited By ~ 30

Author(s):

Thippa Reddy Gadekallu ◽

Neelu Khare ◽

Sweta Bhattacharya ◽

Saurabh Singh ◽

Praveen Kumar Reddy Maddikunta ◽

...

Keyword(s):

Machine Learning ◽

Diabetic Retinopathy ◽

Deep Learning ◽

Early Detection ◽

Dimensionality Reduction ◽

Vision Loss ◽

Principal Component ◽

Screening Methods ◽

Learning Approaches ◽

Time Period

Diabetic Retinopathy is a major cause of vision loss and blindness affecting millions of people across the globe. Although there are established screening methods - fluorescein angiography and optical coherence tomography for detection of the disease but in majority of the cases, the patients remain ignorant and fail to undertake such tests at an appropriate time. The early detection of the disease plays an extremely important role in preventing vision loss which is the consequence of diabetes mellitus remaining untreated among patients for a prolonged time period. Various machine learning and deep learning approaches have been implemented on diabetic retinopathy dataset for classification and prediction of the disease but majority of them have neglected the aspect of data pre-processing and dimensionality reduction, leading to biased results. The dataset used in the present study is a diabetes retinopathy dataset collected from the UCI machine learning repository. At its inceptions, the raw dataset is normalized using the Standardscalar technique and then Principal Component Analysis (PCA) is used to extract the most significant features in the dataset. Further, Firefly algorithm is implemented for dimensionality reduction. This reduced dataset is fed into a Deep Neural Network Model for classification. The results generated from the model is evaluated against the prevalent machine learning models and the results justify the superiority of the proposed model in terms of Accuracy, Precision, Recall, Sensitivity and Specificity.

Download Full-text

Feature Snatching and Performance Assessment for Connoting the Admittance Likelihood of student using Principal Component Reduction

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b2286.078219 ◽

2019 ◽

Vol 8 (2) ◽

pp. 4800-4807

Keyword(s):

Higher Education ◽

Machine Learning ◽

Dimensionality Reduction ◽

Admission Rate ◽

Principal Component ◽

Feature Reduction ◽

Experimental Result ◽

Sparse Pca ◽

Reduction Methods ◽

Incremental Pca

Recently, engineers are concentrating on designing an effective prediction model for finding the rate of student admission in order to raise the educational growth of the nation. The method to predict the student admission towards the higher education is a challenging task for any educational organization. There is a high visibility of crisis towards admission in the higher education. The admission rate of the student is the major risk to the educational society in the world. The student admission greatly affects the economic, social, academic, profit and cultural growth of the nation. The student admission rate also depends on the admission procedures and policies of the educational institutions. The chance of student admission also depends on the feedback given by all the stake holders of the educational sectors. The forecasting of the student admission is a major task for any educational institution to protect the profit and wealth of the organization. This paper attempts to analyze the performance of the student admission prediction by using machine learning dimensionality reduction algorithms. The Admission Predict dataset from Kaggle machine learning Repository is used for prediction analysis and the features are reduced by feature reduction methods. The prediction of the chance of Admit is achieved in four ways. Firstly, the correlation between each of the dataset attributes are found and depicted as a histogram. Secondly, the top most high correlated features are identified which are directly contributing to the prediction of chance of admit. Thirdly, the Admission Predict dataset is subjected to dimensionality reduction methods like principal component analysis (PCA), Sparse PCA, Incremental PCA , Kernel PCA and Mini Batch Sparse PCA. Fourth, the optimized dimensionality reduced dataset is then executed to analyze and compare the mean squared error, Mean Absolute Error and R2 Score of each method. The implementation is done by python in Anaconda Spyder Navigator Integrated Development Environment. Experimental Result shows that the CGPA, GRE Score and TOEFL Score are highly correlated features in predicting the chance of admit. The execution of performance analysis shows that Incremental PCA have achieved the effective prediction of chance of admit with minimum MSE of 0.09, MAE of 0.24 and reasonable R2 Score of 0.26.

Download Full-text

Info_PCA: A Hybrid Technique to Improve Accuracy by Dimensionality Reduction

INFORMATION TECHNOLOGY IN INDUSTRY ◽

10.17762/itii.v9i2.370 ◽

2021 ◽

Vol 9 (2) ◽

pp. 458-466

Author(s):

Surabhi Lingwal, Et. al.

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Dimensionality Reduction ◽

Shannon Entropy ◽

Information Gain ◽

Principal Component ◽

Hybrid Technique ◽

Significant Information ◽

Feature Vectors ◽

Feature Extraction And Selection

Principal Component Analysis and Shannon Entropy are some of the most widely used methods for feature extraction and selection. PCA reduces the data to a new subspace with low dimensions by calculating the eigenvectors from eigenvalues out of a covariance matrix and thereby reduces the features to a smaller number capturing the significant information. Shannon entropy is based on probability distribution to calculate the significant information content. Information gain shows the importance of a given attribute in the set of feature vectors. The paper has introduced a hybrid technique Info_PCA which captures the properties of Information gain and PCA that overall reduces the dimensionality and thereby increases the accuracy of the machine learning technique. It also demonstrates the individual implementation of Information gain for feature selection and PCA for dimensionality reduction on two different datasets collected from the UCI machine learning repository. One of the major aims is to determine the important attributes in a given set of training feature vectors to differentiate the classes. The paper has shown a comparative analysis on the classification accuracy obtained by the application of Information Gain, PCA and Info_PCA applied individually on the two different datasets for feature extraction followed by ANN classifier where the results of hybrid technique Info_PCA achieves maximum accuracy and minimum loss in comparison to other feature extraction techniques.

Download Full-text

Identifying Explosive Epidemiological Cases with Unsupervised Machine Learning (Preprint)

10.2196/preprints.20842 ◽

2020 ◽

Author(s):

Serge Dolgikh

Keyword(s):

Machine Learning ◽

Principal Component Analysis ◽

Dimensionality Reduction ◽

Local Time ◽

Principal Component ◽

Learning Methods ◽

Unsupervised Machine Learning ◽

Modeling Tool ◽

Machine Learning Methods ◽

Preventative Measures

UNSTRUCTURED An analysis of a combined dataset of Wave 1 and 2 cases, aligned at approximately Local Time Zero + 2 months with unsupervised machine learning methods such as Principal Component Analysis and deep autoencoder dimensionality reduction allows to clearly separate milder background cases from those with more rapid and aggressive onset of the epidemics. The analysis and findings of the study can be used in evaluation of possible epidemiological scenarios and as an effective modeling tool to design corrective and preventative measures to avoid developments with potentially heavy impact.

Download Full-text

Comparison of different machine learning methods and dimensionality reduction for classification astrocytoma and glioblastoma tissues by mass spectra

F1000Research ◽

10.12688/f1000research.28288.1 ◽

2021 ◽

Vol 10 ◽

pp. 39

Author(s):

Evgeny S. Zhvansky ◽

Anatoly A. Sorokin ◽

Denis S. Zavorotnyuk ◽

Vsevolod A. Shurkhay ◽

Vasiliy A. Eliferov ◽

...

Keyword(s):

Machine Learning ◽

Dimensionality Reduction ◽

Mass Spectra ◽

Principal Component ◽

Large Mass ◽

Ambient Ionization ◽

Mass Spectrometric ◽

Machine Learning Algorithms ◽

Spectrometric Data ◽

Tumor Tissues

Background: Recently developed methods of ambient ionization allow rapid obtaining of large mass spectrometric datasets, which have a great application in biological and medical analysis. One of the areas that could employ such analysis is neurosurgery. The fast in situ identification of dissected tissues could assist the neurosurgery procedure. The additional information about tumor could help the tumor border monitoring. In this paper, tumor tissues of astrocytoma and glioblastoma are compared, as their identifications during surgery could influence the extent of resection and, hence, the median and overall survival. Methods: Mass spectrometric profiles of brain tumor tissues contain molecular information, which is rather hard to interpret in terms of identifications of individual molecules. The machine learning algorithms are employed for the fast automated mass spectra classification. Different algorithms of dimensionality reduction are considered to process the mass spectra before the classification task, as the initial dimensionality of mass spectra is too high compared with the number of mass spectra. Results: Different classifiers are compared for both just preprocessed data and after dimensionality reduction. The Non-Negative Matrix Factorization appears to be the most effective dimensionality reduction algorithm. The random forest algorithm demonstrated the most robust appearance on the tested data. Also, the comparison of the accuracy of the trained classifier on the mass spectra of tissues measured with different instruments and different resolution is provided in the paper. Conclusions: Machine learning classifiers overfit the raw mass spectrometric data. The dimensionality reduction allows the classification of both train and test data with 88% accuracy. Positive mode data provides better accuracy. A combination of principal component analysis and AdaBoost algorithms appears to be most robust to changing the instrument and conditions.

Download Full-text

Customer Segment Prognostic System by Machine Learning using Principal Component and Linear Discriminant Analysis

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b2290.078219 ◽

2019 ◽

Vol 8 (2) ◽

pp. 6198-6203

Keyword(s):

Machine Learning ◽

Discriminant Analysis ◽

Dimensionality Reduction ◽

Linear Discriminant Analysis ◽

Principal Component ◽

Customer Behavior ◽

Machine Learning Algorithms ◽

Data Set ◽

Linear Discriminant ◽

Customer Group

Recently, manufacturing industry faces lots of problem in predicting the customer behavior and group for matching their outcome with the profit. The organizations are finding difficult in identifying the customer behavior for the purpose of predicting the product design so as to increase the profit. The prediction of customer group is a challenging task for all the organization due to the current growing entrepreneurs. This results in using the machine learning algorithms to cluster the customer group for predicting the demand of the customers. This helps in decision making process of manufacturing the products. This paper attempts to predict the customer group for the wine data set extracted from UCI Machine Learning repository. The wine data set is subjected to dimensionality reduction with principal component analysis and linear discriminant analysis. A Performance analysis is done with various classification algorithms and comparative study is done with the performance metric such as accuracy, precision, recall, and f-score. Experimental results shows that after applying dimensionality reduction, the 2 component LDA reduced wine data set with the kernel SVM, Random Forest classifier is found to be effective with the accuracy of 100% compared to other classifiers.

Download Full-text

Dimensionality Reduction of Sensorial Features by Principal Component Analysis for ANN Machine Learning in Tool Condition Monitoring of CFRP Drilling

Procedia CIRP ◽

10.1016/j.procir.2018.09.072 ◽

2018 ◽

Vol 78 ◽

pp. 307-312 ◽

Cited By ~ 5

Author(s):

Alessandra Caggiano ◽

Roberta Angelone ◽

Francesco Napolitano ◽

Luigi Nele ◽

Roberto Teti

Keyword(s):

Machine Learning ◽

Principal Component Analysis ◽

Dimensionality Reduction ◽

Condition Monitoring ◽

Principal Component ◽

Component Analysis ◽

Tool Condition Monitoring ◽

Tool Condition

Download Full-text

Postulation of Customer Retention in Banking Sector using Machine Learning and Principal Component

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.d8020.118419 ◽

2019 ◽

Vol 8 (4) ◽

pp. 3178-3182

Keyword(s):

Machine Learning ◽

Random Forest ◽

Performance Analysis ◽

Dimensionality Reduction ◽

Banking Sector ◽

Principal Component ◽

Experimental Results ◽

Data Set ◽

Customer Churn ◽

Kernel Svm

Recently, there is a rapid growth in technological improvement in banking sector. The entire world is using the banking service for managing their financial and property assets. As of now, all the technological advancements are applied to banking sector to facilitate the customers with proper operational excellence. In this view, the bank has complete responsibility in serving the people with their modern application to save their time and wealth. So the customer value analysis is needed for the bank to enrich the marketing growth and turnover of the bank. But still, the prediction of customer churn remains a challenging issue for the banking sector for analyzing the profit growth. With this view, we focus on predicting the customer churn for the banking application. This paper uses the churn modeling data set extracted from UCI Machine Learning Repository. The anaconda Navigator IDE along with Spyder is used for implementing the Python code. Our contribution is folded is folded in three ways. First, the data set is applied to various classifiers like Logistic Regression, KNN, Kernel SVM, Naive Bayes, Decision Tree, Random Forest to analyze the confusion matrix. The Performance analysis is done by comparing the metrics like Precision, Recall, FScore and Accuracy. Second the data set is subjected to dimensionality reduction method using Principal component Analysis and then fitted to the above mentioned classifiers and their performance analysis is done. Third, the performance analysis is done for the dataset by comparing the metrics with and without applying the dimensionality reduction. A Performance analysis is done with various classification algorithms and comparative study is done with the performance metric such as accuracy, precision, recall, and f-score. The implementation is carried out with python code using Anaconda Navigator. Experimental results shows that before applying dimensionality reduction PCA, the Random Forest classifier is found to be effective with the accuracy of 86%, Precision of 0.85, Recall of 0.86 and FScore of 0.84. Experimental results shows that after applying dimensionality reduction, the 2 component PCA with the kernel SVM classifier is found to be effective with the accuracy of 81%, Precision of 0.81, Recall of 0.81 and FScore of 0.74. compared to other classifiers.

Download Full-text

Comparative Analysis of Machine Learning Techniques with Principal Component Analysis on Kidney and Heart Disease

Turkish Journal of Computer and Mathematics Education (TURCOMAT) ◽

10.17762/turcomat.v12i2.1433 ◽

2021 ◽

Vol 12 (2) ◽

pp. 1564-1572

Author(s):

Reena Chandra, Et. al.

Keyword(s):

Machine Learning ◽

Chronic Kidney Disease ◽

Principal Component Analysis ◽

Logistic Regression ◽

Heart Disease ◽

Kidney Disease ◽

Dimensionality Reduction ◽

Principal Component ◽

Component Analysis ◽

Support Vector

Detection of disease at earlier stages is the most challenging one. Datasets of different diseases are available online with different number of features corresponding to a particular disease. Many dimensionality reduction and feature extraction techniques are used nowadays to reduce the number of features in dataset and finding the most appropriate ones. This paper explores the difference in performance of different machine learning models using Principal Component Analysis dimensionality reduction technique on the datasets of Chronic kidney disease and Cardiovascular disease. Further, the authors apply Logistic Regression, K Nearest Neighbour, Naïve Bayes, Support Vector Machine and Random Forest Model on the datasets and compare the performance of the model with and without PCA. A key challenge in the field of data mining and machine learning is building accurate and computationally efficient classifiers for medical applications. With an accuracy of 100% in chronic kidney disease and 85% for heart disease, KNN classifier and logistic regression were revealed to be the most optimal method of predictions for kidney and heart disease respectively.

Download Full-text

Classification of Observations through Combination of the Dimension Reduction and the Cluster Analysis

International Journal of Advanced Research in Computer Science and Software Engineering ◽

10.23956/ijarcsse.v7i8.13 ◽

2017 ◽

Vol 7 (8) ◽

pp. 30

Author(s):

Hyeuk Kim

Keyword(s):

Machine Learning ◽

Principal Component Analysis ◽

Cluster Analysis ◽

Unsupervised Learning ◽

Principal Component ◽

Component Analysis ◽

Baseball Players ◽

Partitioning Around Medoids ◽

Different Characteristics

Unsupervised learning in machine learning divides data into several groups. The observations in the same group have similar characteristics and the observations in the different groups have the different characteristics. In the paper, we classify data by partitioning around medoids which have some advantages over the k-means clustering. We apply it to baseball players in Korea Baseball League. We also apply the principal component analysis to data and draw the graph using two components for axis. We interpret the meaning of the clustering graphically through the procedure. The combination of the partitioning around medoids and the principal component analysis can be used to any other data and the approach makes us to figure out the characteristics easily.

Download Full-text