Data Analysis and Data Classification in Machine Learning using Linear Regression and Principal Component Analysis

In this paper step-by-step procedure to implement linear regression and principal component analysis by considering two examples for each model is explained, to predict the continuous values of target variables. Basically linear regression methods are widely used in prediction, forecasting and error reduction. And principle component analysis is applied for facial recognition, computer vision etc. In Principal component analysis, it is explained how to select a point with respect to variance. And also Lagrange multiplier is used to maximize the principle component function, so that optimized solution is obtained

Download Full-text

Research on the impact of the digital economy on the residents’ lives with high quality based on principal component analysis and multiple linear regression methods

10.1109/dsins54396.2021.9670586 ◽

2021 ◽

Author(s):

Aichen Ni

Keyword(s):

Principal Component Analysis ◽

Linear Regression ◽

Multiple Linear Regression ◽

Principal Component ◽

Component Analysis ◽

Digital Economy ◽

High Quality ◽

Regression Methods ◽

The Impact

Download Full-text

Data Analytics for Cardiotocography Data Using Principal Component Analysis

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i2.32.15574 ◽

2018 ◽

Vol 7 (2.32) ◽

pp. 233

Author(s):

Pratuisha K ◽

Rajeswara Rao .D ◽

J V.R.Murthy

Keyword(s):

Machine Learning ◽

Principal Component Analysis ◽

Data Analytics ◽

Early Stage ◽

Principal Component ◽

Component Analysis ◽

Chi Square ◽

Principle Component ◽

Precise Diagnosis ◽

Chi Square Test

With growing congenital anamelies in recent years detection of heart problems in fetus has become critical. Cardiotocography test assists doctors in such dignosis followed by cure. Here analytics of cardiotocography data is presented in details.Understanding ,cleaning and preprocessing the data is one of the the foremost part for any researcher,In this work data is cleaned,preprocessed,normalized, Also the attributes are selected by using the Chi-square test. Colinearity problem is addressed using Principle component analysis.Such analytics and prepro-cessing will help in machine learning or allied models for predict-ing precise diagnosis at an early stage

Download Full-text

Classification of Observations through Combination of the Dimension Reduction and the Cluster Analysis

International Journal of Advanced Research in Computer Science and Software Engineering ◽

10.23956/ijarcsse.v7i8.13 ◽

2017 ◽

Vol 7 (8) ◽

pp. 30

Author(s):

Hyeuk Kim

Keyword(s):

Machine Learning ◽

Principal Component Analysis ◽

Cluster Analysis ◽

Unsupervised Learning ◽

Principal Component ◽

Component Analysis ◽

Baseball Players ◽

Partitioning Around Medoids ◽

Different Characteristics

Unsupervised learning in machine learning divides data into several groups. The observations in the same group have similar characteristics and the observations in the different groups have the different characteristics. In the paper, we classify data by partitioning around medoids which have some advantages over the k-means clustering. We apply it to baseball players in Korea Baseball League. We also apply the principal component analysis to data and draw the graph using two components for axis. We interpret the meaning of the clustering graphically through the procedure. The combination of the partitioning around medoids and the principal component analysis can be used to any other data and the approach makes us to figure out the characteristics easily.

Download Full-text

Analysis of the Bath Motion in the MM-SQC Dynamics Using Unsupervised Machine Learning Dimensionality Reduction Approaches: Principal Component Analysis

10.26434/chemrxiv.13332530 ◽

2020 ◽

Author(s):

Jiawei Peng ◽

Yu Xie ◽

Deping Hu ◽

Zhenggang Lan

Keyword(s):

Machine Learning ◽

Principal Component Analysis ◽

Collective Motion ◽

Principal Component ◽

Component Analysis ◽

Nonadiabatic Dynamics ◽

Trajectory Data ◽

Unsupervised Machine Learning ◽

Physical Knowledge ◽

Vibronic Couplings

The system-plus-bath model is an important tool to understand nonadiabatic dynamics for large molecular systems. The understanding of the collective motion of a huge number of bath modes is essential to reveal their key roles in the overall dynamics. We apply the principal component analysis (PCA) to investigate the bath motion based on the massive data generated from the MM-SQC (symmetrical quasi-classical dynamics method based on the Meyer-Miller mapping Hamiltonian) nonadiabatic dynamics of the excited-state energy transfer dynamics of Frenkel-exciton model. The PCA method clearly clarifies that two types of bath modes, which either display the strong vibronic couplings or have the frequencies close to electronic transition, are very important to the nonadiabatic dynamics. These observations are fully consistent with the physical insights. This conclusion is obtained purely based on the PCA understanding of the trajectory data, without the large involvement of pre-defined physical knowledge. The results show that the PCA approach, one of the simplest unsupervised machine learning methods, is very powerful to analyze the complicated nonadiabatic dynamics in condensed phase involving many degrees of freedom.

Download Full-text

Comparative Analysis of Machine Learning Techniques with Principal Component Analysis on Kidney and Heart Disease

10.1109/icesc51422.2021.9533011 ◽

2021 ◽

Author(s):

Reena Chandra ◽

Manoj Kapil ◽

Avinash Sharma

Keyword(s):

Machine Learning ◽

Principal Component Analysis ◽

Heart Disease ◽

Comparative Analysis ◽

Principal Component ◽

Component Analysis ◽

Machine Learning Techniques ◽

Learning Techniques

Download Full-text

Principal component analysis for functional data

10.1093/oxfordhb/9780199568444.013.8 ◽

2018 ◽

Cited By ~ 2

Author(s):

Peter Hall

Keyword(s):

Principal Component Analysis ◽

Linear Regression ◽

Functional Data ◽

Weighted Least Squares ◽

Principal Component ◽

Adaptive Methods ◽

Component Analysis ◽

Infinite Dimensional ◽

Role Of Principal ◽

Functional Linear Regression

This article discusses the methodology and theory of principal component analysis (PCA) for functional data. It first provides an overview of PCA in the context of finite-dimensional data and infinite-dimensional data, focusing on functional linear regression, before considering the applications of PCA for functional data analysis, principally in cases of dimension reduction. It then describes adaptive methods for prediction and weighted least squares in functional linear regression. It also examines the role of principal components in the assessment of density for functional data, showing how principal component functions are linked to the amount of probability mass contained in a small ball around a given, fixed function, and how this property can be used to define a simple, easily estimable density surrogate. The article concludes by explaining the use of PCA for estimating log-density.

Download Full-text

Criteria for choosing the number of dimensions in a principal component analysis: An empirical assessment

10.5753/sbbd.2020.13632 ◽

2020 ◽

Author(s):

Renata Silva ◽

Daniel Oliveira ◽

Davi Pereira Santos ◽

Lucio F.D. Santos ◽

Rodrigo Erthal Wilson ◽

...

Keyword(s):

Machine Learning ◽

Principal Component Analysis ◽

Hypothesis Test ◽

Feature Learning ◽

Principal Component ◽

Component Analysis ◽

Scree Plot ◽

Open Issue ◽

Chained Tasks ◽

High Dimensional Datasets

Principal component analysis (PCA) is an efficient model for the optimization problem of finding d' axes of a subspace Rd' ⊆ Rd so that the mean squared distances from a given set R of points to the axes are minimal. Despite being steadily employed since 1901 in different scenarios, e.g., mechanics, PCA has become an important link in machine learning chained tasks, such as feature learning and AutoML designs. A frequent yet open issue that arises from supervised-based problems is how many PCA axes are required for the performance of machine learning constructs to be tuned. Accordingly, we investigate the behavior of six independent and uncoupled criteria for estimating the number of PCA axes, namely Scree-Plot %, Scree Plot Gap, Kaiser-Guttman, Broken-Stick, p-Score, and 2D. In total, we evaluate the performance of those approaches in 20 high dimensional datasets by using (i) four different classifiers, and (ii) a hypothesis test upon the reported F-Measures. Results indicate Broken-Stick and Scree-Plot % criteria consistently outperformed the competitors regarding supervised-based tasks, whereas estimators Kaiser-Guttman and Scree-Plot Gap delivered poor performances in the same scenarios.

Download Full-text

A machine learning approach to medical data identification through principal component analysis

Big Data III: Learning, Analytics, and Applications ◽

10.1117/12.2586038 ◽

2021 ◽

Author(s):

Lorenzo E. Jaques ◽

Arthur C. Depoian ◽

Dong Xie ◽

Colleen P. Bailey ◽

Parthasarathy Guturu

Keyword(s):

Machine Learning ◽

Principal Component Analysis ◽

Principal Component ◽

Component Analysis ◽

Medical Data ◽

Learning Approach ◽

Machine Learning Approach

Download Full-text

Tropical principal component analysis on the space of phylogenetic trees

Bioinformatics ◽

10.1093/bioinformatics/btaa564 ◽

2020 ◽

Vol 36 (17) ◽

pp. 4590-4598

Author(s):

Robert Page ◽

Ruriko Yoshida ◽

Leon Zhang

Keyword(s):

Machine Learning ◽

Principal Component Analysis ◽

Phylogenetic Trees ◽

Principal Component ◽

Component Analysis ◽

Fixed Number ◽

Supplementary Information ◽

Gene Trees ◽

Learning Methods ◽

Machine Learning Methods

Abstract Motivation Due to new technology for efficiently generating genome data, machine learning methods are urgently needed to analyze large sets of gene trees over the space of phylogenetic trees. However, the space of phylogenetic trees is not Euclidean, so ordinary machine learning methods cannot be directly applied. In 2019, Yoshida et al. introduced the notion of tropical principal component analysis (PCA), a statistical method for visualization and dimensionality reduction using a tropical polytope with a fixed number of vertices that minimizes the sum of tropical distances between each data point and its tropical projection. However, their work focused on the tropical projective space rather than the space of phylogenetic trees. We focus here on tropical PCA for dimension reduction and visualization over the space of phylogenetic trees. Results Our main results are 2-fold: (i) theoretical interpretations of the tropical principal components over the space of phylogenetic trees, namely, the existence of a tropical cell decomposition into regions of fixed tree topology; and (ii) the development of a stochastic optimization method to estimate tropical PCs over the space of phylogenetic trees using a Markov Chain Monte Carlo approach. This method performs well with simulation studies, and it is applied to three empirical datasets: Apicomplexa and African coelacanth genomes as well as sequences of hemagglutinin for influenza from New York. Availability and implementation Dataset: http://polytopes.net/Data.tar.gz. Code: http://polytopes.net/tropica_MCMC_codes.tar.gz. Supplementary information Supplementary data are available at Bioinformatics online.

Download Full-text

Multivariate Analysis and Machine Learning for Ripeness Classification of Cape Gooseberry Fruits

Processes ◽

10.3390/pr7120928 ◽

2019 ◽

Vol 7 (12) ◽

pp. 928 ◽

Cited By ~ 2

Author(s):

Miguel De-la-Torre ◽

Omar Zatarain ◽

Himer Avila-George ◽

Mirna Muñoz ◽

Jimy Oblitas ◽

...

Keyword(s):

Machine Learning ◽

Principal Component Analysis ◽

Feature Selection ◽

Principal Component ◽

Component Analysis ◽

Support Vector ◽

Color Spaces ◽

Combination Methods ◽

Fruit Samples ◽

Cape Gooseberry

This paper explores five multivariate techniques for information fusion on sorting the visual ripeness of Cape gooseberry fruits (principal component analysis, linear discriminant analysis, independent component analysis, eigenvector centrality feature selection, and multi-cluster feature selection.) These techniques are applied to the concatenated channels corresponding to red, green, and blue (RGB), hue, saturation, value (HSV), and lightness, red/green value, and blue/yellow value (L*a*b) color spaces (9 features in total). Machine learning techniques have been reported for sorting the Cape gooseberry fruits’ ripeness. Classifiers such as neural networks, support vector machines, and nearest neighbors discriminate on fruit samples using different color spaces. Despite the color spaces being equivalent up to a transformation, a few classifiers enable better performances due to differences in the pixel distribution of samples. Experimental results show that selection and combination of color channels allow classifiers to reach similar levels of accuracy; however, combination methods still require higher computational complexity. The highest level of accuracy was obtained using the seven-dimensional principal component analysis feature space.

Download Full-text