An Approach to Data Reduction for Learning from Big Datasets: Integrating Stacking, Rotation, and Agent Population Learning Techniques

This paper discusses and evaluates several data reduction techniques for machine learning from big datasets. The approach combines stacking, rotation, and data reduction with the aim of improving classification performance. Stacking makes it possible to take advantage of multiple classification models. Rotation-based techniques are used to increase the heterogeneity of the stacking ensembles. Data reduction makes it possible to classify instances belonging to big datasets. We propose to use an agent-based population learning algorithm for data reduction in the feature and instance dimensions. To diversify the classifier ensembles within the rotation step, principal component analysis and independent component analysis are used as alternatives. The research question addressed in the paper is formulated as follows: can the performance of a classifier using the reduced dataset be improved by integrating the data reduction mechanism with the rotation-based technique and stacking?
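The rotation step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a Rotation Forest style construction (PCA applied to disjoint random feature subsets, assembled into one rotation matrix); the abstract does not give the authors' exact procedure, and the function name `pca_rotation_matrix` and the synthetic data are hypothetical.

```python
import numpy as np

def pca_rotation_matrix(X, n_subsets, rng):
    """Build a rotation matrix from PCA on disjoint random feature subsets.

    Assumption: this mirrors the general Rotation Forest idea, not
    necessarily the procedure used in the paper.
    """
    n_features = X.shape[1]
    # Randomly partition the feature indices into disjoint subsets.
    subsets = np.array_split(rng.permutation(n_features), n_subsets)
    R = np.zeros((n_features, n_features))
    for idx in subsets:
        block = X[:, idx] - X[:, idx].mean(axis=0)
        # Right singular vectors of the centered block are its principal axes.
        _, _, vt = np.linalg.svd(block, full_matrices=True)
        R[np.ix_(idx, idx)] = vt.T
    return R

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))          # synthetic data, illustration only
R = pca_rotation_matrix(X, n_subsets=3, rng=rng)
X_rot = X @ R                          # rotated features for one ensemble member
```

Drawing a different random partition for each ensemble member gives each base classifier a differently rotated view of the same data, which is one way to obtain the heterogeneity that the abstract attributes to rotation.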

Comparative Analysis of Machine Learning Techniques with Principal Component Analysis on Kidney and Heart Disease

10.1109/icesc51422.2021.9533011 ◽ 2021

Author(s): Reena Chandra ◽ Manoj Kapil ◽ Avinash Sharma

Keyword(s): Machine Learning ◽ Principal Component Analysis ◽ Heart Disease ◽ Comparative Analysis ◽ Principal Component ◽ Component Analysis ◽ Machine Learning Techniques ◽ Learning Techniques

A New Hybrid Approach to Forecast Wind Power for Large Scale Wind Turbine Data Using Deep Learning with TensorFlow Framework and Principal Component Analysis

Energies ◽ 10.3390/en12122229 ◽ 2019 ◽ Vol 12 (12) ◽ pp. 2229 ◽ Cited By ~ 3

Author(s): Mansoor Khan ◽ Tianqi Liu ◽ Farhan Ullah

Keyword(s): Principal Component Analysis ◽ Renewable Energy ◽ Deep Learning ◽ Wind Power ◽ Learning Algorithm ◽ Hybrid Approach ◽ Principal Component ◽ Component Analysis ◽ Wind Data ◽ Deep Learning Algorithm

Wind power forecasting plays a vital role in renewable energy production. Accurately forecasting wind energy is a significant challenge because of the uncertain and complex behavior of wind signals, so accurate prediction methods are required. This paper presents a new hybrid approach that combines principal component analysis (PCA) and deep learning to uncover hidden patterns in wind data and to forecast wind power accurately. PCA is applied to the wind data to extract hidden features and identify meaningful information. It is also used to remove high correlation among the values. An optimized deep learning algorithm built on the TensorFlow framework is then used to forecast wind power from the significant features. Finally, the deep learning algorithm is fine-tuned with learning error rate, optimizer function, dropout layer, activation and loss function. The algorithm uses a neural network and intelligent algorithm to predict the wind signals. The proposed idea is applied to three different datasets (hourly, monthly, yearly) gathered from the National Renewable Energy Laboratory (NREL) transforming energy database. The forecasting results show that the proposed approach can accurately predict wind power over spans ranging from hours to years. A comparison with popular state-of-the-art algorithms demonstrates that the proposed approach yields better prediction results.
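The decorrelation role of PCA described in the abstract can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' pipeline; the variable names (`speed`, `gust`, `temp`) and their distributions are hypothetical, and the deep learning stage is omitted.

```python
import numpy as np

def pca_transform(X, n_components):
    """Project centered data onto its leading principal components via SVD."""
    Xc = X - X.mean(axis=0)
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ vt[:n_components].T
    explained_ratio = (s ** 2) / np.sum(s ** 2)
    return scores, explained_ratio[:n_components]

rng = np.random.default_rng(0)
# Hypothetical, synthetic stand-ins for wind measurements.
speed = rng.normal(8.0, 2.0, size=500)
gust = 1.3 * speed + rng.normal(0.0, 0.3, size=500)   # strongly correlated with speed
temp = rng.normal(15.0, 5.0, size=500)
X = np.column_stack([speed, gust, temp])

scores, ratio = pca_transform(X, n_components=2)
score_cov = np.cov(scores, rowvar=False)   # off-diagonal entries are numerically zero
```

The resulting component scores are mutually uncorrelated, so a downstream model receives fewer, non-redundant inputs. In practice the features would usually be standardized first when they are measured in different units.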

Image Denoising Algorithm Combined with SGK Dictionary Learning and Principal Component Analysis Noise Estimation

Mathematical Problems in Engineering ◽ 10.1155/2018/1259703 ◽ 2018 ◽ Vol 2018 ◽ pp. 1-10

Author(s): Wenjing Zhao ◽ Yue Chi ◽ Yatong Zhou ◽ Cheng Zhang

Keyword(s): Principal Component Analysis ◽ Standard Deviation ◽ Dictionary Learning ◽ Learning Algorithm ◽ Principal Component ◽ Estimation Algorithm ◽ Component Analysis ◽ Noise Estimation ◽ Noise Standard Deviation ◽ Estimation Algorithms

The SGK (sequential generalization of K-means) dictionary learning denoising algorithm is characterized by fast denoising speed and excellent denoising performance. However, the noise standard deviation must be known in advance when the SGK algorithm is used to process an image. This paper presents a denoising algorithm that combines SGK dictionary learning with principal component analysis (PCA) noise estimation. First, the noise standard deviation of the image is estimated with the PCA noise estimation algorithm. The estimate is then passed to the SGK dictionary learning algorithm. Experimental results show the following: (1) The SGK algorithm has the best denoising performance compared with the other three dictionary learning algorithms. (2) The SGK algorithm combined with PCA is superior to the SGK algorithm combined with other noise estimation algorithms. (3) Compared with the original SGK algorithm, the proposed algorithm achieves higher PSNR and better denoising performance.
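The idea behind PCA-based noise estimation can be sketched in a simplified form. The sketch assumes additive white Gaussian noise and takes the smallest eigenvalue of the patch covariance matrix as the noise variance; the abstract does not specify the exact estimator used in the paper, so this is an illustrative assumption, and `estimate_noise_std` is a hypothetical name.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def estimate_noise_std(image, patch=5):
    """Estimate the noise standard deviation from overlapping image patches.

    Simplifying assumption: clean patches lie in a low-dimensional subspace,
    so the smallest eigenvalue of the patch covariance reflects noise only.
    """
    windows = sliding_window_view(image, (patch, patch))
    patches = windows.reshape(-1, patch * patch)
    cov = np.cov(patches, rowvar=False)
    smallest = np.linalg.eigvalsh(cov)[0]   # eigenvalues come back in ascending order
    return float(np.sqrt(max(smallest, 0.0)))

rng = np.random.default_rng(0)
rows, cols = np.mgrid[0:128, 0:128]
clean = 0.5 * rows + 0.25 * cols            # synthetic smooth test image
true_sigma = 10.0
noisy = clean + rng.normal(0.0, true_sigma, size=clean.shape)

sigma_hat = estimate_noise_std(noisy)
print(f"true sigma = {true_sigma}, estimated sigma = {sigma_hat:.2f}")
```

Because the smallest sample eigenvalue is biased low, this simple estimator tends to return a value somewhat below the true standard deviation. On textured images the low-dimensional assumption may not hold, which is one reason practical estimators select patches more carefully.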

Exploring High-Dimensional Biological Data with Sparse Contrastive Principal Component Analysis

10.1101/836650 ◽ 2019 ◽ Cited By ~ 1

Author(s): Philippe Boileau ◽ Nima S. Hejazi ◽ Sandrine Dudoit

Keyword(s): Principal Component Analysis ◽ Dimensionality Reduction ◽ High Throughput Sequencing ◽ Principal Component ◽ Component Analysis ◽ Biological Data ◽ Sequencing Data ◽ Microarray Gene Expression ◽ Biological Signal ◽ Reduction Techniques

Motivation: Statistical analyses of high-throughput sequencing data have re-shaped the biological sciences. In spite of myriad advances, recovering interpretable biological signal from data corrupted by technical noise remains a prevalent open problem. Several classes of procedures, among them classical dimensionality reduction techniques and others incorporating subject-matter knowledge, have provided effective advances; however, no procedure currently satisfies the dual objectives of recovering stable and relevant features simultaneously.

Results: Inspired by recent proposals for making use of control data in the removal of unwanted variation, we propose a variant of principal component analysis, sparse contrastive principal component analysis, that extracts sparse, stable, interpretable, and relevant biological signal. The new methodology is compared to competing dimensionality reduction approaches through a simulation study as well as via analyses of several publicly available protein expression, microarray gene expression, and single-cell transcriptome sequencing datasets.

Availability: A free and open-source software implementation of the methodology, the scPCA R package, is made available via the Bioconductor Project. Code for all analyses presented in the paper is also available via GitHub.
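The contrastive part of the method can be illustrated with a minimal sketch. It covers only plain contrastive PCA (leading eigenvectors of the target covariance minus a scaled background covariance) and omits the sparsity step; it is written in Python with NumPy and is not the API of the scPCA R package. The function name, the `alpha` parameter, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def contrastive_pca(target, background, alpha, n_components):
    """Return leading eigenvectors of cov(target) - alpha * cov(background)."""
    c_target = np.cov(target, rowvar=False)
    c_background = np.cov(background, rowvar=False)
    contrast = c_target - alpha * c_background
    eigvals, eigvecs = np.linalg.eigh(contrast)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvecs[:, order]

rng = np.random.default_rng(0)
scales = np.array([4.0, 1.0, 1.0, 1.0, 1.0])
# Feature 0 carries large nuisance variation shared by both datasets.
background = rng.normal(size=(2000, 5)) * scales
target = rng.normal(size=(2000, 5)) * scales
# Feature 1 separates two hypothetical subgroups, in the target data only.
target[:, 1] += 3.0 * np.repeat([-1.0, 1.0], 1000)

pca_dir = contrastive_pca(target, background, alpha=0.0, n_components=1)[:, 0]
cpca_dir = contrastive_pca(target, background, alpha=1.0, n_components=1)[:, 0]
```

With `alpha=0.0` the function reduces to ordinary PCA, whose leading direction follows the shared nuisance variation in feature 0. With `alpha=1.0` the background variation is subtracted and the leading direction aligns with feature 1, the signal specific to the target data.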

Comparison of Different Training Data Reduction Approaches for Fast Support Vector Machines Based on Principal Component Analysis and Distance Based Measurements

International Journal of Computational and Experimental Science and Engineering ◽ 10.22399/ijcesen.374222 ◽ 2018 ◽ Vol 4 (1) ◽ pp. 1-5 ◽ Cited By ~ 1

Author(s): Gür Emre Güraksın ◽ Harun Uğuz

Keyword(s): Principal Component Analysis ◽ Support Vector Machines ◽ Data Reduction ◽ Principal Component ◽ Component Analysis ◽ Training Data ◽ Support Vector ◽ Vector Machines

In-game behaviour analysis of football players using machine learning techniques based on player statistics

International Journal of Sports Science & Coaching ◽ 10.1177/1747954120959762 ◽ 2020 ◽ pp. 174795412095976

Author(s): Abraham García-Aliaga ◽ Moisés Marquina ◽ Javier Coterón ◽ Asier Rodríguez-González ◽ Sergio Luengo-Sánchez

Keyword(s): Machine Learning ◽ Learning Algorithm ◽ Machine Learning Algorithms ◽ Machine Learning Techniques ◽ Football Players ◽ Machine Learning Algorithm ◽ Reduction Techniques ◽ Learning Techniques ◽ Dimensionality Reduction Techniques ◽ Player Positions

The purpose of this research was to determine the on-field playing positions of a group of football players based on their technical-tactical behaviour using machine learning algorithms. Each player was characterized by a set of 52 non-spatiotemporal descriptors, including offensive, defensive and build-up variables, computed from OPTA's on-ball event records of matches in 18 national leagues between the 2012 and 2019 seasons. Dimensionality reduction techniques were used to test whether positions could be identified from the players' statistical performance. To better understand the differences between player positions, the most discriminatory variables for each group were obtained as a set of rules discovered by RIPPER, a machine learning algorithm. Combining both techniques yielded useful conclusions for enhancing player performance and identifying positions on the field. The study demonstrates the suitability and potential of artificial intelligence for characterizing players' positions according to their technical-tactical behaviour, providing valuable information to professionals in this sport.
