An Approach to Data Reduction for Learning from Big Datasets: Integrating Stacking, Rotation, and Agent Population Learning Techniques

Complexity ◽  
2018 ◽  
Vol 2018 ◽  
pp. 1-13 ◽  
Author(s):  
Ireneusz Czarnowski ◽  
Piotr Jędrzejowicz

In the paper, several data reduction techniques for machine learning from big datasets are discussed and evaluated. The approach focuses on combining several techniques, including stacking, rotation, and data reduction, to improve machine classification performance. Stacking is seen as a technique that makes it possible to take advantage of multiple classification models. Rotation-based techniques are used to increase the heterogeneity of the stacking ensembles. Data reduction makes it possible to classify instances belonging to big datasets. We propose using an agent-based population learning algorithm for data reduction in both the feature and instance dimensions. To diversify the classifier ensembles within the rotation, principal component analysis and, alternatively, independent component analysis are used. The research question addressed in the paper is as follows: can the performance of a classifier using the reduced dataset be improved by integrating the data reduction mechanism with the rotation-based technique and stacking?
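
The abstract presents stacking as the way to exploit multiple classifiers. The Python sketch below illustrates only the stacking mechanics. It is not the authors' method: the agent-based population learning reduction and the PCA/ICA rotation are omitted. Instead, hypothetical nearest-centroid base learners each see a different feature subset, standing in for the heterogeneous views that rotation would produce. Their out-of-fold predictions become the training data of a meta-learner, which here is a simple lookup table from combinations of base outputs to the majority class. All function names and the toy data are assumptions for illustration.

```python
from collections import Counter

def fit_centroids(X, y, feats):
    """Hypothetical base learner: nearest centroid on the feature indices in `feats`."""
    sums = {}
    counts = Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(feats))
        for k, f in enumerate(feats):
            acc[k] += row[f]
    centroids = {c: [s / counts[c] for s in acc] for c, acc in sums.items()}
    return centroids, feats

def predict_centroids(model, row):
    centroids, feats = model
    def sq_dist(c):
        return sum((row[f] - m) ** 2 for f, m in zip(feats, centroids[c]))
    # Ties go to the smallest class label (sorted order).
    return min(sorted(centroids), key=sq_dist)

def fit_stacking(X, y, subsets, n_folds=2):
    # Level-1 data: out-of-fold predictions, so no base prediction used to
    # train the meta-learner comes from a model trained on that instance.
    level1 = [None] * len(X)
    for fold in range(n_folds):
        train = [i for i in range(len(X)) if i % n_folds != fold]
        test = [i for i in range(len(X)) if i % n_folds == fold]
        models = [fit_centroids([X[i] for i in train], [y[i] for i in train], s)
                  for s in subsets]
        for i in test:
            level1[i] = tuple(predict_centroids(m, X[i]) for m in models)
    # Meta-learner: majority class seen for each combination of base outputs.
    table = {}
    for key, label in zip(level1, y):
        table.setdefault(key, Counter())[label] += 1
    meta = {key: cnt.most_common(1)[0][0] for key, cnt in table.items()}
    # Base learners are refit on all data for use at prediction time.
    base = [fit_centroids(X, y, s) for s in subsets]
    return base, meta

def predict_stacking(ensemble, row):
    base, meta = ensemble
    key = tuple(predict_centroids(m, row) for m in base)
    if key in meta:
        return meta[key]
    # Combination never seen during training: fall back to a majority vote.
    return Counter(key).most_common(1)[0][0]

X = [[0, 0], [0, 1], [1, 0], [1, 1], [5, 5], [5, 6], [6, 5], [6, 6]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
ensemble = fit_stacking(X, y, subsets=[(0,), (1,), (0, 1)])
print(predict_stacking(ensemble, [0.5, 0.5]), predict_stacking(ensemble, [5.5, 5.2]))  # prints: 0 1
```

Out-of-fold level-1 data is the key design choice. It prevents the meta-learner from learning to trust base outputs that were produced on the base learners' own training instances. A practical stacking ensemble would typically use a trained classifier as the meta-learner rather than a lookup table.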

Energies ◽  
2019 ◽  
Vol 12 (12) ◽  
pp. 2229 ◽  
Author(s):  
Mansoor Khan ◽  
Tianqi Liu ◽  
Farhan Ullah

Wind power forecasting plays a vital role in renewable energy production. Accurately forecasting wind energy is a significant challenge because of the uncertain and complex behavior of wind signals, so accurate prediction methods are required. This paper presents a new hybrid approach combining principal component analysis (PCA) and deep learning to uncover hidden patterns in wind data and forecast wind power accurately. PCA is applied to the wind data to extract hidden features and identify meaningful information; it is also used to remove high correlation among the values. An optimized deep learning algorithm built with the TensorFlow framework is then used to forecast wind power from the significant features. Finally, the deep learning algorithm is fine-tuned with respect to the learning error rate, optimizer function, dropout layer, activation function, and loss function. The algorithm uses a neural network and an intelligent algorithm to predict the wind signals. The proposed approach is applied to three different datasets (hourly, monthly, and yearly) gathered from the National Renewable Energy Laboratory (NREL) transforming energy database. The forecasting results show that the proposed approach can accurately predict wind power over time spans ranging from hours to years. A comparison with popular state-of-the-art algorithms demonstrates that the proposed approach yields better prediction results.
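
To show what "removing high correlation among the values" with PCA means, here is a minimal Python sketch. It assumes only two input variables, so the eigen-decomposition of the covariance matrix has a closed form. Rotating the centred data onto the principal axes yields scores whose sample covariance is zero, and whose variances equal the eigenvalues. The readings and function names are hypothetical. The paper's TensorFlow forecasting network is not shown.

```python
import math

def mean_cov_2d(points):
    """Sample mean and covariance entries (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)

def pca_2d(points):
    """Rotate centred 2-D data onto its principal axes; return eigenvalues and scores."""
    (mx, my), (a, b, c) = mean_cov_2d(points)
    half_trace = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    lam1, lam2 = half_trace + radius, half_trace - radius
    # Eigenvector of [[a, b], [b, c]] for the larger eigenvalue lam1.
    if abs(b) > 1e-12:
        v1 = (b, lam1 - a)
    else:
        v1 = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(*v1)
    v1 = (v1[0] / norm, v1[1] / norm)
    v2 = (-v1[1], v1[0])  # second principal axis, orthogonal to the first
    scores = [((x - mx) * v1[0] + (y - my) * v1[1],
               (x - mx) * v2[0] + (y - my) * v2[1]) for x, y in points]
    return (lam1, lam2), scores

# Hypothetical pair of strongly correlated input variables.
readings = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 8.1), (5.0, 9.8), (6.0, 12.2)]
(lam1, lam2), scores = pca_2d(readings)
_, (s11, s12, s22) = mean_cov_2d(scores)
print(f"eigenvalues: {lam1:.3f}, {lam2:.3f}; covariance of scores: {s12:.2e}")
```

Nearly all of the variance lands on the first score, so a forecasting model could keep only the leading components as decorrelated inputs.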


2018 ◽  
Vol 2018 ◽  
pp. 1-10
Author(s):  
Wenjing Zhao ◽  
Yue Chi ◽  
Yatong Zhou ◽  
Cheng Zhang

The SGK (sequential generalization of K-means) dictionary learning denoising algorithm is characterized by fast denoising speed and excellent denoising performance. However, the noise standard deviation must be known in advance when the SGK algorithm is used to process an image. This paper presents a denoising algorithm that combines SGK dictionary learning with principal component analysis (PCA) noise estimation. First, the noise standard deviation of the image is estimated using the PCA noise estimation algorithm; this estimate is then used by the SGK dictionary learning algorithm. Experimental results show the following: (1) the SGK algorithm has the best denoising performance compared with the other three dictionary learning algorithms; (2) the SGK algorithm combined with PCA is superior to the SGK algorithm combined with other noise estimation algorithms; (3) compared with the original SGK algorithm, the proposed algorithm achieves a higher PSNR and better denoising performance.
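
The abstract does not specify the PCA noise estimator. This Python sketch assumes the common idea behind such estimators. For a locally smooth signal, the least significant principal component of small patches carries mostly noise, so its eigenvalue approximates the noise variance σ². To keep the eigen-decomposition in closed form, the sketch uses 1x2 patches taken from a single synthetic row. Practical estimators work on larger 2-D image patches and usually add patch selection. The signal, the seed, and the function name are hypothetical, and SGK itself is not shown.

```python
import math
import random

def estimate_noise_sigma(row):
    """Estimate Gaussian noise sigma from a 1-D signal via PCA of 1x2 patches."""
    patches = list(zip(row[:-1], row[1:]))  # each sample paired with its right neighbour
    n = len(patches)
    mx = sum(p for p, _ in patches) / n
    my = sum(q for _, q in patches) / n
    a = sum((p - mx) ** 2 for p, _ in patches) / (n - 1)
    c = sum((q - my) ** 2 for _, q in patches) / (n - 1)
    b = sum((p - mx) * (q - my) for p, q in patches) / (n - 1)
    # White noise adds sigma^2 in every direction, while a smooth signal has
    # almost no variance along (1, -1). The smallest eigenvalue of the patch
    # covariance [[a, b], [b, c]] therefore approximates sigma^2.
    lam_min = (a + c) / 2 - math.hypot((a - c) / 2, b)
    return math.sqrt(max(lam_min, 0.0))

rng = random.Random(0)
true_sigma = 5.0
clean = [100 + 50 * math.sin(2 * math.pi * i / 400) for i in range(4000)]
noisy = [v + rng.gauss(0.0, true_sigma) for v in clean]
print(f"estimated sigma: {estimate_noise_sigma(noisy):.2f} (true {true_sigma})")
```

In the pipeline the abstract describes, an estimate like this would be passed to the dictionary learning step in place of a noise level known in advance.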


2019 ◽  
Author(s):  
Philippe Boileau ◽  
Nima S. Hejazi ◽  
Sandrine Dudoit

Motivation: Statistical analyses of high-throughput sequencing data have reshaped the biological sciences. In spite of myriad advances, recovering interpretable biological signal from data corrupted by technical noise remains a prevalent open problem. Several classes of procedures, among them classical dimensionality reduction techniques and others incorporating subject-matter knowledge, have provided effective advances; however, no procedure currently satisfies the dual objectives of recovering stable and relevant features simultaneously.

Results: Inspired by recent proposals for making use of control data in the removal of unwanted variation, we propose a variant of principal component analysis, sparse contrastive principal component analysis, that extracts sparse, stable, interpretable, and relevant biological signal. The new methodology is compared with competing dimensionality reduction approaches through a simulation study as well as via analyses of several publicly available protein expression, microarray gene expression, and single-cell transcriptome sequencing datasets.

Availability: A free and open-source software implementation of the methodology, the scPCA R package, is made available via the Bioconductor Project. Code for all analyses presented in the paper is also available via GitHub.
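
The use of control (background) data can be illustrated with the plain, non-sparse contrastive PCA idea that the sparse variant extends. The direction maximizing target variance minus α times background variance is the top eigenvector of C_target − α·C_background. This Python sketch assumes two features and a fixed contrast parameter α. It omits the sparsity penalty and the choice of α, and it does not reproduce the scPCA R package's API. The toy data and function names are hypothetical.

```python
import math

def cov_2d(points):
    """Sample covariance entries (sxx, sxy, syy) of 2-D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    return sxx, sxy, syy

def top_eigenvector(a, b, c):
    """Unit eigenvector for the largest eigenvalue of symmetric [[a, b], [b, c]]."""
    lam = (a + c) / 2 + math.hypot((a - c) / 2, b)
    if abs(b) > 1e-12:
        v = (b, lam - a)
    else:
        v = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(*v)
    return (v[0] / norm, v[1] / norm)

def contrastive_direction(target, background, alpha):
    """Direction maximising var_target(v) - alpha * var_background(v); alpha=0 is plain PCA."""
    ta, tb, tc = cov_2d(target)
    ba, bb, bc = cov_2d(background)
    return top_eigenvector(ta - alpha * ba, tb - alpha * bb, tc - alpha * bc)

# Toy data: feature 0 carries "technical" variation present in both groups;
# feature 1 carries variation that exists only in the target group.
background = [(-4.0, 0.0), (-2.0, 0.0), (2.0, 0.0), (4.0, 0.0)]
target = [(-4.0, -1.0), (-2.0, 1.0), (2.0, 1.0), (4.0, -1.0)]
print("PCA direction: ", contrastive_direction(target, background, alpha=0.0))
print("cPCA direction:", contrastive_direction(target, background, alpha=1.0))
```

Plain PCA follows the large shared variation along feature 0. Subtracting the background covariance instead exposes the target-specific direction along feature 1.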


Author(s):  
Abraham García-Aliaga ◽  
Moisés Marquina ◽  
Javier Coterón ◽  
Asier Rodríguez-González ◽  
Sergio Luengo-Sánchez

The purpose of this research was to determine the on-field playing positions of a group of football players from their technical-tactical behaviour using machine learning algorithms. Each player was characterized by a set of 52 non-spatiotemporal descriptors, including offensive, defensive, and build-up variables, computed from OPTA’s on-ball event records of matches in 18 national leagues between the 2012 and 2019 seasons. To test whether positions could be identified from the players' statistical performance, dimensionality reduction techniques were used. To better understand the differences between player positions, the most discriminatory variables for each group were obtained as a set of rules discovered by RIPPER, a machine learning algorithm. Combining both techniques yielded useful conclusions for enhancing player performance and identifying positions on the field. The study demonstrates the suitability and potential of artificial intelligence for characterizing players' positions according to their technical-tactical behaviour, providing valuable information to professionals in this sport.


2014 ◽  
Vol 32 (3-4) ◽  
pp. 331-351 ◽  
Author(s):  
Ireneusz Czarnowski ◽  
Piotr Jȩdrzejowicz
