Hybrid Reduction Techniques with Covariate Shift Optimization in High-Dimensional Track Geometry

Author(s): Ibrahim Balogun, Nii Attoh-Okine

Abstract. In track geometry, safety takes precedence over other requirements because a shortfall often leads to unrecoverable loss. Track geometry is widely used as the index for safety evaluation, whether corrective or predictive, and for selecting the appropriate maintenance regime based on track conditions. A recent study has shown that track defect probability thresholds are best explored using a hybrid index; hence, a dimension reduction technique that combines both safety components and geometry quality is needed. Representing track parameters in a reduced dimensional space without first evaluating covariate shift can distort the overall distribution, since underlying discrepancies between datasets undermine prediction accuracy. In this study, the authors apply a covariate shift framework to the track geometry parameters before applying dimension reduction techniques. While principal component analysis (PCA) and t-distributed stochastic neighbour embedding (t-SNE) are both viable techniques, expressing the distribution of the parameters through correlations in their embedded space and their inclination to maximize variance, the distribution shift should be evaluated first. In conclusion, we demonstrate that our framework can detect and evaluate the likelihood of covariate shift in a high-dimensional track geometry defect problem.
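
As a concrete illustration of the pipeline this abstract describes, the sketch below checks for covariate shift with a domain classifier (an AUC near 0.5 suggests the two samples are indistinguishable) before embedding the parameters with PCA and t-SNE. The feature count, sample sizes, classifier, and AUC threshold are illustrative assumptions, not values from the paper.

```python
# A minimal sketch of the shift-then-reduce pipeline; all data and
# thresholds below are illustrative stand-ins, not the paper's values.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

def covariate_shift_auc(X_a, X_b):
    """Train a domain classifier to tell the two samples apart.
    An AUC near 0.5 suggests no detectable covariate shift."""
    X = np.vstack([X_a, X_b])
    y = np.concatenate([np.zeros(len(X_a)), np.ones(len(X_b))])
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    return cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()

# Stand-in data: track segments described by 12 geometry parameters.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 12))
X_test = rng.normal(size=(400, 12))

if covariate_shift_auc(X_train, X_test) < 0.6:   # illustrative threshold
    Z_pca = PCA(n_components=2).fit_transform(X_train)
    Z_tsne = TSNE(n_components=2, random_state=0).fit_transform(X_train)
```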

2016 · Vol 27 (5) · pp. 1331-1350
Author(s): Maxime Turgeon, Karim Oualkacha, Antonio Ciampi, Hanane Miftah, Golsa Dehghan, et al.

The genomics era has led to an increase in the dimensionality of data collected in the investigation of biological questions. In this context, dimension-reduction techniques can be used to summarise high-dimensional signals into low-dimensional ones, to further test for association with one or more covariates of interest. This paper revisits one such approach, previously known as principal component of heritability and renamed here as principal component of explained variance (PCEV). As its name suggests, the PCEV seeks a linear combination of outcomes in an optimal manner, by maximising the proportion of variance explained by one or several covariates of interest. By construction, this method optimises power; however, due to its computational complexity, it has unfortunately received little attention in the past. Here, we propose a general analytical PCEV framework that builds on the assets of the original method, i.e. one that is conceptually simple and free of tuning parameters. Moreover, our framework extends the range of applications of the original procedure by providing a computationally simple strategy for high-dimensional outcomes, along with exact and asymptotic testing procedures that drastically reduce its computational cost. We investigate the merits of the PCEV using an extensive set of simulations. Furthermore, the use of the PCEV approach is illustrated using three examples taken from the fields of epigenetics and brain imaging.
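
The PCEV criterion can be written as a generalised eigenvalue problem: the loading vector maximises the ratio of model (explained) to residual covariance of the outcomes. The sketch below is a minimal illustration of that construction for a single covariate; the simulated data and variable names are ours, not the authors'.

```python
# A minimal sketch of the PCEV construction for one covariate: the
# loading w solves a generalised eigenvalue problem between the model
# (explained) and residual covariance of the outcomes. Illustrative only.
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(1)
n, p = 200, 10                        # samples, outcome dimension
x = rng.normal(size=(n, 1))           # covariate of interest
Y = 0.5 * x @ rng.normal(size=(1, p)) + rng.normal(size=(n, p))

# Least-squares fit of Y on x (with intercept), then split the covariance.
X = np.hstack([np.ones((n, 1)), x])
B, *_ = np.linalg.lstsq(X, Y, rcond=None)
fitted = X @ B
resid = Y - fitted
V_model = np.cov(fitted, rowvar=False)
V_resid = np.cov(resid, rowvar=False)

# Leading generalised eigenvector = PCEV loading (eigh sorts ascending).
vals, vecs = eigh(V_model, V_resid)
w = vecs[:, -1]
pcev = Y @ w    # the principal component of explained variance
```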


2013 · Vol 303-306 · pp. 1101-1104
Author(s): Yong De Hu, Jing Chang Pan, Xin Tan

Kernel entropy component analysis (KECA) reveals the structure of the original data through its kernel matrix; this structure is related to the Rényi entropy of the data, and KECA preserves it by keeping the Rényi entropy unchanged. This paper describes the original data with a small number of components for the purpose of dimension reduction. KECA is then applied to the reduction of celestial spectra and compared experimentally with principal component analysis (PCA) and kernel principal component analysis (KPCA). Experimental results show that KECA is an effective method for high-dimensional data reduction.
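
The following sketch illustrates the KECA idea: kernel eigenpairs are ranked by their contribution lambda_i * (1^T e_i)^2 to a Rényi entropy estimate, rather than by eigenvalue alone as in KPCA. The RBF kernel width and the synthetic "spectra" are illustrative assumptions.

```python
# A minimal sketch of KECA: pick the kernel eigenpairs that contribute
# most to the Renyi entropy estimate, not the largest eigenvalues.
# Kernel width and the synthetic data are illustrative choices.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def keca(X, n_components=2, gamma=0.1):
    K = rbf_kernel(X, X, gamma=gamma)
    vals, vecs = np.linalg.eigh(K)                    # ascending eigenvalues
    entropy = vals * vecs.sum(axis=0) ** 2            # entropy contribution per pair
    idx = np.argsort(entropy)[::-1][:n_components]    # top entropy contributors
    return vecs[:, idx] * np.sqrt(np.abs(vals[idx]))  # training-set projection

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 50))    # stand-in for 300 spectra with 50 bins
Z = keca(X, n_components=2)
```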


2013 · Vol 373-375 · pp. 468-472
Author(s): Chun Ling Li, Yu Feng Lu

One's head pose can be estimated from face images. The hidden manifold of head pose in the high-dimensional image space can be successfully embedded into a two-dimensional space using kernel principal component analysis (KPCA). A pose curve is obtained by applying KPCA to the training samples, and a new pose image is projected onto this curve; the pose angle is then estimated by interpolation. This 2-D KPCA approach overcomes the shortcomings of traditional linear methods, and the experimental results show that it estimates head poses effectively. The effect of the choice of kernel function on estimation accuracy is also discussed.
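
A minimal sketch of the pose-curve procedure follows: KPCA embeds the training images in 2-D, the embeddings ordered by known pose angle form the curve, and a new image's angle is interpolated between its nearest curve points. The image size, kernel settings, and angle grid are illustrative assumptions.

```python
# A minimal sketch of the pose-curve idea; data and kernel parameters
# are illustrative stand-ins, not the paper's experimental setup.
import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(3)
angles = np.linspace(-90, 90, 37)            # known training pose angles
X_train = rng.normal(size=(37, 32 * 32))     # stand-in for face images
X_new = rng.normal(size=(1, 32 * 32))

kpca = KernelPCA(n_components=2, kernel="rbf", gamma=1e-4)
curve = kpca.fit_transform(X_train)          # 2-D pose curve
z = kpca.transform(X_new)[0]                 # project the new image

# Nearest curve point, refined by linear interpolation toward the
# closer of its two neighbours.
d = np.linalg.norm(curve - z, axis=1)
i = int(np.argmin(d))
j = i + 1 if i == 0 or (i + 1 < len(d) and d[i + 1] < d[i - 1]) else i - 1
t = d[i] / (d[i] + d[j])
angle_est = (1 - t) * angles[i] + t * angles[j]
```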


2021 · pp. 1321-1333
Author(s): Ghadeer JM Mahdi, Bayda A. Kalaf, Mundher A. Khaleel

In this paper, a new hybridization of supervised principal component analysis (SPCA) and stochastic gradient descent, called SGD-SPCA, is proposed for large real datasets that have a small number of samples in a high-dimensional space. SGD-SPCA is intended as a tool for accurate cancer diagnosis and treatment: it handles large datasets that require many parameters, and it can easily update those parameters when a new observation arrives. Two cancer datasets are used, the first for leukemia and the second for small round blue cell tumors. Simulated datasets are also used to compare principal component analysis (PCA), SPCA, and SGD-SPCA. The results show that SGD-SPCA is more efficient than the other existing methods.
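
The abstract does not spell out the update rule, so the sketch below is one hedged reading of the hybridization: supervised screening of features by association with the outcome, followed by a stochastic-gradient estimate of the leading principal direction (Oja's rule), which can be refreshed one observation at a time. It is an illustration of the idea, not the authors' exact algorithm.

```python
# One hedged reading of SGD-SPCA: supervised screening plus Oja's rule
# for the leading principal direction. Illustrative, not the authors' code.
import numpy as np

def supervised_screen(X, y, k=50):
    """Keep the k features most correlated with the outcome y."""
    Xc, yc = X - X.mean(0), y - y.mean()
    r = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(r)[::-1][:k]

def oja_update(w, x, lr=1e-3):
    """One SGD step toward the leading principal direction."""
    z = x @ w
    w = w + lr * z * (x - z * w)
    return w / np.linalg.norm(w)

rng = np.random.default_rng(4)
X = rng.normal(size=(72, 7000))      # few samples, many genes (illustrative)
y = rng.integers(0, 2, size=72).astype(float)

keep = supervised_screen(X, y, k=50)
Xs = X[:, keep] - X[:, keep].mean(0)
w = np.ones(50) / np.sqrt(50)
for _ in range(20):                  # a few passes over the data
    for x in Xs:
        w = oja_update(w, x)         # cheap update per new observation
scores = Xs @ w                      # supervised principal component scores
```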


Author(s): S. Schmitz, U. Weidner, H. Hammer, A. Thiele

Abstract. In this paper, the nonlinear dimension reduction algorithm Uniform Manifold Approximation and Projection (UMAP) is investigated to visualize the information contained in high-dimensional feature representations of Polarimetric Interferometric Synthetic Aperture Radar (PolInSAR) data. Based on polarimetric parameters, target decomposition methods, and interferometric coherences, a wide range of features is extracted that spans the high-dimensional feature space. UMAP is applied to determine a representation of the data in 2D and 3D Euclidean space that preserves local and global structures of the data and is still suited for classification. The performance of UMAP in terms of generating expressive visualizations is evaluated on PolInSAR data acquired by the F-SAR sensor and compared to that of Principal Component Analysis (PCA), Laplacian Eigenmaps (LE), and t-distributed Stochastic Neighbor Embedding (t-SNE). For this purpose, a visual analysis of the 2D embeddings is performed. In addition, a quantitative analysis is provided to evaluate how well the low-dimensional representations preserve information, in terms of the separability of different land cover classes. The results show that UMAP exceeds the capability of PCA and LE in these regards and is competitive with t-SNE.
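
A minimal sketch of the comparison pipeline follows: embed a feature stack in 2-D with UMAP, PCA, Laplacian Eigenmaps, and t-SNE, then score class separability in each embedding. The synthetic features stand in for the PolInSAR stack, and the k-NN accuracy used as the separability score is our proxy, not necessarily the paper's metric.

```python
# A sketch of the four-way embedding comparison; synthetic features stand
# in for the PolInSAR stack, and k-NN accuracy is our separability proxy.
import numpy as np
import umap                                    # pip install umap-learn
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE, SpectralEmbedding
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
X = rng.normal(size=(1500, 40))                # 40 PolInSAR-style features
y = rng.integers(0, 5, size=1500)              # 5 land cover classes

embeddings = {
    "UMAP": umap.UMAP(n_components=2, random_state=0).fit_transform(X),
    "PCA": PCA(n_components=2).fit_transform(X),
    "LE": SpectralEmbedding(n_components=2, random_state=0).fit_transform(X),
    "t-SNE": TSNE(n_components=2, random_state=0).fit_transform(X),
}
for name, Z in embeddings.items():
    acc = cross_val_score(KNeighborsClassifier(5), Z, y, cv=5).mean()
    print(f"{name}: mean 5-NN accuracy = {acc:.3f}")
```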


2017 · Vol 10 (13) · pp. 355
Author(s): Reshma Remesh, Pattabiraman. V

Dimensionality reduction techniques are used to reduce the complexity of analysing high-dimensional data sets. Raw input data may have many dimensions, and analysis can be slow and predictions unreliable if unnecessary attributes are included. Dimensionality reduction therefore shrinks the input data towards accurate prediction at lower cost. In this paper, the different machine learning approaches used for dimensionality reduction, such as PCA, SVD, LDA, kernel principal component analysis (KPCA), and artificial neural networks, are studied.
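
For reference, the sketch below applies several of the surveyed techniques via scikit-learn; the digits dataset and the 2-D target dimension are illustrative choices (an autoencoder would play the artificial-neural-network role and is omitted for brevity).

```python
# A short sketch of the surveyed reduction techniques with scikit-learn;
# the dataset and target dimension are illustrative choices.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA, TruncatedSVD, KernelPCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_digits(return_X_y=True)   # 64-dimensional inputs, 10 classes

reduced = {
    "PCA": PCA(n_components=2).fit_transform(X),
    "SVD": TruncatedSVD(n_components=2).fit_transform(X),
    "LDA": LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y),
    "KPCA": KernelPCA(n_components=2, kernel="rbf").fit_transform(X),
}
for name, Z in reduced.items():
    print(name, Z.shape)              # each maps 64 dimensions down to 2
```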


2007 · Vol 19 (2) · pp. 513-545
Author(s): Inge Koch, Kanta Naito

This letter is concerned with the problem of selecting the best or most informative dimension for dimension reduction and feature extraction in high-dimensional data. The dimension of the data is reduced by principal component analysis; subsequent application of independent component analysis to the principal component scores determines the most non-Gaussian directions in the lower-dimensional space. A criterion for choosing the optimal dimension based on bias-adjusted skewness and kurtosis is proposed. This new dimension selector is applied to real data sets and compared to existing methods. Simulation studies for a range of densities show that the proposed method performs well and is more appropriate for non-Gaussian data than existing methods.
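
The sketch below illustrates the two-stage construction: PCA for reduction, then ICA on the principal component scores, with directions ranked by a simple skewness/kurtosis non-Gaussianity score. The ranking used here is a generic moment-based criterion, not the bias-adjusted selector proposed in the letter.

```python
# A sketch of PCA-then-ICA with a simple moment-based non-Gaussianity
# ranking; this criterion is generic, not the letter's bias-adjusted one.
import numpy as np
from scipy.stats import skew, kurtosis
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(6)
# Three non-Gaussian coordinates hidden among seventeen Gaussian ones.
X = np.hstack([rng.exponential(size=(500, 3)), rng.normal(size=(500, 17))])

scores = PCA(n_components=10).fit_transform(X)
S = FastICA(n_components=10, random_state=0).fit_transform(scores)

# Rank directions by squared skewness plus squared excess kurtosis.
nongauss = skew(S, axis=0) ** 2 + kurtosis(S, axis=0) ** 2 / 4.0
order = np.argsort(nongauss)[::-1]    # most non-Gaussian directions first
```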

