An Improved High-Dimensional Kriging Surrogate Modeling Method through Principal Component Dimension Reduction

Mathematics ◽  
2021 ◽  
Vol 9 (16) ◽  
pp. 1985
Author(s):  
Yaohui Li ◽  
Junjun Shi ◽  
Zhifeng Yin ◽  
Jingfang Shen ◽  
Yizhong Wu ◽  
...  

The Kriging surrogate model in complex simulation problems uses as few expensive objective evaluations as possible to establish a global or local approximate interpolation. However, due to the inversion of the covariance correlation matrix and the solving of the Kriging-related parameters, the Kriging approximation process for high-dimensional problems is time consuming and sometimes even impossible to construct. For this reason, a high-dimensional Kriging modeling method through principal component dimension reduction (HDKM-PCDR) is proposed by considering the correlation parameters and the design variables of a Kriging model. It uses PCDR to transform the high-dimensional correlation parameter vector in Kriging into a low-dimensional one, which is used to reconstruct a new correlation function. In this way, the time consumed by correlation parameter optimization and correlation function matrix construction in the Kriging modeling process is greatly reduced. Compared with the original Kriging method and the high-dimensional Kriging modeling method based on partial least squares, the proposed method achieves faster modeling under the premise of meeting certain accuracy requirements.
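For illustration, the sketch below captures the core idea under stated assumptions: project the design variables onto a few principal components so that the Kriging correlation function only carries a low-dimensional vector of length-scale parameters. It uses scikit-learn's Gaussian process as the Kriging model on toy data; it is not the authors' HDKM-PCDR algorithm, whose PCDR acts on the correlation parameter vector itself.

```python
# A minimal sketch of the idea (not the authors' exact HDKM-PCDR method):
# reduce the input dimension with PCA so the Kriging correlation function
# only needs k length-scale (theta) parameters instead of 30.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(80, 30))      # 30-D design space (toy)
y = np.sum(X[:, :5] ** 2, axis=1)              # toy expensive objective

k = 5                                          # assumed reduced dimension
pca = PCA(n_components=k).fit(X)
Z = pca.transform(X)                           # low-dimensional inputs

# Anisotropic RBF: only k correlation parameters are optimized here,
# instead of one per original design variable.
gpr = GaussianProcessRegressor(kernel=RBF(length_scale=np.ones(k)))
gpr.fit(Z, y)

X_new = rng.uniform(-1.0, 1.0, size=(5, 30))
print(gpr.predict(pca.transform(X_new)))
```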


Author(s):  
S. Schmitz ◽  
U. Weidner ◽  
H. Hammer ◽  
A. Thiele

Abstract. In this paper, the nonlinear dimension reduction algorithm Uniform Manifold Approximation and Projection (UMAP) is investigated to visualize the information contained in high-dimensional feature representations of Polarimetric Interferometric Synthetic Aperture Radar (PolInSAR) data. Based on polarimetric parameters, target decomposition methods, and interferometric coherences, a wide range of features is extracted that spans the high-dimensional feature space. UMAP is applied to determine a representation of the data in 2D and 3D Euclidean space that preserves local and global structures of the data and is still suited for classification. The performance of UMAP in terms of generating expressive visualizations is evaluated on PolInSAR data acquired by the F-SAR sensor and compared to that of Principal Component Analysis (PCA), Laplacian Eigenmaps (LE), and t-distributed Stochastic Neighbor Embedding (t-SNE). For this purpose, a visual analysis of 2D embeddings is performed. In addition, a quantitative analysis is provided for evaluating the preservation of information in low-dimensional representations with respect to the separability of different land cover classes. The results show that UMAP exceeds the capability of PCA and LE in these regards and is competitive with t-SNE.
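A minimal sketch of such a comparison, assuming the third-party umap-learn package and a random placeholder matrix in place of the extracted PolInSAR features:

```python
# Hedged sketch: 2-D embeddings with UMAP, PCA and t-SNE for visual
# comparison. `features` is a stand-in for the real PolInSAR feature
# stack (n_pixels x n_features).
import numpy as np
import umap                                   # pip install umap-learn
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 40))        # placeholder feature stack

emb_umap = umap.UMAP(n_components=2).fit_transform(features)
emb_pca  = PCA(n_components=2).fit_transform(features)
emb_tsne = TSNE(n_components=2, init="pca").fit_transform(features)
print(emb_umap.shape, emb_pca.shape, emb_tsne.shape)
```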



2013 ◽  
Vol 303-306 ◽  
pp. 1101-1104 ◽  
Author(s):  
Yong De Hu ◽  
Jing Chang Pan ◽  
Xin Tan

Kernel entropy component analysis (KECA) reveals the structure of the original data through the kernel matrix. This structure is related to the Renyi entropy of the data, and KECA preserves it by keeping the data's Renyi entropy unchanged. This paper describes the original data by a few components for the purpose of dimension reduction. KECA is then applied to celestial spectrum reduction and compared with Principal Component Analysis (PCA) and Kernel Principal Component Analysis (KPCA) in experiments. Experimental results show that KECA is an effective method for high-dimensional data reduction.
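A minimal sketch of the usual KECA recipe, on toy data rather than celestial spectra: rank the kernel eigenpairs by their contribution to the Renyi entropy estimate rather than by eigenvalue alone, and keep the top ones.

```python
# Hedged KECA sketch: the entropy contribution of eigenpair i is
# lambda_i * (1^T e_i)^2, so components are selected by that score,
# not by eigenvalue magnitude as in kernel PCA.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))                # stand-in for spectra

K = rbf_kernel(X, gamma=0.01)                 # kernel (Gram) matrix
lam, E = np.linalg.eigh(K)                    # ascending eigenvalues
lam, E = lam[::-1], E[:, ::-1]                # sort descending

entropy = lam * (E.sum(axis=0) ** 2)          # Renyi entropy contributions
top = np.argsort(entropy)[::-1][:3]           # keep 3 components

Z = E[:, top] * np.sqrt(np.maximum(lam[top], 0.0))
print(Z.shape)                                # (200, 3) reduced data
```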



2016 ◽  
Vol 27 (5) ◽  
pp. 1331-1350 ◽  
Author(s):  
Maxime Turgeon ◽  
Karim Oualkacha ◽  
Antonio Ciampi ◽  
Hanane Miftah ◽  
Golsa Dehghan ◽  
...  

The genomics era has led to an increase in the dimensionality of data collected in the investigation of biological questions. In this context, dimension-reduction techniques can be used to summarise high-dimensional signals into low-dimensional ones, which can then be tested for association with one or more covariates of interest. This paper revisits one such approach, previously known as principal component of heritability and renamed here as principal component of explained variance (PCEV). As its name suggests, the PCEV seeks an optimal linear combination of outcomes by maximising the proportion of variance explained by one or several covariates of interest. By construction, this method optimises power; however, due to its computational complexity, it has received little attention in the past. Here, we propose a general analytical PCEV framework that builds on the assets of the original method, i.e., its conceptual simplicity and freedom from tuning parameters. Moreover, our framework extends the range of applications of the original procedure by providing a computationally simple strategy for high-dimensional outcomes, along with exact and asymptotic testing procedures that drastically reduce its computational cost. We investigate the merits of the PCEV using an extensive set of simulations. Furthermore, the use of the PCEV approach is illustrated using three examples taken from the fields of epigenetics and brain imaging.
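A rough sketch of the PCEV criterion as defined above, assembled from the definition rather than from the authors' exact estimator or testing procedures: the optimal loading vector is the leading generalized eigenvector of the model and residual covariance pair.

```python
# Hedged PCEV sketch on simulated data: find w maximizing the variance
# of Y @ w explained by covariates X, via the generalized eigenproblem
# V_model w = rho * V_resid w.
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
n, p = 300, 20
X = rng.normal(size=(n, 2))                   # covariates of interest
Y = X @ rng.normal(size=(2, p)) + rng.normal(size=(n, p))

Yc = Y - Y.mean(axis=0)                       # center outcomes
Xc = X - X.mean(axis=0)                       # center covariates
B, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)   # multivariate regression
fitted = Xc @ B
resid = Yc - fitted

V_model = fitted.T @ fitted / n               # variance explained by X
V_resid = resid.T @ resid / n                 # residual variance

rho, W = eigh(V_model, V_resid)               # generalized eigenproblem
w = W[:, -1]                                  # PCEV loadings
pcev = Yc @ w                                 # the PCEV itself
print(rho[-1])                                # maximized variance ratio
```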



2019 ◽  
Vol 9 (18) ◽  
pp. 3723
Author(s):  
Sharif ◽  
Mumtaz ◽  
Shafiq ◽  
Riaz ◽  
Ali ◽  
...  

The rise of social media has led to an increasing online cyber-war via hate and violent comments or speeches, and even slick videos that promote extremism and radicalization. Sensing cyber-extreme content on microblogging sites, specifically Twitter, is a challenging and evolving research area, owing to the short, noisy, context-dependent, and dynamic nature of the content. The related tweets were crawled using query words and then carefully labelled into two classes: Extreme (having two sub-classes: pro-Afghanistan government and pro-Taliban) and Neutral. An Exploratory Data Analysis (EDA) using Principal Component Analysis (PCA) was performed on the tweet data (having Term Frequency—Inverse Document Frequency (TF-IDF) features) to reduce the high-dimensional data space into a low-dimensional (usually 2-D or 3-D) space. PCA-based visualization has shown better cluster separation between the two classes (extreme and neutral), whereas cluster separation within the sub-classes of the extreme class was not clear. The paper also discusses the pros and cons of applying PCA as an EDA technique in the context of textual data, which is usually represented by a high-dimensional feature set. Furthermore, classification algorithms such as naïve Bayes, K Nearest Neighbors (KNN), random forest, and Support Vector Machine (SVM), as well as ensemble classification methods (with bagging and boosting), were applied with PCA-based reduced features and with the complete set of features (TF-IDF features extracted from n-gram terms in the tweets). The analysis has shown that the SVM achieved an average accuracy of 84%, compared with the other classification models. It is pertinent to mention that this is novel research work on Twitter content analysis in the context of the Afghanistan war zone using machine learning methods.
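A hedged sketch of such a pipeline on toy texts (the labelled tweet corpus itself is not reproduced here): TF-IDF features, PCA for the exploratory 2-D view, and an SVM classifier.

```python
# Hedged sketch: TF-IDF -> PCA (for EDA) and TF-IDF -> SVM (for
# classification), on invented placeholder texts and labels.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

texts = ["support the government forces", "praise for the taliban",
         "weather report for kabul", "market prices are stable",
         "government troops advance", "militants release a video"]
labels = [1, 1, 0, 0, 1, 1]                   # 1 = extreme, 0 = neutral

tfidf = TfidfVectorizer(ngram_range=(1, 2)).fit_transform(texts)
Z = PCA(n_components=2).fit_transform(tfidf.toarray())  # 2-D EDA view

scores = cross_val_score(SVC(), tfidf.toarray(), labels, cv=2)
print(Z.shape, scores.mean())
```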



2015 ◽  
Vol 731 ◽  
pp. 120-123
Author(s):  
Song Hua He ◽  
Qiao Chen ◽  
Gang Zhang ◽  
Jiang Duan

Two new metameric black spectral dimension reduction methods based on color difference optimization are presented, and their dimension reduction effects are compared in terms of colorimetric and spectral accuracy. The first method decomposes the original spectrum into the fundamental spectrum and the metameric black spectrum using R-matrix theory, and then determines the basis vectors that linearly express the fundamental spectrum and the metameric black spectrum, respectively. The second method applies the principal component method to the original spectrum to obtain the first three eigenvectors as basis vectors of the fundamental spectrum, and then calculates the fundamental spectrum using the tristimulus values and the basis vectors of the original spectrum. Experimental results show that the low-dimensional linear model built by the second method can improve spectral and colorimetric accuracy and satisfy the requirements of spectral color reproduction.
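For the first method, a minimal sketch of the underlying R-matrix (Cohen's matrix-R) decomposition, with a random placeholder standing in for the illuminant-weighted colour-matching functions:

```python
# Hedged sketch of the R-matrix split of a spectrum into its fundamental
# component and a metameric black. `A` is a placeholder for the real
# 31 x 3 illuminant-weighted colour-matching functions.
import numpy as np

rng = np.random.default_rng(0)
n_wl = 31                                     # e.g. 400-700 nm, 10 nm steps
A = rng.uniform(size=(n_wl, 3))               # placeholder CMFs
s = rng.uniform(size=n_wl)                    # a reflectance spectrum

# Projection matrix R = A (A^T A)^{-1} A^T
R = A @ np.linalg.solve(A.T @ A, A.T)
fundamental = R @ s                           # carries the colour signal
black = s - fundamental                       # invisible to the observer

# Both spectra produce the same tristimulus values, so the residual
# component is indeed a metameric black:
print(np.allclose(A.T @ s, A.T @ fundamental))   # True
```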



2013 ◽  
Vol 2013 ◽  
pp. 1-8 ◽  
Author(s):  
Paul T. Pearson

This paper develops a process whereby a high-dimensional clustering problem is solved using a neural network and a low-dimensional cluster diagram of the results is produced using the Mapper method from topological data analysis. The low-dimensional cluster diagram makes the neural network's solution to the high-dimensional clustering problem easy to visualize, interpret, and understand. As a case study, a clustering problem from a diabetes study is solved using a neural network. The clusters in this neural network are visualized using the Mapper method during several stages of the iterative process used to construct the neural network. The neural network and Mapper clustering diagram results for the diabetes study are validated by comparison to principal component analysis.
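A hedged sketch of this workflow on toy data, assuming the third-party kmapper package (not the paper's own Mapper implementation), with the network's output probabilities used as the Mapper lens:

```python
# Hedged sketch: train a small neural network, then build a Mapper
# cluster graph of the input space using the network output as the lens.
import kmapper as km                          # pip install kmapper
from sklearn.datasets import make_blobs
from sklearn.neural_network import MLPClassifier

X, y = make_blobs(n_samples=300, centers=3, n_features=8, random_state=0)
net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000,
                    random_state=0).fit(X, y)

mapper = km.KeplerMapper(verbose=0)
lens = net.predict_proba(X)[:, :2]            # network output as lens
graph = mapper.map(lens, X,
                   cover=km.Cover(n_cubes=10, perc_overlap=0.3))
print(len(graph["nodes"]), "Mapper nodes")
mapper.visualize(graph, path_html="clusters.html")
```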



2019 ◽  
Vol 8 (S3) ◽  
pp. 66-71
Author(s):  
T. Sudha ◽  
P. Nagendra Kumar

Data mining is one of the major areas of research, and clustering is one of its main functionalities. High dimensionality is one of the main issues in clustering, and dimensionality reduction can be used as a solution to this problem. The present work makes a comparative study of dimensionality reduction techniques, namely t-distributed stochastic neighbour embedding (t-SNE) and probabilistic principal component analysis (PPCA), in the context of clustering. High-dimensional data have been reduced to low-dimensional data using both techniques, and cluster analysis has been performed on the high-dimensional data as well as on the low-dimensional data sets, with a varying number of clusters. Mean squared error, time, and space have been considered as parameters for comparison. The results show that the time taken to reduce the high-dimensional data using probabilistic principal component analysis is higher than the time taken using t-distributed stochastic neighbour embedding, whereas the storage space required by the data set reduced through probabilistic principal component analysis is less than that required by the data set reduced through t-distributed stochastic neighbour embedding.
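A minimal sketch of such a time-and-space comparison on toy data, with scikit-learn's PCA standing in for probabilistic principal component analysis (its likelihood model follows Tipping and Bishop's PPCA):

```python
# Hedged sketch: reduce the same data to 2-D with both methods and
# record wall-clock time and the storage of the reduced arrays.
import time
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 100))              # high-dimensional toy data

t0 = time.perf_counter()
Z_pca = PCA(n_components=2).fit_transform(X)
t_pca = time.perf_counter() - t0

t0 = time.perf_counter()
Z_tsne = TSNE(n_components=2, init="pca").fit_transform(X)
t_tsne = time.perf_counter() - t0

print(f"PCA:   {t_pca:.3f}s, {Z_pca.nbytes} bytes")
print(f"t-SNE: {t_tsne:.3f}s, {Z_tsne.nbytes} bytes")
```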



Symmetry ◽  
2019 ◽  
Vol 11 (1) ◽  
pp. 107 ◽  
Author(s):  
Mujtaba Husnain ◽  
Malik Missen ◽  
Shahzad Mumtaz ◽  
Muhammad Luqman ◽  
Mickaël Coustaty ◽  
...  

We applied t-distributed stochastic neighbor embedding (t-SNE) to visualize Urdu handwritten numerals (or digits). The data set used consists of 28 × 28 images of handwritten Urdu numerals, created by inviting writers from different categories of native Urdu speakers. One of the challenging and critical issues for the correct visualization of Urdu numerals is the shape similarity between some of the digits. This issue was resolved using t-SNE, by exploiting local and global structures of the large data set at different scales. The global structure consists of geometrical features, while the local structure is the pixel-based information for each class of Urdu digits. We introduce a novel approach that fuses these two independent spaces using Euclidean pairwise distances in a highly organized and principled way. The fusion matrix embedded with t-SNE helps to locate each data point in a two- (or three-) dimensional map in a distinctive way. Furthermore, our proposed approach focuses on preserving the local structure of the high-dimensional data while mapping it to a low-dimensional plane. The visualizations produced by t-SNE outperformed other classical techniques such as principal component analysis (PCA) and auto-encoders (AE) on our handwritten Urdu numeral dataset.
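A hedged sketch of the fusion idea with random stand-in features (not the Urdu numeral data set): combine Euclidean pairwise distances from the two spaces and pass the fused matrix to t-SNE as a precomputed metric. The equal weights below are an assumption, not the paper's fusion rule.

```python
# Hedged sketch: fuse distances from a local (pixel) space and a global
# (geometric) space, then embed with t-SNE on the precomputed matrix.
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
pixels = rng.uniform(size=(500, 28 * 28))     # local, pixel-based space
geometry = rng.uniform(size=(500, 12))        # global, geometric space

D_local = pairwise_distances(pixels)
D_global = pairwise_distances(geometry)

# Simple weighted fusion of the two normalized distance spaces:
D = 0.5 * D_local / D_local.max() + 0.5 * D_global / D_global.max()

emb = TSNE(n_components=2, metric="precomputed",
           init="random").fit_transform(D)
print(emb.shape)                              # (500, 2) map
```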



Mathematics ◽  
2021 ◽  
Vol 9 (5) ◽  
pp. 536
Author(s):  
Junjun Shi ◽  
Jingfang Shen ◽  
Yaohui Li

Finding valuable new sampling points and distributing them well in the design space is the key to determining the approximation quality of Kriging. To this end, a high-precision Kriging modeling method based on hybrid sampling criteria (HKM-HS) is proposed. In the HKM-HS method, two infill sampling strategies based on the MSE (Mean Square Error) are optimized to obtain new candidate points. Maximizing the MSE (MMSE) of the Kriging model generates the first candidate point, which is likely to lie in a sparse region. To avoid an ill-conditioned correlation matrix caused by two sampling points lying too close together, the MC (MSE and Correlation function) criterion, formed by combining the MSE and the correlation function through multiplication and division, is minimized to generate the second candidate point. Furthermore, a new screening method is used to select the final expensive evaluation point from the two candidates. Finally, test results on sixteen benchmark functions and a house heating case show that the HKM-HS method can effectively enhance the modeling accuracy and stability of Kriging in contrast with other approximate modeling methods.
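A rough sketch of MSE-based infill sampling in this spirit, on a toy objective. The abstract does not spell out the MC criterion, so the ratio used below (correlation to existing points divided by predicted MSE) is a guess at one plausible form, not the authors' formula.

```python
# Hedged sketch: MMSE picks a point of maximum predictive variance;
# a guessed MC-style ratio picks a point far from existing samples
# while its MSE remains large.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(15, 2))
y = np.sum(X ** 2, axis=1)                    # toy expensive objective

gpr = GaussianProcessRegressor(kernel=RBF(length_scale=1.0)).fit(X, y)

cand = rng.uniform(-2.0, 2.0, size=(2000, 2))   # candidate pool
_, std = gpr.predict(cand, return_std=True)
mse = std ** 2

x1 = cand[np.argmax(mse)]                     # MMSE: sparse-region point

corr = gpr.kernel_(cand, X).max(axis=1)       # closeness to existing points
mc = corr / np.maximum(mse, 1e-12)            # guessed MC-style criterion
x2 = cand[np.argmin(mc)]                      # far away and still uncertain
print(x1, x2)
```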


