An Improved High-Dimensional Kriging Surrogate Modeling Method through Principal Component Dimension Reduction

Mathematics ◽  
2021 ◽  
Vol 9 (16) ◽  
pp. 1985
Author(s):  
Yaohui Li ◽  
Junjun Shi ◽  
Zhifeng Yin ◽  
Jingfang Shen ◽  
Yizhong Wu ◽  
...  

The Kriging surrogate model in complex simulation problems uses as few expensive objective evaluations as possible to establish a global or local approximate interpolation. However, due to the inversion of the covariance correlation matrix and the solving of the Kriging-related parameters, the Kriging approximation process for high-dimensional problems is time consuming and sometimes even impossible to construct. For this reason, a high-dimensional Kriging modeling method through principal component dimension reduction (HDKM-PCDR) is proposed by considering the correlation parameters and the design variables of a Kriging model. It uses PCDR to transform the high-dimensional correlation parameter vector in Kriging into a low-dimensional one, which is used to reconstruct a new correlation function. In this way, the time consumed by correlation parameter optimization and correlation function matrix construction in the Kriging modeling process is greatly reduced. Compared with the original Kriging method and the high-dimensional Kriging modeling method based on partial least squares, the proposed method achieves faster modeling under the premise of meeting certain accuracy requirements.
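For illustration, the sketch below captures the core idea under stated assumptions: project the design variables onto a few principal components so that the Kriging correlation function only carries a low-dimensional vector of length-scale parameters. It uses scikit-learn's Gaussian process as the Kriging model on toy data; it is not the authors' HDKM-PCDR algorithm, whose PCDR acts on the correlation parameter vector itself.

```python
# A minimal sketch of the idea (not the authors' exact HDKM-PCDR method):
# reduce the input dimension with PCA so the Kriging correlation function
# only needs k length-scale (theta) parameters instead of 30.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(80, 30))      # 30-D design space (toy)
y = np.sum(X[:, :5] ** 2, axis=1)              # toy expensive objective

k = 5                                          # assumed reduced dimension
pca = PCA(n_components=k).fit(X)
Z = pca.transform(X)                           # low-dimensional inputs

# Anisotropic RBF: only k correlation parameters are optimized here,
# instead of one per original design variable.
gpr = GaussianProcessRegressor(kernel=RBF(length_scale=np.ones(k)))
gpr.fit(Z, y)

X_new = rng.uniform(-1.0, 1.0, size=(5, 30))
print(gpr.predict(pca.transform(X_new)))
```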


Author(s):  
S. Schmitz ◽  
U. Weidner ◽  
H. Hammer ◽  
A. Thiele

Abstract. In this paper, the nonlinear dimension reduction algorithm Uniform Manifold Approximation and Projection (UMAP) is investigated to visualize the information contained in high-dimensional feature representations of Polarimetric Interferometric Synthetic Aperture Radar (PolInSAR) data. Based on polarimetric parameters, target decomposition methods, and interferometric coherences, a wide range of features is extracted that spans the high-dimensional feature space. UMAP is applied to determine a representation of the data in 2D and 3D Euclidean space that preserves local and global structures of the data and is still suited for classification. The performance of UMAP in terms of generating expressive visualizations is evaluated on PolInSAR data acquired by the F-SAR sensor and compared to that of Principal Component Analysis (PCA), Laplacian Eigenmaps (LE), and t-distributed Stochastic Neighbor Embedding (t-SNE). For this purpose, a visual analysis of 2D embeddings is performed. In addition, a quantitative analysis is provided for evaluating the preservation of information in low-dimensional representations with respect to the separability of different land cover classes. The results show that UMAP exceeds the capability of PCA and LE in these regards and is competitive with t-SNE.
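A minimal sketch of such a comparison, assuming the third-party umap-learn package and a random placeholder matrix in place of the extracted PolInSAR features:

```python
# Hedged sketch: 2-D embeddings with UMAP, PCA and t-SNE for visual
# comparison. `features` is a stand-in for the real PolInSAR feature
# stack (n_pixels x n_features).
import numpy as np
import umap                                   # pip install umap-learn
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 40))        # placeholder feature stack

emb_umap = umap.UMAP(n_components=2).fit_transform(features)
emb_pca  = PCA(n_components=2).fit_transform(features)
emb_tsne = TSNE(n_components=2, init="pca").fit_transform(features)
print(emb_umap.shape, emb_pca.shape, emb_tsne.shape)
```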



2013 ◽  
Vol 303-306 ◽  
pp. 1101-1104 ◽  
Author(s):  
Yong De Hu ◽  
Jing Chang Pan ◽  
Xin Tan

Kernel entropy component analysis (KECA) reveals the structure of the original data through the kernel matrix. This structure is related to the Renyi entropy of the data, and KECA preserves it by keeping the data's Renyi entropy unchanged. This paper describes the original data by a few components for the purpose of dimension reduction. KECA is then applied to celestial spectrum reduction and compared with Principal Component Analysis (PCA) and Kernel Principal Component Analysis (KPCA) in experiments. Experimental results show that KECA is an effective method for high-dimensional data reduction.
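A minimal sketch of the usual KECA recipe, on toy data rather than celestial spectra: rank the kernel eigenpairs by their contribution to the Renyi entropy estimate rather than by eigenvalue alone, and keep the top ones.

```python
# Hedged KECA sketch: the entropy contribution of eigenpair i is
# lambda_i * (1^T e_i)^2, so components are selected by that score,
# not by eigenvalue magnitude as in kernel PCA.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))                # stand-in for spectra

K = rbf_kernel(X, gamma=0.01)                 # kernel (Gram) matrix
lam, E = np.linalg.eigh(K)                    # ascending eigenvalues
lam, E = lam[::-1], E[:, ::-1]                # sort descending

entropy = lam * (E.sum(axis=0) ** 2)          # Renyi entropy contributions
top = np.argsort(entropy)[::-1][:3]           # keep 3 components

Z = E[:, top] * np.sqrt(np.maximum(lam[top], 0.0))
print(Z.shape)                                # (200, 3) reduced data
```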



2016 ◽  
Vol 27 (5) ◽  
pp. 1331-1350 ◽  
Author(s):  
Maxime Turgeon ◽  
Karim Oualkacha ◽  
Antonio Ciampi ◽  
Hanane Miftah ◽  
Golsa Dehghan ◽  
...  

The genomics era has led to an increase in the dimensionality of data collected in the investigation of biological questions. In this context, dimension-reduction techniques can be used to summarise high-dimensional signals into low-dimensional ones, which can then be tested for association with one or more covariates of interest. This paper revisits one such approach, previously known as principal component of heritability and renamed here as principal component of explained variance (PCEV). As its name suggests, the PCEV seeks an optimal linear combination of outcomes by maximising the proportion of variance explained by one or several covariates of interest. By construction, this method optimises power; however, due to its computational complexity, it has received little attention in the past. Here, we propose a general analytical PCEV framework that builds on the assets of the original method, i.e., its conceptual simplicity and freedom from tuning parameters. Moreover, our framework extends the range of applications of the original procedure by providing a computationally simple strategy for high-dimensional outcomes, along with exact and asymptotic testing procedures that drastically reduce its computational cost. We investigate the merits of the PCEV using an extensive set of simulations. Furthermore, the use of the PCEV approach is illustrated using three examples taken from the fields of epigenetics and brain imaging.
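A rough sketch of the PCEV criterion as defined above, assembled from the definition rather than from the authors' exact estimator or testing procedures: the optimal loading vector is the leading generalized eigenvector of the model and residual covariance pair.

```python
# Hedged PCEV sketch on simulated data: find w maximizing the variance
# of Y @ w explained by covariates X, via the generalized eigenproblem
# V_model w = rho * V_resid w.
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
n, p = 300, 20
X = rng.normal(size=(n, 2))                   # covariates of interest
Y = X @ rng.normal(size=(2, p)) + rng.normal(size=(n, p))

Yc = Y - Y.mean(axis=0)                       # center outcomes
Xc = X - X.mean(axis=0)                       # center covariates
B, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)   # multivariate regression
fitted = Xc @ B
resid = Yc - fitted

V_model = fitted.T @ fitted / n               # variance explained by X
V_resid = resid.T @ resid / n                 # residual variance

rho, W = eigh(V_model, V_resid)               # generalized eigenproblem
w = W[:, -1]                                  # PCEV loadings
pcev = Yc @ w                                 # the PCEV itself
print(rho[-1])                                # maximized variance ratio
```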



2019 ◽  
Vol 9 (18) ◽  
pp. 3723
Author(s):  
Sharif ◽  
Mumtaz ◽  
Shafiq ◽  
Riaz ◽  
Ali ◽  
...  

The rise of social media has led to an increasing online cyber-war via hate and violent comments or speeches, and even slick videos that promote extremism and radicalization. Sensing cyber-extreme content on microblogging sites, specifically Twitter, is a challenging and evolving research area, owing to the short, noisy, context-dependent, and dynamic nature of the content. The related tweets were crawled using query words and then carefully labelled into two classes: Extreme (having two sub-classes: pro-Afghanistan government and pro-Taliban) and Neutral. An Exploratory Data Analysis (EDA) using Principal Component Analysis (PCA) was performed on the tweet data (having Term Frequency—Inverse Document Frequency (TF-IDF) features) to reduce the high-dimensional data space into a low-dimensional (usually 2-D or 3-D) space. PCA-based visualization has shown better cluster separation between the two classes (extreme and neutral), whereas cluster separation within the sub-classes of the extreme class was not clear. The paper also discusses the pros and cons of applying PCA as an EDA technique in the context of textual data, which is usually represented by a high-dimensional feature set. Furthermore, classification algorithms such as naïve Bayes, K Nearest Neighbors (KNN), random forest, and Support Vector Machine (SVM), as well as ensemble classification methods (with bagging and boosting), were applied with PCA-based reduced features and with the complete set of features (TF-IDF features extracted from n-gram terms in the tweets). The analysis has shown that the SVM achieved an average accuracy of 84%, compared with the other classification models. It is pertinent to mention that this is novel research work on Twitter content analysis in the context of the Afghanistan war zone using machine learning methods.
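A hedged sketch of such a pipeline on toy texts (the labelled tweet corpus itself is not reproduced here): TF-IDF features, PCA for the exploratory 2-D view, and an SVM classifier.

```python
# Hedged sketch: TF-IDF -> PCA (for EDA) and TF-IDF -> SVM (for
# classification), on invented placeholder texts and labels.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

texts = ["support the government forces", "praise for the taliban",
         "weather report for kabul", "market prices are stable",
         "government troops advance", "militants release a video"]
labels = [1, 1, 0, 0, 1, 1]                   # 1 = extreme, 0 = neutral

tfidf = TfidfVectorizer(ngram_range=(1, 2)).fit_transform(texts)
Z = PCA(n_components=2).fit_transform(tfidf.toarray())  # 2-D EDA view

scores = cross_val_score(SVC(), tfidf.toarray(), labels, cv=2)
print(Z.shape, scores.mean())
```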



2015 ◽  
Vol 731 ◽  
pp. 120-123
Author(s):  
Song Hua He ◽  
Qiao Chen ◽  
Gang Zhang ◽  
Jiang Duan

Two new metameric black spectral dimension reduction methods based on color difference optimization are presented, and their dimension reduction effects are compared in terms of colorimetric and spectral accuracy. The first method decomposes the original spectrum into the fundamental spectrum and the metameric black spectrum using R-matrix theory, and then determines the basis vectors that linearly express the fundamental spectrum and the metameric black spectrum, respectively. The second method applies the principal component method to the original spectrum to obtain the first three eigenvectors as basis vectors of the fundamental spectrum, and then calculates the fundamental spectrum using the tristimulus values and the basis vectors of the original spectrum. Experimental results show that the low-dimensional linear model built by the second method can improve spectral and colorimetric accuracy and satisfy the requirements of spectral color reproduction.
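For the first method, a minimal sketch of the underlying R-matrix (Cohen's matrix-R) decomposition, with a random placeholder standing in for the illuminant-weighted colour-matching functions:

```python
# Hedged sketch of the R-matrix split of a spectrum into its fundamental
# component and a metameric black. `A` is a placeholder for the real
# 31 x 3 illuminant-weighted colour-matching functions.
import numpy as np

rng = np.random.default_rng(0)
n_wl = 31                                     # e.g. 400-700 nm, 10 nm steps
A = rng.uniform(size=(n_wl, 3))               # placeholder CMFs
s = rng.uniform(size=n_wl)                    # a reflectance spectrum

# Projection matrix R = A (A^T A)^{-1} A^T
R = A @ np.linalg.solve(A.T @ A, A.T)
fundamental = R @ s                           # carries the colour signal
black = s - fundamental                       # invisible to the observer

# Both spectra produce the same tristimulus values, so the residual
# component is indeed a metameric black:
print(np.allclose(A.T @ s, A.T @ fundamental))   # True
```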



2013 ◽  
Vol 2013 ◽  
pp. 1-8 ◽  
Author(s):  
Paul T. Pearson

This paper develops a process whereby a high-dimensional clustering problem is solved using a neural network and a low-dimensional cluster diagram of the results is produced using the Mapper method from topological data analysis. The low-dimensional cluster diagram makes the neural network's solution to the high-dimensional clustering problem easy to visualize, interpret, and understand. As a case study, a clustering problem from a diabetes study is solved using a neural network. The clusters in this neural network are visualized using the Mapper method during several stages of the iterative process used to construct the neural network. The neural network and Mapper clustering diagram results for the diabetes study are validated by comparison to principal component analysis.
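A hedged sketch of this workflow on toy data, assuming the third-party kmapper package (not the paper's own Mapper implementation), with the network's output probabilities used as the Mapper lens:

```python
# Hedged sketch: train a small neural network, then build a Mapper
# cluster graph of the input space using the network output as the lens.
import kmapper as km                          # pip install kmapper
from sklearn.datasets import make_blobs
from sklearn.neural_network import MLPClassifier

X, y = make_blobs(n_samples=300, centers=3, n_features=8, random_state=0)
net = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000,
                    random_state=0).fit(X, y)

mapper = km.KeplerMapper(verbose=0)
lens = net.predict_proba(X)[:, :2]            # network output as lens
graph = mapper.map(lens, X,
                   cover=km.Cover(n_cubes=10, perc_overlap=0.3))
print(len(graph["nodes"]), "Mapper nodes")
mapper.visualize(graph, path_html="clusters.html")
```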



2019 ◽  
Vol 8 (S3) ◽  
pp. 66-71
Author(s):  
T. Sudha ◽  
P. Nagendra Kumar

Data mining is one of the major areas of research, and clustering is one of its main functionalities. High dimensionality is one of the main issues in clustering, and dimensionality reduction can be used as a solution to this problem. The present work makes a comparative study of dimensionality reduction techniques, namely t-distributed stochastic neighbour embedding (t-SNE) and probabilistic principal component analysis (PPCA), in the context of clustering. High-dimensional data have been reduced to low-dimensional data using both techniques, and cluster analysis has been performed on the high-dimensional data as well as on the low-dimensional data sets, with a varying number of clusters. Mean squared error, time, and space have been considered as parameters for comparison. The results show that the time taken to reduce the high-dimensional data using probabilistic principal component analysis is higher than the time taken using t-distributed stochastic neighbour embedding, whereas the storage space required by the data set reduced through probabilistic principal component analysis is less than that required by the data set reduced through t-distributed stochastic neighbour embedding.
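A minimal sketch of such a time-and-space comparison on toy data, with scikit-learn's PCA standing in for probabilistic principal component analysis (its likelihood model follows Tipping and Bishop's PPCA):

```python
# Hedged sketch: reduce the same data to 2-D with both methods and
# record wall-clock time and the storage of the reduced arrays.
import time
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 100))              # high-dimensional toy data

t0 = time.perf_counter()
Z_pca = PCA(n_components=2).fit_transform(X)
t_pca = time.perf_counter() - t0

t0 = time.perf_counter()
Z_tsne = TSNE(n_components=2, init="pca").fit_transform(X)
t_tsne = time.perf_counter() - t0

print(f"PCA:   {t_pca:.3f}s, {Z_pca.nbytes} bytes")
print(f"t-SNE: {t_tsne:.3f}s, {Z_tsne.nbytes} bytes")
```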



Symmetry ◽  
2019 ◽  
Vol 11 (1) ◽  
pp. 107 ◽  
Author(s):  
Mujtaba Husnain ◽  
Malik Missen ◽  
Shahzad Mumtaz ◽  
Muhammad Luqman ◽  
Mickaël Coustaty ◽  
...  

We applied t-distributed stochastic neighbor embedding (t-SNE) to visualize Urdu handwritten numerals (or digits). The data set used consists of 28 × 28 images of handwritten Urdu numerals, created by inviting writers from different categories of native Urdu speakers. One of the challenging and critical issues for the correct visualization of Urdu numerals is the shape similarity between some of the digits. This issue was resolved using t-SNE, by exploiting local and global structures of the large data set at different scales. The global structure consists of geometrical features, while the local structure is the pixel-based information for each class of Urdu digits. We introduce a novel approach that fuses these two independent spaces using Euclidean pairwise distances in a highly organized and principled way. The fusion matrix embedded with t-SNE helps to locate each data point in a two- (or three-) dimensional map in a distinctive way. Furthermore, our proposed approach focuses on preserving the local structure of the high-dimensional data while mapping it to a low-dimensional plane. The visualizations produced by t-SNE outperformed other classical techniques such as principal component analysis (PCA) and auto-encoders (AE) on our handwritten Urdu numeral dataset.
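A hedged sketch of the fusion idea with random stand-in features (not the Urdu numeral data set): combine Euclidean pairwise distances from the two spaces and pass the fused matrix to t-SNE as a precomputed metric. The equal weights below are an assumption, not the paper's fusion rule.

```python
# Hedged sketch: fuse distances from a local (pixel) space and a global
# (geometric) space, then embed with t-SNE on the precomputed matrix.
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
pixels = rng.uniform(size=(500, 28 * 28))     # local, pixel-based space
geometry = rng.uniform(size=(500, 12))        # global, geometric space

D_local = pairwise_distances(pixels)
D_global = pairwise_distances(geometry)

# Simple weighted fusion of the two normalized distance spaces:
D = 0.5 * D_local / D_local.max() + 0.5 * D_global / D_global.max()

emb = TSNE(n_components=2, metric="precomputed",
           init="random").fit_transform(D)
print(emb.shape)                              # (500, 2) map
```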



Mathematics ◽  
2021 ◽  
Vol 9 (5) ◽  
pp. 536
Author(s):  
Junjun Shi ◽  
Jingfang Shen ◽  
Yaohui Li

Finding valuable new sampling points and distributing them well in the design space is the key to determining the approximation quality of Kriging. To this end, a high-precision Kriging modeling method based on hybrid sampling criteria (HKM-HS) is proposed. In the HKM-HS method, two infill sampling strategies based on the MSE (Mean Square Error) are optimized to obtain new candidate points. Maximizing the MSE (MMSE) of the Kriging model generates the first candidate point, which is likely to lie in a sparse region. To avoid an ill-conditioned correlation matrix caused by two sampling points lying too close together, the MC (MSE and Correlation function) criterion, formed by combining the MSE and the correlation function through multiplication and division, is minimized to generate the second candidate point. Furthermore, a new screening method is used to select the final expensive evaluation point from the two candidates. Finally, test results on sixteen benchmark functions and a house heating case show that the HKM-HS method can effectively enhance the modeling accuracy and stability of Kriging in contrast with other approximate modeling methods.
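A rough sketch of MSE-based infill sampling in this spirit, on a toy objective. The abstract does not spell out the MC criterion, so the ratio used below (correlation to existing points divided by predicted MSE) is a guess at one plausible form, not the authors' formula.

```python
# Hedged sketch: MMSE picks a point of maximum predictive variance;
# a guessed MC-style ratio picks a point far from existing samples
# while its MSE remains large.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(15, 2))
y = np.sum(X ** 2, axis=1)                    # toy expensive objective

gpr = GaussianProcessRegressor(kernel=RBF(length_scale=1.0)).fit(X, y)

cand = rng.uniform(-2.0, 2.0, size=(2000, 2))   # candidate pool
_, std = gpr.predict(cand, return_std=True)
mse = std ** 2

x1 = cand[np.argmax(mse)]                     # MMSE: sparse-region point

corr = gpr.kernel_(cand, X).max(axis=1)       # closeness to existing points
mc = corr / np.maximum(mse, 1e-12)            # guessed MC-style criterion
x2 = cand[np.argmin(mc)]                      # far away and still uncertain
print(x1, x2)
```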


