The efficient cross-validation of principal components applied to principal component regression

1995 ◽  
Vol 5 (3) ◽  
pp. 227-235 ◽  
Author(s):  
Bart Mertens ◽  
Tom Fearn ◽  
Michael Thompson
Author(s):  
John Tipton ◽  
Mevin Hooten ◽  
Simon Goring

Abstract. Scientific records of temperature and precipitation have been kept for several hundred years, but for many areas, only a shorter record exists. To understand climate change, there is a need for rigorous statistical reconstructions of the paleoclimate using proxy data. Paleoclimate proxy data are often sparse, noisy, indirect measurements of the climate process of interest, making each proxy uniquely challenging to model statistically. We reconstruct spatially explicit temperature surfaces from sparse and noisy measurements recorded at historical United States military forts and other observer stations from 1820 to 1894. One common method for reconstructing the paleoclimate from proxy data is principal component regression (PCR). With PCR, one learns a statistical relationship between the paleoclimate proxy data and a set of climate observations that are used as patterns for potential reconstruction scenarios. We explore PCR in a Bayesian hierarchical framework, extending classical PCR in a variety of ways. First, we model the latent principal components probabilistically, accounting for measurement error in the observational data. Next, we extend our method to better accommodate outliers that occur in the proxy data. Finally, we explore alternatives to the truncation of lower-order principal components using different regularization techniques. One fundamental challenge in paleoclimate reconstruction efforts is the lack of out-of-sample data for predictive validation. Cross-validation is of potential value, but is computationally expensive and potentially sensitive to outliers in sparse data scenarios. To overcome the limitations that a lack of out-of-sample records presents, we test our methods using a simulation study, applying proper scoring rules including a computationally efficient approximation to leave-one-out cross-validation using the log score to validate model performance. 
The result of our analysis is a spatially explicit reconstruction of spatio-temporal temperature from a very sparse historical record.


Author(s):  
Jihhyeon Yi ◽  
Sungryul Park ◽  
Juah Im ◽  
Seonyeong Jeon ◽  
Gyouhyung Kyung

The purpose of this study was to examine the effects of display curvature and hand length on smartphone usability, which was assessed in terms of grip comfort, immersive feeling, typing performance, and overall satisfaction. A total of 20 younger individuals with a mean (SD) age of 20.8 (2.4) years were divided into three hand-size groups (small: 8, medium: 6, large: 6). Two smartphones of the same size were used: one with a flat display and the other with a side-edge curved display. Three tasks (watching video, calling, and texting) were used to evaluate smartphone usability. The smartphones were used in landscape mode for the first task and in portrait mode for the other two. The flat display smartphone provided higher grip comfort during calling (p = 0.008) and texting (p = 0.006) and higher overall satisfaction (p = 0.0002) than the curved display smartphone. The principal component regression (adjusted R² = 0.49) of overall satisfaction on three principal components composed of the remaining measures showed that the first principal component, on grip comfort, was more important than the other two, on watching experience and texting performance. It is thus necessary to carefully consider the effect of display curvature on grip comfort when applying curved displays to hand-held devices such as smartphones.


1994 ◽  
Vol 48 (1) ◽  
pp. 37-43 ◽  
Author(s):  
M. Blanco ◽  
J. Coello ◽  
H. Iturriaga ◽  
S. Maspoch ◽  
M. Redon

The potential of principal component regression (PCR) for mixture resolution by UV-visible spectrophotometry was assessed. For this purpose, a set of binary mixtures with Gaussian bands was simulated, and the influence of spectral overlap on the precision of quantification was studied. Likewise, the results obtained in the resolution of a mixture of components with extensively overlapped spectra were investigated in terms of spectral noise and the criterion used to select the optimal number of principal components. The model was validated by cross-validation, and the number of significant principal components was determined on the basis of four different criteria. Three types of noise were considered: intrinsic instrumental noise, which was modeled from experimental data provided by an HP 8452A diode array spectrophotometer; constant baseline shifts; and baseline drift. Introducing artificial baseline alterations in some samples of the calibration matrix was found to increase the reliability of the proposed method in routine analysis. The method was applied to the analysis of mixtures of Ti, Al, and Fe by resolving the spectra of their 8-hydroxyquinoline complexes previously extracted into chloroform.
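The abstract does not say which four criteria were used to choose the number of principal components. As a minimal sketch of one common option, the Python code below picks the number of components that minimizes the leave-one-out prediction error sum of squares (PRESS). The function names `pcr_coefficients`, `loo_press`, and `select_components` are hypothetical, and the minimum-PRESS rule is an assumption, not the authors' procedure.

```python
import numpy as np

def pcr_coefficients(X, y, k):
    """Fit PCR with k components; return the means and the coefficient vector."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    # Rows of Vt are the principal component loadings of the centered data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V_k = Vt[:k].T
    # Regress the centered response on the first k component scores.
    gamma, *_ = np.linalg.lstsq(Xc @ V_k, y - y_mean, rcond=None)
    return x_mean, y_mean, V_k @ gamma

def loo_press(X, y, k):
    """Leave-one-out PRESS for a k-component PCR model (brute-force refits)."""
    n = X.shape[0]
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        x_mean, y_mean, beta = pcr_coefficients(X[keep], y[keep], k)
        pred = y_mean + (X[i] - x_mean) @ beta
        press += (y[i] - pred) ** 2
    return press

def select_components(X, y, max_k):
    """Assumed criterion: choose the k in 1..max_k with the smallest PRESS."""
    scores = [loo_press(X, y, k) for k in range(1, max_k + 1)]
    return int(np.argmin(scores)) + 1, scores
```

This version refits the model once per held-out sample, so it is the expensive baseline rather than an efficient scheme. Other selection rules, such as taking the smallest model whose PRESS is not significantly worse than the minimum, would slot into `select_components` in the same way.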


2013 ◽  
Vol 38 (1) ◽  
pp. 39-45
Author(s):  
Peng Song ◽  
Li Zhao ◽  
Yongqiang Bao

Abstract. The Gaussian mixture model (GMM) method is popular and efficient for voice conversion (VC), but it is often subject to overfitting. In this paper, the principal component regression (PCR) method is adopted for the spectral mapping between source speech and target speech, and the number of principal components is adjusted to prevent overfitting. Then, in order to better model the nonlinear relationships between the source speech and target speech, the kernel principal component regression (KPCR) method is also proposed. Moreover, a method combining KPCR with GMM is further proposed to improve the accuracy of conversion. In addition, the discontinuity and oversmoothing problems of the traditional GMM method are also addressed. On the one hand, in order to solve the discontinuity problem, an adaptive median filter is adopted to smooth the posterior probabilities. On the other hand, the two mixture components with the highest posterior probabilities for each frame are chosen for VC to reduce the oversmoothing problem. Finally, objective and subjective experiments are carried out, and the results demonstrate that the proposed approach performs substantially better than the GMM method. In the objective tests, the proposed method shows lower cepstral distances and higher identification rates than the GMM method. In the subjective tests, the proposed method obtains higher scores for preference and perceptual quality.


Author(s):  
Shuichi Kawano

Abstract. Principal component regression (PCR) is a two-stage procedure: the first stage performs principal component analysis (PCA) and the second stage builds a regression model whose explanatory variables are the principal components obtained in the first stage. Since PCA is performed using only explanatory variables, the principal components have no information about the response variable. To address this problem, we present a one-stage procedure for PCR based on a singular value decomposition approach. Our approach is based upon two loss functions, which are a regression loss and a PCA loss from the singular value decomposition, with sparse regularization. The proposed method enables us to obtain principal component loadings that include information about both explanatory variables and a response variable. An estimation algorithm is developed by using the alternating direction method of multipliers. We conduct numerical studies to show the effectiveness of the proposed method.
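The classical two-stage procedure described at the start of this abstract can be sketched in a few lines of Python. This is an illustration of ordinary PCR only, not of the one-stage sparse method the paper proposes; the function names `pcr_fit` and `pcr_predict` are hypothetical.

```python
import numpy as np

def pcr_fit(X, y, k):
    """Two-stage PCR: PCA on X, then least squares of y on the first k scores."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    # Stage 1: PCA via the SVD of the centered predictors.
    # The response y plays no part in this stage.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V_k = Vt[:k].T              # loadings, shape (p, k)
    scores = Xc @ V_k           # component scores, shape (n, k)
    # Stage 2: ordinary least squares of the centered response on the scores.
    gamma, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)
    # Map the score coefficients back to the original predictor space.
    beta = V_k @ gamma
    return x_mean, y_mean, beta

def pcr_predict(model, X_new):
    """Predict responses for new rows using a model returned by pcr_fit."""
    x_mean, y_mean, beta = model
    return y_mean + (X_new - x_mean) @ beta
```

Because stage 1 never sees `y`, a component with large variance but no relation to the response can be kept while a small but predictive one is discarded, which is the limitation the abstract sets out to address. When `k` equals the number of predictors and `X` has full column rank, the fit coincides with ordinary least squares.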


Author(s):  
Margaretha Ohyver

Principal Component Regression (PCR) is one method for handling multicollinearity problems. PCR produces principal components that have a VIF of less than ten. The purpose of this research is to obtain a PCR model using R software. The result is a PCR model with two principal components and a coefficient of determination R² = 97.27%.
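The VIF claim can be checked numerically. The sketch below is in Python rather than the R software used in the paper, and it runs on synthetic data invented for illustration, not the paper's data. It computes variance inflation factors as the diagonal of the inverse correlation matrix, first for nearly collinear predictors and then for their principal component scores; the helper name `vif` is hypothetical.

```python
import numpy as np

def vif(X):
    """Variance inflation factors: diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

# Synthetic predictors (an assumption for illustration): x2 is almost a copy of x1.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.05 * rng.normal(size=200)
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])

# Principal component scores of the centered predictors.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T

vif_original = vif(X)      # x1 and x2 are far above the threshold of ten
vif_scores = vif(scores)   # scores are uncorrelated, so each VIF is 1
```

Principal component scores are mutually uncorrelated by construction, so their VIFs equal one up to rounding, which is well under the threshold of ten quoted in the abstract.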


2005 ◽  
Vol 2 ◽  
pp. 1-5 ◽  
Author(s):  
O. Schimmer ◽  
F. Daschner ◽  
M. Kent ◽  
R. Knöchel

Abstract. This paper presents a novel approach for determining certain material properties from permittivity measurements. It does not rely on physical models and mixture formulas. Instead, the reflection time domain response of the material to a step impulse is evaluated in a narrow time window near the steepest ascent of the pulse. A dedicated time domain spectrometer is introduced, which records the data. Principal components are derived directly from the time domain data. The most significant principal components are used to establish a principal component regression formula for prediction of the required material properties. The viability and accuracy of the method are demonstrated by applying it to measurements of the storage time of some chilled fish samples.

