Pseudo-proxy evaluation of Climate Field Reconstruction methods of North Atlantic climate based on an annually resolved marine proxy network


2017 ◽  
Vol 13 (10) ◽  
pp. 1339-1354 ◽  
Author(s):  
Maria Pyrina ◽  
Sebastian Wagner ◽  
Eduardo Zorita

Abstract. Two statistical methods are tested to reconstruct the interannual variations in past sea surface temperatures (SSTs) of the North Atlantic (NA) Ocean over the past millennium based on annually resolved and absolutely dated marine proxy records of the bivalve mollusk Arctica islandica. The methods are tested in a pseudo-proxy experiment (PPE) setup using state-of-the-art climate models (CMIP5 Earth system models) and reanalysis data from the COBE2 SST data set. The methods were applied in the virtual reality provided by global climate simulations and reanalysis data to reconstruct the past NA SSTs using pseudo-proxy records that mimic the statistical characteristics and network of Arctica islandica. The multivariate linear regression methods evaluated here are principal component regression and canonical correlation analysis. Differences in the skill of the climate field reconstruction (CFR) are assessed according to different calibration periods and different proxy locations within the NA basin. The choice of the climate model used as a surrogate reality in the PPE has a more profound effect on the CFR skill than the calibration period and the statistical reconstruction method. The differences between the two methods are clearer for the MPI-ESM model due to its higher spatial resolution in the NA basin. The pseudo-proxy results of the CCSM4 model are closer to the pseudo-proxy results based on the reanalysis data set COBE2. Conducting PPEs using noise-contaminated pseudo-proxies instead of noise-free pseudo-proxies is important for the evaluation of the methods, as more spatial differences in the reconstruction skill are revealed. Both methods are appropriate for the reconstruction of the temporal evolution of the NA SSTs, even though they lead to a great loss of variance away from the proxy sites. 
Under reasonable assumptions about the characteristics of the non-climate noise in the proxy records, our results show that the marine network of Arctica islandica can be used to skillfully reconstruct the spatial patterns of SSTs at the eastern NA basin.
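The pseudo-proxy logic described above can be illustrated in miniature. The sketch below is not the authors' code: the grid size, proxy sites, noise level, and calibration split are all invented for illustration. It builds a toy surrogate SST field, degrades a few grid-point series with white noise to mimic Arctica islandica pseudo-proxies, calibrates a principal component regression, and scores the reconstruction against the withheld truth.

```python
import numpy as np

rng = np.random.default_rng(0)

# Surrogate reality: 500 years of SST anomalies on a 100-point grid,
# driven by two large-scale modes plus local weather noise.
nyears, ngrid = 500, 100
modes = rng.standard_normal((2, ngrid))
amps = rng.standard_normal((nyears, 2))
field = amps @ modes + 0.3 * rng.standard_normal((nyears, ngrid))

# Pseudo-proxies: the field sampled at four sites and degraded with
# white noise of equal variance (signal-to-noise ratio of about 1).
sites = [5, 23, 57, 80]
proxies = field[:, sites] + field[:, sites].std(0) * rng.standard_normal((nyears, 4))

calib, valid = slice(0, 300), slice(300, None)

# Principal component regression: EOFs of the calibration-period field;
# the leading PCs are regressed on the pseudo-proxy matrix.
mean = field[calib].mean(0)
U, s, Vt = np.linalg.svd(field[calib] - mean, full_matrices=False)
k = 2
pcs = U[:, :k] * s[:k]
X = np.column_stack([np.ones(300), proxies[calib]])
beta, *_ = np.linalg.lstsq(X, pcs, rcond=None)

# Reconstruct the withheld period from the proxies alone and score it
# by the per-grid-cell correlation with the true field.
Xv = np.column_stack([np.ones(200), proxies[valid]])
recon = Xv @ beta @ Vt[:k] + mean
truth = field[valid]
skill = np.array([np.corrcoef(truth[:, j], recon[:, j])[0, 1]
                  for j in range(ngrid)])
```

As in the paper, skill can then be mapped over the basin and compared across calibration periods or noise levels by changing `calib` and the noise amplitude.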


2020 ◽  
Vol 13 (2) ◽  
pp. 841-858 ◽  
Author(s):  
Simon Michel ◽  
Didier Swingedouw ◽  
Marie Chavent ◽  
Pablo Ortega ◽  
Juliette Mignot ◽  
...  

Abstract. Modes of climate variability strongly impact our climate and thus human society. Nevertheless, the statistical properties of these modes remain poorly known due to the short time frame of instrumental measurements. Reconstructing these modes further back in time using statistical learning methods applied to proxy records is useful for improving our understanding of their behaviour. Several statistical methods exist for this purpose, among which principal component regression is one of the most widely used in paleoclimatology. Here, we provide the software ClimIndRec to the climate community; it is based on four regression methods (principal component regression, PCR; partial least squares, PLS; elastic net, Enet; random forest, RF) and cross-validation (CV) algorithms, and enables the systematic reconstruction of a given climate index. A prerequisite is that there are proxy records in the database that overlap in time with its observed variations. The relative efficiency of the methods can vary according to the statistical properties of the mode and the proxy records used. Here, we assess the sensitivity to the reconstruction technique. ClimIndRec is modular, as it allows different inputs, such as the proxy database or the regression method. As an example, it is applied here to the reconstruction of the North Atlantic Oscillation (NAO) using the PAGES 2k database. In order to identify the most reliable reconstruction among those given by the different methods, we use the modularity of ClimIndRec to investigate the sensitivity of the methodological setup to other properties, such as the number and nature of the proxy records used as predictors and the targeted reconstruction period. We obtain the best reconstruction of the NAO using the random forest approach. It shows significant correlation with former reconstructions but exhibits higher validation scores.
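The core recipe here, regressing an observed index on overlapping proxies and ranking candidate models by cross-validated skill, can be sketched as follows. This is not ClimIndRec itself: it is a minimal numpy illustration using only principal component regression, with entirely synthetic data and an arbitrary choice of folds, proxies, and retained components.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy overlap period: an observed climate index and 20 proxy predictors,
# of which only the first 6 actually carry the signal.
n, p = 150, 20
index = rng.standard_normal(n)
proxies = 0.6 * index[:, None] * (np.arange(p) < 6) + rng.standard_normal((n, p))

def pcr_fit(X, y, k):
    # Principal component regression with k retained components.
    Xm, ym = X.mean(0), y.mean()
    U, s, Vt = np.linalg.svd(X - Xm, full_matrices=False)
    coef = Vt[:k].T @ ((U[:, :k].T @ (y - ym)) / s[:k])
    return Xm, ym, coef

def cv_score(k, folds=5):
    # Cross-validated correlation between the held-out index and its prediction.
    idx, scores = np.arange(n), []
    for f in range(folds):
        te = idx % folds == f
        Xm, ym, coef = pcr_fit(proxies[~te], index[~te], k)
        pred = ym + (proxies[te] - Xm) @ coef
        scores.append(np.corrcoef(index[te], pred)[0, 1])
    return float(np.mean(scores))

# Rank candidate models by cross-validated skill, as ClimIndRec does
# across its four regression methods.
best_k = max(range(1, 11), key=cv_score)
```

In ClimIndRec the same CV loop additionally spans the four regression families, so the "best" model is selected jointly over method and hyperparameters.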


2003 ◽  
Vol 11 (1) ◽  
pp. 55-70 ◽  
Author(s):  
Laila Stordrange ◽  
Olav M. Kvalheim ◽  
Per A. Hassel ◽  
Dick Malthe-Sørenssen ◽  
Fred Olav Libnau

Partial least squares (PLS) is a powerful tool for multivariate linear regression. But what if the data show a non-linear structure? Near-infrared spectra from a pharmaceutical process were used as a case study. An ANOVA test revealed that the data are well described by a second-order polynomial. This work investigates the application of regression techniques that account for slightly non-linear data. The regression techniques investigated are: linearising the data by applying transformations, local PLS (i.e. splitting the data), and quadratic PLS. These models were compared with ordinary PLS and principal component regression (PCR). The predictive ability of the models was tested on an independent data set acquired a year later. Using knowledge of the non-linear pattern and of the important spectral regions, simpler models with better predictive ability can be obtained.
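The simplest of the three strategies, handling the curvature explicitly, can be seen on a toy example. The numbers below are invented and a plain polynomial fit stands in for the full quadratic-PLS machinery: when the response is second order in the underlying variable, a model with a squared term predicts an independent holdout markedly better than a straight line.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy stand-in for the NIR case: a response that is quadratic in the
# underlying variable, observed with a little noise.
x = rng.uniform(0, 2, 200)
y = 1.0 + 0.5 * x + 0.8 * x ** 2 + 0.1 * rng.standard_normal(200)

train, test = slice(0, 100), slice(100, None)

# Fit a straight line and a second-order polynomial on the training half...
lin = np.polyfit(x[train], y[train], 1)
quad = np.polyfit(x[train], y[train], 2)

# ...and compare predictive RMSE on the held-out half.
def rmse(coefs):
    pred = np.polyval(coefs, x[test])
    return float(np.sqrt(np.mean((y[test] - pred) ** 2)))

rmse_lin, rmse_quad = rmse(lin), rmse(quad)
```

The holdout split mirrors the paper's test on an independent data set acquired a year later: the comparison is made on data the models never saw.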


2013 ◽  
Vol 9 (3) ◽  
pp. 1153-1160 ◽  
Author(s):  
Q. Ge ◽  
Z. Hao ◽  
J. Zheng ◽  
X. Shao

Abstract. We use principal component regression and partial least squares regression to separately reconstruct a composite series of temperature variations in China, and associated uncertainties, at a decadal resolution over the past 2000 yr. The reconstruction is developed using proxy temperature data with relatively high confidence levels from five regions across China, and using a temperature series from observations by the Chinese Meteorological Administration, covering the period from 1871 to 2000. Relative to the 1851–1950 climatology, our two reconstructions show four warm intervals during AD 1–AD 200, AD 551–AD 760, AD 951–AD 1320, and after AD 1921, and four cold intervals during AD 201–AD 350, AD 441–AD 530, AD 781–AD 950, and AD 1321–AD 1920. The temperatures during AD 981–AD 1100 and AD 1201–AD 1270 are comparable to those of the Present Warm Period, but have an uncertainty of ±0.28 °C to ±0.42 °C at the 95% confidence level. Temperature variations over China are typically in phase with those of the Northern Hemisphere (NH) after AD 1000, a period which covers the Medieval Climate Anomaly, the Little Ice Age, and the Present Warm Period. In contrast, a warm period in China during AD 541–AD 740 is not obviously seen in the NH.
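In spirit, the calibration step works like the sketch below: regress the instrumental series on the regional proxy series over their overlap, apply the fitted model to the full proxy period, and attach an uncertainty band derived from the calibration residuals. Everything here is synthetic and simplified (plain multiple regression stands in for PCR/PLS, and the series lengths are arbitrary).

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy setup: 200 decades of "true" mean temperature (a red-noise-like
# random walk), five regional proxy series, and an instrumental record
# covering only the last 50 decades.
nt, nobs = 200, 50
truth = np.cumsum(0.2 * rng.standard_normal(nt))
regions = truth[:, None] + 0.5 * rng.standard_normal((nt, 5))
overlap = slice(nt - nobs, nt)

# Calibrate: multiple linear regression of the instrumental series on
# the five regional series over the overlap period.
X = np.column_stack([np.ones(nobs), regions[overlap]])
beta, *_ = np.linalg.lstsq(X, truth[overlap], rcond=None)

# Reconstruct the full period and attach a 95% uncertainty half-width
# taken from the spread of the calibration residuals.
Xall = np.column_stack([np.ones(nt), regions])
recon = Xall @ beta
resid = truth[overlap] - recon[overlap]
half_width = 1.96 * resid.std(ddof=X.shape[1])
```

The ±0.28 °C to ±0.42 °C band quoted in the abstract plays the role of `half_width` here, except that the paper derives it per interval and per method.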




2010 ◽  
Vol 08 (04) ◽  
pp. 645-659 ◽  
Author(s):  
YICHUAN ZHAO ◽  
GUOSHEN WANG

In order to predict future patients' survival times from their microarray gene expression data, one interesting question is how to relate genes to survival outcomes. In this paper, by applying a semi-parametric additive risk model from survival analysis, we propose a new approach to conduct a careful analysis of gene expression data, with a focus on the model's predictive ability. In the proposed method, we apply correlation principal component regression to handle right-censored survival data under the semi-parametric additive risk model with high-dimensional covariates. We also employ the time-dependent area under the receiver operating characteristic curve and the root mean squared error of prediction to assess how well the model predicts survival time. Furthermore, the proposed method is able to identify genes that are significantly related to the disease. Finally, the proposed approach is illustrated with the diffuse large B-cell lymphoma and breast cancer data sets. The results show that the model fits both data sets very well.
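The dimension-reduction idea behind correlation principal component regression can be sketched on synthetic data. This is a deliberately stripped-down illustration: censoring, the additive risk model, and the time-dependent AUC are all omitted, and a plain linear model on log survival time stands in for the paper's survival machinery.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy data: 200 patients, 1000 genes; a latent disease factor drives
# 5 correlated genes and the (log) survival time.
n, p = 200, 1000
latent = rng.standard_normal(n)
genes = rng.standard_normal((n, p))
genes[:, :5] = latent[:, None] + 0.5 * rng.standard_normal((n, 5))
logtime = latent + 0.5 * rng.standard_normal(n)

train, test = slice(0, 150), slice(150, None)

# Step 1 ("correlation"): keep the genes most correlated with the
# outcome, using the training patients only.
r = np.array([np.corrcoef(genes[train, j], logtime[train])[0, 1]
              for j in range(p)])
keep = np.argsort(-np.abs(r))[:50]

# Step 2 ("principal component regression"): regress the outcome on
# the leading components of the selected genes.
Xtr = genes[train][:, keep]
mean = Xtr.mean(0)
U, s, Vt = np.linalg.svd(Xtr - mean, full_matrices=False)
k = 5
coef, *_ = np.linalg.lstsq(np.column_stack([np.ones(150), U[:, :k] * s[:k]]),
                           logtime[train], rcond=None)

# Step 3: project held-out patients onto the same components and predict.
pred = coef[0] + ((genes[test][:, keep] - mean) @ Vt[:k].T) @ coef[1:]
```

Screening on the training set only, as in step 1, is essential: selecting genes on the full data would leak the test outcomes into the model.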


2013 ◽  
Vol 10 (1) ◽  
pp. 51-58 ◽  
Author(s):  
P. E. Bett ◽  
H. E. Thornton ◽  
R. T. Clark

Abstract. We present initial results of a study on the variability of wind speeds across Europe over the past 140 yr, making use of the recent Twentieth Century Reanalysis data set, which includes uncertainty estimates from an ensemble method of reanalysis. Maps of the means and standard deviations of daily wind speeds, and of the Weibull-distribution parameters, show the expected features, such as the strong, highly variable wind in the north-east Atlantic. We do not find any clear, strong long-term trends in wind speeds across Europe, and the variability between decades is large. We examine how different years and decades are related in the long-term context by looking at the ranking of annual mean wind speeds. Picking a region covering eastern England as an example, our analyses show that the wind speeds there over the past ~ 20 yr are within the range expected from natural variability, but do not span the full range of variability of the 140-yr data set. Calendar year 2010 is, however, found to have the lowest mean wind speed on record for this region.
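The Weibull parameters mapped in studies like this one are typically estimated per grid point from the daily speeds. One widely quoted moment-based shortcut is the Justus approximation, sketched below on synthetic data (the shape and scale values are invented; the paper's actual fitting procedure is not specified here).

```python
import math
import numpy as np

rng = np.random.default_rng(5)

# Fifty years of synthetic daily wind speeds drawn from a Weibull
# distribution with shape k = 2 and scale c = 7 m/s.
k_true, c_true = 2.0, 7.0
speeds = c_true * rng.weibull(k_true, 50 * 365)

# Moment-based Weibull fit (Justus approximation):
#   k ≈ (σ / μ) ** -1.086,   c = μ / Γ(1 + 1/k)
mu, sigma = speeds.mean(), speeds.std()
k_hat = (sigma / mu) ** -1.086
c_hat = mu / math.gamma(1 + 1 / k_hat)
```

With multi-decadal daily records the moment estimates are very stable, which is why decade-to-decade differences in the fitted parameters, rather than sampling noise, dominate maps of this kind.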


2005 ◽  
Vol 13 (5) ◽  
pp. 241-254 ◽  
Author(s):  
Ralf Marbach

A new method for multivariate calibration is described that combines the best features of “classical” (also called “physical” or “K-matrix”) calibration and “inverse” (or “statistical” or “P-matrix”) calibration. By estimating the spectral signal in the physical way and the spectral noise in the statistical way, so to speak, the prediction accuracy of the inverse model can be combined with the low cost and ease of interpretability of the classical model, including “built-in” proof of specificity of response. The cost of calibration is significantly reduced compared to today's standard practice of statistical calibration using partial least squares or principal component regression, because the need for lab-reference values is virtually eliminated. The method is demonstrated on a data set of near-infrared spectra from pharmaceutical tablets, which is available on the web (so-called Chambersburg Shoot-out 2002 data set). Another benefit is that the correct definitions of the “limits of multivariate detection” become obvious. The sensitivity of multivariate measurements is shown to be limited by the so-called “spectral noise,” and the specificity is shown to be limited by potentially existing “unspecific correlations.” Both limits are testable from first principles, i.e. from measurable pieces of data and without the need to perform any calibration.
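The "classical" half of the calibration contrast described above can be shown in a few lines. The sketch below is synthetic (made-up pure-component spectra, two analytes, arbitrary noise level) and shows only the K-matrix step: with the pure spectra known, concentrations follow by least squares without any lab-reference values.

```python
import numpy as np

rng = np.random.default_rng(6)

# Two made-up pure-component spectra over 50 wavelengths: the K-matrix.
K = np.abs(rng.standard_normal((2, 50)))

# Thirty mixture spectra: concentrations times pure spectra, plus a
# small amount of spectral noise.
conc = rng.uniform(0.1, 1.0, (30, 2))
spectra = conc @ K + 0.01 * rng.standard_normal((30, 50))

# Classical ("physical", K-matrix) calibration: solve spectra ≈ C @ K
# for the concentrations by least squares.
conc_hat = spectra @ K.T @ np.linalg.inv(K @ K.T)
max_err = float(np.abs(conc_hat - conc).max())
```

An inverse (P-matrix) calibration would instead regress the known concentrations on the measured spectra; the paper's hybrid keeps the physical signal model while estimating the spectral noise statistically.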


2017 ◽  
Vol 2017 ◽  
pp. 1-13 ◽  
Author(s):  
Aleksander Jaworski ◽  
Hanna Wikiel ◽  
Kazimierz Wikiel

The Real Time Analyzer (RTA), utilizing DC- and AC-voltammetric techniques, is an in situ, online monitoring system that provides a complete chemical analysis of different electrochemical deposition solutions. The RTA employs multivariate calibration when predicting concentration parameters from a multivariate data set. Although hierarchical and multiblock Principal Component Regression- (PCR-) and Partial Least Squares- (PLS-) based methods can handle data sets even when the number of variables significantly exceeds the number of samples, it can be advantageous to reduce the number of variables to improve the model predictions and their interpretability. This paper introduces a multistep, rigorous method of variable selection based on Least Squares Regression, the modeling power of Simple Modeling of Class Analogy, and, as novel applications in electroanalysis, Uninformative Variable Elimination by PLS and by PCR, Variable Importance in the Projection coupled with PLS, Interval PLS, Interval PCR, and Moving Window PLS. Selection criteria for the optimum decomposition technique for the specific data are also demonstrated. The chief goal of this paper is to introduce to the community of electroanalytical chemists numerous variable selection methods which are well established in spectroscopy and can be successfully applied to voltammetric data analysis.
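Of the selection methods listed above, the interval/moving-window idea is the easiest to sketch. The toy below is not the RTA pipeline: the data are synthetic voltammograms with one invented informative region, and plain least squares stands in for PLS/PCR within each window. The principle, scoring fixed-width windows by cross-validated error and keeping the best, is the same.

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy voltammograms: 100 scans of 200 points each; only points 80-119
# respond to the analyte concentration.
n, m = 100, 200
conc = rng.uniform(0, 1, n)
data = 0.05 * rng.standard_normal((n, m))
data[:, 80:120] += conc[:, None] * np.linspace(0.5, 1.0, 40)

def window_rmsecv(lo, hi, folds=5):
    # Cross-validated RMSE of a least-squares model built on one window.
    idx, errs = np.arange(n), []
    for f in range(folds):
        te = idx % folds == f
        X = np.column_stack([np.ones((~te).sum()), data[~te][:, lo:hi]])
        b, *_ = np.linalg.lstsq(X, conc[~te], rcond=None)
        Xt = np.column_stack([np.ones(te.sum()), data[te][:, lo:hi]])
        errs.append(np.mean((conc[te] - Xt @ b) ** 2))
    return float(np.sqrt(np.mean(errs)))

# Moving-window scan: 40-point windows stepped by 20 points; keep the
# window with the lowest cross-validated error.
windows = [(lo, lo + 40) for lo in range(0, m - 39, 20)]
best = min(windows, key=lambda w: window_rmsecv(*w))
```

Windows that miss the informative region predict essentially nothing, so their RMSECV sits near the standard deviation of the concentrations, which is what makes the scan discriminating.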

