Applications of matrix factorization methods to climate data


2020 ◽  
Vol 27 (3) ◽  
pp. 453-471
Author(s):  
Dylan Harries ◽  
Terence J. O'Kane

Abstract. An initial dimension reduction forms an integral part of many analyses in climate science. Different methods yield low-dimensional representations that are based on differing aspects of the data. Depending on the features of the data that are relevant for a given study, certain methods may be more suitable than others, for instance yielding bases that can be more easily identified with physically meaningful modes. To illustrate the distinction between particular methods and identify circumstances in which a given method might be preferred, in this paper we present a set of case studies comparing the results obtained using the traditional approaches of empirical orthogonal function analysis and k-means clustering with the more recently introduced methods such as archetypal analysis and convex coding. For data such as global sea surface temperature anomalies, in which there is a clear, dominant mode of variability, all of the methods considered yield rather similar bases with which to represent the data while differing in reconstruction accuracy for a given basis size. However, in the absence of such a clear scale separation, as in the case of daily geopotential height anomalies, the extracted bases differ much more significantly between the methods. We highlight the importance in such cases of carefully considering the relevant features of interest and of choosing the method that best targets precisely those features so as to obtain more easily interpretable results.
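As an illustrative sketch (not the authors' code), the core of EOF analysis can be expressed as a singular value decomposition of the anomaly matrix, with reconstruction accuracy measured as a function of basis size. The synthetic field below is hypothetical stand-in data with one dominant mode, mimicking the clear scale separation the abstract describes for sea surface temperature anomalies:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic "anomaly" field: 200 time steps x 50 grid points, with a
# single large-scale mode plus noise (hypothetical illustrative data).
t = np.linspace(0, 20, 200)
mode = np.outer(np.sin(t), np.sin(np.linspace(0, np.pi, 50)))
X = mode + 0.1 * rng.standard_normal((200, 50))
X -= X.mean(axis=0)  # anomalies: remove the time mean

# EOF analysis amounts to an SVD of the anomaly matrix; the EOFs are
# the right singular vectors, ordered by explained variance.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

def reconstruction_error(k):
    """Relative error when X is reconstructed from its leading k EOFs."""
    Xk = (U[:, :k] * s[:k]) @ Vt[:k]
    return np.linalg.norm(X - Xk) / np.linalg.norm(X)

# With a single dominant mode, one EOF already captures most of the
# variance, and additional basis vectors mainly fit noise.
print(reconstruction_error(1), reconstruction_error(5))
```

When the data lack such a dominant mode, as for the daily geopotential height anomalies discussed above, the leading-EOF error stays large and the choice among EOF analysis, k-means, archetypal analysis, and convex coding matters far more.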


2017 ◽  
Vol 129 (980) ◽  
pp. 105002
Author(s):  
Germán Chaparro Molano ◽  
Oscar Leonardo Ramírez Suárez ◽  
Oscar Alberto Restrepo Gaitán ◽  
Alexander Marcial Martínez Mercado



2015 ◽  
Vol 39 (4) ◽  
pp. 536-553 ◽  
Author(s):  
Glenn McGregor

Climate risk management has emerged over the last decade as a distinct area of activity within the wider field of climatology. Its focus is on integrating climate and non-climate information in order to enhance decision-making in a wide range of climate-sensitive sectors of society, the economy and the environment. Given the burgeoning pure and applied climate science literature addressing a range of climate risks, the purpose of this progress report is to provide an overview of recent developments in climatology that may contribute to the risk assessment component of climate risk management. The report focuses on data rescue and climate database construction, on hurricanes and droughts as examples of extreme climate events, and on seasonal climate forecasting; these topics are privileged over others because of their fundamental importance for establishing either event probability or the scale of societal impact. The review of the literature finds that historical data rescue, climate reconstruction and the compilation of climate databases have assisted immensely in understanding past climate events and in increasing the information base for managing climate risk. Advances in the scientific understanding and characterization of hurricanes and droughts stand to benefit the management of these two extreme events, while work on unravelling the nature of ocean–atmosphere interactions and associated climate anomalies at the seasonal timescale has provided the basis for possible seasonal forecasting of a range of climate events. The report also acknowledges that, despite the potential of climate information to assist with managing climate risk, its uptake by decision makers should not be automatically assumed by the climatological community.


Entropy ◽  
2022 ◽  
Vol 24 (1) ◽  
pp. 88
Author(s):  
Urszula Grzybowska ◽  
Marek Karwański

One of the goals of macroeconomic analysis is to rank and segment enterprises described by many financial indicators. The segmentation can be used for investment strategies or risk evaluation. The aim of this research was to distinguish groups of similar objects and visualize the results in a low-dimensional space. In order to obtain clusters of similar objects, the authors applied a data envelopment analysis (DEA) BCC model and archetypal analysis to a set of companies described by financial indicators and listed on the Warsaw Stock Exchange. The authors showed that both methods give consistent results. To gain better insight into the data structure, as well as a visualization of the similarities between objects, the authors used a new approach called the PHATE algorithm. It allowed the results of DEA and archetypal analysis to be visualized in a low-dimensional space.


2012 ◽  
Vol 16 (7) ◽  
pp. 2285-2298 ◽  
Author(s):  
J. Oh ◽  
A. Sankarasubramanian

Abstract. It is well established in the hydroclimatic literature that the interannual variability in seasonal streamflow can be partially explained using climatic precursors such as tropical sea surface temperature (SST) conditions. Similarly, it is widely known that streamflow is the most important predictor in estimating nutrient loadings and the associated concentrations. The intent of this study is to bridge these two findings so that nutrient loadings can be predicted using season-ahead climate forecasts forced with forecasted SSTs. Selecting 18 relatively undeveloped basins in the Southeast US (SEUS), we use winter (January–February–March, JFM) precipitation forecasts, which influence JFM streamflow over these basins, to develop winter forecasts of nutrient loadings. For this purpose, we consider two different types of low-dimensional statistical models to predict 3-month-ahead nutrient loadings based on retrospective climate forecasts. Split-sample validation of the predictive models shows that 18–45 % of the interannual variability in observed winter nutrient loadings could be predicted even before the beginning of the season for at least 8 stations. Stations that have a very high coefficient of determination (> 0.8) in predicting the observed water quality network (WQN) loadings during JFM exhibit significant skill in predicting seasonal total nitrogen (TN) loadings using climate forecasts. Incorporating antecedent flow conditions (December flow) as an additional predictor did not increase the explained variance in these stations, but substantially reduced the root-mean-square error (RMSE) in the predicted loadings. The dominant mode of winter nutrient loadings across the 18 stations is clearly associated with El Niño–Southern Oscillation (ENSO) conditions. The potential utility of these season-ahead nutrient predictions in developing proactive and adaptive nutrient management strategies is also discussed.
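The split-sample validation of a low-dimensional statistical model described above can be sketched minimally as an ordinary least-squares regression of loadings on a single climate predictor, fit on one half of the record and scored on the other. The data here are hypothetical stand-ins (a synthetic precipitation-forecast index and synthetic loadings), not the WQN observations:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical stand-in data: 30 winters of a climate predictor (a JFM
# precipitation-forecast index) and nutrient loadings that partly track it.
n = 30
precip_fcst = rng.standard_normal(n)
loadings = 0.7 * precip_fcst + 0.7 * rng.standard_normal(n)

# Split-sample validation: fit on the first half, score on the second.
train, test = slice(0, n // 2), slice(n // 2, n)
A = np.vstack([precip_fcst[train], np.ones(n // 2)]).T
coef, *_ = np.linalg.lstsq(A, loadings[train], rcond=None)

# Out-of-sample coefficient of determination on the held-out winters.
pred = coef[0] * precip_fcst[test] + coef[1]
ss_res = np.sum((loadings[test] - pred) ** 2)
ss_tot = np.sum((loadings[test] - loadings[test].mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot
print(r2)  # fraction of held-out interannual variability explained
```

In the study itself the predictors are retrospective climate forecasts (and, in one variant, antecedent December flow), and skill is assessed per station; this sketch only illustrates the validation mechanics.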


2018 ◽  
Vol 115 (29) ◽  
pp. 7509-7514 ◽  
Author(s):  
Jun Young Chung ◽  
Ashkan Vaziri ◽  
L. Mahadevan

We describe a minimal realization of reversibly programmable matter in the form of a featureless smooth elastic plate that has the capacity to store information in a Braille-like format as a sequence of stable discrete dimples. Simple experiments with cylindrical and spherical shells show that we can control the number, location, and the temporal order of these dimples, which can be written and erased at will. Theoretical analysis of the governing equations in a specialized setting and numerical simulations of the complete equations allow us to characterize the phase diagram for the formation of these localized elastic states, elastic bits (e-bits), consistent with our observations. Given that the inherent bistability and hysteresis in these low-dimensional systems arise exclusively due to the geometrical-scale separation, independent of material properties or absolute scale, our results might serve as alternate approaches to small-scale mechanical memories.


2020 ◽  
Author(s):  
Carlo Buontempo

Climate adaptation often requires high-resolution information about the expected changes in the statistical distribution of user-relevant variables. Thanks to targeted national programmes, research projects and international climate service initiatives, this kind of information is not only becoming more easily available but is also making its way into building codes, engineering standards and the risk assessments of financial products. While such an increase in the use of climate data can be seen as a positive step towards the construction of a climate-resilient society, it is also true that the inconsistencies between information derived from different sources have the potential to reduce user uptake, increase the costs of adaptation and even undermine the credibility of both climate services and the underpinning climate science.

This paper offers a personal reflection on the emerging user requirements in this field. The presentation also suggests some preliminary ideas in support of the development of appropriate methodologies for extracting robust evidence from different sources in a scalable way.


2016 ◽  
Vol 9 (12) ◽  
pp. 4381-4403 ◽  
Author(s):  
Allison H. Baker ◽  
Dorit M. Hammerling ◽  
Sheri A. Mickelson ◽  
Haiying Xu ◽  
Martin B. Stolpe ◽  
...  

Abstract. High-resolution Earth system model simulations generate enormous data volumes, and retaining the data from these simulations often strains institutional storage resources. Further, these exceedingly large storage requirements negatively impact science objectives, for example, by forcing reductions in data output frequency, simulation length, or ensemble size. To lessen data volumes from the Community Earth System Model (CESM), we advocate the use of lossy data compression techniques. While lossy data compression does not exactly preserve the original data (as lossless compression does), lossy techniques have an advantage in terms of smaller storage requirements. To preserve the integrity of the scientific simulation data, the effects of lossy data compression on the original data should, at a minimum, not be statistically distinguishable from the natural variability of the climate system, and previous preliminary work with data from CESM has shown this goal to be attainable. However, to ultimately convince climate scientists that it is acceptable to use lossy data compression, we provide climate scientists with access to publicly available climate data that have undergone lossy data compression. In particular, we report on the results of a lossy data compression experiment with output from the CESM Large Ensemble (CESM-LE) Community Project, in which we challenge climate scientists to examine features of the data relevant to their interests, and attempt to identify which of the ensemble members have been compressed and reconstructed. We find that while detecting distinguishing features is certainly possible, the compression effects noticeable in these features are often unimportant or disappear in post-processing analyses. In addition, we perform several analyses that directly compare the original data to the reconstructed data to investigate the preservation, or lack thereof, of specific features critical to climate science. 
Overall, we conclude that applying lossy data compression to climate simulation data is both advantageous in terms of data reduction and generally acceptable in terms of effects on scientific results.
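The acceptance criterion stated above, that compression effects should not be statistically distinguishable from the climate system's natural variability, can be illustrated with a minimal sketch. The quantization below is a crude stand-in for real lossy compressors (the study uses far more sophisticated tools), and the ensemble is synthetic illustrative data, not CESM-LE output:

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical ensemble: 10 members x 1000 grid values of a climate
# field (illustrative synthetic data standing in for model output).
ensemble = 15.0 + rng.standard_normal((10, 1000))

def quantize(x, bits):
    """Crude lossy compression: keep `bits` fractional bits per value."""
    scale = 2.0 ** bits
    return np.round(x * scale) / scale

member = ensemble[0]
reconstructed = quantize(member, bits=6)

# Minimal acceptance check in the spirit of the criterion above: the
# pointwise compression error should be small compared with the
# ensemble's natural variability (spread across members).
max_error = np.abs(member - reconstructed).max()
spread = ensemble.std(axis=0).mean()
print(max_error < 0.1 * spread)
```

The study's actual evaluation is much richer, asking scientists to pick out compressed members from the ensemble and comparing post-processed analyses; this sketch only shows the error-versus-variability comparison at its simplest.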

