Applications of matrix factorization methods to climate data

Abstract. An initial dimension reduction forms an integral part of many analyses in climate science. Different methods yield low-dimensional representations that are based on differing aspects of the data. Depending on the features of the data that are relevant for a given study, certain methods may be more suitable than others, for instance yielding bases that can be more easily identified with physically meaningful modes. To illustrate the distinction between particular methods and identify circumstances in which a given method might be preferred, in this paper we present a set of case studies comparing the results obtained using the traditional approaches of empirical orthogonal function analysis and k-means clustering with the more recently introduced methods such as archetypal analysis and convex coding. For data such as global sea surface temperature anomalies, in which there is a clear, dominant mode of variability, all of the methods considered yield rather similar bases with which to represent the data while differing in reconstruction accuracy for a given basis size. However, in the absence of such a clear scale separation, as in the case of daily geopotential height anomalies, the extracted bases differ much more significantly between the methods. We highlight the importance in such cases of carefully considering the relevant features of interest and of choosing the method that best targets precisely those features so as to obtain more easily interpretable results.

Download Full-text

Applications of matrix factorization methods to climate data

10.5194/npg-2020-7 ◽

2020 ◽

Author(s):

Dylan Harries ◽

Terence J. O'Kane

Keyword(s):

Climate Science ◽

Dominant Mode ◽

Climate Data ◽

Reconstruction Accuracy ◽

Scale Separation ◽

Archetypal Analysis ◽

Basis Size ◽

Sea Surface Temperature Anomalies ◽

Low Dimensional ◽

Traditional Approaches

Abstract. An initial dimension reduction forms an integral part of many analyses in climate science. Different methods yield low-dimensional representations that are based on differing aspects of the data. Depending on the features of the data that are relevant for a given study, certain methods may be more suitable than others, for instance yielding bases that can be more easily identified with physically meaningful modes. To illustrate the distinction between particular methods and identify circumstances in which a given method might be preferred, in this paper we present a set of case studies comparing the results obtained using the traditional approaches of EOF analysis and k-means clustering with the more recently introduced methods such as archetypal analysis and convex coding. For data such as global sea surface temperature anomalies, in which there is a clear, dominant mode of variability, all of the methods considered yield rather similar bases with which to represent the data, while differing in reconstruction accuracy for a given basis size. However, in the absence of such a clear scale separation, as in the case of daily geopotential height anomalies, the extracted bases differ much more significantly between the methods. We highlight the importance in such cases of carefully considering the relevant features of interest, and of choosing the method that best targets precisely those features so as to obtain more easily interpretable results.

Download Full-text

Seasonal Variability of SST Fronts in the Inner Sea of Chiloé and Its Adjacent Coastal Ocean, Northern Patagonia

Remote Sensing ◽

10.3390/rs13020181 ◽

2021 ◽

Vol 13 (2) ◽

pp. 181

Author(s):

Gonzalo S. Saldías ◽

Wilber Hernández ◽

Carlos Lara ◽

Richard Muñoz ◽

Cristian Rojas ◽

...

Keyword(s):

Coastal Upwelling ◽

Environmental Variability ◽

Function Analysis ◽

Coastal Ocean ◽

Dominant Mode ◽

Early Summer ◽

High Biological Activity ◽

Northern Patagonia ◽

Sst Gradient ◽

Sst Fronts

Surface oceanic fronts are regions characterized by high biological activity. Here, Sea Surface Temperature (SST) fronts are analyzed for the period 2003–2019 using the Multi-scale Ultra-high Resolution (MUR) SST product in northern Patagonia, a coastal region with high environmental variability through river discharges and coastal upwelling events. SST gradient magnitudes were maximum off Chiloé Island in summer and fall, coherent with the highest frontal probability in the coastal oceanic area, which would correspond to the formation of a coastal upwelling front in the meridional direction. Increased gradient magnitudes in the Inner Sea of Chiloé (ISC) were found primarily in spring and summer. The frontal probability analysis revealed the highest occurrences were confined to the northern area (north of Desertores Islands) and around the southern border of Boca del Guafo. An Empirical Orthogonal Function analysis was performed to clarify the dominant modes of variability in SST gradient magnitudes. The meridional coastal fronts explained the dominant mode (78% of the variance) off Chiloé Island, which dominates in summer, whereas the SST fronts inside the ISC (second mode; 15.8%) were found to dominate in spring and early summer (October–January). Future efforts are suggested focusing on high frontal probability areas to study the vertical structure and variability of the coastal fronts in the ISC and its adjacent coastal ocean.

Download Full-text

Low Dimensional Embedding of Climate Data for Radio Astronomical Site Testing in the Colombian Andes

Publications of the Astronomical Society of the Pacific ◽

10.1088/1538-3873/aa83fe ◽

2017 ◽

Vol 129 (980) ◽

pp. 105002

Author(s):

Germán Chaparro Molano ◽

Oscar Leonardo Ramírez Suárez ◽

Oscar Alberto Restrepo Gaitán ◽

Alexander Marcial Martínez Mercado

Keyword(s):

Climate Data ◽

Site Testing ◽

Colombian Andes ◽

Low Dimensional

Download Full-text

Spectral Empirical Orthogonal Function Analysis of Weather and Climate Data

Monthly Weather Review ◽

10.1175/mwr-d-18-0337.1 ◽

2019 ◽

Vol 147 (8) ◽

pp. 2979-2995 ◽

Cited By ~ 1

Author(s):

Oliver T. Schmidt ◽

Gianmarco Mengaldo ◽

Gianpaolo Balsamo ◽

Nils P. Wedi

Keyword(s):

Frequency Domain ◽

Empirical Orthogonal Function ◽

Southern Oscillation ◽

Function Analysis ◽

Reanalysis Data ◽

Climate Data ◽

Quasi Biennial Oscillation ◽

Linear Dynamical Systems ◽

Orthogonal Function ◽

Spectral Density Matrix

Abstract We apply spectral empirical orthogonal function (SEOF) analysis to educe climate patterns as dominant spatiotemporal modes of variability from reanalysis data. SEOF is a frequency-domain variant of standard empirical orthogonal function (EOF) analysis, and computes modes that represent the statistically most relevant and persistent patterns from an eigendecomposition of the estimated cross-spectral density matrix (CSD). The spectral estimation step distinguishes the approach from other frequency-domain EOF methods based on a single realization of the Fourier transform, and results in a number of desirable mathematical properties: at each frequency, SEOF yields a set of orthogonal modes that are optimally ranked in terms of variance in the L2 sense, and that are coherent in both space and time by construction. We discuss the differences between SEOF and other competing approaches, as well as its relation to dynamical modes of stochastically forced, nonnormal linear dynamical systems. The method is applied to ERA-Interim and ERA-20C reanalysis data, demonstrating its ability to identify a number of well-known spatiotemporal coherent meteorological patterns and teleconnections, including the Madden–Julian oscillation (MJO), the quasi-biennial oscillation (QBO), and the El Niño–Southern Oscillation (ENSO) (i.e., a range of phenomena reoccurring with average periods ranging from months to many years). In addition to two-dimensional univariate analyses of surface data, we give examples of multivariate and three-dimensional meteorological patterns that illustrate how this technique can systematically identify coherent structures from different sets of data. The MATLAB code used to compute the results presented in this study, including the download scripts for the reanalysis data, is freely available online.

Download Full-text

Evaluating Lossy Data Compression on Climate Simulation Data within a Large Ensemble

10.5194/gmd-2016-146 ◽

2016 ◽

Cited By ~ 2

Author(s):

Allison H. Baker ◽

Dorit M. Hammerling ◽

Sheri A. Mickleson ◽

Haiying Xu ◽

Martin B. Stolpe ◽

...

Keyword(s):

Data Compression ◽

Climate Science ◽

Original Data ◽

Earth System Model ◽

System Model ◽

Climate Simulation ◽

Earth System ◽

Climate Data ◽

Large Ensemble ◽

Simulation Data

Abstract. High-resolution earth system model simulations generate enormous data volumes, and retaining the data from these simulations often strains institutional storage resources. Further, these exceedingly large storage requirements negatively impact science objectives by forcing reductions in data output frequency, simulation length, or ensemble size, for example. To lessen data volumes from the Community Earth System Model (CESM), we advocate the use of lossy data compression techniques. While lossy data compression does not exactly preserve the original data (as lossless compression does), lossy techniques have an advantage in terms of smaller storage requirements. To preserve the integrity of the scientific simulation data, the effects of lossy data compression on the original data should, at a minimum, not be statistically distinguishable from the natural variability of the climate system, and previous preliminary work with data from CESM has shown this goal to be attainable. However, to ultimately convince climate scientists that it is acceptable to use lossy data compression, we provide climate scientists with access to publicly available climate data that has undergone lossy data compression. In particular, we report on the results of a lossy data compression experiment with output from the CESM Large Ensemble (CESM-LE) Community Project, in which we challenge climate scientists to examine features of the data relevant to their interests, and attempt to identify which of the ensemble members have been compressed and reconstructed. We find that while detecting distinguishing features is certainly possible, the compression effects noticeable in these features are often unimportant or disappear in post-processing analyses. In addition, we perform several analyses that directly compare the original data to the reconstructed data to investigate the preservation, or lack thereof, of specific features critical to climate science. Overall, we conclude that applying lossy data compression to climate simulation data is both advantageous in terms of data reduction and generally acceptable in terms of effects on scientific results.

Download Full-text

Representing Climate Data Visually: Inquiry-based Lessons about Climate Science and Climate Change for K-12 Students Using Local, Online Climate Data for Spatial and Graphical Analysis

The International Journal of Climate Change Impacts and Responses ◽

10.18848/1835-7156/cgp/v06i01/37216 ◽

2014 ◽

Vol 6 (1) ◽

pp. 11-19

Author(s):

Lawrence Rudd

Keyword(s):

Climate Change ◽

Climate Science ◽

Graphical Analysis ◽

Climate Data ◽

K 12

Download Full-text

Climatology in support of climate risk management

Progress in Physical Geography Earth and Environment ◽

10.1177/0309133315578941 ◽

2015 ◽

Vol 39 (4) ◽

pp. 536-553 ◽

Cited By ~ 10

Author(s):

Glenn McGregor

Keyword(s):

Risk Management ◽

Climate Science ◽

Climate Risk ◽

Science Literature ◽

Climate Information ◽

Climate Data ◽

Climate Risks ◽

Wide Range ◽

Climate Risk Management ◽

Climate Events

Climate risk management has emerged over the last decade as a distinct area of activity within the wider field of climatology. Its focus is on integrating climate and non-climate information in order to enhance the decision-making process in a wide range of climate-sensitive sectors of society, the economy and the environment. Given the burgeoning pure and applied climate science literature that addresses a range of climate risks, the purpose of this progress report is to provide an overview of recent developments in the field of climatology that may contribute to the risk assessment component of climate risk management. Data rescue and climate database construction, hurricanes and droughts as examples of extreme climate events and seasonal climate forecasting are focused on in this report and are privileged over other topics because of either their fundamental importance for establishing event probability or scale of societal impact. The review of the literature finds that historical data rescue, climate reconstruction and the compilation of climate data bases has assisted immensely in understanding past climate events and increasing the information base for managing climate risk. Advances in the scientific understanding of the causes and the characterization of hurricanes and droughts stand to benefit the management of these two extreme events while work focused on unravelling the nature of ocean–atmosphere interactions and associated climate anomalies at the seasonal timescale has provided the basis for the possible seasonal forecasting of a range of climate events. The report also acknowledges that despite the potential of climate information to assist with managing climate risk, its uptake by decision makers should not be automatically assumed by the climatological community.

Download Full-text

Archetypal Analysis and DEA Model, Their Application on Financial Data and Visualization with PHATE

Entropy ◽

10.3390/e24010088 ◽

2022 ◽

Vol 24 (1) ◽

pp. 88

Author(s):

Urszula Grzybowska ◽

Marek Karwański

Keyword(s):

Stock Exchange ◽

Dimensional Space ◽

Investment Strategies ◽

Financial Indicators ◽

Archetypal Analysis ◽

New Approach ◽

Macroeconomic Analysis ◽

Warsaw Stock Exchange ◽

Low Dimensional ◽

Insight Into

One of the goals of macroeconomic analysis is to rank and segment enterprises described by many financial indicators. The segmentation can be used for investment strategies or risk evaluation. The aim of this research was to distinguish groups of similar objects and visualize the results in a low dimensional space. In order to obtain clusters of similar objects, the authors applied a DEA BCC model and archetypal analysis for a set of companies described by financial indicators and listed on the Warsaw Stock Exchange. The authors showed that both methods give consistent results. To get a better insight into the data structure as well as a visualization of the similarities between objects, the authors used a new approach called the PHATE algorithm. It allowed the results of DEA and archetypal analysis to be visualized in a low dimensional space.

Download Full-text

Modulation of hydroxyl variability by ENSO in the absence of external forcing

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1807532115 ◽

2018 ◽

Vol 115 (36) ◽

pp. 8931-8936 ◽

Cited By ~ 10

Author(s):

Alexander J. Turner ◽

Inez Fung ◽

Vaishali Naik ◽

Larry W. Horowitz ◽

Ronald C. Cohen

Keyword(s):

Climate Model ◽

Southern Oscillation ◽

Function Analysis ◽

Oxidative Capacity ◽

Dominant Mode ◽

External Forcing ◽

External Forcings ◽

Methane Budget ◽

Interannual Fluctuations ◽

The Impact

The hydroxyl radical (OH) is the primary oxidant in the troposphere, and the impact of its fluctuations on the methane budget has been disputed in recent years, however measurements of OH are insufficient to characterize global interannual fluctuations relevant for methane. Here, we use a 6,000-y control simulation of preindustrial conditions with a chemistry-climate model to quantify the natural variability in OH and internal feedbacks governing that variability. We find that, even in the absence of external forcing, maximum OH changes are 3.8 ± 0.8% over a decade, which is large in the context of the recent methane growth from 2007–2017. We show that the OH variability is not a white-noise process. A wavelet analysis indicates that OH variability exhibits significant feedbacks with the same periodicity as the El Niño–Southern Oscillation (ENSO). We find intrinsically generated modulation of the OH variability, suggesting that OH may show periods of rapid or no change in future decades that are solely due to the internal climate dynamics (as opposed to external forcings). An empirical orthogonal function analysis further indicates that ENSO is the dominant mode of OH variability, with the modulation of OH occurring primarily through lightningNOx. La Niña is associated with an increase in convection in the Tropical Pacific, which increases the simulated occurrence of lightning and allows for more OH production. Understanding this link between OH and ENSO may improve the predictability of the oxidative capacity of the troposphere and assist in elucidating the causes of current and historical trends in methane.

Download Full-text

Interannual hydroclimatic variability and its influence on winter nutrient loadings over the Southeast United States

Hydrology and Earth System Sciences ◽

10.5194/hess-16-2285-2012 ◽

2012 ◽

Vol 16 (7) ◽

pp. 2285-2298 ◽

Cited By ~ 15

Author(s):

J. Oh ◽

A. Sankarasubramanian

Keyword(s):

Interannual Variability ◽

Nutrient Management ◽

Southern Oscillation ◽

Management Strategies ◽

Dominant Mode ◽

Coefficient Of Determination ◽

Nutrient Loadings ◽

Climate Forecasts ◽

Hydroclimatic Variability ◽

Low Dimensional

Abstract. It is well established in the hydroclimatic literature that the interannual variability in seasonal streamflow could be partially explained using climatic precursors such as tropical sea surface temperature (SST) conditions. Similarly, it is widely known that streamflow is the most important predictor in estimating nutrient loadings and the associated concentration. The intent of this study is to bridge these two findings so that nutrient loadings could be predicted using season-ahead climate forecasts forced with forecasted SSTs. By selecting 18 relatively undeveloped basins in the Southeast US (SEUS), we relate winter (January-February-March, JFM) precipitation forecasts that influence the JFM streamflow over the basin to develop winter forecasts of nutrient loadings. For this purpose, we consider two different types of low-dimensional statistical models to predict 3-month ahead nutrient loadings based on retrospective climate forecasts. Split sample validation of the predictive models shows that 18–45% of interannual variability in observed winter nutrient loadings could be predicted even before the beginning of the season for at least 8 stations. Stations that have very high coefficient of determination (> 0.8) in predicting the observed water quality network (WQN) loadings during JFM exhibit significant skill in predicting seasonal total nitrogen (TN) loadings using climate forecasts. Incorporating antecedent flow conditions (December flow) as an additional predictor did not increase the explained variance in these stations, but substantially reduced the root-mean-square error (RMSE) in the predicted loadings. Relating the dominant mode of winter nutrient loadings over 18 stations clearly illustrates the association with El Niño Southern Oscillation (ENSO) conditions. Potential utility of these season-ahead nutrient predictions in developing proactive and adaptive nutrient management strategies is also discussed.

Download Full-text