Overcoming challenges in spatio-temporal modelling of large-scale (global) data

Author(s): Aoibheann Brady, Jonathan Rougier, Yann Ziegler, Bramha Dutt Vishwakarma, Sam Royston, ...

Modelling spatio-temporal data on a large scale presents a number of obstacles for statisticians and environmental scientists. Issues such as computational complexity, the combination of point and areal data, the separation of signals into their component processes, and the handling of both large volumes of data in some areas and sparse data in others must all be considered. We discuss methods to overcome such challenges within a Bayesian hierarchical modelling framework using INLA.

In particular, we illustrate the approach using the example of source separation of geophysical signals at both continental and global scales. In such a setting, data tend to be available at both point and areal level. We propose a novel approach for integrating these sources using the INLA-SPDE method, which is normally reserved for point-level data. Additionally, the geophysical processes involved are both spatial (time-invariant) and spatio-temporal in nature. Separating such processes into physically sensible components requires careful modelling and careful choice of priors (such as physical model outputs where data are sparse), which we discuss. We also consider methods to reduce the computational cost of modelling at this scale, from efficient mesh design, to thinning or aggregating the data, to alternative approaches to inference. This holistic approach ensures that spatial and spatio-temporal processes can be sensibly separated into their component parts without being prohibitively expensive to model.
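As a rough illustration of the point/areal fusion idea (a simplified numpy analogue, not the authors' INLA-SPDE implementation; all dimensions and names below are hypothetical), both kinds of observation can be written as linear operators acting on one latent field and stacked into a single regularized system:

```python
# Illustrative sketch: combining point-level and areal observations of one
# latent field by stacking the two observation operators into a single
# regularized least-squares system. Everything here is hypothetical.
import numpy as np

rng = np.random.default_rng(0)
n = 25                                   # latent field on a 5 x 5 grid
truth = rng.normal(size=n)

# Point observations: each row of A_pt selects one grid node.
idx = rng.integers(0, n, size=40)
A_pt = np.zeros((40, n))
A_pt[np.arange(40), idx] = 1.0

# Areal observations: each row of A_ar averages the field over one grid row.
A_ar = np.zeros((5, n))
for k in range(5):
    A_ar[k, 5 * k:5 * (k + 1)] = 1.0 / 5.0

A = np.vstack([A_pt, A_ar])
y = A @ truth + rng.normal(scale=0.1, size=A.shape[0])

# A ridge-type penalty stands in for the SPDE/GMRF prior precision.
lam = 0.1
x_hat = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y)
print("RMSE of recovered field:", np.sqrt(np.mean((x_hat - truth) ** 2)))
```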

2020
Author(s): Aoibheann Brady, Jonathan Rougier, Bramha Dutt Vishwakarma, Yann Ziegler, Richard Westaway, ...

Sea level rise is one of the most significant consequences of projected future changes in climate. One factor which influences sea level rise is vertical land motion (VLM) due to glacial isostatic adjustment (GIA), which changes the elevation of the ocean floor. Typically, GIA forward models are used for this purpose, but their outputs are known to vary with the assumptions made about ice-loading history and Earth structure. In this study, we implement a Bayesian hierarchical modelling framework to explore a data-driven VLM solution for North America, with the aim of separating the overall signal into its GIA and hydrology (mass change) components. A Bayesian spatio-temporal model is implemented in INLA using satellite (GRACE) and in-situ (GPS) data as observations. Under the assumption that GIA varies in space but is constant in time, while hydrology is both spatially and temporally variable, it is possible to separate the contributions of each component with an associated uncertainty level. Early results will be presented. Extensions of the BHM framework to investigate sea level rise at the global scale, such as the inclusion of additional processes and the incorporation of larger volumes of data, will be discussed.
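A toy sketch of the identifying assumption (not the paper's Bayesian hierarchical model; all values below are invented) shows how a constant-in-time rate and a time-varying signal can be separated in a single regression:

```python
# Toy separation of VLM into a constant GIA rate and a time-varying
# (here purely seasonal) hydrology signal. Numbers are hypothetical.
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(0, 15, 1 / 12)                # 15 years, monthly sampling
gia_rate = 3.0                              # mm/yr, constant in time
hydrology = 5.0 * np.sin(2 * np.pi * t)     # mm, annual mass cycle
y = gia_rate * t + hydrology + rng.normal(scale=1.0, size=t.size)

# Design matrix: intercept, linear trend (GIA), annual harmonics (hydrology).
X = np.column_stack([np.ones_like(t), t,
                     np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"estimated GIA rate: {beta[1]:.2f} mm/yr (true 3.00)")
```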


2021
Author(s): Kor de Jong, Marc van Kreveld, Debabrata Panja, Oliver Schmitz, Derek Karssenberg

Data availability at the global scale is increasing exponentially. Although considerable challenges remain regarding the identification of model structure and parameters for continental-scale hydrological models, we will soon reach the point where global-scale models can be defined at very high resolutions, close to 100 m or less. One of the key challenges is how to make simulations of these ultra-high-resolution models tractable [1].

Our research contributes a model building framework that is specifically designed to distribute calculations over multiple cluster nodes. This framework enables domain experts such as hydrologists to develop their own large-scale models in a scripting language like Python, without needing to acquire the skills to write low-level code for parallel and distributed computing.

We present the design and implementation of this software framework and illustrate its use with a prototype 100 m, 1 h continental-scale hydrological model. Our modelling framework ensures that any model built with it is parallelized. This is made possible by providing the model builder with a set of model building blocks, coded in such a manner that parallelization of calculations occurs within and across these building blocks, for any combination of them. There is thus full flexibility on the side of the modeller, without loss of performance.

This breakthrough is made possible by a novel approach to the implementation of the model building framework, called asynchronous many-tasks, provided by the HPX C++ software library [3]. The code in the model building framework expresses spatial operations as large collections of interdependent tasks that can be executed efficiently on individual laptops as well as computer clusters [2]. Our framework currently includes the most essential operations for building large-scale hydrological models, including those for simulating the transport of material through a flow direction network (a conceptual sketch is given after the references below). By combining these operations, we rebuilt an existing 100 m, 1 h resolution model, thus far used for simulations of small catchments; this required limited coding, as we only had to replace the computational back end of the existing model. Runs at continental scale on a computer cluster show acceptable strong and weak scaling, a strong indication that global simulations at this resolution will soon be technically possible.

Future work will focus on extending the set of modelling operations and adding scalable I/O, after which existing models that are currently limited in their ability to use the available computational resources can be ported to this new environment.

More information about our modelling framework is at https://lue.computationalgeography.org.

References

[1] M. Bierkens. Global hydrology 2015: State, trends, and directions. Water Resources Research, 51(7):4923–4947, 2015.
[2] K. de Jong, et al. An environmental modelling framework based on asynchronous many-tasks: scalability and usability. Submitted.
[3] H. Kaiser, et al. HPX - The C++ standard library for parallelism and concurrency. Journal of Open Source Software, 5(53):2352, 2020.
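The conceptual sketch promised above (hypothetical grids, unrelated to the LUE/HPX code base) illustrates the kind of flow-direction-network operation the framework provides: accumulating material from upstream to downstream in topological order.

```python
# Conceptual flow accumulation through a D8-style flow direction network,
# processing cells in upstream-to-downstream (topological) order.
import numpy as np
from collections import deque

# downstream[i] is the index of the cell that cell i drains into (-1 = outlet).
downstream = np.array([1, 2, 5, 4, 5, -1])   # a tiny 6-cell network
material = np.ones(6)                         # unit input in every cell

# Kahn-style topological pass: start from cells with no upstream neighbours.
indegree = np.zeros(6, dtype=int)
for d in downstream:
    if d >= 0:
        indegree[d] += 1
accumulated = material.copy()
queue = deque(np.flatnonzero(indegree == 0))
while queue:
    i = queue.popleft()
    d = downstream[i]
    if d >= 0:
        accumulated[d] += accumulated[i]      # pass material downstream
        indegree[d] -= 1
        if indegree[d] == 0:
            queue.append(d)
print(accumulated)   # the outlet cell collects the whole catchment
```

In the asynchronous many-tasks setting, each cell (or block of cells) update would become a task whose execution is triggered once its upstream dependencies resolve, which is what allows such operations to parallelize within and across building blocks.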


2006
Vol 3 (9)
pp. 515-526
Author(s): Fei Hua, Sampsa Hautaniemi, Rayka Yokoo, Douglas A Lauffenburger

Mathematical models of highly interconnected and multivariate signalling networks provide useful tools for understanding these complex systems. However, effective approaches to extracting multivariate regulation information from these models are still lacking. In this study, we propose a data-driven modelling framework to analyse large-scale multivariate datasets generated from mathematical models. We used an ordinary differential equation (ODE) model of the Fas apoptotic pathway as an example. The first step in our approach was to cluster simulation outputs generated from models with varied initial protein concentrations. Subsequently, decision tree analysis was applied, in which we used protein concentrations to predict the simulation outcomes. Our results suggest that no single subset of proteins determines the pathway behaviour. Instead, different subsets of proteins with different concentration ranges can be important. We also used the resulting decision tree to identify the minimal number of perturbations needed to change pathway behaviour. In conclusion, our framework provides a novel approach to understanding the multivariate dependencies among molecules in complex networks, and it can potentially be used to identify combinatorial targets for therapeutic interventions.
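A miniature, hypothetical version of this pipeline (a toy two-species ODE standing in for the Fas pathway model) might look like the following: simulate many initial concentrations, cluster the simulation outputs, then fit a decision tree that predicts cluster membership from the initial state.

```python
# Hypothetical miniature of the simulate -> cluster -> decision-tree pipeline.
import numpy as np
from scipy.integrate import solve_ivp
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)

def toy_pathway(t, y):
    a, b = y
    return [-0.5 * a * b, 0.5 * a * b - 0.1 * b]    # toy mass-action kinetics

inits = rng.uniform(0.1, 2.0, size=(200, 2))        # sampled initial concentrations
finals = np.array([solve_ivp(toy_pathway, (0, 50), y0, t_eval=[50]).y[:, -1]
                   for y0 in inits])                # simulation outputs at t = 50

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(finals)
tree = DecisionTreeClassifier(max_depth=3).fit(inits, labels)
print("tree accuracy on training data:", tree.score(inits, labels))
```

The fitted tree's split thresholds play the role of the concentration ranges discussed above: each root-to-leaf path describes one subset of species, with ranges, that jointly predicts an outcome.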


2011
Vol 8 (1)
pp. 619-652
Author(s): G. A. Corzo Perez, M. H. J. van Huijgevoort, F. Voß, H. A. J. van Lanen

Abstract. Recent concerns about worldwide extreme events related to climate change have motivated the development of large-scale models that simulate the global water cycle. In this context, the analysis of extremes is an important topic that requires the adaptation of methods used for river basin and regional-scale models. This paper presents two methodologies that extend the available tools to analyse spatio-temporal drought development and characteristics using large-scale gridded time series of hydrometeorological data. The methodologies are distinguished and defined as non-contiguous and contiguous drought area analyses (NCDA and CDA). The NCDA produces time series of the percentage of area in drought at the global scale and for pre-defined regions of known hydroclimatology. The CDA is introduced as a complementary method that generates information on the spatial coherence of drought events at the global scale; spatial drought events are found through the CDA by clustering patterns (contiguous areas). In this study the global hydrological model WaterGAP was used to illustrate the methodology. Global gridded time series (0.5° resolution) of land points simulated with the WaterGAP model were used, and the NCDA and CDA were applied to identify drought events in subsurface runoff. The percentages of area in drought calculated with both methods provide complementary information on the spatial and temporal events of the last decades of the 20th century. The NCDA provides relevant information on the average number of droughts, their duration and their severity (deficit volume) for pre-defined regions (the globe and two selected climate regions). Additionally, the CDA provides information on the number of spatially linked areas in drought as well as their geographic location on the globe. An explorative validation shows that the NCDA results capture the overall spatio-temporal drought extremes of the last decades of the 20th century. Events like the El Niño Southern Oscillation (ENSO) in South America and the pan-European drought of 1976 appear clearly in both analyses. The methodologies introduced provide an important basis for the global characterization of droughts, model inter-comparison, and the validation of spatial events.
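A minimal sketch of the NCDA computation on synthetic data (the 20th-percentile threshold and grid size are assumptions for illustration, not the paper's configuration): flag each grid cell as in drought when runoff drops below a cell-specific threshold, then report the percentage of area in drought at each time step.

```python
# NCDA sketch: percentage of area in drought per time step, using a
# variable (per-cell) percentile threshold on synthetic runoff data.
import numpy as np

rng = np.random.default_rng(3)
runoff = rng.gamma(shape=2.0, scale=1.0, size=(120, 20, 20))  # (time, lat, lon)

threshold = np.percentile(runoff, 20, axis=0)     # per-cell drought threshold
in_drought = runoff < threshold                   # boolean (time, lat, lon)
pct_area = 100.0 * in_drought.mean(axis=(1, 2))   # % of cells in drought
print(pct_area[:12])                              # first year of monthly values
```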


2011
Vol 15 (9)
pp. 2963-2978
Author(s): G. A. Corzo Perez, M. H. J. van Huijgevoort, F. Voß, H. A. J. van Lanen

Abstract. Recent concerns about worldwide extreme events related to climate change have motivated the development of large-scale models that simulate the global water cycle. In this context, the analysis of hydrological extremes is important and requires the adaptation of identification methods used for river basin models. This paper presents two methodologies that extend the available tools to analyse spatio-temporal drought development and characteristics using large-scale gridded time series of hydrometeorological data. The methodologies are classified as non-contiguous and contiguous drought area analyses (NCDA and CDA). The NCDA produces time series of the percentage of area in drought at the global scale and for pre-defined regions of known hydroclimatology. The CDA is introduced as a complementary method that generates information on the spatial coherence of drought events at the global scale; spatial drought events are found through the CDA by clustering patterns (contiguous areas). In this study the global hydrological model WaterGAP was used to illustrate the methodology. Global gridded time series of subsurface runoff (0.5° resolution) simulated with the WaterGAP model from land points were used, and the NCDA and CDA were developed to identify drought events in this runoff. The percentages of area in drought calculated with both methods provide complementary information on the spatial and temporal events of the last decades of the 20th century. The NCDA provides relevant information on the average number of droughts, their duration and their severity (deficit volume) for pre-defined regions (the globe and two selected hydroclimatic regions). Additionally, the CDA provides information on the number of spatially linked areas in drought, the maximum spatial event, and their geographic location on the globe. The results capture the overall spatio-temporal drought extremes of the last decades of the 20th century. Events like the El Niño Southern Oscillation (ENSO) in South America and the pan-European drought of 1976 appear clearly in both analyses. The methodologies introduced provide an important basis for the global characterization of droughts, the inter-comparison of droughts identified from global hydrological models, and spatial event analyses.
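A companion sketch for the CDA step (again on synthetic data; the connectivity rule and drought fraction are assumptions) labels contiguous drought cells in a single time slice and reports cluster sizes, including the largest (maximum spatial event):

```python
# CDA sketch: label spatially connected drought cells in one time slice.
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(4)
mask = rng.random((20, 20)) < 0.2            # drought mask for one time step

labels, n_clusters = ndimage.label(mask)     # 4-connected contiguous areas
sizes = ndimage.sum(mask.astype(float), labels,
                    index=np.arange(1, n_clusters + 1))
print(n_clusters, "contiguous drought areas; largest spans",
      int(sizes.max()), "cells")
```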


2018
Author(s): Monica Alexander

To understand national trends in mortality over time, it is important to study differences by demographic, socioeconomic and geographic characteristics. One issue with studying mortality inequalities, particularly by socioeconomic status, is that few micro-level data sources link an individual's SES with their eventual age and date of death. In this paper, a new dataset for studying mortality disparities and changes over time in the United States is presented. The dataset, termed 'CenSoc', links two large-scale datasets: the full-count 1940 Census, which provides demographic, socioeconomic and geographic information, and the Social Security Deaths Masterfile (SSDM), which provides mortality information. This paper also develops mortality estimation methods to better use the 'deaths without denominators' information contained in CenSoc. Bayesian hierarchical methods are presented to estimate truncated death distributions over age and cohort, allowing prior information on mortality trends to be incorporated and estimates of life expectancy and associated uncertainty to be produced.
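The paper's methods are Bayesian and hierarchical; as a simpler frequentist illustration of the underlying truncation problem (all parameters and the observation window below are hypothetical), a Gompertz death distribution can be fitted by maximum likelihood to ages at death observed only inside a window, which is the essence of "deaths without denominators":

```python
# Window-truncated Gompertz MLE: ages at death are only observed in [lo, hi),
# so each observation's likelihood is normalized by the window probability.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
a_true, b_true = 1e-4, 0.1

# Inverse-CDF sampling of Gompertz ages at death.
u = rng.random(50_000)
ages = np.log(1 - b_true / a_true * np.log(1 - u)) / b_true
lo, hi = 65.0, 95.0
obs = ages[(ages >= lo) & (ages < hi)]        # the truncated sample we "see"

def neg_loglik(theta):
    a, b = np.exp(theta)                      # keep parameters positive
    H = lambda x: a / b * (np.exp(b * x) - 1)         # cumulative hazard
    logf = np.log(a) + b * obs - H(obs)               # log density
    logZ = np.log(np.exp(-H(lo)) - np.exp(-H(hi)))    # window probability
    return -(logf - logZ).sum()

fit = minimize(neg_loglik, x0=np.log([1e-3, 0.05]), method="Nelder-Mead")
print("estimated (a, b):", np.exp(fit.x), "true:", (a_true, b_true))
```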


2019
Author(s): Chem Int

This research work presents a facile and green route for the synthesis of silver sulfide nanoparticles (Ag2S NPs) from silver nitrate (AgNO3) and sodium sulfide nonahydrate (Na2S·9H2O) in the presence of an aqueous extract of rosemary leaves at ambient temperature (27 °C). The structural and morphological properties of the Ag2S NPs were analyzed by X-ray diffraction (XRD) and transmission electron microscopy (TEM). The surface plasmon resonance of the Ag2S NPs was observed at around 355 nm. The Ag2S NPs were spherical in shape, with an effective diameter of 14 nm. Our novel approach represents a promising and effective method for the large-scale synthesis of eco-friendly silver sulfide nanoparticles with antibacterial activity.


Author(s): Eun-Young Mun, Anne E. Ray

Integrative data analysis (IDA) is a promising new approach in psychological research and has been well received in the field of alcohol research. This chapter provides a larger unifying research synthesis framework for IDA. Major advantages of applying IDA to individual participant-level data include better and more flexible ways to examine subgroups, model complex relationships, deal with methodological and clinical heterogeneity, and examine infrequently occurring behaviors. However, between-study heterogeneity in measures, designs, and samples, as well as systematic study-level missing data, are significant barriers to IDA and, more broadly, to large-scale research synthesis. The authors' experience working on the Project INTEGRATE data set, which combined individual participant-level data from 24 independent college brief alcohol intervention studies, also suggests that IDA investigations require a wide range of expertise and considerable resources, and that some minimum standards for reporting IDA studies may be needed to improve the transparency and quality of evidence.


2018
Vol 14 (12)
pp. 1915-1960
Author(s): Rudolf Brázdil, Andrea Kiss, Jürg Luterbacher, David J. Nash, Ladislava Řezníčková

Abstract. The use of documentary evidence to investigate past climatic trends and events has become a recognised approach in recent decades. This contribution presents the state of the art in its application to droughts. The range of documentary evidence is very wide, including general annals, chronicles, memoirs and diaries kept by missionaries, travellers and those specifically interested in the weather; records kept by administrators tasked with keeping accounts and other financial and economic records; legal-administrative evidence; religious sources; letters; songs; newspapers and journals; pictographic evidence; chronograms; epigraphic evidence; early instrumental observations; society commentaries; and compilations and books. These are available from many parts of the world. This variety of documentary information is evaluated with respect to the reconstruction of hydroclimatic conditions (precipitation, drought frequency and drought indices). Documentary-based drought reconstructions are then addressed in terms of long-term spatio-temporal fluctuations, major drought events, relationships with external forcing and large-scale climate drivers, socio-economic impacts and human responses. Documentary-based drought series are also considered from the viewpoint of spatio-temporal variability for certain continents, and their employment together with hydroclimate reconstructions from other proxies (in particular tree rings) is discussed. Finally, conclusions are drawn, and challenges for the future use of documentary evidence in the study of droughts are presented.


GigaScience
2020
Vol 9 (12)
Author(s): Ariel Rokem, Kendrick Kay

Abstract

Background: Ridge regression is a regularization technique that penalizes the L2-norm of the coefficients in linear regression. One of the challenges of using ridge regression is the need to set a hyperparameter (α) that controls the amount of regularization. Cross-validation is typically used to select the best α from a set of candidates, but efficient and appropriate selection of α can be challenging and becomes prohibitive when large amounts of data are analyzed. Because the selected α depends on the scale of the data and the correlations across predictors, it is also not straightforwardly interpretable.

Results: The present work addresses these challenges through a novel approach to ridge regression. We propose to reparameterize ridge regression in terms of the ratio γ between the L2-norms of the regularized and unregularized coefficients. We provide an algorithm that efficiently implements this approach, called fractional ridge regression, as well as open-source software implementations in Python and MATLAB (https://github.com/nrdg/fracridge). We show that the proposed method is fast and scalable for large-scale data problems. On brain imaging data, we demonstrate that this approach delivers results that are straightforward to interpret and to compare across models and datasets.

Conclusion: Fractional ridge regression has several benefits: the solutions obtained for different γ are guaranteed to vary, guarding against wasted calculations, and they automatically span the relevant range of regularization, avoiding the need for arduous manual exploration. These properties make fractional ridge regression particularly suitable for the analysis of large, complex datasets.
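A minimal numpy sketch of the γ-reparameterization (a simplified analogue of the idea, not the fracridge reference implementation) finds, by bisection over the SVD form of the ridge solution, the α whose coefficient norm is a chosen fraction of the unregularized (OLS) norm:

```python
# Fractional ridge sketch: solve for alpha such that
# ||beta(alpha)|| / ||beta(0)|| equals a target fraction gamma.
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 10))
y = X @ rng.normal(size=10) + rng.normal(size=200)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
uty = U.T @ y

def coef_norm(alpha):
    # ||beta(alpha)|| via the SVD: beta = V diag(s / (s^2 + alpha)) U^T y
    return np.linalg.norm(s / (s ** 2 + alpha) * uty)

def alpha_for_gamma(gamma, hi=1e10, tol=1e-8):
    # coef_norm is monotone decreasing in alpha, so bisection suffices.
    target = gamma * coef_norm(0.0)
    lo_a, hi_a = 0.0, hi
    while hi_a - lo_a > tol * (1 + hi_a):
        mid = 0.5 * (lo_a + hi_a)
        lo_a, hi_a = (mid, hi_a) if coef_norm(mid) > target else (lo_a, mid)
    return 0.5 * (lo_a + hi_a)

for gamma in (0.25, 0.5, 0.9):
    a = alpha_for_gamma(gamma)
    print(f"gamma={gamma:.2f} -> alpha={a:.3g}, "
          f"achieved ratio={coef_norm(a) / coef_norm(0.0):.3f}")
```

Because the mapping from γ to α is computed directly, the requested fractions are spread evenly over the regularization range by construction, which is the property the abstract highlights.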

