Dynamical clustering: A new approach to make distributed (hydrological) modeling more efficient by dynamically detecting and removing redundant computations

Author(s):  
Uwe Ehret ◽  
Rik van Pruijssen ◽  
Marina Bortoli ◽  
Ralf Loritz ◽  
Elnaz Azmi ◽  
...  

<p>The structural properties of hydrological systems, such as topography, soils or land use, often show considerable spatial variability, and so do the drivers of system dynamics, such as rainfall. Detailed statements about system states and responses therefore generally require spatially distributed and temporally highly resolved hydrological models, which comes at a substantial computational cost. However, even if hydrological subsystems can potentially behave very differently, in practice we often find groups of subsystems that behave similarly, although the number, size and characteristics of these groups vary in time. If such clustered behavior is known while a model is running, computational efficiency can be increased by computing in full detail only a few representatives within each cluster and assigning their results to the remaining cluster members, which avoids costly redundant computations. Unlike other methods designed to dynamically remove computational redundancies, such as adaptive gridding, dynamical clustering does not require spatial proximity of the model elements.</p><p>In our contribution, we use a distributed, conceptual hydrological model of the Attert basin in Luxembourg to present and discuss i) a dimensionless approach to express dynamical similarity, ii) the temporal evolution of dynamical similarity in a 5-year period, iii) an approach to dynamically cluster and re-cluster model elements during run time based on an analysis of clustering stability, and iv) the effect of dynamical clustering with respect to computational gains and the associated losses of simulation quality.</p><p>For the Attert model, we found that there is indeed high redundancy among model elements, that the degree of redundancy varies with time, and that the spatial patterns of similarity are mainly controlled by geology and precipitation. 
Compared to a standard, full-resolution model run used as a virtual reality ‘truth’, computation time could be reduced to one fourth when modelling quality, expressed as Nash-Sutcliffe efficiency of discharge, was allowed to decrease from 1 to 0.84. Re-clustering occurred at irregular intervals, mainly associated with the onset of precipitation, but on average the patterns of similarity were quite stable: during the entire six-year simulation period, only 165 re-clusterings were carried out, i.e. on average once every eleven days.</p>
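
The idea of computing only cluster representatives and judging the resulting loss of quality can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the rule "first member is the representative", and the `step_fn` process core are all assumptions made for illustration, and the cluster labels are taken as given rather than derived from a similarity analysis.

```python
import numpy as np

def nash_sutcliffe(sim, ref):
    """Nash-Sutcliffe efficiency of `sim` against the reference series `ref`."""
    sim = np.asarray(sim, dtype=float)
    ref = np.asarray(ref, dtype=float)
    return 1.0 - np.sum((sim - ref) ** 2) / np.sum((ref - ref.mean()) ** 2)

def step_with_clusters(states, forcing, labels, step_fn):
    """Advance all model elements one time step, computing one representative per cluster.

    states, forcing : 1-D arrays with one value per model element
    labels          : cluster label per element (hypothetical, supplied from outside)
    step_fn         : hypothetical process core, new_state = step_fn(state, forcing)
    """
    states = np.asarray(states, dtype=float)
    new_states = np.empty_like(states)
    for label in np.unique(labels):
        members = np.flatnonzero(labels == label)
        rep = members[0]  # assumption: the first member acts as representative
        # Compute the representative once and assign its result to all members.
        new_states[members] = step_fn(states[rep], forcing[rep])
    return new_states
```

With such a sketch, a full-resolution run would call `step_fn` for every element, while the clustered run calls it once per cluster; `nash_sutcliffe` applied to the clustered output against the full-resolution output then quantifies the quality loss, with a value of 1 meaning no loss.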

2020 ◽  
Author(s):  
Uwe Ehret ◽  
Rik van Pruijssen ◽  
Marina Bortoli ◽  
Ralf Loritz ◽  
Elnaz Azmi ◽  
...  

Abstract. In this paper we propose adaptive clustering as a new way to analyse hydrological systems and to reduce the computational effort of distributed modelling by dynamically identifying similar model elements, clustering them and inferring dynamics from just a few representatives per cluster. It is based on the observation that while hydrological systems generally exhibit large spatial variability of their properties, requiring distributed approaches for analysis and modelling, there is also redundancy: typical combinations of properties recur, so that there are subsystems with similar properties, which will exhibit similar internal dynamics and produce similar output when in similar initial states and when exposed to similar forcing. Because it depends on all these factors, similarity is a dynamical rather than a static phenomenon, and it is not necessarily a function of spatial proximity. We explain and demonstrate adaptive clustering using the example of a conceptual, yet realistic and distributed hydrological model, fitted to the Attert basin in Luxembourg by multi-variate calibration. Based on normalized and binned transformations of model states and fluxes, we first calculated time series of Shannon information entropy to measure dynamical similarity (or redundancy) among subsystems. This revealed that high redundancy does exist, that its magnitude differs among variables, that it varies with time, and that for the Attert basin the spatial patterns of similarity are mainly controlled by geology and precipitation. Based on these findings, we integrated adaptive clustering into the hydrological model. It constitutes a shell around the model's hydrological process core and comprises clustering of model elements, choice of cluster representatives, mapping of results from representatives to recipients, and comparison of clusterings over time to decide when re-clustering is advisable. 
Compared to a standard, full-resolution model run used as a virtual reality truth, adaptive clustering reduced computation time to one fourth when accepting a decrease of modelling quality, expressed as Nash–Sutcliffe efficiency of sub-catchment runoff, from 1 to 0.84. We suggest that adaptive clustering is a promising tool both for system analysis and for reducing computation times of distributed models, thus facilitating applications to larger systems and/or longer periods of time. We demonstrate the potential of adaptive clustering using the example of a hydrological system and model, but it should apply to a wide range of systems and models across the earth system sciences. Being dynamical, it goes beyond existing static methods used to increase model performance, such as lumping, and it is compatible with existing dynamical methods such as adaptive time-stepping or adaptive gridding. Unlike the latter, adaptive clustering does not require adjacency of the subsystems to be joined.
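
The entropy-based redundancy measure mentioned in the abstract can be sketched as follows. This is a minimal illustration under assumptions, not the paper's procedure: the values are assumed to be already normalized to the range [0, 1], and the number of bins (`n_bins`) and the function name are hypothetical choices.

```python
import numpy as np

def binned_entropy(values, n_bins=10):
    """Shannon entropy (in bits) of the bin occupancy of values normalized to [0, 1].

    `values` holds one normalized state or flux per model element at a single
    time step. Low entropy means many elements fall into the same bins, i.e.
    high redundancy; the maximum possible value is log2(n_bins).
    """
    values = np.asarray(values, dtype=float)
    counts, _ = np.histogram(values, bins=n_bins, range=(0.0, 1.0))
    p = counts[counts > 0] / counts.sum()  # drop empty bins, since 0*log(0) is taken as 0
    return float(-np.sum(p * np.log2(p)))
```

Evaluating such a function at every time step would give an entropy time series of the kind described above: it is zero when all elements share one bin and reaches its maximum when the elements are spread evenly over all bins.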


Author(s):  
Xiyu Peng ◽  
Karin S Dorman

Abstract. Motivation: Next-generation amplicon sequencing is a powerful tool for investigating microbial communities. A main challenge is to distinguish true biological variants from errors caused by amplification and sequencing. In traditional analyses, such errors are eliminated by clustering reads within a sequence similarity threshold, usually 97%, and constructing operational taxonomic units, but the arbitrary threshold leads to low resolution and high false-positive rates. Recently developed ‘denoising’ methods have proven able to resolve single-nucleotide amplicon variants, but they still miss low-frequency sequences, especially those near more frequent sequences, because they ignore the sequencing quality information. Results: We introduce AmpliCI, a reference-free, model-based method for rapidly resolving the number, abundance and identity of error-free sequences in massive Illumina amplicon datasets. AmpliCI considers the quality information and allows the data, not an arbitrary threshold or an external database, to drive conclusions. AmpliCI estimates a finite mixture model, using a greedy strategy to gradually select error-free sequences and approximately maximize the likelihood. AmpliCI performs better than three popular denoising methods, with acceptable computation time and memory usage. Availability and implementation: Source code is available at https://github.com/DormanLab/AmpliCI. Supplementary information: Supplementary materials are available at Bioinformatics online.
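
To illustrate why sequencing quality information can help separate rare true variants from errors, the following Python sketch converts Phred quality scores to error probabilities and scores a read against a candidate true sequence. It does not reproduce AmpliCI's model: the independent-substitution error model, the equal split of the error probability over the three wrong bases, and both function names are assumptions made for illustration only.

```python
import math

def phred_to_error_prob(q):
    """Convert a Phred quality score Q to an error probability 10^(-Q/10)."""
    return 10.0 ** (-q / 10.0)

def read_log_likelihood(read, quals, candidate):
    """Log-likelihood of a read given a candidate true sequence.

    Assumes (for illustration only) independent substitution errors, with
    each of the three wrong bases equally likely, and no insertions or
    deletions. Quality scores must be positive so that log(1 - p) is defined.
    """
    if not (len(read) == len(quals) == len(candidate)):
        raise ValueError("read, quals and candidate must have equal length")
    total = 0.0
    for base, q, true_base in zip(read, quals, candidate):
        if q <= 0:
            raise ValueError("quality scores must be positive")
        p_err = phred_to_error_prob(q)
        if base == true_base:
            total += math.log(1.0 - p_err)   # base read correctly
        else:
            total += math.log(p_err / 3.0)   # one specific wrong base
    return total
```

Under these assumptions, a mismatch at a high-quality position lowers the likelihood far more than a mismatch at a low-quality position, so a read that differs from an abundant sequence only at high-quality positions is less plausibly explained as an error of that sequence.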

