Consistent assimilation of multiple data streams in a carbon cycle data assimilation system

2016 ◽  
Vol 9 (10) ◽  
pp. 3569-3588 ◽  
Author(s):  
Natasha MacBean ◽  
Philippe Peylin ◽  
Frédéric Chevallier ◽  
Marko Scholze ◽  
Gregor Schürmann

Abstract. Data assimilation (DA) methods provide a rigorous statistical framework for constraining parametric uncertainty in land surface models (LSMs), which in turn helps to improve their predictive capability and to identify areas in which the representation of physical processes is inadequate. The increase in the number of available datasets in recent years allows us to address different aspects of the model at a variety of spatial and temporal scales. However, combining data streams in a DA system is not a trivial task. In this study we highlight some of the challenges surrounding multiple data stream assimilation for the carbon cycle component of LSMs. We give particular consideration to the assumptions associated with the type of inversion algorithm that is typically used when optimising global LSMs – namely, Gaussian error distributions and linearity in the model dynamics. We explore the effect of biases and inconsistencies between the observations and the model (resulting in non-Gaussian error distributions), and we examine the difference between a simultaneous assimilation (in which all data streams are included in one optimisation) and a step-wise approach (in which each data stream is assimilated sequentially) in the presence of non-linear model dynamics. In addition, we perform a preliminary investigation into the impact of correlated errors between two data streams for two cases: when the correlated observation errors are included in the prior observation error covariance matrix, and when they are ignored. We demonstrate these challenges by assimilating synthetic observations into two simple models: the first a simplified version of the carbon cycle processes represented in many LSMs and the second a non-linear toy model. Finally, we provide some perspectives and advice to other land surface modellers wishing to use multiple data streams to constrain their model parameters.
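
To make the simultaneous versus step-wise distinction concrete, the following is a minimal sketch and not the models or inversion algorithm used in the study. It assumes a single scalar parameter, linear observation operators, Gaussian errors and uncorrelated observation errors; the function name `gaussian_update` and all numerical values are hypothetical. Under these idealised assumptions, and provided the posterior variance from the first step is carried forward as the prior variance of the second, the two approaches return the same posterior. The differences discussed in the abstract arise once the model is non-linear or the errors are biased or correlated.

```python
import math


def gaussian_update(prior_mean, prior_var, obs, obs_var, h):
    """Posterior mean and variance of a scalar parameter x.

    Assumes each observation is obs[i] = h[i] * x + noise, with independent
    Gaussian noise of variance obs_var[i] (illustrative assumption).
    """
    precision = 1.0 / prior_var + sum(hi * hi / ri for hi, ri in zip(h, obs_var))
    post_var = 1.0 / precision
    weighted = prior_mean / prior_var + sum(
        hi * yi / ri for hi, yi, ri in zip(h, obs, obs_var)
    )
    return post_var * weighted, post_var


# Hypothetical prior and two synthetic data streams.
prior_mean, prior_var = 1.0, 4.0
y1, r1, h1 = 2.5, 1.0, 1.0   # stream 1: observation, error variance, operator
y2, r2, h2 = 5.8, 0.25, 2.0  # stream 2

# Simultaneous: both streams in one optimisation.
simultaneous = gaussian_update(prior_mean, prior_var, [y1, y2], [r1, r2], [h1, h2])

# Step-wise: the posterior from stream 1 becomes the prior for stream 2.
step1 = gaussian_update(prior_mean, prior_var, [y1], [r1], [h1])
stepwise = gaussian_update(step1[0], step1[1], [y2], [r2], [h2])

print(simultaneous, stepwise)
print(math.isclose(simultaneous[0], stepwise[0]))
```

If the posterior variance from the first step were discarded or approximated before the second step, the equivalence would no longer hold even in this linear case.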

2016 ◽  
Author(s):  
Natasha MacBean ◽  
Philippe Peylin ◽  
Frédéric Chevallier ◽  
Marko Scholze ◽  
Gregor Schürmann

Abstract. Data assimilation (DA) methods provide a rigorous statistical framework for constraining the parametric uncertainty of land surface models (LSMs), with the aim of improving our predictive capability as well as identifying areas in which the models need improvement. The increase in the number of available datasets in recent years allows us to address different aspects of the model at a variety of spatial and temporal scales. However, combining data streams in a DA system is not a trivial task. In this study we highlight some of the challenges surrounding multiple data stream assimilation, with a particular focus on the carbon cycle component of LSMs. We examine the impact of biases and inconsistencies between the observations and the model (resulting in non-Gaussian error distributions) and the impact of non-linearity in model dynamics. In addition, we explore the differences between performing a simultaneous assimilation (in which all data streams are included in one optimisation) and a step-wise approach (in which each data stream is assimilated sequentially), given the assumptions inherent to the inversion algorithm chosen for this study. We demonstrate some of these issues by assimilating synthetic observations into two simple models: the first a simplified version of the carbon cycle processes represented in many LSMs, and the second a non-linear toy model. We further discuss these experimental results in the context of recent studies in the carbon cycle data assimilation literature, and finally we provide some perspectives and advice to other land surface modellers wishing to use multiple data streams to constrain their models.


2016 ◽  
Author(s):  
G. J. Schürmann ◽  
T. Kaminski ◽  
C. Köstler ◽  
N. Carvalhais ◽  
M. Voßbeck ◽  
...  

Abstract. We describe the Max Planck Institute Carbon Cycle Data Assimilation System (MPI-CCDAS) built around the tangent-linear version of the land surface scheme of the MPI-Earth System Model v1 (JSBACH). The simulated terrestrial biosphere processes (phenology and carbon balance) were constrained by observations of the fraction of photosynthetically active radiation (TIP-FAPAR product) and by observations of atmospheric CO2 at a global set of monitoring stations for the years 2005–2009. The system successfully, and computationally efficiently, improved average foliar area and northern extra-tropical seasonality of foliar area when constrained by TIP-FAPAR. Global net and gross carbon fluxes were improved when constrained by atmospheric CO2, although the system tended to underestimate tropical productivity. Assimilating both data streams jointly allowed the MPI-CCDAS to match both observations (TIP-FAPAR and atmospheric CO2) as well as in the single data stream assimilation cases, thereby increasing the overall appropriateness of the resultant parameter values and biosphere dynamics. Our study thus highlights the role of the TIP-FAPAR product in stabilising the underdetermined atmospheric inversion problem and demonstrates the value of multiple data stream assimilation for the simulation of terrestrial biosphere dynamics. The constraint on regional gross and net CO2 flux patterns is limited by the parametrisation of the biosphere model. We expect improvement in that respect through a refined initialisation strategy and the inclusion of further biosphere observations as constraints.


2011 ◽  
Vol 7 (4) ◽  
pp. 1-20 ◽  
Author(s):  
Reem Al-Mulla ◽  
Zaher Al Aghbari

In recent years, new applications have emerged that produce data streams, such as stock data and sensor networks. Finding frequent subsequences, or clusters of subsequences, in data streams is therefore an essential task in data mining. Data streams are continuous in nature, unbounded in size and have a high arrival rate. Due to these characteristics, traditional clustering algorithms fail to find clusters in data streams effectively. Thus, an efficient incremental algorithm is proposed to find frequent subsequences in multiple data streams. The described approach finds frequent subsequences by clustering the subsequences of a data stream. The proposed algorithm uses a window model to buffer the continuous data streams. Further, it does not recompute the clustering results for the whole data stream at every window, but rather builds on the clustering results of previous windows. The proposed approach also employs a decay value for each discovered cluster to determine when to remove old clusters and retain recent ones. In addition, the proposed algorithm is efficient, as it scans the data streams only once, and it is considered an any-time algorithm, since the frequent subsequences are ready at the end of every window.
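
The window-and-decay idea described above can be illustrated with a minimal sketch. This is not the authors' algorithm: the function `process_window`, the nearest-centroid assignment rule, and the parameters `sub_len`, `radius`, `decay` and `min_weight` are all hypothetical choices made for illustration. The sketch only shows the three ingredients named in the abstract: buffering the stream in windows, building on the clusters from earlier windows instead of reclustering everything, and aging clusters so that stale ones are dropped.

```python
from math import dist


def process_window(clusters, window, sub_len=3, radius=1.0, decay=0.5, min_weight=0.4):
    """Update clusters with one window of a stream (illustrative sketch only)."""
    # Age every existing cluster before looking at the new window.
    for c in clusters:
        c["weight"] *= decay

    # Slide over the window and assign each subsequence to a cluster.
    for i in range(len(window) - sub_len + 1):
        sub = window[i:i + sub_len]
        nearest = min(clusters, key=lambda c: dist(c["centroid"], sub), default=None)
        if nearest is not None and dist(nearest["centroid"], sub) <= radius:
            # Incrementally update the running-mean centroid.
            n = nearest["count"]
            nearest["centroid"] = [
                (m * n + x) / (n + 1) for m, x in zip(nearest["centroid"], sub)
            ]
            nearest["count"] = n + 1
            nearest["weight"] += 1.0
        else:
            clusters.append({"centroid": list(sub), "count": 1, "weight": 1.0})

    # Drop clusters whose decayed weight has fallen below the threshold.
    return [c for c in clusters if c["weight"] >= min_weight]


# Hypothetical stream: one window of zeros, then three windows of fives.
clusters = []
for window in ([0, 0, 0], [5, 5, 5], [5, 5, 5], [5, 5, 5]):
    clusters = process_window(clusters, window, sub_len=2)

print([c["centroid"] for c in clusters])
```

Each window is scanned once and the cluster list is available after every call, which mirrors the single-pass, any-time behaviour the abstract describes. With these settings, the cluster formed from the first window stops receiving members and is eventually removed by the decay rule.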



