Relocating Local Outliers Produced by Partitioning Methods

Abstract. For an assessment of the roles of soil and vegetation in the climate system, a further understanding of the flux components of H2O and CO2 (e.g., transpiration, soil respiration) and their interaction with physical conditions and physiological functioning of plants and ecosystems is necessary. To obtain magnitudes of these flux components, we applied source partitioning approaches after Scanlon and Kustas (2010; SK10) and after Thomas et al. (2008; TH08) to high-frequency eddy covariance measurements of 12 study sites covering different ecosystems (croplands, grasslands, and forests) in different climatic regions. Both partitioning methods are based on higher-order statistics of the H2O and CO2 fluctuations, but proceed differently to estimate transpiration, evaporation, net primary production, and soil respiration. We compared and evaluated the partitioning results obtained with SK10 and TH08, including slight modifications of both approaches. Further, we analyzed the interrelations among the performance of the partitioning methods, turbulence characteristics, and site characteristics (such as plant cover type, canopy height, canopy density, and measurement height). We were able to identify characteristics of a data set that are prerequisites for adequate performance of the partitioning methods. SK10 had the tendency to overestimate and TH08 to underestimate soil flux components. For both methods, the partitioning of CO2 fluxes was less robust than for H2O fluxes. Results derived with SK10 showed relatively large dependencies on estimated water use efficiency (WUE) at the leaf level, which is a required input. Measurements of outgoing longwave radiation used for the estimation of foliage temperature (used in WUE) could slightly increase the quality of the partitioning results. 
A modification of the TH08 approach that applies a cluster analysis to the conditional sampling of respiration–evaporation events performed satisfactorily, but offered no significant advantage over the original versions of the method developed by Thomas et al. (2008). The performance of each partitioning approach depended on meteorological conditions, plant development, canopy height, canopy density, and measurement height. Most notably, the performance of SK10 correlated negatively with the ratio of measurement height to canopy height, whereas the performance of TH08 depended more on canopy height and leaf area index. In general, all site characteristics that increase dissimilarities between scalars appeared to enhance partitioning performance for both SK10 and TH08.


Dynamic Partitioning Supporting Load Balancing for Distributed RDF Graph Stores

Symmetry, 2019, Vol 11 (7), pp. 926. DOI: 10.3390/sym11070926

Author(s): Kyoungsoo Bok, Junwon Kim, Jaesoo Yoo

Keyword(s): Load Balancing, Distributed Processing, Data Partitioning, RDF Graph, Dynamic Partitioning, Usage Frequency, Partitioning Methods, Description Framework, RDF Graphs, Distributed Server

Various resource description framework (RDF) partitioning methods have been studied for the efficient distributed processing of large RDF graphs. An RDF graph has symmetrical characteristics because subject and object can be used interchangeably if the predicate is changed. This paper proposes a dynamic partitioning method for RDF graphs that supports load balancing in distributed environments where data are continually inserted and changed. The proposed method generates clusters and subclusters, using the frequency with which queries use each part of the RDF graph as the partitioning criterion. It groups RDF subgraphs with higher usage frequency into clusters and those with lower usage frequency into subclusters. Clusters and subclusters are load-balanced using the mean query frequency of the distributed servers, and graph data are partitioned by considering the size of the data stored on each server. The method also minimizes the number of edge-cuts connected to clusters and subclusters in order to reduce communication costs between servers. This solves the problem of data concentrating on specific servers as data are changed and added, and allows efficient load balancing among servers. The performance results show that the proposed method significantly outperforms existing partitioning methods in terms of query processing time in a distributed server environment.
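The balancing idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a simple greedy placement (the most frequently used cluster goes to the currently least-loaded server), and the names `assign_clusters`, `overloaded_servers`, `edge_cuts`, and the `tolerance` parameter are hypothetical, introduced only for illustration. Data size per server, which the paper also considers, is left out.

```python
def assign_clusters(cluster_freq, num_servers):
    """Greedy placement (an assumption, not the paper's method):
    put the most frequently used cluster on the least-loaded server."""
    loads = [0.0] * num_servers
    placement = {}
    # Highest usage frequency first; ties broken by cluster id for determinism.
    for cid, freq in sorted(cluster_freq.items(), key=lambda kv: (-kv[1], kv[0])):
        target = min(range(num_servers), key=lambda s: loads[s])
        placement[cid] = target
        loads[target] += freq
    return placement, loads


def overloaded_servers(loads, tolerance=0.2):
    """Servers whose query load exceeds the mean load by more than `tolerance`."""
    mean = sum(loads) / len(loads)
    return [s for s, load in enumerate(loads) if load > mean * (1 + tolerance)]


def edge_cuts(edges, placement):
    """Number of edges whose two endpoint clusters sit on different servers."""
    return sum(1 for a, b in edges if placement[a] != placement[b])


# Hypothetical usage frequencies for four clusters and the edges between them.
cluster_freq = {"c1": 50, "c2": 30, "c3": 20, "c4": 10}
edges = [("c1", "c2"), ("c2", "c3"), ("c1", "c4")]

placement, loads = assign_clusters(cluster_freq, num_servers=2)
print(placement, loads)
print("overloaded:", overloaded_servers(loads))
print("edge cuts:", edge_cuts(edges, placement))
```

In a dynamic setting, a check such as `overloaded_servers` could be rerun as usage frequencies change, with clusters moved only when a server drifts too far above the mean; choosing which cluster to move so that `edge_cuts` stays small is the part the paper's method addresses and this sketch does not.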
