United in Variety: The EarthServer Datacube Federation

Author(s):  
Peter Baumann

<p>Datacubes form an accepted cornerstone for analysis- (and visualization-) ready spatio-temporal data offerings. Beyond the multi-dimensional data structure, the paradigm also suggests rich services that abstract away from the intractable zillions of files and products: actionable datacubes, as established by Array Databases, enable users to ask "any query, any time" without programming. The principle of location-transparent federations establishes a single, coherent information space.</p><p>The EarthServer federation is a large, growing data-center network offering Petabytes of a critical variety, such as radar and optical satellite data, atmospheric data, elevation data, and thematic cubes like global sea ice. Around CODE-DE and the DIASs, an ecosystem of data has been established that is available to users as a single pool, in particular for efficient distributed data fusion irrespective of data location.</p><p>In our talk we present the technology, services, and governance of this unique intercontinental line-up of data centers. A live demo will show distributed datacube fusion.</p>
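The "any query, any time" idea can be illustrated with an OGC WCPS expression evaluated entirely server-side. The helper below only composes such a query string; the coverage name `S2_NDVI` and the endpoint shown in the comment are illustrative placeholders, not actual EarthServer identifiers.

```python
# Sketch: composing an OGC WCPS query for server-side datacube processing.
# Coverage name and endpoint are hypothetical examples, not federation names.

def build_wcps_query(coverage, lat, lon, t_start, t_end):
    """Return a WCPS query averaging a coverage over a time interval
    at a fixed point, evaluated entirely on the server."""
    return (
        f'for $c in ({coverage}) '
        f'return avg($c[Lat({lat}), Long({lon}), '
        f'ansi("{t_start}":"{t_end}")])'
    )

query = build_wcps_query("S2_NDVI", 48.14, 11.58,
                         "2020-01-01", "2020-12-31")
# The string would be sent to a WCPS endpoint, e.g. via HTTP GET:
#   https://<federation-node>/rasdaman/ows?service=WCS&version=2.0.1
#     &request=ProcessCoverages&query=<url-encoded query>
print(query)
```

Only the scalar average travels back to the client; the pixels never leave the data center.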

2018 ◽  
Vol 6 (3) ◽  
pp. 734-746 ◽  
Author(s):  
Mohammad A. Islam ◽  
Kishwar Ahmed ◽  
Hong Xu ◽  
Nguyen H. Tran ◽  
Gang Quan ◽  
...  

2021 ◽  
Author(s):  
Philipp Kaestli ◽  
Daniel Armbruster ◽  
The EIDA Technical Committee

<p>With the setup of EIDA (the European Integrated Data Archive, https://www.orfeus-eu.org/data/eida/) in the framework of ORFEUS, and the implementation of FDSN-standardized web services, seismic waveform data and instrumentation metadata of most seismic networks and data centers in Europe became accessible in a homogeneous way. EIDA has augmented this with the WFcatalog service for waveform quality metadata, and a routing service to find out which data center offers data of which network, region, and type. However, while a distributed data archive has clear advantages for maintenance and quality control of the holdings, it complicates the life of researchers who wish to collect data archived across different data centers. To tackle this, EIDA has implemented the "federator" as a one-stop transparent gateway service to the entire data holdings of EIDA.</p><p>To its users the federator acts just like a standard FDSN dataselect, station, or EIDA WFcatalog service, except that it can (thanks to a fully qualified internal routing cache) directly answer data requests on virtual networks.</p><p>Technically, the federator fulfills a user request by decomposing it into single-stream epoch requests, each targeted at a single data center, collecting the partial responses, and re-assembling them into a single result.</p><p>This implementation has several technical advantages:</p><ul><li>It avoids the response-size limitations of EIDA member services, leaving only those imposed by the assembly cache space of the federator instance itself.</li> <li>It allows easy merging of partial responses by request sorting and concatenation, reducing the need to interpret them. This lowers the computational load of the federator and allows high throughput of parallel user requests.</li> <li>It reduces the variability of requests to end member services. Thus, the federator can implement a reverse loopback cache, protecting end-node services from delivering redundant information and reducing their load.</li> <li>As partial results arrive quickly and in small subunits, they can be streamed to the user more or less continuously, avoiding both service timeouts and throughput bottlenecks.</li></ul><p>The advantage of one-stop data access for the entire EIDA still comes with some limitations and shortcomings. A request that ultimately maps to a single data center can be slower when performed via the federator than when addressed to that data center directly. FDSN-defined standard error codes sent by end member services have limited utility, as each refers to a part of the request only. Finally, the federator currently does not provide access to restricted data.</p><p>Nevertheless, we believe that one-stop data access compensates for these shortcomings in many use cases.</p><p>Further documentation of the service is available from ORFEUS at http://www.orfeus-eu.org/data/eida/nodes/FEDERATOR/</p>
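The decompose/route/merge pattern described above can be sketched in a few lines. The routing table, request tuples, and stub fetcher below are simplified illustrations, not the actual EIDA routing-service schema or FDSN payload format.

```python
# Sketch of the federator pattern: split a multi-network request into
# single-stream sub-requests, route each to one data center, and merge
# the partial responses by sorted concatenation (no payload parsing).

ROUTING = {  # network code -> data-center base URL (hypothetical values)
    "CH": "https://eida.ethz.ch",
    "GR": "https://eida.bgr.de",
}

def decompose(request):
    """Split a request covering several networks into single-stream
    sub-requests, each targeted at exactly one data center."""
    subrequests = []
    for net, sta, cha, start, end in request:
        subrequests.append((ROUTING[net], (net, sta, cha, start, end)))
    return subrequests

def federate(request, fetch):
    """Route each sub-request, then merge the partial results by sorted
    concatenation, keeping the federator's own work minimal."""
    parts = [fetch(url, line) for url, line in decompose(request)]
    return "".join(sorted(parts))

# Usage with a stub fetcher standing in for HTTP calls to dataselect:
demo = [("CH", "DAVOX", "HHZ", "2021-01-01", "2021-01-02"),
        ("GR", "BFO",   "BHZ", "2021-01-01", "2021-01-02")]
result = federate(demo, lambda url, line: f"{line[0]}.{line[1]}\n")
print(result)
```

Because merging is pure concatenation, each partial result can also be streamed to the user as soon as it arrives, which is the property the abstract highlights.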


2021 ◽  
Author(s):  
Peter Baumann

<p>Collaboration requires some minimum of common understanding: in the case of Earth data, common principles making data interchangeable, comparable, and combinable. Open standards help here; for Big Earth Data, specifically the OGC/ISO Coverages standard. This unifying standard establishes a common framework for regular and irregular spatio-temporal datacubes. Services grounded in such common understanding have proven more uniform to access and handle, implementing a principle of "minimal surprise" for users visiting different portals with their favourite clients. Data combination and fusion benefit from canonical metadata allowing automatic alignment, e.g., between 2D DEMs, 3D satellite image time series, 4D atmospheric data, etc.</p><p>The EarthServer datacube federation is showing the way towards unleashing the full potential of pixels for supporting the UN Sustainable Development Goals, local governance, and also businesses. EarthServer is an open, free, transparent, and democratic network of data centers offering dozens of Petabytes of a critical variety, such as radar and optical Copernicus data, atmospheric data, elevation data, and thematic cubes like global sea ice. Data centers like the DIASs and CODE-DE, research organizations, companies, and agencies have teamed up in EarthServer. Strictly based on the open OGC standards, an ecosystem of data has been established that is available to users as a single pool, without the need for any coding skills (such as Python). A unique capability is location transparency: clients can fire their query against any of the members, and the federation nodes will figure out the optimal work distribution irrespective of data location.</p><p>The underlying datacube engine, rasdaman, enables all datacube access, analytics, and federation. Query evaluation is optimized automatically, applying highly efficient, intelligent rule-based methods in homogeneous and heterogeneous mashups, up to satellite on-board deployments as done in the ORBiDANSe project. Users perceive one single, common information space, accessible through a wide spectrum of open-source and proprietary clients.</p><p>In our talk we present the technology, services, and governance of this unique line-up of data centers. A demo will show distributed datacube fusion live.</p>
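Location-transparent fusion means a single query can combine coverages that live on different federation members, with the receiving node planning the distributed evaluation. The sketch below only builds such a WCPS fusion expression; the coverage names are illustrative placeholders, not actual federation layers.

```python
# Sketch: a WCPS data-fusion query combining two coverages that may reside
# on different federation members. Coverage names are hypothetical.

def build_fusion_query(optical, dem, date):
    """WCPS query relating an optical band to (1 + elevation), encoded as
    GeoTIFF; the federation evaluates each sub-expression where its data
    lives and fuses the partial results."""
    return (
        f'for $s in ({optical}), $d in ({dem}) '
        f'return encode($s[ansi("{date}")] / (1 + $d), "image/tiff")'
    )

fusion = build_fusion_query("S2_L2A_B08", "DEM_30m", "2021-06-01")
print(fusion)
```

The client addresses any member node with this one string; which node holds `S2_L2A_B08` and which holds `DEM_30m` is invisible to the user.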


2019 ◽  
Vol 8 (4) ◽  
pp. 6594-6597

This work presents a multi-objective approach for planning energy utilization in data centers, considering both conventional and renewable energy sources. Cloud computing is a developing technology: it offers services such as IaaS, SaaS, and PaaS, and provides computing resources through virtualization over a data network. A data center consumes a huge amount of electrical energy and, in doing so, releases a very high amount of carbon dioxide. The foremost challenge in cloud computing is therefore to realize green cloud computing by optimizing energy utilization, lowering the carbon footprint while minimizing the operating cost. Renewable energy produced on-site is highly variable and unpredictable, yet using green energy matters greatly, and relying on a huge amount of single-sourced brown energy is not advisable; we therefore propose a genetic algorithm that evolves practical schedules for the use of renewable energy.
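The genetic approach can be sketched as follows: shift a fixed amount of deferrable work across time slots so that as much of it as possible is covered by the renewable forecast. All numbers and operators below are made-up illustrations of the idea, not the paper's actual algorithm or data.

```python
# Minimal genetic-algorithm sketch: evolve a workload plan that minimizes
# brown (non-renewable) energy drawn, given a renewable-supply forecast.
import random

GREEN = [3, 8, 6, 2]      # forecast renewable supply per slot (kWh), made up
TOTAL_WORK = 16           # total energy the deferrable workload needs (kWh)

def brown_energy(plan):
    """Energy not covered by renewables under a given plan."""
    return sum(max(0, load - g) for load, g in zip(plan, GREEN))

def mutate(plan):
    """Move one unit of work from one slot to another."""
    plan = plan[:]
    src = random.choice([i for i, v in enumerate(plan) if v > 0])
    dst = random.randrange(len(plan))
    plan[src] -= 1
    plan[dst] += 1
    return plan

def evolve(generations=500, pop_size=20, seed=42):
    random.seed(seed)
    pop = [[TOTAL_WORK // len(GREEN)] * len(GREEN) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=brown_energy)          # fitness: less brown is better
        pop = pop[: pop_size // 2]          # selection: keep the best half
        pop += [mutate(random.choice(pop)) for _ in range(pop_size // 2)]
    return min(pop, key=brown_energy)

best = evolve()
print(best, brown_energy(best))
```

A real scheduler would add crossover, multi-objective fitness (cost as well as carbon), and constraints such as deadlines; the skeleton above only shows the evolve-and-select loop.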


2018 ◽  
Vol 7 (3.34) ◽  
pp. 141
Author(s):  
D Ramya ◽  
J Deepa ◽  
P N.Karthikayan

A geographically distributed data center assures globalization of data as well as security for organizations, and disaster-recovery principles are also taken into consideration. These aspects drive business opportunities for companies that own many sites and cloud infrastructures with multiple owners. The data centers store very critical and confidential documents that multiple organizations share in the cloud infrastructure. Previously, different servers with different operating systems and software applications were used; as this was difficult to maintain, servers are consolidated, which allows sharing of resources at low maintenance cost [7]. The availability of documents should be increased and downtime reduced; thus workload management becomes challenging among geographically distributed data centers. In this paper we focus on different approaches used for workload management in geo-distributed data centers, and discuss the algorithms used as well as the challenges involved in each approach.


Sensors ◽  
2021 ◽  
Vol 21 (8) ◽  
pp. 2879
Author(s):  
Marcel Antal ◽  
Andrei-Alexandru Cristea ◽  
Victor-Alexandru Pădurean ◽  
Tudor Cioara ◽  
Ionut Anghel ◽  
...  

Data centers consume large amounts of energy to execute their computational workload and generate heat that is mostly wasted. In this paper, we address this problem by considering heat reuse in the case of a distributed data center featuring IT equipment (i.e., servers) installed in residential homes to be used as a primary source of heat. We propose a workload scheduling solution for distributed data centers based on a constraint satisfaction model that optimally allocates workload on servers to reach and maintain the desired home temperature setpoint by reusing residual heat. We have defined two models to correlate the heat demand with the amount of workload to be executed by the servers: a mathematical model derived from thermodynamic laws and calibrated with monitored data, and a machine learning model able to predict the amount of workload a server must execute to reach a desired ambient temperature setpoint. The proposed solution was validated using the monitored data of an operational distributed data center. The mathematical model of server heat and power demand achieves a correlation accuracy of 11.98%, while among the machine learning models the best correlation accuracy of 4.74% is obtained by a Gradient Boosting Regressor. Also, our solution manages to distribute the workload so that the temperature setpoint is met in a reasonable time, while the server power demand accurately follows the heat demand.
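The kind of thermodynamics-derived correlation model described above can be sketched as a first-order lumped model linking server utilization to room temperature. All coefficients below are illustrative assumptions, not the paper's calibrated values.

```python
# Sketch: first-order thermal model tying server load to room temperature.
# Steady state: server heat in == heat lost to the outside.

P_IDLE, P_PEAK = 100.0, 350.0   # server power draw (W) at 0% / 100% load
C_ROOM = 50_000.0               # thermal capacitance of the room (J/K)
K_LOSS = 30.0                   # heat-loss coefficient to outside (W/K)

def server_power(utilization):
    """Linear power model: idle draw plus a load-proportional part."""
    return P_IDLE + (P_PEAK - P_IDLE) * utilization

def step_temperature(t_room, t_out, utilization, dt=60.0):
    """Advance room temperature by dt seconds: server heat flows in,
    losses to the outside flow out."""
    heat_in = server_power(utilization)
    heat_out = K_LOSS * (t_room - t_out)
    return t_room + dt * (heat_in - heat_out) / C_ROOM

def required_utilization(t_room, t_out):
    """Invert the steady-state balance: utilization needed to hold
    t_room constant (clipped to [0, 1])."""
    p_needed = K_LOSS * (t_room - t_out)
    u = (p_needed - P_IDLE) / (P_PEAK - P_IDLE)
    return min(1.0, max(0.0, u))

# Holding 21 °C against 14 °C outside needs 210 W, i.e. 44% utilization:
print(required_utilization(21.0, 14.0))
```

A scheduler can then treat `required_utilization` as the per-home workload target that the constraint satisfaction model has to meet across all servers.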


2020 ◽  
Author(s):  
Rodrigo A. C. Da Silva ◽  
Nelson L. S. Da Fonseca

This paper summarizes the dissertation "Energy-aware load balancing in distributed data centers", which proposed two new algorithms for minimizing energy consumption in cloud data centers. Both algorithms consider hierarchical data center network topologies and requests for the allocation of groups of virtual machines (VMs). The Topology-aware Virtual Machine Placement (TAVMP) algorithm deals with the placement of virtual machines in a single data center; it reduces the blocking of requests while maintaining acceptable levels of energy consumption. The Topology-aware Virtual Machine Selection (TAVMS) algorithm chooses sets of VM groups for migration between different data centers; its employment leads to significant overall energy savings.
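The core trade-off in topology-aware placement, packing a VM group into few racks so the group stays topologically close and unused equipment can power down, while blocking a request only when capacity is truly exhausted, can be illustrated with a generic best-fit sketch. This is not the actual TAVMP algorithm, only the packing idea behind it.

```python
# Generic sketch of topology-aware group placement: pack a VM group into
# as few racks as possible (best-fit), blocking only when capacity runs out.

def place_group(group_size, rack_free_slots):
    """Greedily place `group_size` VMs, preferring the fullest racks that
    still have room, so fewer racks stay powered on. Returns a
    rack -> count mapping, or None if the request must be blocked."""
    if sum(rack_free_slots.values()) < group_size:
        return None                      # insufficient capacity: block
    placement = {}
    # fullest-first == fewest free slots first (best-fit packing)
    for rack in sorted(rack_free_slots, key=rack_free_slots.get):
        if group_size == 0:
            break
        take = min(group_size, rack_free_slots[rack])
        if take:
            placement[rack] = take
            group_size -= take
    return placement

# Five VMs land on the two nearly full racks; the large rack stays free:
print(place_group(5, {"r1": 2, "r2": 8, "r3": 3}))
```

A hierarchical version would apply the same preference at each topology level (server, rack, pod) rather than over a flat rack list.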


2021 ◽  
Author(s):  
Stiw Herrera ◽  
Larissa Miguez da Silva ◽  
Paulo Ricardo Reis ◽  
Anderson Silva ◽  
Fabio Porto

Scientific data is mainly multidimensional in nature, presenting interesting optimization opportunities when managed by array databases. However, in scenarios where the data is sparse, an efficient implementation is still required. In this paper, we investigate the adoption of the Ph-tree as an in-memory indexing structure for sparse data. We compare performance in data ingestion and in both range and point queries, using SAVIME as the multidimensional array DBMS. Our experiments, using a real weather dataset, highlight the challenges involved in providing fast data ingestion, as proposed by SAVIME, while at the same time efficiently answering multidimensional queries on sparse data.
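The interface being compared (ingestion, point query, range query over sparse cells) can be shown with a minimal stand-in that stores only non-empty cells. The real Ph-tree is a bit-interleaved trie with far better range-query behavior than this linear scan; the sketch below only illustrates the sparse-storage contract, with made-up sample coordinates.

```python
# Minimal stand-in for a sparse multidimensional index: keep only the
# non-empty cells, keyed by their coordinates. (Not a Ph-tree: a Ph-tree
# is a bit-interleaved trie that prunes range queries hierarchically.)

class SparseArray:
    def __init__(self):
        self.cells = {}                   # (x, y, ...) -> value

    def ingest(self, coords, value):
        """O(1) ingestion: fast loading is the property SAVIME targets."""
        self.cells[coords] = value

    def point_query(self, coords):
        return self.cells.get(coords)

    def range_query(self, lo, hi):
        """All (coords, value) with lo <= coords <= hi per dimension.
        Linear in the number of stored cells; this is the cost a real
        index structure is meant to beat."""
        return sorted(
            (c, v) for c, v in self.cells.items()
            if all(l <= x <= h for x, l, h in zip(c, lo, hi))
        )

sa = SparseArray()
sa.ingest((10, 4), 0.5)
sa.ingest((11, 9), 1.5)
print(sa.point_query((10, 4)))          # 0.5
print(sa.range_query((0, 0), (10, 10)))
```

The tension the abstract describes is visible even here: the dictionary makes ingestion trivially fast, but every range query pays for that simplicity, which is exactly what a smarter in-memory index is supposed to fix.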

