United in Variety: The EarthServer Datacube Federation

Author(s):  
Peter Baumann

<p>Datacubes form an accepted cornerstone for analysis- (and visualization-) ready spatio-temporal data offerings. Beyond the multi-dimensional data structure, the paradigm also suggests rich services that abstract away from the intractable zillions of files and products: actionable datacubes, as established by Array Databases, enable users to ask "any query, any time" without programming. The principle of location-transparent federations establishes a single, coherent information space.</p><p>The EarthServer federation is a large, growing data-center network offering Petabytes of a critical variety, such as radar and optical satellite data, atmospheric data, elevation data, and thematic cubes like global sea ice. Around CODE-DE and the DIASs, an ecosystem of data has been established that is available to users as a single pool, in particular for efficient distributed data fusion irrespective of data location.</p><p>In our talk we present the technology, services, and governance of this unique intercontinental line-up of data centers. A live demo will show distributed datacube fusion.</p>
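The "any query, any time" idea can be illustrated with an OGC WCPS expression evaluated entirely server-side. The helper below only composes such a query string; the coverage name `S2_NDVI` and the endpoint shown in the comment are illustrative placeholders, not actual EarthServer identifiers.

```python
# Sketch: composing an OGC WCPS query for server-side datacube processing.
# Coverage name and endpoint are hypothetical examples, not federation names.

def build_wcps_query(coverage, lat, lon, t_start, t_end):
    """Return a WCPS query averaging a coverage over a time interval
    at a fixed point, evaluated entirely on the server."""
    return (
        f'for $c in ({coverage}) '
        f'return avg($c[Lat({lat}), Long({lon}), '
        f'ansi("{t_start}":"{t_end}")])'
    )

query = build_wcps_query("S2_NDVI", 48.14, 11.58,
                         "2020-01-01", "2020-12-31")
# The string would be sent to a WCPS endpoint, e.g. via HTTP GET:
#   https://<federation-node>/rasdaman/ows?service=WCS&version=2.0.1
#     &request=ProcessCoverages&query=<url-encoded query>
print(query)
```

Only the scalar average travels back to the client; the pixels never leave the data center.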

2018 ◽  
Vol 6 (3) ◽  
pp. 734-746 ◽  
Author(s):  
Mohammad A. Islam ◽  
Kishwar Ahmed ◽  
Hong Xu ◽  
Nguyen H. Tran ◽  
Gang Quan ◽  
...  

2021 ◽  
Author(s):  
Philipp Kaestli ◽  
Daniel Armbruster ◽  
The EIDA Technical Committee

<p>With the setup of EIDA (the European Integrated Data Archive, https://www.orfeus-eu.org/data/eida/) in the framework of ORFEUS, and the implementation of FDSN-standardized web services, seismic waveform data and instrumentation metadata of most seismic networks and data centers in Europe became accessible in a homogeneous way. EIDA has augmented this with the WFcatalog service for waveform quality metadata, and a routing service to find out which data center offers data of which network, region, and type. However, while a distributed data archive has clear advantages for maintenance and quality control of the holdings, it complicates the life of researchers who wish to collect data archived across different data centers. To tackle this, EIDA has implemented the "federator" as a one-stop transparent gateway service to the entire data holdings of EIDA.</p><p>To its users the federator acts just like a standard FDSN dataselect, station, or EIDA WFcatalog service, except that it can (thanks to a fully qualified internal routing cache) directly answer data requests on virtual networks.</p><p>Technically, the federator fulfills a user request by decomposing it into single-stream epoch requests, each targeted at a single data center, collecting the partial responses, and re-assembling them into a single result.</p><p>This implementation has several technical advantages:</p><ul><li>It avoids the response-size limitations of EIDA member services, leaving only those imposed by the assembly cache space of the federator instance itself.</li> <li>It allows easy merging of partial responses by request sorting and concatenation, reducing the need to interpret them. This lowers the computational load of the federator and allows high throughput of parallel user requests.</li> <li>It reduces the variability of requests to end member services. Thus, the federator can implement a reverse loopback cache, protecting end-node services from delivering redundant information and reducing their load.</li> <li>As partial results arrive quickly and in small subunits, they can be streamed to the user more or less continuously, avoiding both service timeouts and throughput bottlenecks.</li></ul><p>The advantage of one-stop data access for the entire EIDA still comes with some limitations and shortcomings. A request that ultimately maps to a single data center can be slower when performed via the federator than when addressed to that data center directly. FDSN-defined standard error codes sent by end member services have limited utility, as each refers to a part of the request only. Finally, the federator currently does not provide access to restricted data.</p><p>Nevertheless, we believe that one-stop data access compensates for these shortcomings in many use cases.</p><p>Further documentation of the service is available from ORFEUS at http://www.orfeus-eu.org/data/eida/nodes/FEDERATOR/</p>
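The decompose/route/merge pattern described above can be sketched in a few lines. The routing table, request tuples, and stub fetcher below are simplified illustrations, not the actual EIDA routing-service schema or FDSN payload format.

```python
# Sketch of the federator pattern: split a multi-network request into
# single-stream sub-requests, route each to one data center, and merge
# the partial responses by sorted concatenation (no payload parsing).

ROUTING = {  # network code -> data-center base URL (hypothetical values)
    "CH": "https://eida.ethz.ch",
    "GR": "https://eida.bgr.de",
}

def decompose(request):
    """Split a request covering several networks into single-stream
    sub-requests, each targeted at exactly one data center."""
    subrequests = []
    for net, sta, cha, start, end in request:
        subrequests.append((ROUTING[net], (net, sta, cha, start, end)))
    return subrequests

def federate(request, fetch):
    """Route each sub-request, then merge the partial results by sorted
    concatenation, keeping the federator's own work minimal."""
    parts = [fetch(url, line) for url, line in decompose(request)]
    return "".join(sorted(parts))

# Usage with a stub fetcher standing in for HTTP calls to dataselect:
demo = [("CH", "DAVOX", "HHZ", "2021-01-01", "2021-01-02"),
        ("GR", "BFO",   "BHZ", "2021-01-01", "2021-01-02")]
result = federate(demo, lambda url, line: f"{line[0]}.{line[1]}\n")
print(result)
```

Because merging is pure concatenation, each partial result can also be streamed to the user as soon as it arrives, which is the property the abstract highlights.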


2021 ◽  
Author(s):  
Peter Baumann

<p>Collaboration requires some minimum of common understanding: in the case of Earth data, common principles making data interchangeable, comparable, and combinable. Open standards help here; for Big Earth Data, specifically the OGC/ISO Coverages standard. This unifying standard establishes a common framework for regular and irregular spatio-temporal datacubes. Services grounded in such common understanding have proven more uniform to access and handle, implementing a principle of "minimal surprise" for users visiting different portals with their favourite clients. Data combination and fusion benefit from canonical metadata allowing automatic alignment, e.g., between 2D DEMs, 3D satellite image time series, 4D atmospheric data, etc.</p><p>The EarthServer datacube federation is showing the way towards unleashing the full potential of pixels for supporting the UN Sustainable Development Goals, local governance, and also businesses. EarthServer is an open, free, transparent, and democratic network of data centers offering dozens of Petabytes of a critical variety, such as radar and optical Copernicus data, atmospheric data, elevation data, and thematic cubes like global sea ice. Data centers like the DIASs and CODE-DE, research organizations, companies, and agencies have teamed up in EarthServer. Strictly based on the open OGC standards, an ecosystem of data has been established that is available to users as a single pool, without the need for any coding skills (such as Python). A unique capability is location transparency: clients can fire their query against any of the members, and the federation nodes will figure out the optimal work distribution irrespective of data location.</p><p>The underlying datacube engine, rasdaman, enables all datacube access, analytics, and federation. Query evaluation is optimized automatically, applying highly efficient, intelligent rule-based methods in homogeneous and heterogeneous mashups, up to satellite on-board deployments as done in the ORBiDANSe project. Users perceive one single, common information space, accessible through a wide spectrum of open-source and proprietary clients.</p><p>In our talk we present the technology, services, and governance of this unique line-up of data centers. A demo will show distributed datacube fusion live.</p>
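Location-transparent fusion means a single query can combine coverages that live on different federation members, with the receiving node planning the distributed evaluation. The sketch below only builds such a WCPS fusion expression; the coverage names are illustrative placeholders, not actual federation layers.

```python
# Sketch: a WCPS data-fusion query combining two coverages that may reside
# on different federation members. Coverage names are hypothetical.

def build_fusion_query(optical, dem, date):
    """WCPS query relating an optical band to (1 + elevation), encoded as
    GeoTIFF; the federation evaluates each sub-expression where its data
    lives and fuses the partial results."""
    return (
        f'for $s in ({optical}), $d in ({dem}) '
        f'return encode($s[ansi("{date}")] / (1 + $d), "image/tiff")'
    )

fusion = build_fusion_query("S2_L2A_B08", "DEM_30m", "2021-06-01")
print(fusion)
```

The client addresses any member node with this one string; which node holds `S2_L2A_B08` and which holds `DEM_30m` is invisible to the user.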


2019 ◽  
Vol 8 (4) ◽  
pp. 6594-6597

This work presents a multi-objective approach for planning energy utilization in data centers, considering both conventional and renewable energy sources. Cloud computing is a developing technology: it offers services such as IaaS, SaaS, and PaaS, and provides computing resources through virtualization over a data network. A data center consumes a huge amount of electrical energy and, in doing so, releases a very high amount of carbon dioxide. The foremost challenge in cloud computing is therefore to realize green cloud computing by optimizing energy utilization, lowering the carbon footprint while minimizing the operating cost. Renewable energy produced on-site is highly variable and unpredictable, yet using green energy matters greatly, and relying on a huge amount of single-sourced brown energy is not advisable; we therefore propose a genetic algorithm that evolves practical schedules for the use of renewable energy.
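The genetic approach can be sketched as follows: shift a fixed amount of deferrable work across time slots so that as much of it as possible is covered by the renewable forecast. All numbers and operators below are made-up illustrations of the idea, not the paper's actual algorithm or data.

```python
# Minimal genetic-algorithm sketch: evolve a workload plan that minimizes
# brown (non-renewable) energy drawn, given a renewable-supply forecast.
import random

GREEN = [3, 8, 6, 2]      # forecast renewable supply per slot (kWh), made up
TOTAL_WORK = 16           # total energy the deferrable workload needs (kWh)

def brown_energy(plan):
    """Energy not covered by renewables under a given plan."""
    return sum(max(0, load - g) for load, g in zip(plan, GREEN))

def mutate(plan):
    """Move one unit of work from one slot to another."""
    plan = plan[:]
    src = random.choice([i for i, v in enumerate(plan) if v > 0])
    dst = random.randrange(len(plan))
    plan[src] -= 1
    plan[dst] += 1
    return plan

def evolve(generations=500, pop_size=20, seed=42):
    random.seed(seed)
    pop = [[TOTAL_WORK // len(GREEN)] * len(GREEN) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=brown_energy)          # fitness: less brown is better
        pop = pop[: pop_size // 2]          # selection: keep the best half
        pop += [mutate(random.choice(pop)) for _ in range(pop_size // 2)]
    return min(pop, key=brown_energy)

best = evolve()
print(best, brown_energy(best))
```

A real scheduler would add crossover, multi-objective fitness (cost as well as carbon), and constraints such as deadlines; the skeleton above only shows the evolve-and-select loop.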


2018 ◽  
Vol 7 (3.34) ◽  
pp. 141
Author(s):  
D Ramya ◽  
J Deepa ◽  
P N.Karthikayan

A geographically distributed data center assures globalization of data as well as security for organizations, and disaster-recovery principles are also taken into consideration. These aspects drive business opportunities for companies that own many sites and cloud infrastructures with multiple owners. The data centers store very critical and confidential documents that multiple organizations share in the cloud infrastructure. Previously, different servers with different operating systems and software applications were used; as this was difficult to maintain, servers are consolidated, which allows sharing of resources at low maintenance cost [7]. The availability of documents should be increased and downtime reduced; thus workload management becomes challenging among geographically distributed data centers. In this paper we focus on different approaches used for workload management in geo-distributed data centers, and discuss the algorithms used as well as the challenges involved in each approach.


Sensors ◽  
2021 ◽  
Vol 21 (8) ◽  
pp. 2879
Author(s):  
Marcel Antal ◽  
Andrei-Alexandru Cristea ◽  
Victor-Alexandru Pădurean ◽  
Tudor Cioara ◽  
Ionut Anghel ◽  
...  

Data centers consume large amounts of energy to execute their computational workload and generate heat that is mostly wasted. In this paper, we address this problem by considering heat reuse in the case of a distributed data center featuring IT equipment (i.e., servers) installed in residential homes to be used as a primary source of heat. We propose a workload scheduling solution for distributed data centers based on a constraint satisfaction model that optimally allocates workload on servers to reach and maintain the desired home temperature setpoint by reusing residual heat. We have defined two models to correlate the heat demand with the amount of workload to be executed by the servers: a mathematical model derived from thermodynamic laws and calibrated with monitored data, and a machine learning model able to predict the amount of workload a server must execute to reach a desired ambient temperature setpoint. The proposed solution was validated using the monitored data of an operational distributed data center. The mathematical model of server heat and power demand achieves a correlation accuracy of 11.98%, while among the machine learning models the best correlation accuracy of 4.74% is obtained by a Gradient Boosting Regressor. Also, our solution manages to distribute the workload so that the temperature setpoint is met in a reasonable time, while the server power demand accurately follows the heat demand.
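The kind of thermodynamics-derived correlation model described above can be sketched as a first-order lumped model linking server utilization to room temperature. All coefficients below are illustrative assumptions, not the paper's calibrated values.

```python
# Sketch: first-order thermal model tying server load to room temperature.
# Steady state: server heat in == heat lost to the outside.

P_IDLE, P_PEAK = 100.0, 350.0   # server power draw (W) at 0% / 100% load
C_ROOM = 50_000.0               # thermal capacitance of the room (J/K)
K_LOSS = 30.0                   # heat-loss coefficient to outside (W/K)

def server_power(utilization):
    """Linear power model: idle draw plus a load-proportional part."""
    return P_IDLE + (P_PEAK - P_IDLE) * utilization

def step_temperature(t_room, t_out, utilization, dt=60.0):
    """Advance room temperature by dt seconds: server heat flows in,
    losses to the outside flow out."""
    heat_in = server_power(utilization)
    heat_out = K_LOSS * (t_room - t_out)
    return t_room + dt * (heat_in - heat_out) / C_ROOM

def required_utilization(t_room, t_out):
    """Invert the steady-state balance: utilization needed to hold
    t_room constant (clipped to [0, 1])."""
    p_needed = K_LOSS * (t_room - t_out)
    u = (p_needed - P_IDLE) / (P_PEAK - P_IDLE)
    return min(1.0, max(0.0, u))

# Holding 21 °C against 14 °C outside needs 210 W, i.e. 44% utilization:
print(required_utilization(21.0, 14.0))
```

A scheduler can then treat `required_utilization` as the per-home workload target that the constraint satisfaction model has to meet across all servers.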


2020 ◽  
Author(s):  
Rodrigo A. C. Da Silva ◽  
Nelson L. S. Da Fonseca

This paper summarizes the dissertation "Energy-aware load balancing in distributed data centers", which proposed two new algorithms for minimizing energy consumption in cloud data centers. Both algorithms consider hierarchical data center network topologies and requests for the allocation of groups of virtual machines (VMs). The Topology-aware Virtual Machine Placement (TAVMP) algorithm deals with the placement of virtual machines in a single data center; it reduces the blocking of requests while maintaining acceptable levels of energy consumption. The Topology-aware Virtual Machine Selection (TAVMS) algorithm chooses sets of VM groups for migration between different data centers; its employment leads to significant overall energy savings.
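The core trade-off in topology-aware placement, packing a VM group into few racks so the group stays topologically close and unused equipment can power down, while blocking a request only when capacity is truly exhausted, can be illustrated with a generic best-fit sketch. This is not the actual TAVMP algorithm, only the packing idea behind it.

```python
# Generic sketch of topology-aware group placement: pack a VM group into
# as few racks as possible (best-fit), blocking only when capacity runs out.

def place_group(group_size, rack_free_slots):
    """Greedily place `group_size` VMs, preferring the fullest racks that
    still have room, so fewer racks stay powered on. Returns a
    rack -> count mapping, or None if the request must be blocked."""
    if sum(rack_free_slots.values()) < group_size:
        return None                      # insufficient capacity: block
    placement = {}
    # fullest-first == fewest free slots first (best-fit packing)
    for rack in sorted(rack_free_slots, key=rack_free_slots.get):
        if group_size == 0:
            break
        take = min(group_size, rack_free_slots[rack])
        if take:
            placement[rack] = take
            group_size -= take
    return placement

# Five VMs land on the two nearly full racks; the large rack stays free:
print(place_group(5, {"r1": 2, "r2": 8, "r3": 3}))
```

A hierarchical version would apply the same preference at each topology level (server, rack, pod) rather than over a flat rack list.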


2021 ◽  
Author(s):  
Stiw Herrera ◽  
Larissa Miguez da Silva ◽  
Paulo Ricardo Reis ◽  
Anderson Silva ◽  
Fabio Porto

Scientific data is mainly multidimensional in nature, presenting interesting optimization opportunities when managed by array databases. However, in scenarios where the data is sparse, an efficient implementation is still required. In this paper, we investigate the adoption of the Ph-tree as an in-memory indexing structure for sparse data. We compare performance in data ingestion and in both range and point queries, using SAVIME as the multidimensional array DBMS. Our experiments, using a real weather dataset, highlight the challenges involved in providing fast data ingestion, as proposed by SAVIME, while at the same time efficiently answering multidimensional queries on sparse data.
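The interface being compared (ingestion, point query, range query over sparse cells) can be shown with a minimal stand-in that stores only non-empty cells. The real Ph-tree is a bit-interleaved trie with far better range-query behavior than this linear scan; the sketch below only illustrates the sparse-storage contract, with made-up sample coordinates.

```python
# Minimal stand-in for a sparse multidimensional index: keep only the
# non-empty cells, keyed by their coordinates. (Not a Ph-tree: a Ph-tree
# is a bit-interleaved trie that prunes range queries hierarchically.)

class SparseArray:
    def __init__(self):
        self.cells = {}                   # (x, y, ...) -> value

    def ingest(self, coords, value):
        """O(1) ingestion: fast loading is the property SAVIME targets."""
        self.cells[coords] = value

    def point_query(self, coords):
        return self.cells.get(coords)

    def range_query(self, lo, hi):
        """All (coords, value) with lo <= coords <= hi per dimension.
        Linear in the number of stored cells; this is the cost a real
        index structure is meant to beat."""
        return sorted(
            (c, v) for c, v in self.cells.items()
            if all(l <= x <= h for x, l, h in zip(c, lo, hi))
        )

sa = SparseArray()
sa.ingest((10, 4), 0.5)
sa.ingest((11, 9), 1.5)
print(sa.point_query((10, 4)))          # 0.5
print(sa.range_query((0, 0), (10, 10)))
```

The tension the abstract describes is visible even here: the dictionary makes ingestion trivially fast, but every range query pays for that simplicity, which is exactly what a smarter in-memory index is supposed to fix.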

