Transparent Throughput Elasticity for Modern Cloud Storage

Author(s):  
Bogdan Nicolae ◽  
Pierre Riteau ◽  
Zhuo Zhen ◽  
Kate Keahey

Storage elasticity on the cloud is a crucial feature in the age of data-intensive computing, especially when considering fluctuations of I/O throughput. In this chapter, the authors explore how to transparently boost I/O bandwidth during peak utilization to deliver high performance without over-provisioning storage resources. The proposal relies on the idea of leveraging short-lived virtual disks with better performance characteristics (and thus more expensive) to act during peaks as a caching layer for the persistent virtual disks where the application data is stored at runtime. First, they show how this idea can be achieved efficiently at the block-device level, using a caching mechanism that exploits iterative behavior and learns from past experience. Second, they introduce a corresponding performance and cost prediction methodology. They demonstrate the benefits of their proposal both for micro-benchmarks and for two real-life applications using large-scale experiments, and conclude with a discussion of how these techniques can be generalized for the increasingly complex landscape of modern cloud storage.
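To make the mechanism concrete, the following is a minimal sketch (in Python) of a block-level write-back cache in the spirit described above: hot blocks are absorbed by a fast, short-lived disk during the peak and flushed back to the persistent disk afterwards. The LRU policy, the class and parameter names, and the dictionary-backed disks are illustrative assumptions; the authors' actual mechanism additionally learns from iterative access behavior, which is not modeled here.

```python
from collections import OrderedDict

BLOCK_SIZE = 4096  # bytes per block (assumed)

class FastDiskCache:
    """Write-back cache: a fast ephemeral disk in front of a slow persistent one."""

    def __init__(self, persistent, capacity_blocks):
        self.persistent = persistent      # block_id -> bytes (stand-in for the slow disk)
        self.capacity = capacity_blocks   # size of the short-lived fast disk
        self.cache = OrderedDict()        # block_id -> (data, dirty), LRU order

    def read(self, block_id):
        if block_id in self.cache:                        # hit on the fast disk
            data, dirty = self.cache.pop(block_id)
            self.cache[block_id] = (data, dirty)          # refresh LRU position
            return data
        data = self.persistent.get(block_id, b"\x00" * BLOCK_SIZE)
        self._insert(block_id, data, dirty=False)
        return data

    def write(self, block_id, data):
        # Absorb the write at fast-disk speed; defer the slow write-back.
        self.cache.pop(block_id, None)
        self._insert(block_id, data, dirty=True)

    def _insert(self, block_id, data, dirty):
        if len(self.cache) >= self.capacity:
            old_id, (old_data, was_dirty) = self.cache.popitem(last=False)
            if was_dirty:                                 # write-back on eviction
                self.persistent[old_id] = old_data
        self.cache[block_id] = (data, dirty)

    def drain(self):
        # Flush all dirty blocks before discarding the ephemeral disk after the peak.
        for block_id, (data, dirty) in self.cache.items():
            if dirty:
                self.persistent[block_id] = data
        self.cache.clear()
```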

Author(s):  
Bogdan Nicolae ◽  
Pierre Riteau ◽  
Kate Keahey

Storage elasticity on IaaS clouds is a crucial feature in the age of data-intensive computing, especially when considering fluctuations of I/O throughput. This paper provides a transparent solution that automatically boosts I/O bandwidth during peaks for the underlying virtual disks, effectively avoiding over-provisioning without performance loss. The authors' proposal relies on the idea of leveraging short-lived virtual disks with better performance characteristics (and thus more expensive) to act during peaks as a caching layer for the persistent virtual disks where the application data is stored. Furthermore, they introduce a performance and cost prediction methodology that can be used independently to estimate in advance what trade-off between performance and cost is possible, as well as to drive an optimization technique that selects the cache size needed to meet the desired performance level at minimal cost. The authors demonstrate the benefits of their proposal both for microbenchmarks and for two real-life applications using large-scale experiments.
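A hedged sketch of the kind of cost/performance reasoning such a prediction methodology enables: given an assumed hit-rate curve and per-request latencies, pick the smallest ephemeral-disk size whose predicted runtime meets a target. The model, prices, and function names below are illustrative, not the paper's formulas.

```python
# Pick the cheapest fast-disk size whose predicted runtime meets a target.
def predict_runtime(io_requests, hit_rate, t_fast, t_slow):
    """Expected total I/O time given a cache hit rate and per-request latencies."""
    return io_requests * (hit_rate * t_fast + (1.0 - hit_rate) * t_slow)

def select_cache_size(sizes_gb, hit_rate_for, price_per_gb_hour,
                      io_requests, t_fast, t_slow, target_seconds):
    """Return (size, cost) for the cheapest feasible size, or None."""
    best = None
    for size in sorted(sizes_gb):
        runtime = predict_runtime(io_requests, hit_rate_for(size), t_fast, t_slow)
        if runtime <= target_seconds:
            cost = price_per_gb_hour * size * (runtime / 3600.0)
            if best is None or cost < best[1]:
                best = (size, cost)
    return best

# Example with a made-up saturating hit-rate model:
hit = lambda size: min(0.95, size / 100.0)
print(select_cache_size([10, 25, 50, 100], hit, price_per_gb_hour=0.12,
                        io_requests=2_000_000, t_fast=1e-4,
                        t_slow=1e-3, target_seconds=600))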


Author(s):  
TAJ ALAM ◽  
PARITOSH DUBEY ◽  
ANKIT KUMAR

Distributed systems are an efficient means of realizing high-performance computing (HPC). They are used to meet the demand of executing large-scale, high-performance computational jobs. Scheduling tasks on such computational resources is one of the prime concerns in heterogeneous distributed systems. Scheduling jobs on distributed systems is NP-complete in nature, so it requires a heuristic or metaheuristic approach to obtain sub-optimal but acceptable solutions. An adaptive threshold-based scheduler is one such heuristic approach. This work proposes an adaptive threshold-based scheduler for batches of independent jobs (ATSBIJ) with the objective of optimizing the makespan of the jobs submitted for execution on cloud computing systems. ATSBIJ exploits interval estimation to calculate the threshold values used to generate an efficient schedule for the batch. Simulation studies on CloudSim show that the ATSBIJ approach works effectively in real-life scenarios.
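The following is an illustrative sketch of the interval-estimation idea, not ATSBIJ's exact formulation: the upper bound of a confidence interval over sampled task runtimes serves as each VM's admission threshold when packing the batch. All names and parameters are assumptions.

```python
import math

def threshold(samples, z=1.96):
    """Upper bound of a ~95% confidence interval over runtime samples
    (needs at least two samples)."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / (n - 1)
    return mean + z * math.sqrt(var / n)

def schedule_batch(jobs, vms, runtime_samples):
    """Greedy packing: assign each job (largest first) to the VM whose current
    load plus the job's estimated runtime stays below that VM's threshold."""
    limits = {vm: threshold(runtime_samples[vm]) for vm in vms}
    load = {vm: 0.0 for vm in vms}
    plan = {}
    for job_id, est in sorted(jobs.items(), key=lambda kv: -kv[1]):
        candidates = [vm for vm in vms if load[vm] + est <= limits[vm]]
        target = min(candidates or vms, key=lambda vm: load[vm])  # fall back to least loaded
        plan[job_id] = target
        load[target] += est
    return plan
```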


Author(s):  
GEORGE MOURKOUSIS ◽  
MATHEW PROTONOTARIOS ◽  
THEODORA VARVARIGOU

This paper presents a study on the application of a hybrid genetic algorithm (HGA) to an extended instance of the Vehicle Routing Problem. The actual problem is a complex real-life vehicle routing problem concerning the distribution of products to customers. A non-homogeneous fleet of vehicles with limited capacity and allowed travel time is available to satisfy the stochastic demand of a set of different types of customers, each with earliest and latest servicing times. The objective is to minimize distribution costs while respecting the imposed constraints (vehicle capacity, customer time windows, driver working hours, and so on). The approach for solving the problem was based on a "cluster and route" HGA. Several genetic operators, selection methods, and replacement methods were tested until the HGA became efficient at optimizing a multi-extrema search space (multi-modal optimization). Finally, High Performance Computing (HPC) was applied in order to provide near-optimal solutions in a reasonable amount of time.
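A compact illustration of the "cluster and route" decomposition (not the paper's HGA): customers are first grouped by a sweep-style angular clustering around the depot, then a small genetic algorithm with order crossover and swap mutation optimizes the visiting order within each cluster. Capacity and time-window penalties are omitted for brevity; every name and parameter is an assumption.

```python
import math, random

def clusters_by_angle(customers, depot, k):
    """Sweep-style clustering: sort customers by polar angle around the depot."""
    angle = lambda c: math.atan2(c[1] - depot[1], c[0] - depot[0])
    ordered = sorted(customers, key=angle)
    size = math.ceil(len(ordered) / k)
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]

def route_length(route, depot):
    pts = [depot] + route + [depot]
    return sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))

def order_crossover(p1, p2):
    """Classic OX: keep a slice of p1, fill the rest in p2's order."""
    a, b = sorted(random.sample(range(len(p1)), 2))
    middle = p1[a:b]
    rest = [c for c in p2 if c not in middle]
    return rest[:a] + middle + rest[a:]

def ga_route(cluster, depot, pop_size=30, generations=200):
    pop = [random.sample(cluster, len(cluster)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda r: route_length(r, depot))     # shortest first
        survivors = pop[: pop_size // 2]                   # elitist selection
        children = [order_crossover(*random.sample(survivors, 2))
                    for _ in range(pop_size - len(survivors))]
        for child in children:                             # swap mutation
            if random.random() < 0.2:
                i, j = random.sample(range(len(child)), 2)
                child[i], child[j] = child[j], child[i]
        pop = survivors + children
    return min(pop, key=lambda r: route_length(r, depot))

depot = (0.0, 0.0)
customers = [(random.uniform(-10, 10), random.uniform(-10, 10)) for _ in range(24)]
routes = [ga_route(c, depot) for c in clusters_by_angle(customers, depot, k=3)]
```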


2013 ◽  
Vol 2013 ◽  
pp. 1-6 ◽  
Author(s):  
Ying-Chih Lin ◽  
Chin-Sheng Yu ◽  
Yen-Jen Lin

Recent progress in high-throughput instrumentation has led to astonishing growth in both the volume and the complexity of biomedical data collected from various sources. Such planet-scale data poses serious challenges to storage and computing technologies. Cloud computing is an attractive alternative because it addresses storage and high-performance computing on large-scale data jointly. This work briefly introduces data-intensive computing systems and summarizes existing cloud-based resources in bioinformatics. These developments and applications will help biomedical research make this vast amount of diverse data meaningful and usable.


2011 ◽  
Vol 130-134 ◽  
pp. 2455-2460
Author(s):  
Bo Li ◽  
Hai Ying Zhou ◽  
De Cheng Zuo

It is critical to understand the workload characteristics and resource usage patterns of real applications in order to guide the architectural design of future large-scale servers. In this paper, we analyze the workload performance characteristics of actual Bank Intermediary Business (BIB) applications through the design of BIBmodel and BIBbench, and propose BIB performance workload and use-case definitions. The analysis and comparison of workloads and use cases show that the workload performance characteristics of BIB differ substantially from those of the TPC benchmarks. As economic and technological demands on BIB servers grow, modeling, benchmark development, and the study of workload performance characteristics are becoming increasingly important.


2007 ◽  
Vol 18 (01) ◽  
pp. 45-61 ◽  
Author(s):  
LIMOR FIX ◽  
ORNA GRUMBERG ◽  
AMNON HEYMAN ◽  
TAMIR HEYMAN ◽  
ASSAF SCHUSTER

Recent advances in scheduling and networking have paved the way for efficient exploitation of large-scale distributed computing platforms such as computational grids and huge clusters. Such infrastructures hold great promise for the highly resource-demanding task of verifying and checking large models, provided that model checkers are designed with a high degree of scalability and flexibility in mind. In this paper we focus on the mechanisms required to execute a high-performance, distributed, symbolic model checker on top of a large-scale distributed environment. We develop a hybrid algorithm for slicing the state space and dynamically distributing the work among the worker processes. We show that the new approach is faster, more effective, and thus much more scalable than previous slicing algorithms. We then present a checkpoint-restart module that has very low overhead. This module can be used to combat failures, the likelihood of which increases with the size of the computing platform. However, checkpoint-restart is even more useful to the scheduling system: it can be used to avoid reserving large numbers of workers, thus making the distributed computation work-efficient. Finally, we discuss for the first time the effect of reordering on the distributed model checker and show how the distributed system performs more efficient reordering than the sequential one. We implemented our contributions on a network of 200 processors, using a distributed scalable scheme that employs a high-performance industrial model checker from Intel. Our results show that the system was able to verify real-life models much larger than was previously possible.
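The paper's algorithm slices BDD-represented state sets symbolically; as a rough explicit-state analogue, the sketch below shows the two mechanisms the abstract highlights: hash-based slicing that assigns each state to an owning worker, and a cheap checkpoint for restart. All function names and representations are assumptions for illustration.

```python
import pickle
import zlib

def owner(state, slice_idx, n_workers):
    """Assign a state (a tuple of variable values) to a worker via a stable
    hash over the chosen slicing variables."""
    key = repr(tuple(state[i] for i in slice_idx)).encode()
    return zlib.crc32(key) % n_workers

def worker_step(my_id, frontier, successors, slice_idx, n_workers):
    """Expand one BFS layer; keep owned successors, forward the rest."""
    local, outbox = set(), {w: set() for w in range(n_workers)}
    for state in frontier:
        for nxt in successors(state):
            dest = owner(nxt, slice_idx, n_workers)
            (local if dest == my_id else outbox[dest]).add(nxt)
    return local, outbox

def checkpoint(path, visited, frontier):
    """Low-overhead snapshot so a preempted worker can resume without recomputation."""
    with open(path, "wb") as f:
        pickle.dump((visited, frontier), f)
```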


2013 ◽  
pp. 287-321
Author(s):  
Judy Qiu ◽  
Jaliya Ekanayake ◽  
Thilina Gunarathne ◽  
Jong Youl Choi ◽  
Seung-Hee Bae ◽  
...  

Data intensive computing, cloud computing, and multicore computing are converging as frontiers to address massive data problems with hybrid programming models and/or runtimes including MapReduce, MPI, and parallel threading on multicore platforms. A major challenge is to utilize these technologies and large-scale computing resources effectively to advance fundamental science discoveries such as those in the Life Sciences. Recently developed next-generation sequencers have enabled large-scale genome sequencing in areas such as environmental sample sequencing, leading to metagenomic studies of collections of genes. Metagenomic research is just one of the areas that present a significant computational challenge because of the amount and complexity of data to be processed. This chapter discusses the use of innovative data-mining algorithms and new programming models for several Life Sciences applications. The authors particularly focus on methods that are applicable to large data sets coming from high-throughput devices of steadily increasing power. They show results for both clustering and dimension-reduction algorithms, and the use of MapReduce on modest-size problems. They identify two key areas where further research is essential, and propose to develop new O(N log N) complexity algorithms suitable for the analysis of millions of sequences. They suggest Iterative MapReduce as a promising programming model combining the best features of MapReduce with those of high-performance environments such as MPI.
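As a generic illustration of the iterative MapReduce pattern the chapter advocates (not the authors' runtime), the sketch below runs k-means as a map phase (assign points to centroids) and a reduce phase (recompute centroids) inside an outer loop, which is exactly the iteration that plain MapReduce lacks. Names and the in-memory execution are assumptions.

```python
import random
from collections import defaultdict

def map_phase(points, centroids):
    """Map: emit (nearest-centroid index, point) pairs."""
    for p in points:
        idx = min(range(len(centroids)),
                  key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))
        yield idx, p

def reduce_phase(pairs):
    """Reduce: average the points assigned to each centroid."""
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    return {idx: tuple(sum(coord) / len(pts) for coord in zip(*pts))
            for idx, pts in groups.items()}

def kmeans(points, k, iterations=20):
    centroids = random.sample(points, k)
    for _ in range(iterations):   # the outer iteration plain MapReduce lacks
        new = reduce_phase(map_phase(points, centroids))
        centroids = [new.get(i, centroids[i]) for i in range(k)]  # keep empty clusters
    return centroids

points = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(500)]
print(kmeans(points, k=3))
```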

