Transparent Throughput Elasticity for Modern Cloud Storage

Author(s):  
Bogdan Nicolae ◽  
Pierre Riteau ◽  
Zhuo Zhen ◽  
Kate Keahey

Storage elasticity on the cloud is a crucial feature in the age of data-intensive computing, especially when considering fluctuations of I/O throughput. In this chapter, the authors explore how to transparently boost I/O bandwidth during peak utilization to deliver high performance without over-provisioning storage resources. The proposal relies on the idea of leveraging short-lived virtual disks with better performance characteristics (and thus more expensive) to act during peaks as a caching layer for the persistent virtual disks where the application data is stored at runtime. First, they show how this idea can be achieved efficiently at the block-device level, using a caching mechanism that exploits iterative behavior and learns from past experience. Second, they introduce a corresponding performance and cost prediction methodology. They demonstrate the benefits of their proposal both for micro-benchmarks and for two real-life applications using large-scale experiments, and conclude with a discussion of how these techniques can be generalized for the increasingly complex landscape of modern cloud storage.
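To make the mechanism concrete, the following is a minimal sketch (in Python) of a block-level write-back cache in the spirit described above: hot blocks are absorbed by a fast, short-lived disk during the peak and flushed back to the persistent disk afterwards. The LRU policy, the class and parameter names, and the dictionary-backed disks are illustrative assumptions; the authors' actual mechanism additionally learns from iterative access behavior, which is not modeled here.

```python
from collections import OrderedDict

BLOCK_SIZE = 4096  # bytes per block (assumed)

class FastDiskCache:
    """Write-back cache: a fast ephemeral disk in front of a slow persistent one."""

    def __init__(self, persistent, capacity_blocks):
        self.persistent = persistent      # block_id -> bytes (stand-in for the slow disk)
        self.capacity = capacity_blocks   # size of the short-lived fast disk
        self.cache = OrderedDict()        # block_id -> (data, dirty), LRU order

    def read(self, block_id):
        if block_id in self.cache:                        # hit on the fast disk
            data, dirty = self.cache.pop(block_id)
            self.cache[block_id] = (data, dirty)          # refresh LRU position
            return data
        data = self.persistent.get(block_id, b"\x00" * BLOCK_SIZE)
        self._insert(block_id, data, dirty=False)
        return data

    def write(self, block_id, data):
        # Absorb the write at fast-disk speed; defer the slow write-back.
        self.cache.pop(block_id, None)
        self._insert(block_id, data, dirty=True)

    def _insert(self, block_id, data, dirty):
        if len(self.cache) >= self.capacity:
            old_id, (old_data, was_dirty) = self.cache.popitem(last=False)
            if was_dirty:                                 # write-back on eviction
                self.persistent[old_id] = old_data
        self.cache[block_id] = (data, dirty)

    def drain(self):
        # Flush all dirty blocks before discarding the ephemeral disk after the peak.
        for block_id, (data, dirty) in self.cache.items():
            if dirty:
                self.persistent[block_id] = data
        self.cache.clear()
```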

Author(s):  
Bogdan Nicolae ◽  
Pierre Riteau ◽  
Kate Keahey

Storage elasticity on IaaS clouds is a crucial feature in the age of data-intensive computing, especially when considering fluctuations of I/O throughput. This paper provides a transparent solution that automatically boosts I/O bandwidth during peaks for the underlying virtual disks, effectively avoiding over-provisioning without performance loss. The authors' proposal relies on the idea of leveraging short-lived virtual disks with better performance characteristics (and thus more expensive) to act during peaks as a caching layer for the persistent virtual disks where the application data is stored. Furthermore, they introduce a performance and cost prediction methodology that can be used independently to estimate in advance what trade-off between performance and cost is possible, as well as to drive an optimization technique that selects the cache size needed to meet the desired performance level at minimal cost. The authors demonstrate the benefits of their proposal both for microbenchmarks and for two real-life applications using large-scale experiments.
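A hedged sketch of the kind of cost/performance reasoning such a prediction methodology enables: given an assumed hit-rate curve and per-request latencies, pick the smallest ephemeral-disk size whose predicted runtime meets a target. The model, prices, and function names below are illustrative, not the paper's formulas.

```python
# Pick the cheapest fast-disk size whose predicted runtime meets a target.
def predict_runtime(io_requests, hit_rate, t_fast, t_slow):
    """Expected total I/O time given a cache hit rate and per-request latencies."""
    return io_requests * (hit_rate * t_fast + (1.0 - hit_rate) * t_slow)

def select_cache_size(sizes_gb, hit_rate_for, price_per_gb_hour,
                      io_requests, t_fast, t_slow, target_seconds):
    """Return (size, cost) for the cheapest feasible size, or None."""
    best = None
    for size in sorted(sizes_gb):
        runtime = predict_runtime(io_requests, hit_rate_for(size), t_fast, t_slow)
        if runtime <= target_seconds:
            cost = price_per_gb_hour * size * (runtime / 3600.0)
            if best is None or cost < best[1]:
                best = (size, cost)
    return best

# Example with a made-up saturating hit-rate model:
hit = lambda size: min(0.95, size / 100.0)
print(select_cache_size([10, 25, 50, 100], hit, price_per_gb_hour=0.12,
                        io_requests=2_000_000, t_fast=1e-4,
                        t_slow=1e-3, target_seconds=600))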


Author(s):  
TAJ ALAM ◽  
PARITOSH DUBEY ◽  
ANKIT KUMAR

Distributed systems are an efficient means of realizing high-performance computing (HPC). They are used to meet the demand of executing large-scale, high-performance computational jobs. Scheduling tasks on such computational resources is one of the prime concerns in heterogeneous distributed systems. Scheduling jobs on distributed systems is NP-complete in nature, so it requires a heuristic or metaheuristic approach to obtain sub-optimal but acceptable solutions. An adaptive threshold-based scheduler is one such heuristic approach. This work proposes an adaptive threshold-based scheduler for batches of independent jobs (ATSBIJ) with the objective of optimizing the makespan of the jobs submitted for execution on cloud computing systems. ATSBIJ exploits interval estimation to calculate the threshold values used to generate an efficient schedule for the batch. Simulation studies on CloudSim show that the ATSBIJ approach works effectively in real-life scenarios.
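The following is an illustrative sketch of the interval-estimation idea, not ATSBIJ's exact formulation: the upper bound of a confidence interval over sampled task runtimes serves as each VM's admission threshold when packing the batch. All names and parameters are assumptions.

```python
import math

def threshold(samples, z=1.96):
    """Upper bound of a ~95% confidence interval over runtime samples
    (needs at least two samples)."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / (n - 1)
    return mean + z * math.sqrt(var / n)

def schedule_batch(jobs, vms, runtime_samples):
    """Greedy packing: assign each job (largest first) to the VM whose current
    load plus the job's estimated runtime stays below that VM's threshold."""
    limits = {vm: threshold(runtime_samples[vm]) for vm in vms}
    load = {vm: 0.0 for vm in vms}
    plan = {}
    for job_id, est in sorted(jobs.items(), key=lambda kv: -kv[1]):
        candidates = [vm for vm in vms if load[vm] + est <= limits[vm]]
        target = min(candidates or vms, key=lambda vm: load[vm])  # fall back to least loaded
        plan[job_id] = target
        load[target] += est
    return plan
```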


Author(s):  
GEORGE MOURKOUSIS ◽  
MATHEW PROTONOTARIOS ◽  
THEODORA VARVARIGOU

This paper presents a study on the application of a hybrid genetic algorithm (HGA) to an extended instance of the Vehicle Routing Problem. The actual problem is a complex real-life vehicle routing problem concerning the distribution of products to customers. A non-homogeneous fleet of vehicles with limited capacity and allowed travel time is available to satisfy the stochastic demand of a set of different types of customers, each with earliest and latest servicing times. The objective is to minimize distribution costs while respecting the imposed constraints (vehicle capacity, customer time windows, driver working hours, and so on). The approach for solving the problem was based on a "cluster and route" HGA. Several genetic operators, selection methods, and replacement methods were tested until the HGA became efficient at optimizing a multi-extrema search space (multi-modal optimization). Finally, High Performance Computing (HPC) was applied in order to provide near-optimal solutions in a reasonable amount of time.
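A compact illustration of the "cluster and route" decomposition (not the paper's HGA): customers are first grouped by a sweep-style angular clustering around the depot, then a small genetic algorithm with order crossover and swap mutation optimizes the visiting order within each cluster. Capacity and time-window penalties are omitted for brevity; every name and parameter is an assumption.

```python
import math, random

def clusters_by_angle(customers, depot, k):
    """Sweep-style clustering: sort customers by polar angle around the depot."""
    angle = lambda c: math.atan2(c[1] - depot[1], c[0] - depot[0])
    ordered = sorted(customers, key=angle)
    size = math.ceil(len(ordered) / k)
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]

def route_length(route, depot):
    pts = [depot] + route + [depot]
    return sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))

def order_crossover(p1, p2):
    """Classic OX: keep a slice of p1, fill the rest in p2's order."""
    a, b = sorted(random.sample(range(len(p1)), 2))
    middle = p1[a:b]
    rest = [c for c in p2 if c not in middle]
    return rest[:a] + middle + rest[a:]

def ga_route(cluster, depot, pop_size=30, generations=200):
    pop = [random.sample(cluster, len(cluster)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda r: route_length(r, depot))     # shortest first
        survivors = pop[: pop_size // 2]                   # elitist selection
        children = [order_crossover(*random.sample(survivors, 2))
                    for _ in range(pop_size - len(survivors))]
        for child in children:                             # swap mutation
            if random.random() < 0.2:
                i, j = random.sample(range(len(child)), 2)
                child[i], child[j] = child[j], child[i]
        pop = survivors + children
    return min(pop, key=lambda r: route_length(r, depot))

depot = (0.0, 0.0)
customers = [(random.uniform(-10, 10), random.uniform(-10, 10)) for _ in range(24)]
routes = [ga_route(c, depot) for c in clusters_by_angle(customers, depot, k=3)]
```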


2013 ◽  
Vol 2013 ◽  
pp. 1-6 ◽  
Author(s):  
Ying-Chih Lin ◽  
Chin-Sheng Yu ◽  
Yen-Jen Lin

Recent progress in high-throughput instrumentation has led to astonishing growth in both the volume and the complexity of biomedical data collected from various sources. Such planet-scale data poses serious challenges to storage and computing technologies. Cloud computing is an attractive alternative because it addresses storage and high-performance computing on large-scale data jointly. This work briefly introduces data-intensive computing systems and summarizes existing cloud-based resources in bioinformatics. These developments and applications will help biomedical research make this vast amount of diverse data meaningful and usable.


2011 ◽  
Vol 130-134 ◽  
pp. 2455-2460
Author(s):  
Bo Li ◽  
Hai Ying Zhou ◽  
De Cheng Zuo

It is critical to understand the workload characteristics and resource usage patterns of real applications in order to guide the architectural design of future large-scale servers. In this paper, we analyze the workload performance characteristics of actual Bank Intermediary Business (BIB) applications through the design of BIBmodel and BIBbench, and propose BIB performance workload and use-case definitions. The analysis and comparison of workloads and use cases show that the workload performance characteristics of BIB differ substantially from those of the TPC benchmarks. As economic and technological demands on BIB servers grow, modeling, benchmark development, and the study of workload performance characteristics are becoming increasingly important.


2007 ◽  
Vol 18 (01) ◽  
pp. 45-61 ◽  
Author(s):  
LIMOR FIX ◽  
ORNA GRUMBERG ◽  
AMNON HEYMAN ◽  
TAMIR HEYMAN ◽  
ASSAF SCHUSTER

Recent advances in scheduling and networking have paved the way for efficient exploitation of large-scale distributed computing platforms such as computational grids and huge clusters. Such infrastructures hold great promise for the highly resource-demanding task of verifying and checking large models, provided that model checkers are designed with a high degree of scalability and flexibility in mind. In this paper we focus on the mechanisms required to execute a high-performance, distributed, symbolic model checker on top of a large-scale distributed environment. We develop a hybrid algorithm for slicing the state space and dynamically distributing the work among the worker processes. We show that the new approach is faster, more effective, and thus much more scalable than previous slicing algorithms. We then present a checkpoint-restart module that has very low overhead. This module can be used to combat failures, the likelihood of which increases with the size of the computing platform. However, checkpoint-restart is even more useful to the scheduling system: it can be used to avoid reserving large numbers of workers, thus making the distributed computation work-efficient. Finally, we discuss for the first time the effect of reordering on the distributed model checker and show how the distributed system performs more efficient reordering than the sequential one. We implemented our contributions on a network of 200 processors, using a distributed scalable scheme that employs a high-performance industrial model checker from Intel. Our results show that the system was able to verify real-life models much larger than was previously possible.
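The paper's algorithm slices BDD-represented state sets symbolically; as a rough explicit-state analogue, the sketch below shows the two mechanisms the abstract highlights: hash-based slicing that assigns each state to an owning worker, and a cheap checkpoint for restart. All function names and representations are assumptions for illustration.

```python
import pickle
import zlib

def owner(state, slice_idx, n_workers):
    """Assign a state (a tuple of variable values) to a worker via a stable
    hash over the chosen slicing variables."""
    key = repr(tuple(state[i] for i in slice_idx)).encode()
    return zlib.crc32(key) % n_workers

def worker_step(my_id, frontier, successors, slice_idx, n_workers):
    """Expand one BFS layer; keep owned successors, forward the rest."""
    local, outbox = set(), {w: set() for w in range(n_workers)}
    for state in frontier:
        for nxt in successors(state):
            dest = owner(nxt, slice_idx, n_workers)
            (local if dest == my_id else outbox[dest]).add(nxt)
    return local, outbox

def checkpoint(path, visited, frontier):
    """Low-overhead snapshot so a preempted worker can resume without recomputation."""
    with open(path, "wb") as f:
        pickle.dump((visited, frontier), f)
```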


2013 ◽  
pp. 287-321
Author(s):  
Judy Qiu ◽  
Jaliya Ekanayake ◽  
Thilina Gunarathne ◽  
Jong Youl Choi ◽  
Seung-Hee Bae ◽  
...  

Data intensive computing, cloud computing, and multicore computing are converging as frontiers to address massive data problems with hybrid programming models and/or runtimes including MapReduce, MPI, and parallel threading on multicore platforms. A major challenge is to utilize these technologies and large-scale computing resources effectively to advance fundamental science discoveries such as those in the Life Sciences. Recently developed next-generation sequencers have enabled large-scale genome sequencing in areas such as environmental sample sequencing, leading to metagenomic studies of collections of genes. Metagenomic research is just one of the areas that present a significant computational challenge because of the amount and complexity of data to be processed. This chapter discusses the use of innovative data-mining algorithms and new programming models for several Life Sciences applications. The authors particularly focus on methods that are applicable to large data sets coming from high-throughput devices of steadily increasing power. They show results for both clustering and dimension-reduction algorithms, and the use of MapReduce on modest-size problems. They identify two key areas where further research is essential, and propose to develop new O(N log N) complexity algorithms suitable for the analysis of millions of sequences. They suggest Iterative MapReduce as a promising programming model combining the best features of MapReduce with those of high-performance environments such as MPI.
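As a generic illustration of the iterative MapReduce pattern the chapter advocates (not the authors' runtime), the sketch below runs k-means as a map phase (assign points to centroids) and a reduce phase (recompute centroids) inside an outer loop, which is exactly the iteration that plain MapReduce lacks. Names and the in-memory execution are assumptions.

```python
import random
from collections import defaultdict

def map_phase(points, centroids):
    """Map: emit (nearest-centroid index, point) pairs."""
    for p in points:
        idx = min(range(len(centroids)),
                  key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))
        yield idx, p

def reduce_phase(pairs):
    """Reduce: average the points assigned to each centroid."""
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    return {idx: tuple(sum(coord) / len(pts) for coord in zip(*pts))
            for idx, pts in groups.items()}

def kmeans(points, k, iterations=20):
    centroids = random.sample(points, k)
    for _ in range(iterations):   # the outer iteration plain MapReduce lacks
        new = reduce_phase(map_phase(points, centroids))
        centroids = [new.get(i, centroids[i]) for i in range(k)]  # keep empty clusters
    return centroids

points = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(500)]
print(kmeans(points, k=3))
```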

