Towards Transparent Throughput Elasticity for IaaS Cloud Storage

Author(s):  
Bogdan Nicolae ◽  
Pierre Riteau ◽  
Kate Keahey

Storage elasticity on IaaS clouds is a crucial feature in the age of data-intensive computing, especially when considering fluctuations of I/O throughput. This paper presents a transparent solution that automatically boosts I/O bandwidth for the underlying virtual disks during peaks, effectively avoiding over-provisioning without performance loss. The proposal relies on leveraging short-lived virtual disks with better performance characteristics (and thus higher cost) to act during peaks as a caching layer for the persistent virtual disks where the application data is stored. Furthermore, the authors introduce a performance and cost prediction methodology that can be used independently to estimate in advance what trade-off between performance and cost is possible, as well as an optimization technique that selects a cache size meeting the desired performance level at minimal cost. They demonstrate the benefits of their proposal both for microbenchmarks and for two real-life applications using large-scale experiments.
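
To make the mechanism concrete, here is a minimal, illustrative Python sketch of the peak-time caching policy described above. It is not the authors' implementation, which operates transparently at the virtual-disk level; the threshold-based peak detector, the dict-backed stores, and all names are assumptions made for this example.

```python
# Illustrative sketch only: the paper works at the block-device level
# inside the virtualization stack; this mimics the policy in plain Python.
import time
from collections import deque


class ElasticThroughputCache:
    """Route I/O to a fast, short-lived disk during throughput peaks,
    then flush dirty blocks back to the cheap persistent disk."""

    def __init__(self, persistent, ephemeral, peak_iops=500, window_s=5.0):
        self.persistent = persistent   # slow, cheap, durable store
        self.ephemeral = ephemeral     # fast, expensive, short-lived store
        self.peak_iops = peak_iops     # assumed threshold defining a "peak"
        self.window_s = window_s
        self.recent_ops = deque()      # timestamps of recent I/O requests
        self.dirty = set()             # blocks written to ephemeral only

    def _in_peak(self):
        now = time.monotonic()
        while self.recent_ops and now - self.recent_ops[0] > self.window_s:
            self.recent_ops.popleft()
        self.recent_ops.append(now)
        return len(self.recent_ops) / self.window_s > self.peak_iops

    def write(self, block_id, data):
        if self._in_peak():
            self.ephemeral[block_id] = data  # absorb the burst on fast disk
            self.dirty.add(block_id)
        else:
            self.persistent[block_id] = data

    def read(self, block_id):
        # Dirty blocks must come from the cache; otherwise fall through.
        if block_id in self.dirty:
            return self.ephemeral[block_id]
        return self.persistent[block_id]

    def flush(self):
        """Write dirty blocks back so the ephemeral disk can be discarded."""
        for block_id in list(self.dirty):
            self.persistent[block_id] = self.ephemeral.pop(block_id)
            self.dirty.discard(block_id)


cache = ElasticThroughputCache(persistent={}, ephemeral={})
cache.write("blk0", b"data")
print(cache.read("blk0"))
cache.flush()
```

The property mirrored here is that the expensive ephemeral disk only absorbs I/O during detected peaks and can be discarded after a flush, so it is billed only for short periods.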

Author(s):  
Bogdan Nicolae ◽  
Pierre Riteau ◽  
Zhuo Zhen ◽  
Kate Keahey

Storage elasticity on the cloud is a crucial feature in the age of data-intensive computing, especially when considering fluctuations of I/O throughput. In this chapter, the authors explore how to transparently boost I/O bandwidth during peak utilization to deliver high performance without over-provisioning storage resources. The proposal relies on leveraging short-lived virtual disks with better performance characteristics (and higher cost) to act during peaks as a caching layer for the persistent virtual disks where the application data is stored at runtime. First, they show how this idea can be implemented efficiently at the block-device level, using a caching mechanism that leverages iterative behavior and learns from past experience. Second, they introduce a corresponding performance and cost prediction methodology. They demonstrate the benefits of their proposal both for micro-benchmarks and for two real-life applications using large-scale experiments, and conclude with a discussion of how these techniques can be generalized to the increasingly complex landscape of modern cloud storage.
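
The cache-size optimization lends itself to a simple illustration. The sketch below picks the cheapest cache size that still meets a target runtime; the pricing model, rates, and predicted runtimes are invented for the example, whereas the chapter derives its predictions from observed iterative I/O behavior and past experience.

```python
# Hypothetical sketch of the cache-size selection idea: given predicted
# runtimes for several candidate cache sizes, pick the cheapest
# configuration that still meets the desired performance level.

def cheapest_cache_size(candidates, target_runtime_s,
                        persistent_rate, fast_rate):
    """candidates: list of (cache_size_gb, predicted_runtime_s) pairs."""
    best = None
    for size_gb, runtime_s in candidates:
        if runtime_s > target_runtime_s:
            continue  # misses the desired performance level
        # The fast disk is billed only while it exists (short-lived).
        cost = runtime_s * (persistent_rate + size_gb * fast_rate)
        if best is None or cost < best[1]:
            best = ((size_gb, runtime_s), cost)
    return best


# Example with made-up predictions: larger caches shorten the runtime
# but cost more per second while they are provisioned.
candidates = [(0, 900.0), (10, 640.0), (20, 610.0), (40, 600.0)]
print(cheapest_cache_size(candidates, target_runtime_s=650.0,
                          persistent_rate=0.002, fast_rate=0.0001))
```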


2013 ◽  
Vol 2013 ◽  
pp. 1-6 ◽  
Author(s):  
Ying-Chih Lin ◽  
Chin-Sheng Yu ◽  
Yen-Jen Lin

Recent progress in high-throughput instrumentation has led to astonishing growth in both the volume and complexity of biomedical data collected from various sources. Such planet-scale data poses serious challenges to storage and computing technologies. Cloud computing is a promising alternative because it simultaneously enables storage of, and high-performance computing on, large-scale data. This work briefly introduces data-intensive computing systems and summarizes existing cloud-based resources in bioinformatics. These developments and applications should help biomedical research make this vast amount of diverse data meaningful and usable.


2013 ◽  
pp. 287-321
Author(s):  
Judy Qiu ◽  
Jaliya Ekanayake ◽  
Thilina Gunarathne ◽  
Jong Youl Choi ◽  
Seung-Hee Bae ◽  
...  

Data-intensive computing, cloud computing, and multicore computing are converging as frontiers to address massive data problems with hybrid programming models and/or runtimes, including MapReduce, MPI, and parallel threading on multicore platforms. A major challenge is to use these technologies and large-scale computing resources effectively to advance fundamental scientific discoveries, such as those in the Life Sciences. Recently developed next-generation sequencers have enabled large-scale genome sequencing in areas such as environmental sample sequencing, leading to metagenomic studies of collections of genes. Metagenomic research is just one of the areas that present a significant computational challenge because of the amount and complexity of data to be processed. This chapter discusses the use of innovative data-mining algorithms and new programming models for several Life Sciences applications. The authors focus in particular on methods that are applicable to large data sets coming from high-throughput devices of steadily increasing power. They show results for both clustering and dimension-reduction algorithms, and the use of MapReduce on modest-size problems. They identify two key areas where further research is essential, and propose to develop new O(N log N) algorithms suitable for the analysis of millions of sequences. They suggest Iterative MapReduce as a promising programming model that combines the best features of MapReduce with those of high-performance environments such as MPI.
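
As a rough illustration of the Iterative MapReduce pattern the authors advocate, the following Python sketch runs K-means with static data partitions held by "map" tasks while only the small centroid vectors move between iterations. It mimics the control flow only; it is not their runtime, MPI, or any actual MapReduce framework, and all names are invented for the example.

```python
# Minimal sketch of Iterative MapReduce applied to K-means: the large,
# static data stays partitioned; only updated centroids are rebroadcast.
import random


def map_partition(points, centroids):
    """Emit {centroid_index: (partial_sum, count)} for one data partition."""
    sums = {}
    for p in points:
        i = min(range(len(centroids)),
                key=lambda c: sum((a - b) ** 2
                                  for a, b in zip(p, centroids[c])))
        s, n = sums.get(i, ([0.0] * len(p), 0))
        sums[i] = ([a + b for a, b in zip(s, p)], n + 1)
    return sums


def reduce_partials(partials, old_centroids):
    """Combine partial sums from all partitions into new centroids."""
    merged = {}
    for part in partials:
        for i, (s, n) in part.items():
            ms, mn = merged.get(i, ([0.0] * len(s), 0))
            merged[i] = ([a + b for a, b in zip(ms, s)], mn + n)
    new_centroids = []
    for i, old in enumerate(old_centroids):
        if i in merged:
            s, n = merged[i]
            new_centroids.append([x / n for x in s])
        else:
            new_centroids.append(old)  # empty cluster: keep old centroid
    return new_centroids


def kmeans(partitions, centroids, iterations=10):
    for _ in range(iterations):
        partials = [map_partition(p, centroids) for p in partitions]  # map
        centroids = reduce_partials(partials, centroids)              # reduce
    return centroids


random.seed(0)
partitions = [[[random.random(), random.random()] for _ in range(100)]
              for _ in range(4)]                  # 4 static data partitions
print(kmeans(partitions, [[0.2, 0.2], [0.8, 0.8]]))
```

The point of the iterative variant is visible in the loop: per iteration, only the centroid list crosses task boundaries, which is what makes the pattern competitive with MPI-style runtimes for this class of algorithms.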


2014 ◽  
Vol 543-547 ◽  
pp. 1913-1916
Author(s):  
Sheng Hang Wu ◽  
Zhe Wang ◽  
Ming Yuan He ◽  
Huai Lin Dong

As web information increases dramatically, distributed processing of massive data on clusters has become a focus of the research field. An efficient distributed algorithm is a key determinant of scalability and performance in data analysis. This paper first studies the operating mechanism of Storm, a simplified distributed real-time computation platform. Based on the Storm platform, an improved K-Means algorithm suitable for data-intensive computing is designed and implemented. Finally, experimental results show that the K-Means clustering algorithm based on the Storm platform achieves higher performance and improves the effectiveness and accuracy of large-scale text clustering.
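
For intuition, a streaming (sequential) K-Means update of the kind a Storm bolt could apply per incoming tuple is sketched below in plain Python. It is not Storm code and does not reproduce the paper's improved algorithm, only the incremental update rule that suits real-time processing of a text-vector stream.

```python
# Hedged sketch: each incoming vector updates the nearest centroid
# incrementally, so clusters track the stream in real time.

class StreamingKMeans:
    def __init__(self, centroids):
        self.centroids = [list(c) for c in centroids]
        self.counts = [0] * len(centroids)

    def update(self, vector):
        """Process one tuple, as a Storm bolt would: assign the point,
        then nudge the winning centroid toward it."""
        i = min(range(len(self.centroids)),
                key=lambda c: sum((a - b) ** 2
                                  for a, b in zip(vector, self.centroids[c])))
        self.counts[i] += 1
        lr = 1.0 / self.counts[i]          # decaying learning rate
        self.centroids[i] = [c + lr * (v - c)
                             for c, v in zip(self.centroids[i], vector)]
        return i                           # cluster id emitted downstream


model = StreamingKMeans([[0.0, 0.0], [1.0, 1.0]])
for point in [[0.1, 0.2], [0.9, 1.1], [0.0, 0.1], [1.2, 0.8]]:
    print(point, "->", model.update(point))
```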

