Evaluating the usefulness of content addressable storage for high-performance data intensive applications

Author(s):  
Partho Nath ◽  
Bhuvan Urgaonkar ◽  
Anand Sivasubramaniam
2020 ◽  
Author(s):  
Ronny Bazan Antequera

[ACCESS RESTRICTED TO THE UNIVERSITY OF MISSOURI-COLUMBIA AT REQUEST OF AUTHOR.] The increase of data-intensive applications in science and engineering fields (e.g., bioinformatics, cybermanufacturing) demands the use of high-performance computing resources. However, the local resources available to data-intensive applications usually offer limited capacity and availability due to sizable upfront costs. Moreover, using remote public resources presents constraints at the private edge network domain. Specifically, misconfigured network policies cause bottlenecks as cross-traffic from other applications competes for shared networking resources. Additionally, selecting the right remote resources can be cumbersome, especially for users who are interested in application execution under nonfunctional requirements such as performance, security, and cost. Data-intensive applications have recurrent deployments and similar infrastructure requirements that can be addressed by creating templates. In this thesis, we handle application requirements through intelligent resource 'abstractions' coupled with 'reusable' approaches that save time and effort in deploying new cloud architectures. Specifically, we design a novel custom template middleware that can retrieve blueprints of resource configuration, technical/policy information, and benchmarks of workflow performance to facilitate repeatable/reusable resource composition. The middleware uses a hybrid recommendation methodology (online and offline recommendation) that leverages a catalog to rapidly check custom template solution correctness before and during resource consumption. Further, it prescribes application adaptations by fostering effective social interactions during the application's scaling stages. Based on the above approach, we organize the thesis contributions under two main thrusts: (i) Custom Templates for Cloud Networking for Data-intensive Applications: this involves scheduling transit selection and engineering at the campus edge based upon real-time policy control. Our solution ensures prioritized application performance delivery for multi-tenant traffic profiles from a diverse set of actual data-intensive applications in bioinformatics. (ii) Custom Templates for Cloud Computing for Data-intensive Applications: this involves recommending cloud resources for data-intensive applications based on a custom template catalog. We develop a novel expert system approach, implemented as middleware, that abstracts data-intensive application requirements for custom template composition. We uniquely consider heterogeneous cloud resource selection for the deployment of cloud architectures for real data-intensive applications in cybermanufacturing.
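A minimal sketch of the catalog-lookup idea behind such a custom template middleware: filter templates by hard nonfunctional requirements (an offline step), then rank the survivors by recorded benchmark performance (an online step). All names here (Template, Catalog, recommend) are hypothetical illustrations, not the thesis implementation; Python 3.10+ is assumed.

```python
# Illustrative catalog-based template recommendation (hypothetical names).
from dataclasses import dataclass


@dataclass
class Template:
    name: str
    cpu_cores: int
    memory_gb: int
    security_level: int      # e.g., 1 (low) .. 3 (high)
    hourly_cost: float
    benchmark_score: float   # recorded workflow performance


class Catalog:
    def __init__(self):
        self.templates: list[Template] = []

    def add(self, t: Template) -> None:
        self.templates.append(t)

    def recommend(self, min_cores: int, min_memory_gb: int,
                  min_security: int, max_cost: float) -> Template | None:
        # Offline step: keep only templates that satisfy the hard
        # (nonfunctional) requirements.
        feasible = [t for t in self.templates
                    if t.cpu_cores >= min_cores
                    and t.memory_gb >= min_memory_gb
                    and t.security_level >= min_security
                    and t.hourly_cost <= max_cost]
        # Online step: rank the survivors by recorded benchmark performance.
        return max(feasible, key=lambda t: t.benchmark_score, default=None)


catalog = Catalog()
catalog.add(Template("bio-small", 8, 32, 2, 0.40, 71.5))
catalog.add(Template("bio-large", 32, 128, 3, 1.60, 94.2))
print(catalog.recommend(min_cores=16, min_memory_gb=64,
                        min_security=2, max_cost=2.00))
```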


2013 ◽  
Vol 3 (1) ◽  
pp. 13-26 ◽  
Author(s):  
Sanjay P. Ahuja ◽  
Sindhu Mani

High-Performance Computing (HPC) applications are scientific applications that require significant CPU capabilities. They are also data-intensive applications requiring large data storage. While many researchers have examined the performance of Amazon's EC2 platform across some HPC benchmarks, an extensive study comparing Amazon's EC2 with Microsoft's Windows Azure on metrics such as memory bandwidth, I/O performance, and communication and computational performance is largely missing. The purpose of this paper is to implement existing benchmarks to evaluate and analyze these metrics for EC2 and Windows Azure offerings that span both the Infrastructure-as-a-Service and Platform-as-a-Service models. This was accomplished by running MPI versions of the STREAM, Interleaved or Random (IOR), and NAS Parallel Benchmark (NPB) suites on small and medium instance types. In addition, a new EC2 medium instance type (m1.medium) was also included in the analysis. Together, these benchmarks measure memory bandwidth, I/O performance, and communication and computational performance.
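To make the "memory bandwidth" metric concrete: the STREAM benchmark times simple vector kernels such as the triad (a = b + q*c) and reports bytes moved per second. The paper ran the official MPI STREAM code, so the following NumPy version is only a rough illustrative sketch of the measurement, not the benchmark itself.

```python
# Rough STREAM-triad-style bandwidth estimate (illustrative only).
import time
import numpy as np

N = 50_000_000                 # array length; ~400 MB per float64 array
a = np.zeros(N)
b = np.random.rand(N)
c = np.random.rand(N)
scalar = 3.0

start = time.perf_counter()
a[:] = b + scalar * c          # triad kernel: a = b + q*c
elapsed = time.perf_counter() - start

# The triad touches three 8-byte arrays: two reads plus one write.
bytes_moved = 3 * N * 8
print(f"Triad bandwidth: {bytes_moved / elapsed / 1e9:.2f} GB/s")
```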


Computer ◽  
2008 ◽  
Vol 41 (4) ◽  
pp. 60-68 ◽  
Author(s):  
Maya Gokhale ◽  
Jonathan Cohen ◽  
Andy Yoo ◽  
W. Marcus Miller ◽  
Arpith Jacob ◽  
...  

2009 ◽  
Vol 17 (1-2) ◽  
pp. 113-134 ◽  
Author(s):  
Ana Lucia Varbanescu ◽  
Alexander S. van Amesfoort ◽  
Tim Cornwell ◽  
Ger van Diepen ◽  
Rob van Nieuwpoort ◽  
...  

The performance potential of the Cell/B.E., as well as its availability, has attracted a lot of attention from various high-performance computing (HPC) fields. While computation-intensive kernels proved to be exceptionally well suited for running on the Cell, irregular data-intensive applications are usually considered poor matches. In this paper, we present our complete solution for enabling such a data-intensive application to run efficiently on the Cell/B.E. processor. Specifically, we target radio-astronomy data gridding and degridding, two similar imaging filters based on convolutional resampling. Our solution is based on building a high-level application model, used to evaluate parallelization alternatives. Next, we choose the alternative with the best performance potential, and we gradually exploit this potential by applying platform-specific and application-specific optimizations. After several iterations, our target application shows a speed-up factor between 10 and 20 on a dual-Cell blade when compared with the original application running on a commodity machine. Given these results, and based on our empirical observations, we pinpoint a set of ten guidelines for parallelizing similar applications on the Cell/B.E. Finally, we conclude that the Cell/B.E. can provide high performance for data-intensive applications at the price of increased programming effort and with significant aid from aggressive application-specific optimizations.
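For readers unfamiliar with convolutional resampling: gridding smears each irregularly sampled visibility onto a regular grid through a small convolution kernel, which produces the irregular read-modify-write memory pattern that makes the workload hard on the Cell. The sketch below is an illustrative assumption (including the Gaussian stand-in kernel and all names), not the paper's code.

```python
# Toy convolutional gridding: scatter irregular (u, v) samples onto a grid.
import numpy as np


def grid(visibilities, u, v, grid_size=256, support=3):
    """Accumulate irregular (u, v) samples onto a regular complex grid."""
    # Small separable Gaussian as a stand-in convolution kernel.
    k = np.arange(-support, support + 1)
    kernel = np.exp(-0.5 * (k / support) ** 2)
    kernel2d = np.outer(kernel, kernel)

    g = np.zeros((grid_size, grid_size), dtype=complex)
    for vis, ui, vi in zip(visibilities, u, v):
        gu = int(round(float(ui)))   # nearest grid cell for this sample
        gv = int(round(float(vi)))
        if support <= gu < grid_size - support and \
           support <= gv < grid_size - support:
            # Data-intensive inner step: a (2*support+1)^2 read-modify-write
            # per sample, with irregular access -- the hard part on the Cell.
            g[gv - support:gv + support + 1,
              gu - support:gu + support + 1] += vis * kernel2d
    return g


rng = np.random.default_rng(0)
n = 10_000
result = grid(rng.standard_normal(n) + 1j * rng.standard_normal(n),
              rng.uniform(10, 246, n), rng.uniform(10, 246, n))
print(result.shape, abs(result).sum())
```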


Author(s):  
Ioan Raicu ◽  
Ian Foster ◽  
Yong Zhao ◽  
Alex Szalay ◽  
Philip Little ◽  
...  

Many-task computing aims to bridge the gap between two computing paradigms: high-throughput computing and high-performance computing. Traditional techniques to support many-task computing commonly found in scientific computing (i.e., the reliance on parallel file systems with static configurations) do not scale to today's largest systems for data-intensive applications, as the rate of increase in the number of processors per system is outgrowing the rate of performance increase of parallel file systems. In this chapter, the authors argue that in such circumstances, data locality is critical to the successful and efficient use of large distributed systems for data-intensive applications. They propose a "data diffusion" approach to enable data-intensive many-task computing. They define an abstract model for data diffusion, define and implement scheduling policies with heuristics that optimize real-world performance, and develop a competitive online cache-eviction policy. They also offer many empirical experiments to explore the benefits of data diffusion, under both static and dynamic resource provisioning, demonstrating approaches that improve both performance and scalability.
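A toy illustration of the locality idea behind data diffusion: dispatch a task to a node that already caches its input where possible, and evict cached items LRU-style when a node's cache fills, so data gradually "diffuses" to where it is used. Class and method names here are hypothetical, not the authors' implementation or their actual (competitive online) eviction policy.

```python
# Toy locality-aware scheduling with per-node LRU caches (illustrative).
from collections import OrderedDict


class Node:
    def __init__(self, name, cache_slots):
        self.name = name
        self.cache = OrderedDict()        # file -> None, ordered by recency
        self.cache_slots = cache_slots

    def touch(self, f):
        if f in self.cache:
            self.cache.move_to_end(f)     # cache hit: refresh recency
        else:
            if len(self.cache) >= self.cache_slots:
                self.cache.popitem(last=False)   # evict least recently used
            self.cache[f] = None          # "diffuse" the file to this node


def schedule(task_file, nodes):
    # Locality-aware policy: send the task where its data already lives.
    for node in nodes:
        if task_file in node.cache:
            node.touch(task_file)
            return node, True             # cache hit
    # Otherwise fall back to any node (here: the first), pulling the file in.
    nodes[0].touch(task_file)
    return nodes[0], False                # cache miss


nodes = [Node("n0", 2), Node("n1", 2)]
for f in ["a", "b", "a", "c", "a"]:
    node, hit = schedule(f, nodes)
    print(f, "->", node.name, "hit" if hit else "miss")
```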

