A Network Performance Based Data Placement Policy in Distributed Data-Intensive Applications

Grids, clouds and cloud-like infrastructures are capable of supporting a broad range of data-intensive applications. Distinctive performance issues appear as the volume of data and the degree of distribution increase. New scalable data-placement and management techniques, as well as novel approaches to determine the relative placement of data and computational workload, are required. We develop and study a genome sequence matching application that is simple to control and deploy, yet serves as a prototype of a data-intensive application. The application uses a SAGA-based implementation of the All-Pairs pattern. This paper aims to understand some of the factors that influence the performance of this application and the interplay of those factors. We also demonstrate how the SAGA approach can enable data-intensive applications to be extensible and interoperable across a range of infrastructures. This capability enables us to compare and contrast two different approaches for executing distributed data-intensive applications: simple application-level data-placement heuristics versus distributed file systems.
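
For readers unfamiliar with the All-Pairs pattern named in the abstract, a minimal single-machine sketch follows. It compares every element of one set against every element of another and collects the results in a matrix. The function names `all_pairs` and `matching_positions` are hypothetical, the comparison is a toy stand-in rather than the paper's genome matching, and no SAGA calls or distribution logic are shown.

```python
def all_pairs(set_a, set_b, compare):
    """Apply compare(a, b) to every pair; row i, column j holds compare(set_a[i], set_b[j])."""
    return [[compare(a, b) for b in set_b] for a in set_a]


def matching_positions(seq1, seq2):
    """Toy similarity (an assumption): count positions where the two sequences agree."""
    return sum(x == y for x, y in zip(seq1, seq2))


if __name__ == "__main__":
    scores = all_pairs(["ACGT", "AAAA"], ["ACGA", "TTTT"], matching_positions)
    for row in scores:
        print(row)
```

In a distributed setting each cell, or block of cells, of this matrix is an independent task, which is why the placement of the input sets relative to the compute resources matters for performance.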

Optimizing VM allocation and data placement for data-intensive applications in cloud using ACO metaheuristic algorithm

Engineering Science and Technology an International Journal ◽

10.1016/j.jestch.2016.11.006 ◽

2017 ◽

Vol 20 (2) ◽

pp. 616-628 ◽

Cited By ~ 25

Author(s):

T.P. Shabeera ◽

S.D. Madhu Kumar ◽

Sameera M. Salam ◽

K. Murali Krishnan

Keyword(s):

Data Placement ◽

Metaheuristic Algorithm ◽

Data Intensive ◽

VM Allocation ◽

Data Intensive Applications

Grouping-Aware Data Placement in HDFS for Data-Intensive Applications Based on Graph Clustering

Advances in Computer and Computational Sciences - Advances in Intelligent Systems and Computing ◽

10.1007/978-981-10-3773-3_3 ◽

2017 ◽

pp. 21-31 ◽

Cited By ~ 2

Author(s):

S. Vengadeswaran ◽

S. R. Balasundaram

Keyword(s):

Graph Clustering ◽

Data Placement ◽

Data Intensive ◽

Data Intensive Applications

On the Benefits of Multipath Routing for Distributed Data-Intensive Applications with High Bandwidth Requirements and Multidomain Reach

2009 Seventh Annual Communication Networks and Services Research Conference ◽

10.1109/cnsr.2009.26 ◽

2009 ◽

Cited By ~ 11

Author(s):

Xiaomin Chen ◽

Mohit Chamania ◽

Admela Jukan ◽

André C. Drummond ◽

Nelson L. S. da Fonseca

Keyword(s):

Multipath Routing ◽

Distributed Data ◽

Data Intensive ◽

High Bandwidth ◽

Data Intensive Applications

Significance of Hierarchical and Markov Clustering in Grouping Aware Data Placement for Data Intensive Applications Having Interest Locality

Scalable Computing Practice and Experience ◽

10.12694/scpe.v19i3.1375 ◽

2018 ◽

Vol 19 (3) ◽

pp. 245-258

Author(s):

Vengadeswaran Shanmugasundaram ◽

Balasundaram Sadhu Ramakrishnan

Keyword(s):

Big Data ◽

Data Placement ◽

Query Execution ◽

Access Pattern ◽

Clustering Techniques ◽

Data Intensive ◽

Markov Clustering ◽

Default Data ◽

Data Intensive Applications ◽

Grouping Behavior

In this data era, massive volumes of data are generated every second in a variety of domains such as Geoscience, Social Web, Finance, e-Commerce, Health Care, Climate modelling, Physics, Astronomy and Government sectors. Hadoop is well recognized as the de facto big data processing platform and has been extensively adopted in many application domains that process Big Data. Although it is considered an efficient solution for complex query processing, it has limitations when the data to be processed exhibit interest locality. The data required for query execution follow a grouping behavior in which only a part of the Big Data is accessed frequently. In such scenarios, the time taken to execute a query and return results increases exponentially as the amount of data increases, leading to long waiting times for the user. Because the Hadoop default data placement strategy (HDDPS) does not consider such grouping behavior, it performs inefficiently, resulting in shortcomings such as fewer local map task executions and increased query execution time. Hence an Optimal Data Placement Strategy (ODPS) based on grouping semantics is proposed. In this paper we examine the significance of the two most promising clustering techniques, Hierarchical Agglomerative Clustering (HAC) and Markov Clustering (MCL), in grouping-aware data placement for data-intensive applications having interest locality. Initially, the user access pattern is identified by dynamically analyzing the history log. Then both clustering techniques (HAC and MCL) are applied separately to the access pattern to obtain independent clusters. These clusters are interpreted and validated to extract the Optimal Data Groupings (ODG). Finally, the proposed strategy reorganizes the default data layouts in HDFS based on the ODG to achieve maximum parallel execution per group, subject to the Load Balancer and Rack Awareness.
The proposed strategy is tested on a 10-node cluster placed in a multi-rack configuration, with Hadoop installed on every node, deployed on a cloud platform. It reduces query execution time, significantly improves data locality, and proves more efficient for processing massive datasets in a heterogeneous distributed environment. MCL also shows marginally better performance than HAC for queries exhibiting interest locality.
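
The pipeline described in this abstract (access log, clustering, grouping-aware placement) can be illustrated with a small sketch. It is not the authors' ODPS implementation. The log format, the function names, the average-linkage merge rule, the `min_similarity` threshold and the round-robin placement are all assumptions chosen for illustration, and the load balancer, rack awareness and the MCL variant are omitted.

```python
from collections import defaultdict
from itertools import combinations


def co_access_counts(history):
    """Count how often each pair of blocks is read by the same query."""
    counts = defaultdict(int)
    for blocks in history:
        for a, b in combinations(sorted(set(blocks)), 2):
            counts[(a, b)] += 1
    return counts


def average_linkage(c1, c2, counts):
    """Mean co-access count over all cross-cluster block pairs."""
    total = sum(counts.get((min(a, b), max(a, b)), 0) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def group_blocks(history, min_similarity=1.0):
    """Agglomerative clustering: merge the most co-accessed clusters until none is similar enough."""
    counts = co_access_counts(history)
    clusters = [[b] for b in sorted({b for query in history for b in query})]
    while len(clusters) > 1:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_linkage(clusters[ij[0]], clusters[ij[1]], counts),
        )
        if average_linkage(clusters[i], clusters[j], counts) < min_similarity:
            break
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


def place_groups(groups, nodes):
    """Spread each group's blocks round-robin over nodes so a group can be read in parallel."""
    placement = {}
    for group in groups:
        for k, block in enumerate(group):
            placement[block] = nodes[k % len(nodes)]
    return placement


if __name__ == "__main__":
    # Hypothetical history log: each entry lists the blocks one query touched.
    log = [["b1", "b2"], ["b1", "b2"], ["b3", "b4"], ["b3", "b4"]]
    groups = group_blocks(log)
    print(groups)
    print(place_groups(groups, ["node1", "node2"]))
```

Spreading the blocks of one group across different nodes reflects the stated goal of maximum parallel execution per group: a query that touches the whole group can then run its map tasks locally on several nodes at once.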

DRAW: A New Data-gRouping-AWare Data Placement Scheme for Data Intensive Applications with Interest Locality

International Journal of Recent Trends in Engineering and Research ◽

10.23883/ijrter.2018.4333.w8uli ◽

2018 ◽

Vol 4 (6) ◽

pp. 172-181

Keyword(s):

Data Placement ◽

Data Intensive ◽

Data Grouping ◽

Data Intensive Applications

Proceedings of the 2011 workshop on Dynamic distributed data-intensive applications, programming abstractions, and systems - 3DAPAS '11

10.1145/1996010 ◽

2011 ◽

Keyword(s):

Distributed Data ◽

Data Intensive ◽

Programming Abstractions ◽

Data Intensive Applications

A two-phase virtual machine placement policy for data-intensive applications in cloud

Journal of Network and Computer Applications ◽

10.1016/j.jnca.2021.103025 ◽

2021 ◽

Vol 180 ◽

pp. 103025

Author(s):

Samaneh Sadegh ◽

Kamran Zamanifar ◽

Piotr Kasprzak ◽

Ramin Yahyapour

Keyword(s):

Virtual Machine ◽

Virtual Machine Placement ◽

Two Phase ◽

Data Intensive ◽

Placement Policy ◽

Data Intensive Applications

Location-aware associated data placement for geo-distributed data-intensive applications

Genetic Based Data Placement for Geo-Distributed Data-Intensive Applications in Cloud Computing

Understanding performance of distributed data-intensive applications
