Genetic Based Data Placement for Geo-Distributed Data-Intensive Applications in Cloud Computing

Grids, clouds and cloud-like infrastructures are capable of supporting a broad range of data-intensive applications. Interesting and unique performance issues appear as the volume of data and the degree of distribution increase. New scalable data-placement and management techniques, as well as novel approaches to determine the relative placement of data and computational workload, are required. We develop and study a genome sequence matching application that is simple to control and deploy, yet serves as a prototype of a data-intensive application. The application uses a SAGA-based implementation of the All-Pairs pattern. This paper aims to understand some of the factors that influence the performance of this application and the interplay of those factors. We also demonstrate how the SAGA approach can enable data-intensive applications to be extensible and interoperable over a range of infrastructures. This capability enables us to compare and contrast two different approaches for executing distributed data-intensive applications: simple application-level data-placement heuristics versus distributed file systems.
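The All-Pairs pattern mentioned in this abstract applies a comparison function to every pairing of elements from two sets. The following is a minimal single-process sketch of that idea, not the SAGA-based implementation the abstract refers to. The names `match_score` and `all_pairs` and the position-by-position scoring rule are assumptions made purely for illustration.

```python
def match_score(a: str, b: str) -> float:
    """Hypothetical metric: fraction of aligned positions where a and b agree."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n


def all_pairs(set_a, set_b, compare):
    """Apply compare to every (a, b) pair; return a len(set_a) x len(set_b) matrix."""
    return [[compare(a, b) for b in set_b] for a in set_a]


if __name__ == "__main__":
    reads = ["ACGT", "AAAA"]
    references = ["ACGT", "ACGA"]
    for row in all_pairs(reads, references, match_score):
        print(row)
```

Because every cell of the result matrix is independent, the work can be split by rows or blocks across machines, which is why the placement of the two input sets relative to the compute resources matters for performance.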


A Network Performance Based Data Placement Policy in Distributed Data-Intensive Applications

2014 IEEE International Conference on Computer and Information Technology ◽

10.1109/cit.2014.60 ◽

2014 ◽

Cited By ~ 1

Author(s):

Dawei Xu ◽

Xianglin Miao ◽

Peng Hu ◽

Zhongzhi Luan

Keyword(s):

Network Performance ◽

Data Placement ◽

Distributed Data ◽

Data Intensive ◽

Placement Policy ◽

Data Intensive Applications


Probabilistic State Estimation Based Scheduling Approach for Cloud Computing

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.55-57.1053 ◽

2011 ◽

Vol 55-57 ◽

pp. 1053-1057

Author(s):

Gui De Zheng ◽

Ming Chen

Keyword(s):

Cloud Computing ◽

Grid Computing ◽

State Estimation ◽

Distributed Data ◽

Scientific Instruments ◽

Next Generation ◽

Data Intensive ◽

Application Model ◽

Scientific Experiments ◽

The World

The next generation of scientific experiments and studies is being carried out by large collaborations of researchers distributed around the world, engaged in the analysis of huge collections of data generated by scientific instruments. Grid computing has emerged as an enabler for such collaborations, as it aids communities in sharing resources to achieve common objectives. This paper defines the problem of scheduling distributed data-intensive applications onto Grid resources and presents a formal resource and application model for the problem.


PrEstoCloud

Information Resources Management Journal ◽

10.4018/irmj.2021010104 ◽

2021 ◽

Vol 34 (1) ◽

pp. 66-85

Author(s):

Yiannis Verginadis ◽

Dimitris Apostolou ◽

Salman Taherizadeh ◽

Ioannis Ledakis ◽

Gregoris Mentzas ◽

...

Keyword(s):

Cloud Computing ◽

Software Engineering ◽

Response Time ◽

Fog Computing ◽

Data Sources ◽

Data Intensive ◽

Service Response Time ◽

Enabling Services ◽

Data Intensive Applications ◽

Multi Cloud

Fog computing extends multi-cloud computing by enabling services or application functions to be hosted close to their data sources. To take advantage of the capabilities of fog computing, the serverless and function-as-a-service (FaaS) software engineering paradigms allow for the flexible deployment of applications on multi-cloud, fog, and edge resources. This article reviews prominent fog computing frameworks and discusses some of the challenges and requirements of FaaS-enabled applications. Moreover, it proposes a novel framework that can dynamically manage multi-cloud, fog, and edge resources and deploy data-intensive applications developed using the FaaS paradigm. The proposed framework leverages the FaaS paradigm in a way that improves the average service response time of data-intensive applications by a factor of three, regardless of the underlying multi-cloud, fog, and edge resource infrastructure.


NoSQL Databases

Advances in Data Mining and Database Management - Handbook of Research on Cloud Infrastructures for Big Data Analytics ◽

10.4018/978-1-4666-5864-6.ch008 ◽

2014 ◽

pp. 186-215 ◽

Cited By ~ 2

Author(s):

Ganesh Chandra Deka

Keyword(s):

Cloud Computing ◽

Big Data ◽

Data Processing ◽

Open Source ◽

Data Storage ◽

Big Data Processing ◽

Nosql Databases ◽

Data Intensive ◽

Huge Data ◽

Data Intensive Applications

NoSQL databases are designed to meet the huge data storage requirements of cloud computing and big data processing. NoSQL databases offer many advanced features in addition to conventional RDBMS features. Hence, “NoSQL” databases are popularly known as “Not only SQL” databases. A variety of NoSQL databases, with different features for handling exponentially growing data-intensive applications, are available as open source and proprietary options. This chapter discusses some of the popular NoSQL databases and their features in light of the CAP theorem.


Optimizing VM allocation and data placement for data-intensive applications in cloud using ACO metaheuristic algorithm

Engineering Science and Technology an International Journal ◽

10.1016/j.jestch.2016.11.006 ◽

2017 ◽

Vol 20 (2) ◽

pp. 616-628 ◽

Cited By ~ 25

Author(s):

T.P. Shabeera ◽

S.D. Madhu Kumar ◽

Sameera M. Salam ◽

K. Murali Krishnan

Keyword(s):

Data Placement ◽

Metaheuristic Algorithm ◽

Data Intensive ◽

Vm Allocation ◽

Data Intensive Applications


Grouping-Aware Data Placement in HDFS for Data-Intensive Applications Based on Graph Clustering

Advances in Computer and Computational Sciences - Advances in Intelligent Systems and Computing ◽

10.1007/978-981-10-3773-3_3 ◽

2017 ◽

pp. 21-31 ◽

Cited By ~ 2

Author(s):

S. Vengadeswaran ◽

S. R. Balasundaram

Keyword(s):

Graph Clustering ◽

Data Placement ◽

Data Intensive ◽

Data Intensive Applications


On the Benefits of Multipath Routing for Distributed Data-Intensive Applications with High Bandwidth Requirements and Multidomain Reach

2009 Seventh Annual Communication Networks and Services Research Conference ◽

10.1109/cnsr.2009.26 ◽

2009 ◽

Cited By ~ 11

Author(s):

Xiaomin Chen ◽

Mohit Chamania ◽

Admela Jukan ◽

André C. Drummond ◽

Nelson L. S. da Fonseca

Keyword(s):

Multipath Routing ◽

Distributed Data ◽

Data Intensive ◽

High Bandwidth ◽

Data Intensive Applications


A High-Speed Railway Data Placement Strategy Based on Cloud Computing

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.135-136.43 ◽

2011 ◽

Vol 135-136 ◽

pp. 43-49

Author(s):

Han Ning Wang ◽

Wei Xiang Xu ◽

Chao Long Jia

Keyword(s):

Cloud Computing ◽

High Speed ◽

Data Access ◽

Interval Mapping ◽

Data Placement ◽

Programming Algorithm ◽

High Speed Railway ◽

Mapping Algorithm ◽

Data Intensive ◽

Study Results

The application of high-speed railway data, an important component of China's transportation science data sharing, embodies the typical characteristics of data-intensive computing. A reasonable and effective data placement strategy is needed to deploy and execute data-intensive applications in the cloud computing environment. This paper analyzes and compares the results of current data placement approaches. Combining the semi-definite programming algorithm with the dynamic interval mapping algorithm, it proposes a hierarchical data placement strategy. The semi-definite programming algorithm is suited to placing files that have multiple replicas, ensuring that different replicas of a file are placed on different storage devices. The dynamic interval mapping algorithm, in turn, gives the data storage system better self-adaptability. Theoretical analysis and experiments both demonstrate that a hierarchical data placement strategy can guarantee self-adaptability, data reliability and high-speed data access for large-scale networks.
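Two requirements stated in this abstract, that replicas of one file land on different storage devices and that files are mapped to devices through intervals, can be illustrated with a short sketch. This is not the paper's semi-definite programming or dynamic interval mapping algorithm; it is a simplified stand-in that assumes each device owns a slice of [0, 1) proportional to its capacity and that a file is hashed to points in that range. All names (`build_intervals`, `point`, `place_replicas`) and the capacity figures are hypothetical.

```python
import bisect
import hashlib


def build_intervals(capacities):
    """Return device names and cumulative interval ends on [0, 1), sized by capacity."""
    total = sum(capacities.values())
    names, ends, acc = [], [], 0.0
    for name, cap in capacities.items():
        acc += cap / total
        names.append(name)
        ends.append(acc)
    return names, ends


def point(key: str) -> float:
    """Hash a key deterministically to a point in [0, 1)."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:6], "big") / 2**48


def place_replicas(file_id, replicas, capacities):
    """Pick `replicas` distinct devices for file_id by repeated hashing (assumed scheme)."""
    if any(cap <= 0 for cap in capacities.values()):
        raise ValueError("every device needs a positive capacity")
    names, ends = build_intervals(capacities)
    if replicas > len(names):
        raise ValueError("more replicas than devices")
    chosen, attempt = [], 0
    while len(chosen) < replicas:
        p = point(f"{file_id}:{attempt}")
        # Clamp the index in case rounding leaves the last interval end just below 1.0.
        idx = min(bisect.bisect_right(ends, p), len(names) - 1)
        if names[idx] not in chosen:  # skip devices that already hold a replica
            chosen.append(names[idx])
        attempt += 1
    return chosen


if __name__ == "__main__":
    devices = {"disk-a": 4, "disk-b": 2, "disk-c": 1, "disk-d": 1}
    print(place_replicas("train-log-001", 3, devices))
```

In this sketch, larger devices own wider intervals and so receive proportionally more files, while the retry loop enforces the one-replica-per-device rule. Adapting to added or removed devices, which the abstract attributes to the dynamic part of the interval mapping, is outside what this sketch covers.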
