Particle Swarm Approach to Scheduling Work-Flow Applications in Distributed Data-Intensive Computing Environments

In distributed data-intensive computing environments, assigning tasks to specific machines in a secure way is a major challenge for the job scheduling problem. The complexity of this problem increases with the size of the job, and it is hard to solve effectively. Several metaheuristic algorithms, including particle swarm optimization (PSO) and variable neighborhood particle swarm optimization (VNPSO), have been used to solve the job scheduling problem in distributed computing. To satisfy the security constraints and minimize the cost function while assigning tasks to machines, we propose a modified PSO with a scout adjustment (MPSO-SA) algorithm, which uses a cyclic term called the mutation operator to obtain the best cost function value. The performance of the proposed MPSO-SA scheduling mechanism is compared with the genetic algorithm (GA), PSO and VNPSO techniques, and the experimental results show that the proposed method reduces the probability of risk under security constraints and has better convergence properties than the existing methods.
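
The general idea of PSO-based scheduling with a periodic mutation step can be illustrated with a minimal sketch. It is not the MPSO-SA algorithm from the paper: the cost function (makespan plus a weighted per-machine risk score), the parameter values, and the names `schedule_cost` and `pso_schedule` are assumptions made for illustration, and the mutation step is only one possible reading of the abstract's cyclic operator. The sketch assumes at least one task and one machine.

```python
import random

def schedule_cost(assignment, task_len, speed, risk, risk_weight=1.0):
    """Hypothetical cost: makespan plus a weighted sum of per-machine risk scores."""
    load = [0.0] * len(speed)
    for task, machine in enumerate(assignment):
        load[machine] += task_len[task] / speed[machine]
    return max(load) + risk_weight * sum(risk[m] for m in assignment)

def pso_schedule(task_len, speed, risk, particles=20, iters=100,
                 mutate_every=10, seed=0):
    """Standard PSO over continuous positions, decoded to machine indices."""
    rng = random.Random(seed)
    n_tasks, n_mach = len(task_len), len(speed)

    def decode(pos):
        # Truncate each coordinate to a valid machine index.
        return [min(n_mach - 1, max(0, int(x))) for x in pos]

    def cost(pos):
        return schedule_cost(decode(pos), task_len, speed, risk)

    xs = [[rng.uniform(0, n_mach) for _ in range(n_tasks)]
          for _ in range(particles)]
    vs = [[0.0] * n_tasks for _ in range(particles)]
    pbest = [x[:] for x in xs]
    pbest_cost = [cost(x) for x in xs]
    g = min(range(particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]

    for it in range(1, iters + 1):
        for i in range(particles):
            for d in range(n_tasks):
                r1, r2 = rng.random(), rng.random()
                # Inertia + cognitive + social terms (assumed coefficients).
                vs[i][d] = (0.7 * vs[i][d]
                            + 1.5 * r1 * (pbest[i][d] - xs[i][d])
                            + 1.5 * r2 * (gbest[d] - xs[i][d]))
                xs[i][d] = min(n_mach - 1e-9, max(0.0, xs[i][d] + vs[i][d]))
            if it % mutate_every == 0:
                # Assumed periodic mutation: re-randomize one task's machine.
                xs[i][rng.randrange(n_tasks)] = rng.uniform(0, n_mach)
            c = cost(xs[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = xs[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = xs[i][:], c
    return decode(gbest), gbest_cost
```

Calling `pso_schedule([4.0, 2.0, 6.0, 3.0], [1.0, 2.0], [0.1, 0.3])` returns a task-to-machine assignment and its cost under this illustrative model. The fixed seed makes runs repeatable.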

The bounds of the distributed data-intensive computing systems

Pollack Periodica ◽ 10.1556/pollack.2.2007.s.8 ◽ 2007 ◽ Vol 2 (Supplement 1) ◽ pp. 85-96 ◽ Cited By ~ 1

Author(s): Antal Buza

Keyword(s): Distributed Data ◽ Data Intensive Computing ◽ Computing Systems ◽ Data Intensive

Performance Evaluations of Distributed File Systems for Scientific Big Data in FUSE Environment

Electronics ◽ 10.3390/electronics10121471 ◽ 2021 ◽ Vol 10 (12) ◽ pp. 1471

Author(s): Jun-Yeong Lee ◽ Moon-Hyun Kim ◽ Syed Asif Raza Shah ◽ Sang-Un Ahn ◽ Heejun Yoon ◽ ...

Keyword(s): Data Storage ◽ Scale Up ◽ File Systems ◽ Performance Evaluations ◽ Distributed File Systems ◽ Data Intensive Computing ◽ Data Intensive ◽ Tremendous Amount ◽ Computing Environments ◽ And Performance

Data are important and ever growing in data-intensive scientific environments. Such research data growth requires data storage systems that play pivotal roles in data management and analysis for scientific discoveries. Redundant Array of Independent Disks (RAID), a well-known storage technology combining multiple disks into a single large logical volume, has been widely used for the purpose of data redundancy and performance improvement. However, this requires RAID-capable hardware or software to build up a RAID-enabled disk array. In addition, it is difficult to scale up the RAID-based storage. In order to mitigate such a problem, many distributed file systems have been developed and are being actively used in various environments, especially in data-intensive computing facilities, where a tremendous amount of data have to be handled. In this study, we investigated and benchmarked various distributed file systems, such as Ceph, GlusterFS, Lustre and EOS for data-intensive environments. In our experiment, we configured the distributed file systems under a Reliable Array of Independent Nodes (RAIN) structure and a Filesystem in Userspace (FUSE) environment. Our results identify the characteristics of each file system that affect the read and write performance depending on the features of data, which have to be considered in data-intensive computing environments.

A Comprehensive Survey on Data-Intensive Computing and MapReduce Paradigm in Cloud Computing Environments

Informatics and Communication Technologies for Societal Development ◽ 10.1007/978-81-322-1916-3_9 ◽ 2014 ◽ pp. 85-93

Author(s): Girish Neelakanta Iyer ◽ Salaja Silas

Keyword(s): Cloud Computing ◽ Data Intensive Computing ◽ Data Intensive ◽ Comprehensive Survey ◽ Computing Environments ◽ Mapreduce Paradigm

G-Hadoop: MapReduce across distributed data centers for data-intensive computing

Future Generation Computer Systems ◽ 10.1016/j.future.2012.09.001 ◽ 2013 ◽ Vol 29 (3) ◽ pp. 739-750 ◽ Cited By ~ 227

Author(s): Lizhe Wang ◽ Jie Tao ◽ Rajiv Ranjan ◽ Holger Marten ◽ Achim Streit ◽ ...

Keyword(s): Data Centers ◽ Distributed Data ◽ Data Intensive Computing ◽ Data Intensive ◽ Hadoop Mapreduce

A New Data Classification Algorithm for Data-Intensive Computing Environments

Advanced Materials Research ◽ 10.4028/www.scientific.net/amr.756-759.3318 ◽ 2013 ◽ Vol 756-759 ◽ pp. 3318-3323

Author(s): Qi Zhi Deng ◽ Long Bo Zhang ◽ Xin Qian ◽ Ya Li Chen ◽ Feng Ying Wang

Keyword(s): Data Mining ◽ Large Datasets ◽ Data Availability ◽ Learning Method ◽ Data Intensive Computing ◽ Data Intensive ◽ Distributed Computations ◽ Split Point ◽ Computing Environments ◽ Mapreduce Model

To improve the scalability of data processing and the data availability that data mining techniques face in data-intensive computing, this paper presents a new tree learning method. By introducing MapReduce, the SPRINT-based tree learning method achieves good scalability when handling large datasets. Moreover, we define the split point process as a series of distributed computations, each of which is implemented with the MapReduce model. A new data structure called the class distribution table is introduced to assist the calculation of the histogram. Experiments and analysis of the results show that the algorithm has strong data mining capabilities for data-intensive computing environments.
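
The idea of computing a split point from per-partition counts can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's implementation: the table layout (a counter keyed by attribute value and class label), the in-process `map`/`reduce` standing in for a MapReduce job, and the names `map_partition`, `reduce_counts` and `best_split` are all hypothetical. The Gini index is used as the split criterion because that is what SPRINT uses.

```python
from collections import Counter
from functools import reduce

def map_partition(rows):
    """Map step: count (attribute value, class label) pairs in one partition."""
    return Counter((value, label) for value, label in rows)

def reduce_counts(a, b):
    """Reduce step: merge two partial class distribution tables."""
    return a + b

def gini(counts):
    """Gini impurity of a class-count table."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in counts.values())

def best_split(partitions):
    """Return (threshold, weighted Gini) for one numeric attribute."""
    table = reduce(reduce_counts, map(map_partition, partitions), Counter())

    # Regroup the merged table as value -> class counts.
    by_value = {}
    for (value, label), n in table.items():
        by_value.setdefault(value, Counter())[label] += n

    values = sorted(by_value)
    left, right = Counter(), Counter()
    for counts in by_value.values():
        right.update(counts)
    total = sum(right.values())

    best_score, best_threshold = float("inf"), None
    for lo, hi in zip(values, values[1:]):
        # Move all records with value `lo` from the right side to the left.
        left.update(by_value[lo])
        right.subtract(by_value[lo])
        n_left = sum(left.values())
        score = (n_left * gini(left) + (total - n_left) * gini(right)) / total
        if score < best_score:
            best_score, best_threshold = score, (lo + hi) / 2
    return best_threshold, best_score
```

Each partition only contributes a small count table, so the merge step does not need the raw records. On a real cluster the two step functions would run as map and reduce tasks rather than in one process.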
