A Comprehensive Survey on Data-Intensive Computing and MapReduce Paradigm in Cloud Computing Environments

Performance Evaluations of Distributed File Systems for Scientific Big Data in FUSE Environment

Electronics ◽

10.3390/electronics10121471 ◽

2021 ◽

Vol 10 (12) ◽

pp. 1471

Author(s):

Jun-Yeong Lee ◽

Moon-Hyun Kim ◽

Syed Asif Raza Shah ◽

Sang-Un Ahn ◽

Heejun Yoon ◽

...

Keyword(s):

Data Storage ◽

Scale Up ◽

File Systems ◽

Performance Evaluations ◽

Distributed File Systems ◽

Data Intensive Computing ◽

Data Intensive ◽

Tremendous Amount ◽

Computing Environments ◽

And Performance

Data are important and ever growing in data-intensive scientific environments. Such research data growth requires data storage systems that play pivotal roles in data management and analysis for scientific discoveries. Redundant Array of Independent Disks (RAID), a well-known storage technology combining multiple disks into a single large logical volume, has been widely used for the purpose of data redundancy and performance improvement. However, this requires RAID-capable hardware or software to build up a RAID-enabled disk array. In addition, it is difficult to scale up the RAID-based storage. In order to mitigate such a problem, many distributed file systems have been developed and are being actively used in various environments, especially in data-intensive computing facilities, where a tremendous amount of data have to be handled. In this study, we investigated and benchmarked various distributed file systems, such as Ceph, GlusterFS, Lustre and EOS for data-intensive environments. In our experiment, we configured the distributed file systems under a Reliable Array of Independent Nodes (RAIN) structure and a Filesystem in Userspace (FUSE) environment. Our results identify the characteristics of each file system that affect the read and write performance depending on the features of data, which have to be considered in data-intensive computing environments.
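
The kind of read/write measurement this abstract describes can be sketched in a few lines. The following is only a minimal illustration, not the benchmark used in the paper: the function name `measure_throughput`, the file size, and the block size are all assumptions chosen for the example, and the target directory stands in for a mount point of a distributed file system (for instance a FUSE mount).

```python
import os
import tempfile
import time


def measure_throughput(directory, size_mb=8, block_kb=1024):
    """Write then read one temporary file in `directory`; return MB/s figures.

    Hypothetical helper for illustration only. `directory` would be the
    mount point under test; here any writable directory works.
    """
    block = os.urandom(block_kb * 1024)
    blocks = (size_mb * 1024) // block_kb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        # Sequential write, forced to storage with fsync before timing stops.
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())
        write_s = max(time.perf_counter() - start, 1e-9)

        # Sequential read. Note: this may be served from the page cache,
        # so a real benchmark would drop caches or use a larger data set.
        start = time.perf_counter()
        with open(path, "rb") as f:
            while f.read(block_kb * 1024):
                pass
        read_s = max(time.perf_counter() - start, 1e-9)
    finally:
        os.remove(path)
    return {"write_mb_s": size_mb / write_s, "read_mb_s": size_mb / read_s}


# Example run against the local temporary directory.
result = measure_throughput(tempfile.gettempdir())
print(result)
```

Repeating the call with different `size_mb` and `block_kb` values is one simple way to see how throughput depends on the features of the data, which is the dependence the abstract refers to.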

A New Data Classification Algorithm for Data-Intensive Computing Environments

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.756-759.3318 ◽

2013 ◽

Vol 756-759 ◽

pp. 3318-3323

Author(s):

Qi Zhi Deng ◽

Long Bo Zhang ◽

Xin Qian ◽

Ya Li Chen ◽

Feng Ying Wang

Keyword(s):

Data Mining ◽

Large Datasets ◽

Data Availability ◽

Learning Method ◽

Data Intensive Computing ◽

Data Intensive ◽

Distributed Computations ◽

Split Point ◽

Computing Environments ◽

Mapreduce Model

To address the scalability of data processing and the data availability problems that data mining techniques encounter in data-intensive computing, this paper presents a new tree learning method. By introducing MapReduce, the SPRINT-based tree learning method scales well when handling large datasets. Moreover, we define the split-point process as a series of distributed computations, each of which is implemented with the MapReduce model. A new data structure called the class distribution table is introduced to assist the calculation of histograms. Experiments and analysis of the results show that the algorithm has strong data mining capabilities for data-intensive computing environments.
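
The idea of evaluating split points as distributed computations over a class distribution table can be illustrated with a small sketch. This is not the paper's implementation: the function names, the in-memory "partitions", and the use of a weighted Gini index over `value <= threshold` splits are assumptions made for the example, and the map and reduce phases are simulated locally rather than run on a MapReduce cluster.

```python
from collections import Counter, defaultdict
from functools import reduce


def map_partition(records):
    """Map phase: build a partial class distribution table for one partition.

    The table maps each attribute value to a Counter of class labels.
    """
    table = defaultdict(Counter)
    for value, label in records:
        table[value][label] += 1
    return table


def merge_tables(left, right):
    """Reduce phase: merge two partial class distribution tables."""
    merged = defaultdict(Counter)
    for table in (left, right):
        for value, counts in table.items():
            merged[value].update(counts)
    return merged


def gini(counts):
    """Gini impurity of a class histogram."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


def best_split(table):
    """Scan values in sorted order and return (threshold, weighted Gini)
    for the best split of the form `value <= threshold`."""
    total = Counter()
    for counts in table.values():
        total.update(counts)
    n = sum(total.values())
    below = Counter()
    best = (None, float("inf"))
    for value in sorted(table)[:-1]:  # the largest value cannot split the data
        below.update(table[value])
        above = total - below
        n_below = sum(below.values())
        score = (n_below / n) * gini(below) + ((n - n_below) / n) * gini(above)
        if score < best[1]:
            best = (value, score)
    return best


# Two hypothetical partitions of (attribute value, class label) records.
partitions = [
    [(1.0, "a"), (2.0, "a"), (3.0, "b")],
    [(2.5, "a"), (4.0, "b"), (5.0, "b")],
]
table = reduce(merge_tables, map(map_partition, partitions))
threshold, score = best_split(table)
print(threshold, score)
```

Because the partial tables only hold per-value class counts, they can be merged in any order, which is what makes the histogram calculation suitable for a reduce step.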

Data classification algorithm for data-intensive computing environments

EURASIP Journal on Wireless Communications and Networking ◽

10.1186/s13638-017-1002-4 ◽

2017 ◽

Vol 2017 (1) ◽

Cited By ~ 1

Author(s):

Tiedong Chen ◽

Shifeng Liu ◽

Daqing Gong ◽

Honghu Gao

Keyword(s):

Data Classification ◽

Classification Algorithm ◽

Data Intensive Computing ◽

Data Intensive ◽

Computing Environments

An Inter-framework Cache for Diverse Data-Intensive Computing Environments

2015 IEEE International Conference on Smart City/SocialCom/SustainCom (SmartCity) ◽

10.1109/smartcity.2015.192 ◽

2015 ◽

Author(s):

Chun-Yu Wang ◽

Tzu-En Huang ◽

Yu-Tang Huang ◽

Jyh-Biau Chang ◽

Ce-Kuen Shieh

Keyword(s):

Data Intensive Computing ◽

Data Intensive ◽

Computing Environments ◽

Diverse Data

A Comprehensive Survey of Services Provided by Prevalent Cloud Computing Environments

Smart Intelligent Computing and Applications - Smart Innovation, Systems and Technologies ◽

10.1007/978-981-13-1921-1_41 ◽

2018 ◽

pp. 413-424 ◽

Cited By ~ 3

Author(s):

N. Joshi ◽

S. Shah

Keyword(s):

Cloud Computing ◽

Comprehensive Survey ◽

Computing Environments

Swarm scheduling approaches for work-flow applications with security constraints in distributed data-intensive computing environments

Information Sciences ◽

10.1016/j.ins.2011.12.032 ◽

2012 ◽

Vol 192 ◽

pp. 228-243 ◽

Cited By ~ 49

Author(s):

Hongbo Liu ◽

Ajith Abraham ◽

Václav Snášel ◽

Seán McLoone

Keyword(s):

Work Flow ◽

Distributed Data ◽

Data Intensive Computing ◽

Data Intensive ◽

Computing Environments

A New Data Classification Algorithm for Data-Intensive Computing Environments

Proceedings of the 2012 2nd International Conference on Computer and Information Applications (ICCIA 2012) ◽

10.2991/iccia.2012.335 ◽

2012 ◽

Author(s):

Qizhi Deng ◽

Longbo Zhang ◽

Xin Qian ◽

Yali Chen ◽

Fengying Wang

Keyword(s):

Data Classification ◽

Classification Algorithm ◽

Data Intensive Computing ◽

Data Intensive ◽

Computing Environments

Particle Swarm Approach to Scheduling Work-Flow Applications in Distributed Data-Intensive Computing Environments

Sixth International Conference on Intelligent Systems Design and Applications ◽

10.1109/isda.2006.253915 ◽

2006 ◽

Cited By ~ 6

Author(s):

Hongbo Liu ◽

Shichang Sun ◽

Ajith Abraham

Keyword(s):

Particle Swarm ◽

Work Flow ◽

Distributed Data ◽

Data Intensive Computing ◽

Data Intensive ◽

Computing Environments

Challenges and Cloud Computing Environments Towards Big Data

International Journal of Scientific Research in Science Engineering and Technology ◽

10.32628/ijsrset207277 ◽

2014 ◽

pp. 203-208

Author(s):

Kiran Kumar S V N Madupu

Keyword(s):

Data Mining ◽

Cloud Computing ◽

Big Data ◽

Technology Development ◽

Computing Environments ◽

Modern Technologies

Big Data has a tremendous influence on scientific discoveries and on value creation. This paper presents approaches in data mining and modern technologies for Big Data. The difficulties of data mining in general, and of data mining with big data in particular, are discussed. Some technological developments in data mining, including data mining with big data, are also presented.

The bounds of the distributed data-intensive computing systems

Pollack Periodica ◽

10.1556/pollack.2.2007.s.8 ◽

2007 ◽

Vol 2 (Supplement 1) ◽

pp. 85-96 ◽

Cited By ~ 1

Author(s):

Antal Buza

Keyword(s):

Distributed Data ◽

Data Intensive Computing ◽

Computing Systems ◽

Data Intensive
