“Saksham Model” Performance Improvisation Using Node Capability Evaluation in Apache Hadoop

2021 ◽

pp. 1282-1302

Author(s):

Ankit Shah ◽

Mamta C. Padole

Keyword(s):

Big Data ◽

Distributed Computing ◽

Data Processing ◽

Data Storage ◽

Model Performance ◽

Big Data Processing ◽

Apache Hadoop ◽

Processing Capability ◽

Proposed Model ◽

Capability Evaluation

Big Data processing and analysis requires tremendous processing capability. Distributed computing brings many commodity systems under the common platform to answer the need for Big Data processing and analysis. Apache Hadoop is the most suitable set of tools for Big Data storage, processing, and analysis. But Hadoop found to be inefficient when it comes to heterogeneous set computers which have different processing capabilities. In this research, we propose the Saksham model which optimizes the processing time by efficient use of node processing capability and file management. The proposed model shows the performance improvement for Big Data processing. To achieve better performance, Saksham model uses two vital aspects of heterogeneous distributed computing: Effective block rearrangement policy and use of node processing capability. The results demonstrate that the proposed model successfully achieves better job execution time and improves data locality.

Download Full-text

NoSQL Databases

Advances in Data Mining and Database Management - Handbook of Research on Cloud Infrastructures for Big Data Analytics ◽

10.4018/978-1-4666-5864-6.ch008 ◽

2014 ◽

pp. 186-215 ◽

Cited By ~ 2

Author(s):

Ganesh Chandra Deka

Keyword(s):

Cloud Computing ◽

Big Data ◽

Data Processing ◽

Open Source ◽

Data Storage ◽

Big Data Processing ◽

Nosql Databases ◽

Data Intensive ◽

Huge Data ◽

Data Intensive Applications

NoSQL databases are designed to meet the huge data storage requirements of cloud computing and big data processing. NoSQL databases have lots of advanced features in addition to the conventional RDBMS features. Hence, the “NoSQL” databases are popularly known as “Not only SQL” databases. A variety of NoSQL databases having different features to deal with exponentially growing data-intensive applications are available with open source and proprietary option. This chapter discusses some of the popular NoSQL databases and their features on the light of CAP theorem.

Download Full-text

Design of Big Data Processing System Architecture Based on Hadoop under the Cloud Computing

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.556-562.6302 ◽

2014 ◽

Vol 556-562 ◽

pp. 6302-6306 ◽

Cited By ~ 3

Author(s):

Chun Mei Duan

Keyword(s):

Cloud Computing ◽

Big Data ◽

Data Processing ◽

Data Storage ◽

System Architecture ◽

Processing System ◽

System Stability ◽

Security Model ◽

Data Processing System ◽

Big Data Processing

In allusion to limitations of traditional data processing technology in big data processing, big data processing system architecture based on hadoop is designed, using the characteristics of quantification, unstructured and dynamic of cloud computing.It uses HDFS be responsible for big data storage, and uses MapReduce be responsible for big data calculation and uses Hbase as unstructured data storage database, at the same time a system of storage and cloud computing security model are designed, in order to implement efficient storage, management, and retrieval of data,thus it can save construction cost, and guarantee system stability, reliability and security.

Download Full-text

Employing Vertical Elasticity for Efficient Big Data Processing in Container-Based Cloud Environments

Applied Sciences ◽

10.3390/app11136200 ◽

2021 ◽

Vol 11 (13) ◽

pp. 6200

Author(s):

Jin-young Choi ◽

Minkyoung Cho ◽

Jik-Soo Kim

Keyword(s):

Cloud Computing ◽

Big Data ◽

Data Processing ◽

Data Storage ◽

Resource Utilization ◽

System Throughput ◽

Big Data Processing ◽

Cloud Environments ◽

Utilization Scheme ◽

Adaptive Resource

Recently, “Big Data” platform technologies have become crucial for distributed processing of diverse unstructured or semi-structured data as the amount of data generated increases rapidly. In order to effectively manage these Big Data, Cloud Computing has been playing an important role by providing scalable data storage and computing resources for competitive and economical Big Data processing. Accordingly, server virtualization technologies that are the cornerstone of Cloud Computing have attracted a lot of research interests. However, conventional hypervisor-based virtualization can cause performance degradation problems due to its heavily loaded guest operating systems and rigid resource allocations. On the other hand, container-based virtualization technology can provide the same level of service faster with a lightweight capacity by effectively eliminating the guest OS layers. In addition, container-based virtualization enables efficient cloud resource management by dynamically adjusting the allocated computing resources (e.g., CPU and memory) during the runtime through “Vertical Elasticity”. In this paper, we present our practice and experience of employing an adaptive resource utilization scheme for Big Data workloads in container-based cloud environments by leveraging the vertical elasticity of Docker, a representative container-based virtualization technique. We perform extensive experiments running several Big Data workloads on representative Big Data platforms: Apache Hadoop and Spark. During the workload executions, our adaptive resource utilization scheme periodically monitors the resource usage patterns of running containers and dynamically adjusts allocated computing resources that could result in substantial improvements in the overall system throughput.

Download Full-text

Researching a Distributed Computing Automation Platform for Big Data Processing

2020 International Conference Engineering and Telecommunication (En&T) ◽

10.1109/ent50437.2020.9431254 ◽

2020 ◽

Author(s):

Nadezhda Bahareva ◽

Yury Ushakov ◽

Margarita Ushakova ◽

Denis Parfenov ◽

Leonid Legashev ◽

...

Keyword(s):

Big Data ◽

Distributed Computing ◽

Data Processing ◽

Big Data Processing

Download Full-text

BIG DATA PROCESSING: BIG CHALLENGES AND OPPORTUNITIES

Journal of Interconnection Networks ◽

10.1142/s0219265912500090 ◽

2012 ◽

Vol 13 (03n04) ◽

pp. 1250009 ◽

Cited By ~ 14

Author(s):

CHANGQING JI ◽

YU LI ◽

WENMING QIU ◽

YINGWEI JIN ◽

YUJIE XU ◽

...

Keyword(s):

Big Data ◽

Data Processing ◽

Data Management ◽

Data Storage ◽

Large Scale ◽

Distributed Applications ◽

Big Data Processing ◽

Cloud Data ◽

Management Platform ◽

Challenges And Opportunities

With the rapid growth of emerging applications like social network, semantic web, sensor networks and LBS (Location Based Service) applications, a variety of data to be processed continues to witness a quick increase. Effective management and processing of large-scale data poses an interesting but critical challenge. Recently, big data has attracted a lot of attention from academia, industry as well as government. This paper introduces several big data processing techniques from system and application aspects. First, from the view of cloud data management and big data processing mechanisms, we present the key issues of big data processing, including definition of big data, big data management platform, big data service models, distributed file system, data storage, data virtualization platform and distributed applications. Following the MapReduce parallel processing framework, we introduce some MapReduce optimization strategies reported in the literature. Finally, we discuss the open issues and challenges, and deeply explore the research directions in the future on big data processing in cloud computing environments.

Download Full-text

A Novel Wireless Sensor Network Architecture Based on Cloud Computing and Big Data

International Journal of Online Engineering (iJOE) ◽

10.3991/ijoe.v13i12.7890 ◽

2017 ◽

Vol 13 (12) ◽

pp. 18 ◽

Cited By ~ 2

Author(s):

Changtong Song

Keyword(s):

Big Data ◽

Wireless Sensor Network ◽

Data Processing ◽

Sensor Network ◽

Network Architecture ◽

Wireless Sensor ◽

Big Data Processing ◽

Optimal Network ◽

Proposed Model ◽

The Relationship

<span style="font-family: 'Times New Roman',serif; font-size: 10pt; mso-fareast-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-language: DE; mso-bidi-language: AR-SA;">To explore big data processing and its application in wireless sensor network (WSN), this paper studies structural construction of the WSN based on big data processing, and numerically simulates SVC4WSN and MDF4LWSN architectures. Moreover, the relationship between the optimal network layer and node communication radius was verified at different node densities. The results indicate that the proposed model achieved better lifecycle and loading balancing effect than the other network.</span>

Download Full-text

Optimized and Efficient Computation of Big Data in Heterogonous Internet of Things

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.a1801.109119 ◽

2019 ◽

Vol 9 (1) ◽

pp. 6005-6010

Keyword(s):

Big Data ◽

Internet Of Things ◽

Wireless Communication ◽

Distributed Computing ◽

Data Processing ◽

Phase 1 ◽

Efficient Computation ◽

Big Data Processing ◽

Memory Overhead ◽

Cpu Usage

The development of information technology, distributed computing, hardware, wireless communication and intelligent technology has been increased in Internet of Things (IoT) heterogonous filed to improve the limitations of cloud computing in big data processing. Computation of data over wireless communication based distributed computing face different challenges in term of off-loading decision, data delay in heterogonous IoT devices. Optimization of caching, data computation and load maintenance of different edge clouds is still a challenging task in heterogonous IOT for effective processing of big data. So that this paper presents Novel Optimized and Sorted Positional Index List (OSPIL) approach on big data processing to reduce and optimize delay, I/O cost, CPU usage and memory overhead significantly. In this approach sorted index is used to build attributes are (data consists different attributes) arranged in ascending order. This approach consists two phases in data processing, in Phase 1, scan the depth of all the sorted list and schedule the processing data. In phase 2, explore the sorted list and then give results in sequential order on hash table. Experimental results of proposed approach give better and significant data processing results to optimize delay, I/O cost, CPU usage and memory overhead on real world data sets relates wireless communication

Download Full-text

The role of clustering algorithm-based big data processing in information economy development

PLoS ONE ◽

10.1371/journal.pone.0246718 ◽

2021 ◽

Vol 16 (3) ◽

pp. e0246718

Author(s):

Hongyan Ma

Keyword(s):

Big Data ◽

Energy Consumption ◽

Power Systems ◽

Data Processing ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Economic Dispatch ◽

Big Data Processing ◽

Proposed Model ◽

New Energy

The purposes are to evaluate the Distributed Clustering Algorithm (DCA) applicability in the power system’s big data processing and find the information economic dispatch strategy suitable for new energy consumption in power systems. A two-layer DCA algorithm is proposed based on K-Means Clustering (KMC) and Affinity Propagation (AP) clustering algorithms. Then the incentive Demand Response (DR) is introduced, and the DR flexibility of the user side is analyzed. Finally, the day-ahead dispatch and real-time dispatch schemes are combined, and a multi-period information economic dispatch model is constructed. The algorithm performance is analyzed according to case analyses of new energy consumption. Results demonstrate that the two-layer DCA’s calculation time is 5.23s only, the number of iterations is small, and the classification accuracy rate reaches 0.991. Case 2 corresponding to the proposed model can consume the new energy, and the income of the aggregator can be maximized. In short, the multi-period information economic dispatch model can consume the new energy and meet the DR of the user side.

Download Full-text

A Critical Analysis of Apache Hadoop and Spark for Big Data Processing

10.1109/ispcc53510.2021.9609518 ◽

2021 ◽

Author(s):

Piyush Sewal ◽

Hari Singh

Keyword(s):

Big Data ◽

Data Processing ◽

Critical Analysis ◽

Big Data Processing ◽

Apache Hadoop

Download Full-text