Retraction: Parallel Algorithm for Reduction of Data Processing Time in Big Data (Journal of Physics: Conference Series 1432 012095)

2020 ◽  
Vol 1432 ◽  
pp. 012110
Author(s):  
Jesús Silva ◽  
Hugo Hernández Palma ◽  
William Niebles Núñez ◽  
David Ovallos-Gazabon ◽  
Noel Varela

2018 ◽  
Vol 8 (11) ◽  
pp. 2216
Author(s):  
Jiahui Jin ◽  
Qi An ◽  
Wei Zhou ◽  
Jiakai Tang ◽  
Runqun Xiong

Network bandwidth is a scarce resource in big data environments, so data locality is a fundamental problem for data-parallel frameworks such as Hadoop and Spark. The problem is exacerbated in clusters of multicore servers, where multiple tasks running on the same server compete for that server’s network bandwidth. Existing approaches address it by scheduling computational tasks near their input data, taking into account each server’s free time, data placements, and data transfer costs. However, such approaches usually assign identical values to data transfer costs, even though a multicore server’s data transfer cost grows with the number of data-remote tasks it hosts. As a result, they minimize data-processing time ineffectively. As a solution, we propose DynDL (Dynamic Data Locality), a novel data-locality-aware task-scheduling model that handles dynamic data transfer costs for multicore servers. DynDL offers greater flexibility than existing approaches by using a set of non-decreasing functions to evaluate dynamic data transfer costs. We also propose online and offline algorithms (based on DynDL) that minimize data-processing time and adaptively adjust data locality. Although the scheduling problem underlying DynDL is NP-complete (nondeterministic polynomial-complete), we prove that the offline algorithm runs in quadratic time and generates optimal results for DynDL’s specific uses. Using a series of simulations and real-world executions, we show that our algorithms outperform algorithms that ignore dynamic data transfer costs by 30% in terms of data-processing time. Moreover, they can adaptively adjust data localities based on the server’s free time, data placement, and network bandwidth, and can schedule tens of thousands of tasks within subseconds or seconds.
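The core idea above can be illustrated with a minimal greedy sketch (not the authors' DynDL implementation; all names and the cost function are hypothetical): each task is placed on the server with the lowest marginal cost, where a data-remote task pays a transfer cost given by a non-decreasing function of how many remote tasks that server already hosts.

```python
# Illustrative sketch of data-locality-aware scheduling with *dynamic*
# transfer costs: a server's per-task transfer cost is a non-decreasing
# function of the number of data-remote tasks already assigned to it.

def schedule(tasks, servers, has_local_data, transfer_cost):
    """Greedily assign each task to the server with the lowest marginal cost.

    tasks          -- list of task ids
    servers        -- dict: server id -> free time (lower = available sooner)
    has_local_data -- set of (task, server) pairs whose input data is local
    transfer_cost  -- non-decreasing function of a server's remote-task count
    """
    remote_count = {s: 0 for s in servers}
    assignment = {}
    for t in tasks:
        best, best_cost = None, float("inf")
        for s, free_time in servers.items():
            if (t, s) in has_local_data:
                cost = free_time  # local read: no transfer cost
            else:
                cost = free_time + transfer_cost(remote_count[s] + 1)
            if cost < best_cost:
                best, best_cost = s, cost
        assignment[t] = best
        if (t, best) not in has_local_data:
            remote_count[best] += 1  # this server now pays a higher rate
    return assignment
```

With a constant `transfer_cost` this degenerates into the classical fixed-cost schedulers the abstract criticizes; passing a growing function (e.g. `lambda n: 2 * n`) is what models the bandwidth contention on a multicore server.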


2019 ◽  
Vol 16 (8) ◽  
pp. 3211-3215 ◽  
Author(s):  
S. Prince Mary ◽  
D. Usha Nandini ◽  
B. Ankayarkanni ◽  
R. Sathyabama Krishna

Integrating cloud computing with big data is a difficult and challenging task, and determining the number of resources needed to complete a job is equally hard. Virtualization is therefore implemented; the processing involves three phases: the map phase, the shuffle phase, and the reduce phase. Many researchers have already applied heterogeneous MapReduce applications and used the least-work-left policy in distributed server systems. In this paper, we discuss how virtualization is used for Hadoop jobs to process data effectively, how the processing time of a job is determined, and how a balanced-partition algorithm is applied. The main objective is to implement virtualization on our local machines.
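The least-work-left policy mentioned above can be sketched in a few lines (an illustrative toy, not the paper's implementation): each incoming job is dispatched to whichever server currently has the least outstanding work.

```python
import heapq

# Illustrative sketch of the least-work-left dispatch policy: send each
# job to the server with the smallest amount of unfinished work.

def dispatch(job_sizes, n_servers):
    # Min-heap of (outstanding_work, server_id); ties break on server id.
    heap = [(0, s) for s in range(n_servers)]
    heapq.heapify(heap)
    assignment = []
    for size in job_sizes:
        work, s = heapq.heappop(heap)   # least-loaded server
        assignment.append(s)
        heapq.heappush(heap, (work + size, s))
    return assignment
```

For example, `dispatch([5, 3, 2, 4], 2)` sends the first job to server 0, then routes the next two smaller jobs to server 1 until its backlog catches up, keeping the total work per server balanced.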


10.29007/db8n ◽  
2019 ◽  
Author(s):  
Mohammad Hossain ◽  
Maninder Singh ◽  
Sameer Abufardeh

Time is a critical factor in processing a very large volume of data, a.k.a. ‘Big Data’. Many existing data mining algorithms (supervised and unsupervised) become futile because of the ubiquitous use of horizontal processing, i.e., row-by-row processing of stored data. Processing time for big data is further exacerbated by its high dimensionality (number of features) and high cardinality (number of records). To address this processing-time issue, we propose a vertical approach with predicate trees (pTrees). Our approach structures data into columns of bit slices, ranging from a few to hundreds, which are processed vertically, i.e., column by column. We tested and compared our vertical approach to the traditional (horizontal) approach using three basic Boolean operations, namely addition, subtraction, and multiplication, on 10 data sizes ranging from half a billion bits to 5 billion bits. The results are analyzed with respect to processing time and speed gain for both approaches. They show that our vertical approach outperformed the traditional approach for all Boolean operations (add, subtract, and multiply) across all data sizes, achieving speed gains between 24% and 96%. We conclude that our approach, being in a data-mining-ready format, is best suited for operations involving complex computations in big data applications, where it achieves significant speed gains.
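The vertical, bit-sliced representation can be sketched as follows (a minimal illustration in the spirit of pTrees, not the authors' code): bit i of every record is packed into one "slice", and an aggregate such as a column sum is then computed slice by slice instead of row by row.

```python
# Illustrative sketch of vertical (bit-sliced) processing: a column of
# integers is stored as bit slices, and the column sum is computed
# column-wise from slice population counts.

def to_bit_slices(values, width):
    """Slice i packs bit i of every value into a single integer."""
    slices = []
    for i in range(width):
        packed = 0
        for row, v in enumerate(values):
            packed |= ((v >> i) & 1) << row
        slices.append(packed)
    return slices

def vertical_sum(slices):
    # Each slice contributes (count of set bits) * (weight of that bit
    # position), so one popcount per slice replaces a pass over all rows.
    return sum((1 << i) * bin(s).count("1") for i, s in enumerate(slices))
```

For instance, the column [3, 5, 2] with 3-bit width becomes three slices, and `vertical_sum` recovers the total 10 with one population count per slice; the row count only affects the width of each slice, not the number of passes.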


2019 ◽  
Vol 12 (1) ◽  
pp. 42 ◽  
Author(s):  
Andrey I. Vlasov ◽  
Konstantin A. Muraviev ◽  
Alexandra A. Prudius ◽  
Demid A. Uzenkov
