ANN-inspired Straggler Map Reduce Detection in Big Data Processing

CONVERTER ◽  
2021 ◽  
pp. 116-127
Author(s):  
Ajay Bansal, Manmohan Sharma, Ashu Gupta

One of the most challenging aspects of using MapReduce to parallelize and distribute large-scale data processing is detecting straggler tasks, i.e., identifying tasks that are still running on weak nodes. The total computation time is the sum of the execution times of the two stages in the Map phase (copy, combine) and the three stages in the Reduce phase (shuffle, sort, and reduce). The main aim of this paper is to estimate the execution time accurately in each location. The proposed approach uses a backpropagation neural network on Hadoop to detect straggler tasks and calculate the remaining task execution time, which is crucial for straggler identification. A comparative analysis is performed against efficient models in this domain, such as LATE and ESAMR, and against the real remaining time for the WordCount and Sort benchmarks. It was found that the proposed model is capable of detecting straggler tasks by accurately estimating execution time, and it also helps reduce the time taken to complete a task.
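
For context, a minimal sketch of the LATE-style remaining-time heuristic that the paper uses as a comparison baseline is given below; the proposed method replaces this heuristic with a backpropagation neural network, which is not reproduced here. The function names, the straggler threshold of 1.25, and the numbers in the example are illustrative assumptions, not values from the paper.

# Baseline heuristic (LATE-style): extrapolate remaining time from the
# task's progress score and elapsed time, then flag tasks whose estimate
# is well above the average as straggler candidates.
def late_remaining_time(progress_score: float, elapsed_seconds: float) -> float:
    """Estimate the seconds left for a task from its progress score (0..1)."""
    if progress_score <= 0.0:
        return float("inf")            # no progress yet, nothing to extrapolate
    progress_rate = progress_score / elapsed_seconds
    return (1.0 - progress_score) / progress_rate

def is_straggler(remaining: float, all_remaining: list, threshold: float = 1.25) -> bool:
    """Flag a task whose estimated remaining time far exceeds the average."""
    average = sum(all_remaining) / len(all_remaining)
    return remaining > threshold * average

# Example: a reduce task that is 40% done after 120 s is estimated to need 180 s more.
print(late_remaining_time(0.4, 120.0))   # -> 180.0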

Author(s):  
Karthikeyani Visalakshi N. ◽  
Shanthi S. ◽  
Lakshmi K.

Cluster analysis is a prominent data mining technique in knowledge discovery that uncovers hidden patterns in data. K-Means, K-Modes and K-Prototypes are partition-based clustering algorithms that select their initial centroids randomly; because of this random selection, they tend to converge to locally optimal solutions. To address this issue, the Crow Search algorithm is combined with these algorithms to obtain globally optimal solutions. With the advances in information technology, data volumes have grown drastically from terabytes to petabytes. To make the proposed algorithms suitable for such voluminous data, they are implemented in parallel with the Hadoop MapReduce framework. The proposed algorithms are evaluated on large-scale data, and the results are compared in terms of cluster evaluation measures and computation time for varying numbers of nodes.
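
To make the parallel formulation concrete, the sketch below shows one K-Means iteration expressed as map and reduce steps in plain Python; it is an illustration under assumed data, not the authors' Hadoop implementation, and it omits the Crow Search seeding of the initial centroids that the paper adds on top of this scheme.

from collections import defaultdict
import math

def kmeans_map(point, centroids):
    """Map step: emit (index of nearest centroid, point)."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances)), point

def kmeans_reduce(assignments):
    """Reduce step: recompute each centroid as the mean of its assigned points."""
    groups = defaultdict(list)
    for idx, point in assignments:
        groups[idx].append(point)
    return {idx: tuple(sum(dim) / len(pts) for dim in zip(*pts))
            for idx, pts in groups.items()}

points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
centroids = [(1.0, 1.0), (9.0, 9.0)]        # randomly chosen seeds in the baseline
print(kmeans_reduce(kmeans_map(p, centroids) for p in points))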


2021 ◽  
pp. 1-18
Author(s):  
Salahaldeen Rababa ◽  
Amer Al-Badarneh

Large-scale datasets collected from heterogeneous sources often require a join operation to extract valuable information. MapReduce is an efficient programming model for processing large-scale data. However, it has limitations when processing heterogeneous datasets because of the large number of redundant intermediate records transferred over the network. Several filtering techniques have been developed to improve join performance, but they require multiple MapReduce jobs to process the input datasets. To address this issue, this paper presents adaptive filter-based join algorithms. Specifically, three join algorithms are introduced that perform filter creation and redundant record elimination within a single MapReduce job. A cost analysis of the introduced join algorithms shows that the I/O cost is reduced compared to the state-of-the-art filter-based join algorithms. The performance of the join algorithms was evaluated in terms of the total execution time and the total amount of I/O data transferred. The experimental results show that the adaptive Bloom join, semi-adaptive intersection Bloom join, and adaptive intersection Bloom join decrease the total execution time by 30%, 25%, and 35%, respectively, and reduce the total amount of I/O data transferred by 18%, 25%, and 50%, respectively.
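
The general idea behind filter-based joins, sketched below in plain Python, is to build a Bloom filter over the join keys of one dataset and use it to discard non-joining records of the other dataset before they reach the shuffle and join phase; the adaptive, single-job variants proposed in the paper build such filters on the fly inside one MapReduce job, which is not reproduced here. All dataset contents and parameter values are illustrative assumptions.

import hashlib

class BloomFilter:
    """A small, self-contained Bloom filter for illustration only."""
    def __init__(self, size: int = 1024, hashes: int = 3):
        self.size, self.hashes, self.bits = size, hashes, bytearray(size)

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key) -> bool:
        return all(self.bits[pos] for pos in self._positions(key))

left = [("u1", "Alice"), ("u2", "Bob")]                           # smaller dataset
right = [("u1", "order-9"), ("u7", "order-3"), ("u2", "order-5")]

bloom = BloomFilter()
for key, _ in left:
    bloom.add(key)

# Only records whose key may join survive to the shuffle/join phase,
# which is what cuts the redundant intermediate I/O.
survivors = [rec for rec in right if bloom.might_contain(rec[0])]
print(survivors)   # ("u7", ...) is dropped, barring false positives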


Author(s):  
Anwar H. Katrawi ◽  
Rosni Abdullah ◽  
Mohammed Anbar ◽  
Ammar Kamal Abasi

Using MapReduce in Hadoop helps lower the execution time and power consumption of large-scale data processing. However, job processing can be delayed when tasks are assigned to weak or congested machines; these "straggler tasks" increase execution time and power consumption, and therefore raise costs and degrade the performance of computing systems. This research proposes a hybrid MapReduce framework referred to as the combinatory late-machine (CLM) framework. Implementing this framework enables early and timely detection and identification of stragglers, so that prompt, appropriate and effective actions can be taken.


2021 ◽  
Author(s):  
Daniel Silver ◽  
Thiago H Silva

Why some neighbourhoods change over time while others retain their identity remains an open question. Several attempts have been made to answer this question, with a family of models emerging as a result. However, empirically evaluating neighbourhood evolution models is a challenging task, because most require information that is difficult to obtain from traditional sources. For this reason, researchers have turned to new datasets, such as census microdata, Twitter, and Yelp. In this study, we articulate a functional model of neighbourhood change and continuity, adapted from a classical functionalist model proposed by Stinchcombe in 1968. We argue this model provides a relatively simple way to capture key aspects of the complex causal structure of neighbourhood change that are implicit in much neighbourhood change research but rarely formulated explicitly. We demonstrate how to assess the proposed model empirically using large-scale data from Yelp.com. Our results indicate that our approach can help illuminate the nature of neighbourhood change and can be useful in a range of applications.


2018 ◽  
Vol 5 (2) ◽  
pp. 1-20
Author(s):  
Sudhansu Shekhar Patra ◽  
Veena Goswami

Due to recent advancements, virtualization technology is now an up-and-coming field and an increasingly appealing area of internet technology. The rapidly growing demand for computational power from scientific, business, and web applications has led to the creation of large-scale data centers, which consume enormous amounts of electrical power. In this article, the authors study energy-saving methods based on consolidation and on switching off virtual machines that are not in use. Under this policy, c virtual machines continue serving customers until the number of idle servers reaches the threshold d, at which point those d idle servers take a synchronous vacation simultaneously; otherwise, the servers keep serving customers. Numerical results are provided to demonstrate the applicability of the proposed model to data center management and, in particular, to quantify the theoretical tradeoff between the conflicting aims of energy efficiency and QoS.
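
A minimal sketch of the threshold rule described above is given below; the variable names and the example values are illustrative assumptions, and the paper's queueing analysis is not reproduced.

def machines_to_power_down(total_vms: int, busy_vms: int, threshold_d: int) -> int:
    """d-threshold policy: once d of the c machines are idle, park those d together."""
    idle = total_vms - busy_vms
    return threshold_d if idle >= threshold_d else 0

# Example: with c = 10 virtual machines, 5 busy and threshold d = 4,
# 4 idle machines take a synchronous vacation; with only 2 idle, none do.
print(machines_to_power_down(total_vms=10, busy_vms=5, threshold_d=4))   # -> 4
print(machines_to_power_down(total_vms=10, busy_vms=8, threshold_d=4))   # -> 0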


2019 ◽  
Vol 2019 ◽  
pp. 1-7
Author(s):  
Mohammad Shabaz ◽  
Ashok Kumar

Sorting is one of the fundamental operations on data structures, defined as arranging data or records in a particular logical order. A number of algorithms have been developed for sorting data, with the aim of optimizing efficiency and complexity, and work on new sorting approaches is still ongoing. With the rise of big data generation, the concept of big numbers has come into existence. To sort thousands of records, whether already sorted or not, traditional sorting approaches can be used; in those cases the differences in execution time are minute and can be ignored. But when the data are very large, where the execution or processing time for billions or trillions of records is substantial, the complexity can no longer be ignored and an optimized sorting approach is required. SA sorting is one such approach, developed to check sorted big numbers; it works better on sorted input than quick sort and many other algorithms, and it can also be used to sort unsorted records.
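
The SA sorting algorithm itself is not detailed in this abstract; the sketch below only illustrates, under that caveat, why a cheap sortedness check pays off on large already-sorted inputs, which is the advantage attributed to SA sorting over quick sort.

def is_sorted(records) -> bool:
    """One linear scan; far cheaper than re-sorting billions of records."""
    return all(a <= b for a, b in zip(records, records[1:]))

def adaptive_sort(records):
    """Skip the comparison sort entirely when the input is already in order."""
    return list(records) if is_sorted(records) else sorted(records)

print(adaptive_sort([1, 2, 3, 4, 5]))    # detected as sorted, returned as-is
print(adaptive_sort([5, 1, 4, 2, 3]))    # falls back to a full sort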


2009 ◽  
Vol 28 (11) ◽  
pp. 2737-2740
Author(s):  
Xiao ZHANG ◽  
Shan WANG ◽  
Na LIAN
