MapReduce Paradigm
Recently Published Documents


TOTAL DOCUMENTS: 45 (FIVE YEARS: 15)
H-INDEX: 7 (FIVE YEARS: 2)

2022, Vol 6 (1), pp. 5
Author(s): Giuseppe Di Modica, Orazio Tomarchio

In the past twenty years, we have witnessed an unprecedented production of data worldwide, which has generated a growing demand for computing resources and has stimulated the design of computing paradigms and software tools to obtain insights from such Big Data efficiently and quickly. State-of-the-art parallel computing techniques such as MapReduce guarantee high performance in scenarios where the computing nodes involved are equally sized, clustered via broadband network links, and co-located with the data. Unfortunately, these techniques have proven ineffective in geographically distributed scenarios, i.e., computing contexts where nodes and data are spread across multiple distant data centers. In the literature, researchers have proposed variants of the MapReduce paradigm that are aware of the constraints imposed by those scenarios (such as the imbalance of the nodes' computing power and of the interconnecting links) and enforce smart task scheduling strategies. We have designed a hierarchical computing framework in which a context-aware scheduler orchestrates computing tasks that leverage the vanilla Hadoop framework within each data center taking part in the computation. In this work, after presenting the features of the developed framework, we advocate fragmenting the data in a smart way so that the scheduler distributes the workload more fairly among the computing tasks. To prove the concept, we implemented a software prototype of the framework and ran several experiments on a small-scale testbed. Test results are discussed in the last part of the paper.
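As an illustration of the kind of smart fragmentation the abstract advocates, the sketch below splits a dataset across data centers in proportion to their computing capacity, so that map tasks on unequally sized sites finish at roughly the same time. All names and numbers are hypothetical; this is not the paper's prototype.

```python
# Hypothetical sketch of capacity-proportional data fragmentation across
# data centers; data center names and capacities are illustrative only.

def fragment_data(total_records, capacities):
    """Split a dataset so each data center's share is proportional to
    its relative computing capacity."""
    total_capacity = sum(capacities.values())
    shares = {}
    assigned = 0
    for dc, cap in sorted(capacities.items()):
        share = int(total_records * cap / total_capacity)
        shares[dc] = share
        assigned += share
    # Give any rounding remainder to the most capable data center.
    best = max(capacities, key=capacities.get)
    shares[best] += total_records - assigned
    return shares

if __name__ == "__main__":
    # Three unequally sized data centers (relative capacity units).
    capacities = {"dc-europe": 4.0, "dc-us": 2.0, "dc-asia": 1.0}
    print(fragment_data(7_000_000, capacities))
    # -> {'dc-asia': 1000000, 'dc-europe': 4000000, 'dc-us': 2000000}
```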


Author(s): Imad Sassi, Samir Anter, Abdelkrim Bekkhoucha

Hidden Markov models (HMMs) are machine learning algorithms that have been widely used and have demonstrated their efficiency in many conventional applications. This paper proposes a modified posterior decoding algorithm that solves the HMM decoding problem using the MapReduce paradigm and Spark's resilient distributed dataset (RDD) concept for large-scale data processing. The objective of this work is to improve the performance of HMMs in the face of big data challenges. The proposed algorithm greatly reduces time complexity and provides good results in terms of running time, speedup, and parallelization efficiency for large amounts of data, i.e., large numbers of states and sequences.
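The sketch below shows what distributing posterior decoding over Spark RDDs can look like in principle: each observation sequence becomes one RDD record, and forward-backward posteriors are computed per sequence in parallel. The HMM parameters and sequences are illustrative assumptions, not the authors' modified algorithm.

```python
# A minimal sketch (not the paper's exact algorithm): posterior decoding of
# many observation sequences in parallel with PySpark RDDs. The HMM
# parameters (pi, A, B) and the sequences below are invented placeholders.
import numpy as np
from pyspark import SparkContext

pi = np.array([0.6, 0.4])                         # initial state distribution
A = np.array([[0.7, 0.3], [0.4, 0.6]])            # state transition matrix
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])  # emission matrix

def posterior_decode(obs):
    """Forward-backward posterior decoding of one observation sequence."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    beta = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):                         # forward pass
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    beta[T - 1] = 1.0
    for t in range(T - 2, -1, -1):                # backward pass
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    gamma = alpha * beta                          # unnormalized posteriors
    return gamma.argmax(axis=1).tolist()          # most likely state per step

if __name__ == "__main__":
    sc = SparkContext(appName="hmm-posterior-decoding")
    sequences = [[0, 1, 2, 2], [2, 2, 0], [1, 0, 1, 2, 2]]
    rdd = sc.parallelize(sequences)               # one sequence per record
    paths = rdd.map(posterior_decode).collect()   # decode all in parallel
    print(paths)
    sc.stop()
```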


2021, Vol 5 (3), pp. 38
Author(s): Wei Li, Maolin Tang

This paper identifies four common misconceptions about the scalability of volunteer computing on big data problems. The misconceptions are clarified by analyzing the relationship between scalability and its impact factors, which include the problem size of the big data, the heterogeneity and dynamics of the volunteers, and the overlay structure. The paper proposes optimization strategies to find the optimal overlay for a given big data problem, forming multiple overlays to optimize the performance of the individual steps of the MapReduce paradigm. The optimization aims at the maximum overall performance with a minimum number of volunteers, without overusing resources. The paper demonstrates that simulations over the relevant factors can quickly find the optimization points, and concludes that always welcoming more volunteers is an overuse of available resources, because additional volunteers do not always benefit overall performance. An optimal use of volunteers is possible for a given big data problem even under the dynamics and opportunism of volunteers.
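A toy model of the diminishing-returns effect described above: completion time falls as volunteers are added, but coordination overhead eventually dominates, so a sweep over the overlay size reveals an optimization point. The speed distribution, overhead term, and workload are invented for illustration and have no connection to the paper's simulations.

```python
# Illustrative sketch only: a toy model of why more volunteers do not
# always improve overall performance. All constants are assumptions.
import random

def makespan(num_volunteers, total_work=10_000.0, overhead_per_node=0.05):
    """Estimated completion time of a map phase over heterogeneous
    volunteers: ideal parallel compute time plus a coordination overhead
    that grows with the number of participating nodes."""
    random.seed(42)  # reproducible volunteer speeds
    speeds = [random.uniform(0.5, 1.5) for _ in range(num_volunteers)]
    compute = total_work / sum(speeds)            # ideal parallel time
    coordination = overhead_per_node * num_volunteers ** 1.5
    return compute + coordination

if __name__ == "__main__":
    # Sweep the overlay size and report the cheapest configuration found.
    best = min(range(1, 2001), key=makespan)
    print(f"optimal volunteers: {best}, makespan: {makespan(best):.1f}")
```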


2021, pp. 145-160
Author(s): Douglas E. Comer

2021, Vol 2 (1), pp. 55-60
Author(s): Yusifov S.I., Ragimova N.A., Abdullayev V.H., Khalilov M.E.

The rapid development of information technologies is accelerating the arrival of Industry 4.0, which is why sectors of the economy and science must adapt to these changes. Global changes in geography have led to the emergence of a new scientific discipline called geoinformatics. The paper provides insight into the Smart Geographic Area, its structure, and its main components. To this end, methods were employed for connecting the main components (IIoT, IoE), analyzing data (Big Data, Hadoop), managing processes (CPS), and storing data (Cloud Computing, Fog Computing). As a result of the study, a Smart Geographic Area algorithm based on the MapReduce paradigm was developed.
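As a rough illustration of how a MapReduce step could serve such a Smart Geographic Area algorithm (the abstract does not give its details), the sketch below keys IoT sensor readings by a coarse latitude/longitude grid cell and reduces each cell to an average. All field names and values are hypothetical.

```python
# A hypothetical MapReduce-style aggregation of geographic sensor data;
# the cell size, fields, and readings are invented for this example.
from collections import defaultdict

def map_reading(reading):
    """Map: key each reading by a coarse lat/lon grid cell (0.1 degrees)."""
    lat, lon, temp = reading
    cell = (round(lat, 1), round(lon, 1))
    return cell, temp

def reduce_cell(cell, temps):
    """Reduce: average the temperature readings that fell into one cell."""
    return cell, sum(temps) / len(temps)

if __name__ == "__main__":
    readings = [(40.41, 49.87, 21.5), (40.44, 49.89, 22.1), (40.37, 49.83, 20.8)]
    grouped = defaultdict(list)
    for key, value in map(map_reading, readings):    # map phase
        grouped[key].append(value)                   # shuffle/group by key
    for cell, temps in grouped.items():              # reduce phase
        print(reduce_cell(cell, temps))
```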

