Scheduling Data-Intensive Workloads in Large-Scale Distributed Systems: Trends and Challenges

Author(s):  
Georgios L. Stavrinides ◽  
Helen D. Karatza
2007 ◽  
Vol 41 (2) ◽  
pp. 83-88
Author(s):  
Flavio P. Junqueira ◽  
Vassilis Plachouras ◽  
Fabrizio Silvestri ◽  
Ivana Podnar

2020 ◽  
Vol 2 (1) ◽  
pp. 92
Author(s):  
Rahim Rahmani ◽  
Ramin Firouzi ◽  
Sachiko Lim ◽  
Mahbub Alam

The major challenges of operating data-intensive applications on Distributed Ledger Technology (DLT) are (1) reaching consensus on the main chain, as a set of validators cast public votes to decide which blocks to finalize, and (2) scalability, i.e., increasing the number of chains that run in parallel. In this paper, we introduce a new proximal algorithm that scales DLT in a large-scale network of Internet of Things (IoT) devices. We discuss how the algorithm benefits the integration of DLT in IoT by using edge computing technology, taking the scalability and heterogeneous capabilities of IoT devices into consideration. IoT devices are dynamically clustered into groups based on proximity context information. A cluster head bridges the IoT devices with the DLT network, where a smart contract is deployed. In this way, the security of the IoT is improved and the scalability and latency problems are addressed. We elaborate on our mechanism, discuss issues that should be considered when implementing the proposed algorithm, and show how it behaves under varying parameters such as latency and clustering.
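The clustering step described in this abstract can be illustrated with a short sketch. The Python snippet below is a hypothetical, simplified illustration, not the authors' algorithm: devices are grouped by proximity (here, plain Euclidean distance on reported coordinates against an assumed `radius` threshold), and the most capable device in each group is kept as cluster head, since the head is the one that would bridge the group to the DLT network.

```python
import math
from dataclasses import dataclass

@dataclass
class Device:
    """A hypothetical IoT device with a position used as proximity context."""
    device_id: str
    x: float
    y: float
    battery: float  # remaining energy, used to pick a capable cluster head

def proximity_clusters(devices, radius):
    """Greedy single-pass clustering: a device joins the first cluster whose
    head is within `radius`; otherwise it starts a new cluster."""
    clusters = []  # each cluster is {"head": Device, "members": [Device, ...]}
    for dev in devices:
        placed = False
        for cluster in clusters:
            head = cluster["head"]
            if math.hypot(dev.x - head.x, dev.y - head.y) <= radius:
                cluster["members"].append(dev)
                # keep the device with the most remaining energy as head,
                # since the head must also talk to the DLT network
                if dev.battery > head.battery:
                    cluster["head"] = dev
                placed = True
                break
        if not placed:
            clusters.append({"head": dev, "members": [dev]})
    return clusters

if __name__ == "__main__":
    devices = [
        Device("sensor-1", 0.0, 0.0, 0.9),
        Device("sensor-2", 1.0, 0.5, 0.4),
        Device("sensor-3", 10.0, 10.0, 0.7),
    ]
    for cluster in proximity_clusters(devices, radius=2.0):
        members = ", ".join(d.device_id for d in cluster["members"])
        print(f"head={cluster['head'].device_id} members=[{members}]")
```

In the paper's setting the chosen head would additionally interact with the smart contract deployed on the DLT side; that interaction is omitted from this sketch.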


Author(s):  
Valentin Tablan ◽  
Ian Roberts ◽  
Hamish Cunningham ◽  
Kalina Bontcheva

Cloud computing is increasingly being regarded as a key enabler of the 'democratization of science', because on-demand, highly scalable cloud computing facilities enable researchers anywhere to carry out data-intensive experiments. In the context of natural language processing (NLP), algorithms tend to be complex, which makes their parallelization and deployment on cloud platforms a non-trivial task. This study presents a new, unique, cloud-based platform for large-scale NLP research: GATECloud.net. It enables researchers to carry out data-intensive NLP experiments by harnessing the vast, on-demand compute power of the Amazon cloud. The platform deals with important infrastructural issues completely transparently for the researcher: load balancing, efficient data upload and storage, deployment on virtual machines, security, and fault tolerance. We also include a cost–benefit analysis and usage evaluation.
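As a rough illustration of the kind of data-parallel workload such a platform automates, the sketch below splits a document collection across a pool of local worker processes. This is a generic, hypothetical example using only Python's standard library; it is not GATECloud.net's API, and it omits the load balancing, data upload, and fault-tolerance machinery the platform provides.

```python
from multiprocessing import Pool

def annotate(document: str) -> dict:
    """Stand-in for an NLP pipeline stage: here we only count tokens.
    In practice this per-document analysis is the expensive step."""
    tokens = document.split()
    return {"text": document, "num_tokens": len(tokens)}

def process_corpus(documents, workers=4):
    """Run the per-document pipeline over the corpus in parallel."""
    with Pool(processes=workers) as pool:
        return pool.map(annotate, documents)

if __name__ == "__main__":
    corpus = [
        "GATE is a framework for text engineering.",
        "Cloud platforms give researchers on-demand compute.",
    ]
    for result in process_corpus(corpus, workers=2):
        print(result["num_tokens"], "tokens:", result["text"])
```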


2015 ◽  
Vol 2015 ◽  
pp. 1-9 ◽  
Author(s):  
Sol Ji Kang ◽  
Sang Yeon Lee ◽  
Keon Myung Lee

With problem size and complexity increasing, several parallel and distributed programming models and frameworks have been developed to handle such problems efficiently. This paper briefly reviews parallel computing models and describes three widely recognized parallel programming frameworks: OpenMP, MPI, and MapReduce. OpenMP is the de facto standard for parallel programming on shared-memory systems, MPI is the de facto industry standard for distributed-memory systems, and MapReduce has become the de facto standard for large-scale data-intensive applications. The qualitative pros and cons of each framework are well known, but quantitative performance indexes give a clearer picture of which framework to use for a given application. Two benchmark problems are chosen to compare the frameworks: the all-pairs-shortest-path problem and the data join problem. This paper presents parallel programs for these problems implemented on each of the three frameworks, reports experimental results on a cluster of computers, and discusses which is the right tool for the job by analyzing the characteristics and performance of the paradigms.
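To make the second benchmark concrete, the sketch below imitates a reduce-side equi-join, the usual way a data join is expressed in the MapReduce model: mappers tag each record with its source table and emit it under the join key, the shuffle groups records by key, and the reducer pairs up the records that share a key. This is a plain-Python, single-machine imitation of the pattern, not code for any specific framework or cluster setup used in the paper.

```python
from collections import defaultdict
from itertools import product

def map_phase(table_name, rows, key_index):
    """Emit (join_key, (table_name, row)) pairs, as a MapReduce mapper would."""
    for row in rows:
        yield row[key_index], (table_name, row)

def reduce_phase(grouped):
    """For each join key, pair every row from table A with every row from table B."""
    for key, tagged_rows in grouped.items():
        left = [row for tag, row in tagged_rows if tag == "A"]
        right = [row for tag, row in tagged_rows if tag == "B"]
        for a, b in product(left, right):
            yield key, a, b

if __name__ == "__main__":
    table_a = [(1, "alice"), (2, "bob")]                    # (id, name)
    table_b = [(1, "reads"), (1, "writes"), (3, "sleeps")]  # (id, activity)

    # Shuffle step: group the mapper output by join key.
    grouped = defaultdict(list)
    for key, tagged in list(map_phase("A", table_a, 0)) + list(map_phase("B", table_b, 0)):
        grouped[key].append(tagged)

    for key, a, b in reduce_phase(grouped):
        print(key, a[1], b[1])  # e.g. "1 alice reads"
```

In an actual MapReduce deployment the shuffle and grouping are performed by the framework across machines; only the map and reduce functions would be written by the user.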


2014 ◽  
Vol 46 (4) ◽  
pp. 1-31 ◽  
Author(s):  
Anne-Cecile Orgerie ◽  
Marcos Dias de Assuncao ◽  
Laurent Lefevre
