Scheduling Data-Intensive Workloads in Large-Scale Distributed Systems: Trends and Challenges

Author(s):  
Georgios L. Stavrinides ◽  
Helen D. Karatza
2007 ◽  
Vol 41 (2) ◽  
pp. 83-88
Author(s):  
Flavio P. Junqueira ◽  
Vassilis Plachouras ◽  
Fabrizio Silvestri ◽  
Ivana Podnar

2020 ◽  
Vol 2 (1) ◽  
pp. 92
Author(s):  
Rahim Rahmani ◽  
Ramin Firouzi ◽  
Sachiko Lim ◽  
Mahbub Alam

The major challenges of operating data-intensive applications on Distributed Ledger Technology (DLT) are (1) reaching consensus on the main chain, as a set of validators cast public votes to decide which blocks to finalize, and (2) scalability, i.e., increasing the number of chains that run in parallel. In this paper, we introduce a new proximal algorithm that scales DLT in a large-scale network of Internet of Things (IoT) devices. We discuss how the algorithm benefits the integration of DLT in IoT by using edge computing technology, taking the scalability and heterogeneous capabilities of IoT devices into consideration. IoT devices are dynamically clustered into groups based on proximity context information. A cluster head bridges the IoT devices with the DLT network, where a smart contract is deployed. In this way, the security of the IoT is improved and the scalability and latency problems are addressed. We elaborate on our mechanism, discuss issues that should be considered when implementing the proposed algorithm, and show how it behaves under varying parameters such as latency and clustering.
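The clustering step described in this abstract can be illustrated with a short sketch. The Python snippet below is a hypothetical, simplified illustration, not the authors' algorithm: devices are grouped by proximity (here, plain Euclidean distance on reported coordinates against an assumed `radius` threshold), and the most capable device in each group is kept as cluster head, since the head is the one that would bridge the group to the DLT network.

```python
import math
from dataclasses import dataclass

@dataclass
class Device:
    """A hypothetical IoT device with a position used as proximity context."""
    device_id: str
    x: float
    y: float
    battery: float  # remaining energy, used to pick a capable cluster head

def proximity_clusters(devices, radius):
    """Greedy single-pass clustering: a device joins the first cluster whose
    head is within `radius`; otherwise it starts a new cluster."""
    clusters = []  # each cluster is {"head": Device, "members": [Device, ...]}
    for dev in devices:
        placed = False
        for cluster in clusters:
            head = cluster["head"]
            if math.hypot(dev.x - head.x, dev.y - head.y) <= radius:
                cluster["members"].append(dev)
                # keep the device with the most remaining energy as head,
                # since the head must also talk to the DLT network
                if dev.battery > head.battery:
                    cluster["head"] = dev
                placed = True
                break
        if not placed:
            clusters.append({"head": dev, "members": [dev]})
    return clusters

if __name__ == "__main__":
    devices = [
        Device("sensor-1", 0.0, 0.0, 0.9),
        Device("sensor-2", 1.0, 0.5, 0.4),
        Device("sensor-3", 10.0, 10.0, 0.7),
    ]
    for cluster in proximity_clusters(devices, radius=2.0):
        members = ", ".join(d.device_id for d in cluster["members"])
        print(f"head={cluster['head'].device_id} members=[{members}]")
```

In the paper's setting the chosen head would additionally interact with the smart contract deployed on the DLT side; that interaction is omitted from this sketch.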


Author(s):  
Valentin Tablan ◽  
Ian Roberts ◽  
Hamish Cunningham ◽  
Kalina Bontcheva

Cloud computing is increasingly being regarded as a key enabler of the 'democratization of science', because on-demand, highly scalable cloud computing facilities enable researchers anywhere to carry out data-intensive experiments. In the context of natural language processing (NLP), algorithms tend to be complex, which makes their parallelization and deployment on cloud platforms a non-trivial task. This study presents a new, unique, cloud-based platform for large-scale NLP research: GATECloud.net. It enables researchers to carry out data-intensive NLP experiments by harnessing the vast, on-demand compute power of the Amazon cloud. The platform deals with important infrastructural issues completely transparently for the researcher: load balancing, efficient data upload and storage, deployment on virtual machines, security, and fault tolerance. We also include a cost–benefit analysis and usage evaluation.
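As a rough illustration of the kind of data-parallel workload such a platform automates, the sketch below splits a document collection across a pool of local worker processes. This is a generic, hypothetical example using only Python's standard library; it is not GATECloud.net's API, and it omits the load balancing, data upload, and fault-tolerance machinery the platform provides.

```python
from multiprocessing import Pool

def annotate(document: str) -> dict:
    """Stand-in for an NLP pipeline stage: here we only count tokens.
    In practice this per-document analysis is the expensive step."""
    tokens = document.split()
    return {"text": document, "num_tokens": len(tokens)}

def process_corpus(documents, workers=4):
    """Run the per-document pipeline over the corpus in parallel."""
    with Pool(processes=workers) as pool:
        return pool.map(annotate, documents)

if __name__ == "__main__":
    corpus = [
        "GATE is a framework for text engineering.",
        "Cloud platforms give researchers on-demand compute.",
    ]
    for result in process_corpus(corpus, workers=2):
        print(result["num_tokens"], "tokens:", result["text"])
```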


2015 ◽  
Vol 2015 ◽  
pp. 1-9 ◽  
Author(s):  
Sol Ji Kang ◽  
Sang Yeon Lee ◽  
Keon Myung Lee

With problem size and complexity increasing, several parallel and distributed programming models and frameworks have been developed to handle such problems efficiently. This paper briefly reviews parallel computing models and describes three widely recognized parallel programming frameworks: OpenMP, MPI, and MapReduce. OpenMP is the de facto standard for parallel programming on shared-memory systems, MPI is the de facto industry standard for distributed-memory systems, and MapReduce has become the de facto standard for large-scale data-intensive applications. The qualitative pros and cons of each framework are well known, but quantitative performance indexes give a clearer picture of which framework to use for a given application. Two benchmark problems are chosen to compare the frameworks: the all-pairs-shortest-path problem and the data join problem. This paper presents parallel programs for these problems implemented on each of the three frameworks, reports experimental results on a cluster of computers, and discusses which is the right tool for the job by analyzing the characteristics and performance of the paradigms.
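To make the second benchmark concrete, the sketch below imitates a reduce-side equi-join, the usual way a data join is expressed in the MapReduce model: mappers tag each record with its source table and emit it under the join key, the shuffle groups records by key, and the reducer pairs up the records that share a key. This is a plain-Python, single-machine imitation of the pattern, not code for any specific framework or cluster setup used in the paper.

```python
from collections import defaultdict
from itertools import product

def map_phase(table_name, rows, key_index):
    """Emit (join_key, (table_name, row)) pairs, as a MapReduce mapper would."""
    for row in rows:
        yield row[key_index], (table_name, row)

def reduce_phase(grouped):
    """For each join key, pair every row from table A with every row from table B."""
    for key, tagged_rows in grouped.items():
        left = [row for tag, row in tagged_rows if tag == "A"]
        right = [row for tag, row in tagged_rows if tag == "B"]
        for a, b in product(left, right):
            yield key, a, b

if __name__ == "__main__":
    table_a = [(1, "alice"), (2, "bob")]                    # (id, name)
    table_b = [(1, "reads"), (1, "writes"), (3, "sleeps")]  # (id, activity)

    # Shuffle step: group the mapper output by join key.
    grouped = defaultdict(list)
    for key, tagged in list(map_phase("A", table_a, 0)) + list(map_phase("B", table_b, 0)):
        grouped[key].append(tagged)

    for key, a, b in reduce_phase(grouped):
        print(key, a[1], b[1])  # e.g. "1 alice reads"
```

In an actual MapReduce deployment the shuffle and grouping are performed by the framework across machines; only the map and reduce functions would be written by the user.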


2014 ◽  
Vol 46 (4) ◽  
pp. 1-31 ◽  
Author(s):  
Anne-Cecile Orgerie ◽  
Marcos Dias de Assuncao ◽  
Laurent Lefevre
