Monitoring and Controlling Large Scale Systems

Author(s):  
Valentin Cristea ◽  
Ciprian Dobre ◽  
Corina Stratan ◽  
Florin Pop

The architectural shift presented in the previous chapters towards high-performance computers assembled from large numbers of commodity resources raises numerous design issues pertaining to traceability, fault tolerance and scalability. Hence, one of the key challenges faced by high-performance distributed systems is scalable monitoring of system state. The aim of this chapter is to survey existing work and trends in distributed systems monitoring by introducing the relevant concepts, requirements, techniques, models and related standardization activities. Monitoring can be defined as the process of dynamic collection, interpretation and presentation of information concerning the characteristics and status of resources of interest. It is needed for various purposes such as debugging, testing, program visualization and animation. It may also be used for general management activities, which have a more permanent and continuous nature (performance management, configuration management, fault management, security management, etc.). In this case the behavior of the system is observed and monitoring information is gathered; this information is used to make management decisions and to perform the appropriate control actions on the system. Unlike monitoring, which is generally a passive process, control actively changes the behavior of the managed system and has to be considered and modeled separately. Monitoring thus proves essential for observing and improving the reliability and performance of large-scale distributed systems.
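The collect/interpret/present cycle the abstract defines can be sketched in a few lines. The sensor, threshold, and window size below are hypothetical stand-ins for real resource probes:

```python
from collections import deque

class ResourceMonitor:
    """Minimal sketch of the collect/interpret/present monitoring cycle.
    The sensor callable and threshold are hypothetical examples."""

    def __init__(self, sensor, threshold, window=5):
        self.sensor = sensor          # callable returning a numeric reading
        self.threshold = threshold    # alert level (illustrative units)
        self.history = deque(maxlen=window)

    def collect(self):
        """Dynamic collection: sample the resource into a sliding window."""
        self.history.append(self.sensor())

    def interpret(self):
        """Interpretation: smooth the raw samples and flag overload."""
        avg = sum(self.history) / len(self.history)
        return avg, avg > self.threshold

    def present(self):
        """Presentation: a one-line status report for an operator."""
        avg, overloaded = self.interpret()
        state = "OVERLOAD" if overloaded else "ok"
        return f"load={avg:.2f} [{state}]"

# Usage: a fake CPU-load sensor standing in for a real probe.
readings = iter([0.3, 0.5, 0.9, 0.95, 0.97])
mon = ResourceMonitor(sensor=lambda: next(readings), threshold=0.8, window=3)
for _ in range(5):
    mon.collect()
print(mon.present())   # averages the last 3 samples, above the threshold
```

A management layer would react to the `OVERLOAD` state with a control action, which, as the abstract notes, is a separate active process.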

Author(s):  
TAJ ALAM ◽  
PARITOSH DUBEY ◽  
ANKIT KUMAR

Distributed systems are an efficient means of realizing high-performance computing (HPC). They are used to meet the demand of executing large-scale high-performance computational jobs. Scheduling tasks on such computational resources is one of the prime concerns in heterogeneous distributed systems. Scheduling jobs on distributed systems is NP-complete in nature, so it requires a heuristic or metaheuristic approach to obtain sub-optimal but acceptable solutions. An adaptive threshold-based scheduler is one such heuristic approach. This work proposes an adaptive threshold-based scheduler for batches of independent jobs (ATSBIJ) with the objective of optimizing the makespan of the jobs submitted for execution on cloud computing systems. ATSBIJ exploits interval estimation to calculate the threshold values used in generating an efficient schedule for the batch. Simulation studies on CloudSim show that the ATSBIJ approach works effectively in real-life scenarios.
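A minimal sketch of the idea, not the authors' exact ATSBIJ algorithm: thresholds derived from a confidence interval on mean job length classify the batch, and a greedy least-loaded assignment then yields the makespan. The job lengths, VM count, and z value below are illustrative:

```python
import statistics
from math import sqrt

def threshold_schedule(job_lengths, n_vms, z=1.96):
    """Illustrative threshold-based batch scheduler in the spirit of the
    approach above (not the exact ATSBIJ algorithm). Thresholds come from
    a confidence interval on mean job length; long jobs are dispatched
    first to the least-loaded VM to shrink makespan."""
    mean = statistics.mean(job_lengths)
    half = z * statistics.stdev(job_lengths) / sqrt(len(job_lengths))
    low, high = mean - half, mean + half   # interval-estimation thresholds

    # Classify each job as large/medium/small, then schedule large first.
    order = sorted(job_lengths,
                   key=lambda L: (0 if L > high else 1 if L >= low else 2, -L))
    loads = [0.0] * n_vms
    for L in order:
        i = loads.index(min(loads))        # least-loaded VM gets the job
        loads[i] += L
    return max(loads)                      # makespan of the schedule

jobs = [12, 3, 7, 25, 6, 18, 4, 9]
print(threshold_schedule(jobs, n_vms=3))   # makespan for this small batch
```

The real evaluation would run inside CloudSim with VM capacities and job MIPS ratings; this sketch keeps only the threshold-then-greedy structure.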


1996 ◽  
Vol 07 (03) ◽  
pp. 295-303 ◽  
Author(s):  
P. D. CODDINGTON

Large-scale Monte Carlo simulations require high-quality random number generators to ensure correct results. The contrapositive of this statement is also true — the quality of random number generators can be tested by using them in large-scale Monte Carlo simulations. We have tested many commonly-used random number generators with high precision Monte Carlo simulations of the 2-d Ising model using the Metropolis, Swendsen-Wang, and Wolff algorithms. This work is being extended to the testing of random number generators for parallel computers. The results of these tests are presented, along with recommendations for random number generators for high-performance computers, particularly for lattice Monte Carlo simulations.
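The testing idea can be illustrated with a small Metropolis simulation of the 2-d Ising model: below the critical temperature a lattice started fully magnetized should stay strongly magnetized, and a biased generator can visibly distort this. The lattice size, temperature, and seed below are illustrative:

```python
import math
import random

def metropolis_ising(n, T, sweeps, rng):
    """Metropolis dynamics for the 2-d Ising model (J = 1, periodic
    boundaries), driven entirely by the generator under test."""
    spins = [[1] * n for _ in range(n)]            # start fully magnetized
    for _ in range(sweeps * n * n):
        i, j = rng.randrange(n), rng.randrange(n)  # pick a random site
        nb = (spins[(i + 1) % n][j] + spins[(i - 1) % n][j] +
              spins[i][(j + 1) % n] + spins[i][(j - 1) % n])
        dE = 2 * spins[i][j] * nb                  # energy cost of a flip
        if dE <= 0 or rng.random() < math.exp(-dE / T):
            spins[i][j] = -spins[i][j]
    return abs(sum(sum(row) for row in spins)) / (n * n)

rng = random.Random(42)                            # generator under test
m = metropolis_ising(n=16, T=1.0, sweeps=50, rng=rng)
print(m > 0.9)   # strong magnetization expected well below T_c (~2.269)
```

In a real test one compares measured observables against exact 2-d Ising results at much higher precision; subtle generator correlations show up as statistically significant deviations.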


Author(s):  
Valentin Cristea ◽  
Ciprian Dobre ◽  
Corina Stratan ◽  
Florin Pop

Communication in large-scale distributed systems has a major impact on the overall performance and wide acceptance of such systems. In this chapter we analyze existing work on enabling high-performance communication in large-scale distributed systems, presenting specific problems and existing solutions, as well as several future trends. Because applications running in Grids, P2P networks and other types of large-scale distributed systems have specific communication requirements, we treat the problem of delivering efficient communication separately for P2P and Grid systems. We present existing work on enabling high-speed networks to support research worldwide, together with problems related to traffic engineering, QoS assurance, and protocols designed to overcome current limitations of TCP in the context of high-bandwidth traffic. We next analyze several group communication models based on hybrid multicast delivery frameworks, path diversity, multicast trees, and distributed communication. Finally, we analyze data communication solutions specifically designed for P2P and Grid systems.
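As a rough illustration of why multicast trees matter among the group communication models mentioned above, the sketch below compares the source-side transmission cost of naive unicast with a k-ary application-level multicast tree; the member count and fanout are hypothetical parameters:

```python
def multicast_cost(n_members, fanout):
    """Generic cost comparison (not tied to any specific framework).
    Unicast makes the source emit every copy itself; a k-ary tree
    spreads forwarding over members at the cost of extra hop levels."""
    unicast_sends = n_members - 1            # source transmits each copy
    tree_sends = min(fanout, n_members - 1)  # source only feeds its children
    depth, covered = 0, 1
    while covered < n_members:               # levels needed to reach everyone
        covered *= fanout
        depth += 1
    return unicast_sends, tree_sends, depth

print(multicast_cost(n_members=64, fanout=4))   # (63, 4, 3)
```

The trade-off is visible directly: the source's load drops from 63 transmissions to 4, in exchange for three forwarding levels of added latency, which is why hybrid delivery and path diversity are used to tune this balance.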


Author(s):  
B. Mejías ◽  
P. Van Roy

Distributed systems with a centralized architecture present the well-known problems of a single point of failure and a single point of congestion; therefore, they do not scale. Decentralized systems, especially peer-to-peer networks, are gaining popularity because they scale well and do not need a server to work. However, their complexity is higher due to the lack of a single point of control and synchronization, and because consistent decentralized storage is difficult to maintain when data constantly evolves. Self-management is a way of handling this higher complexity. In this paper, the authors present a decentralized system built on a structured overlay network that is self-organizing and self-healing, providing transactional replicated storage for small- and large-scale systems.
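The key-placement idea behind such structured overlays can be sketched with a consistent-hashing ring and successor-list replication. This is a generic illustration, not the authors' specific protocol, and the node names and replica count are hypothetical:

```python
import hashlib
from bisect import bisect_right

def ring_hash(key):
    """Hash a key or node name onto a small circular id space (sketch)."""
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % 2**16

def responsible_nodes(ring, key, replicas=3):
    """Return the nodes storing `key`: its successor on the ring plus the
    next replicas-1 nodes clockwise. Losing one replica leaves the others
    intact; a self-healing overlay re-replicates when successors change."""
    ids = sorted(ring)                                 # node ids on the ring
    start = bisect_right(ids, ring_hash(key)) % len(ids)  # key's successor
    return [ring[ids[(start + k) % len(ids)]] for k in range(replicas)]

ring = {ring_hash(n): n for n in ["nodeA", "nodeB", "nodeC", "nodeD", "nodeE"]}
owners = responsible_nodes(ring, "user:42")
print(len(owners), len(set(owners)))   # 3 distinct replica holders
```

No node holds global state: each peer only needs its successors, which is what makes the placement self-organizing; the transactional layer described in the paper would coordinate updates across the replica set.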


2002 ◽  
Vol 1 (4) ◽  
pp. 403-420 ◽  
Author(s):  
D. Stanescu ◽  
J. Xu ◽  
M.Y. Hussaini ◽  
F. Farassat

The purpose of this paper is to demonstrate the feasibility of computing the fan inlet noise field around a real twin-engine aircraft, which includes the radiation of the main spinning modes from the engine as well as the reflection and scattering by the fuselage and the wing. This first-cut large-scale computation is based on time domain and frequency domain approaches that employ spectral element methods for spatial discretization. The numerical algorithms are designed to exploit high-performance computers such as the IBM SP4. Although the simulations could not match the exact conditions of the only available experimental data set, they are able to predict the trends of the measured noise field fairly well.

