Monitoring and Controlling Large Scale Systems

Author(s):  
Valentin Cristea ◽  
Ciprian Dobre ◽  
Corina Stratan ◽  
Florin Pop

The architectural shift presented in the previous chapters towards high-performance computers assembled from large numbers of commodity resources raises numerous design issues pertaining to traceability, fault tolerance and scalability. Hence, one of the key challenges faced by high-performance distributed systems is scalable monitoring of system state. The aim of this chapter is to survey existing work and trends in distributed systems monitoring by introducing the relevant concepts, requirements, techniques, models and related standardization activities. Monitoring can be defined as the process of dynamic collection, interpretation and presentation of information concerning the characteristics and status of resources of interest. It is needed for various purposes such as debugging, testing, program visualization and animation. It may also be used for general management activities, which have a more permanent and continuous nature (performance management, configuration management, fault management, security management, etc.). In this case the behavior of the system is observed and monitoring information is gathered; this information is used to make management decisions and to perform the appropriate control actions on the system. Unlike monitoring, which is generally a passive process, control actively changes the behavior of the managed system and has to be considered and modeled separately. Monitoring thus proves essential for observing and improving the reliability and performance of large-scale distributed systems.
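The collect/interpret/present cycle the abstract defines can be sketched in a few lines. The sensor, threshold, and window size below are hypothetical stand-ins for real resource probes:

```python
from collections import deque

class ResourceMonitor:
    """Minimal sketch of the collect/interpret/present monitoring cycle.
    The sensor callable and threshold are hypothetical examples."""

    def __init__(self, sensor, threshold, window=5):
        self.sensor = sensor          # callable returning a numeric reading
        self.threshold = threshold    # alert level (illustrative units)
        self.history = deque(maxlen=window)

    def collect(self):
        """Dynamic collection: sample the resource into a sliding window."""
        self.history.append(self.sensor())

    def interpret(self):
        """Interpretation: smooth the raw samples and flag overload."""
        avg = sum(self.history) / len(self.history)
        return avg, avg > self.threshold

    def present(self):
        """Presentation: a one-line status report for an operator."""
        avg, overloaded = self.interpret()
        state = "OVERLOAD" if overloaded else "ok"
        return f"load={avg:.2f} [{state}]"

# Usage: a fake CPU-load sensor standing in for a real probe.
readings = iter([0.3, 0.5, 0.9, 0.95, 0.97])
mon = ResourceMonitor(sensor=lambda: next(readings), threshold=0.8, window=3)
for _ in range(5):
    mon.collect()
print(mon.present())   # averages the last 3 samples, above the threshold
```

A management layer would react to the `OVERLOAD` state with a control action, which, as the abstract notes, is a separate active process.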

Author(s):  
TAJ ALAM ◽  
PARITOSH DUBEY ◽  
ANKIT KUMAR

Distributed systems are an efficient means of realizing high-performance computing (HPC). They are used to meet the demand of executing large-scale high-performance computational jobs. Scheduling tasks on such computational resources is one of the prime concerns in heterogeneous distributed systems. Scheduling jobs on distributed systems is NP-complete in nature, so it requires a heuristic or metaheuristic approach to obtain sub-optimal but acceptable solutions. An adaptive threshold-based scheduler is one such heuristic approach. This work proposes an adaptive threshold-based scheduler for batches of independent jobs (ATSBIJ) with the objective of optimizing the makespan of the jobs submitted for execution on cloud computing systems. ATSBIJ exploits interval estimation to calculate the threshold values used in generating an efficient schedule for the batch. Simulation studies on CloudSim show that the ATSBIJ approach works effectively in real-life scenarios.
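A minimal sketch of the idea, not the authors' exact ATSBIJ algorithm: thresholds derived from a confidence interval on mean job length classify the batch, and a greedy least-loaded assignment then yields the makespan. The job lengths, VM count, and z value below are illustrative:

```python
import statistics
from math import sqrt

def threshold_schedule(job_lengths, n_vms, z=1.96):
    """Illustrative threshold-based batch scheduler in the spirit of the
    approach above (not the exact ATSBIJ algorithm). Thresholds come from
    a confidence interval on mean job length; long jobs are dispatched
    first to the least-loaded VM to shrink makespan."""
    mean = statistics.mean(job_lengths)
    half = z * statistics.stdev(job_lengths) / sqrt(len(job_lengths))
    low, high = mean - half, mean + half   # interval-estimation thresholds

    # Classify each job as large/medium/small, then schedule large first.
    order = sorted(job_lengths,
                   key=lambda L: (0 if L > high else 1 if L >= low else 2, -L))
    loads = [0.0] * n_vms
    for L in order:
        i = loads.index(min(loads))        # least-loaded VM gets the job
        loads[i] += L
    return max(loads)                      # makespan of the schedule

jobs = [12, 3, 7, 25, 6, 18, 4, 9]
print(threshold_schedule(jobs, n_vms=3))   # makespan for this small batch
```

The real evaluation would run inside CloudSim with VM capacities and job MIPS ratings; this sketch keeps only the threshold-then-greedy structure.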


1996 ◽  
Vol 07 (03) ◽  
pp. 295-303 ◽  
Author(s):  
P. D. CODDINGTON

Large-scale Monte Carlo simulations require high-quality random number generators to ensure correct results. The contrapositive of this statement is also true — the quality of random number generators can be tested by using them in large-scale Monte Carlo simulations. We have tested many commonly-used random number generators with high precision Monte Carlo simulations of the 2-d Ising model using the Metropolis, Swendsen-Wang, and Wolff algorithms. This work is being extended to the testing of random number generators for parallel computers. The results of these tests are presented, along with recommendations for random number generators for high-performance computers, particularly for lattice Monte Carlo simulations.
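The testing idea can be illustrated with a small Metropolis simulation of the 2-d Ising model: below the critical temperature a lattice started fully magnetized should stay strongly magnetized, and a biased generator can visibly distort this. The lattice size, temperature, and seed below are illustrative:

```python
import math
import random

def metropolis_ising(n, T, sweeps, rng):
    """Metropolis dynamics for the 2-d Ising model (J = 1, periodic
    boundaries), driven entirely by the generator under test."""
    spins = [[1] * n for _ in range(n)]            # start fully magnetized
    for _ in range(sweeps * n * n):
        i, j = rng.randrange(n), rng.randrange(n)  # pick a random site
        nb = (spins[(i + 1) % n][j] + spins[(i - 1) % n][j] +
              spins[i][(j + 1) % n] + spins[i][(j - 1) % n])
        dE = 2 * spins[i][j] * nb                  # energy cost of a flip
        if dE <= 0 or rng.random() < math.exp(-dE / T):
            spins[i][j] = -spins[i][j]
    return abs(sum(sum(row) for row in spins)) / (n * n)

rng = random.Random(42)                            # generator under test
m = metropolis_ising(n=16, T=1.0, sweeps=50, rng=rng)
print(m > 0.9)   # strong magnetization expected well below T_c (~2.269)
```

In a real test one compares measured observables against exact 2-d Ising results at much higher precision; subtle generator correlations show up as statistically significant deviations.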


Author(s):  
Valentin Cristea ◽  
Ciprian Dobre ◽  
Corina Stratan ◽  
Florin Pop

Communication in large-scale distributed systems has a major impact on the overall performance and wide acceptance of such systems. In this chapter we analyze existing work on enabling high-performance communication in large-scale distributed systems, presenting specific problems and existing solutions, as well as several future trends. Because applications running in Grids, P2P networks and other types of large-scale distributed systems have specific communication requirements, we treat the problem of delivering efficient communication separately for P2P and Grid systems. We present existing work on enabling high-speed networks to support research worldwide, together with problems related to traffic engineering, QoS assurance, and protocols designed to overcome current limitations of TCP in the context of high-bandwidth traffic. We next analyze several group communication models based on hybrid multicast delivery frameworks, path diversity, multicast trees, and distributed communication. Finally, we analyze data communication solutions specifically designed for P2P and Grid systems.
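As a rough illustration of why multicast trees matter among the group communication models mentioned above, the sketch below compares the source-side transmission cost of naive unicast with a k-ary application-level multicast tree; the member count and fanout are hypothetical parameters:

```python
def multicast_cost(n_members, fanout):
    """Generic cost comparison (not tied to any specific framework).
    Unicast makes the source emit every copy itself; a k-ary tree
    spreads forwarding over members at the cost of extra hop levels."""
    unicast_sends = n_members - 1            # source transmits each copy
    tree_sends = min(fanout, n_members - 1)  # source only feeds its children
    depth, covered = 0, 1
    while covered < n_members:               # levels needed to reach everyone
        covered *= fanout
        depth += 1
    return unicast_sends, tree_sends, depth

print(multicast_cost(n_members=64, fanout=4))   # (63, 4, 3)
```

The trade-off is visible directly: the source's load drops from 63 transmissions to 4, in exchange for three forwarding levels of added latency, which is why hybrid delivery and path diversity are used to tune this balance.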


Author(s):  
B. Mejías ◽  
P. Van Roy

Distributed systems with a centralized architecture present the well-known problems of a single point of failure and a single point of congestion; therefore, they do not scale. Decentralized systems, especially peer-to-peer networks, are gaining popularity because they scale well and do not need a server to work. However, their complexity is higher due to the lack of a single point of control and synchronization, and because consistent decentralized storage is difficult to maintain when data constantly evolves. Self-management is a way of handling this higher complexity. In this paper, the authors present a decentralized system built on a structured overlay network that is self-organizing and self-healing, providing transactional replicated storage for small- and large-scale systems.
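The key-placement idea behind such structured overlays can be sketched with a consistent-hashing ring and successor-list replication. This is a generic illustration, not the authors' specific protocol, and the node names and replica count are hypothetical:

```python
import hashlib
from bisect import bisect_right

def ring_hash(key):
    """Hash a key or node name onto a small circular id space (sketch)."""
    return int(hashlib.sha1(key.encode()).hexdigest(), 16) % 2**16

def responsible_nodes(ring, key, replicas=3):
    """Return the nodes storing `key`: its successor on the ring plus the
    next replicas-1 nodes clockwise. Losing one replica leaves the others
    intact; a self-healing overlay re-replicates when successors change."""
    ids = sorted(ring)                                 # node ids on the ring
    start = bisect_right(ids, ring_hash(key)) % len(ids)  # key's successor
    return [ring[ids[(start + k) % len(ids)]] for k in range(replicas)]

ring = {ring_hash(n): n for n in ["nodeA", "nodeB", "nodeC", "nodeD", "nodeE"]}
owners = responsible_nodes(ring, "user:42")
print(len(owners), len(set(owners)))   # 3 distinct replica holders
```

No node holds global state: each peer only needs its successors, which is what makes the placement self-organizing; the transactional layer described in the paper would coordinate updates across the replica set.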


2002 ◽  
Vol 1 (4) ◽  
pp. 403-420 ◽  
Author(s):  
D. Stanescu ◽  
J. Xu ◽  
M.Y. Hussaini ◽  
F. Farassat

The purpose of this paper is to demonstrate the feasibility of computing the fan inlet noise field around a real twin-engine aircraft, which includes the radiation of the main spinning modes from the engine as well as the reflection and scattering by the fuselage and the wing. This first-cut large-scale computation is based on time domain and frequency domain approaches that employ spectral element methods for spatial discretization. The numerical algorithms are designed to exploit high-performance computers such as the IBM SP4. Although the simulations could not match the exact conditions of the only available experimental data set, they are able to predict the trends of the measured noise field fairly well.

