Design Issues of a Novel Toolkit for Parallel Application Performance Monitoring and Analysis in Cluster and Grid Environments

Author(s):  
Kuan-Ching Li ◽  
Hsiao-Hsi Wang ◽  
Chiou-Nan Chen ◽  
Chun-Chieh Liu ◽  
Chia-Fu Chang ◽  
...  
Sensors ◽  
2021 ◽  
Vol 21 (5) ◽  
pp. 1590
Author(s):  
Arnak Poghosyan ◽  
Ashot Harutyunyan ◽  
Naira Grigoryan ◽  
Clement Pang ◽  
George Oganesyan ◽  
...  

The main purpose of application performance monitoring/management (APM) software is to ensure the highest availability, efficiency and security of applications. APM software accomplishes these goals through automation, measurement, analysis and diagnostics. Gartner specifies three crucial capabilities of APM software. The first is end-user experience monitoring, which reveals how users interact with application and infrastructure components. The second is application discovery, diagnostics and tracing. The third is data analytics powered by machine learning (ML) and artificial intelligence (AI) for predictions, anomaly detection, event correlation and root cause analysis. Time series metrics, logs and traces are the three pillars of observability and a valuable source of information for IT operations. Accurate, scalable and robust time series forecasting and anomaly detection are the capabilities required of the analytics. Approaches based on neural networks (NN) and deep learning are gaining popularity due to their flexibility and ability to tackle complex nonlinear problems. However, some disadvantages of NN-based models for distributed cloud applications temper expectations and require specific approaches. We demonstrate how NN models, pretrained on a global time series database, can be applied to customer-specific data using transfer learning. In general, NN models operate adequately only on stationary time series. Applying them to nonstationary time series requires multilayer data processing, including hypothesis testing for data categorization, category-specific transformations into stationary data, forecasting and backward transformations. We present the mathematical background of this approach and discuss experimental results based on an implementation for Wavefront by VMware (an APM software product), obtained while monitoring real customer cloud environments.
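The multilayer processing named in the abstract above (categorize by hypothesis test, transform to stationary data, forecast, transform back) can be illustrated with a minimal sketch. This is not the authors' method: the Mann-Kendall trend test, first differencing, and the mean-value forecaster are stand-ins chosen for brevity, and every function name here is hypothetical. The forecaster in particular only marks the place where a pretrained NN model would sit.

```python
import math


def mann_kendall_z(series):
    """z-score of the Mann-Kendall trend test (normal approximation, no tie correction)."""
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of the pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        return (s - 1) / math.sqrt(var_s)
    if s < 0:
        return (s + 1) / math.sqrt(var_s)
    return 0.0


def categorize(series, z_crit=1.96):
    """Assumed two-category scheme: 'trend' if the test rejects 'no trend', else 'stationary'."""
    return "trend" if abs(mann_kendall_z(series)) > z_crit else "stationary"


def to_stationary(series, category):
    """Category-specific forward transform: first differences for trending data."""
    if category == "trend":
        return [b - a for a, b in zip(series, series[1:])]
    return list(series)


def forecast_stationary(values, horizon):
    """Placeholder for the pretrained NN model: predicts the historical mean."""
    mean = sum(values) / len(values)
    return [mean] * horizon


def from_stationary(predicted, category, last_value):
    """Backward transform: cumulatively add predicted differences to the last observation."""
    if category != "trend":
        return list(predicted)
    level, restored = last_value, []
    for step in predicted:
        level += step
        restored.append(level)
    return restored


def forecast(series, horizon):
    """Run the whole pipeline and return (category, forecast in original units)."""
    category = categorize(series)
    stationary = to_stationary(series, category)
    predicted = forecast_stationary(stationary, horizon)
    return category, from_stationary(predicted, category, series[-1])


if __name__ == "__main__":
    linear = [2.0 * t + 1.0 for t in range(20)]
    print(forecast(linear, 3))
```

Keeping the forward and backward transforms keyed on the same category label ensures that a forecast is always mapped back with the inverse of the transform that was applied to its input.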


Author(s):  
Ming Mao ◽  
Marty Humphrey

It is a challenge to provision and allocate resources in the Cloud so as to meet both the performance and cost goals of Cloud users. For a Cloud consumer, the ability to acquire and release resources dynamically and trivially, while powerful and useful, complicates the resource provisioning and allocation task. On the one hand, resource under-provisioning may hurt application performance and degrade service quality; on the other hand, resource over-provisioning could cost users more and offset the Cloud's advantages. Although resource management and job scheduling have been studied extensively in Grid environments, and the Cloud shares many features with the Grid, mapping user objectives to resource provisioning and allocation in the Cloud poses many challenges due to the seemingly unlimited resource pools, virtualization, and isolation features the Cloud provides. This chapter surveys research trends in Cloud resource provisioning according to several factors, such as workload type, VM heterogeneity, data transfer requirements, solution methods, and optimization goals and constraints, and attempts to provide guidelines for future research.


Author(s):  
Kuan-Ching Li ◽  
Hsiang-Yao Cheng ◽  
Chao-Tung Yang ◽  
Ching-Hsien Hsu ◽  
Hsiao-Hsi Wang ◽  
...  

2009 ◽  
Vol 19 (04) ◽  
pp. 535-552
Author(s):  
HIKMET DURSUN ◽  
KEVIN J. BARKER ◽  
DARREN J. KERBYSON ◽  
SCOTT PAKIN ◽  
RICHARD SEYMOUR ◽  
...  

In this paper, we present a methodology for profiling parallel applications executing on the family of architectures commonly referred to as the "Cell" processor. Specifically, we examine Cell-centric MPI programs on hybrid clusters containing multiple Opteron and IBM PowerXCell 8i processors per node, such as those used in the petascale Roadrunner system. We analyze the performance of our approach on a PlayStation3 console based on the Cell Broadband Engine (CBE), as well as on an IBM BladeCenter QS22 based on the PowerXCell 8i. Our implementation incurs less than 0.5% overhead and 0.3 µs per profiler call for a typical molecular dynamics code on the Cell BE while efficiently utilizing the limited local store of the Cell's SPE cores. Our worst-case overhead analysis on the PowerXCell 8i shows a cost of 3.2 µs per profiler call while using only two 5 KiB buffers. We demonstrate the use of our profiler on a cluster of hybrid nodes running a suite of scientific applications. Our analyses of inter-SPE communication (across the entire cluster) and function call patterns provide valuable information that can be used to optimize application performance.

