Design Issues of a Novel Toolkit for Parallel Application Performance Monitoring and Analysis in Cluster and Grid Environments

The main purpose of application performance monitoring/management (APM) software is to ensure the highest availability, efficiency and security of applications. APM software accomplishes these goals through automation, measurement, analysis and diagnostics. Gartner specifies three crucial capabilities of APM software. The first is end-user experience monitoring, which reveals how users interact with application and infrastructure components. The second is application discovery, diagnostics and tracing. The third is data analytics powered by machine learning (ML) and artificial intelligence (AI) for prediction, anomaly detection, event correlation and root cause analysis. Time series metrics, logs and traces are the three pillars of observability and a valuable source of information for IT operations. Accurate, scalable and robust time series forecasting and anomaly detection are the capabilities required of the analytics. Approaches based on neural networks (NN) and deep learning are gaining popularity due to their flexibility and ability to tackle complex nonlinear problems. However, some disadvantages of NN-based models for distributed cloud applications temper expectations and require specific approaches. We demonstrate how NN models pretrained on a global time series database can be applied to customer-specific data using transfer learning. In general, NN models perform adequately only on stationary time series. Applying them to nonstationary time series requires multilayer data processing, including hypothesis testing for data categorization, category-specific transformations into stationary data, forecasting and backward transformations. We present the mathematical background of this approach and discuss experimental results based on an implementation for Wavefront by VMware (an APM software product) monitoring real customer cloud environments.
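
The multilayer processing described above (categorize, transform to stationarity, forecast, transform back) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Wavefront implementation: a crude Welch t-statistic comparing the means of the two halves of the series stands in for a proper stationarity hypothesis test (such as an augmented Dickey-Fuller test), first differencing is the only transformation, and a mean forecaster stands in for the pretrained NN model. All function names are hypothetical.

```python
import math
from statistics import mean, variance


def looks_nonstationary(series, threshold=3.0):
    """Crude stand-in for a stationarity hypothesis test (assumption):
    flag the series when the means of its two halves differ by more than
    `threshold` standard errors (Welch t-statistic). Needs >= 4 points."""
    half = len(series) // 2
    a, b = series[:half], series[half:]
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        return mean(a) != mean(b)
    return abs(mean(a) - mean(b)) / se > threshold


def difference(series):
    """Forward transformation: first differences y[t] - y[t-1]."""
    return [y1 - y0 for y0, y1 in zip(series, series[1:])]


def undifference(last_value, diffs):
    """Backward transformation: cumulative sum starting at the last observation."""
    levels, level = [], last_value
    for d in diffs:
        level += d
        levels.append(level)
    return levels


def forecast_stationary(series, horizon):
    """Placeholder for the pretrained NN model: repeat the sample mean."""
    return [mean(series)] * horizon


def forecast(series, horizon):
    """Categorize, transform if needed, forecast, and transform back."""
    if looks_nonstationary(series):
        diffs = difference(series)
        return undifference(series[-1], forecast_stationary(diffs, horizon))
    return forecast_stationary(series, horizon)


if __name__ == "__main__":
    trend = [2 * t + 1 for t in range(20)]
    print(forecast(trend, 3))
```

For the linear trend, the differenced series is constant, so the forecast continues the trend from the last observed value. Seasonal or variance-changing series would need other category-specific transformations (for example, seasonal differencing or a log transform); the sketch only shows the forward/backward symmetry.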

Cloud-Scale Application Performance Monitoring with SDN and NFV

2015 IEEE International Conference on Cloud Engineering

10.1109/ic2e.2015.45

2015

Cited By ~ 12

Author(s):

Guyue Liu

Timothy Wood

Keyword(s):

Performance Monitoring

Application Performance

On the Importance of End-to-End Application Performance Monitoring and Workload Analysis at the Exascale

The International Journal of High Performance Computing Applications

10.1177/1094342009347502

2009

Vol 23 (4)

pp. 357-360

Author(s):

David Skinner

Alok Choudary

Keyword(s):

Performance Monitoring

Application Performance

Workload Analysis

End To End

Auto-Tuning Parallel Application Performance

Fundamentals of Multicore Software Development

10.1201/b11417-18

2011

pp. 251-276

Keyword(s):

Application Performance

Parallel Application

Auto Tuning

Resource Provisioning in the Cloud

Advances in Systems Analysis, Software Engineering, and High Performance Computing - Handbook of Research on Architectural Trends in Service-Driven Computing

10.4018/978-1-4666-6178-3.ch023

2014

pp. 589-612

Author(s):

Ming Mao

Marty Humphrey

Keyword(s):

Job Scheduling

Data Transfer

Resource Provisioning

Research Trends

Future Research

Application Performance

Grid Environments

Solution Methods

The One

Cloud Users

It is a challenge to provision and allocate resources in the Cloud so as to meet both the performance and cost goals of Cloud users. For a Cloud consumer, the ability to acquire and release resources dynamically and trivially, while powerful and useful, complicates the task of resource provisioning and allocation. On the one hand, resource under-provisioning may hurt application performance and degrade service quality; on the other hand, resource over-provisioning can cost users more and offset the advantages of the Cloud. Although resource management and job scheduling have been studied extensively in Grid environments, and the Cloud shares many features with the Grid, mapping user objectives to resource provisioning and allocation in the Cloud poses many challenges due to its seemingly unlimited resource pools, virtualization, and isolation features. This chapter surveys research trends in Cloud resource provisioning along several dimensions, such as the type of workload, VM heterogeneity, data transfer requirements, solution methods, and optimization goals and constraints, and attempts to provide guidelines for future research.
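
The under-/over-provisioning trade-off can be made concrete with a toy model. The sketch below is not taken from the chapter; it assumes, hypothetically, a perfectly parallel batch job with a fixed throughput per VM, a deadline, and billing per started VM-hour, and it picks the smallest VM count that meets the deadline. The function names and numbers are illustrative only.

```python
import math


def run(n_vms, work_units, units_per_vm_hour, price_per_vm_hour):
    """Runtime (hours) and cost of running the job on n_vms, billing
    every VM for each started hour (assumed billing model)."""
    runtime = work_units / (units_per_vm_hour * n_vms)
    cost = n_vms * math.ceil(runtime) * price_per_vm_hour
    return runtime, cost


def min_vms_for_deadline(work_units, units_per_vm_hour, deadline_hours):
    """Smallest VM count whose runtime does not exceed the deadline,
    assuming the work divides perfectly across VMs."""
    return math.ceil(work_units / (units_per_vm_hour * deadline_hours))


if __name__ == "__main__":
    work, rate, deadline, price = 1000, 10, 4, 0.5
    best = min_vms_for_deadline(work, rate, deadline)
    for n in (20, best, 40):  # under-, right-, and over-provisioned
        runtime, cost = run(n, work, rate, price)
        status = "meets" if runtime <= deadline else "misses"
        print(f"{n} VMs: {runtime:.1f} h, ${cost:.2f}, {status} deadline")
```

With these made-up numbers, 20 VMs miss the 4-hour deadline, 25 VMs meet it for $50.00, and 40 VMs finish sooner but cost $60.00 because the partial third hour is billed in full. Real provisioners must also handle VM heterogeneity, startup delays and data transfer, which this model ignores.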

Visuel: A Novel Performance Monitoring and Analysis Toolkit for Cluster and Grid Environments

Distributed and Parallel Computing - Lecture Notes in Computer Science

10.1007/11564621_36

2005

pp. 315-325

Cited By ~ 1

Author(s):

Kuan-Ching Li

Hsiang-Yao Cheng

Chao-Tung Yang

Ching-Hsien Hsu

Hsiao-Hsi Wang

...

Keyword(s):

Performance Monitoring

Grid Environments

Research on Microservice Application Performance Monitoring Framework and Elastic Scaling Mode

Journal of Physics: Conference Series

10.1088/1742-6596/1617/1/012048

2020

Vol 1617

pp. 012048

Author(s):

Zhihui Wang

Ying Xia

Changhua Sun

Lei Cheng

Keyword(s):

Performance Monitoring

Application Performance

Monitoring Framework

Elastic Scaling

Anomaly Detection in Application Performance Monitoring Data

International Journal of Machine Learning and Computing

10.7763/ijmlc.2014.v4.398

2014

Vol 4 (2)

pp. 120-126

Cited By ~ 5

Author(s):

Thomas J. Veasey

Stephen J. Dodson

Keyword(s):

Anomaly Detection

Performance Monitoring

Monitoring Data

Application Performance

Application Performance Monitoring and Analyzing Based on Bayesian Network

2014 11th Web Information System and Application Conference

10.1109/wisa.2014.19

2014

Cited By ~ 1

Author(s):

Chao Wang

Lili Su

Xue Zhao

Ying Zhang

Keyword(s):

Bayesian Network

Performance Monitoring

Application Performance

An MPI Performance Monitoring Interface for Cell Based Compute Nodes

Parallel Processing Letters

10.1142/s0129626409000407

2009

Vol 19 (04)

pp. 535-552

Author(s):

Hikmet Dursun

Kevin J. Barker

Darren J. Kerbyson

Scott Pakin

Richard Seymour

...

Keyword(s):

Performance Monitoring

Parallel Applications

Cell Processor

Worst Case

Application Performance

Cell Broadband Engine

The Family

Function Call

Local Store

And Function

In this paper, we present a methodology for profiling parallel applications executing on the family of architectures commonly referred to as the "Cell" processor. Specifically, we examine Cell-centric MPI programs on hybrid clusters containing multiple Opteron and IBM PowerXCell 8i processors per node, such as those used in the petascale Roadrunner system. We analyze the performance of our approach on a PlayStation 3 console based on the Cell Broadband Engine (CBE) as well as on an IBM BladeCenter QS22 based on the PowerXCell 8i. Our implementation incurs less than 0.5% overhead and 0.3 µs per profiler call for a typical molecular dynamics code on the Cell BE, while efficiently utilizing the limited local store of the Cell's SPE cores. In our worst-case overhead analysis on the PowerXCell 8i, a profiler call costs 3.2 µs while using only two 5 KiB buffers. We demonstrate the use of our profiler on a cluster of hybrid nodes running a suite of scientific applications. Our analyses of inter-SPE communication (across the entire cluster) and function call patterns provide valuable information that can be used to optimize application performance.
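
The abstract reports overhead figures while using only two 5 KiB buffers in the SPE local store. A common way to log with two fixed buffers is double buffering: fill one buffer while the other is being drained. The Python sketch below is a hypothetical illustration, not the paper's implementation: the 16-byte record layout and the class name are assumed, and a synchronous `flush` callback stands in for the asynchronous DMA transfer to main memory that a real Cell profiler would overlap with further logging.

```python
import struct

# Hypothetical 16-byte trace record: event id, call-site id, timestamp (ticks).
RECORD = struct.Struct("<IIQ")
BUF_BYTES = 5 * 1024  # two 5 KiB buffers, matching the abstract


class DoubleBufferedTrace:
    """Log fixed-size records into one of two buffers; when the active buffer
    is full, hand it to `flush` and continue logging into the other one."""

    def __init__(self, flush):
        self.flush = flush  # stands in for an asynchronous DMA put to main memory
        self.buffers = [bytearray(BUF_BYTES), bytearray(BUF_BYTES)]
        self.active = 0
        self.offset = 0

    def record(self, event_id, site_id, timestamp):
        if self.offset + RECORD.size > BUF_BYTES:
            self._swap()
        RECORD.pack_into(self.buffers[self.active], self.offset,
                         event_id, site_id, timestamp)
        self.offset += RECORD.size

    def _swap(self):
        # Copy out only the filled part, then switch to the other buffer.
        self.flush(bytes(self.buffers[self.active][:self.offset]))
        self.active ^= 1
        self.offset = 0

    def close(self):
        """Flush any partially filled buffer."""
        if self.offset:
            self._swap()


if __name__ == "__main__":
    flushed = []
    trace = DoubleBufferedTrace(flushed.append)
    for i in range(700):
        trace.record(i, 0, i * 10)
    trace.close()
    print([len(b) for b in flushed])  # [5120, 5120, 960]
```

With 16-byte records, each 5 KiB buffer holds 320 events, so 700 events produce two full flushes and a final partial one of 60 records. Fixed-size records keep the per-call cost to a bounds check and a single pack, which is the property a tight local-store budget rewards.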
