Continuous performance monitoring for large-scale parallel applications

Author(s):  
Isaac Dooley ◽  
Chee Wai Lee ◽  
Laxmikant V. Kale
Author(s):  
Mark Endrei ◽  
Chao Jin ◽  
Minh Ngoc Dinh ◽  
David Abramson ◽  
Heidi Poxon ◽  
...  

Rising power costs and constraints are driving a growing focus on the energy efficiency of high performance computing systems. The unique characteristics of a particular system and workload and their effect on performance and energy efficiency are typically difficult for application users to assess and to control. Settings for optimum performance and energy efficiency can also diverge, so we need to identify trade-off options that guide a suitable balance between energy use and performance. We present statistical and machine learning models that only require a small number of runs to make accurate Pareto-optimal trade-off predictions using parameters that users can control. We study model training and validation using several parallel kernels and more complex workloads, including Algebraic Multigrid (AMG), Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS), and Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics (LULESH). We demonstrate that we can train the models using as few as 12 runs, with prediction error of less than 10%. Our AMG results identify trade-off options that provide up to 45% improvement in energy efficiency for around 10% performance loss. We reduce the sample measurement time required for AMG by 90%, from 13 h to 74 min.
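
To make the notion of a Pareto-optimal trade-off concrete, the following is a minimal sketch, not the authors' models. It assumes each run has already been reduced to a (runtime, energy) pair and that lower is better for both. The function name `pareto_front` and the sample numbers are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def pareto_front(runs: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Return the runs that no other run beats on both runtime and energy.

    Each run is a (runtime, energy) pair; both values are minimized.
    """
    front: List[Tuple[float, float]] = []
    # After sorting by runtime (ties broken by energy), a run is non-dominated
    # only if its energy is strictly lower than every faster run's energy,
    # which is the energy of the last point added to the front.
    for runtime, energy in sorted(runs):
        if not front or energy < front[-1][1]:
            front.append((runtime, energy))
    return front


if __name__ == "__main__":
    # Hypothetical measurements: (runtime in seconds, energy in kJ).
    sample_runs = [(100, 50), (110, 30), (120, 35), (130, 28), (105, 60)]
    for runtime, energy in pareto_front(sample_runs):
        print(f"runtime={runtime} s, energy={energy} kJ")
```

Each point on the resulting front is one trade-off option: moving along it gives up some runtime in exchange for lower energy use, which is the kind of choice the abstract describes presenting to users.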


2016 ◽  
Vol 65 (7) ◽  
pp. 2184-2198 ◽  
Author(s):  
Jidong Zhai ◽  
Wenguang Chen ◽  
Weimin Zheng ◽  
Keqin Li

Author(s):  
Elizabeth Larson ◽  
Shani Turke ◽  
Nana Hadiza Miko ◽  
Sani Oumarou ◽  
Souleymane Alzouma ◽  
...  

Menstrual health and hygiene (MHH) is an emerging public health priority. To support policy and practice, large-scale surveys monitoring water, sanitation, and hygiene and reproductive health have started to incorporate MHH. Insights gained from these surveys are contingent on the quality of the measures used. Performance Monitoring and Accountability 2020 (PMA2020) was one of the first survey programs to include MHH. We undertook four focus group discussions with resident enumerators and one with their female supervisors following the 2018 PMA2020 survey in Niamey, Niger and synthesized their insights on the performance of the MHH measures used. Enumerators reported that questions about menstruation were well tolerated and most were understood conceptually. Discussions identified missing response options for the places used for MHH and suggest that enumerator training should include common brands of menstrual materials to ensure data quality. Further, current questions seeking to capture the privacy and safety of locations used for MHH require modification or more intensive training efforts to consistently capture these concepts. Enumerator perspectives on menstrual needs in Niger highlight topics missing from MHH monitoring. Attending to enumerator expertise has the capacity to strengthen future surveys directed toward understudied health and development challenges such as MHH.


Author(s):  
Gengbin Zheng ◽  
Abhinav Bhatelé ◽  
Esteban Meneses ◽  
Laxmikant V. Kalé

Large parallel machines with hundreds of thousands of processors are becoming more prevalent. Ensuring good load balance is critical for scaling certain classes of parallel applications on even thousands of processors. Centralized load balancing algorithms suffer from scalability problems, especially on machines with a relatively small amount of memory. Fully distributed load balancing algorithms, on the other hand, tend to take longer to arrive at good solutions. In this paper, we present an automatic dynamic hierarchical load balancing method that overcomes the scalability challenges of centralized schemes and longer running times of traditional distributed schemes. Our solution overcomes these issues by creating multiple levels of load balancing domains which form a tree. This hierarchical method is demonstrated within a measurement-based load balancing framework in Charm++. We discuss techniques to deal with scalability challenges of load balancing at very large scale. We present performance data of the hierarchical load balancing method on up to 16,384 cores of Ranger (at the Texas Advanced Computing Center) and 65,536 cores of Intrepid (the Blue Gene/P at Argonne National Laboratory) for a synthetic benchmark. We also demonstrate the successful deployment of the method in a scientific application, NAMD, with results on Intrepid.
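
The idea of load balancing domains arranged in a tree can be illustrated with a toy two-level sketch. This is not the Charm++ implementation described in the abstract: it assumes task loads are known up front, uses a simple greedy heuristic at both levels, and lets the root see individual tasks, whereas a real hierarchical scheme would work from measured and aggregated load data. The names `greedy_assign` and `hierarchical_balance` are hypothetical.

```python
from typing import List


def greedy_assign(task_loads: List[float], n_bins: int) -> List[List[float]]:
    """Place tasks heaviest-first, each into the currently least-loaded bin."""
    bins: List[List[float]] = [[] for _ in range(n_bins)]
    totals = [0.0] * n_bins
    for load in sorted(task_loads, reverse=True):
        target = totals.index(min(totals))  # first least-loaded bin
        bins[target].append(load)
        totals[target] += load
    return bins


def hierarchical_balance(
    task_loads: List[float], n_groups: int, procs_per_group: int
) -> List[List[List[float]]]:
    """Two-level balancing: the root splits work across groups, then each
    group balances its share across its own processors independently."""
    # Root level: coarse decision about which group receives each task.
    groups = greedy_assign(task_loads, n_groups)
    # Leaf level: every group only looks at its own tasks and processors,
    # so these steps could run in parallel with no global state.
    return [greedy_assign(group_tasks, procs_per_group) for group_tasks in groups]


if __name__ == "__main__":
    placement = hierarchical_balance([8, 7, 6, 5, 4, 3, 2, 1], n_groups=2, procs_per_group=2)
    for g, group in enumerate(placement):
        for p, tasks in enumerate(group):
            print(f"group {g}, processor {p}: tasks={tasks}, load={sum(tasks)}")
```

The point of the structure is that each leaf-level decision involves only one group's tasks, which is what limits the memory and time needed at any single node compared with a fully centralized balancer.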


Electronics ◽  
2019 ◽  
Vol 8 (9) ◽  
pp. 982 ◽  
Author(s):  
Alberto Cascajo ◽  
David E. Singh ◽  
Jesus Carretero

This work presents an HPC framework that provides new strategies for resource management and job scheduling, based on executing different applications in shared compute nodes, maximizing platform utilization. The framework includes a scalable monitoring tool that is able to analyze the platform’s compute node utilization. We also introduce an extension of CLARISSE, a middleware for data-staging coordination and control on large-scale HPC platforms that uses the information provided by the monitor in combination with application-level analysis to detect performance degradation in the running applications. This degradation, caused by the fact that the applications share the compute nodes and may compete for their resources, is avoided by means of dynamic application migration. A description of the architecture, as well as a practical evaluation of the proposal, shows significant performance improvements of up to 20% in the makespan and 10% in energy consumption compared to a non-optimized execution.


1999 ◽  
Vol 103 (1027) ◽  
pp. 443-447 ◽  
Author(s):  
W. McMillan ◽  
M. Woodgate ◽  
B. E. Richards ◽  
B. J. Gribben ◽  
K. J. Badcock ◽  
...  

Motivated by a lack of sufficient local and national computing facilities for computational fluid dynamics simulations, the Affordable Systems Computing Unit (ASCU) was established to investigate low cost alternatives. The options considered have all involved cluster computing, a term which refers to the grouping of a number of components into a managed system capable of running both serial and parallel applications. The present work aims to demonstrate the utility of commodity processors for dedicated batch processing. The performance of the cluster has proved to be extremely cost effective, enabling large three dimensional flow simulations on a computer costing less than £25k sterling at current market prices. The experience gained on this system in terms of single node performance, message passing and parallel performance will be discussed. In particular, comparisons with the performance of other systems will be made. Several medium-large scale CFD simulations performed using the new cluster will be presented to demonstrate the potential of commodity processor based parallel computers for aerodynamic simulation.

