Performance and power analysis for high performance computation benchmarks

2013 ◽  
Vol 3 (1) ◽  
pp. 1-16
Author(s):  
Joseph Issa

Performance and power consumption analysis and characterization of computational benchmarks are important for processor designers and benchmark developers. In this paper, we characterize and analyze different High Performance Computing workloads. We analyze benchmark characteristics and behavior on various processors and propose an analytical performance estimation model to predict performance for different processor microarchitecture parameters. The performance model is verified to predict performance within a <5% error margin between estimated and measured data for different processors. We also propose an analytical power estimation model to estimate power consumption with low error deviation.
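
The <5% error margin between estimated and measured data can be read as a relative error per processor. The following is a minimal sketch under that assumption; the function name and the benchmark scores are hypothetical and are not taken from the paper.

```python
def relative_error_pct(estimated: float, measured: float) -> float:
    """Percentage deviation of an estimate from the measured value."""
    if measured == 0:
        raise ValueError("measured value must be non-zero")
    return abs(estimated - measured) / abs(measured) * 100.0


# Hypothetical (estimated, measured) benchmark scores, for illustration only.
pairs = {"cpu_a": (97.0, 100.0), "cpu_b": (208.0, 200.0)}

# Per-processor error, and a check that every estimate stays under 5%.
errors = {name: relative_error_pct(est, meas) for name, (est, meas) in pairs.items()}
within_margin = all(err < 5.0 for err in errors.values())
```

A model would count as verified in this sense only if `within_margin` holds across all processors it is evaluated on.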

Author(s):  
Chun-Yuan Lin ◽  
Jin Ye ◽  
Che-Lun Hung ◽  
Chung-Hung Wang ◽  
Min Su ◽  
...  

Current high-end graphics processing units (GPUs), such as the NVIDIA Tesla, Fermi, and Kepler series cards, which contain up to a thousand cores per chip, are widely used in high performance computing. These GPU cards (called desktop GPUs) must be installed in personal computers or servers with desktop CPUs; moreover, the cost and power consumption of building a high performance computing platform from these desktop CPUs and GPUs are high. NVIDIA has released the Tegra K1 as the Jetson TK1, an embedded board that contains 4 ARM Cortex-A15 CPUs and 192 CUDA cores (Kepler GPU) and offers low cost, low power consumption, and high applicability for embedded applications. The NVIDIA Jetson TK1 has thus become a new research direction. Hence, in this paper, a bioinformatics platform was constructed based on the NVIDIA Jetson TK1. The ClustalWtk and MCCtk tools were designed on this platform for sequence alignment and compound comparison, respectively. Moreover, web and mobile services with user-friendly interfaces were provided for these two tools. The experimental results showed that the cost-performance ratio of the NVIDIA Jetson TK1 is higher than that of the Intel XEON E5-2650 CPU and the NVIDIA Tesla K20m GPU card.


2020 ◽  
Author(s):  
Maria Luiza Mondelli ◽  
Marcelo Monteiro Galheigo ◽  
Vívian Medeiros ◽  
Bruno F. Bastos ◽  
Antônio Tadeu Azevedo Gomes ◽  
...  

Bioinformatics experiments are rapidly and constantly evolving due to improvements in sequencing technologies. These experiments usually demand high performance computation and produce huge quantities of data. They also require different programs to be executed in a certain order, allowing the experiments to be modeled as workflows. However, users do not always have the infrastructure needed to perform these experiments. Our contribution is the integration of scientific workflow management systems and grid-enabled scientific gateways, providing the user with a transparent way to run these workflows on geographically distributed computing resources. Making the workflow available through the gateway improves the usability of these experiments.


2021 ◽  
Vol 14 (3) ◽  
pp. 1-21
Author(s):  
Ryota Yasudo ◽  
José G. F. Coutinho ◽  
Ana-Lucia Varbanescu ◽  
Wayne Luk ◽  
Hideharu Amano ◽  
...  

Next-generation high-performance computing platforms will handle extreme data- and compute-intensive problems that are intractable with today’s technology. A promising path in achieving the next leap in high-performance computing is to embrace heterogeneity and specialised computing in the form of reconfigurable accelerators such as FPGAs, which have been shown to speed up compute-intensive tasks with reduced power consumption. However, assessing the feasibility of large-scale heterogeneous systems requires fast and accurate performance prediction. This article proposes Performance Estimation for Reconfigurable Kernels and Systems (PERKS), a novel performance estimation framework for reconfigurable dataflow platforms. PERKS makes use of an analytical model with machine and application parameters for predicting the performance of multi-accelerator systems and detecting their bottlenecks. Model calibration is automatic, making the model flexible and usable for different machine configurations and applications, including hypothetical ones. Our experimental results show that PERKS can predict the performance of current workloads on reconfigurable dataflow platforms with an accuracy above 91%. The results also illustrate how the modelling scales to large workloads, and how performance impact of architectural features can be estimated in seconds.


Author(s):  
Nenad Korolija ◽  
Jovan Popović ◽  
Miroslav M. Bojović

This chapter presents the possibilities for obtaining significant performance gains from advanced implementations of algorithms on dataflow hardware. A framework built on top of the dataflow architecture that provides tools for such implementations is also described. In particular, the authors point out the following issues of interest for accelerating algorithms: (1) the dataflow paradigm appears suitable for executing a certain set of high performance computing algorithms, namely algorithms that work with big data and algorithms that repeat the same set of instructions many times; (2) a dataflow architecture can be configured using appropriate programming tools that define the hardware by generating VHDL files; (3) besides accelerating algorithms, a dataflow architecture also reduces power consumption, which is an important security factor in edge computing.


2012 ◽  
Vol 2 (4) ◽  
pp. 16-31 ◽  
Author(s):  
Yaser Jararweh ◽  
Salim Hariri

Power consumption in GPU-based clusters has become the major obstacle to the adoption of high-productivity GPU accelerators in the high performance computing industry. The power consumed by GPU chips represents about 75% of the total power consumption of a GPU-based cluster. This is because GPU cards are often configured for peak performance and consequently remain active all the time. In this paper, the authors present a holistic power and performance management framework that reduces the power consumption of a GPU-based cluster while keeping system performance within an acceptable predefined threshold. The framework dynamically scales the GPU cluster to adapt to variation in the incoming workload's requirements and increases the idleness of GPU devices, allowing them to transition to a low-power state. The proposed framework demonstrated 46.3% power savings for GPU workloads while maintaining cluster performance. The overhead of the framework on normal application/system operations and services is insignificant.
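
The idea of scaling the number of active GPUs with the incoming workload can be illustrated with a simple capacity rule. This is a sketch only, not the authors' framework: the function `gpus_needed`, its `headroom` parameter, and the numeric values are assumptions introduced for illustration.

```python
import math


def gpus_needed(workload: float, capacity_per_gpu: float,
                total_gpus: int, headroom: float = 0.1) -> int:
    """Number of GPUs to keep active; the rest may enter a low-power state.

    workload and capacity_per_gpu share the same (hypothetical) unit,
    e.g. requests per second. headroom reserves spare capacity so that
    performance stays within an acceptable threshold.
    """
    if capacity_per_gpu <= 0 or total_gpus <= 0:
        raise ValueError("capacity_per_gpu and total_gpus must be positive")
    required = math.ceil(workload * (1.0 + headroom) / capacity_per_gpu)
    # Keep at least one GPU active and never exceed the cluster size.
    return max(1, min(total_gpus, required))


# Hypothetical cluster of 8 GPUs, each handling 100 units of work.
active = gpus_needed(workload=250, capacity_per_gpu=100, total_gpus=8)
idle = 8 - active  # candidates for transition to a low-power state
```

Re-evaluating such a rule as the workload changes is what lets idle devices accumulate instead of every card running at peak configuration.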


2020 ◽  
Vol 10 (4) ◽  
pp. 32
Author(s):  
Sayed Ashraf Mamun ◽  
Alexander Gilday ◽  
Amit Kumar Singh ◽  
Amlan Ganguly ◽  
Geoff V. Merrett ◽  
...  

Servers in a data center are underutilized due to over-provisioning, which contributes heavily to the high power consumption of data centers. Recent research on optimizing the energy consumption of High Performance Computing (HPC) data centers mostly focuses on consolidation of Virtual Machines (VMs) and on dynamic voltage and frequency scaling (DVFS). These approaches are inherently hardware-based, are frequently unique to individual systems, and often rely on simulation due to lack of access to HPC data centers. Other approaches require profiling information on the jobs in the HPC system to be available before run-time. In this paper, we propose a reinforcement learning based approach, which jointly optimizes profit and energy in the allocation of jobs to available resources, without the need for such prior information. The approach is implemented in a software scheduler used to allocate real applications from the Princeton Application Repository for Shared-Memory Computers (PARSEC) benchmark suite to a number of hardware nodes realized with Odroid-XU3 boards. Experiments show that the proposed approach increases the profit earned by 40% while simultaneously reducing energy consumption by 20% when compared to a heuristic-based approach. We also present a network-aware server consolidation algorithm for HPC data centers, called Bandwidth-Constrained Consolidation (BCC), which addresses the under-utilization problem of the servers. Our experiments show that the BCC consolidation technique can reduce the power consumption of a data center by up to 37%.
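
To make the notion of network-aware consolidation concrete, the sketch below packs jobs onto as few servers as possible while respecting both a CPU and a bandwidth limit. It uses a generic first-fit-decreasing heuristic as a stand-in; it is not the BCC algorithm from the paper, and the `Server` class, the `consolidate` function, and all job values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    cpu_capacity: int
    bw_capacity: int
    cpu_used: int = 0
    bw_used: int = 0
    jobs: list = field(default_factory=list)

    def fits(self, cpu: int, bw: int) -> bool:
        # A job fits only if both CPU and bandwidth limits are respected.
        return (self.cpu_used + cpu <= self.cpu_capacity
                and self.bw_used + bw <= self.bw_capacity)

    def place(self, name: str, cpu: int, bw: int) -> None:
        self.jobs.append(name)
        self.cpu_used += cpu
        self.bw_used += bw


def consolidate(jobs, cpu_capacity: int, bw_capacity: int) -> list:
    """First-fit decreasing by CPU demand; jobs are (name, cpu, bw) tuples."""
    servers = []
    for name, cpu, bw in sorted(jobs, key=lambda j: j[1], reverse=True):
        if cpu > cpu_capacity or bw > bw_capacity:
            raise ValueError(f"job {name} exceeds a single server's capacity")
        target = next((s for s in servers if s.fits(cpu, bw)), None)
        if target is None:  # no powered-on server can host it: start another
            target = Server(cpu_capacity, bw_capacity)
            servers.append(target)
        target.place(name, cpu, bw)
    return servers


# Hypothetical jobs: (name, CPU cores, bandwidth units).
jobs = [("a", 4, 1), ("b", 4, 9), ("c", 2, 1), ("d", 2, 8)]
loose = consolidate(jobs, cpu_capacity=8, bw_capacity=10)
tight = consolidate(jobs, cpu_capacity=8, bw_capacity=9)
```

With these illustrative numbers, tightening the bandwidth limit forces an extra server to stay powered on even though CPU capacity is unchanged, which is why consolidation that ignores the network can overestimate the achievable savings.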


2018 ◽  
Author(s):  
Maria Luiza Mondelli ◽  
Marcelo Monteiro Galheigo ◽  
Vívian Medeiros ◽  
Bruno F. Bastos ◽  
Antônio Tadeu Azevedo Gomes ◽  
...  

Bioinformatics experiments are rapidly and constantly evolving due to improvements in sequencing technologies. These experiments usually demand high performance computation and produce huge quantities of data. They also require different programs to be executed in a certain order, allowing the experiments to be modeled as workflows. However, users do not always have the infrastructure needed to perform these experiments. Our contribution is the integration of scientific workflow management systems and grid-enabled scientific gateways, providing the user with a transparent way to run these workflows on geographically distributed computing resources. Making the workflow available through the gateway improves the usability of these experiments.


2020 ◽  
pp. 629-644
Author(s):  
Chun-Yuan Lin ◽  
Jin Ye ◽  
Che-Lun Hung ◽  
Chung-Hung Wang ◽  
Min Su ◽  
...  

Current high-end graphics processing units (GPUs), such as the NVIDIA Tesla, Fermi, and Kepler series cards, which contain up to a thousand cores per chip, are widely used in high performance computing. These GPU cards (called desktop GPUs) must be installed in personal computers or servers with desktop CPUs; moreover, the cost and power consumption of building a high performance computing platform from these desktop CPUs and GPUs are high. NVIDIA has released the Tegra K1 as the Jetson TK1, an embedded board that contains 4 ARM Cortex-A15 CPUs and 192 CUDA cores (Kepler GPU) and offers low cost, low power consumption, and high applicability for embedded applications. The NVIDIA Jetson TK1 has thus become a new research direction. Hence, in this paper, a bioinformatics platform was constructed based on the NVIDIA Jetson TK1. The ClustalWtk and MCCtk tools were designed on this platform for sequence alignment and compound comparison, respectively. Moreover, web and mobile services with user-friendly interfaces were provided for these two tools. The experimental results showed that the cost-performance ratio of the NVIDIA Jetson TK1 is higher than that of the Intel XEON E5-2650 CPU and the NVIDIA Tesla K20m GPU card.

