Performance and power analysis for high performance computation benchmarks


Integrating Scientific Workflows with Scientific Gateways: A Bioinformatics Experiment in the Brazilian National High-Performance Computing Network

DOI: 10.5753/bresci.2016.9124

2020

Author(s): Maria Luiza Mondelli, Marcelo Monteiro Galheigo, Vívian Medeiros, Bruno F. Bastos, Antônio Tadeu Azevedo Gomes, ...

Keyword(s): High Performance, Workflow Management, Scientific Workflow, Scientific Workflows, Management Systems, Workflow Management Systems, Sequencing Technologies, Geographically Distributed, Performance Computing, High Performance Computation

Bioinformatics experiments are rapidly and constantly evolving due to improvements in sequencing technologies. These experiments usually demand high performance computation and produce huge quantities of data. They also require different programs to be executed in a certain order, allowing the experiments to be modeled as workflows. However, users do not always have the infrastructure needed to perform these experiments. Our contribution is the integration of scientific workflow management systems and grid-enabled scientific gateways, providing the user with a transparent way to run these workflows on geographically distributed computing resources. Making the workflow available through the gateway improves the usability of these experiments.


A performance estimation model for high-performance computing on clouds

4th IEEE International Conference on Cloud Computing Technology and Science Proceedings

DOI: 10.1109/cloudcom.2012.6427567

2012

Cited By ~ 3

Author(s): Jih-Sheng Chang, Ruay-Shiung Chang

Keyword(s): High Performance Computing, High Performance, Performance Estimation, Estimation Model, Performance Estimation Model, Performance Computing, A Performance


A performance model for the communication in fast multipole methods on high-performance computing platforms

The International Journal of High Performance Computing Applications

DOI: 10.1177/1094342016634819

2016, Vol 30 (4), pp. 423-437

Cited By ~ 6

Author(s): Huda Ibeid, Rio Yokota, David Keyes

Keyword(s): High Performance Computing, High Performance, Performance Model, Fast Multipole, Fast Multipole Methods, Multipole Methods, Computing Platforms, Performance Computing, A Performance


Analytical Performance Estimation for Large-Scale Reconfigurable Dataflow Platforms

ACM Transactions on Reconfigurable Technology and Systems

DOI: 10.1145/3452742

2021, Vol 14 (3), pp. 1-21

Author(s): Ryota Yasudo, José G. F. Coutinho, Ana-Lucia Varbanescu, Wayne Luk, Hideharu Amano, ...

Keyword(s): High Performance Computing, High Performance, Large Scale, Heterogeneous Systems, Performance Estimation, Performance Impact, Accurate Performance, Computing Platforms, Reduced Power Consumption, Performance Computing

Next-generation high-performance computing platforms will handle extreme data- and compute-intensive problems that are intractable with today’s technology. A promising path to the next leap in high-performance computing is to embrace heterogeneity and specialised computing in the form of reconfigurable accelerators such as FPGAs, which have been shown to speed up compute-intensive tasks with reduced power consumption. However, assessing the feasibility of large-scale heterogeneous systems requires fast and accurate performance prediction. This article proposes Performance Estimation for Reconfigurable Kernels and Systems (PERKS), a novel performance estimation framework for reconfigurable dataflow platforms. PERKS makes use of an analytical model with machine and application parameters for predicting the performance of multi-accelerator systems and detecting their bottlenecks. Model calibration is automatic, making the model flexible and usable for different machine configurations and applications, including hypothetical ones. Our experimental results show that PERKS can predict the performance of current workloads on reconfigurable dataflow platforms with an accuracy above 91%. The results also illustrate how the modelling scales to large workloads, and how the performance impact of architectural features can be estimated in seconds.


Introduction to Dataflow Computing

Advances in Systems Analysis, Software Engineering, and High Performance Computing - Handbook of Research on Methodologies and Applications of Supercomputing

DOI: 10.4018/978-1-7998-7156-9.ch006

2021, pp. 96-105

Author(s): Nenad Korolija, Jovan Popović, Miroslav M. Bojović

Keyword(s): Big Data, Power Consumption, High Performance Computing, High Performance, Programming Tools, Dataflow Computing, Dataflow Architecture, Significant Performance, Performance Gains, Performance Computing

This chapter presents the possibilities for obtaining significant performance gains from advanced implementations of algorithms on dataflow hardware. A framework built on top of the dataflow architecture that provides tools for such implementations is also described. In particular, the authors point out the following issues of interest for accelerating algorithms: (1) the dataflow paradigm appears suitable for executing a certain set of algorithms for high performance computing, namely algorithms that work with big data, as well as algorithms that include many repetitions of the same set of instructions; (2) a dataflow architecture can be configured using appropriate programming tools that define the hardware by generating VHDL files; (3) besides accelerating algorithms, a dataflow architecture also reduces power consumption, which is an important security factor in edge computing.


Power and Performance Management of GPUs Based Cluster

International Journal of Cloud Applications and Computing

DOI: 10.4018/ijcac.2012100102

2012, Vol 2 (4), pp. 16-31

Cited By ~ 9

Author(s): Yaser Jararweh, Salim Hariri

Keyword(s): Power Consumption, Performance Management, High Performance, Peak Performance, Gpu Cluster, Management Framework, High Productivity, Computing Industry, And Performance, Performance Computing

Power consumption in GPU-based clusters has become the major obstacle to the adoption of high productivity GPU accelerators in the high performance computing industry. The power consumed by GPU chips represents about 75% of the total power consumption of a GPU-based cluster. This is because the GPU cards are often configured for peak performance and are consequently active all the time. In this paper, the authors present a holistic power and performance management framework that reduces the power consumption of the GPU-based cluster and maintains the system performance within an acceptable predefined threshold. The framework dynamically scales the GPU cluster to adapt to variations in the requirements of the incoming workload and to increase the idleness of the GPU devices, allowing them to transition to a low-power state. The proposed power and performance management framework demonstrated 46.3% power savings for GPU workloads while maintaining the cluster performance. The overhead of the proposed framework on normal application/system operations and services is insignificant.


Intra- and Inter-Server Smart Task Scheduling for Profit and Energy Optimization of HPC Data Centers

Journal of Low Power Electronics and Applications

DOI: 10.3390/jlpea10040032

2020, Vol 10 (4), pp. 32

Author(s): Sayed Ashraf Mamun, Alexander Gilday, Amit Kumar Singh, Amlan Ganguly, Geoff V. Merrett, ...

Keyword(s): Energy Consumption, Power Consumption, Data Center, High Performance, Data Centers, Virtual Machines, For Profit, Dynamic Voltage, High Power Consumption, Performance Computing

Servers in a data center are underutilized due to over-provisioning, which contributes heavily toward the high power consumption of data centers. Recent research on optimizing the energy consumption of High Performance Computing (HPC) data centers mostly focuses on consolidation of Virtual Machines (VMs) and on dynamic voltage and frequency scaling (DVFS). These approaches are inherently hardware-based, are frequently unique to individual systems, and often use simulation due to lack of access to HPC data centers. Other approaches require profiling information on the jobs in the HPC system to be available before run-time. In this paper, we propose a reinforcement learning based approach, which jointly optimizes profit and energy in the allocation of jobs to available resources, without the need for such prior information. The approach is implemented in a software scheduler used to allocate real applications from the Princeton Application Repository for Shared-Memory Computers (PARSEC) benchmark suite to a number of hardware nodes realized with Odroid-XU3 boards. Experiments show that the proposed approach increases the profit earned by 40% while simultaneously reducing energy consumption by 20% when compared to a heuristic-based approach. We also present a network-aware server consolidation algorithm for HPC data centers, called Bandwidth-Constrained Consolidation (BCC), which can address the under-utilization problem of the servers. Our experiments show that the BCC consolidation technique can reduce the power consumption of a data center by up to 37%.


Integrating Scientific Workflows with Scientific Gateways: A Bioinformatics Experiment in the Brazilian National High-Performance Computing Network

DOI: 10.5753/bresci.2016.10010

2018

Author(s): Maria Luiza Mondelli, Marcelo Monteiro Galheigo, Vívian Medeiros, Bruno F. Bastos, Antônio Tadeu Azevedo Gomes, ...

Keyword(s): High Performance Computing, High Performance, Workflow Management, Scientific Workflow, Management Systems, Workflow Management Systems, Sequencing Technologies, Geographically Distributed, Performance Computing, High Performance Computation

Bioinformatics experiments are rapidly and constantly evolving due to improvements in sequencing technologies. These experiments usually demand high performance computation and produce huge quantities of data. They also require different programs to be executed in a certain order, allowing the experiments to be modeled as workflows. However, users do not always have the infrastructure needed to perform these experiments. Our contribution is the integration of scientific workflow management systems and grid-enabled scientific gateways, providing the user with a transparent way to run these workflows on geographically distributed computing resources. Making the workflow available through the gateway improves the usability of these experiments.


Constructing a Bioinformatics Platform with Web and Mobile Services Based on NVIDIA Jetson TK1

Data Analytics in Medicine

DOI: 10.4018/978-1-7998-1204-3.ch035

2020, pp. 629-644

Author(s): Chun-Yuan Lin, Jin Ye, Che-Lun Hung, Chung-Hung Wang, Min Su, ...

Keyword(s): Power Consumption, High Performance Computing, Graphics Processing Units, High Performance, Low Cost, Research Direction, Mobile Services, Computing Platform, The Cost, Performance Computing

Current high-end graphics processing units (GPUs), such as NVIDIA Tesla, Fermi, and Kepler series cards, which contain up to thousands of cores per chip, are widely used in high performance computing. These GPU cards (called desktop GPUs) must be installed in personal computers or servers with desktop CPUs; moreover, the cost and power consumption of constructing a high performance computing platform with these desktop CPUs and GPUs are high. NVIDIA released the Jetson TK1, an embedded board built around the Tegra K1, which contains four ARM Cortex-A15 CPU cores and 192 CUDA cores (Kepler GPU) and offers low cost, low power consumption, and high applicability for embedded applications. The NVIDIA Jetson TK1 has thus become a new research direction. Hence, in this paper, a bioinformatics platform was constructed based on the NVIDIA Jetson TK1. The ClustalWtk and MCCtk tools were designed on this platform for sequence alignment and compound comparison, respectively. Moreover, web and mobile services with user-friendly interfaces were provided for these two tools. The experimental results showed that the cost-performance ratio of the NVIDIA Jetson TK1 was higher than that of the Intel Xeon E5-2650 CPU and the NVIDIA Tesla K20m GPU card.
