FLASH: Fast Neural Architecture Search with Hardware Optimization

2021 ◽  
Vol 20 (5s) ◽  
pp. 1-26
Author(s):  
Guihong Li ◽  
Sumit K. Mandal ◽  
Umit Y. Ogras ◽  
Radu Marculescu

Neural architecture search (NAS) is a promising technique to design efficient and high-performance deep neural networks (DNNs). As the performance requirements of ML applications continue to grow, hardware accelerators are starting to play a central role in DNN design. This trend makes NAS even more complicated and time-consuming for most real applications. This paper proposes FLASH, a very fast NAS methodology that co-optimizes the DNN accuracy and performance on a real hardware platform. As the main theoretical contribution, we first propose the NN-Degree, an analytical metric to quantify the topological characteristics of DNNs with skip connections (e.g., DenseNets, ResNets, Wide-ResNets, and MobileNets). The newly proposed NN-Degree allows us to perform training-free NAS within one second and build an accuracy predictor by training as few as 25 samples out of a vast search space with more than 63 billion configurations. Second, by performing inference on the target hardware, we fine-tune and validate our analytical models to estimate the latency, area, and energy consumption of various DNN architectures while executing standard ML datasets. Third, we construct a hierarchical algorithm based on simplicial homology global optimization (SHGO) to optimize the model-architecture co-design process, while considering the area, latency, and energy consumption of the target hardware. We demonstrate that, compared to the state-of-the-art NAS approaches, our proposed hierarchical SHGO-based algorithm enables more than four orders of magnitude speedup (specifically, the execution time of the proposed algorithm is about 0.1 seconds). Finally, our experimental evaluations show that FLASH is easily transferable to different hardware architectures, thus enabling us to perform NAS on a Raspberry Pi 3B processor in less than 3 seconds.
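
The hierarchical co-optimization step described above can be sketched with SciPy's implementation of SHGO. Everything below is a made-up toy: the two quadratic surrogate models stand in for the paper's NN-Degree-based accuracy predictor and hardware performance models, and the 2-D "width/depth" knobs are illustrative only.

```python
# Hypothetical toy stand-in for FLASH's co-optimization step: SHGO searches a
# 2-D design space (normalized width/depth knobs) for the configuration that
# maximizes a predicted-accuracy score minus a latency penalty. Both models
# below are invented quadratics, not the paper's actual predictors.
from scipy.optimize import shgo

def predicted_accuracy(w, d):
    # Toy accuracy surrogate, peaking at w=0.6, d=0.4.
    return 1.0 - (w - 0.6) ** 2 - (d - 0.4) ** 2

def predicted_latency(w, d):
    # Toy hardware cost model: larger networks are slower.
    return 0.2 * (w + d)

def objective(x):
    # SHGO minimizes, so negate the combined score.
    w, d = x
    return -(predicted_accuracy(w, d) - predicted_latency(w, d))

result = shgo(objective, bounds=[(0.0, 1.0), (0.0, 1.0)])
best_w, best_d = result.x   # analytic optimum of this toy: (0.5, 0.3)
```

Because SHGO samples the space globally and then refines with local minimization, a smooth objective like this converges essentially exactly; the paper's speed comes from the objective itself being analytic rather than requiring training runs.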

2019 ◽  
Vol 2019 ◽  
pp. 1-12 ◽  
Author(s):  
Sun Min ◽  
Yufeng Bi ◽  
Mulian Zheng ◽  
Sai Chen ◽  
Jingjing Li

The energy consumption and greenhouse gas emissions of asphalt pavement have become a serious global problem. Polyurethane (PU) offers excellent high-temperature stability and durability and has recently been studied as an alternative binder to asphalt. However, the strength-forming mechanism and mixture structure of the PU mixture differ from those of the asphalt mixture. This work explored the design and performance evaluation of the PU mixture. The PU content of the mixtures was determined from the creep slope (K), tensile strength ratio (TSR), immersion Cantabro loss (ICL), and volume of air voids (VV) to ensure good water stability. The high- and low-temperature stability, water stability, dynamic mechanical properties, and sustainability of the PU mixture were evaluated and compared with those of the stone matrix asphalt mixture (SMA). The test results showed that the dynamic stability and bending strain of the PU mixture were about 7.5 and 2.3 times those of SMA, respectively. The adhesion level between PU and basalt aggregate was one level higher than that with limestone, so basalt aggregates were proposed for use in the PU mixture to improve water stability. Although the initial TSR and ICL of the PU mixture were lower, the long-term values were higher; thus, the PU mixture had better long-term water-damage resistance. The dynamic modulus and phase angles (φ) of the PU mixture were much higher. The energy consumption and CO2 emissions of the PU mixture were lower than those of SMA. Therefore, the cold-mixed PU mixture is a sustainable material with excellent performance that can be used as a substitute for the asphalt mixture.


Author(s):  
Cameron L. Mock ◽  
Zachary T. Hamilton ◽  
Dustin Carruthers ◽  
John F. O’Brien

Measures to reduce control performance for greater robustness (e.g. reduced bandwidth, shallow loop roll-off) must be enhanced if the plant or actuators are known to have nonlinear characteristics that cause variations in loop transmission. Common causes of these nonlinear behaviors are actuator saturation and friction/stiction in the moving parts of mechanical systems. Systems with these characteristics that also have stringent closed loop performance requirements present the control designer with an extremely challenging problem. A design method for these systems is presented that combines very aggressive Nyquist-stable linear control to provide large negative feedback with nonlinear feedback to compensate for the effects of multiple nonlinearities in the loop that threaten stability and performance. The efficacy of this approach is experimentally verified on a parallel kinematic mechanism with multiple uncertain nonlinearities used for vibration suppression.


Author(s):  
Chad L. Jacoby ◽  
Young Suk Jo ◽  
Jake Jurewicz ◽  
Guillermo Pamanes ◽  
Joshua E. Siegel ◽  
...  

There exists the potential for major simplifications to current hybrid transmission architectures, which can lead to advances in powertrain performance. This paper assesses the technical merits of various hybrid powertrains in the context of high-performance vehicles and introduces a new transmission concept targeted at high-performance hybrid applications. While many hybrid transmission configurations have been developed and implemented in mainstream and even luxury vehicles, ultra-high-performance sports cars have only recently begun to hybridize. The unique performance requirements of such vehicles place novel constraints on their transmission designs. The goals become less about improved efficiency and smoothness and more centered on weight reduction, complexity reduction, and performance improvement. To identify the most critical aspects of a high-performance transmission, a wide range of existing technologies is studied in concert with basic physical performance analysis of electric motors and an internal combustion engine. The new transmission concepts presented here emphasize a reduction in inertial, frictional, and mechanical losses. A series of conceptual powertrain designs is evaluated against the goals of reducing mechanical complexity and maintaining functionality. The major innovation in these concepts is the elimination of a friction clutch to engage and disengage gears. Instead, the design proposes that the inclusion of a large electric motor enables the gears to be speed-matched and torque-zeroed without the inherent losses associated with a friction clutch. Additionally, these transmission concepts explore the merits of multiple electric motors and their placement as well as the reduction in synchronization interfaces. Ultimately, two strategies for speed-matched gear sets are considered, and a speed-matching prototype of the chosen methodology is presented to validate the feasibility of the proposed concept.
The power flow and operational modes of both transmission architectures are studied to ensure required functionality and identify further areas of optimization. While there are still many unanswered questions about this concept, this paper introduces the base analysis and proof of concept for a technology that has great potential to advance hybrid vehicles at all levels.
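
The clutchless speed-matching idea above reduces to simple kinematics: before engagement, the motor spins the input shaft to the speed the target gear imposes, and the coupling closes only when the residual speed error is small. The sketch below is a hypothetical illustration; all ratios and tolerances are invented, not taken from the prototype.

```python
# Hypothetical sketch of the clutchless shift logic: the traction motor
# speed-matches the target gear before engagement, so no friction clutch is
# needed. Numbers and names are illustrative only.

def speed_match_rpm(wheel_rpm: float, final_drive: float, gear_ratio: float) -> float:
    """Motor speed (rpm) that synchronizes the input shaft with the target gear."""
    return wheel_rpm * final_drive * gear_ratio

# Example: wheels at 600 rpm, a 3.5:1 final drive, and a 2.2:1 target gear
# require the motor to spin the input shaft at 600 * 3.5 * 2.2 = 4620 rpm.
rpm = speed_match_rpm(600.0, 3.5, 2.2)

def can_engage(input_rpm: float, target_rpm: float, tol_rpm: float = 30.0) -> bool:
    """Engage the dog/synchro coupling only once the speed error is within tolerance."""
    return abs(input_rpm - target_rpm) <= tol_rpm
```

Torque-zeroing works the same way on the other axis: the motor's torque command is ramped to zero before disengagement so the coupling carries no load when it opens.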


Processes ◽  
2021 ◽  
Vol 9 (12) ◽  
pp. 2225
Author(s):  
Piotr Hajder ◽  
Łukasz Rauch

Numerical computations are usually associated with High Performance Computing. Nevertheless, both industry and science increasingly involve lower-power devices in computations. This is especially true when the data-collecting devices are able to partially process the data in place, thus increasing system reliability. This paradigm is known as Edge Computing. In this paper, we propose the use of lower-power devices at the edge for multi-scale modelling calculations. A system was created consisting of a high-power device (a two-processor workstation), 8 Raspberry Pi 4B microcomputers, and 8 NVIDIA Jetson Nano units equipped with GPUs. As part of this research, benchmarking was performed, on the basis of which the computational capabilities of the devices were classified. Two parameters were considered: the number and performance of computing units (CPUs and GPUs) and the energy consumption of the loaded machines. Then, using the measured weak scalability and energy consumption, a min-max-based load optimization algorithm was proposed. The system was tested in laboratory conditions, giving similar computation times at the same power consumption for 24 physical workstation cores versus 8 Raspberry Pi 4B and 8 Jetson Nano devices. The work ends with a proposal to use this solution in industrial processes, exemplified by hot rolling of flat products.
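
The min-max idea above can be illustrated with the simplest case: for divisible work, splitting in proportion to each device's benchmarked throughput makes every device finish at the same time, which minimizes the maximum completion time. The throughput numbers below are invented for illustration, not measurements from the paper.

```python
# A minimal sketch of a min-max style split (hypothetical numbers): divisible
# work is partitioned across heterogeneous devices in proportion to their
# benchmarked throughput, so all devices finish (nearly) simultaneously.

def split_work(total_units: float, throughput: dict) -> dict:
    """Assign work proportionally to benchmarked units/second per device."""
    capacity = sum(throughput.values())
    return {dev: total_units * tp / capacity for dev, tp in throughput.items()}

# Illustrative throughputs (units/s), not measured values:
bench = {"workstation": 24.0, "rpi4b_cluster": 8.0, "jetson_cluster": 16.0}
shares = split_work(4800.0, bench)
finish = {dev: shares[dev] / bench[dev] for dev in bench}  # all finish at 100 s
```

The paper's algorithm additionally weighs energy consumption, so the real objective trades off this balanced makespan against the power drawn by each loaded machine.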


Computation ◽  
2020 ◽  
Vol 8 (2) ◽  
pp. 37
Author(s):  
Kaijie Fan ◽  
Biagio Cosenza ◽  
Ben Juurlink

Energy optimization is an increasingly important aspect of today’s high-performance computing applications. In particular, dynamic voltage and frequency scaling (DVFS) has become a widely adopted solution to balance performance and energy consumption, and hardware vendors provide management libraries that allow the programmer to change both memory and core frequencies manually to minimize energy consumption while maximizing performance. This article focuses on modeling the energy consumption and speedup of GPU applications while using different frequency configurations. The task is not straightforward, because of the large set of possible and uniformly distributed configurations and because of the multi-objective nature of the problem, which minimizes energy consumption and maximizes performance. This article proposes a machine learning-based method to predict the best core and memory frequency configurations on GPUs for an input OpenCL kernel. The method is based on two models for speedup and normalized energy predictions over the default frequency configuration. Those are later combined into a multi-objective approach that predicts a Pareto set of frequency configurations. Results show that our approach is very accurate at predicting extrema and the Pareto set, and finds frequency configurations that dominate the default configuration in either energy or performance.
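
The final step described above, turning per-configuration predictions into a Pareto set, can be sketched directly. The configuration names and predicted values below are hypothetical; only the dominance rule (lower energy is better, higher speedup is better) is from the problem statement.

```python
# Sketch of Pareto-set extraction: given per-configuration predictions of
# normalized energy (lower is better) and speedup (higher is better), keep
# only the non-dominated frequency configurations. Predictions are made up.

def pareto_set(points):
    """points: list of (name, energy, speedup); returns non-dominated names."""
    front = []
    for name, e, s in points:
        dominated = any(e2 <= e and s2 >= s and (e2 < e or s2 > s)
                        for _, e2, s2 in points)
        if not dominated:
            front.append(name)
    return front

predictions = [                 # (core/mem MHz config, energy, speedup)
    ("1380/3505", 1.00, 1.00),  # default configuration
    ("1088/3505", 0.80, 0.92),
    ("1380/810",  0.95, 0.70),  # dominated: worse on both axes than 1088/3505
    ("1240/3505", 0.85, 0.97),
]
front = pareto_set(predictions)
```

Anything on this front is a defensible choice; the runtime can then pick from it according to whether the current priority is energy or performance.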


2020 ◽  
Vol 63 (6) ◽  
pp. 880-899
Author(s):  
Lixia Chen ◽  
Jian Li ◽  
Ruhui Ma ◽  
Haibing Guan ◽  
Hans-Arno Jacobsen

Abstract With energy consumption in high-performance computing clouds growing rapidly, energy saving has become an important topic. Virtualization provides opportunities to save energy by enabling one physical machine (PM) to host multiple virtual machines (VMs). Dynamic voltage and frequency scaling (DVFS) is another technology to reduce energy consumption. However, in heterogeneous cloud environments where DVFS may be applied at the chip level or the core level, it is a great challenge to combine these two technologies efficiently. On per-core DVFS servers, cloud managers should carefully determine VM placements to minimize performance interference. On full-chip DVFS servers, cloud managers further face the choice of whether to combine VMs with different characteristics to reduce performance interference or to combine VMs with similar characteristics to take better advantage of DVFS. This paper presents a novel mechanism combining a VM placement algorithm and a frequency scaling method. We formulate this VM placement problem as an integer program (IP) to find appropriate placement configurations, and we utilize support vector machines to select suitable frequencies. We conduct detailed experiments and simulations, showing that our scheme effectively reduces energy consumption with modest impact on performance. Particularly, the total energy delay product is reduced by up to 60%.
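
The reported metric is worth making concrete: the energy delay product (EDP) multiplies energy by execution time, so it penalizes schemes that save energy purely by running slower. The numbers below are hypothetical, chosen only to illustrate a roughly 60% reduction like the one the paper reports.

```python
# Toy illustration of the optimized metric: energy delay product
# (EDP = energy * execution time). Values are invented, not measured.

def edp(energy_j: float, time_s: float) -> float:
    return energy_j * time_s

baseline = edp(1000.0, 50.0)   # e.g., naive placement at maximum frequency
tuned    = edp(520.0, 38.5)    # e.g., interference-aware placement + DVFS
reduction = 1.0 - tuned / baseline   # fractional EDP reduction (~0.60)
```

Note that the tuned configuration here wins on both axes; a configuration that halved energy but tripled runtime would have a *worse* EDP, which is exactly why the metric is used.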


Author(s):  
Juan P. Silva ◽  
Ernesto Dufrechou ◽  
Pablo Ezzatti ◽  
Enrique S. Quintana-Ortí ◽  
Alfredo Remón ◽  
...  

The high performance computing community has traditionally focused solely on the reduction of execution time, though in recent years the optimization of energy consumption has become a major concern. A reduction of energy usage without a degradation of performance requires the adoption of energy-efficient hardware platforms accompanied by the development of energy-aware algorithms and computational kernels. The solution of linear systems is a key operation for many scientific and engineering problems. Its relevance has motivated a significant body of work, and consequently, it is possible to find high performance solvers for a wide variety of hardware platforms. In this work, we aim to develop a high performance and energy-efficient linear system solver. In particular, we develop two solvers for a low-power CPU-GPU platform, the NVIDIA Jetson TK1. These solvers implement the Gauss-Huard algorithm, yielding an efficient usage of the target hardware as well as efficient memory access. The experimental evaluation shows that the novel proposal delivers significant savings in both time and energy consumption when compared with the state-of-the-art solvers for the platform.
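
For readers unfamiliar with the Gauss-Huard algorithm, the following is a textbook-style CPU sketch in NumPy, assuming a well-conditioned system needing no pivoting; it is not the authors' tuned CPU-GPU implementation. The scheme is a Gauss-Jordan variant with the same 2n³/3 flop count as LU: each row is first updated with the previous transformations, then scaled, and then the column above the diagonal is annihilated.

```python
# Textbook sketch of Gauss-Huard elimination on the augmented matrix [A | b];
# after the sweep, the last column holds the solution x. No pivoting is done,
# so this assumes nonzero (well-conditioned) diagonal pivots.
import numpy as np

def gauss_huard(A, b):
    n = len(b)
    M = np.hstack([A.astype(float), b.reshape(-1, 1).astype(float)])
    for k in range(n):
        # 1) update row k with the transformations applied so far
        M[k, k:] -= M[k, :k] @ M[:k, k:]
        # 2) scale row k so its diagonal entry becomes 1
        M[k, k + 1:] /= M[k, k]
        # 3) annihilate column k in all rows above
        M[:k, k + 1:] -= np.outer(M[:k, k], M[k, k + 1:])
    return M[:, n]

A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 4.0])
x = gauss_huard(A, b)   # solves Ax = b; here x = [1.0, 1.0]
```

The appeal for accelerators is that, unlike classic LU solve, there is no separate triangular back-substitution phase, a step that parallelizes poorly on GPUs.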


2020 ◽  
Vol 184 ◽  
pp. 01102
Author(s):  
P. Magudeaswaran ◽  
C. Vivek Kumar ◽  
Rathod Ravinder

High-Performance Concrete (HPC) is a high-quality concrete that must meet special conformity and performance requirements. The objective of this study was to investigate the possibilities of adapting neural expert systems such as the Adaptive Neuro-Fuzzy Inference System (ANFIS) to the development of a simulator and intelligent system for predicting the durability and strength of HPC composites. These soft computing methods emulate the decision-making ability of a human expert, benefiting both the construction industry and the research community. These new methods, if properly utilized, have the potential to increase speed, service life, efficiency, and consistency, and to minimize errors and save time and cost that would otherwise be squandered using conventional approaches.


2020 ◽  
Author(s):  
Lucas Silva ◽  
Michael Canesche ◽  
Ricardo Ferreira ◽  
José Augusto Nacif

Recently, the increasing adoption of domain-specific architectures to execute kernels with high computing density and the exploration of sparse architectures using Systolic Arrays created the ideal scenario for using coarse-grained reconfigurable architectures (CGRAs) to accelerate applications. Unlike Systolic Arrays, CGRAs can run different kernel sets and keep a good balance between energy consumption and performance. In this work, we present HPCGRA, an orthogonally designed CGRA generator for high-performance spatial accelerators. Our tool does not require any expertise in Verilog design. In our approach, the CGRA is designed and implemented in an orthogonal fashion, by wrapping the main building blocks: functional units, interconnection patterns, routing and elastic buffer capabilities, configuration words, and memories. It optimizes and simplifies the process of creating CGRA architectures using a portable description (JSON file) and generating generic, scalable, and efficient Verilog RTL code with Veriloggen. The tool automatically generates CGRAs with up to 46×66 functional units, reaching 1.2 tera-ops/s.
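
To make the "portable description" idea concrete, the snippet below shows the kind of JSON input such a generator consumes and a derived figure of merit. The field names and the 1 GHz clock are invented for illustration; they are not HPCGRA's actual schema or timing.

```python
# Hypothetical example of a portable CGRA description (invented schema, not
# HPCGRA's actual JSON format) and a simple derived peak-throughput figure.
import json

spec_text = """
{
  "rows": 4,
  "cols": 4,
  "functional_units": {"ops": ["add", "mul", "pass"]},
  "interconnect": "mesh",
  "elastic_buffers": true
}
"""

spec = json.loads(spec_text)
num_fus = spec["rows"] * spec["cols"]   # 4x4 grid -> 16 functional units
# Peak throughput at an assumed 1 GHz clock: one op per FU per cycle.
peak_ops_per_s = num_fus * 1e9
```

Keeping the architecture in a declarative file like this is what makes the generator "orthogonal": grid size, FU operations, and interconnect pattern can each be changed without touching the Verilog-emitting code.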


2015 ◽  
Vol 25 (03) ◽  
pp. 1541005
Author(s):  
Alexandra Vintila Filip ◽  
Ana-Maria Oprescu ◽  
Stefania Costache ◽  
Thilo Kielmann

High-Performance Computing (HPC) systems consume large amounts of energy. As the energy consumption predictions for HPC show increasing numbers, it is important to make users aware of the energy spent on the execution of their applications. Drawing from our experience with exposing cost and performance in public clouds, in this paper we present a generic mechanism to compute fast and accurate estimates of the tradeoffs between the performance (expressed as makespan) and the energy consumption of applications running on HPC clusters. We validate our approach with a prototype, called E-BaTS, on a wide variety of HPC bags-of-tasks. Our experiments show that E-BaTS produces conservative estimates with errors below 5%, while requiring at most 12% of the energy and time of an exhaustive search to provide configurations close to the optimal ones in terms of the trade-off between energy consumption and makespan.
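
The kind of estimate described above can be sketched with a deliberately crude model: for a bag of identical tasks, makespan scales inversely with frequency while power grows roughly quadratically with it, and the tool picks the most energy-efficient configuration that still meets a makespan budget. All constants below (task count, task time, node power, frequencies) are invented, and the scaling laws are simplifications, not E-BaTS's actual models.

```python
# Minimal makespan/energy tradeoff model (all numbers hypothetical): estimate
# both quantities for candidate (nodes, frequency) configurations of a
# bag-of-tasks run, then pick the lowest-energy configuration within budget.

def estimate(tasks, task_s, nodes, freq_ghz, base_ghz=2.0, node_w=80.0):
    """Makespan scales ~1/freq; node power grows ~quadratically with freq."""
    makespan = (tasks / nodes) * task_s * (base_ghz / freq_ghz)
    power = nodes * node_w * (freq_ghz / base_ghz) ** 2
    return makespan, makespan * power          # (seconds, joules)

configs = [(8, 2.0), (8, 1.5), (16, 2.0), (16, 1.5)]
budget_s = 1300.0
feasible = [estimate(1000, 10.0, n, f) + (n, f)
            for n, f in configs
            if estimate(1000, 10.0, n, f)[0] <= budget_s]
best = min(feasible, key=lambda t: t[1])   # lowest-energy feasible config
```

With these toy numbers the winner is the wider, down-clocked configuration (16 nodes at 1.5 GHz): spreading work lets each node run slower, and the quadratic power law makes that a net energy win despite the longer makespan.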

