Exploring Domain-Specific Architectures for Energy-Efficient Wearable Computing

Author(s): Dhruv Gajaria, Tosiron Adegbija

Author(s): Chien-Ping Lu

Artificial Intelligence (AI) was the inspiration that shaped computing as we know it today. In this article, I explore why and how AI will continue to inspire computing and reinvent it as Moore's Law runs out of steam. At the dawn of computing, Alan Turing proposed that instead of comprising many different special-purpose machines, the computing machinery for AI should be a Universal Digital Computer, modeled after human computers, who carried out calculations with pencil and paper. Based on the belief that a digital computer would be significantly faster, more diligent, and more patient than a human, he anticipated that AI would advance as software. In modern terminology, a universal computer is designed to understand a language known as an Instruction Set Architecture (ISA), and software is translated into that ISA. Since then, universal computers have become exponentially faster and more energy efficient through Moore's Law, while software has grown ever more sophisticated. Even though software has not yet made a machine think, it has fundamentally changed how we live. The computing revolution started when software was decoupled from the computing machinery. Since the slowdown of Moore's Law in 2005, however, the universal computer has no longer improved exponentially in speed and energy efficiency. It must carry its ISA legacy and cannot be aggressively modified to save energy. Turing's proposition of AI as software is thus challenged, and the temptation to build many domain-specific AI machines emerges. Thanks to Deep Learning, software can stay decoupled from the computing machinery in the language of linear algebra, which it shares with supercomputing. A new universal computer for AI understands this language natively, becoming a Native Supercomputer. AI has been, and will remain, the inspiration for computing. The quest to make machines think continues amid the slowdown of Moore's Law. AI might not only maximize the remaining benefits of Moore's Law but also revive it beyond current technology.
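The abstract's claim that Deep Learning keeps software decoupled from hardware "in the language of linear algebra" is concrete enough to illustrate. The sketch below is not from the article; the layer sizes and random data are illustrative. It shows how a fully connected layer reduces to a matrix product plus a nonlinearity, the kind of primitive a "Native Supercomputer" would execute directly, with no ISA-level coupling to the model.

```python
import numpy as np

# Illustrative sketch (not from the article): a deep-learning layer
# expressed purely in linear algebra. Any machine that executes GEMM
# natively can run this workload without knowing the model's details.

rng = np.random.default_rng(0)
batch, d_in, d_out = 32, 256, 128

x = rng.standard_normal((batch, d_in))   # input activations
W = rng.standard_normal((d_in, d_out))   # learned weights
b = rng.standard_normal(d_out)           # learned bias

y = np.maximum(x @ W + b, 0.0)           # GEMM + ReLU: the whole layer
print(y.shape)                           # (32, 128)
```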


Author(s): Soyeon Kim, Sanghoon Kang, Donghyeon Han, Sangyeob Kim, Sangjin Kim, ...

2021, Vol. 20 (5s), pp. 1-24
Author(s): Febin P. Sunny, Asif Mirza, Mahdi Nikdast, Sudeep Pasricha

Domain-specific neural network accelerators have garnered attention because of their improved energy efficiency and inference performance compared to CPUs and GPUs. Such accelerators are thus well suited for resource-constrained embedded systems. However, mapping sophisticated neural network models on these accelerators still entails significant energy and memory consumption, along with high inference time overhead. Binarized neural networks (BNNs), which utilize single-bit weights, represent an efficient way to implement and deploy neural network models on accelerators. In this paper, we present a novel optical-domain BNN accelerator, named ROBIN, which intelligently integrates heterogeneous microring resonator optical devices with complementary capabilities to efficiently implement the key functionalities in BNNs. We perform detailed fabrication-process variation analyses at the optical device level, explore efficient corrective tuning for these devices, and integrate circuit-level optimization to counter thermal variations. As a result, our proposed ROBIN architecture possesses the desirable traits of robustness, energy efficiency, low latency, and high throughput when executing BNN models. Our analysis shows that ROBIN can outperform the best-known optical BNN accelerators and many electronic accelerators. Specifically, our energy-efficient ROBIN design exhibits energy-per-bit values that are ∼4× lower than electronic BNN accelerators and ∼933× lower than a recently proposed photonic BNN accelerator, while a performance-efficient ROBIN design shows ∼3× and ∼25× better performance than electronic and photonic BNN accelerators, respectively.
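ROBIN realizes BNN functionality optically with microring resonators, which cannot be reproduced here, but the single-bit arithmetic such accelerators implement can be. Below is a minimal sketch, assuming the standard XNOR/popcount formulation of a binarized dot product that BNN accelerators typically realize in some form; the vector length and data are illustrative, not from the paper.

```python
import numpy as np

# Hedged sketch of the arithmetic behind a BNN accelerator. With weights
# and activations constrained to {-1, +1} (encoded as bits 0/1), a dot
# product reduces to XOR (the complement of XNOR) plus a popcount:
#   dot = (#matching bits) - (#mismatching bits) = n - 2 * popcount(a XOR b)

def binary_dot(a_bits: np.ndarray, b_bits: np.ndarray) -> int:
    """Dot product of two {-1,+1} vectors given as 0/1 bit arrays."""
    n = a_bits.size
    mismatches = np.count_nonzero(a_bits ^ b_bits)  # XOR + popcount
    return n - 2 * mismatches

rng = np.random.default_rng(1)
a = rng.integers(0, 2, 64, dtype=np.uint8)
b = rng.integers(0, 2, 64, dtype=np.uint8)

# Check against the full-precision dot product of the +/-1 vectors.
assert binary_dot(a, b) == int((2 * a.astype(int) - 1) @ (2 * b.astype(int) - 1))
```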


2021, Vol. 20 (5s), pp. 1-26
Author(s): Robert Khasanov, Julian Robledo, Christian Menard, Andrés Goens, Jeronimo Castrillon

Advancing telecommunication standards continuously push for larger bandwidths, lower latencies, and faster data rates. The receiver baseband unit must deal not only with a huge number of users expecting connectivity but also with high workload heterogeneity. As a consequence of the required flexibility, baseband processing has seen a trend towards software implementations in cloud Radio Access Networks (cRANs). The flexibility gained from software implementation, however, comes at the price of impoverished energy efficiency. This paper addresses the trade-off between flexibility and efficiency by proposing a domain-specific hybrid mapping algorithm. Hybrid mapping is an established approach from the model-based design of embedded systems that retains flexibility while targeting heterogeneous hardware. Depending on the current workload, the runtime system selects the most energy-efficient mapping configuration that does not violate timing constraints. We leverage the structure of baseband processing and refine the scheduling methodology to enable efficient mapping of hundreds of tasks at millisecond granularity, improving upon state-of-the-art hybrid approaches. We validate our approach on an Odroid XU4 and on virtual platforms with application-specific accelerators on an open-source prototype. On different LTE workloads, our hybrid approach shows significant improvements both at design time and at runtime. At design time, mappings of similar quality to those obtained by state-of-the-art methods are generated around four orders of magnitude faster. At runtime, multi-application schedules are computed 37.7% faster than the state of the art without compromising quality.
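The runtime half of a hybrid mapping flow is easy to sketch. The toy below is not the paper's algorithm; it assumes a hypothetical design-time Pareto set of mapping configurations with illustrative time and energy numbers, and shows only the selection rule the abstract describes: pick the most energy-efficient configuration that still meets the deadline for the current workload.

```python
from dataclasses import dataclass

# Hypothetical runtime selection step of a hybrid mapping flow.
# Design time produces Pareto-optimal mapping configurations per
# application (execution time vs. energy); the runtime chooses among them.

@dataclass
class Mapping:
    name: str
    exec_time_ms: float   # worst-case execution time of this configuration
    energy_mj: float      # energy consumed by this configuration

def select_mapping(pareto: list[Mapping], deadline_ms: float) -> Mapping:
    """Most energy-efficient mapping that does not violate the deadline."""
    feasible = [m for m in pareto if m.exec_time_ms <= deadline_ms]
    if not feasible:
        raise RuntimeError("no mapping meets the timing constraint")
    return min(feasible, key=lambda m: m.energy_mj)

# Illustrative design-time Pareto front for one baseband task.
pareto = [
    Mapping("4 big cores",       exec_time_ms=0.6, energy_mj=9.0),
    Mapping("2 big + 2 LITTLE",  exec_time_ms=0.9, energy_mj=6.5),
    Mapping("4 LITTLE cores",    exec_time_ms=1.8, energy_mj=4.0),
    Mapping("1 LITTLE + accel.", exec_time_ms=2.5, energy_mj=2.2),
]

print(select_mapping(pareto, deadline_ms=1.0).name)  # "2 big + 2 LITTLE"
```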


2019, Vol. 12 (10), pp. 4425-4441
Author(s): Andreas Müller, Willem Deconinck, Christian Kühnlein, Gianmarco Mengaldo, Michael Lange, ...

Abstract. In the simulation of complex multi-scale flows arising in weather and climate modelling, one of the biggest challenges is to satisfy strict service requirements in terms of time to solution and to satisfy budgetary constraints in terms of energy to solution, without compromising the accuracy and stability of the application. These simulations require algorithms that minimise the energy footprint along with the time required to produce a solution, maintain the physically required level of accuracy, are numerically stable, and are resilient in case of hardware failure. The European Centre for Medium-Range Weather Forecasts (ECMWF) led the ESCAPE (Energy-efficient Scalable Algorithms for Weather Prediction at Exascale) project, funded by Horizon 2020 (H2020) under the FET-HPC (Future and Emerging Technologies in High Performance Computing) initiative. The goal of ESCAPE was to develop a sustainable strategy to evolve weather and climate prediction models to next-generation computing technologies. The project partners incorporate the expertise of leading European regional forecasting consortia, university research, experienced high-performance computing centres, and hardware vendors. This paper presents an overview of the ESCAPE strategy: (i) identify domain-specific key algorithmic motifs in weather prediction and climate models (which we term Weather & Climate Dwarfs), (ii) categorise them in terms of computational and communication patterns, (iii) adapt them to different hardware architectures with alternative programming models, (iv) analyse the challenges in optimising them, and (v) find alternative algorithms for the same scheme. The participating weather prediction models are the following: IFS (Integrated Forecasting System); ALARO, a combination of AROME (Application de la Recherche à l'Opérationnel à Meso-Echelle) and ALADIN (Aire Limitée Adaptation Dynamique Développement International); and COSMO–EULAG, a combination of COSMO (Consortium for Small-scale Modeling) and EULAG (Eulerian and semi-Lagrangian fluid solver). For many of the weather and climate dwarfs, ESCAPE provides prototype implementations on different hardware architectures (mainly Intel Skylake CPUs, NVIDIA GPUs, Intel Xeon Phi, Optalysys optical processor) with different programming models. The spectral transform dwarf serves as a detailed example of the co-design cycle of an ESCAPE dwarf. The dwarf concept has proven to be extremely useful for the rapid prototyping of alternative algorithms and their interaction with hardware, e.g. through the use of a domain-specific language (DSL). Manual adaptations have led to substantial accelerations of key algorithms in numerical weather prediction (NWP) but are not a general recipe for the performance portability of complex NWP models. Existing DSLs are found to require further evolution but are promising tools for achieving the latter. Measurements of energy and time to solution suggest that a future focus needs to be on exploiting the simultaneous use of all available resources in hybrid CPU–GPU arrangements.
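The dwarf concept is easiest to grasp on a small kernel. The sketch below is a hypothetical, drastically simplified stand-in for the spectral transform dwarf: the real dwarf couples Legendre transforms and FFTs on the sphere, whereas this toy performs only an FFT round trip with spectral truncation on a single latitude circle. Its value, like a dwarf's, is that it is self-contained enough to port and benchmark in isolation.

```python
import numpy as np

# Toy stand-in (not ESCAPE code) for the FFT leg of a spectral transform:
# grid space -> spectral space, truncate high wavenumbers, transform back.

n_lon = 128                                  # grid points around one latitude
field = np.cos(3 * np.linspace(0, 2 * np.pi, n_lon, endpoint=False))

spectral = np.fft.rfft(field)                # grid space -> spectral space
spectral[10:] = 0.0                          # truncate wavenumbers >= 10
filtered = np.fft.irfft(spectral, n=n_lon)   # spectral space -> grid space

# Wavenumber 3 survives the truncation, so the round trip is (near) exact.
assert np.allclose(filtered, field)
```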

