Polyhedral-Based Compilation Framework for In-Memory Neural Network Accelerators

Jianhui Han; Xiang Fei; Zhaolin Li; Youhui Zhang

doi:10.1145/3469847


ACM Journal on Emerging Technologies in Computing Systems

2022

Vol 18 (1)

pp. 1-23

Author(s):

Jianhui Han

Xiang Fei

Zhaolin Li

Youhui Zhang

Keyword(s):

Neural Network

Case Studies

Memory Architecture

Polyhedral Model

Order Of Magnitude

Promising Solution

Memory Bottleneck

High Level

Programming Interfaces

Compilation Framework

Memristor-based processing-in-memory architecture is a promising solution to the memory bottleneck in neural network (NN) processing. A major challenge for the programmability of such architectures is the automatic compilation of high-level NN workloads, which contain various operators, onto memristor-based hardware that may expose programming interfaces at different granularities. This article proposes a source-to-source compilation framework for such memristor-based NN accelerators that automatically detects and maps multiple NN operators by exploiting the flexible and rich representation capability of the polyhedral model. In contrast to previous studies, it supports pipeline generation, exploiting the parallelism in NN workloads to use hardware resources more efficiently. The evaluation on synthetic kernels and NN benchmarks demonstrates that the proposed framework reliably detects and maps the target operators. Case studies on typical memristor-based architectures also show its generality across various architectural designs. The evaluation further demonstrates that, compared with existing polyhedral-based compilation frameworks that do not support pipelined execution, pipelined execution improves performance by an order of magnitude, which underscores the necessity of our improvement.
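To make operator detection, crossbar mapping, and pipelining concrete, here is a minimal Python sketch; it is not the authors' framework. It assumes a toy representation in which each array index is an affine expression stored as an `{iterator: coefficient}` dict (real polyhedral tools such as isl use integer sets and relations instead), recognizes only the matrix-vector pattern `y[i] += W[i][k] * x[k]` that a memristor crossbar executes natively, assumes hypothetical 128×128 crossbars, and uses an idealized pipeline model in which every layer takes the same number of cycles. All function names are illustrative.

```python
import math
from typing import Optional


def single_iter(expr: dict) -> Optional[str]:
    """Return the iterator if an affine index expression is exactly 1*it, else None."""
    if len(expr) == 1:
        it, coef = next(iter(expr.items()))
        if coef == 1:
            return it
    return None


def detect_mvm(iters: list, write: tuple, reads: list) -> Optional[tuple]:
    """Match the statement out[i] += W[i][k] * x[k] inside loops over `iters`.

    write is (array, [index exprs]); reads is a list of the same.
    Returns (output_iter, reduction_iter, weight_array, input_array) or None.
    """
    out_name, out_idx = write
    if len(out_idx) != 1:
        return None
    out_it = single_iter(out_idx[0])
    if out_it is None:
        return None
    # Iterators that do not index the output are reduction dimensions.
    reduction = [it for it in iters if it != out_it]
    if len(reduction) != 1:
        return None
    red_it = reduction[0]
    weight = inp = None
    for name, idx in reads:
        dims = [single_iter(e) for e in idx]
        if name == out_name and dims == [out_it]:
            continue  # accumulator read of the output itself
        if dims == [out_it, red_it]:
            weight = name
        elif dims == [red_it]:
            inp = name
        else:
            return None  # access pattern a crossbar cannot execute directly
    if weight is None or inp is None:
        return None
    return out_it, red_it, weight, inp


def crossbars_needed(in_dim: int, out_dim: int, xbar: int = 128) -> int:
    """Crossbars for an in_dim x out_dim weight matrix: inputs on wordlines, outputs on bitlines."""
    return math.ceil(in_dim / xbar) * math.ceil(out_dim / xbar)


def sequential_cycles(n_inputs: int, n_layers: int, cycles_per_layer: int) -> int:
    """Each input traverses every layer before the next input starts."""
    return n_inputs * n_layers * cycles_per_layer


def pipelined_cycles(n_inputs: int, n_layers: int, cycles_per_layer: int) -> int:
    """Layers are pipeline stages on separate crossbars: fill once, then one result per stage time."""
    return (n_layers + n_inputs - 1) * cycles_per_layer


if __name__ == "__main__":
    stmt_iters = ["i", "k"]
    stmt_write = ("y", [{"i": 1}])
    stmt_reads = [("y", [{"i": 1}]), ("W", [{"i": 1}, {"k": 1}]), ("x", [{"k": 1}])]
    print(detect_mvm(stmt_iters, stmt_write, stmt_reads))  # ('i', 'k', 'W', 'x')
    # e.g. a 3x3 convolution with 64 input channels unrolled to 576 inputs, 128 outputs
    print(crossbars_needed(576, 128))  # 5
    seq = sequential_cycles(1000, 16, 100)
    pipe = pipelined_cycles(1000, 16, 100)
    print(seq, pipe, round(seq / pipe, 1))  # 1600000 101500 15.8
```

Under this idealized model the pipelined speedup approaches the number of layers (16 here) as the number of inputs grows; real gains depend on how evenly the stages are balanced and how many crossbars are available.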


UL-CNN: An Ultra-Lightweight Convolutional Neural Network Aiming at Flash-Based Computing-In-Memory Architecture for Pedestrian Recognition

Journal of Circuits, Systems and Computers

10.1142/s0218126621500225

2020

pp. 2150022

Author(s):

Chen Yang

Jingyu Zhang

Qi Chen

Yi Xu

Cimang Lu

Keyword(s):

Neural Network

Convolutional Neural Network

Design Methodology

Hardware Implementation

State Of The Art

Memory Architecture

Storage Overhead

Speed Up

Memory Bottleneck

On Chip

Pedestrian recognition has achieved state-of-the-art performance thanks to recent progress in convolutional neural networks (CNNs). However, mainstream CNN models are too complex to be implemented in hardware on emerging Computing-In-Memory (CIM) architectures, because their enormous numbers of parameters and massive intermediate results can cause a severe “memory bottleneck”. This paper proposes a design methodology, Parameter Substitution with Nodes Compensation (PSNC), that significantly reduces the parameters of a CNN model without degrading inference accuracy. Based on PSNC, an ultra-lightweight convolutional neural network (UL-CNN) was designed. UL-CNN is specially optimized for a flash-based CIM architecture (Conv-Flash) and for pedestrian recognition. Running UL-CNN on Conv-Flash achieves an inference accuracy of up to 94.7%. Compared with LeNet-5, with a similar number of operations and similar accuracy on the same benchmark dataset, UL-CNN has fewer than 37% of LeNet-5's parameters. This parameter reduction can dramatically speed up training and reduce on-chip storage overhead, as well as the power consumed by memory accesses. With UL-CNN, the Conv-Flash architecture provides the best energy efficiency among the compared platforms (CPU, GPU, FPGA, etc.), consuming only 2.2 × 10⁻⁵ J to recognize pedestrians in one frame.
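The parameter comparison with LeNet-5 comes down to counting weights and biases per layer: a k×k convolution from C_in to C_out channels has (k·k·C_in + 1)·C_out parameters, and a dense layer has (n_in + 1)·n_out. A minimal Python sketch, assuming a common LeNet-5 variant (parameter-free pooling and a fully connected C3 layer; the original 1998 network used a sparse C3 connection table and trainable subsampling, so its count differs), gives the baseline and the 37% budget. UL-CNN's own layer list is not given here, so it is not reproduced.

```python
def conv_params(k, c_in, c_out, bias=True):
    """Parameters of a k x k convolution from c_in to c_out channels (plus one bias per output channel)."""
    return (k * k * c_in + (1 if bias else 0)) * c_out


def fc_params(n_in, n_out, bias=True):
    """Parameters of a fully connected layer (plus one bias per output unit)."""
    return (n_in + (1 if bias else 0)) * n_out


# Assumed LeNet-5 variant: pooling layers carry no parameters and C3 is fully connected.
LENET5_LAYERS = {
    "C1": conv_params(5, 1, 6),        # 156
    "C3": conv_params(5, 6, 16),       # 2,416
    "F5": fc_params(16 * 5 * 5, 120),  # 48,120
    "F6": fc_params(120, 84),          # 10,164
    "OUT": fc_params(84, 10),          # 850
}

if __name__ == "__main__":
    total = sum(LENET5_LAYERS.values())
    budget = int(0.37 * total)  # "less than 37% of LeNet-5" under this assumed variant
    print(total, budget)  # 61706 22831
```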


Convolutional Neural Network Case Studies: (1) Anomalies in Mortality Rates (2) Image Recognition

SSRN Electronic Journal

10.2139/ssrn.3656210

2020

Author(s):

Daniel Meier

Mario V. Wuthrich

Keyword(s):

Neural Network

Convolutional Neural Network

Case Studies

Image Recognition

Mortality Rates


Implementation of Convolutional Neural Network with Co-design of High-Level Synthesis and Verilog HDL

2020 IEEE 15th International Conference on Solid-State & Integrated Circuit Technology (ICSICT)

10.1109/icsict49897.2020.9278149

2020

Author(s):

Hejie Yu

Jun Cheng

Xiangnan Zhang

Yuzhe Gao

Kuizhi Mei

Keyword(s):

Neural Network

Convolutional Neural Network

High Level Synthesis

Verilog HDL

High Level


Technological Design of 3D NAND based Compute-in-Memory Architecture for GB-scale Deep Neural Network

IEEE Electron Device Letters

10.1109/led.2020.3048101

2020

pp. 1-1

Author(s):

Wonbo Shim

Shimeng Yu

Keyword(s):

Neural Network

Deep Neural Network

Memory Architecture

Technological Design


AI-based localization and classification of skin disease with erythema

Scientific Reports

10.1038/s41598-021-84593-z

2021

Vol 11 (1)

Author(s):

Ha Min Son

Wooho Jeon

Jinhyun Kim

Chan Yeong Heo

Hye Jin Yoon

...

Keyword(s):

Neural Network

Skin Diseases

Classification Model

Screening Tests

Sensitivity Score

Common Skin

Novel Method

Improved Performance

High Level

Although computer-aided diagnosis (CAD) is used to improve the quality of diagnosis in various medical fields such as mammography and colonography, it is not used in dermatology, where noninvasive screening tests are performed only with the naked eye and avoidable inaccuracies may exist. This study shows that CAD may also be a viable option in dermatology by presenting a novel method that sequentially combines accurate segmentation and classification models. Given an image of the skin, we decompose the image to normalize it and extract high-level features. After a neural-network-based segmentation model creates a segmentation map of the image, we cluster regions of abnormal skin and pass this information to a classification model, which uses another neural network to classify each cluster as one of several common skin diseases. Our segmentation model outperforms those of previous studies and achieves a near-perfect sensitivity score even under unfavorable conditions. Our classification model is more accurate than a baseline model trained without segmentation, and it can classify multiple diseases within a single image. This improved performance may be sufficient to use CAD in dermatology.
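The step that clusters abnormal skin before classification can be pictured as grouping the positive pixels of a binary segmentation map into connected regions, each of which is then cropped and passed to the classifier. The abstract does not specify the clustering procedure, so the Python sketch below swaps in plain 4-connected component labeling with a breadth-first search; `cluster_regions` and its `min_size` filter for tiny spurious regions are hypothetical.

```python
from collections import deque


def cluster_regions(mask, min_size=1):
    """Group 4-connected positive pixels of a binary mask (list of lists of 0/1).

    Returns one dict per region with its pixel count and bounding box
    (top, left, bottom, right), which a classifier could crop and label.
    """
    h, w = len(mask), (len(mask[0]) if mask else 0)
    seen = [[False] * w for _ in range(h)]
    regions = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited abnormal pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(pixels) >= min_size:
                ys = [p[0] for p in pixels]
                xs = [p[1] for p in pixels]
                regions.append({"size": len(pixels),
                                "bbox": (min(ys), min(xs), max(ys), max(xs))})
    return regions


if __name__ == "__main__":
    mask = [
        [1, 1, 0, 0, 0],
        [1, 0, 0, 1, 1],
        [0, 0, 0, 1, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 1, 0, 0],
    ]
    print(cluster_regions(mask))
    # [{'size': 3, 'bbox': (0, 0, 1, 1)}, {'size': 3, 'bbox': (1, 3, 2, 4)}, {'size': 1, 'bbox': (4, 2, 4, 2)}]
```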


Binary Precision Neural Network Manycore Accelerator

ACM Journal on Emerging Technologies in Computing Systems

10.1145/3423136

2021

Vol 17 (2)

pp. 1-27

Author(s):

Morteza Hosseini

Tinoosh Mohsenin

Keyword(s):

Neural Network

Low Power

Image Classification

Case Studies

Average Power

Total Power

Fabrication Technology

Population Count

Cluster Architecture

Domain Specific

This article presents a low-power, programmable, domain-specific manycore accelerator, the Binarized neural Network Manycore Accelerator (BiNMAC), which adopts and efficiently executes binary-precision weight/activation neural network models. Such networks have compact models in which weights are constrained to only 1 bit, so several weights can be packed into one memory entry, minimizing the memory footprint. Packing weights also enables single-instruction, multiple-data execution with simple circuitry, which maximizes performance and efficiency. BiNMAC has lightweight cores that support domain-specific instructions, and a router-based memory access architecture that helps implement the layers of appropriately sized binary-precision weight/activation neural networks efficiently. With only 3.73% area overhead and 1.98% average power overhead, novel instructions such as Combined Population-Count-XNOR, Patch-Select, and Bit-based Accumulation are added to BiNMAC's instruction set architecture; each executes in a single clock cycle a frequently used function that would otherwise take 54, 4, and 3 clock cycles, respectively. Additionally, customized logic is added to every core to transpose 16×16-bit blocks of memory at the bit level, which speeds up reshaping intermediate data so that it is well aligned for bitwise operations. A 64-cluster BiNMAC architecture is fully placed and routed in TSMC 65-nm CMOS technology, where a single cluster occupies 0.53 mm² with an average power of 232 mW at a 1-GHz clock frequency and 1.1 V. The 64-cluster architecture occupies 36.5 mm² and, if fully exploited, consumes a total power of 16.4 W and can perform 1,360 giga operations per second (GOPS) while providing full programmability.
To demonstrate its scalability, four binarized case studies were implemented on BiNMAC: ResNet-20 and LeNet-5 for high-performance image classification, and a ConvNet and a multilayer perceptron for low-power physiological applications. The implementation results indicate that the population-count instruction alone speeds up execution by approximately 5×. When the other new instructions are added to a RISC machine that already has a population-count instruction, performance increases by 58% on average. To compare BiNMAC with commercial off-the-shelf platforms, the double-precision floating-point versions of the case-study models are also implemented on the NVIDIA Jetson TX2 SoC (CPU+GPU). The results indicate that, within a margin of ~2.1%–9.5% accuracy loss, BiNMAC on average outperforms the TX2 GPU by approximately 1.9× (or 7.5× with fabrication technology scaled) in energy consumption for image classification applications. In low-power settings and within a margin of ~3.7%–5.5% accuracy loss compared with an ARM Cortex-A57 CPU implementation, BiNMAC is roughly 9.7×–17.2× (or 38.8×–68.8× with fabrication technology scaled) more energy efficient for physiological applications while meeting the application deadline.
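The Combined Population-Count-XNOR instruction rests on a standard identity for binarized networks: if +1 is stored as bit 1 and -1 as bit 0, the dot product of two n-element vectors equals 2·popcount(XNOR(a, b)) - n. The Python sketch below shows that identity together with a reference definition of the 16×16 bit-level transpose the abstract mentions. It illustrates the arithmetic only and says nothing about BiNMAC's actual instruction encoding or datapath; the function names are hypothetical.

```python
def pack_signs(values):
    """Pack +1/-1 values into an int: bit i is 1 when values[i] == +1, 0 when it is -1."""
    word = 0
    for i, v in enumerate(values):
        if v == 1:
            word |= 1 << i
    return word


def xnor_popcount_dot(a_bits, b_bits, n):
    """Dot product of two packed length-n +1/-1 vectors.

    XNOR marks positions where the signs agree (+1 products); the rest are -1,
    so dot = matches - (n - matches) = 2 * popcount(XNOR) - n.
    """
    mask = (1 << n) - 1
    xnor = ~(a_bits ^ b_bits) & mask  # keep only the n valid bit positions
    return 2 * bin(xnor).count("1") - n


def transpose_bits(rows, size=16):
    """Bit-level transpose of a size x size block held as `size` words:
    bit c of rows[r] becomes bit r of the result's word c."""
    out = [0] * size
    for r, word in enumerate(rows):
        for c in range(size):
            if (word >> c) & 1:
                out[c] |= 1 << r
    return out


if __name__ == "__main__":
    a = [1, -1, 1, 1]
    b = [1, 1, -1, 1]
    print(xnor_popcount_dot(pack_signs(a), pack_signs(b), len(a)))  # 0
    print(sum(x * y for x, y in zip(a, b)))  # 0
    print(transpose_bits([0xFFFF] + [0] * 15))  # sixteen 1s: row 0 becomes bit 0 of every word
```

In hardware, the XNOR and population count collapse into one instruction, which is why the abstract reports a large speedup from the population-count instruction alone.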


PERFECT case studies demonstrating order of magnitude reduction in power consumption

2016 IEEE High Performance Extreme Computing Conference (HPEC)

10.1109/hpec.2016.7761612

2016

Author(s):

David K. Wittenberg

Edin Kadric

Andre DeHon

Jonathan Edwards

Jeffrey Smith

...

Keyword(s):

Power Consumption

Case Studies

Order Of Magnitude


Application of an Artificial Neural Network to Automate the Measurement of Kinematic Characteristics of Punches in Boxing

Applied Sciences

10.3390/app11031223

2021

Vol 11 (3)

pp. 1223

Author(s):

Ilshat Khasanshin

Keyword(s):

Neural Network

Artificial Neural Network

Kinematic Parameters

Constant Control

Measurement Units

Artificial Neural

Artificial Neural Network (ANN)

Effective Development

High Level

Different Levels

This work studies how to automate the measurement of boxers' punch speed during shadow boxing using inertial measurement units (IMUs) and an artificial neural network (ANN). In boxing, effective athlete development requires constant monitoring of punch speed. However, even with modern means of measuring kinematic parameters, the circumstances under which each punch was performed must be recorded: the type of punch (jab, cross, hook, or uppercut) and the type of activity (shadow boxing, single punch, or series of punches). Therefore, to eliminate errors and accelerate the process, that is, to automate the measurements, an ANN in the form of a multilayer perceptron (MLP) is proposed. During the experiments, IMUs were installed on the boxers' wrists. The inputs to the ANN were the absolute acceleration and the angular velocity. The experiment was conducted with three groups of boxers at different training levels. The developed model achieved a high punch-recognition rate for all groups, and it can be concluded that using the ANN significantly accelerates the collection of data on the kinetic characteristics of boxers' punches and allows this process to be automated.
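The abstract names absolute acceleration and angular velocity as the MLP's inputs. The Python sketch below outlines that inference path under stated assumptions: each 3-axis IMU sample is reduced to vector magnitudes, a fixed window of samples is flattened into one input vector, and a single ReLU hidden layer feeds a softmax over the four punch types. The window length, layer sizes, and the `W1`/`B1`/`W2`/`B2` weights are hypothetical placeholders, not the trained model, and activity-type recognition is omitted.

```python
import math

PUNCH_TYPES = ["jab", "cross", "hook", "uppercut"]


def magnitude(v):
    """Euclidean norm of a 3-axis reading (absolute acceleration or angular velocity)."""
    return math.sqrt(sum(x * x for x in v))


def features(accel_samples, gyro_samples):
    """Flatten a window of IMU samples into [|a0|, |w0|, |a1|, |w1|, ...]."""
    feats = []
    for a, g in zip(accel_samples, gyro_samples):
        feats.extend([magnitude(a), magnitude(g)])
    return feats


def dense(x, weights, biases, activation=None):
    """Fully connected layer: one weight row and one bias per output unit."""
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [activation(o) for o in out] if activation else out


def relu(z):
    return max(0.0, z)


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def classify(feats, w1, b1, w2, b2):
    """One hidden ReLU layer, then a softmax over the four punch types."""
    hidden = dense(feats, w1, b1, relu)
    probs = softmax(dense(hidden, w2, b2))
    best = max(range(len(probs)), key=probs.__getitem__)
    return PUNCH_TYPES[best], probs


# Hypothetical placeholder weights: 2-sample window (4 inputs), 3 hidden units, 4 outputs.
W1 = [[0.1, 0.2, -0.1, 0.05], [0.0, -0.3, 0.2, 0.1], [0.3, 0.0, 0.1, -0.2]]
B1 = [0.0, 0.1, -0.1]
W2 = [[0.5, -0.2, 0.1], [0.1, 0.3, -0.4], [-0.3, 0.2, 0.6], [0.2, 0.1, 0.0]]
B2 = [0.0, 0.0, 0.0, 0.0]

if __name__ == "__main__":
    x = features([[0, 0, 2], [3, 4, 0]], [[0, 0, 0], [0, 0, 1]])
    print(x)  # [2.0, 0.0, 5.0, 1.0]
    print(classify(x, W1, B1, W2, B2))
```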


Critical Issues and Opportunities for Producing Biomethane in Italy

Energies

10.3390/en14092431

2021

Vol 14 (9)

pp. 2431

Author(s):

Roberto Murano

Natascia Maisano

Roberta Selvaggi

Gioacchino Pappalardo

Biagio Pecorino

Keyword(s):

Anaerobic Digestion

Case Studies

Regulatory Framework

Biofuel Production

Political Incentives

Critical Issues

Biogas Plants

High Level

By Products

Lack Of Knowledge

Most Italian biogas is currently used to produce electricity, even though recent political incentives promote “upgrading” it to biomethane. This paper focuses on the regulatory framework for producing biomethane in new or existing anaerobic digestion plants. The complexity of the regulations on biofuel production and on biomethane from the anaerobic digestion of waste and by-products, together with a lack of knowledge about them, creates difficulties in both interpretation and application. Consequently, this paper analyzes the regulations for producing biomethane, highlights the critical issues and opportunities, and evaluates whether an electricity-generating plant built in Italy in the last 10 years can really be converted into a biomethane plant, thereby extending its lifespan. Three case studies were considered to examine more closely how Italian biomethane incentives apply and to simulate the types of incentives available in agriculture, using examples based on fuel types typical of a standard biomethane plant producing 500 standard cubic meters per hour. All the cases considered show that biomethane offers a further development opportunity, with a high level of efficiency, for all biogas producers, especially for the many biogas plants whose incentive period is about to end.


Real-Time Adversarial Attack Detection with Deep Image Prior Initialized as a High-Level Representation Based Blurring Network

Electronics

10.3390/electronics10010052

2020

Vol 10 (1)

pp. 52

Author(s):

Richard Evan Sutanto

Sukho Lee

Keyword(s):

Neural Network

Attack Detection

Detection Methods

Defense System

Image Prior

The Neural Network

Adversarial Examples

Deep Image

Adversarial Attack

High Level

Several recent studies have shown that artificial intelligence (AI) systems can malfunction because of intentionally manipulated data arriving through normal channels. Such manipulated data are called adversarial examples. Adversarial examples can pose a major threat to an AI-led society when an attacker uses them to attack an AI system, which is called an adversarial attack. Therefore, major IT companies such as Google are studying ways to build AI systems that are robust against adversarial attacks by developing effective defense methods. However, one reason it is difficult to establish an effective defense system is that it is hard to know in advance which adversarial attack method the opponent is using. Therefore, this paper proposes a method that detects adversarial noise without knowing what kind of adversarial noise the attacker used. To this end, we propose a blurring network that is trained only on normal images, and we use it as the initial condition of the Deep Image Prior (DIP) network. This contrasts with other neural-network-based detection methods, which require many adversarially perturbed images to train the network. Experimental results indicate the validity of the proposed method.
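The intuition behind starting DIP from a blurring network is that a reconstruction biased toward smooth, natural image content reproduces the image but not high-frequency adversarial noise, so the residual between the input and its reconstruction exposes the perturbation. The Python sketch below is only a stand-in under that assumption: it replaces the trained blurring network and the DIP optimization with a fixed 3×3 box blur, and flags an input when the mean squared residual exceeds a hypothetical threshold. It is not the authors' detector.

```python
def box_blur(img):
    """3x3 mean filter with edge clamping; img is a list of equal-length rows of floats."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            total = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), h - 1)
                    cc = min(max(c + dc, 0), w - 1)
                    total += img[rr][cc]
            out[r][c] = total / 9.0
    return out


def residual_score(img):
    """Mean squared difference between the input and its smooth reconstruction."""
    smooth = box_blur(img)
    h, w = len(img), len(img[0])
    return sum((img[r][c] - smooth[r][c]) ** 2 for r in range(h) for c in range(w)) / (h * w)


def is_adversarial(img, threshold=1e-3):
    """Flag inputs whose high-frequency residual exceeds a (hypothetical) threshold."""
    return residual_score(img) > threshold


if __name__ == "__main__":
    flat = [[0.5] * 8 for _ in range(8)]
    # +/-0.05 checkerboard: a crude stand-in for high-frequency adversarial noise
    noisy = [[0.5 + (0.05 if (r + c) % 2 == 0 else -0.05) for c in range(8)] for r in range(8)]
    print(is_adversarial(flat), is_adversarial(noisy))  # False True
```

A fixed blur also removes genuine fine texture, so in practice the threshold would trade false alarms against missed attacks; the learned blurring network and DIP reconstruction are what the abstract proposes to handle that.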
