Convolver Design and Convolve-Accumulate Unit Design for Low-Power Edge Computing

Hsu-Yu Kao; Xin-Jia Chen; Shih-Hsu Huang

doi:10.3390/s21155081


Sensors, 2021, Vol 21 (15), pp. 5081


Keyword(s): Power Consumption; Low Power; State Of The Art; Clock Cycle; The State; Edge Computing; Convolution Operation; Product Matrix; Unit Design; Overall Performance

Convolution operations have a significant influence on the overall performance of a convolutional neural network, especially in edge-computing hardware design. In this paper, we propose a low-power signed convolver hardware architecture that is well suited for low-power edge computing. The basic idea of the proposed convolver design is to combine the final additions of all multipliers and their corresponding adder tree into a single partial product matrix (PPM) and then to reduce this PPM with a reduction tree algorithm. As a result, compared with the state-of-the-art approach, our convolver design not only saves many carry-propagation adders but also saves one clock cycle per convolution operation. Moreover, the proposed convolver design can be adapted to different dataflows (including input stationary, weight stationary, and output stationary dataflows). Depending on the dataflow, two types of convolve-accumulate units are proposed to accumulate the convolution results. The results show that, compared with the state-of-the-art approach, the proposed convolver design reduces power consumption by 15.6%. Furthermore, compared with the state-of-the-art approach, the proposed convolve-accumulate units reduce power consumption by 15.7% on average.
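The central idea above, merging every multiplier's partial products (and the adder tree) into one PPM that a reduction tree compresses before a single final addition, can be illustrated at the word level. The Python sketch below is a simplified model under stated assumptions, not the authors' circuit: operands are unsigned 8-bit integers (the paper's convolver is signed, which would need sign handling this sketch omits), the reduction uses plain Wallace-style 3:2 carry-save layers, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def partial_products(a: int, b: int, width: int = 8) -> List[int]:
    """Rows of the shift-and-AND partial-product matrix for an unsigned a * b."""
    return [(a << i) if (b >> i) & 1 else 0 for i in range(width)]


def csa(x: int, y: int, z: int) -> Tuple[int, int]:
    """3:2 carry-save compressor applied bitwise: x + y + z == sum_row + carry_row."""
    sum_row = x ^ y ^ z
    carry_row = ((x & y) | (x & z) | (y & z)) << 1
    return sum_row, carry_row


def reduce_ppm(rows: Sequence[int]) -> Tuple[int, int]:
    """Wallace-style reduction: apply CSA layers until at most two rows remain."""
    rows = list(rows)
    while len(rows) > 2:
        next_rows: List[int] = []
        # One CSA layer: compress every complete group of three rows in parallel.
        for i in range(0, len(rows) - 2, 3):
            next_rows.extend(csa(rows[i], rows[i + 1], rows[i + 2]))
        # Rows that do not fill a group of three pass through unchanged.
        next_rows.extend(rows[len(rows) - len(rows) % 3:])
        rows = next_rows
    rows += [0] * (2 - len(rows))
    return rows[0], rows[1]


def fused_convolve(pixels: Sequence[int], weights: Sequence[int], width: int = 8) -> int:
    """Dot product of one convolution window computed from a single merged PPM."""
    if len(pixels) != len(weights):
        raise ValueError("pixels and weights must have the same length")
    ppm: List[int] = []
    for a, b in zip(pixels, weights):
        # Assumption: unsigned operands that fit in `width` bits.
        if not (0 <= a < 1 << width and 0 <= b < 1 << width):
            raise ValueError("operands must be unsigned width-bit integers")
        ppm.extend(partial_products(a, b, width))
    sum_row, carry_row = reduce_ppm(ppm)
    # The only carry-propagate addition for the whole window.
    return sum_row + carry_row


if __name__ == "__main__":
    window = [12, 200, 7, 0, 255, 33, 90, 1, 64]  # hypothetical 3x3 pixels
    kernel = [3, 1, 0, 255, 2, 17, 5, 128, 9]     # hypothetical 3x3 weights
    print(fused_convolve(window, kernel), sum(a * b for a, b in zip(window, kernel)))
```

In this model only one carry-propagate addition (`sum_row + carry_row`) happens per window, whereas a baseline that finishes each product with its own adder and then sums the products would perform several. The sketch does not model clock cycles, dataflows, or power.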


DRAC: a delta recurrent neural network-based arithmetic coding algorithm for edge computing

Complex & Intelligent Systems, 2021. doi:10.1007/s40747-021-00455-1

Author(s): Bowei Shan; Yong Fang

Keyword(s): Neural Network; Power Consumption; Recurrent Neural Network; State Of The Art; The State; Edge Computing; Experimental Results; Arithmetic Coding; Speedup Ratio; Consumption Saving

This paper develops DRAC, an arithmetic coding algorithm based on a delta recurrent neural network for edge computing devices. Our algorithm is implemented on a Xilinx Zynq-7000 SoC board. We evaluate DRAC on four datasets and compare it with the state-of-the-art compressor DeepZip. The experimental results show that DRAC outperforms DeepZip, achieving a 5× speedup and a 20× power consumption saving.


Dialogue Generation: From Imitation Learning to Inverse Reinforcement Learning

Proceedings of the AAAI Conference on Artificial Intelligence, 2019, Vol 33, pp. 6722-6729. doi:10.1609/aaai.v33i01.33016722

Cited by ~4

Author(s): Ziming Li; Julia Kiseleva; Maarten De Rijke

Keyword(s): Reinforcement Learning; State Of The Art; The State; Experimental Results; Imitation Learning; Local Optimum; Inverse Reinforcement Learning; High Quality; Overall Performance

The performance of adversarial dialogue generation models relies on the quality of the reward signal produced by the discriminator. The reward signal from a poor discriminator can be very sparse and unstable, which may lead the generator to fall into a local optimum or to produce nonsense replies. To alleviate the first problem, we first extend a recently proposed adversarial dialogue generation method to an adversarial imitation learning solution. Then, in the framework of adversarial inverse reinforcement learning, we propose a new reward model for dialogue generation that can provide a more accurate and precise reward signal for generator training. We evaluate the performance of the resulting model with automatic metrics and with human evaluations in two annotation settings. Our experimental results demonstrate that our model generates more high-quality responses and achieves higher overall performance than the state of the art.


Low power network on chip architectures: A survey

Computer Science and Information Technologies, 2020, Vol 2 (3), pp. 158-168. doi:10.11591/csit.v2i3.p158-168

Author(s): Muhammad Raza Naqvi

Keyword(s): Power Consumption; Low Power; Network Architecture; High Performance; State Of The Art; Network On Chip; System On Chip; Power Network; Network Routers; On Chip

Most communication nowadays is done through SoC (system-on-chip) models, so the NoC (network-on-chip) architecture is the most appropriate solution for better performance. However, one of the major flaws of this architecture is its power consumption. To gain high performance from this type of architecture, power consumption must be taken into account during design, and power use should be reduced in every region of the network-on-chip architecture. Power consumption can be lessened by making alterations to the network routers and the other devices used to form the network. This research mainly focuses on state-of-the-art methods for designing NoC architectures and on techniques to reduce power consumption in those architectures, covering network architecture, network links between nodes, network design, and routers.


What is the state of the art in commercial EDA tools for low power?

Proceedings of 1996 International Symposium on Low Power Electronics and Design, 2002. doi:10.1109/lpe.1996.547503

Cited by ~4

Author(s): O. Coudert; R. Haddad; K. Keutzer

Keyword(s): Low Power; State Of The Art; The State; EDA Tools


An optimal approach for low-power migraine prediction models in the state-of-the-art wireless monitoring devices

Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017. doi:10.23919/date.2017.7927193

Author(s): Josue Pagan; Ramin Fallahzadeh; Hassan Ghasemzadeh; Jose M. Moya; Jose L. Risco-Martin; ...

Keyword(s): Low Power; Prediction Models; State Of The Art; The State; Wireless Monitoring; Optimal Approach; Monitoring Devices


Design of Low Power and Efficient Carry Select Adder Using 3-T XOR Gate

Advances in Electronics, 2014, Vol 2014, pp. 1-6. doi:10.1155/2014/564613

Cited by ~2

Author(s): Gagandeep Singh; Chakshu Goel

Keyword(s): Power Consumption; Low Power; Critical Path; Low Cost; Digital Systems; XOR Gate; Overall Performance; Carry Select Adder; Power Delay Product

In digital systems, the adder usually lies in the critical path and therefore affects the overall performance of the system. To perform fast addition at low cost, the carry select adder (CSLA) is the most suitable of the conventional adder structures. In this paper, a 3-T XOR gate is used to design an 8-bit CSLA, since XOR gates are the essential blocks in designing higher-bit adders. Compared with the regular CSLA and the modified CSLA, the proposed CSLA has a reduced transistor count, lower power consumption, and a lower power-delay product (PDP).
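To make the CSLA structure concrete, here is a bit-level Python model of an 8-bit carry select adder. The split into two 4-bit blocks, the ripple-carry blocks, and all function names are assumptions for illustration. The 3-T XOR gate is a transistor-level design that a logic model like this cannot capture, so the sketch only shows where the XOR gates and the selecting multiplexer sit.

```python
from typing import Tuple


def full_adder(a: int, b: int, cin: int) -> Tuple[int, int]:
    """One-bit full adder; the sum output is two cascaded XOR gates."""
    p = a ^ b
    return p ^ cin, (a & b) | (p & cin)


def ripple_add(x: int, y: int, cin: int, bits: int) -> Tuple[int, int]:
    """Ripple-carry addition of two `bits`-wide words; returns (sum, carry_out)."""
    total, carry = 0, cin
    for i in range(bits):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total, carry


def carry_select_add(x: int, y: int, cin: int = 0,
                     width: int = 8, split: int = 4) -> Tuple[int, int]:
    """Carry select adder: the upper block is precomputed for carry-in 0 and 1."""
    if not (0 <= x < 1 << width and 0 <= y < 1 << width):
        raise ValueError("operands must be unsigned width-bit integers")
    mask = (1 << split) - 1
    low_sum, low_carry = ripple_add(x & mask, y & mask, cin, split)
    hi_x, hi_y, hi_bits = x >> split, y >> split, width - split
    # Both candidates for the upper block are computed alongside the lower block.
    hi0 = ripple_add(hi_x, hi_y, 0, hi_bits)
    hi1 = ripple_add(hi_x, hi_y, 1, hi_bits)
    # Multiplexer: the actual carry out of the lower block selects one candidate.
    hi_sum, carry_out = hi1 if low_carry else hi0
    return (hi_sum << split) | low_sum, carry_out


if __name__ == "__main__":
    print(carry_select_add(200, 100))  # (44, 1), i.e. 300 = 256 + 44
```

The speed advantage comes from the upper block no longer waiting for the lower block's carry; the price is the duplicated upper block and the multiplexer, which is why reducing the XOR transistor count matters for area and power.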


Ultra-Low Power and High-Throughput SRAM Design to Enhance AI Computing Ability in Autonomous Vehicles

Electronics, 2021, Vol 10 (3), pp. 256. doi:10.3390/electronics10030256

Author(s): Youngbae Kim; Shreyash Patel; Heekyung Kim; Nandakishor Yadav; Kyuwon Ken Choi

Keyword(s): Integrated Circuits; Power Consumption; Low Power; High Throughput; Autonomous Vehicles; Field Effect Transistors; State Of The Art; Electrical Performance; Large Area; Ultra Low Power

The power consumption and data processing speed of integrated circuits (ICs) are increasing concerns in many emerging Artificial Intelligence (AI) applications, such as autonomous vehicles and the Internet of Things (IoT). Existing state-of-the-art SRAM architectures for AI computing are highly accurate and can provide high throughput. However, these SRAMs consume high power and occupy a large area to accommodate complex AI models. The carbon nanotube field-effect transistor (CNFET) has been reported as a potential candidate for AI devices requiring ultra-low power and high throughput, owing to its satisfactory carrier mobility and good, symmetrical subthreshold electrical performance. Based on the electrical performance of CNFET and FinFET devices, we propose novel ultra-low-power, high-throughput 8T SRAMs to circumvent the power and throughput issues in AI computation for autonomous vehicles. We propose two types of novel 8T SRAM, a P-Latch N-Access (PLNA) 8T SRAM structure and a single-ended (SE) 8T SRAM structure, and compare their performance with existing state-of-the-art 8T SRAM architectures in terms of power consumption and speed. In FinFET and CNFET SRAM circuits, higher tube and fin counts lead to higher operating speed. However, a large number of tubes and fins can lead to a larger area and more power consumption. Therefore, we optimize the area by reducing the number of tubes and fins without compromising the speed and power of the memory circuit. Most importantly, the decoupled reading and writing of our new SRAM cells offers better low-power operation due to device stacking in the read part, as well as better readability and writability, while offering read-static-noise-margin-free (read-SNM-free) operation because of the isolated read and write paths and a greater pull-up ratio.
In addition, the proposed 8T SRAMs show even better delay and power performance when combined with the collaborated voltage sense amplifier and independent read component. Compared with the existing state-of-the-art 8T SRAM in the FinFET model, the proposed PLNA 8T SRAM saves 96% and the proposed SE 8T SRAM saves around 99% of the write power consumption; in the CNFET model, the saving for write operations is also 99%.


ULoc

Proceedings of the ACM on Interactive Mobile Wearable and Ubiquitous Technologies, 2021, Vol 5 (3), pp. 1-31. doi:10.1145/3478124

Author(s): Minghui Zhao; Tyler Chang; Aditya Arun; Roshan Ayyalasomayajula; Chi Zhang; ...

Keyword(s): Power Consumption; State Of The Art; Tracking System; The State; Location Estimation; High Bandwidth; Time Of Travel; Localization And Tracking; State Of Art; Better Than

A myriad of IoT applications, ranging from tracking assets in hospitals, logistics, and construction industries to indoor tracking in large indoor spaces, demand centimeter-accurate localization that is robust to blockages from hands, furniture, or other occlusions in the environment. To meet this need, Ultra-Wideband (UWB) based localization and tracking has recently become popular. Its popularity is driven by its high bandwidth and by a protocol specifically designed for localizing specialized "tags". The high bandwidth of UWB provides a fine resolution of the signal's time of travel, which can be translated into the tag's location with centimeter-grade accuracy in a controlled environment. Unfortunately, we find that the high latency and high power consumption of these time-of-travel methods are the major culprits preventing such systems from deploying multiple tags in the environment. Thus, we developed ULoc, a scalable, low-power, and cm-accurate UWB localization and tracking system. In ULoc, we custom-build a multi-antenna UWB anchor that enables azimuth and polar angle-of-arrival measurements (henceforth shortened to '3D-AoA') from the reception of just a single packet from the tag. By combining multiple UWB anchors, ULoc can localize the tag in 3D space. The single-packet location estimation reduces the latency of the entire system by at least 3× compared with state-of-the-art multi-packet UWB localization protocols, making UWB-based localization scalable. ULoc's design also reduces the power consumption per location estimate at the tag by 9× compared with state-of-the-art time-of-travel algorithms. We further develop a novel 3D-AoA based 3D localization method that achieves a stationary localization accuracy of 3.6 cm, which is 1.8× better than state-of-the-art two-way ranging (TWR) systems.
We also develop a temporal tracking system that achieves a tracking accuracy of 10 cm in mobile conditions, which is 4.3× better than state-of-the-art TWR systems.
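The step of combining multiple UWB anchors to localize the tag in 3D can be illustrated with a generic least-squares ray intersection: each anchor's 3D-AoA (azimuth, polar angle) defines a ray, and the estimate is the point closest to all rays. This is a sketch, not ULoc's algorithm, which the abstract does not detail. It assumes the angles are already expressed in one shared global frame, uses a hypothetical convention (azimuth in the x-y plane from +x, polar angle from +z), and uses made-up anchor positions.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def direction(azimuth: float, polar: float) -> Vec:
    """Unit vector for an azimuth (x-y plane, from +x) and polar angle (from +z)."""
    return (math.sin(polar) * math.cos(azimuth),
            math.sin(polar) * math.sin(azimuth),
            math.cos(polar))


def angles_to(anchor: Vec, tag: Vec) -> Tuple[float, float]:
    """Ideal (noise-free) 3D-AoA of `tag` seen from `anchor`; used for simulation."""
    dx, dy, dz = (t - a for t, a in zip(tag, anchor))
    r = math.sqrt(dx * dx + dy * dy + dz * dz)
    return math.atan2(dy, dx), math.acos(dz / r)


def det3(m: List[List[float]]) -> float:
    """Determinant of a 3x3 matrix (cofactor expansion along the first row)."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a: List[List[float]], b: List[float]) -> Vec:
    """Solve a @ x = b for a 3x3 system using Cramer's rule."""
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("rays are (nearly) parallel; the position is undetermined")
    x = []
    for col in range(3):
        # Replace column `col` of `a` with `b`.
        m = [[b[r] if c == col else a[r][c] for c in range(3)] for r in range(3)]
        x.append(det3(m) / d)
    return (x[0], x[1], x[2])


def localize(anchors: Sequence[Vec], angles: Sequence[Tuple[float, float]]) -> Vec:
    """Point minimizing the summed squared distance to every anchor's AoA ray.

    With P_i = I - u_i u_i^T, the normal equations are (sum P_i) x = sum P_i p_i.
    """
    a = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for p, (az, pol) in zip(anchors, angles):
        u = direction(az, pol)
        for r in range(3):
            for c in range(3):
                proj = (1.0 if r == c else 0.0) - u[r] * u[c]
                a[r][c] += proj
                b[r] += proj * p[c]
    return solve3(a, b)


if __name__ == "__main__":
    # Hypothetical ceiling-mounted anchors (metres) and a tag position to recover.
    anchors = [(0.0, 0.0, 2.5), (5.0, 0.0, 2.5), (0.0, 5.0, 2.5)]
    tag = (1.2, 3.4, 1.0)
    measured = [angles_to(p, tag) for p in anchors]
    print(localize(anchors, measured))
```

A single anchor only fixes a ray, so at least two anchors with non-parallel rays are needed. With noise-free angles the estimate recovers the tag exactly; with measured angles it returns the least-squares compromise between the rays.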


Always-On Sub-Microwatt Spiking Neural Network Based on Spike-Driven Clock- and Power-Gating for an Ultra-Low-Power Intelligent Device

Frontiers in Neuroscience, 2021, Vol 15. doi:10.3389/fnins.2021.684113

Author(s): Pavan Kumar Chundi; Dewei Wang; Sung Justin Kim; Minhao Yang; Joao Pedro Cerqueira; ...

Keyword(s): Neural Network; Power Consumption; Low Power; Power Dissipation; State Of The Art; Spiking Neural Network; Power Gating; Ultra Low Power; Fine Grained; Input Activity

This paper presents a novel spiking neural network (SNN) classifier architecture for enabling always-on artificial intelligence (AI) functions, such as keyword spotting (KWS) and visual wake-up, in ultra-low-power internet-of-things (IoT) devices. Such always-on hardware tends to dominate the power efficiency of an IoT device, so it is paramount to minimize its power dissipation. A key observation is that the input signal to always-on hardware is typically sparse in time. This is a great opportunity for an SNN classifier, because the switching activity and power consumption of SNN hardware can scale with the spike rate. To leverage this scalability, the proposed SNN classifier employs an event-driven architecture, in particular fine-grained clock generation and gating and fine-grained power gating, to obtain very low static power dissipation. The prototype is fabricated in 65 nm CMOS and occupies an area of 1.99 mm². At a 0.52 V supply voltage, it consumes 75 nW with no input activity and less than 300 nW at 100% input activity, while still maintaining competitive inference accuracy for KWS and other always-on classification workloads. The prototype achieves a power consumption reduction of over three orders of magnitude compared with state-of-the-art SNN hardware and of about 2.3× compared with state-of-the-art KWS hardware.


BILIRUBIN TROUBLE IS WORLDWIDE

PEDIATRICS, 1982, Vol 70 (1), pp. 6-6. doi:10.1542/peds.70.1.6

Author(s): George Little

Keyword(s): New Zealand; Methyl Orange; Spectrophotometric Method; State Of The Art; The State; Regional Survey; Spectrophotometric Methods; Overall Performance; Conjugated Bilirubin

In a small regional survey of direct spectrophotometric methods and a larger Australian and New Zealand survey of paediatric bilirubin analyses, the overall performance of both groups was unsatisfactory, with unacceptably high inter-laboratory variation. This inter-laboratory variation was reduced significantly by using a spectrophotometric method with a common methyl orange standard. The Australian and New Zealand survey also examined the "state of the art" for the measurement of conjugated bilirubin and showed that laboratories could not adequately measure conjugated bilirubin.
