Convolver Design and Convolve-Accumulate Unit Design for Low-Power Edge Computing

Sensors, 2021, Vol. 21 (15), pp. 5081
Author(s): Hsu-Yu Kao, Xin-Jia Chen, Shih-Hsu Huang

Convolution operations have a significant influence on the overall performance of a convolutional neural network, especially in edge-computing hardware design. In this paper, we propose a low-power signed convolver hardware architecture that is well suited for low-power edge computing. The basic idea of the proposed convolver design is to combine all multipliers' final additions and their corresponding adder tree into a single partial product matrix (PPM) and then to reduce this PPM with a reduction tree algorithm. As a result, compared with the state-of-the-art approach, our convolver design not only saves many carry-propagate adders but also saves one clock cycle per convolution operation. Moreover, the proposed convolver design can be adapted to different dataflows (including input stationary, weight stationary, and output stationary dataflows). Depending on the dataflow, two types of convolve-accumulate units are proposed to accumulate the convolution results. The results show that, compared with the state-of-the-art approach, the proposed convolver design reduces power consumption by 15.6%. Furthermore, compared with the state-of-the-art approach, the proposed convolve-accumulate units reduce power consumption by 15.7% on average.
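The central idea, merging the partial products of every multiplier and the adder tree into one partial product matrix (PPM) and reducing it so that only a single carry-propagate addition remains, can be illustrated with a small bit-level model. This is a minimal sketch under stated assumptions, not the authors' circuit: it uses unsigned 8-bit operands (the paper's convolver is signed, which would need extra sign handling such as Baugh-Wooley), plain Wallace-style 3:2 counters, and hypothetical function names.

```python
from collections import defaultdict


def partial_product_matrix(xs, ws, bits=8):
    """Collect the AND-gate partial-product bits of sum(x * w), grouped by column weight."""
    cols = defaultdict(list)
    for x, w in zip(xs, ws):
        for i in range(bits):
            for j in range(bits):
                cols[i + j].append(((x >> i) & 1) & ((w >> j) & 1))
    return dict(cols)


def full_adder(a, b, c):
    """3:2 counter: a + b + c == s + 2 * carry."""
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return s, carry


def reduce_ppm(cols):
    """Wallace-style reduction: apply 3:2 counters until every column holds at most two bits."""
    cols = {k: list(v) for k, v in cols.items()}
    while any(len(v) > 2 for v in cols.values()):
        nxt = defaultdict(list)
        for k in sorted(cols):
            bits = cols[k]
            while len(bits) >= 3:
                s, c = full_adder(bits.pop(), bits.pop(), bits.pop())
                nxt[k].append(s)
                nxt[k + 1].append(c)
            nxt[k].extend(bits)  # leftover 0-2 bits pass through to the next stage
        cols = dict(nxt)
    return cols


def final_cpa(cols):
    """The single carry-propagate addition of the two remaining rows."""
    row_a = row_b = 0
    for k, bits in cols.items():
        if len(bits) > 0:
            row_a |= bits[0] << k
        if len(bits) > 1:
            row_b |= bits[1] << k
    return row_a + row_b


def convolve_3x3(window, kernel):
    """One 3x3 convolution (nine products summed) through a single shared PPM."""
    return final_cpa(reduce_ppm(partial_product_matrix(window, kernel)))


window = [12, 200, 7, 0, 255, 33, 90, 1, 64]
kernel = [3, 0, 255, 17, 2, 8, 128, 99, 5]
assert convolve_3x3(window, kernel) == sum(a * b for a, b in zip(window, kernel))
```

Because all nine products share one matrix, no per-multiplier carry-propagate adder is needed; only `final_cpa` propagates carries, which mirrors the saving the abstract describes.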

Author(s): Bowei Shan, Yong Fang

This paper develops DRAC, an arithmetic coding algorithm based on a delta recurrent neural network for edge computing devices. Our algorithm is implemented on a Xilinx Zynq-7000 SoC board. We evaluate DRAC on four datasets and compare it with the state-of-the-art compressor DeepZip. The experimental results show that DRAC outperforms DeepZip, achieving a 5× speedup and a 20× reduction in power consumption.
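To show what the arithmetic-coding half of such a compressor does, here is a minimal exact-arithmetic sketch. It is an assumption-laden illustration, not DRAC: an adaptive symbol-count model (hypothetical `AdaptiveCountModel`) stands in for the delta recurrent neural network that would supply the next-symbol probabilities, Python `Fraction`s replace the finite-precision integer ranges and renormalization a real coder uses, and the message length is passed explicitly instead of using an end-of-stream symbol.

```python
from fractions import Fraction


class AdaptiveCountModel:
    """Hypothetical stand-in for DRAC's neural predictor: adaptive symbol counts.
    Encoder and decoder must update it identically after every symbol."""

    def __init__(self, alphabet):
        self.alphabet = list(alphabet)
        self.counts = {s: 1 for s in self.alphabet}

    def ranges(self):
        """Map each symbol to its cumulative-probability sub-interval [lo, hi)."""
        total = sum(self.counts.values())
        low, out = Fraction(0), {}
        for s in self.alphabet:
            p = Fraction(self.counts[s], total)
            out[s] = (low, low + p)
            low += p
        return out

    def update(self, s):
        self.counts[s] += 1


def encode(message, alphabet):
    """Narrow [0, 1) once per symbol; any value in the final interval identifies the message."""
    model = AdaptiveCountModel(alphabet)
    low, width = Fraction(0), Fraction(1)
    for s in message:
        lo, hi = model.ranges()[s]
        low, width = low + width * lo, width * (hi - lo)
        model.update(s)
    return low, low + width


def decode(code, length, alphabet):
    """Replay the same model and pick the sub-interval that contains the code value."""
    model = AdaptiveCountModel(alphabet)
    low, width = Fraction(0), Fraction(1)
    out = []
    for _ in range(length):
        target = (code - low) / width
        for s, (lo, hi) in model.ranges().items():
            if lo <= target < hi:
                out.append(s)
                low, width = low + width * lo, width * (hi - lo)
                model.update(s)
                break
    return "".join(out)


low, high = encode("abracadabra", "abcdr")
assert decode(low, len("abracadabra"), "abcdr") == "abracadabra"
```

The better the model predicts each symbol, the wider the chosen sub-intervals and the fewer bits the final interval needs, which is why a learned predictor can improve compression.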


Author(s): Ziming Li, Julia Kiseleva, Maarten De Rijke

The performance of adversarial dialogue generation models relies on the quality of the reward signal produced by the discriminator. The reward signal from a poor discriminator can be very sparse and unstable, which may lead the generator to fall into a local optimum or to produce nonsense replies. To alleviate the first problem, we extend a recently proposed adversarial dialogue generation method to an adversarial imitation learning solution. Then, in the framework of adversarial inverse reinforcement learning, we propose a new reward model for dialogue generation that can provide a more accurate and precise reward signal for generator training. We evaluate the performance of the resulting model with automatic metrics and human evaluations in two annotation settings. Our experimental results demonstrate that our model generates more high-quality responses and achieves higher overall performance than the state of the art.
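For reference, the generic adversarial inverse reinforcement learning (AIRL) formulation builds the discriminator from a learned function $f_\theta$ and the generator policy $\pi$, and derives the generator's reward from the discriminator's output. Reading $s$ as the dialogue context and $a$ as a candidate response is an assumption made here for illustration; the paper's new reward model may be defined differently.

```latex
D_\theta(s, a) = \frac{\exp f_\theta(s, a)}{\exp f_\theta(s, a) + \pi(a \mid s)}

r(s, a) = \log D_\theta(s, a) - \log\bigl(1 - D_\theta(s, a)\bigr)
        = f_\theta(s, a) - \log \pi(a \mid s)
```

The second equality holds because $1 - D_\theta = \pi(a \mid s) / \bigl(\exp f_\theta(s, a) + \pi(a \mid s)\bigr)$, so $D_\theta / (1 - D_\theta) = \exp f_\theta(s, a) / \pi(a \mid s)$. The reward is therefore dense (defined for every response) rather than a sparse real/fake verdict.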


2020, Vol. 2 (3), pp. 158-168
Author(s): Muhammad Raza Naqvi

Most communication nowadays is done through SoC (system-on-chip) models, so the NoC (network-on-chip) architecture is the most appropriate solution for better performance. However, one of the major flaws of this architecture is its power consumption. To achieve high performance with this type of architecture, power consumption must be taken into account during design, and power use should be reduced in every region of the network-on-chip architecture. Sustained power consumption can be lessened by modifying the network routers and the other devices that form the network. This research mainly focuses on state-of-the-art methods for designing NoC architectures and on techniques for reducing power consumption in those architectures, covering the network architecture, the network links between nodes, network design, and routers.


2014, Vol. 2014, pp. 1-6
Author(s): Gagandeep Singh, Chakshu Goel

In digital systems, the adder usually lies in the critical path, so it affects the overall performance of the system. For fast addition at low cost, the carry select adder (CSLA) is the most suitable of the conventional adder structures. In this paper, a 3-T XOR gate is used to design an 8-bit CSLA, since XOR gates are essential building blocks of wider adders. The proposed CSLA has a reduced transistor count and lower power consumption and power-delay product (PDP) than the regular CSLA and the modified CSLA.
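The carry-select principle can be checked with a gate-level behavioral model: the low block ripples normally, while each higher block is computed twice in parallel (carry-in 0 and carry-in 1) and a multiplexer picks the right result once the real carry arrives. This is a sketch of a regular CSLA only, assuming two 4-bit blocks for the 8-bit case; XOR is modeled as a logic operation, not as the paper's 3-T transistor circuit, and the function names are hypothetical.

```python
def full_adder(a, b, cin):
    """One-bit full adder; the sum path is two XOR gates."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry(a_bits, b_bits, cin):
    """Add LSB-first bit lists; return (sum_bits, carry_out)."""
    out, c = [], cin
    for a, b in zip(a_bits, b_bits):
        s, c = full_adder(a, b, c)
        out.append(s)
    return out, c


def to_bits(x, n):
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    return sum(b << i for i, b in enumerate(bits))


def carry_select_add(a, b, width=8, block=4):
    """Regular carry-select adder: first block ripples; later blocks are precomputed
    for both carry-in values and selected by the incoming carry."""
    ab, bb = to_bits(a, width), to_bits(b, width)
    sum_bits, carry = ripple_carry(ab[:block], bb[:block], 0)
    for start in range(block, width, block):
        a_blk, b_blk = ab[start:start + block], bb[start:start + block]
        s0, c0 = ripple_carry(a_blk, b_blk, 0)  # speculative result for carry-in = 0
        s1, c1 = ripple_carry(a_blk, b_blk, 1)  # speculative result for carry-in = 1
        # 2:1 multiplexers driven by the real carry out of the previous block
        sum_bits += s1 if carry else s0
        carry = c1 if carry else c0
    return from_bits(sum_bits), carry


assert carry_select_add(200, 100) == (44, 1)  # 300 = 256 + 44
```

The duplicated upper block costs area and power, which is why transistor-level savings in its XOR gates, as the abstract proposes, matter for the overall power-delay product.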


Electronics, 2021, Vol. 10 (3), pp. 256
Author(s): Youngbae Kim, Shreyash Patel, Heekyung Kim, Nandakishor Yadav, Kyuwon Ken Choi

Power consumption and data processing speed of integrated circuits (ICs) are increasing concerns in many emerging Artificial Intelligence (AI) applications, such as autonomous vehicles and the Internet of Things (IoT). Existing state-of-the-art SRAM architectures for AI computing are highly accurate and can provide high throughput. However, these SRAMs consume high power and occupy a large area to accommodate complex AI models. Carbon nanotube field-effect transistors (CNFETs) have been reported as potential candidates for AI devices requiring ultra-low power and high throughput, owing to their satisfactory carrier mobility and their good, symmetrical subthreshold electrical performance. Based on the electrical performance of CNFET and FinFET devices, we propose novel ultra-low-power, high-throughput 8T SRAMs to circumvent the power and throughput issues in AI computation for autonomous vehicles. We propose two types of novel 8T SRAMs, a P-Latch N-Access (PLNA) 8T SRAM structure and a single-ended (SE) 8T SRAM structure, and compare their power consumption and speed with those of existing state-of-the-art 8T SRAM architectures. In FinFET and CNFET SRAM circuits, higher tube and fin numbers lead to higher operating speed. However, a large number of tubes and fins can lead to larger area and higher power consumption. Therefore, we optimize the area by reducing the number of tubes and fins without compromising the speed and power of the memory circuit. Most importantly, the decoupled reading and writing of our new SRAM cells offer better low-power operation due to device stacking in the read part, as well as better readability and writability, while offering read-Static-Noise-Margin-free (read-SNM-free) operation because of the isolated read and write paths and a greater pull-up ratio.
In addition, the proposed 8T SRAMs show even better delay and power performance when combined with the collaborated voltage sense amplifier and an independent read component. Compared with the existing state-of-the-art 8T SRAM in the FinFET model, the proposed PLNA 8T SRAM reduces write power consumption by 96% and the proposed SE 8T SRAM by around 99%; in the CNFET model, the write power saving is 99%.


Author(s): Minghui Zhao, Tyler Chang, Aditya Arun, Roshan Ayyalasomayajula, Chi Zhang, ...

A myriad of IoT applications, ranging from tracking assets in hospitals, logistics, and construction industries to indoor tracking in large indoor spaces, demand centimeter-accurate localization that is robust to blockages from hands, furniture, or other occlusions in the environment. To meet this need, Ultra Wide Band (UWB) based localization and tracking has recently become popular. Its popularity is driven by its high bandwidth and a protocol specifically designed for localizing specialized "tags". The high bandwidth of UWB provides a fine resolution of the signal's time-of-travel, which can be translated into the tag's location with centimeter-grade accuracy in a controlled environment. Unfortunately, we find that the high latency and high power consumption of these time-of-travel methods are the major culprits preventing such systems from deploying multiple tags in the environment. We therefore developed ULoc, a scalable, low-power, and cm-accurate UWB localization and tracking system. In ULoc, we custom-build a multi-antenna UWB anchor that enables azimuth and polar angle of arrival (henceforth shortened to '3D-AoA') measurements from the reception of just a single packet from the tag. By combining multiple UWB anchors, ULoc can localize the tag in 3D space. Single-packet location estimation reduces the latency of the entire system by at least 3× compared with state-of-the-art multi-packet UWB localization protocols, making UWB-based localization scalable. ULoc's design also reduces the power consumption per location estimate at the tag by 9× compared with state-of-the-art time-of-travel algorithms. We further developed a novel 3D-AoA-based 3D localization method that achieves a stationary localization accuracy of 3.6 cm, which is 1.8× better than state-of-the-art two-way ranging (TWR) systems.
We also developed a temporal tracking system that achieves a tracking accuracy of 10 cm in mobile conditions, which is 4.3× better than state-of-the-art TWR systems.
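How several anchors' 3D-AoA readings can be combined into one 3D position is sketched below as a least-squares intersection of bearing rays: each anchor at position p with unit direction d contributes the projector (I − d dᵀ), and the estimate solves Σ(I − d dᵀ) x = Σ(I − d dᵀ) p. This is a generic technique chosen for illustration, not ULoc's published algorithm; it assumes all angles are already expressed in one global frame (azimuth from +x in the x-y plane, polar angle from +z), uses noise-free demo angles, and all names are hypothetical.

```python
import math


def direction(azimuth, polar):
    """Unit vector for an azimuth (from +x in the x-y plane) and a polar angle (from +z)."""
    return (math.sin(polar) * math.cos(azimuth),
            math.sin(polar) * math.sin(azimuth),
            math.cos(polar))


def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a, b):
    """Solve the 3x3 linear system a x = b with Cramer's rule."""
    d = det3(a)
    x = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = b[r]
        x.append(det3(m) / d)
    return x


def localize(anchors):
    """anchors: list of (position, azimuth, polar); return the point closest to all rays."""
    A = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for p, az, pol in anchors:
        d = direction(az, pol)
        for i in range(3):
            for j in range(3):
                proj = (1.0 if i == j else 0.0) - d[i] * d[j]  # entry of I - d d^T
                A[i][j] += proj
                b[i] += proj * p[j]
    return solve3(A, b)


def angles_to(anchor, tag):
    """Ideal (noise-free) azimuth and polar angles from an anchor to a tag, for the demo."""
    vx, vy, vz = (t - p for t, p in zip(tag, anchor))
    return math.atan2(vy, vx), math.acos(vz / math.sqrt(vx * vx + vy * vy + vz * vz))


tag = (1.0, 2.0, 0.5)
anchors = [(0.0, 0.0, 2.0), (4.0, 0.0, 2.0), (0.0, 4.0, 2.0)]
est = localize([(p, *angles_to(p, tag)) for p in anchors])
assert all(abs(e - t) < 1e-6 for e, t in zip(est, tag))
```

With real, noisy angles the rays no longer meet, and the same normal equations return the least-squares compromise point; at least two non-parallel rays are needed for the system to be solvable.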


2021, Vol. 15
Author(s): Pavan Kumar Chundi, Dewei Wang, Sung Justin Kim, Minhao Yang, Joao Pedro Cerqueira, ...

This paper presents a novel spiking neural network (SNN) classifier architecture for enabling always-on artificial intelligence (AI) functions, such as keyword spotting (KWS) and visual wake-up, in ultra-low-power internet-of-things (IoT) devices. Such always-on hardware tends to dominate the power efficiency of an IoT device, so it is paramount to minimize its power dissipation. A key observation is that the input signal to always-on hardware is typically sparse in time. An SNN classifier can exploit this sparsity, because the switching activity and power consumption of SNN hardware can scale with the spike rate. To leverage this scalability, the proposed SNN classifier employs an event-driven architecture, in particular fine-grained clock generation and gating and fine-grained power gating, to obtain very low static power dissipation. The prototype is fabricated in 65 nm CMOS and occupies an area of 1.99 mm². At a 0.52 V supply voltage, it consumes 75 nW with no input activity and less than 300 nW at 100% input activity, while maintaining competitive inference accuracy for KWS and other always-on classification workloads. The prototype achieves a power consumption reduction of over three orders of magnitude compared with state-of-the-art SNN hardware and of about 2.3× compared with state-of-the-art KWS hardware.
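The argument that SNN work scales with spike rate can be illustrated with an event-driven leaky integrate-and-fire (LIF) neuron whose state is touched only when an input spike arrives; the leak over idle time is applied lazily in one step. This is a software sketch under assumptions, not the paper's chip: the time constant, threshold, and weights are arbitrary, the class and function names are hypothetical, and the update counter is only a rough proxy for the switching activity that clock and power gating exploit in hardware.

```python
import math


class EventDrivenLIF:
    """Leaky integrate-and-fire neuron that does work only when an input spike arrives.
    The leak accumulated since the previous event is applied lazily, so idle time costs nothing."""

    def __init__(self, tau=20.0, threshold=1.0):
        self.tau = tau              # membrane time constant (arbitrary time units)
        self.threshold = threshold  # firing threshold
        self.v = 0.0                # membrane potential
        self.last_t = 0.0           # time of the previous input event
        self.updates = 0            # state updates: a rough proxy for switching activity

    def on_spike(self, t, weight):
        """Process one weighted input spike at time t; return True if the neuron fires."""
        self.v *= math.exp(-(t - self.last_t) / self.tau)  # lazy exponential leak
        self.last_t = t
        self.v += weight
        self.updates += 1
        if self.v >= self.threshold:
            self.v = 0.0  # reset after firing
            return True
        return False


def run(events, **params):
    """Feed time-sorted (t, weight) events to one neuron; return output spike times and update count."""
    neuron = EventDrivenLIF(**params)
    fired = [t for t, w in events if neuron.on_spike(t, w)]
    return fired, neuron.updates


sparse = [(t, 0.4) for t in range(0, 1000, 100)]  # 10 input spikes
dense = [(t, 0.4) for t in range(0, 1000, 2)]     # 500 input spikes
_, sparse_work = run(sparse)
_, dense_work = run(dense)
assert (sparse_work, dense_work) == (10, 500)  # work grows with the input spike rate
```

A clocked implementation would update the neuron every time step regardless of input; the event-driven form does nothing between spikes, which is the property the abstract relies on for near-zero power at no input activity.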


PEDIATRICS, 1982, Vol. 70 (1), pp. 6-6
Author(s): George Little

In a small regional survey of direct spectrophotometric methods and a larger Australian and New Zealand survey of paediatric bilirubin analyses, the overall performance of both groups was unsatisfactory, with unacceptably high interlaboratory variation. This interlaboratory variation was reduced significantly by the use of a spectrophotometric method with a common methyl orange standard. The Australian and New Zealand survey also examined the "state of the art" for the measurement of conjugated bilirubin and showed that laboratories could not adequately measure conjugated bilirubin.

