Towards an Efficient CNN Inference Architecture Enabling In-Sensor Processing

The rapid development of optical imaging sensors, coupled with improvements in machine learning algorithms, has increased our ability to understand and extract information from scenes. Convolutional neural networks (CNNs) are widely adopted to infer knowledge because of their success in automation, surveillance, and many other application domains. However, the heavy computational demand of convolution operations has limited their use in remote-sensing edge devices. On these platforms, real-time processing remains challenging because of tight constraints on resources and power, and the transfer and processing of non-relevant image pixels becomes a bottleneck for the entire system. This bottleneck can be overcome by placing a CNN inference architecture near the sensor, where it can exploit the high bandwidth available at the sensor interface. This paper presents an attention-based pixel processing architecture to facilitate CNN inference near the image sensor. We propose an efficient computation method that reduces dynamic power by decreasing the overall computation of the convolution operations. The method reduces redundancies using a hierarchical optimization approach: it exploits the spatio-temporal redundancies found in the incoming feature maps and performs computations only on selected regions based on their relevance score. The proposed design addresses the mapping of computations onto an array of processing elements (PEs) and introduces a suitable network structure for communication. The PEs are highly optimized to provide low latency and low power for CNN applications. In designing the model, we draw on concepts from biological vision systems to reduce computation and energy. We prototype the model on a Virtex UltraScale+ FPGA and implement it as an application-specific integrated circuit (ASIC) using the TSMC 90 nm technology library. The results suggest that the proposed architecture significantly reduces dynamic power consumption and achieves a high speedup, surpassing the computational capabilities of existing embedded processors.
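
The idea of computing only where the input has changed can be sketched in software. The example below is an illustration, not the architecture from the paper: it assumes that the "relevance" of an output position is simply whether any pixel in its receptive field differs from the previous frame by more than a hypothetical `threshold`, and it reuses the previous output everywhere else. The function names are made up for this sketch.

```python
def conv2d_valid(image, kernel):
    """Plain 2-D convolution (no padding, stride 1) used as the reference."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(ow)
        ]
        for i in range(oh)
    ]


def conv2d_incremental(prev_image, prev_output, image, kernel, threshold=0):
    """Recompute only outputs whose receptive field changed since the last frame.

    threshold is a hypothetical relevance cut-off: changes at or below it
    are treated as redundant and the previous output value is reused.
    """
    kh, kw = len(kernel), len(kernel[0])
    output = [row[:] for row in prev_output]  # start from the previous result
    recomputed = 0
    for i in range(len(output)):
        for j in range(len(output[0])):
            changed = any(
                abs(image[i + di][j + dj] - prev_image[i + di][j + dj]) > threshold
                for di in range(kh) for dj in range(kw)
            )
            if changed:
                output[i][j] = sum(
                    image[i + di][j + dj] * kernel[di][dj]
                    for di in range(kh) for dj in range(kw)
                )
                recomputed += 1
    return output, recomputed


kernel = [[1, 1], [1, 1]]
frame0 = [[0] * 4 for _ in range(4)]
frame1 = [row[:] for row in frame0]
frame1[0][0] = 5  # a single pixel changes between frames

out0 = conv2d_valid(frame0, kernel)
out1, recomputed = conv2d_incremental(frame0, out0, frame1, kernel)
```

With `threshold=0` the incremental result is identical to a full convolution of the new frame, while only the output positions touched by the changed pixel are recomputed. A nonzero threshold trades accuracy for fewer operations.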

Design of an Edge-Detection CMOS Image Sensor with Built-in Mask Circuits

Sensors ◽

10.3390/s20133649 ◽

2020 ◽

Vol 20 (13) ◽

pp. 3649

Author(s):

Minhyun Jin ◽

Hyeonseob Noh ◽

Minkyu Song ◽

Soo Youn Kim

Keyword(s):

Power Consumption ◽

Edge Detection ◽

Low Power ◽

High Speed ◽

CMOS Image Sensor ◽

Image Sensor ◽

Frame Rate ◽

Oxide Semiconductor ◽

Total Power ◽

Total Power Consumption

In this paper, we propose a complementary metal-oxide-semiconductor (CMOS) image sensor (CIS) that has built-in mask circuits to selectively capture either edge-detection images or normal 8-bit images for low-power computer vision applications. To detect the edges of images in the CIS, neighboring column data are compared in in-column memories after column-parallel analog-to-digital conversion with the proposed mask. The proposed built-in mask circuits are implemented in the CIS without a complex image signal processor to obtain edge images with high speed and low power consumption. According to the measurement results, edge images were successfully obtained with a maximum frame rate of 60 fps. A prototype sensor with 1920 × 1440 resolution was fabricated with a 90-nm 1-poly 5-metal CIS process. The area of the 4-shared 4T-active pixel sensor was 1.4 × 1.4 µm², and the chip size was 5.15 × 5.15 mm². The total power consumption was 9.4 mW at 60 fps with supply voltages of 3.3 V (analog), 2.8 V (pixel), and 1.2 V (digital).
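
The column-comparison idea can be illustrated in software. The sketch below is not the sensor's mask circuit: it assumes an absolute-difference comparison between horizontally adjacent 8-bit pixel values and a hypothetical `threshold` parameter, neither of which is specified in the abstract.

```python
def column_edge_map(image, threshold=16):
    """Return a binary edge map by comparing each pixel with its right-hand neighbor.

    image: list of rows, each a list of 8-bit integers (0..255).
    threshold: hypothetical difference above which a pixel is marked as an edge.
    """
    edges = []
    for row in image:
        edge_row = []
        for col in range(len(row) - 1):
            diff = abs(row[col] - row[col + 1])
            edge_row.append(1 if diff > threshold else 0)
        edge_row.append(0)  # the last column has no right-hand neighbor
        edges.append(edge_row)
    return edges


frame = [
    [10, 12, 200, 201],
    [11, 10, 198, 202],
]
edge_map = column_edge_map(frame)
```

Here the only large jump is between the second and third columns, so each row of `edge_map` marks a single edge at that position.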

Designing dual-chirality and multi-Vt repeaters for performance optimization of 32 nm interconnects

Circuit World ◽

10.1108/cw-06-2019-0060 ◽

2020 ◽

Vol 46 (2) ◽

pp. 71-83

Author(s):

Afreen Khursheed ◽

Kavita Khare

Keyword(s):

Carbon Nanotube ◽

Power Consumption ◽

Low Power ◽

High Speed ◽

Power Saving ◽

Leakage Power ◽

Content Type ◽

Dynamic Power ◽

Cu Interconnect ◽

Circuits VLSI

Purpose: This paper addresses the performance issues of very large scale integrated (VLSI) circuit interconnects that arise from the scaling of device dimensions. The repeater interpolation technique is an effective approach for enhancing the speed of an interconnect network. The proposed buffers, used as repeaters, are modeled with dual-chirality multi-Vt technology to reduce delay while also lowering average power consumption. Interconnects modeled with carbon nanotube (CNT) technology are compared with copper interconnects for various lengths. Buffer circuits are designed with both CNT and metal-oxide-semiconductor technology for comparison, using the combinations (CMOSFET repeater, Cu interconnect) and (CNTFET repeater, CNT interconnect). Compared to a conventional buffer, ProposedBuffer1 saves dynamic power by 84.86%, leakage power by 88% and offers a reduction in delay of 72%. ProposedBuffer2 brings about a dynamic power saving of 99.94% and a leakage power saving of 93%, but causes a delay penalty. Simulation uses the Stanford SPICE model for CNT devices and the silicon field-effect transistor Berkeley Short-channel IGFET Model 4 (BSIM4) predictive technology model (PTM) for MOS devices, run in HSPICE for the 32 nm node.
Design/methodology/approach: Usually, the dynamic power consumption dominates the total power, while the leakage power has a negligible effect. With the scaling of device technology, however, leakage power has become an important consideration in low-power design techniques. Various strategies are explored to suppress the leakage power in standby mode. The adoption of a multi-threshold design strategy is an effective approach to improve the performance of buffer circuits without compromising on delay and area overhead. Unlike in MOS technology, implementing multi-Vt transistors in CNT technology is quite easy: it can be achieved by varying the diameter of the carbon nanotubes using chirality control.
Findings: A new approach is taken to optimizing delay and power dissipation, and hence drastically reducing energy consumption, by matching the wire technology to the repeater-buffer technology. This paper proposes two novel ultra-low-power buffers (PB1 and PB2) as repeaters for high-speed interconnect applications in portable devices. The PB1 buffer is implemented with a high-speed CML technique nested with a multi-threshold (Vt) sleep transistor to improve speed and reduce standby power consumption. PB2 is judiciously implemented by inserting separable sized, dual-chirality P-type carbon nanotube field-effect transistors. The HSPICE simulation results justify the correctness of the schemes.
Originality/value: Result analysis shows that, compared to conventional Cu interconnects, CNT interconnects paired with the proposed CNTFET buffer designs are more energy efficient. PB1 saves dynamic power by 84.86%, reduces propagation delay by 72% and leakage power consumption by 88%. PB2 brings about a dynamic power saving of 99.4% and a leakage power saving of 93%, with an improvement in speed of 52%. This is mainly because CNT interconnects offer low resistance and CNTFET drivers have high mobility and a ballistic mode of operation.

CMOS Binary Image Sensor Using Double-Tail Comparator with High-Speed and Low-Power Consumption

Journal of Sensor Science and Technology ◽

10.46670/jsst.2021.30.2.82 ◽

2021 ◽

Vol 30 (2) ◽

pp. 82-87

Author(s):

Hyeunwoo Kwen ◽

Junyoung Jang ◽

Pyung Choi ◽

Jang-Kyoo Shin

Keyword(s):

Power Consumption ◽

Low Power ◽

High Speed ◽

Image Sensor ◽

Binary Image ◽

Low Power Consumption

DESIGN OF 64-BIT SQUARER BASED ON VEDIC MATHEMATICS

Journal of Circuits, Systems and Computers ◽

10.1142/s0218126614500923 ◽

2014 ◽

Vol 23 (07) ◽

pp. 1450092 ◽

Cited By ~ 1

Author(s):

PRABIR SAHA ◽

DEEPAK KUMAR ◽

PARTHA BHATTACHARYYA ◽

ANUP DANDAPAT

Keyword(s):

Power Consumption ◽

High Speed ◽

CMOS Technology ◽

Propagation Delay ◽

Boolean Logic ◽

Dynamic Power ◽

Vedic Mathematics ◽

Vedic Multiplier ◽

Layout Area ◽

And Performance

"Vedic mathematics" is an ancient methodology of mathematics with a unique technique of calculation based on 16 "sutras" (formulae). A Vedic squarer design (ASIC) using this ancient mathematics is presented in this paper. By employing Vedic mathematics, an (N × N) bit squarer implementation was transformed into just one small squarer (bit length ≪ N) and one adder, which significantly reduces the handling of partial products and enables high-speed operation. Propagation delay and dynamic power consumption of the squarer were minimized significantly through the reduction of partial products. The functionality of these circuits was checked, and performance parameters such as propagation delay and dynamic power consumption were calculated with SPICE Spectre using 90-nm CMOS technology. The propagation delay of the proposed 64-bit squarer was ~ 16 ns, and it consumed ~ 6.79 mW of power for a layout area of ~ 5.39 mm². By combining Boolean logic with ancient Vedic mathematics, a substantial number of partial products were eliminated, which resulted in a ~ 12% speed improvement (propagation delay) and a ~ 22% reduction in power compared with the most commonly used Vedic multiplier (Nikhilam Navatascaramam Dasatah) architecture.

Design and Implementation of an Efficient Parallel LFSR Architecture for Wireless Communication Systems

Current Signal Transduction Therapy ◽

10.2174/1574362414666191016155707 ◽

2019 ◽

Vol 14 ◽

Author(s):

A. Suresh Babu ◽

B. Anand

Keyword(s):

Wireless Communication ◽

Power Consumption ◽

Low Power ◽

Communication Systems ◽

High Speed ◽

Design Tool ◽

Shift Register ◽

Linear Feedback ◽

Wireless Communication Systems ◽

Coverage Area

A Linear Feedback Shift Register (LFSR) feeds a linear function of its previous state, typically an XOR of selected bits, back as the input to the current state. This paper describes in detail recent Wireless Communication Systems (WCS) and techniques related to LFSRs. Cryptographic methods and reconfigurable computing are two different applications in which the proposed shift register is used, with improved speed and decreased power consumption. Compared with the existing individual applications, the proposed shift register achieved a reduction in power consumption of more than 15% and up to 45%, with a 30% reduction in coverage area. Hence, the proposed low-power, high-speed LFSR design suits various low-power, high-speed applications, for example wireless communication. The entire design architecture is simulated and verified in VHDL. A 0.7 µm CMOS standard cell library is used for synthesis. A custom design tool has been developed for measuring the power. The results show that cryptographic efficiency is improved in terms of time and complexity compared with the existing algorithms. Hence, the proposed LFSR architecture can be used for any wireless application owing to its parallel processing, multiple access and cryptographic methods.
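
The feedback mechanism of an LFSR can be shown with a small serial model. This is a generic textbook Fibonacci LFSR, not the parallel architecture proposed in the paper; the 4-bit width, the tap positions (4, 3) and the function names are choices made for this sketch.

```python
def lfsr_step(state, taps, width):
    """Advance a Fibonacci LFSR by one clock.

    state: current register contents as an integer.
    taps:  1-indexed bit positions that are XORed to form the feedback bit.
    width: register length in bits.
    """
    feedback = 0
    for tap in taps:
        feedback ^= (state >> (tap - 1)) & 1
    # Shift left by one and insert the feedback bit at the least significant end.
    return ((state << 1) | feedback) & ((1 << width) - 1)


def lfsr_period(seed, taps, width):
    """Count the clocks needed for the register to return to its seed."""
    state = lfsr_step(seed, taps, width)
    steps = 1
    while state != seed:
        state = lfsr_step(state, taps, width)
        steps += 1
    return steps


period = lfsr_period(seed=0b0001, taps=(4, 3), width=4)
```

With these taps the register visits all 15 nonzero 4-bit states before repeating, which is the maximum possible for a 4-bit LFSR. The all-zero state is excluded because XOR feedback would keep the register at zero forever.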

Low Power and High Speed Sequential Circuits Test Architecture

Recent Patents on Computer Science ◽

10.2174/2213275912666191107102512 ◽

2019 ◽

Vol 12 ◽

Author(s):

Ahmed K. Jameil ◽

Yasir Amer Abbas ◽

Saad Al-Azawi

Keyword(s):

Power Consumption ◽

Low Power ◽

Test Generation ◽

High Speed ◽

FIR Filter ◽

Sequential Circuits ◽

FIR Filters ◽

Design Verification ◽

Sequential Circuit ◽

Generation Algorithm

Background: Designed circuits are tested for fault detection during fabrication to determine which devices are defective. Design verification is performed to ensure that the circuit performs the required functions after manufacturing, and it is regarded as a form of testing for both sequential and combinational circuits. Analyzing tests for sequential circuits is more difficult than for combinational circuits. However, algorithms can be used to test any type of sequential circuit regardless of its composition. An important sequential circuit is the finite impulse response (FIR) filter, which is widely used in digital signal processing applications. Objective: This paper presents a new design under test (DUT) algorithm for 4- and 8-tap FIR filters. The FIR filter and the proposed DUT algorithm are implemented using field programmable gate arrays (FPGA). Method: The proposed test generation algorithm is implemented in VHDL using the Xilinx ISE V14.5 design suite and verified by simulation. The test generation algorithm uses redundant faults of the FIR filter to obtain a set of target faults for the DUT. Fault simulation is used in the DUT to assess the benefit of the test patterns in terms of fault coverage. Results: Compared to similar techniques, the proposed technique provides average reductions of 20 % and 38.8 % in time delay, 57.39 % and 75 % in power consumption, and 28.89 % and 28.89 % in slices for the 4- and 8-tap FIR filters, respectively. Conclusions: The implementation results show that a high-speed, low-power design can be achieved. Further, the proposed architecture is faster than existing techniques.
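
To make the "FIR filter as a sequential circuit" point concrete, the sketch below models a direct-form FIR filter with an explicit delay line, which is the state that makes the circuit sequential. It is a behavioral illustration only and has nothing to do with the paper's test generation algorithm; the coefficients are arbitrary example values.

```python
def fir_filter(samples, coeffs):
    """Direct-form FIR filter: y[n] = sum_k coeffs[k] * x[n - k].

    The delay line plays the role of the registers in a hardware filter.
    """
    delay_line = [0] * len(coeffs)
    outputs = []
    for x in samples:
        # Shift the new sample in and drop the oldest one.
        delay_line = [x] + delay_line[:-1]
        outputs.append(sum(c * d for c, d in zip(coeffs, delay_line)))
    return outputs


# A unit impulse reads the coefficients back out, one per clock.
impulse_response = fir_filter([1, 0, 0, 0, 0, 0], [1, 2, 3, 4])
```

Feeding a unit impulse into the 4-tap example returns the coefficients in order followed by zeros, which is a common first check when verifying an FIR implementation.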

Modeling the Excess Velocity of Low-Viscous Taylor Droplets in Square Microchannels

Fluids ◽

10.3390/fluids4030162 ◽

2019 ◽

Vol 4 (3) ◽

pp. 162 ◽

Cited By ~ 2

Author(s):

Thorben Helmers ◽

Philip Kemper ◽

Jorg Thöming ◽

Ulrich Mießner

Keyword(s):

High Speed ◽

Process Intensification ◽

Continuous Phase ◽

Channel Cross Section ◽

Optimization Approach ◽

Mean Flow ◽

Bypass Flow ◽

Wall Film ◽

Proposed Model ◽

The Mean

Microscopic multiphase flows have gained broad interest due to their capability to transfer processes into new operational windows and achieve significant process intensification. However, the hydrodynamic behavior of Taylor droplets is not yet entirely understood. In this work, we introduce a model to determine the excess velocity of Taylor droplets in square microchannels. This velocity difference between the droplet and the total superficial velocity of the flow has a direct influence on the droplet residence time and is linked to the pressure drop. Since the droplet does not occupy the entire channel cross-section, it enables the continuous phase to bypass the droplet through the corners. A consideration of the continuity equation generally relates the excess velocity to the mean flow velocity. We base the quantification of the bypass flow on a correlation for the droplet cap deformation from its static shape. The cap deformation reveals the forces of the flowing liquids exerted onto the interface and allows estimating the local driving pressure gradient for the bypass flow. The characterizing parameters are identified as the bypass length, the wall film thickness, the viscosity ratio between both phases and the Ca number. The proposed model is adapted with a stochastic, metaheuristic optimization approach based on genetic algorithms. In addition, our model was successfully verified with high-speed camera measurements and published empirical data.

Ultracompact and low-power-consumption silicon thermo-optic switch for high-speed data

Nanophotonics ◽

10.1515/nanoph-2020-0496 ◽

2020 ◽

Vol 10 (2) ◽

pp. 937-945

Author(s):

Ruihuan Zhang ◽

Yu He ◽

Yong Zhang ◽

Shaohua An ◽

Qingming Zhu ◽

...

Keyword(s):

Power Consumption ◽

Low Power ◽

High Speed ◽

High Performance ◽

Pulse Amplitude ◽

Telecommunication Networks ◽

Low Power Consumption ◽

Power Efficient ◽

High Speed Data ◽

On Chip

Ultracompact and low-power-consumption optical switches are desired for high-performance telecommunication networks and data centers. Here, we demonstrate an on-chip power-efficient 2 × 2 thermo-optic switch unit by using a suspended photonic crystal nanobeam structure. A submilliwatt switching power of 0.15 mW is obtained with a tuning efficiency of 7.71 nm/mW in a compact footprint of 60 μm × 16 μm. The bandwidth of the switch is properly designed for a four-level pulse amplitude modulation signal with a 124 Gb/s raw data rate. To the best of our knowledge, the proposed switch is the most power-efficient resonator-based thermo-optic switch unit with the highest tuning efficiency and data rate ever reported.

CFD Simulation of a Hyperloop Capsule Inside a Low-Pressure Environment Using an Aerodynamic Compressor as Propulsion and Drag Reduction Method

Applied Sciences ◽

10.3390/app11093934 ◽

2021 ◽

Vol 11 (9) ◽

pp. 3934

Author(s):

Federico Lluesma-Rodríguez ◽

Temoatzin González ◽

Sergio Hoyas

Keyword(s):

Shock Waves ◽

Power Consumption ◽

High Speed ◽

CFD Simulation ◽

Aerodynamic Drag ◽

Flow Behavior ◽

Critical Conditions ◽

Vehicle Speed ◽

Blockage Ratio ◽

The Cost

One of the most restrictive conditions in ground transportation at high speeds is aerodynamic drag. This is even more problematic when running inside a tunnel, where compressible phenomena such as wave propagation, shock waves, or flow blocking can happen. In Evacuated-Tube Trains (ETTs) or hyperloops, these effects appear along the whole route, as the vehicles always operate in a closed environment. One of the concerns is therefore the size of the tunnel, as it directly affects the cost of the infrastructure. When the tube size decreases for a constant vehicle cross-section, the power consumption increases exponentially, as the Kantrowitz limit is surpassed. This can be mitigated by adding a compressor to the vehicle as a means of propulsion. The turbomachinery increases the pressure of part of the air faced by the vehicle, thus delaying the critical conditions in the surrounding flow. With tunnels using a blockage ratio of 0.5 or higher, the reported reduction in the power consumption is 70%. Additionally, the induced pressure in front of the capsule becomes negligible. The analysis of the flow shows that the compressor can remove the shock waves downstream and thus allows operation above the Kantrowitz limit. For a vehicle speed of 700 km/h, the case without a compressor reaches critical conditions at a blockage ratio of 0.18, which corresponds to a tunnel even smaller than those used for High-Speed Rails (0.23). When aerodynamic propulsion is used, sonic Mach numbers are reached above a blockage ratio of 0.5. A direct effect is that cases with turbomachinery can operate in tunnels with blockage ratios up to 2.8 times higher than the non-compressor cases, enabling a considerable reduction in the size of the tunnel without affecting the performance. This work, after conducting bibliographic research, presents the geometry, mesh, and setup. Results for the flow without a compressor are then shown. Finally, it is discussed how the addition of the compressor improves the flow behavior and power consumption of the case.

Multiview deep learning based on tensor decomposition and its application in fault detection of overhead contact systems

The Visual Computer ◽

10.1007/s00371-021-02080-y ◽

2021 ◽

Author(s):

Xuewu Zhang ◽

Yansheng Gong ◽

Chen Qiao ◽

Wenfeng Jing

Keyword(s):

High Speed ◽

Tensor Decomposition ◽

Detection Methods ◽

Detection Accuracy ◽

Feature Maps ◽

Training Time ◽

Detection Model ◽

Railway Line ◽

Result Show ◽

Deep Layers

This article focuses on the most common types of high-speed railway malfunctions in overhead contact systems, namely unstressed droppers, foreign-body invasions, and pole number-plate malfunctions, to establish a deep-network detection model. By fusing the feature maps of the shallow and deep layers in the pretraining network, global and local features of the malfunction area are combined to enhance the network's ability to identify small objects. Further, in order to share the fully connected layers of the pretraining network and reduce the complexity of the model, Tucker tensor decomposition is used to extract features from the fused feature map. This operation greatly reduces training time. On images collected on the Lanxin railway line, experimental results show that the proposed multiview Faster R-CNN based on tensor decomposition has a lower miss probability and higher detection accuracy for the three types of faults. Compared with the object-detection methods YOLOv3, SSD, and the original Faster R-CNN, the average miss probability of the improved Faster R-CNN model in this paper is decreased by 37.83%, 51.27%, and 43.79%, respectively, and the average detection accuracy is increased by 3.6%, 9.75%, and 5.9%, respectively.
