energy efficiency
Recently Published Documents





2022 ◽  
Vol 85 ◽  
pp. 102402
Lilia Matraeva ◽  
Ekaterina Vasiutina ◽  
Natalia Korolkova ◽  
Aleksander Maloletko ◽  
Olga Kaurova

2022 ◽  
Vol 15 (1) ◽  
pp. 1-35
Vladimir Rybalkin ◽  
Jonas Ney ◽  
Menbere Kina Tekleyohannes ◽  
Norbert Wehn

The Multidimensional Long Short-Term Memory (MD-LSTM) neural network is an extension of one-dimensional LSTM for data with more than one dimension. MD-LSTM achieves state-of-the-art results in various applications, including handwritten text recognition and medical imaging. However, its implementation suffers from inherently sequential execution that tremendously slows down both training and inference compared to other neural networks. The main goal of the current research is to accelerate MD-LSTM inference. We advocate that a Field-Programmable Gate Array (FPGA) is an alternative platform for deep learning that can offer a solution when the massive parallelism of GPUs does not provide the performance required by the application. In this article, we present the first hardware architecture for MD-LSTM. We conduct a systematic exploration to analyze the tradeoff between precision and accuracy. We use a challenging dataset for semantic segmentation, namely historical document image binarization from the DIBCO 2017 contest, and the well-known MNIST dataset for handwritten digit recognition. Based on our new architecture, we implement FPGA-based accelerators that outperform the Nvidia GeForce RTX 2080 Ti with respect to throughput by up to 9.9× and the Nvidia Jetson AGX Xavier with respect to energy efficiency by up to 48×. Our accelerators achieve higher throughput, energy efficiency, and resource efficiency than FPGA-based implementations of convolutional neural networks (CNNs) for semantic segmentation tasks. For the handwritten digit recognition task, our FPGA implementations provide higher accuracy and can be considered as a solution when accuracy is a priority. Furthermore, they outperform earlier FPGA implementations of one-dimensional LSTMs with respect to throughput, energy efficiency, and resource efficiency.
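The precision/accuracy tradeoff mentioned in this abstract can be illustrated with a minimal sketch, assuming "precision" refers to the bit width of a signed fixed-point representation. The abstract does not describe the quantization scheme the authors actually use; the function name `quantize` and its parameters `total_bits` and `frac_bits` are hypothetical and chosen only for illustration.

```python
def quantize(x: float, total_bits: int, frac_bits: int) -> float:
    """Round x to a signed fixed-point grid and saturate at the range limits.

    Hypothetical helper: total_bits is the word width (including sign),
    frac_bits is the number of fractional bits.
    """
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))        # most negative integer code
    hi = (1 << (total_bits - 1)) - 1     # most positive integer code
    code = max(lo, min(hi, round(x * scale)))
    return code / scale


if __name__ == "__main__":
    # Fewer fractional bits give a coarser grid and a larger rounding error.
    for frac_bits in (2, 4, 8):
        approx = quantize(0.3, total_bits=16, frac_bits=frac_bits)
        print(frac_bits, approx, abs(0.3 - approx))
```

Within the representable range, the rounding error is at most half a grid step, so each fractional bit removed doubles the worst-case error while shrinking the datapath. How that error translates into task accuracy depends on the network and dataset.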

2022 ◽  
Vol 156 ◽  
pp. 111944
Hong Chen ◽  
Partha Gangopadhyay ◽  
Baljeet Singh ◽  
Sriram Shankar

2022 ◽  
Vol 15 (3) ◽  
pp. 1-20
Christian Lienen ◽  
Marco Platzner

Robotics applications process large amounts of data in real time and require compute platforms that provide high performance and energy efficiency. FPGAs are well suited for many of these applications, but there is a reluctance in the robotics community to use hardware acceleration due to increased design complexity and a lack of consistent programming models across the software/hardware boundary. In this article, we present ReconROS, a framework that integrates the widely used robot operating system (ROS) with ReconOS, which features multithreaded programming of hardware and software threads for reconfigurable computers. This unique combination gives ROS 2 developers the flexibility to transparently accelerate parts of their robotics applications in hardware. We elaborate on the architecture and the design flow for ReconROS and report on a set of experiments that underline the feasibility and flexibility of our approach.

2022 ◽  
Vol 309 ◽  
pp. 118503
Xin Zhou ◽  
Shuai Tian ◽  
Jingjing An ◽  
Da Yan ◽  
Lun Zhang ◽  

2022 ◽  
Vol 27 (2) ◽  
pp. 1-30
Jaechul Lee ◽  
Cédric Killian ◽  
Sebastien Le Beux ◽  
Daniel Chillet

The energy consumption of manycore architectures is dominated by data movement, which calls for energy-efficient and high-bandwidth interconnects. To overcome the bandwidth limitation of electrical interconnects, integrated optics appears to be a promising technology. However, it suffers from high power overhead related to low laser efficiency, which calls for techniques and methods to reduce its energy cost. Besides, approximate computing is emerging as an efficient method to reduce energy consumption and improve the execution speed of embedded computing systems. It relies on reducing the accuracy of data at the cost of a tolerable application output error. In this context, the work presented in this article exploits both features by defining approximate communications for error-tolerant applications. We propose a method to design a realistic and scalable nanophotonic interconnect supporting approximate data transmission and power adaptation according to the communication distance to improve energy efficiency. For this purpose, the data can be sent by mixing a low optical power signal and truncation for the Least Significant Bits (LSB) of the floating-point numbers, while the overall power is adapted according to the communication distance. We define two ranges of communications, short and long, which require only four power levels. This reduces the area and power overhead of controlling the laser output power. A transmission model allows estimating the laser power according to the targeted BER and the number of truncated bits, while the optical network interface allows configuring, at runtime, the number of approximated and truncated bits and the laser output powers. We explore the energy efficiency provided by each communication scheme, and we investigate the error resilience of the benchmarks over several approximation and truncation schemes. The simulation results of ApproxBench applications show that, compared to an interconnect involving only robust communications, approximations in the optical transmission lead to up to 53% laser power reduction with limited degradation at the application level (less than 9% output error). Finally, we show that our solution is scalable and leads to a 10% reduction in the total energy consumption, a 35× reduction in the laser driver size, and a 10× reduction in the laser controller compared to a state-of-the-art solution.
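The effect of truncating least significant bits of floating-point data, as described in this abstract, can be sketched as follows. This is an illustration only, assuming IEEE 754 single-precision values and truncation by zeroing mantissa bits; the abstract does not specify the number format or the hardware mechanism, and the names `truncate_lsbs` and `relative_error` are hypothetical.

```python
import struct


def truncate_lsbs(value: float, n_bits: int) -> float:
    """Zero the n_bits least significant mantissa bits of a float32 value."""
    if not 0 <= n_bits <= 23:
        raise ValueError("float32 has 23 explicit mantissa bits")
    # Reinterpret the float32 bit pattern as an unsigned 32-bit integer.
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    mask = (0xFFFFFFFF << n_bits) & 0xFFFFFFFF
    (out,) = struct.unpack("<f", struct.pack("<I", raw & mask))
    return out


def relative_error(value: float, n_bits: int) -> float:
    """Relative error introduced by truncation (value must be nonzero)."""
    (exact,) = struct.unpack("<f", struct.pack("<f", value))
    return abs(exact - truncate_lsbs(value, n_bits)) / abs(exact)


if __name__ == "__main__":
    for n_bits in (0, 8, 16, 20):
        print(n_bits, truncate_lsbs(3.14159, n_bits), relative_error(3.14159, n_bits))
```

For normalized values, dropping `n_bits` mantissa bits bounds the relative error by 2^(n_bits - 23), which is why a large share of the low-order bits can be omitted before an error-tolerant application is noticeably affected.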

2022 ◽  
Vol 85 ◽  
pp. 102412
Svetlana Ratner ◽  
Andrey Berezin ◽  
Konstantin Gomonov ◽  
Apostolos Serletis ◽  
Bruno S. Sergi

2022 ◽  
Vol 43 (01) ◽  
Pierluigi Montalbano ◽  
Silvia Nenci ◽  
Davide Vurchio
