scholarly journals On-chip trainable hardware-based deep Q-networks approximating a backpropagation algorithm

Author(s):  
Jangsaeng Kim ◽  
Dongseok Kwon ◽  
Sung Yun Woo ◽  
Won-Mook Kang ◽  
Soochang Lee ◽  
...  

AbstractReinforcement learning (RL) using deep Q-networks (DQNs) has shown performance beyond the human level in a number of complex problems. In addition, many studies have focused on bio-inspired hardware-based spiking neural networks (SNNs) given the capabilities of these technologies to realize both parallel operation and low power consumption. Here, we propose an on-chip training method for DQNs applicable to hardware-based SNNs. Because the conventional backpropagation (BP) algorithm is approximated, a performance evaluation based on two simple games shows that the proposed system achieves performance similar to that of a software-based system. The proposed training method can minimize memory usage and reduce power consumption and area occupation levels. In particular, for simple problems, the memory dependency can be significantly reduced given that high performance is achieved without using replay memory. Furthermore, we investigate the effect of the nonlinearity characteristics and two types of variation of non-ideal synaptic devices on the performance outcomes. In this work, thin-film transistor (TFT)-type flash memory cells are used as synaptic devices. A simulation is also conducted using fully connected neural network with non-leaky integrated-and-fire (I&F) neurons. The proposed system shows strong immunity to device variations because an on-chip training scheme is adopted.

Author(s):  
A. Ferrerón Labari ◽  
D. Suárez Gracia ◽  
V. Viñals Yúfera

In the last years, embedded systems have evolved so that they offer capabilities we could only find before in high performance systems. Portable devices already have multiprocessors on-chip (such as PowerPC 476FP or ARM Cortex A9 MP), usually multi-threaded, and a powerful multi-level cache memory hierarchy on-chip. As most of these systems are battery-powered, the power consumption becomes a critical issue. Achieving high performance and low power consumption is a high complexity challenge where some proposals have been already made. Suarez et al. proposed a new cache hierarchy on-chip, the LP-NUCA (Low Power NUCA), which is able to reduce the access latency taking advantage of NUCA (Non-Uniform Cache Architectures) properties. The key points are decoupling the functionality, and utilizing three specialized networks on-chip. This structure has been proved to be efficient for data hierarchies, achieving a good performance and reducing the energy consumption. On the other hand, instruction caches have different requirements and characteristics than data caches, contradicting the low-power embedded systems requirements, especially in SMT (simultaneous multi-threading) environments. We want to study the benefits of utilizing small tiled caches for the instruction hierarchy, so we propose a new design, ID-LP-NUCAs. Thus, we need to re-evaluate completely our previous design in terms of structure design, interconnection networks (including topologies, flow control and routing), content management (with special interest in hardware/software content allocation policies), and structure sharing. In CMP environments (chip multiprocessors) with parallel workloads, coherence plays an important role, and must be taken into consideration.


Nanophotonics ◽  
2020 ◽  
Vol 10 (2) ◽  
pp. 937-945
Author(s):  
Ruihuan Zhang ◽  
Yu He ◽  
Yong Zhang ◽  
Shaohua An ◽  
Qingming Zhu ◽  
...  

AbstractUltracompact and low-power-consumption optical switches are desired for high-performance telecommunication networks and data centers. Here, we demonstrate an on-chip power-efficient 2 × 2 thermo-optic switch unit by using a suspended photonic crystal nanobeam structure. A submilliwatt switching power of 0.15 mW is obtained with a tuning efficiency of 7.71 nm/mW in a compact footprint of 60 μm × 16 μm. The bandwidth of the switch is properly designed for a four-level pulse amplitude modulation signal with a 124 Gb/s raw data rate. To the best of our knowledge, the proposed switch is the most power-efficient resonator-based thermo-optic switch unit with the highest tuning efficiency and data ever reported.


Author(s):  
Mário P. Véstias ◽  
Horácio C. Neto

The recent advances in IC technology have made it possible to implement systems with dozens or even hundreds of cores in a single chip. With such a large number of cores communicating with each other there is a strong pressure over the communication infrastructure to deliver high bandwidth, low latency, low power consumption and quality of service to guarantee real-time functionality. Networks-on-Chip are definitely becoming the only acceptable interconnection structure for today’s multiprocessor systems-on-chip (MPSoC). The first generation of NoC solutions considers a regular topology, typically a 2D mesh. Routers and network interfaces are mainly homogeneous so that they can be easily scaled up and modular design is facilitated. All advantages of a NoC infrastructure were proved with this first generation of NoC solutions. However, NoCs have a relative area and speed overhead. Application specific systems can benefit from heterogeneous communication infrastructures providing high bandwidth in a localized fashion where it is needed with improved area. The efficiency of both homogeneous and heterogeneous solutions can be improved if runtime changes are considered. Dynamically or runtime reconfigurable NoCs are the second generation of NoCs since they represent a new set of benefits in terms of area overhead, performance, power consumption, fault tolerance and quality of service compared to the previous generation where the architecture is decided at design time. This chapter discusses the static and runtime customization of routers and presents results with networks-on-chip with static and adaptive routers. Runtime adaptive techniques are analyzed and compared to each other in terms of area occupation and performance. The results and the discussion presented in this chapter show that dynamically adaptive routers are fundamental in the design of NoCs to satisfy the requirements of today’s systems-on-chip.


2020 ◽  
Vol 2 (3) ◽  
pp. 158-168
Author(s):  
Muhammad Raza Naqvi

Mostly communication now days is done through SoC (system on chip) models so, NoC (network on chip) architecture is most appropriate solution for better performance. However, one of major flaws in this architecture is power consumption. To gain high performance through this type of architecture it is necessary to confirm power consumption while designing this. Use of power should be diminished in every region of network chip architecture. Lasting power consumption can be lessened by reaching alterations in network routers and other devices used to form that network. This research mainly focusses on state-of-the-art methods for designing NoC architecture and techniques to reduce power consumption in those architectures like, network architecture, network links between nodes, network design, and routers.


2020 ◽  
Vol 17 (5) ◽  
pp. 621-632
Author(s):  
Seyyed Javad Seyyed Mahdavi Chabok ◽  
Seyed Amin Alavi

Purpose The routing algorithm is one of the most important components in designing a network-on-chip (NoC). An effective routing algorithm can cause better performance and throughput, and thus, have less latency, lower power consumption and high reliability. Considering the high scalability in networks and fault occurrence on links, the more the packet reaches the destination (i.e. to cross the number of fewer links), the less the loss of packets and information would be. Accordingly, the proposed algorithm is based on reducing the number of passed links to reach the destination. Design/methodology/approach This paper presents a high-performance NoC that increases telecommunication network reliability by passing fewer links to destination. A large NoC is divided into small districts with central routers. In such a system, routing in large routes is performed through these central routers district by district. Findings By reducing the number of links, the number of routers also decreases. As a result, the power consumption is reduced, the performance of the NoC is improved, and the probability of collision with a faulty link and network latency is decreased. Originality/value The simulation is performed using the Noxim simulator because of its ability to manage and inject faults. The proposed algorithm, XY routing, as a conventional algorithm for the NoC, was simulated in a 14 × 14 network size, as the typical network size in the recent works.


Electronics ◽  
2021 ◽  
Vol 10 (15) ◽  
pp. 1821
Author(s):  
Sandy A. Wasif ◽  
Salma Hesham ◽  
Diana Goehringer ◽  
Klaus Hofmann ◽  
Mohamed A. Abd El Ghany

A network-on-chip (NoC) offers high performance, flexibility and scalability in communication infrastructure within multi-core platforms. However, NoCs contribute significantly to the overall system’s power consumption. The double-layer energy efficient synchronous-asynchronous circuit-switched NoC (CS-NoC) is proposed to enhance the power utilization. To reduce the dynamic power consumption, single-rail asynchronous protocols are utilized. The two-phase and four-phase encoding algorithms are analyzed to determine the most efficient technique. For the data layer, the two asynchronous protocols reduced the power consumption by 80%, with an increase in latency when compared with the fully synchronous protocol. However, the two-phase single-rail protocol had better performance compared with the four-phase protocol by 38%, with the same power consumption and a slight increase in area of 5%. Based on this conducted analysis, the asynchronous two-phase layer had significant power reduction yet operated at a moderate frequency. Therefore, the proposed NoC is divided into two data transfer layers with a single control layer. The data transfer layers are designed using synchronous and asynchronous protocols. The synchronous layer is designated to high-frequency loads, and the asynchronous layer is confined to low-frequency loads. The switching between the layers creates a trade-off between the maximum allowed frequency and the power consumption. The proposed NoC reduces the overall power consumption by 23% when compared with recent previous work. The NoC maintains the same system performance with an 8% area increase over the fully synchronous double-layer in the literature.


Complexity ◽  
2018 ◽  
Vol 2018 ◽  
pp. 1-11
Author(s):  
Juan Fang ◽  
Sitong Liu ◽  
Shijian Liu ◽  
Yanjin Cheng ◽  
Lu Yu

Burst growing IoT and cloud computing demand exascale computing systems with high performance and low power consumption to process massive amounts of data. Modern system platforms based on fundamental requirements encounter a performance gap in chasing exponential growth in data speed and amount. To narrow the gap, a heterogamous design gives us a hint. A network-on-chip (NoC) introduces a packet-switched fabric for on-chip communication and becomes the de facto many-core interconnection mechanism; it refers to a vital shared resource for multifarious applications which will notably affect system energy efficiency. Among all the challenges in NoC, unaware application behaviors bring about considerable congestion, which wastes huge amounts of bandwidth and power consumption on the chip. In this paper, we propose a hybrid NoC framework, combining buffered and bufferless NoCs, to make the NoC framework aware of applications’ performance demands. An optimized congestion control scheme is also devised to satisfy the requirement in energy efficiency and the fairness of big data applications. We use a trace-driven simulator to model big data applications. Compared with the classical buffered NoC, the proposed hybrid NoC is able to significantly improve the performance of mixed applications by 17% on average and 24% at the most, decrease the power consumption by 38%, and improve the fairness by 13.3%.


Universal interconnection networks are prime performance tailback for high performance SoCs (Systems-on-Chip). Since shrinking the size of the ICs (Integrated Circuits) is the main aim, NoC (Network-on-Chip), being a segmental and mountable design tactic is a propitious substitute to outmoded bus-mode architectures. NoC combined with 3D-Routers and label switching technique can guarantee low power consumption, QoS along with less latency. In the proposed work, 3D NoCs are proven to be more advantageous by achieving 39.9% reduction in Area, 1.7% reduction in Power Consumption, and 11.3% reduction in Memory usage.


2010 ◽  
Vol 19 (07) ◽  
pp. 1579-1596 ◽  
Author(s):  
XUGUANG GUAN ◽  
ZHANGMING ZHU ◽  
DUAN ZHOU ◽  
YINTANG YANG

To improve the shortcomings of long arbitration–allocation time and large power consumption in conventional network on chips, this paper proposes a high speed multi-resource arbiter with active virtual channel allocation. Through arbitrating the acknowledgement signals of virtual channels, the virtual channels can attend the arbitration–allocation process actively. Thus reduces the virtual channel allocation time. Furthermore, the proposed multi-resource arbiter has integrated the function of virtual channel allocation, which improves the performance and simplifies the design process simultaneously. Null convention logic units are used to make the circuit quasi-delay insensitive and high robustness. The proposed multi-resource arbiter is implemented based on SMIC 0.18 μm standard CMOS technology. Results have shown that the arbitration–allocation time of the arbiter is only 824.1 ps, which reduced 25.1% compared with the conventional one. The power consumption and the area are slightly larger than the conventional ones. The proposed high speed multi-resource arbiter is suitable for high performance and high robustness interconnections of on chip networks.


Author(s):  
Ahmed Jedidi

Multiprocessor system-on-chip (MPSoC) has become an attractive solution for improving the performance of single chip in objective to satisfy the performance growing exponentially of the computer applications as multimedia applications. However, the communication between the different processors’ cores presents the first challenge front the high performance of MPSoC. Besides, Network on Chip (NoC) is among the most prominent solution for handling the on-chip communication. Besides, NoC potential limited by physical limitation, power consumption, latency and bandwidth in the both case: increasing data exchange or scalability of Multicores. Optical communication offers a wider bandwidth and lower power consumption, based on, a new technology named Optical Network-on-Chip (ONoC) has been introduced in MPSoC. However, ONoC components induce the crosstalk noise in the network on both forms intra/inter crosstalk. This serious problem deteriorates the quality of signals and degrades network performance. As a result, detection and monitoring the impairments becoming a challenge to keep the performance in the ONoC. In this article, we propose a new system to detect and monitor the crosstalk noise in ONoC. Particularly, we present an analytic model of intra/inter crosstalk at the optical devices. Then, we evaluate these impairments in objective to present the motivation to detect and monitor crosstalk in ONoC, in which our system has the capability to detect, to localize, and to monitor the crosstalk noise in the whole network. This system offers high reliability, scalability and efficiency with time running time less than 20 ms.


Sign in / Sign up

Export Citation Format

Share Document