Energy characteristic of a processor allocator and a network-on-chip

Author(s):  
Dawid Zydek ◽  
Henry Selvaraj ◽  
Grzegorz Borowik ◽  
Tadeusz Łuba

Energy consumption in a Chip MultiProcessor (CMP) is one of the most important costs. It is related to design aspects such as thermal and power constraints. Besides efficient on-chip processing elements, a well-designed Processor Allocator (PA) and a Network-on-Chip (NoC) are also important factors in the energy budget of novel CMPs. In this paper, the authors propose an energy model for NoCs with 2D-mesh and 2D-torus topologies. All important NoC architectures are described and discussed. Energy estimation is presented for PAs. The estimation is based on synthesis results for PAs targeting FPGA. The PAs are driven by allocation algorithms that are studied as well. The proposed energy model is employed in a simulation environment, where exhaustive experiments are performed. Simulation results show that a PA with an IFF allocation algorithm for mesh systems and a torus-based NoC with express-virtual-channel flow control are very energy efficient. The combination of these two solutions is a clear choice for modern CMPs.

Sensors ◽  
2018 ◽  
Vol 18 (7) ◽  
pp. 2330 ◽  
Author(s):  
Alberto Scionti ◽  
Somnath Mazumdar ◽  
Antoni Portero

The rapid evolution of Cloud-based services and the growing interest in deep learning (DL)-based applications are putting increasing pressure on hyperscalers and general-purpose hardware designers to provide more efficient and scalable systems. Cloud-based infrastructures must consist of more energy-efficient components. The evolution must take place from the core of the infrastructure (i.e., data centers (DCs)) to the edges (Edge computing) to adequately support new and future applications. Adaptability (elasticity) is one of the features required to increase performance-to-power ratios. Hardware-based mechanisms have been proposed to support system reconfiguration mostly at the processing-element level, while fewer studies have addressed scalable, modular interconnected sub-systems. In this paper, we propose a scalable Software Defined Network-on-Chip (SDNoC)-based architecture. By leveraging a modular design approach, our solution can easily be adapted to support devices ranging from low-power computing nodes placed at the edge of the Cloud to high-performance many-core processors in the Cloud DCs. The proposed design merges the benefits of hierarchical network-on-chip (NoC) topologies (by fusing the ring and the 2D-mesh topology) with those brought by dynamic reconfiguration (i.e., adaptation). Our proposed interconnect allows different types of virtualised topologies to be created, serving different communication requirements and thus providing better resource partitioning (virtual tiles) for concurrent tasks. To further allow the software layer to control and monitor the NoC subsystem, a few customised instructions supporting a data-driven program execution model (PXM) are added to the processing element’s instruction set architecture (ISA). In general, data-driven programming and execution models are suitable for supporting DL applications. We also introduce a mechanism to map a high-level programming language embedding concurrent execution models onto the basic functionalities offered by our SDNoC, to ease the programming of the proposed system. In the reported experiments, we compared our lightweight reconfigurable architecture to a conventional flattened 2D-mesh interconnection subsystem. Results show that our design provides a 9.5% increase in data traffic throughput and a 2.2× reduction in average packet latency compared to the flattened 2D-mesh topology connecting the same number of processing elements (PEs) (up to 1024 cores). Similarly, power and resource consumption (on FPGA devices) is also low, confirming good scalability of the proposed architecture.


The arrangement of nodes in a network determines the complexity of the network. To reduce that complexity, the structural arrangement of nodes has to be chosen with care. The mesh topology is more attractive than other traditional topologies. An Over-Looped 2D Mesh Topology is proposed that lets opposite corner nodes communicate in fewer hops and steers traffic away from the centre of the network. For homogeneous systems, the proposed work can be deployed without altering the composition of any switch component. Routing flits through the outer corner nodes with the help of looping nodes completes the journey from source to destination in fewer hops. For networks smaller than 4x4, the looping is less effective. The looping can be applied with an odd or even number of columns and rows, and the number of columns need not equal the number of rows. The leftover nodes are looped accordingly. Compared with the 2D mesh, the Over-Looped 2D Mesh Topology reduces the hop count by 25%. The wiring segmentation and wiring length of the system differ by more than 10% from the 2D mesh and by less than 20% from the 2D torus.
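As a point of reference for the hop-count comparison, the sketch below computes minimal hop counts for the two baseline topologies named in the abstract, a plain 2D mesh and a 2D torus. It does not model the Over-Looped topology, whose loop placement is not specified here. The function names and the 4x4 example size are illustrative assumptions.

```python
from itertools import product


def mesh_hops(src, dst):
    """Minimal hop count between two (row, col) nodes in a 2D mesh."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def torus_hops(src, dst, rows, cols):
    """Minimal hop count in a 2D torus, where each dimension wraps around."""
    dr = abs(src[0] - dst[0])
    dc = abs(src[1] - dst[1])
    return min(dr, rows - dr) + min(dc, cols - dc)


def average_hops(rows, cols, hop_fn):
    """Average hop count over all ordered pairs of distinct nodes."""
    nodes = list(product(range(rows), range(cols)))
    pairs = [(s, d) for s in nodes for d in nodes if s != d]
    return sum(hop_fn(s, d) for s, d in pairs) / len(pairs)


# Assumed example size; any rows x cols grid works.
rows, cols = 4, 4
corner_a, corner_b = (0, 0), (rows - 1, cols - 1)

corner_mesh = mesh_hops(corner_a, corner_b)
corner_torus = torus_hops(corner_a, corner_b, rows, cols)
mesh_avg = average_hops(rows, cols, mesh_hops)
torus_avg = average_hops(rows, cols, lambda s, d: torus_hops(s, d, rows, cols))

print(f"corner to corner: mesh={corner_mesh}, torus={corner_torus}")
print(f"average hops:     mesh={mesh_avg:.3f}, torus={torus_avg:.3f}")
```

The corner-to-corner case is where the mesh is weakest, which is the traffic pattern the looping nodes are meant to shorten.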


2012 ◽  
Vol 13 (01n02) ◽  
pp. 1250001 ◽  
Author(s):  
MOHAMMAD H. AL-TOWAIQ ◽  
KHALED DAY

Network-on-chip multicore architectures with a large number of processing elements are becoming a reality with the recent developments in technology. In these modern systems the processing elements are interconnected with regular network-on-chip (NoC) topologies such as meshes and trees. In this paper we propose a parallel Gauss-Seidel (GS) iterative algorithm for solving large systems of linear equations on a torus NoC architecture. The proposed parallel algorithm has O(Nn²/k²) time complexity for solving a system with a matrix of order n on a k × k torus NoC architecture with N iterations, assuming n and N are large compared to k (i.e., for large linear systems that require a large number of iterations). We show that under these conditions the proposed parallel GS algorithm has near-optimal speedup.
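For readers unfamiliar with the method, here is a minimal sequential Gauss-Seidel sketch. It is not the parallel torus algorithm from the paper, whose data distribution is not described in the abstract. Each sweep over a dense matrix of order n costs on the order of n² operations, so N sweeps cost on the order of Nn²; spreading that work evenly over k × k processing elements is consistent with the stated bound. The function name and the 2 × 2 test system are assumptions chosen for illustration.

```python
def gauss_seidel(a, b, iterations):
    """Run a fixed number of Gauss-Seidel sweeps on the system a @ x = b.

    a is a square matrix given as a list of rows, b is the right-hand side.
    Convergence is guaranteed for strictly diagonally dominant matrices.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        for i in range(n):
            # Use the newest available values of x, which is what
            # distinguishes Gauss-Seidel from the Jacobi method.
            s = sum(a[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / a[i][i]
    return x


# Assumed example: 4x + y = 1, 2x + 3y = 2 (diagonally dominant).
a = [[4.0, 1.0], [2.0, 3.0]]
b = [1.0, 2.0]
x = gauss_seidel(a, b, iterations=50)
print(x)
```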


2014 ◽  
Vol 23 (09) ◽  
pp. 1450120 ◽  
Author(s):  
ADEL SOUDANI ◽  
AHMED ALDAMMAS ◽  
ABDULLAH AL-DHELAAN

Embedded distributed multimedia applications that rely on on-chip networks for communication and message exchange require specific and enhanced quality of service (QoS) management. To reach the desired performance at the application level, the network-on-chip (NoC) router should implement a per-flit handling strategy with wide granularity. This requires an enhanced internal architecture that, on the one hand, ensures specific management according to a service classification and, on the other hand, enhances the routing process. In this context, this paper proposes a new mechanism for QoS management in NoCs. The mechanism is based on a central memory in which flits are enqueued according to their class of service. This scheme enables an optimal flit scheduling phase and makes it possible to drop low-importance flits when the router shows symptoms of congestion. The paper also presents a protocol structure that fits this architecture and introduces a signaling mechanism that makes QoS management through the proposed architecture efficient. The circuit's performance and its adaptability for achieving QoS with low-power processing and high bandwidth in on-chip multiprocessor systems are studied in this paper.
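The idea of a central memory with per-class queues and congestion-triggered dropping can be sketched in software as follows. This is only an illustrative model under stated assumptions, not the router architecture from the paper: the class name `CentralFlitBuffer`, the fixed capacity used as the congestion signal, the strict-priority scheduler, and the rule of evicting the newest flit of the least important class are all hypothetical choices.

```python
from collections import deque


class CentralFlitBuffer:
    """Hypothetical shared buffer with one FIFO queue per service class.

    Class 0 is the most important; higher indices are less important.
    """

    def __init__(self, num_classes, capacity):
        self.queues = [deque() for _ in range(num_classes)]
        self.capacity = capacity  # assumed congestion threshold

    def occupancy(self):
        return sum(len(q) for q in self.queues)

    def enqueue(self, flit, service_class):
        """Store a flit; return the flit that was dropped, or None."""
        dropped = None
        if self.occupancy() >= self.capacity:
            # Congested: look for a queued flit that is strictly less
            # important than the incoming one, least important class first.
            for cls in range(len(self.queues) - 1, service_class, -1):
                if self.queues[cls]:
                    dropped = self.queues[cls].pop()
                    break
            if dropped is None:
                return flit  # nothing less important to evict
        self.queues[service_class].append(flit)
        return dropped

    def schedule(self):
        """Return the next flit by strict priority, or None if empty."""
        for q in self.queues:
            if q:
                return q.popleft()
        return None


buf = CentralFlitBuffer(num_classes=2, capacity=2)
buf.enqueue("video-0", 1)
buf.enqueue("video-1", 1)
dropped = buf.enqueue("control-0", 0)  # buffer is full at this point
first = buf.schedule()
print(dropped, first)
```

A real router would likely bound each class separately and use a fairer scheduler to avoid starving low classes; strict priority is used here only to keep the sketch short.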


2012 ◽  
Vol 630 ◽  
pp. 276-282
Author(s):  
Hao Wang ◽  
Ling Wu

To avoid deadlock and high transmission delay in multicast communication on a network-on-chip (NoC), this paper puts forward a multicast communication model. First, the authors give a formal description of the multicast communication model. Second, they illustrate the deadlock caused by circular waiting. To solve this problem, a NoC multicast communication model based on the 2D torus topology is proposed. In addition, the paper presents an example to validate its correctness. Finally, the model is simulated on a NoC with a 2D torus topology using OPNET Modeler. The test results show that this multicast communication model has lower transmission delay and higher throughput than the unicast routing strategy using XY routing.
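The XY routing used as the unicast baseline is dimension-order routing: a packet first travels along the X dimension until its column matches the destination, then along the Y dimension. The sketch below is a minimal illustration on a 2D torus, not the paper's multicast model. The function name, the (x, y) coordinate convention, and the rule of breaking ties toward the positive direction are assumptions.

```python
def xy_route(src, dst, cols, rows):
    """Return the list of (x, y) nodes visited from src to dst.

    The X dimension is resolved completely before the Y dimension.
    In each dimension the shorter way around the ring is taken.
    """

    def direction(cur, target, size):
        forward = (target - cur) % size   # hops going in the +1 direction
        backward = (cur - target) % size  # hops going in the -1 direction
        return 1 if forward <= backward else -1

    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x = (x + direction(x, dst[0], cols)) % cols
        path.append((x, y))
    while y != dst[1]:
        y = (y + direction(y, dst[1], rows)) % rows
        path.append((x, y))
    return path


# Assumed example on a 4 x 4 torus: the X hop uses the wraparound link.
path = xy_route((0, 0), (3, 1), cols=4, rows=4)
print(path)
```

Under this scheme a multicast to several destinations is sent as one unicast packet per destination, which is the source of the extra traffic that a dedicated multicast model aims to avoid.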

