A Hexagonal Processor and Interconnect Topology for Many-Core Architecture with Dense On-Chip Networks

Current computing platforms encourage the integration of thousands of processing cores, and their interconnections, into a single chip. Smartphones, IoT and embedded devices, desktops, and data centers use Many-Core Systems-on-Chip (SoCs) to exploit their compute power and parallelism and to meet dynamic workload requirements. Networks-on-Chip (NoCs) provide scalable connectivity for diverse applications with distinct traffic patterns and data dependencies. However, when the system executes various applications on traditional NoCs (optimized and fixed at synthesis time), the mismatch between the interconnection and the different applications' requirements limits performance. In the literature, NoC designs have embraced the Software-Defined Networking (SDN) strategy to evolve into an adaptable interconnection solution for future chips. However, the works surveyed implement only a partial Software-Defined Network-on-Chip (SDNoC) approach, leaving aside the SDN layered architecture that brings interoperability to conventional networking. This paper explores the SDNoC literature and classifies it according to the desired SDN features that each work presents. We then describe the challenges and opportunities identified in the literature survey. Moreover, we explain the motivation for an SDNoC approach and present both SDN and SDNoC concepts and architectures. We observe that works in the literature employ an incomplete layered SDNoC approach. This leaves several fertile areas in the SDNoC architecture where researchers may contribute to Many-Core SoC designs.


Near-Optimal Thermal Monitoring Framework for Many-Core Systems-on-Chip

IEEE Transactions on Computers ◽

10.1109/tc.2015.2395423 ◽

2015 ◽

Vol 64 (11) ◽

pp. 3197-3209 ◽

Cited By ~ 2

Author(s):

Juri Ranieri ◽

Alessandro Vincenzi ◽

Amina Chebira ◽

David Atienza ◽

Martin Vetterli

Keyword(s):

Thermal Monitoring ◽

Systems On Chip ◽

Monitoring Framework ◽

On Chip ◽

Many Core


MCVP-NoC: Many-Core Virtual Platform with Networks-on-Chip support

2013 IEEE 10th International Conference on ASIC ◽

10.1109/asicon.2013.6811836 ◽

2013 ◽

Author(s):

Dexue Zhang ◽

Xiaoyang Zeng ◽

Zongyan Wang ◽

Weike Wang ◽

Xinhua Chen

Keyword(s):

Networks On Chip ◽

Virtual Platform ◽

On Chip ◽

Many Core


FoToNoC: A Folded Torus-Like Network-on-Chip Based Many-Core Systems-on-Chip in the Dark Silicon Era

IEEE Transactions on Parallel and Distributed Systems ◽

10.1109/tpds.2016.2643669 ◽

2017 ◽

Vol 28 (7) ◽

pp. 1905-1918 ◽

Cited By ~ 16

Author(s):

Lei Yang ◽

Weichen Liu ◽

Weiwen Jiang ◽

Mengquan Li ◽

Peng Chen ◽

...

Keyword(s):

Network On Chip ◽

Dark Silicon ◽

Systems On Chip ◽

On Chip ◽

Many Core


Compiler-directed scratchpad memory data transfer optimization for multithreaded applications on a heterogeneous many-core architecture

The Journal of Supercomputing ◽

10.1007/s11227-021-03853-x ◽

2021 ◽

Author(s):

Xiaohan Tao ◽

Jianmin Pang ◽

Jinlong Xu ◽

Yu Zhu

Keyword(s):

Energy Consumption ◽

High Performance ◽

Scientific Computing ◽

Data Transfer ◽

Performance Model ◽

Experimental Result ◽

Transfer Model ◽

Scratchpad Memory ◽

On Chip ◽

Many Core

The heterogeneous many-core architecture plays an important role in the fields of high-performance computing and scientific computing. It uses accelerator cores with on-chip memories to improve performance and reduce energy consumption. Scratchpad memory (SPM) is a kind of fast on-chip memory with lower energy consumption than a hardware cache. However, data transfer between SPM and off-chip memory can be managed only by a programmer or compiler. In this paper, we propose a compiler-directed multithreaded SPM data transfer model (MSDTM) to optimize the process of data transfer in a heterogeneous many-core architecture. We use compile-time analysis to classify data accesses, check dependences, and determine the allocation of data transfer operations. We further present a data transfer performance model to derive the optimal granularity of data transfer and select the most profitable data transfer strategy. We implement the proposed MSDTM on the GCC compiler and evaluate it on Sunway TaihuLight with selected test cases from benchmarks and scientific computing applications. The experimental results show that the proposed MSDTM improves the application execution time by 5.49× and achieves an energy saving of 5.16× on average.
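As a rough illustration of what choosing a transfer granularity involves, the sketch below uses a generic startup-plus-bandwidth cost model. It is not the paper's MSDTM performance model, which the abstract does not spell out; the function names, the cost formula, and all numeric values (startup cost, bandwidth, SPM capacity) are assumptions made up for this example.

```python
import math

def transfer_time(total_bytes, chunk_bytes, startup, bandwidth):
    """Assumed cost model: each transfer pays a fixed startup cost,
    and all bytes move at a constant bandwidth (bytes per time unit)."""
    n_transfers = math.ceil(total_bytes / chunk_bytes)
    return n_transfers * startup + total_bytes / bandwidth

def best_chunk(total_bytes, candidates, spm_capacity, startup, bandwidth):
    """Pick the candidate chunk size that fits in the SPM and
    minimizes the modeled transfer time."""
    feasible = [c for c in candidates if c <= spm_capacity]
    if not feasible:
        raise ValueError("no candidate chunk fits in the scratchpad")
    return min(
        feasible,
        key=lambda c: transfer_time(total_bytes, c, startup, bandwidth),
    )

# Hypothetical numbers, chosen only to exercise the functions.
chosen = best_chunk(
    total_bytes=1024 * 1024,
    candidates=[256, 1024, 4096, 128 * 1024],
    spm_capacity=64 * 1024,
    startup=1.0,
    bandwidth=100.0,
)
```

Under this simplified model, fewer and larger transfers always amortize the startup cost better, so the choice is bounded by the scratchpad capacity. A realistic model would also have to account for overlap between computation and transfer and for contention between threads.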


Hybrid silicon-photonic network-on-chip for future generations of high-performance many-core systems

The Journal of Supercomputing ◽

10.1007/s11227-015-1539-0 ◽

2015 ◽

Vol 71 (12) ◽

pp. 4446-4475 ◽

Cited By ~ 12

Author(s):

Achraf Ben Ahmed ◽

Abderazek Ben Abdallah

Keyword(s):

High Performance ◽

Network On Chip ◽

Future Generations ◽

Photonic Network ◽

Silicon Photonic ◽

Hybrid Silicon ◽

On Chip ◽

Many Core


Machine learning for design and optimization challenges in multi/many-core network-on-chip

10.1145/3477231.3490427 ◽

2021 ◽

Author(s):

Md Farhadur Reza

Keyword(s):

Machine Learning ◽

Network On Chip ◽

Core Network ◽

Design And Optimization ◽

On Chip ◽

Many Core


A Novel Hybrid Cache Coherence with Global Snooping for Many-core Architectures

ACM Transactions on Design Automation of Electronic Systems ◽

10.1145/3462775 ◽

2022 ◽

Vol 27 (1) ◽

pp. 1-31

Author(s):

Sri Harsha Gade ◽

Sujay Deb

Keyword(s):

Lower Energy ◽

Cache Coherence ◽

Network On Chip ◽

Highly Efficient ◽

Wireless Links ◽

Coherence Protocols ◽

High Area ◽

On Chip ◽

Many Core ◽

Clustered Network

Cache coherence ensures correctness of cached data in multi-core processors. Traditional implementations of existing protocols make them unscalable for many-core architectures. While snoopy coherence requires unscalable ordered networks, directory coherence is weighed down by high area and energy overheads. In this work, we propose Wireless-enabled Share-aware Hybrid (WiSH) to provide scalable coherence in many-core processors. WiSH implements a novel Snoopy over Directory protocol using on-chip wireless links and a hierarchical, clustered Network-on-Chip to achieve low-overhead and highly efficient coherence. A local directory protocol maintains coherence within a cluster of cores, while coherence among such clusters is achieved through a global snoopy protocol. The ordered network for global snooping is provided through low-latency and low-energy broadcast wireless links. The overheads are further reduced through share-aware cache segmentation to eliminate coherence for private blocks. Evaluations show that WiSH reduces traffic by and runtime by , while requiring smaller storage and lower energy as compared to existing hierarchical and hybrid coherence protocols. Owing to its modularity, WiSH provides highly efficient and scalable coherence for many-core processors.


Optimized System-Level Design Methods for NoC-Based Many Core Embedded Systems

Advances in Systems Analysis, Software Engineering, and High Performance Computing - Handbook of Research on Embedded Systems Design ◽

10.4018/978-1-4666-6194-3.ch007 ◽

2014 ◽

pp. 150-179

Author(s):

Haoyuan Ying ◽

Klaus Hofmann ◽

Thomas Hollstein

Keyword(s):

Embedded Systems ◽

Design Optimization ◽

Optimization Methods ◽

Design Methods ◽

System Level ◽

System Level Design ◽

Level Design ◽

On Chip ◽

Many Core

Due to the growing demand for high performance and low power in embedded systems, many-core architectures have been proposed as the most suitable solutions. While the design focus of many-core embedded systems is shifting from computation-centric to communication-centric, Network-on-Chip (NoC) is one of the best interconnect techniques for such architectures because of its scalability and high communication bandwidth. Formalized and optimized system-level design methods for NoC-based many-core embedded systems are needed to improve system performance and reduce power consumption. To examine the design optimization methods in depth, this chapter presents a case study of optimizing many-core embedded systems based on a 3-Dimensional (3D) NoC with an irregular vertical link distribution topology through task mapping, core placement, routing, and topology generation. Results of cycle-accurate simulation experiments demonstrate the validity and efficiency of the design methods. For the case study configuration, up to 60% of the vertical links can be saved while maintaining system efficiency, compared with 3D NoCs that have full vertical link connection, by applying the design optimization methods.
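To put the "up to 60%" figure in concrete terms, the short sketch below counts vertical links in a fully connected 3D mesh and applies that saving. The 5×5×3 mesh size is a hypothetical choice for illustration, since the abstract does not give the case study dimensions, and the function names are likewise assumptions.

```python
def full_vertical_links(x, y, layers):
    """Vertical links in a 3D mesh where every router is connected
    to the router directly above it: one link per (x, y) position
    between each pair of adjacent layers."""
    return x * y * (layers - 1)

def links_after_saving(total_links, saved_percent):
    """Links remaining after removing a percentage of them
    (integer arithmetic, saved links rounded down)."""
    saved = total_links * saved_percent // 100
    return total_links - saved

# Hypothetical 5x5x3 mesh, with the 60% upper bound from the abstract.
total = full_vertical_links(5, 5, 3)
remaining = links_after_saving(total, 60)
```

For this assumed mesh the full configuration has 50 vertical links, and a 60% saving leaves 20. Which links can be removed without hurting efficiency is what the chapter's mapping, placement, routing, and topology generation methods decide; this sketch only does the counting.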


Implementation and Evaluation of Skip-Links

International Journal of Embedded and Real-Time Communication Systems ◽

10.4018/jertcs.2011070102 ◽

2011 ◽

Vol 2 (3) ◽

pp. 21-49 ◽

Cited By ~ 1

Author(s):

Simon J. Hollis ◽

Chris Jackson

Keyword(s):

Long Range ◽

Switching Activity ◽

Traffic Patterns ◽

Hop Count ◽

Core Networks ◽

Mesh Topology ◽

Node Network ◽

On Chip ◽

Many Core ◽

Prior Analysis

The Skip-link architecture dynamically reconfigures Network-on-Chip (NoC) topologies in order to reduce the overall switching activity in many-core systems. The proposed architecture allows the creation of long-range Skip-links at runtime to reduce the logical distance between frequently communicating nodes. This offers a number of advantages over existing methods of creating optimised topologies already present in research, such as the Reconfigurable NoC (ReNoC) architecture and static Long-Range Link (LRL) insertion. This architecture monitors traffic behaviour and optimises the mesh topology without prior analysis of communications behaviour, and is thus applicable to all applications. The technique described here does not utilise a master node, and each router acts independently. The architecture is thus scalable to future many-core networks. The authors evaluate the performance using a cycle-accurate simulator with synthetic traffic patterns and compare the results to a mesh architecture, demonstrating logical hop count reductions of 12-17%. Coupled with this, up to a doubling in critical load is observed, and the potential for 10% energy reductions on a 16×16 node network.
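The effect of a long-range link on logical hop count can be sketched with a small model. The code below assumes a 2D mesh where the hop count between two nodes is their Manhattan distance, and it adds one bidirectional long-range link that costs a single hop. It is an illustrative simplification and not the Skip-link router design or its runtime link-creation policy; all function names are hypothetical, and it does not attempt to reproduce the 12-17% figure.

```python
from itertools import product

def mesh_hops(src, dst):
    """Hop count in a plain 2D mesh: Manhattan distance."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])

def hops_with_link(src, dst, link):
    """Hop count when one bidirectional long-range link (a, b)
    is available and traversing it costs one hop."""
    a, b = link
    via_ab = mesh_hops(src, a) + 1 + mesh_hops(b, dst)
    via_ba = mesh_hops(src, b) + 1 + mesh_hops(a, dst)
    return min(mesh_hops(src, dst), via_ab, via_ba)

def average_hops(size, link=None):
    """Average hop count over all ordered pairs of distinct nodes
    in a size x size mesh, i.e. under uniform traffic."""
    nodes = list(product(range(size), repeat=2))
    pairs = [(s, d) for s in nodes for d in nodes if s != d]
    if link is None:
        total = sum(mesh_hops(s, d) for s, d in pairs)
    else:
        total = sum(hops_with_link(s, d, link) for s, d in pairs)
    return total / len(pairs)

# Hypothetical 4x4 mesh with one link between opposite corners.
corner_link = ((0, 0), (3, 3))
baseline = average_hops(4)
with_link = average_hops(4, corner_link)
```

Under uniform traffic a single link helps only the pairs that lie near its endpoints, which is consistent with the abstract's emphasis on placing links between frequently communicating nodes based on observed traffic.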
