many core Latest Research Papers

A Novel Hybrid Cache Coherence with Global Snooping for Many-core Architectures

2022 ◽ Vol 27 (1) ◽ pp. 1-31

Author(s): Sri Harsha Gade ◽ Sujay Deb

Keyword(s): Lower Energy ◽ Cache Coherence ◽ Network On Chip ◽ Highly Efficient ◽ Wireless Links ◽ Coherence Protocols ◽ High Area ◽ On Chip ◽ Many Core ◽ Clustered Network

Cache coherence ensures correctness of cached data in multi-core processors. Traditional implementations of existing protocols make them unscalable for many core architectures. While snoopy coherence requires unscalable ordered networks, directory coherence is weighed down by high area and energy overheads. In this work, we propose Wireless-enabled Share-aware Hybrid (WiSH) to provide scalable coherence in many core processors. WiSH implements a novel Snoopy over Directory protocol using on-chip wireless links and hierarchical, clustered Network-on-Chip to achieve low-overhead and highly efficient coherence. A local directory protocol maintains coherence within a cluster of cores, while coherence among such clusters is achieved through global snoopy protocol. The ordered network for global snooping is provided through low-latency and low-energy broadcast wireless links. The overheads are further reduced through share-aware cache segmentation to eliminate coherence for private blocks. Evaluations show that WiSH reduces traffic by and runtime by , while requiring smaller storage and lower energy as compared to existing hierarchical and hybrid coherence protocols. Owing to its modularity, WiSH provides highly efficient and scalable coherence for many core processors.
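
The two-level idea in this abstract (a directory inside each cluster, an ordered broadcast between clusters, and no coherence work for private blocks) can be illustrated with a toy model. This is only a sketch under loud assumptions: the class names, the invalidate-on-write policy, and the rule that every write to a shared block triggers one global broadcast are all hypothetical simplifications and are not taken from the WiSH paper.

```python
from typing import Dict, Iterable, List, Optional, Set


class Cluster:
    """One cluster with a local directory: block -> cores holding a copy."""

    def __init__(self, cid: int) -> None:
        self.cid = cid
        self.directory: Dict[str, Set[int]] = {}

    def add_sharer(self, block: str, core: int) -> None:
        self.directory.setdefault(block, set()).add(core)

    def invalidate(self, block: str, keep: Optional[int] = None) -> int:
        """Drop every local copy except the one held by `keep`.

        Returns the number of copies that were invalidated.
        """
        sharers = self.directory.get(block, set())
        dropped = {core for core in sharers if core != keep}
        self.directory[block] = sharers - dropped
        return len(dropped)


class HybridCoherence:
    """Toy model: local directories plus a global broadcast (assumed policy)."""

    def __init__(self, n_clusters: int, private_blocks: Iterable[str] = ()) -> None:
        self.clusters: List[Cluster] = [Cluster(i) for i in range(n_clusters)]
        self.private: Set[str] = set(private_blocks)
        self.broadcasts = 0  # number of global (snoop-like) broadcasts issued

    def read(self, cid: int, core: int, block: str) -> None:
        if block in self.private:
            return  # private blocks are never tracked
        self.clusters[cid].add_sharer(block, core)

    def write(self, cid: int, core: int, block: str) -> int:
        if block in self.private:
            return 0  # no coherence traffic for private blocks
        # Simplification: every shared write is broadcast to all clusters.
        self.broadcasts += 1
        invalidated = 0
        for cluster in self.clusters:
            keep = core if cluster.cid == cid else None
            invalidated += cluster.invalidate(block, keep)
        self.clusters[cid].add_sharer(block, core)
        return invalidated


system = HybridCoherence(2, private_blocks={"p"})
system.read(0, 0, "a")
system.read(0, 1, "a")
system.read(1, 0, "a")
dropped_copies = system.write(0, 0, "a")  # other copies of "a" are dropped
private_traffic = system.write(1, 0, "p")  # private block: nothing happens
```

In this toy run the write to the shared block invalidates the two other copies with a single broadcast, while the write to the private block generates no coherence activity at all.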

Brain-inspired global-local learning incorporated with neuromorphic computing

Nature Communications ◽ 10.1038/s41467-021-27653-2 ◽ 2022 ◽ Vol 13 (1)

Author(s): Yujie Wu ◽ Rong Zhao ◽ Jun Zhu ◽ Feng Chen ◽ Mingkun Xu ◽ ...

Keyword(s): Great Promise ◽ Local Learning ◽ Global Learning ◽ Neuronal Dynamics ◽ Neuromorphic Computing ◽ Vision Sensors ◽ Learning Capabilities ◽ Meta Learning ◽ Learning Scenarios ◽ Many Core

Abstract. There are two principal approaches for learning in artificial intelligence: error-driven global learning and neuroscience-oriented local learning. Integrating them into one network may provide complementary learning capabilities for versatile learning scenarios. At the same time, neuromorphic computing holds great promise, but still needs plenty of useful algorithms and algorithm-hardware co-designs to fully exploit its advantages. Here, we present a neuromorphic global-local synergic learning model by introducing a brain-inspired meta-learning paradigm and a differentiable spiking model incorporating neuronal dynamics and synaptic plasticity. It can meta-learn local plasticity and receive top-down supervision information for multiscale learning. We demonstrate the advantages of this model in multiple different tasks, including few-shot learning, continual learning, and fault-tolerance learning in neuromorphic vision sensors. It achieves significantly higher performance than single-learning methods. We further implement the model in the Tianjic neuromorphic platform by exploiting algorithm-hardware co-designs and prove that the model can fully utilize the neuromorphic many-core architecture to develop a hybrid computation paradigm.

Device Hopping

ACM Transactions on Architecture and Code Optimization ◽ 10.1145/3471909 ◽ 2021 ◽ Vol 18 (4) ◽ pp. 1-25

Author(s): Paul Metzger ◽ Volker Seeker ◽ Christian Fensch ◽ Murray Cole

Keyword(s): Programming Model ◽ Heterogeneous Systems ◽ Code Size ◽ Fine Grained ◽ Scheduling Policy ◽ High Level ◽ Many Core ◽ Execution Models ◽ Current Systems

Existing OS techniques for homogeneous many-core systems make it simple for single and multithreaded applications to migrate between cores. Heterogeneous systems do not benefit so fully from this flexibility, and applications that cannot migrate in mid-execution may lose potential performance. The situation is particularly challenging when a switch of language runtime would be desirable in conjunction with a migration. We present a case study in making heterogeneous CPU + GPU systems more flexible in this respect. Our technique for fine-grained application migration allows switches between OpenMP, OpenCL, and CUDA execution, in conjunction with migrations from GPU to CPU, and CPU to GPU. To achieve this, we subdivide iteration spaces into slices, and consider migration on a slice-by-slice basis. We show that slice sizes can be learned offline by machine learning models. To further improve performance, memory transfers are made migration-aware. The complexity of the migration capability is hidden from programmers behind a high-level programming model. We present a detailed evaluation of our mid-kernel migration mechanism with the First Come, First Served scheduling policy. We compare our technique in a focused evaluation scenario against idealized kernel-by-kernel scheduling, which is typical for current systems, and makes perfect kernel-to-device scheduling decisions, but cannot migrate kernels mid-execution. Models show that up to 1.33× speedup can be achieved over these systems by adding fine-grained migration. Our experimental results with all nine applicable SHOC and Rodinia benchmarks achieve speedups of up to 1.30× (1.08× on average) over an implementation of a perfect but kernel-migration-incapable scheduler when migrated to a faster device. Our mechanism and slice size choices introduce an average slowdown of only 2.44% if kernels never migrate. Lastly, our programming model reduces the code size by at least 88% compared to manual implementations of migratable kernels.
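
The slice-by-slice idea described in this abstract can be sketched in a few lines. The sketch below is an illustration only, not the authors' implementation: `make_slices`, `run_sliced`, and `pick_device` are hypothetical names, both "devices" are plain Python functions standing in for OpenMP, OpenCL, or CUDA back ends, and the slice size is a fixed constant rather than one learned by a model.

```python
from typing import Callable, Dict, Iterator, List, Tuple

# A kernel processes the half-open iteration range [start, end).
Kernel = Callable[[int, int], None]


def make_slices(n_iters: int, slice_size: int) -> Iterator[Tuple[int, int]]:
    """Split the iteration space [0, n_iters) into consecutive slices."""
    for start in range(0, n_iters, slice_size):
        yield start, min(start + slice_size, n_iters)


def run_sliced(
    n_iters: int,
    slice_size: int,
    kernels: Dict[str, Kernel],
    pick_device: Callable[[int], str],
) -> List[Tuple[str, int, int]]:
    """Run one slice at a time; the device may change between slices."""
    log = []
    for index, (start, end) in enumerate(make_slices(n_iters, slice_size)):
        device = pick_device(index)  # migration decision point
        kernels[device](start, end)
        log.append((device, start, end))
    return log


out = [0] * 10


def square_range(start: int, end: int) -> None:
    # Stand-in kernel body: the same computation on every "device".
    for i in range(start, end):
        out[i] = i * i


kernels = {"cpu": square_range, "gpu": square_range}
# Hypothetical policy: the faster device becomes free before the third slice.
log = run_sliced(10, 4, kernels, lambda index: "cpu" if index < 2 else "gpu")
```

Because migration is only considered at slice boundaries, the slice size trades migration responsiveness against per-slice overhead, which is consistent with the abstract's point that slice sizes are worth tuning.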

An automatic many-core code generation method and its implementation under Sunway environment

10.1145/3491396.3506545 ◽ 2021

Author(s): Jiawei Liu ◽ Qiang Guo ◽ Yuan Zhuang ◽ Haihong Zhang ◽ Yunhui Zeng

Keyword(s): Code Generation ◽ Many Core

SmartBoost: Lightweight ML-Driven Boosting for Thermally-Constrained Many-Core Processors

10.1109/dac18074.2021.9586287 ◽ 2021

Author(s): Martin Rapp ◽ Mohammed Bakr Sikal ◽ Heba Khdr ◽ Jorg Henkel

Keyword(s): Many Core

Minimizing development costs for efficient many-core visualization using MCD3

Parallel Computing ◽ 10.1016/j.parco.2021.102834 ◽ 2021 ◽ Vol 108 ◽ pp. 102834

Author(s): Kenneth Moreland ◽ Robert Maynard ◽ David Pugmire ◽ Abhishek Yenpure ◽ Allison Vacanti ◽ ...

Keyword(s): Development Costs ◽ Many Core

Work-In-Progress: Reinforcement Learning-based DAG Scheduling Algorithm in Clustered Many-Core Platform

10.1109/rtss52674.2021.00062 ◽ 2021

Author(s): Atsushi Yano ◽ Takuya Azumi

Keyword(s): Reinforcement Learning ◽ Scheduling Algorithm ◽ Work In Progress ◽ Dag Scheduling ◽ Many Core

POET (v0.1): speedup of many-core parallel reactive transport simulations with fast DHT lookups

Geoscientific Model Development ◽ 10.5194/gmd-14-7391-2021 ◽ 2021 ◽ Vol 14 (12) ◽ pp. 7391-7409

Author(s): Marco De Lucia ◽ Michael Kühn ◽ Alexander Lindemann ◽ Max Lübke ◽ Bettina Schnor

Keyword(s): Reactive Transport ◽ Spent Nuclear Fuel ◽ CO2 Storage ◽ Hash Table ◽ Distributed Hash Table ◽ Central Processing ◽ Reaction Fronts ◽ Fuel Storage ◽ Major Bottleneck ◽ Many Core

Abstract. Coupled reactive transport simulations are extremely demanding in terms of required computational power, which hampers their application and leads to coarsened and oversimplified domains. The chemical sub-process represents the major bottleneck: its acceleration is an urgent challenge which gathers increasing interdisciplinary interest along with pressing requirements for subsurface utilization such as spent nuclear fuel storage, geothermal energy and CO2 storage. In this context we developed POET (POtsdam rEactive Transport), a research parallel reactive transport simulator integrating algorithmic improvements which decisively speed up coupled simulations. In particular, POET is designed with a master/worker architecture, which ensures computational efficiency in both multicore and cluster compute environments. POET does not rely on contiguous grid partitions for the parallelization of chemistry but forms work packages composed of grid cells distant from each other. Such scattering prevents particularly expensive geochemical simulations, usually concentrated in the vicinity of a reactive front, from generating load imbalance between the available CPUs (central processing units), as is often the case with classical partitions. Furthermore, POET leverages an original implementation of the distributed hash table (DHT) mechanism to cache the results of geochemical simulations for further reuse in subsequent time steps during the coupled simulation. The caching is hence particularly advantageous for initially chemically homogeneous simulations and for smooth reaction fronts. We tune the rounding employed in the DHT on a 2D benchmark to validate the caching approach, and we evaluate the performance gain of POET's master/worker architecture and the DHT speedup on a 3D benchmark comprising around 650 000 grid elements. The runtime for 200 coupling iterations, corresponding to 960 simulation days, was reduced from about 24 h on 11 workers to 29 min on 719 workers. Activating the DHT reduces the runtime further to 2 h and 8 min, respectively. Only with these kinds of reduced hardware requirements and computational costs is it possible to realistically perform the long-term complex reactive transport simulations, as well as perform the uncertainty analyses required by pressing societal challenges connected with subsurface utilization.
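
Two ideas from this abstract, caching chemistry results under rounded keys and building work packages from grid cells that are far apart, can be sketched as follows. This is a minimal single-process illustration, not POET's code: a plain Python dict stands in for the distributed hash table, rounding to a fixed number of decimal places is an assumed key scheme, and `RoundedCache`, `round_key`, `scatter_packages`, and `toy_chemistry` are hypothetical names.

```python
from typing import Callable, Dict, List, Sequence, Tuple

State = Tuple[float, ...]


def round_key(state: Sequence[float], digits: int) -> State:
    """Build a lookup key by rounding each input value (assumed scheme)."""
    return tuple(round(value, digits) for value in state)


class RoundedCache:
    """In-process stand-in for a DHT that caches simulation results."""

    def __init__(self, simulate: Callable[[State], State], digits: int = 3) -> None:
        self.simulate = simulate
        self.digits = digits
        self.table: Dict[State, State] = {}
        self.hits = 0
        self.misses = 0

    def lookup(self, state: State) -> State:
        key = round_key(state, self.digits)
        if key in self.table:
            self.hits += 1
            return self.table[key]  # reuse a nearby, already computed result
        self.misses += 1
        result = self.simulate(state)
        self.table[key] = result
        return result


def scatter_packages(n_cells: int, n_packages: int) -> List[List[int]]:
    """Assign cells round-robin so each package holds non-adjacent cells."""
    return [list(range(p, n_cells, n_packages)) for p in range(n_packages)]


def toy_chemistry(state: State) -> State:
    # Placeholder for an expensive geochemical solve.
    return tuple(2.0 * value for value in state)


cache = RoundedCache(toy_chemistry, digits=3)
first = cache.lookup((1.00004, 2.0))   # miss: computed and stored
second = cache.lookup((1.00001, 2.0))  # hit: same key after rounding
packages = scatter_packages(10, 3)
```

Coarser rounding raises the hit rate but returns results computed for slightly different inputs, which is why the abstract describes tuning the rounding on a benchmark before relying on the cache.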

ETHNICITY AND VALUES

10.47850/rl.2021.2.4.100-114 ◽ 2021 ◽ pp. 100-114

Author(s): David C. Lewis

Keyword(s): Cultural Identity ◽ Ethnic Group ◽ Cultural Values ◽ Core Values ◽ Traditional Values ◽ Many Core ◽ Over Time

Cultures change in many ways but some basic values within the culture tend to remain over longer periods. Compared with some other potential markers of ethnicity, which may apply to only a certain part of an ethnic group, some traditional values are adhered to by most or perhaps even all members of the ethnic group. Over time, many core values of a culture remain relatively strong even though the manifestations of those values might gradually change. As such, certain cultural values can be regarded as stronger markers of ethnic or cultural identity than some other features of the culture which are more transient.

Enabling Extremely Fine-grained Parallelism via Scalable Concurrent Queues on Modern Many-core Architectures

10.1109/mascots53633.2021.9614292 ◽ 2021

Author(s): Poornima Nookala ◽ Peter Dinda ◽ Kyle C. Hale ◽ Kyle Chard ◽ Ioan Raicu

Keyword(s): Fine Grained ◽ Fine Grained Parallelism ◽ Many Core
