many core
Recently Published Documents


TOTAL DOCUMENTS

2176
(FIVE YEARS 402)

H-INDEX

41
(FIVE YEARS 6)

2022 ◽  
Vol 27 (1) ◽  
pp. 1-31
Author(s):  
Sri Harsha Gade ◽  
Sujay Deb

Cache coherence ensures correctness of cached data in multi-core processors. Traditional implementations of existing protocols make them unscalable for many core architectures. While snoopy coherence requires unscalable ordered networks, directory coherence is weighed down by high area and energy overheads. In this work, we propose Wireless-enabled Share-aware Hybrid (WiSH) to provide scalable coherence in many core processors. WiSH implements a novel Snoopy over Directory protocol using on-chip wireless links and hierarchical, clustered Network-on-Chip to achieve low-overhead and highly efficient coherence. A local directory protocol maintains coherence within a cluster of cores, while coherence among such clusters is achieved through global snoopy protocol. The ordered network for global snooping is provided through low-latency and low-energy broadcast wireless links. The overheads are further reduced through share-aware cache segmentation to eliminate coherence for private blocks. Evaluations show that WiSH reduces traffic by and runtime by , while requiring smaller storage and lower energy as compared to existing hierarchical and hybrid coherence protocols. Owing to its modularity, WiSH provides highly efficient and scalable coherence for many core processors.


2022 ◽  
Vol 13 (1) ◽  
Author(s):  
Yujie Wu ◽  
Rong Zhao ◽  
Jun Zhu ◽  
Feng Chen ◽  
Mingkun Xu ◽  
...  

AbstractThere are two principle approaches for learning in artificial intelligence: error-driven global learning and neuroscience-oriented local learning. Integrating them into one network may provide complementary learning capabilities for versatile learning scenarios. At the same time, neuromorphic computing holds great promise, but still needs plenty of useful algorithms and algorithm-hardware co-designs to fully exploit its advantages. Here, we present a neuromorphic global-local synergic learning model by introducing a brain-inspired meta-learning paradigm and a differentiable spiking model incorporating neuronal dynamics and synaptic plasticity. It can meta-learn local plasticity and receive top-down supervision information for multiscale learning. We demonstrate the advantages of this model in multiple different tasks, including few-shot learning, continual learning, and fault-tolerance learning in neuromorphic vision sensors. It achieves significantly higher performance than single-learning methods. We further implement the model in the Tianjic neuromorphic platform by exploiting algorithm-hardware co-designs and prove that the model can fully utilize neuromorphic many-core architecture to develop hybrid computation paradigm.


2021 ◽  
Vol 18 (4) ◽  
pp. 1-25
Author(s):  
Paul Metzger ◽  
Volker Seeker ◽  
Christian Fensch ◽  
Murray Cole

Existing OS techniques for homogeneous many-core systems make it simple for single and multithreaded applications to migrate between cores. Heterogeneous systems do not benefit so fully from this flexibility, and applications that cannot migrate in mid-execution may lose potential performance. The situation is particularly challenging when a switch of language runtime would be desirable in conjunction with a migration. We present a case study in making heterogeneous CPU + GPU systems more flexible in this respect. Our technique for fine-grained application migration, allows switches between OpenMP, OpenCL, and CUDA execution, in conjunction with migrations from GPU to CPU, and CPU to GPU. To achieve this, we subdivide iteration spaces into slices, and consider migration on a slice-by-slice basis. We show that slice sizes can be learned offline by machine learning models. To further improve performance, memory transfers are made migration-aware. The complexity of the migration capability is hidden from programmers behind a high-level programming model. We present a detailed evaluation of our mid-kernel migration mechanism with the First Come, First Served scheduling policy. We compare our technique in a focused evaluation scenario against idealized kernel-by-kernel scheduling, which is typical for current systems, and makes perfect kernel to device scheduling decisions, but cannot migrate kernels mid-execution. Models show that up to 1.33× speedup can be achieved over these systems by adding fine-grained migration. Our experimental results with all nine applicable SHOC and Rodinia benchmarks achieve speedups of up to 1.30× (1.08× on average) over an implementation of a perfect but kernel-migration incapable scheduler when migrated to a faster device. Our mechanism and slice size choices introduce an average slowdown of only 2.44% if kernels never migrate. Lastly, our programming model reduces the code size by at least 88% if compared to manual implementations of migratable kernels.


2021 ◽  
Author(s):  
Jiawei Liu ◽  
Qiang Guo ◽  
Yuan Zhuang ◽  
Haihong Zhang ◽  
Yunhui Zeng
Keyword(s):  

2021 ◽  
Author(s):  
Martin Rapp ◽  
Mohammed Bakr Sikal ◽  
Heba Khdr ◽  
Jorg Henkel
Keyword(s):  

2021 ◽  
Vol 108 ◽  
pp. 102834
Author(s):  
Kenneth Moreland ◽  
Robert Maynard ◽  
David Pugmire ◽  
Abhishek Yenpure ◽  
Allison Vacanti ◽  
...  
Keyword(s):  

2021 ◽  
Vol 14 (12) ◽  
pp. 7391-7409
Author(s):  
Marco De Lucia ◽  
Michael Kühn ◽  
Alexander Lindemann ◽  
Max Lübke ◽  
Bettina Schnor

Abstract. Coupled reactive transport simulations are extremely demanding in terms of required computational power, which hampers their application and leads to coarsened and oversimplified domains. The chemical sub-process represents the major bottleneck: its acceleration is an urgent challenge which gathers increasing interdisciplinary interest along with pressing requirements for subsurface utilization such as spent nuclear fuel storage, geothermal energy and CO2 storage. In this context we developed POET (POtsdam rEactive Transport), a research parallel reactive transport simulator integrating algorithmic improvements which decisively speed up coupled simulations. In particular, POET is designed with a master/worker architecture, which ensures computational efficiency in both multicore and cluster compute environments. POET does not rely on contiguous grid partitions for the parallelization of chemistry but forms work packages composed of grid cells distant from each other. Such scattering prevents particularly expensive geochemical simulations, usually concentrated in the vicinity of a reactive front, from generating load imbalance between the available CPUs (central processing units), as is often the case with classical partitions. Furthermore, POET leverages an original implementation of the distributed hash table (DHT) mechanism to cache the results of geochemical simulations for further reuse in subsequent time steps during the coupled simulation. The caching is hence particularly advantageous for initially chemically homogeneous simulations and for smooth reaction fronts. We tune the rounding employed in the DHT on a 2D benchmark to validate the caching approach, and we evaluate the performance gain of POET's master/worker architecture and the DHT speedup on a 3D benchmark comprising around 650 000 grid elements. The runtime for 200 coupling iterations, corresponding to 960 simulation days, reduced from about 24 h on 11 workers to 29 min on 719 workers. Activating the DHT reduces the runtime further to 2 h and 8 min respectively. Only with these kinds of reduced hardware requirements and computational costs is it possible to realistically perform the long-term complex reactive transport simulations, as well as perform the uncertainty analyses required by pressing societal challenges connected with subsurface utilization.


2021 ◽  
pp. 100-114
Author(s):  
David. C. Lewis

Cultures change in many ways but some basic values within the culture tend to remain over longer periods. Compared with some other potential markers of ethnicity, which may apply to only a certain part of an ethnic group, some traditional values are adhered to by most or perhaps even all members of the ethnic group. Over time, many core values of a culture remain relatively strong even though the manifestations of those values might gradually change. As such, certain cultural values can be regarded as stronger markers of ethnic or cultural identity than some other features of the culture which are more transient.


Author(s):  
Poornima Nookala ◽  
Peter Dinda ◽  
Kyle C. Hale ◽  
Kyle Chard ◽  
Ioan Raicu

Sign in / Sign up

Export Citation Format

Share Document