multicore cpu
Recently Published Documents


TOTAL DOCUMENTS

66
(FIVE YEARS 18)

H-INDEX

12
(FIVE YEARS 1)

Displays ◽  
2022 ◽  
Vol 71 ◽  
pp. 102112
Author(s):  
Navin Mani Upadhyay ◽  
Ravi Shankar Singh ◽  
Shri Prakash Dwivedi

2021 ◽  
Vol 184 (1) ◽  
Author(s):  
Roberto D’Autilia ◽  
Louis Nantenaina Andrianaivo ◽  
Alessio Troiani

Abstract: We study the numerical simulation of the shaken dynamics, a parallel Markovian dynamics for spin systems with local interaction and transition probabilities depending on the two parameters q and J that “tune” the geometry of the underlying lattice. The analysis of the mixing time of the Markov chain and the evaluation of the spin-spin correlations as functions of q and J make it possible to determine in the (q, J) plane a phase transition curve separating the disordered phase from the ordered one. The relation between the equilibrium measure of the shaken dynamics and the Gibbs measure for the Ising model is also investigated. Finally, two different coding approaches are considered for the implementation of the dynamics: a multicore CPU approach, coded in Julia, and a GPU approach, coded with CUDA.


Author(s):  
August Ernstsson ◽  
Johan Ahlqvist ◽  
Stavroula Zouzoula ◽  
Christoph Kessler

Abstract: We present the third generation of the C++-based open-source skeleton programming framework SkePU. Its main new features include new skeletons, new data container types, support for returning multiple objects from skeleton instances and user functions, support for specifying alternative platform-specific user functions to exploit, e.g., custom SIMD instructions, generalized scheduling variants for the multicore CPU backends, and a new cluster backend targeting the custom MPI interface provided by the StarPU task-based runtime system. We have also revised the smart data containers’ memory consistency model for automatic data sharing between main and device memory. The new features are the result of a two-year co-design effort collecting feedback from HPC application partners in the EU H2020 project EXA2PRO, and target especially the HPC application domain and HPC platforms. We evaluate the performance effects of the new features on high-end multicore CPU and GPU systems and on HPC clusters.


Electronics ◽  
2020 ◽  
Vol 9 (7) ◽  
pp. 1069
Author(s):  
Minseon Kang ◽  
Yongseok Lee ◽  
Moonju Park

Recently, the application of machine learning on embedded systems has drawn interest in both the research community and industry because embedded systems located at the edge can produce a faster response and reduce network load. However, software implementation of neural networks on Central Processing Units (CPUs) is considered infeasible in embedded systems due to limited power supply. To accelerate AI processing, the many-core Graphics Processing Unit (GPU) has been preferred over the CPU. However, its energy efficiency is still not considered good enough for embedded systems. Among other approaches for machine learning on embedded systems, neuromorphic processing chips are expected to consume less power and overcome the memory bottleneck. In this work, we implemented a pedestrian image detection system on an embedded device using a commercially available neuromorphic chip, NM500, which is based on NeuroMem technology. The NM500 processing time and power consumption were measured as the number of chips was increased from one to seven, and they were compared to those of a multicore CPU system and a GPU-accelerated embedded system. The results show that the NM500 is more efficient than the GPU-accelerated system or the multicore CPU system in terms of the energy required to process data for both learning and classification. Additionally, limits and possible improvements of the current NM500 are identified based on the experimental results.


2020 ◽  
Vol 13 (10) ◽  
pp. 1751-1764
Author(s):  
Yulin Che ◽  
Zhuohang Lai ◽  
Shixuan Sun ◽  
Yue Wang ◽  
Qiong Luo

Truss decomposition divides a graph into a hierarchy of subgraphs, or trusses. A subgraph is a k-truss (k ≥ 2) if each edge is in at least k − 2 triangles in the subgraph. Existing algorithms work by first counting the number of triangles each edge is in and then iteratively incrementing k to peel off the edges that will not appear in the (k + 1)-truss. Due to the data and computation intensity, truss decomposition on billion-edge graphs takes hours to complete on a commodity computer. We propose to accelerate in-memory truss decomposition by (1) compacting intermediate results to optimize memory access, (2) dynamically adjusting the computation based on data characteristics, and (3) parallelizing the algorithm on both the multicore CPU and the GPU. In particular, we optimize the triangle enumeration with data skew handling, and determine at runtime whether to pursue peeling or direct triangle counting to obtain a certain k-truss. We further develop a CPU-GPU co-processing strategy in which the CPU first computes intermediate results and sends the compacted results to the GPU for further computation. Our experiments on real-world datasets show that our implementations outperform the state of the art by up to an order of magnitude. Our source code is publicly available at https://github.com/RapidsAtHKUST/AccTrussDecomposition.
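
The baseline count-then-peel scheme that this abstract attributes to existing algorithms can be sketched in a few lines of serial Python. This is only an illustration of the k-truss definition given above, not the authors' optimized CPU/GPU implementation; the function name `truss_decomposition`, the helper `key`, and the assumption that vertices are comparable values such as integers are all hypothetical choices made for the example.

```python
from collections import defaultdict


def truss_decomposition(edges):
    """Map each undirected edge (u, v), u < v, to the largest k
    such that the edge belongs to a k-truss (illustrative sketch)."""
    # Build adjacency sets; self-loops and duplicate edges are ignored.
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)

    def key(u, v):
        # Canonical representation of an undirected edge.
        return (u, v) if u < v else (v, u)

    # Support of an edge = number of triangles that contain it.
    support = {
        key(u, v): len(adj[u] & adj[v])
        for u in adj
        for v in adj[u]
        if u < v
    }

    trussness = {}
    k = 2
    while support:
        # Edges in at most k - 2 triangles cannot be in the (k + 1)-truss.
        queue = [e for e, s in support.items() if s <= k - 2]
        while queue:
            e = queue.pop()
            if e not in support:
                continue  # already peeled via an earlier queue entry
            u, v = e
            # Removing (u, v) destroys every triangle (u, v, w).
            for w in adj[u] & adj[v]:
                for f in (key(u, w), key(v, w)):
                    support[f] -= 1
                    if support[f] <= k - 2:
                        queue.append(f)
            adj[u].discard(v)
            adj[v].discard(u)
            del support[e]
            trussness[e] = k
        k += 1
    return trussness


if __name__ == "__main__":
    # A 4-clique on vertices 0..3 plus a pendant edge (3, 4).
    demo = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
    print(truss_decomposition(demo))
```

In the demo graph, the pendant edge lies in no triangle and so belongs only to the 2-truss, while every clique edge lies in two triangles and survives until k = 4. The set intersections used for triangle counting are the step the abstract describes as data- and computation-intensive on large graphs.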

