multicore cpu Latest Research Papers

Abstract: We study the numerical simulation of the shaken dynamics, a parallel Markovian dynamics for spin systems with local interaction and transition probabilities depending on the two parameters q and J that “tune” the geometry of the underlying lattice. The analysis of the mixing time of the Markov chain and the evaluation of the spin-spin correlations as functions of q and J make it possible to determine in the (q, J) plane a phase transition curve separating the disordered phase from the ordered one. The relation between the equilibrium measure of the shaken dynamics and the Gibbs measure for the Ising model is also investigated. Finally, two different coding approaches are considered for the implementation of the dynamics: a multicore CPU approach, coded in Julia, and a GPU approach, coded with CUDA.


SkePU 3: Portable High-Level Programming of Heterogeneous Systems and HPC Clusters

International Journal of Parallel Programming ◽

10.1007/s10766-021-00704-3 ◽

2021 ◽

Author(s):

August Ernstsson ◽

Johan Ahlqvist ◽

Stavroula Zouzoula ◽

Christoph Kessler

Keyword(s):

Heterogeneous Systems ◽

Multiple Objects ◽

Automatic Data ◽

Consistency Model ◽

Performance Effects ◽

Smart Data ◽

Multicore Cpu ◽

High Level ◽

Skeleton Programming ◽

The Eu

Abstract: We present the third generation of the C++-based open-source skeleton programming framework SkePU. Its main new features include new skeletons, new data container types, support for returning multiple objects from skeleton instances and user functions, support for specifying alternative platform-specific user functions to exploit, e.g., custom SIMD instructions, generalized scheduling variants for the multicore CPU backends, and a new cluster backend targeting the custom MPI interface provided by the StarPU task-based runtime system. We have also revised the smart data containers’ memory consistency model for automatic data sharing between main and device memory. The new features are the result of a two-year co-design effort collecting feedback from HPC application partners in the EU H2020 project EXA2PRO, and they especially target the HPC application domain and HPC platforms. We evaluate the performance effects of the new features on high-end multicore CPU and GPU systems and on HPC clusters.


Performance Optimization on GPGPU & Multicore CPU Using Roofline Model

IOP Conference Series Materials Science and Engineering ◽

10.1088/1757-899x/1152/1/012021 ◽

2021 ◽

Vol 1152 (1) ◽

pp. 012021

Author(s):

Noor M. Allayla ◽

Shefa A. Dawwd

Keyword(s):

Performance Optimization ◽

Roofline Model ◽

Multicore Cpu


Implicit discrete ordinates discontinuous Galerkin method for radiation problems on shared-memory multicore CPU/many-core GPU computation architecture

Numerical Heat Transfer Part B Fundamentals ◽

10.1080/10407790.2020.1819708 ◽

2020 ◽

pp. 1-24

Author(s):

Xiao Xu

Keyword(s):

Galerkin Method ◽

Shared Memory ◽

Discontinuous Galerkin ◽

Discontinuous Galerkin Method ◽

Discrete Ordinates ◽

Multicore Cpu ◽

Many Core


Energy Efficiency of Machine Learning in Embedded Systems Using Neuromorphic Hardware

Electronics ◽

10.3390/electronics9071069 ◽

2020 ◽

Vol 9 (7) ◽

pp. 1069

Author(s):

Minseon Kang ◽

Yongseok Lee ◽

Moonju Park

Keyword(s):

Machine Learning ◽

Energy Efficiency ◽

Embedded Systems ◽

Embedded System ◽

Detection System ◽

Processing Unit ◽

Process Data ◽

Central Processing ◽

Neuromorphic Hardware ◽

Multicore Cpu

Recently, the application of machine learning on embedded systems has drawn interest in both the research community and industry because embedded systems located at the edge can produce a faster response and reduce network load. However, software implementation of neural networks on Central Processing Units (CPUs) is considered infeasible in embedded systems due to the limited power supply. To accelerate AI processing, the many-core Graphics Processing Unit (GPU) has been preferred over the CPU. However, its energy efficiency is still not considered good enough for embedded systems. Among other approaches for machine learning on embedded systems, neuromorphic processing chips are expected to consume less power and overcome the memory bottleneck. In this work, we implemented a pedestrian image detection system on an embedded device using a commercially available neuromorphic chip, NM500, which is based on NeuroMem technology. The NM500 processing time and power consumption were measured as the number of chips was increased from one to seven, and they were compared to those of a multicore CPU system and a GPU-accelerated embedded system. The results show that the NM500 is more efficient in terms of the energy required to process data for both learning and classification than the GPU-accelerated system or the multicore CPU system. Additionally, limits and possible improvements of the current NM500 are identified based on the experimental results.


Accelerating truss decomposition on heterogeneous processors

Proceedings of the VLDB Endowment ◽

10.14778/3401960.3401971 ◽

2020 ◽

Vol 13 (10) ◽

pp. 1751-1764

Author(s):

Yulin Che ◽

Zhuohang Lai ◽

Shixuan Sun ◽

Yue Wang ◽

Qiong Luo

Keyword(s):

State Of The Art ◽

Source Code ◽

Processing Strategy ◽

Heterogeneous Processors ◽

Data Skew ◽

Order Of Magnitude ◽

Triangle Counting ◽

Multicore Cpu ◽

Intermediate Results ◽

Real World Datasets

Truss decomposition divides a graph into a hierarchy of subgraphs, or trusses. A subgraph is a k-truss (k ≥ 2) if each edge is in at least k − 2 triangles in the subgraph. Existing algorithms work by first counting the number of triangles each edge is in and then iteratively incrementing k to peel off the edges that will not appear in the (k + 1)-truss. Due to the data and computation intensity, truss decomposition on billion-edge graphs takes hours to complete on a commodity computer. We propose to accelerate in-memory truss decomposition by (1) compacting intermediate results to optimize memory access, (2) dynamically adjusting the computation based on data characteristics, and (3) parallelizing the algorithm on both the multicore CPU and the GPU. In particular, we optimize the triangle enumeration with data skew handling, and determine at runtime whether to pursue peeling or direct triangle counting to obtain a certain k-truss. We further develop a CPU-GPU co-processing strategy in which the CPU first computes intermediate results and sends the compacted results to the GPU for further computation. Our experiments on real-world datasets show that our implementations outperform the state of the art by up to an order of magnitude. Our source code is publicly available at https://github.com/RapidsAtHKUST/AccTrussDecomposition.
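The baseline scheme the abstract attributes to existing algorithms (count triangles per edge, then peel edges while incrementing k) can be sketched in a few lines. This is a minimal sequential illustration, not the authors' optimized CPU/GPU implementation; the function name `truss_decomposition`, the helper `key`, and the assumption that vertices are comparable values such as integers are all choices made for this example only.

```python
from collections import defaultdict


def truss_decomposition(edges):
    """Return {edge: trussness} for an undirected simple graph.

    The trussness of an edge is the largest k such that the edge
    belongs to a k-truss. Vertices are assumed to be comparable
    (e.g. integers); this is an assumption of the sketch.
    """
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    def key(u, v):
        # Canonical representation of an undirected edge.
        return (u, v) if u <= v else (v, u)

    # Step 1: support of an edge = number of triangles containing it,
    # i.e. the number of common neighbours of its endpoints.
    support = {
        key(u, v): len(adj[u] & adj[v])
        for u in adj for v in adj[u] if u < v
    }

    # Step 2: peel. At level k, every edge whose support is at most
    # k - 2 cannot appear in the (k + 1)-truss, so its trussness is k.
    trussness = {}
    k = 2
    while support:
        queue = [e for e, s in support.items() if s <= k - 2]
        while queue:
            e = queue.pop()
            if e not in support:  # already peeled via a duplicate entry
                continue
            u, v = e
            trussness[e] = k
            del support[e]
            adj[u].discard(v)
            adj[v].discard(u)
            # Removing e destroys one triangle per remaining common
            # neighbour w, lowering the support of (u, w) and (v, w).
            for w in adj[u] & adj[v]:
                for f in (key(u, w), key(v, w)):
                    support[f] -= 1
                    if support[f] <= k - 2:
                        queue.append(f)
        k += 1
    return trussness


# A 4-clique on vertices 0..3 with one pendant edge (3, 4).
example = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
result = truss_decomposition(example)
```

In this example every clique edge lies in two triangles and so has trussness 4, while the pendant edge lies in no triangle and has trussness 2. The sketch rescans all remaining edges at each level, which is simple but not how a tuned implementation would organize the peeling.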


multicore cpu
Recently Published Documents

Temporal and spatial parallel processing of simulated quantum annealing on a multicore CPU

Prediction of multicore CPU performance through parallel data mining on public datasets

Three-dimensional discontinuous deformation analysis with explicit contact formulation and block-wise multicore CPU acceleration

Comparative Analysis of Brain Tumor Segmentation with Fuzzy C-Means Using Multicore CPU and CUDA on GPU

Parallel Simulation of Two-Dimensional Ising Models Using Probabilistic Cellular Automata

SkePU 3: Portable High-Level Programming of Heterogeneous Systems and HPC Clusters

Performance Optimization on GPGPU & Multicore CPU Using Roofline Model

Implicit discrete ordinates discontinuous Galerkin method for radiation problems on shared-memory multicore CPU/many-core GPU computation architecture

Energy Efficiency of Machine Learning in Embedded Systems Using Neuromorphic Hardware

Accelerating truss decomposition on heterogeneous processors
