multicore processors Latest Research Papers

Work-In-Progress: Cooling by Core-idling: Thermal-aware Thread Scheduling for Mobile Multicore Processors

10.1109/rtss52674.2021.00055 ◽

2021 ◽

Author(s):

Srijeeta Maity ◽

Anirban Ghose ◽

Soumyajit Dey ◽

Sangyoung Park ◽

Samarjit Chakrabarty

Keyword(s):

Multicore Processors ◽

Thread Scheduling ◽

Work In Progress

Energy-Aware Task Scheduling Approach Using DVFS and Particle Swarm Optimization for Heterogeneous Multicore Processors

10.1007/978-981-16-1342-5_75 ◽

2021 ◽

pp. 943-955

Author(s):

K. Siddesha ◽

G. V. Jayaramaiah

Keyword(s):

Particle Swarm Optimization ◽

Task Scheduling ◽

Multicore Processors ◽

Particle Swarm ◽

Energy Aware ◽

Swarm Optimization ◽

Heterogeneous Multicore

Model-based configuration of access protection units for multicore processors in embedded systems

Microprocessors and Microsystems ◽

10.1016/j.micpro.2021.104377 ◽

2021 ◽

pp. 104377

Author(s):

Tobias Dörr ◽

Timo Sandmann ◽

Jürgen Becker

Keyword(s):

Embedded Systems ◽

Multicore Processors ◽

Model Based

Off-chip prefetching based on Hidden Markov Model for non-volatile memory architectures

PLoS ONE ◽

10.1371/journal.pone.0257047 ◽

2021 ◽

Vol 16 (9) ◽

pp. e0257047

Author(s):

Adrián Lamela ◽

Óscar G. Ossorio ◽

Guillermo Vinuesa ◽

Benjamín Sahelices

Keyword(s):

Markov Model ◽

Hidden Markov Model ◽

Hidden Markov ◽

Multicore Processors ◽

Memory Access ◽

Non Volatile Memory ◽

Volatile Memory ◽

Memory Accesses ◽

Access Patterns ◽

Memory Architectures

Non-volatile memory technology is now available in commodity hardware. This technology can be used as a backing memory for an external DRAM cache without needing to modify the software. However, the higher read and write latencies of non-volatile memory may exacerbate the memory wall problem. In this work we present a novel off-chip prefetch technique based on a Hidden Markov Model that specifically deals with the latency problem caused by the complexity of off-chip memory access patterns. First, we present a thorough analysis of off-chip memory access patterns to identify their complexity in multicore processors. Based on this study, we propose a prefetching module located in the last-level cache (LLC) that uses two small tables and whose computational complexity is linear in the number of computing threads. Our Markov-based technique is able to track and cluster several simultaneous groups of memory accesses coming from multiple simultaneous threads in a multicore processor. It can quickly identify complex address groups and trigger prefetches with very high accuracy. Our simulations show an improvement of up to 76% in the hit ratio of an off-chip DRAM cache for a multicore architecture over the conventional prefetch technique (G/DC). Also, the overhead of prefetch requests (failed prefetches) is reduced by 48% in single-core simulations and by 83% in multicore simulations.
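The idea of predicting the next memory access from observed transitions can be illustrated with a much simpler first-order Markov predictor. The sketch below is not the Hidden Markov Model or the two-table design described in the abstract; the class name `MarkovPrefetcher` and its `access` method are hypothetical, chosen only for illustration. It counts which block follows which and suggests the most frequent follower as a prefetch candidate.

```python
from collections import Counter, defaultdict


class MarkovPrefetcher:
    """Toy first-order Markov predictor (illustrative, not the paper's HMM)."""

    def __init__(self):
        # transitions[a][b] = number of times block b was accessed right after a
        self.transitions = defaultdict(Counter)
        self.last = None

    def access(self, block):
        """Record an access and return a prefetch candidate, or None."""
        if self.last is not None:
            self.transitions[self.last][block] += 1
        # .get() avoids creating an empty entry for blocks never seen as a source
        followers = self.transitions.get(block)
        prediction = followers.most_common(1)[0][0] if followers else None
        self.last = block
        return prediction


prefetcher = MarkovPrefetcher()
trace = [1, 2, 3, 1, 2, 3]
predictions = [prefetcher.access(b) for b in trace]
```

On a repeating trace such as this one, the predictor has nothing to suggest during the first pass and proposes the next block of the cycle once each transition has been seen. Interleaved accesses from several threads would confuse such a single-history predictor, which is the situation the abstract's clustering of simultaneous access groups is meant to address.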

Acceleration and Parallelization of a Linear Equation Solver for Crack Growth Simulation Based on the Phase Field Model

Mathematics ◽

10.3390/math9182248 ◽

2021 ◽

Vol 9 (18) ◽

pp. 2248

Author(s):

Gaku Ishii ◽

Yusaku Yamamoto ◽

Takeshi Takaishi

Keyword(s):

Linear Equation ◽

Crack Growth ◽

Phase Field ◽

Field Model ◽

Multicore Processors ◽

Phase Field Model ◽

Growth Simulation ◽

Crack Growth Simulation ◽

Simulation Based ◽

Linear Equation Solver

We aim to accelerate the linear equation solver for crack growth simulation based on the phase field model. As a first step, we analyze the properties of the coefficient matrices and prove that they are symmetric positive definite. This justifies the use of the conjugate gradient method with the efficient incomplete Cholesky preconditioner. We then parallelize this preconditioner using so-called block multi-color ordering and evaluate its performance on multicore processors. The experimental results show that our solver scales well and achieves an acceleration of several times over the original solver based on the diagonally scaled CG method.
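For orientation, the baseline mentioned in the abstract, the diagonally scaled conjugate gradient (CG) method, can be sketched in a few lines. This is a minimal pure-Python sketch assuming a small dense symmetric positive definite matrix stored as nested lists; the function name `jacobi_pcg` and its parameters are illustrative, and the block multi-color incomplete Cholesky preconditioner from the paper is not reproduced here.

```python
def matvec(A, x):
    """Dense matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def jacobi_pcg(A, b, tol=1e-10, max_iter=1000):
    """CG with a diagonal (Jacobi) preconditioner; A is assumed to be SPD."""
    n = len(b)
    x = [0.0] * n
    r = list(b)  # residual b - A x for the initial guess x = 0
    inv_diag = [1.0 / A[i][i] for i in range(n)]
    z = [d * ri for d, ri in zip(inv_diag, r)]  # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    for iteration in range(max_iter):
        if dot(r, r) ** 0.5 <= tol:
            return x, iteration
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [d * ri for d, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x, iterations = jacobi_pcg(A, b)
```

Replacing the `inv_diag` scaling with a stronger preconditioner changes only the two lines that compute `z`, which is why the choice of preconditioner and its parallelization can be studied separately from the CG iteration itself.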

Seeds of SEED: Characterizing Enclave-level Parallelism in Secure Multicore Processors

10.1109/seed51797.2021.00031 ◽

2021 ◽

Author(s):

Brandon D'Agostino ◽

Omer Khan

Keyword(s):

Multicore Processors ◽

Level Parallelism

Energy Efficient Greedy Scheduling of Tasks for DVFS Enabled Heterogeneous Multicore Processors

10.1109/rteict52294.2021.9573873 ◽

2021 ◽

Author(s):

K Siddesha ◽

G V Jayaramaiah

Keyword(s):

Energy Efficient ◽

Multicore Processors ◽

Heterogeneous Multicore ◽

Greedy Scheduling

NUMA-Aware DGEMM Based on 64-Bit ARMv8 Multicore Processors Architecture

Electronics ◽

10.3390/electronics10161984 ◽

2021 ◽

Vol 10 (16) ◽

pp. 1984

Author(s):

Wei Zhang ◽

Zihao Jiang ◽

Zhiguang Chen ◽

Nong Xiao ◽

Yang Ou

Keyword(s):

Energy Efficiency ◽

High Performance ◽

Multicore Processors ◽

Matrix Multiplication ◽

Memory Access ◽

Double Precision ◽

Competitive Performance ◽

General Matrix ◽

Remarkable Improvement ◽

Task Independence

Double-precision general matrix multiplication (DGEMM) is an essential kernel for measuring the potential performance of an HPC platform. ARMv8-based system-on-chips (SoCs) have become candidates for next-generation HPC systems thanks to their highly competitive performance and energy efficiency. It is therefore meaningful to design a high-performance DGEMM for ARMv8-based SoCs. However, as ARMv8-based SoCs integrate an increasing number of cores, modern CPUs use non-uniform memory access (NUMA). NUMA restricts the performance and scalability of DGEMM when many threads access remote NUMA domains, which makes it challenging to develop a high-performance DGEMM on multi-NUMA architectures. We present a NUMA-aware method to reduce the number of cross-die and cross-chip memory access events. The critical enabler for NUMA-aware DGEMM is to leverage two levels of parallelism, between and within nodes, in a purely threaded implementation, which allows task independence and data localization across NUMA nodes. We have implemented NUMA-aware DGEMM in OpenBLAS and evaluated it on a dual-socket server with 48-core processors based on the Kunpeng920 architecture. The results show that NUMA-aware DGEMM effectively reduces the number of cross-die and cross-chip memory accesses, which significantly enhances the scalability of DGEMM and increases its performance by 17.1% on average, with the most remarkable improvement being 21.9%.
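The two levels of parallelism described above can be pictured as a nested partition of the output matrix: rows are first divided among NUMA nodes, then each node's share is divided among its threads. The sketch below only computes such an index plan; the helper names `split_range` and `two_level_partition` are hypothetical, and this is not the OpenBLAS implementation, which also has to handle thread pinning and data placement.

```python
def split_range(start, stop, parts):
    """Split [start, stop) into `parts` contiguous, nearly equal ranges."""
    base, extra = divmod(stop - start, parts)
    ranges = []
    lo = start
    for i in range(parts):
        hi = lo + base + (1 if i < extra else 0)
        ranges.append((lo, hi))
        lo = hi
    return ranges


def two_level_partition(m, num_nodes, threads_per_node):
    """Map (node, thread) to the row range of C that the thread computes."""
    plan = {}
    for node, (n_lo, n_hi) in enumerate(split_range(0, m, num_nodes)):
        # Threads of one node only touch rows from that node's own share.
        for thread, rows in enumerate(split_range(n_lo, n_hi, threads_per_node)):
            plan[(node, thread)] = rows
    return plan


plan = two_level_partition(10, 2, 2)
```

Because every thread's rows fall inside its own node's range, the corresponding blocks of the matrices could, under this assumption, be allocated in that node's local memory, which is the kind of data localization the abstract refers to.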

Dynamic Priority Real-Time Scheduling on Power Asymmetric Multicore Processors

Symmetry ◽

10.3390/sym13081488 ◽

2021 ◽

Vol 13 (8) ◽

pp. 1488

Author(s):

Basharat Mahmood ◽

Naveed Ahmad ◽

Majid Iqbal Khan ◽

Adnan Akhunzada

Keyword(s):

Real Time ◽

Power Efficiency ◽

Multicore Processors ◽

Real Time Systems ◽

Real Time Scheduling ◽

Dynamic Priority ◽

Time Scheduling ◽

Asymmetric Multicore Processors ◽

Asymmetric Multicore ◽

Time Systems

The use of real-time systems is growing at an increasing rate. This makes power efficiency a main challenge for system designers. Power asymmetric multicore processors provide a power-efficient platform for building complex real-time systems. The utilization of this efficient platform can be further enhanced by adopting proficient scheduling policies. Unfortunately, research on real-time scheduling of power asymmetric multicore processors is in its infancy. In this research, we address this problem and add new results. We propose a dynamic-priority semi-partitioned algorithm named Earliest-Deadline First with C=D Task Splitting (EDFwC=D-TS) for scheduling real-time applications on power asymmetric multicore processors. EDFwC=D-TS outclasses its counterparts in terms of system utilization. The simulation results show that EDFwC=D-TS schedules up to 67% more tasks with heavy workloads. Furthermore, it improves processor utilization by up to 11% and on average uses 14% fewer cores to schedule the given workload.
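To see why task splitting can help, consider plain partitioned EDF first. For periodic tasks with implicit deadlines, a single core under EDF is schedulable exactly when the total utilization (sum of execution time over period) does not exceed 1. The sketch below applies that test in a first-fit assignment on identical cores; the function name `first_fit_edf` is hypothetical, and neither the C=D splitting step nor the power asymmetry of the paper's platform is modeled.

```python
def first_fit_edf(tasks, num_cores):
    """Assign (wcet, period) tasks to cores by first fit under the EDF bound.

    Returns (assignment, unassigned), where assignment[i] lists the tasks
    placed on core i and unassigned lists tasks that fit on no core whole.
    """
    loads = [0.0] * num_cores
    assignment = [[] for _ in range(num_cores)]
    unassigned = []
    for wcet, period in tasks:
        utilization = wcet / period
        for core in range(num_cores):
            # EDF utilization bound for implicit deadlines: total load <= 1.
            if loads[core] + utilization <= 1.0 + 1e-12:
                loads[core] += utilization
                assignment[core].append((wcet, period))
                break
        else:
            unassigned.append((wcet, period))
    return assignment, unassigned


tasks = [(6, 10), (6, 10), (6, 10)]
assignment, unassigned = first_fit_edf(tasks, 2)
```

Here the total utilization is 1.8 on two cores, yet the third task fits on neither core as a whole. Semi-partitioned approaches such as the one in the abstract target this gap by splitting such a task across cores instead of rejecting it.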

Compression and load balancing for efficient sparse matrix‐vector product on multicore processors and graphics processing units

Concurrency and Computation Practice and Experience ◽

10.1002/cpe.6515 ◽

2021 ◽

Author(s):

José I. Aliaga ◽

Hartwig Anzt ◽

Thomas Grützmacher ◽

Enrique S. Quintana‐Ortí ◽

Andrés E. Tomás

Keyword(s):

Load Balancing ◽

Graphics Processing Units ◽

Sparse Matrix ◽

Multicore Processors ◽

Vector Product ◽

Graphics Processing ◽

Matrix Vector
