Multicore CPUs
Recently Published Documents


TOTAL DOCUMENTS: 69 (FIVE YEARS: 13)
H-INDEX: 11 (FIVE YEARS: 2)

2021 ◽ Vol 50 (2) ◽ pp. 41-43
Author(s): Marianne Winslett ◽ Vanessa Braganholo

Welcome to ACM SIGMOD Record's series of interviews with distinguished members of the database community. I'm Marianne Winslett, and today I have here with me Viktor Leis, who won the 2018 ACM SIGMOD Jim Gray Dissertation Award for his thesis entitled Query Processing and Optimization in Modern Database Systems. Viktor is now at the University of Erlangen-Nuremberg, and his Ph.D. is from the Technical University of Munich, where he worked with Thomas Neumann and Alfons Kemper. So, Viktor, welcome.


Author(s): Martti Forsell ◽ Sara Nikula ◽ Jussi Roivainen ◽ Ville Leppänen ◽ Jesper Larsson Träff

Abstract: Commercial multicore central processing units (CPUs) integrate a number of processor cores on a single chip to support parallel execution of computational tasks. For independent parallel tasks, multicore CPUs can improve performance over single cores nearly linearly as long as sufficient bandwidth is available. Ideal speedup is, however, difficult to achieve when dense intercommunication between the cores or complex memory access patterns are required, because of expensive synchronization and thread switching and insufficient latency tolerance. These facts push programmers away from straightforward parallel processing patterns toward complex and error-prone programming techniques. To address these problems, we have introduced the Thick Control Flow (TCF) Processor Architecture. A TCF is an abstraction of parallel computation that combines self-similar threads into computational entities. In this paper, we compare the performance and programmability of an entry-level TCF processor and two Intel Skylake multicore CPUs on commonly used parallel kernels to find out how well our architecture addresses the issues that greatly reduce the productivity of parallel software development. Code examples are given and programming experiences are recorded.
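The overheads the abstract names are easy to observe from software. The following minimal Python sketch (illustrative only; it is not TCF code, and the pool size, item count, and chunk size are invented) contrasts per-element task dispatch with chunked dispatch on a thread pool. Python's GIL exaggerates the gap, but the pattern mirrors why fine-grained synchronization and thread switching erode multicore speedup:

    import time
    from concurrent.futures import ThreadPoolExecutor

    def kernel(x):
        return x * x  # a tiny per-item kernel; real kernels do far more work per item

    items = list(range(100_000))

    with ThreadPoolExecutor(max_workers=4) as pool:
        t0 = time.perf_counter()
        # One task per element: dispatch and synchronization dominate the tiny kernel.
        fine = list(pool.map(kernel, items))
        t1 = time.perf_counter()
        # Chunked dispatch amortizes that overhead over many elements per task.
        chunks = [items[i:i + 10_000] for i in range(0, len(items), 10_000)]
        coarse = [y for part in pool.map(lambda c: [kernel(x) for x in c], chunks)
                  for y in part]
        t2 = time.perf_counter()

    print(f"per-element dispatch: {t1 - t0:.3f}s, chunked dispatch: {t2 - t1:.3f}s")

A TCF machine aims instead to make the fine-grained style cheap in hardware, which is what the kernel comparison in the paper evaluates.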


2021 ◽ pp. 232-248
Author(s): Zhongyi Lin ◽ Evangelos Georganas ◽ John D. Owens

2020 ◽ Vol 145 ◽ pp. 34-41
Author(s): Giuliano Laccetti ◽ Marco Lapegna ◽ Valeria Mele ◽ Diego Romano ◽ Lukasz Szustak

2020 ◽ Vol 20 ◽ pp. 1-13
Author(s): Zhaoqian Zhong ◽ Masato Edahiro

In this paper, we propose a model-based approach that parallelizes Simulink models of image processing algorithms at the block level on homogeneous multicore CPUs and NVIDIA GPUs and generates CUDA C code for parallel execution on the target hardware. In the proposed approach, the Simulink models are converted into directed acyclic graphs (DAGs) based on their block diagrams, wherein nodes represent tasks of grouped blocks or subsystems in the model and edges represent the communication between blocks. Next, a path analysis is conducted on the DAGs to extract all execution paths and calculate their respective lengths, where the length of a path comprises the execution times of its tasks and the communication times of its edges. Then, an integer linear programming (ILP) formulation is used to minimize the length of the critical path of the DAG, which represents the execution time of the Simulink model. The ILP formulation also balances the workload on each CPU core for better hardware utilization. We parallelized image processing models on a platform with two homogeneous CPU cores and two GPUs using our approach and observed speedups between 8.78x and 15.71x.
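As a reading aid, here is a minimal Python sketch of the path-analysis step only (not the authors' code; the task times, edge costs, and the name critical_path_length are invented for illustration). It computes the critical-path length of a task DAG whose nodes carry execution times and whose edges carry communication times:

    from collections import defaultdict

    def critical_path_length(exec_time, comm_time):
        # exec_time: {task: execution time}; comm_time: {(u, v): edge communication time}.
        succ, pred, indeg = defaultdict(list), defaultdict(list), defaultdict(int)
        for (u, v), c in comm_time.items():
            succ[u].append(v)
            pred[v].append((u, c))
            indeg[v] += 1
        # Kahn's algorithm: process tasks in a topological order of the DAG.
        ready = [t for t in exec_time if indeg[t] == 0]
        finish = {}
        while ready:
            u = ready.pop()
            # Earliest start = latest predecessor finish plus its communication cost.
            start = max((finish[p] + c for p, c in pred[u]), default=0.0)
            finish[u] = start + exec_time[u]
            for v in succ[u]:
                indeg[v] -= 1
                if indeg[v] == 0:
                    ready.append(v)
        return max(finish.values())

    # Toy DAG: task a feeds b and c; the a -> b chain is the critical path.
    tasks = {"a": 2.0, "b": 4.0, "c": 1.0}
    edges = {("a", "b"): 0.5, ("a", "c"): 0.2}
    print(critical_path_length(tasks, edges))  # 6.5 = 2.0 + 0.5 + 4.0

In the paper, this length is the quantity the ILP formulation minimizes when assigning tasks to CPU cores and GPUs.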


IEEE Access ◽ 2020 ◽ Vol 8 ◽ pp. 143306-143332
Author(s): Arsalan Shahid ◽ Muhammad Fahad ◽ Ravi Reddy Manumachu ◽ Alexey Lastovetsky

2019 ◽ Vol 13 (4) ◽ pp. 286-290
Author(s): Siraphob Theeracheep ◽ Jaruloj Chongstitvatana

Matrix multiplication is an essential part of many applications, such as linear algebra, image processing, and machine learning. One platform used in such applications is TensorFlow, a machine learning library whose structure is based on the dataflow programming paradigm. In this work, a method for multiplying medium-density matrices on multicore CPUs with the TensorFlow platform is proposed. This method, called tbt_matmul, utilizes the TensorFlow built-in methods tf.matmul and tf.sparse_matmul. By partitioning each input matrix into four smaller sub-matrices, called tiles, and applying either the dense or the sparse multiplication method to each pair of tiles depending on their density, the proposed method outperforms the built-in methods for matrices of medium density and matrices with a significantly uneven distribution of non-zeros.
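A rough sketch of that tiling idea is given below, assuming TensorFlow 2.x, where tf.sparse.sparse_dense_matmul plays the role of the TF1-era tf.sparse_matmul used in the paper; the density threshold and all helper names are invented for illustration and are not taken from tbt_matmul:

    import tensorflow as tf

    DENSITY_THRESHOLD = 0.3  # assumed cutoff; a real implementation would tune this

    def density(t):
        # Fraction of non-zero entries in a tile.
        nnz = tf.math.count_nonzero(t, dtype=tf.float32)
        return nnz / tf.cast(tf.size(t), tf.float32)

    def tile_matmul(a, b):
        # Multiply one tile pair, picking a sparse or dense kernel by the density of a.
        if density(a) < DENSITY_THRESHOLD:
            return tf.sparse.sparse_dense_matmul(tf.sparse.from_dense(a), b)
        return tf.matmul(a, b)

    def tbt_matmul_sketch(a, b):
        # 2x2 tiling: split each input into four tiles (even dimensions assumed).
        n, m = a.shape[0] // 2, a.shape[1] // 2
        p = b.shape[1] // 2
        a00, a01, a10, a11 = a[:n, :m], a[:n, m:], a[n:, :m], a[n:, m:]
        b00, b01, b10, b11 = b[:m, :p], b[:m, p:], b[m:, :p], b[m:, p:]
        c00 = tile_matmul(a00, b00) + tile_matmul(a01, b10)
        c01 = tile_matmul(a00, b01) + tile_matmul(a01, b11)
        c10 = tile_matmul(a10, b00) + tile_matmul(a11, b10)
        c11 = tile_matmul(a10, b01) + tile_matmul(a11, b11)
        return tf.concat([tf.concat([c00, c01], axis=1),
                          tf.concat([c10, c11], axis=1)], axis=0)

    # Quick check against the plain dense kernel on a mostly-sparse left operand.
    a = tf.random.uniform((4, 4))
    a = a * tf.cast(a > 0.7, tf.float32)  # zero out most entries
    b = tf.random.uniform((4, 4))
    print(tf.reduce_max(tf.abs(tbt_matmul_sketch(a, b) - tf.matmul(a, b))))  # ~0

The point of tiling is that the dense-versus-sparse decision is made per tile, so a matrix with a few dense regions and otherwise mostly zeros gets the dense kernel only where it pays off.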


2019 ◽ Vol 31 (7) ◽ pp. 1239-1252
Author(s): Son T. Mai ◽ Sihem Amer-Yahia ◽ Ira Assent ◽ Mathias Skovgaard Birk ◽ Martin Storgaard Dieu ◽ ...
