Multicore CPUs
Recently Published Documents


TOTAL DOCUMENTS: 69 (FIVE YEARS: 13)
H-INDEX: 11 (FIVE YEARS: 2)

2021 ◽ Vol 50 (2) ◽ pp. 41-43
Author(s): Marianne Winslett ◽ Vanessa Braganholo

Welcome to ACM SIGMOD Record's series of interviews with distinguished members of the database community. I'm Marianne Winslett, and today I have here with me Viktor Leis, who won the 2018 ACM SIGMOD Jim Gray Dissertation Award for his thesis entitled Query Processing and Optimization in Modern Database Systems. Viktor is now at the University of Erlangen-Nuremberg, and his Ph.D. is from the Technical University of Munich, where he worked with Thomas Neumann and Alfons Kemper. So, Viktor, welcome.


Author(s): Martti Forsell ◽ Sara Nikula ◽ Jussi Roivainen ◽ Ville Leppänen ◽ Jesper Larsson Träff

Abstract: Commercial multicore central processing units (CPUs) integrate a number of processor cores on a single chip to support parallel execution of computational tasks. For independent parallel tasks, multicore CPUs can improve performance over single cores nearly linearly as long as sufficient bandwidth is available. Ideal speedup is, however, difficult to achieve when dense intercommunication between the cores or complex memory access patterns are required, because of expensive synchronization and thread switching and insufficient latency tolerance. These facts push programmers away from straightforward parallel processing patterns toward complex and error-prone programming techniques. To address these problems, we have introduced the Thick Control Flow (TCF) Processor Architecture. A TCF is an abstraction of parallel computation that combines self-similar threads into computational entities. In this paper, we compare the performance and programmability of an entry-level TCF processor and two Intel Skylake multicore CPUs on commonly used parallel kernels to find out how well our architecture addresses the issues that greatly reduce the productivity of parallel software development. Code examples are given and programming experiences are recorded.
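The overheads the abstract names are easy to observe from software. The following minimal Python sketch (illustrative only; it is not TCF code, and the pool size, item count, and chunk size are invented) contrasts per-element task dispatch with chunked dispatch on a thread pool. Python's GIL exaggerates the gap, but the pattern mirrors why fine-grained synchronization and thread switching erode multicore speedup:

    import time
    from concurrent.futures import ThreadPoolExecutor

    def kernel(x):
        return x * x  # a tiny per-item kernel; real kernels do far more work per item

    items = list(range(100_000))

    with ThreadPoolExecutor(max_workers=4) as pool:
        t0 = time.perf_counter()
        # One task per element: dispatch and synchronization dominate the tiny kernel.
        fine = list(pool.map(kernel, items))
        t1 = time.perf_counter()
        # Chunked dispatch amortizes that overhead over many elements per task.
        chunks = [items[i:i + 10_000] for i in range(0, len(items), 10_000)]
        coarse = [y for part in pool.map(lambda c: [kernel(x) for x in c], chunks)
                  for y in part]
        t2 = time.perf_counter()

    print(f"per-element dispatch: {t1 - t0:.3f}s, chunked dispatch: {t2 - t1:.3f}s")

A TCF machine aims instead to make the fine-grained style cheap in hardware, which is what the kernel comparison in the paper evaluates.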


2021 ◽ pp. 232-248
Author(s): Zhongyi Lin ◽ Evangelos Georganas ◽ John D. Owens

2020 ◽ Vol 145 ◽ pp. 34-41
Author(s): Giuliano Laccetti ◽ Marco Lapegna ◽ Valeria Mele ◽ Diego Romano ◽ Lukasz Szustak

2020 ◽ Vol 20 ◽ pp. 1-13
Author(s): Zhaoqian Zhong ◽ Masato Edahiro

In this paper, we propose a model-based approach that parallelizes Simulink models of image processing algorithms at the block level on homogeneous multicore CPUs and NVIDIA GPUs and generates CUDA C code for parallel execution on the target hardware. In the proposed approach, the Simulink models are converted into directed acyclic graphs (DAGs) based on their block diagrams, wherein nodes represent tasks of grouped blocks or subsystems in the model and edges represent the communication between blocks. Next, a path analysis is conducted on the DAGs to extract all execution paths and calculate their respective lengths, where the length of a path comprises the execution times of its tasks and the communication times of its edges. Then, an integer linear programming (ILP) formulation is used to minimize the length of the critical path of the DAG, which represents the execution time of the Simulink model. The ILP formulation also balances the workload on each CPU core for better hardware utilization. We parallelized image processing models on a platform with two homogeneous CPU cores and two GPUs using our approach and observed speedups between 8.78x and 15.71x.
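As a reading aid, here is a minimal Python sketch of the path-analysis step only (not the authors' code; the task times, edge costs, and the name critical_path_length are invented for illustration). It computes the critical-path length of a task DAG whose nodes carry execution times and whose edges carry communication times:

    from collections import defaultdict

    def critical_path_length(exec_time, comm_time):
        # exec_time: {task: execution time}; comm_time: {(u, v): edge communication time}.
        succ, pred, indeg = defaultdict(list), defaultdict(list), defaultdict(int)
        for (u, v), c in comm_time.items():
            succ[u].append(v)
            pred[v].append((u, c))
            indeg[v] += 1
        # Kahn's algorithm: process tasks in a topological order of the DAG.
        ready = [t for t in exec_time if indeg[t] == 0]
        finish = {}
        while ready:
            u = ready.pop()
            # Earliest start = latest predecessor finish plus its communication cost.
            start = max((finish[p] + c for p, c in pred[u]), default=0.0)
            finish[u] = start + exec_time[u]
            for v in succ[u]:
                indeg[v] -= 1
                if indeg[v] == 0:
                    ready.append(v)
        return max(finish.values())

    # Toy DAG: task a feeds b and c; the a -> b chain is the critical path.
    tasks = {"a": 2.0, "b": 4.0, "c": 1.0}
    edges = {("a", "b"): 0.5, ("a", "c"): 0.2}
    print(critical_path_length(tasks, edges))  # 6.5 = 2.0 + 0.5 + 4.0

In the paper, this length is the quantity the ILP formulation minimizes when assigning tasks to CPU cores and GPUs.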


IEEE Access ◽ 2020 ◽ Vol 8 ◽ pp. 143306-143332
Author(s): Arsalan Shahid ◽ Muhammad Fahad ◽ Ravi Reddy Manumachu ◽ Alexey Lastovetsky

2019 ◽ Vol 13 (4) ◽ pp. 286-290
Author(s): Siraphob Theeracheep ◽ Jaruloj Chongstitvatana

Matrix multiplication is an essential part of many applications, such as linear algebra, image processing, and machine learning. One platform used in such applications is TensorFlow, a machine learning library whose structure is based on the dataflow programming paradigm. In this work, a method for multiplying medium-density matrices on multicore CPUs with the TensorFlow platform is proposed. This method, called tbt_matmul, utilizes the TensorFlow built-in methods tf.matmul and tf.sparse_matmul. By partitioning each input matrix into four smaller sub-matrices, called tiles, and applying either the dense or the sparse multiplication method to each pair of tiles depending on their density, the proposed method outperforms the built-in methods for matrices of medium density and matrices with a significantly uneven distribution of non-zeros.
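A rough sketch of that tiling idea is given below, assuming TensorFlow 2.x, where tf.sparse.sparse_dense_matmul plays the role of the TF1-era tf.sparse_matmul used in the paper; the density threshold and all helper names are invented for illustration and are not taken from tbt_matmul:

    import tensorflow as tf

    DENSITY_THRESHOLD = 0.3  # assumed cutoff; a real implementation would tune this

    def density(t):
        # Fraction of non-zero entries in a tile.
        nnz = tf.math.count_nonzero(t, dtype=tf.float32)
        return nnz / tf.cast(tf.size(t), tf.float32)

    def tile_matmul(a, b):
        # Multiply one tile pair, picking a sparse or dense kernel by the density of a.
        if density(a) < DENSITY_THRESHOLD:
            return tf.sparse.sparse_dense_matmul(tf.sparse.from_dense(a), b)
        return tf.matmul(a, b)

    def tbt_matmul_sketch(a, b):
        # 2x2 tiling: split each input into four tiles (even dimensions assumed).
        n, m = a.shape[0] // 2, a.shape[1] // 2
        p = b.shape[1] // 2
        a00, a01, a10, a11 = a[:n, :m], a[:n, m:], a[n:, :m], a[n:, m:]
        b00, b01, b10, b11 = b[:m, :p], b[:m, p:], b[m:, :p], b[m:, p:]
        c00 = tile_matmul(a00, b00) + tile_matmul(a01, b10)
        c01 = tile_matmul(a00, b01) + tile_matmul(a01, b11)
        c10 = tile_matmul(a10, b00) + tile_matmul(a11, b10)
        c11 = tile_matmul(a10, b01) + tile_matmul(a11, b11)
        return tf.concat([tf.concat([c00, c01], axis=1),
                          tf.concat([c10, c11], axis=1)], axis=0)

    # Quick check against the plain dense kernel on a mostly-sparse left operand.
    a = tf.random.uniform((4, 4))
    a = a * tf.cast(a > 0.7, tf.float32)  # zero out most entries
    b = tf.random.uniform((4, 4))
    print(tf.reduce_max(tf.abs(tbt_matmul_sketch(a, b) - tf.matmul(a, b))))  # ~0

The point of tiling is that the dense-versus-sparse decision is made per tile, so a matrix with a few dense regions and otherwise mostly zeros gets the dense kernel only where it pays off.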


2019 ◽ Vol 31 (7) ◽ pp. 1239-1252
Author(s): Son T. Mai ◽ Sihem Amer-Yahia ◽ Ira Assent ◽ Mathias Skovgaard Birk ◽ Martin Storgaard Dieu ◽ ...
