An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness

Author(s):  
Sunpyo Hong ◽  
Hyesoon Kim
2018 ◽  
Vol 15 (1) ◽  
pp. 1-21 ◽  
Author(s):  
Zhen Lin ◽  
Michael Mantor ◽  
Huiyang Zhou

2015 ◽  
Vol 2015 ◽  
pp. 1-10
Author(s):  
Jianliang Ma ◽  
Jinglei Meng ◽  
Tianzhou Chen ◽  
Minghui Wu

The ultra-high thread-level parallelism of modern GPUs usually generates numerous simultaneous memory requests, so many requests are typically waiting at each bank of the shared LLC (L2 in this paper) and of global memory. For global memory, various schedulers have already been developed to adjust the request sequence, but we find that little work has focused on the service sequence at the shared LLC. Our measurements show that, in a large number of GPU applications, requests queue at the LLC banks for service, which provides an opportunity to optimize the service order at the LLC. By adjusting the service order of GPU memory requests, we can improve the schedulability of the SM. We therefore propose a critical-aware shared LLC request scheduling algorithm (CaLRS). The representation of memory request priority is central to CaLRS: we use the number of memory requests that originate from the same warp but have not been serviced when they arrive at the shared LLC bank to represent the criticality of each warp. Experiments show that the proposed scheme effectively boosts SM schedulability by promoting the scheduling priority of memory requests with high criticality, and thereby indirectly improves GPU performance.
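The warp-criticality idea in this abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the class `BankQueue`, its methods, and the flag `more_pending_is_critical` are hypothetical names introduced here. Two assumptions are made. First, a warp's criticality is modelled as its count of not-yet-serviced requests at a single bank, evaluated when the scheduler picks the next request. Second, the abstract does not say whether a larger or a smaller count means higher criticality, so the direction is left as a parameter. Ties fall back to arrival order.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import count


@dataclass
class Request:
    warp_id: int
    address: int
    seq: int  # arrival order at the bank, used as the FIFO tie-breaker


class BankQueue:
    """Hypothetical model of one shared-LLC bank queue (illustration only)."""

    def __init__(self, more_pending_is_critical: bool = True):
        # Assumption: the direction of the criticality mapping is not stated
        # in the abstract, so it is configurable here.
        self.more_pending_is_critical = more_pending_is_critical
        self.pending = Counter()  # warp_id -> requests not yet serviced
        self.queue = []
        self._seq = count()

    def arrive(self, warp_id: int, address: int) -> None:
        """Enqueue a request and update its warp's unserviced count."""
        self.pending[warp_id] += 1
        self.queue.append(Request(warp_id, address, next(self._seq)))

    def service_next(self):
        """Service the request whose warp is currently most critical."""
        if not self.queue:
            return None
        sign = -1 if self.more_pending_is_critical else 1
        req = min(self.queue,
                  key=lambda r: (sign * self.pending[r.warp_id], r.seq))
        self.queue.remove(req)
        self.pending[req.warp_id] -= 1
        return req

    def drain(self):
        """Service everything and return the warp ids in service order."""
        order = []
        while self.queue:
            order.append(self.service_next().warp_id)
        return order


def demo(more_pending_is_critical: bool):
    bank = BankQueue(more_pending_is_critical)
    # Warp 1 has three requests in flight, warp 0 has one.
    for warp_id, address in [(1, 0x100), (0, 0x200), (1, 0x104), (1, 0x108)]:
        bank.arrive(warp_id, address)
    return bank.drain()
```

Calling `demo(True)` and `demo(False)` on the same arrival sequence shows how the service order departs from plain FIFO under each reading of criticality. A real LLC scheduler would also have to account for bank timing, starvation, and requests spread across several banks, none of which is modelled here.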


Author(s):  
Ramon Amela ◽  
Cristian Ramon-Cortes ◽  
Jorge Ejarque ◽  
Javier Conejero ◽  
Rosa M. Badia

Python is a popular programming language thanks to the simplicity of its syntax, and it still achieves good performance despite being an interpreted language. Its adoption by multiple scientific communities has led to the emergence of a large number of libraries and modules, which has helped put Python at the top of the list of programming languages [1]. Task-based programming has been proposed in recent years as an alternative parallel programming model. PyCOMPSs follows this approach for Python, and this paper presents its extensions to combine task-based parallelism and thread-level parallelism. We also present how PyCOMPSs has been adapted to support heterogeneous architectures, including Xeon Phi and GPUs. Results obtained with linear algebra benchmarks demonstrate that significant performance can be obtained with a few lines of Python.


Author(s):  
Cao Gao ◽  
Anthony Gutierrez ◽  
Ronald G. Dreslinski ◽  
Trevor Mudge ◽  
Krisztian Flautner ◽  
...  

Computing ◽  
2014 ◽  
Vol 96 (6) ◽  
pp. 545-564 ◽  
Author(s):  
John Ye ◽  
Hui Yan ◽  
Honglun Hou ◽  
Tianzhou Chen
