thread mapping
Recently Published Documents


Total documents: 40 (five years: 2)
H-index: 7 (five years: 0)

2021, Vol 11 (14), pp. 6486
Author(s): Mei-Ling Chiang, Wei-Lun Su

NUMA multi-core systems divide system resources into several nodes. When the load between cores becomes imbalanced, the kernel scheduler's load-balancing mechanism migrates threads between cores or across NUMA nodes. A thread migrated to another node must then use remote memory access to reach memory on its previous node, which degrades performance. Because the related operations run in the critical path of the kernel scheduler, the threads to be migrated must be selected both effectively and efficiently. This study focuses on improving inter-node load balancing for multithreaded applications. We propose a thread-aware selection policy that, when migrating one thread for inter-node load balancing, considers how the threads of each thread group are distributed across nodes. The policy selects a thread whose thread group has the least exclusive thread distribution, that is, whose members are distributed most evenly across nodes. Migrating such a thread has less influence on the data mapping and thread mapping of its thread group. We further devise several enhancements that eliminate superfluous evaluations for multithreaded processes, making the selection procedure more efficient. Experimental results on the commonly used PARSEC 3.0 benchmark suite show that the modified Linux kernel with the proposed selection policy improves performance by 10.7% compared with the unmodified Linux kernel.
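The selection idea in the abstract above can be sketched in a few lines. This is a minimal user-space illustration, not the authors' kernel implementation: the "exclusivity" metric (share of a group's members sitting on its most populated node) is a hypothetical stand-in for the paper's notion of exclusive thread distribution, and the names `group_exclusivity` and `select_thread_to_migrate` are invented for this sketch. The per-group cache loosely mirrors the idea of avoiding repeated evaluations for threads of the same multithreaded process.

```python
from collections import Counter, defaultdict


def group_exclusivity(nodes):
    """Hypothetical metric: fraction of a group's threads on its busiest node.

    1.0 means every member sits on one node (fully exclusive); smaller values
    mean the group is spread more evenly across NUMA nodes.
    """
    counts = Counter(nodes)
    return max(counts.values()) / len(nodes)


def select_thread_to_migrate(candidates, thread_group, thread_node):
    """Pick the candidate whose thread group is least exclusive.

    candidates:   thread ids eligible for migration off the busy node
    thread_group: thread id -> group id (e.g. the owning process)
    thread_node:  thread id -> NUMA node the thread currently runs on
    Ties go to the first candidate seen.
    """
    # Collect each group's node distribution once.
    members = defaultdict(list)
    for tid, group in thread_group.items():
        members[group].append(thread_node[tid])

    cache = {}  # group id -> exclusivity, so each group is scored only once
    best_tid, best_score = None, float("inf")
    for tid in candidates:
        group = thread_group[tid]
        if group not in cache:
            cache[group] = group_exclusivity(members[group])
        if cache[group] < best_score:
            best_tid, best_score = tid, cache[group]
    return best_tid
```

For example, if group A has four threads all on node 0 and group B has two threads on node 0 and two on node 1, a thread from group B is chosen: B is already split across nodes, so moving one of its threads disturbs its locality less than splitting A would.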



Author(s): Matheus W. Camargo, Matheus S. Serpa, Danilo Carastan-Santos, Alexandre Carissimi, Philippe O. A. Navaux


2020, Vol 113, pp. 158-169
Author(s): Cristóbal A. Navarro, Felipe A. Quezada, Nancy Hitschfeld, Raimundo Vega, Benjamin Bustos




Energies, 2019, Vol 12 (7), pp. 1346
Author(s): Tao Ju, Yan Zhang, Xuejun Zhang, Xiaogang Du, Xiaoshe Dong

Improving computing performance and reducing energy consumption are major concerns in heterogeneous many-core systems. For a multithreaded application running on a heterogeneous many-core system, the thread count directly influences both computing performance and energy consumption. In this work, we studied the relationship between the thread count and application performance in order to improve overall energy efficiency. A model for predicting the optimum thread count, hereafter the thread count prediction model (TCPM), was designed using regression analysis based on program running behavior and the features of the heterogeneous many-core architecture. Subsequently, a dynamic predictive thread mapping (DPTM) framework was proposed. DPTM uses the prediction model to estimate the optimum thread count and dynamically adjusts the number of active hardware threads as the running program changes phase, in order to achieve optimal energy efficiency. Experimental results show that, on average, DPTM achieves a nearly 49% improvement in performance and a 59% reduction in energy consumption. Moreover, DPTM introduces about 2% additional overhead compared with traditional thread mapping for PARSEC (Princeton Application Repository for Shared-Memory Computers) benchmark programs running on an Intel MIC (Many Integrated Core) heterogeneous many-core system.
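The predict-then-adjust loop described above can be illustrated with a deliberately simplified sketch. The abstract does not give TCPM's features or coefficients, so everything here is an assumption: a single hypothetical per-phase feature (for instance, a memory-boundedness ratio in [0, 1]), a one-variable ordinary least-squares fit in place of the paper's regression model, a hypothetical limit of 240 hardware threads, and a hypothetical change threshold for detecting a new phase.

```python
def fit_linear(xs, ys):
    """One-feature ordinary least squares; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_thread_count(feature, model, max_threads):
    """Predict a thread count and clamp it to [1, max_threads]."""
    slope, intercept = model
    raw = round(slope * feature + intercept)
    return max(1, min(max_threads, raw))


def plan_thread_counts(phase_features, model, max_threads, threshold=0.1):
    """Re-predict only when the feature shifts by more than `threshold`,
    treating such a shift as a phase change; otherwise keep the current count."""
    plan = []
    last_feature = None
    current = max_threads
    for feature in phase_features:
        if last_feature is None or abs(feature - last_feature) > threshold:
            current = predict_thread_count(feature, model, max_threads)
            last_feature = feature
        plan.append(current)
    return plan


if __name__ == "__main__":
    # Hypothetical training data: more memory-bound phases favour fewer threads.
    model = fit_linear([0.0, 0.5, 1.0], [240, 130, 20])
    print(model)
    print(plan_thread_counts([0.2, 0.25, 0.6], model, max_threads=240))
```

Only re-predicting on a detected phase change keeps the per-sample cost low, which matters given the abstract's emphasis on keeping DPTM's overhead small; a real predictor would use several behavior and architecture features rather than one.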





2019, Vol 17 (02), pp. 270-279
Author(s): Amanda Maria Pinho Amorim, Henrique Cota de Freitas




Computers, 2018, Vol 7 (4), pp. 66
Author(s): Iulia Știrb

The paper presents a Non-Uniform Memory Access (NUMA)-aware compiler optimization for task-level parallel code. The optimization is based on the Non-Uniform Memory Access Balanced Task and Loop Parallelism (NUMA-BTLP) algorithm (Știrb, 2018). The algorithm determines the type of each thread through a static analysis of the source code. After assigning a type to each thread, NUMA-BTLP (Știrb, 2018) calls the NUMA-BTDM mapping algorithm (Știrb, 2016), which uses the PThreads routine pthread_setaffinity_np to set the CPU affinities of the threads (i.e., thread-to-core associations) based on their type. The algorithms improve thread mapping for NUMA systems by mapping threads that share data onto the same core(s), allowing fast access to data in the L1 cache. The paper shows that PThreads-based task-level parallel code optimized at compile time by NUMA-BTLP (Știrb, 2018) and NUMA-BTDM (Știrb, 2016) runs time- and energy-efficiently on NUMA systems. The results show energy savings of up to 5% at the same execution time for one of the tested real benchmarks, and up to 15% for another benchmark running in an infinite loop. The algorithms can be used in real-time control systems such as client/server-based applications, which require efficient access to shared resources. Most often, task parallelism is used in the implementation of the server and loop parallelism in the client.
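The core placement rule above (threads that share data go on the same core) can be sketched as a small mapping function. This is an illustrative sketch under stated assumptions, not NUMA-BTDM itself: the type-based logic from the static analysis is not reproduced, the input format (thread id mapped to a hypothetical "shared data group" label) is invented, and groups are spread over cores round-robin purely for simplicity. In C, each resulting core id could be turned into a one-CPU `cpu_set_t` and applied with `pthread_setaffinity_np`, the routine the abstract names.

```python
def map_sharing_threads(thread_groups, num_cores):
    """Map threads to cores so that threads sharing data share a core.

    thread_groups: thread id -> hypothetical label of the data the thread shares
    num_cores:     number of cores available for placement
    Returns thread id -> core id. Distinct data groups are assigned to cores
    round-robin in order of first appearance (threads visited by sorted id).
    """
    group_core = {}
    mapping = {}
    for tid in sorted(thread_groups):
        group = thread_groups[tid]
        if group not in group_core:
            # First thread of this group: give the group the next core in turn.
            group_core[group] = len(group_core) % num_cores
        mapping[tid] = group_core[group]
    return mapping


if __name__ == "__main__":
    # Threads 0 and 2 share data "a"; threads 1 and 4 share data "b".
    print(map_sharing_threads({0: "a", 1: "b", 2: "a", 3: "c", 4: "b"}, num_cores=2))
```

Co-locating sharers trades parallelism for cache locality: threads pinned to one core run interleaved rather than simultaneously, which is why the abstract frames the benefit as fast L1 access to shared data rather than raw throughput.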


