thread mapping Latest Research Papers

Thread-Aware Mechanism to Enhance Inter-Node Load Balancing for Multithreaded Applications on NUMA Systems

Applied Sciences ◽

10.3390/app11146486 ◽

2021 ◽

Vol 11 (14) ◽

pp. 6486

Author(s):

Mei-Ling Chiang ◽

Wei-Lun Su

Keyword(s):

Load Balancing ◽

Critical Path ◽

Selection Procedure ◽

Remote Memory ◽

Data Mapping ◽

Linux Kernel ◽

Access Memory ◽

Selection Policy ◽

Benchmark Suite ◽

Thread Mapping

NUMA multi-core systems divide system resources into several nodes. When the load between cores becomes imbalanced, the kernel scheduler’s load balancing mechanism migrates threads between cores or across NUMA nodes. After migrating across nodes, a thread must use remote memory access to reach memory on its previous node, which degrades performance. Threads to be migrated must be selected effectively and efficiently because the related operations run in the critical path of the kernel scheduler. This study focuses on improving inter-node load balancing for multithreaded applications. We propose a thread-aware selection policy that considers how the threads of each thread group are distributed across nodes when migrating one thread for inter-node load balancing. The policy selects the thread whose thread group has the least exclusive thread distribution, that is, whose members are spread more evenly across nodes. Migrating such a thread has less influence on data mapping and thread mapping for the thread group. We further devise several enhancements that eliminate superfluous evaluations for multithreaded processes, making the selection procedure more efficient. Experimental results for the commonly used PARSEC 3.0 benchmark suite show that the modified Linux kernel with the proposed selection policy increases performance by 10.7% compared with the unmodified Linux kernel.
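The selection idea in this abstract can be illustrated with a small user-space sketch. It is not the authors' kernel code: the abstract does not define "exclusive thread distribution" precisely, so the `exclusivity` metric below (the largest share of a group's threads sitting on one node) and the function names are assumptions made for illustration only.

```python
from collections import Counter, defaultdict


def exclusivity(node_counts):
    """Assumed metric: largest share of a group's threads on a single node.

    1.0 means the whole group sits on one node; lower values mean the
    group is spread more evenly across nodes.
    """
    total = sum(node_counts.values())
    return max(node_counts.values()) / total


def select_thread_to_migrate(threads, source_node):
    """Pick one thread on source_node whose group is least exclusive.

    threads: list of (thread_id, group_id, node) tuples (hypothetical layout).
    Returns the chosen thread_id, or None if source_node has no threads.
    """
    # Count, per thread group, how many threads run on each node.
    per_group = defaultdict(Counter)
    for _tid, group, node in threads:
        per_group[group][node] += 1

    # Only threads currently on the overloaded node are candidates.
    candidates = [
        (exclusivity(per_group[group]), tid)
        for tid, group, node in threads
        if node == source_node
    ]
    if not candidates:
        return None
    # Lowest exclusivity wins; ties fall back to the smallest thread id.
    return min(candidates)[1]


# Toy placement: group A is entirely on node 0, B is split evenly, C is 1 vs 2.
threads = [
    (1, "A", 0), (2, "A", 0), (3, "A", 0),
    (4, "B", 0), (5, "B", 1),
    (6, "C", 0), (7, "C", 1), (8, "C", 1),
]
chosen = select_thread_to_migrate(threads, source_node=0)
```

With this toy placement, thread 4 is chosen because group B is already split evenly across the two nodes, so moving one of its members disturbs the group's locality the least under the assumed metric.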

Accelerating Machine Learning Algorithms with TensorFlow Using Thread Mapping Policies

Communications in Computer and Information Science - High Performance Computing ◽

10.1007/978-3-030-68035-0_5 ◽

2021 ◽

pp. 62-70

Author(s):

Matheus W. Camargo ◽

Matheus S. Serpa ◽

Danilo Carastan-Santos ◽

Alexandre Carissimi ◽

Philippe O. A. Navaux

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Thread Mapping

Efficient GPU thread mapping on embedded 2D fractals

Future Generation Computer Systems ◽

10.1016/j.future.2020.07.006 ◽

2020 ◽

Vol 113 ◽

pp. 158-169

Author(s):

Cristóbal A. Navarro ◽

Felipe A. Quezada ◽

Nancy Hitschfeld ◽

Raimundo Vega ◽

Benjamin Bustos

Keyword(s):

Thread Mapping

Online Sharing-Aware Thread Mapping in Software Transactional Memory

2020 IEEE 32nd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) ◽

10.1109/sbac-pad49847.2020.00016 ◽

2020 ◽

Author(s):

Douglas Pereira Pasqualin ◽

Matthias Diener ◽

Andre Rauber Du Bois ◽

Mauricio Lima Pilla

Keyword(s):

Transactional Memory ◽

Software Transactional Memory ◽

Thread Mapping

Energy-Efficient Thread Mapping for Heterogeneous Many-Core Systems via Dynamically Adjusting the Thread Count

Energies ◽

10.3390/en12071346 ◽

2019 ◽

Vol 12 (7) ◽

pp. 1346 ◽

Cited By ~ 1

Author(s):

Tao Ju ◽

Yan Zhang ◽

Xuejun Zhang ◽

Xiaogang Du ◽

Xiaoshe Dong

Keyword(s):

Energy Efficiency ◽

Energy Consumption ◽

Prediction Model ◽

Phase Changes ◽

Core System ◽

Computing Performance ◽

Many Core ◽

Thread Mapping ◽

Intel Mic ◽

Thread Count

Improving computing performance and reducing energy consumption are major concerns in heterogeneous many-core systems. The thread count directly influences the computing performance and energy consumption of a multithreaded application running on a heterogeneous many-core system. In this work, we studied the interrelation between the thread count and application performance to improve total energy efficiency. A prediction model of the optimum thread count, hereafter the thread count prediction model (TCPM), was designed using regression analysis based on program running behaviors and heterogeneous many-core architecture features. Subsequently, a dynamic predictive thread mapping (DPTM) framework was proposed. DPTM uses the prediction model to estimate the optimum thread count and dynamically adjusts the number of active hardware threads according to the phase changes of the running program in order to achieve optimal energy efficiency. Experimental results show that DPTM obtains a nearly 49% improvement in performance and a 59% reduction in energy consumption on average. Moreover, DPTM introduces about 2% additional overhead compared with traditional thread mapping for PARSEC (The Princeton Application Repository for Shared-Memory Computers) benchmark programs running on an Intel MIC (Many Integrated Core) heterogeneous many-core system.
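As a rough illustration of regression-based thread count prediction, the sketch below fits a one-variable least-squares line and clamps the prediction to the available hardware threads. It is not the TCPM from the paper: the single feature, the toy training points, and the names `fit_line` and `predict_thread_count` are all hypothetical, and the real model presumably uses several program and architecture features.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_thread_count(feature, model, max_threads):
    """Round the regression estimate and clamp it to [1, max_threads]."""
    slope, intercept = model
    estimate = round(slope * feature + intercept)
    return max(1, min(max_threads, estimate))


# Toy (made-up) observations: a per-phase feature value and the thread
# count that was best for that phase in an offline run.
feature_values = [0.0, 0.5, 1.0]
best_thread_counts = [8, 6, 4]
model = fit_line(feature_values, best_thread_counts)

# At a phase change, re-evaluate the feature and pick a new thread count.
threads_for_phase = predict_thread_count(0.25, model, max_threads=8)
```

A dynamic framework would call `predict_thread_count` again whenever a phase change is detected; how phases are detected and how hardware threads are activated is outside this sketch.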

The Impacts of Locality and Memory Congestion-aware Thread Mapping on Energy Consumption of Modern NUMA Systems

2019 IEEE Symposium in Low-Power and High-Speed Chips (COOL CHIPS) ◽

10.1109/coolchips.2019.8721346 ◽

2019 ◽

Author(s):

Mulya Agung ◽

Muhammad Alfian Amrizal ◽

Ryusuke Egawa ◽

Hiroyuki Takizawa

Keyword(s):

Energy Consumption ◽

Thread Mapping ◽

Congestion Aware

Assessing Parallel Thread Mapping Approaches on Shared Memory SMT Architectures

IEEE Latin America Transactions ◽

10.1109/tla.2019.8863173 ◽

2019 ◽

Vol 17 (02) ◽

pp. 270-279

Author(s):

Amanda Maria Pinho Amorim ◽

Henrique Cota de Freitas

Keyword(s):

Shared Memory ◽

Thread Mapping

RaceR: A Thread Mapping Algorithm for Race Reduction in Multi-Level Shared Caches

2019 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP) ◽

10.1109/empdp.2019.8671576 ◽

2019 ◽

Author(s):

Pezhman Shojaa Sahneh ◽

Amin Sarihi ◽

Benjamin Warburton ◽

Ahmad Patooghy

Keyword(s):

Mapping Algorithm ◽

Shared Caches ◽

Multi Level ◽

Thread Mapping

Extending NUMA-BTLP Algorithm with Thread Mapping Based on a Communication Tree

Computers ◽

10.3390/computers7040066 ◽

2018 ◽

Vol 7 (4) ◽

pp. 66

Author(s):

Iulia Știrb

Keyword(s):

Compiler Optimization ◽

Memory Access ◽

Task Parallelism ◽

Real Time Control ◽

Shared Resources ◽

Mapping Algorithm ◽

Time Control ◽

Parallel Code ◽

Thread Mapping ◽

Task Level

The paper presents a Non-Uniform Memory Access (NUMA)-aware compiler optimization for task-level parallel code. The optimization is based on the Non-Uniform Memory Access - Balanced Task and Loop Parallelism (NUMA-BTLP) algorithm (Știrb, 2018). The algorithm determines the type of each thread in the source code through a static analysis of the code. After assigning a type to each thread, NUMA-BTLP (Știrb, 2018) calls the NUMA-BTDM mapping algorithm (Știrb, 2016), which uses the PThreads routine pthread_setaffinity_np to set the CPU affinities of the threads (i.e., thread-to-core associations) based on their type. The algorithms improve thread mapping for NUMA systems by mapping threads that share data onto the same core(s), allowing fast access to L1 cache data. The paper demonstrates that PThreads-based task-level parallel code optimized at compile time by NUMA-BTLP (Știrb, 2018) and NUMA-BTDM (Știrb, 2016) runs time- and energy-efficiently on NUMA systems. The results show that energy consumption improves by up to 5% at the same execution time for one of the tested real benchmarks and by up to 15% for another benchmark running in an infinite loop. The algorithms can be used in real-time control systems such as client/server-based applications, which require efficient access to shared resources. Most often, task parallelism is used in the implementation of the server and loop parallelism is used for the client.
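The idea of placing data-sharing threads on the same core can be sketched at run time as follows. This is not the compile-time NUMA-BTLP/NUMA-BTDM implementation, and it does not call pthread_setaffinity_np: it uses Python's Linux-only `os.sched_setaffinity` with the kernel thread id from `threading.get_native_id()` (Python 3.8+), which has a comparable per-thread effect on Linux. The helper names and the "first allowed CPU" placement choice are assumptions.

```python
import os
import threading


def run_pinned(cpu, work, results, key):
    """Pin the calling thread to one CPU, then run work() (Linux only)."""
    tid = threading.get_native_id()       # kernel thread id of this thread
    os.sched_setaffinity(tid, {cpu})      # restrict only this thread
    results[key] = (sorted(os.sched_getaffinity(tid)), work())


def map_sharing_threads(num_threads=2):
    """Start threads that (hypothetically) share data on the same core."""
    allowed = sorted(os.sched_getaffinity(0))  # CPUs we may run on
    shared_cpu = allowed[0]                    # assumed placement choice
    results = {}
    workers = [
        threading.Thread(
            target=run_pinned,
            args=(shared_cpu, lambda: sum(range(1000)), results, i),
        )
        for i in range(num_threads)
    ]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return shared_cpu, results


cpu, pinned = map_sharing_threads(2)
```

Each entry in `pinned` records the affinity mask the thread observed after pinning and the result of its work, so the placement can be checked without external tools.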

Dynamic Thread Mapping for Maximizing Performance in Power-Efficient Multi-core Systems

2018 13th International Conference on Computer Engineering and Systems (ICCES) ◽

10.1109/icces.2018.8639212 ◽

2018 ◽

Author(s):

Veronia Iskandar ◽

Cherif Salama ◽

Mohamed Taher

Keyword(s):

Power Efficient ◽

Thread Mapping
