Thread Level Parallelism
Recently Published Documents


TOTAL DOCUMENTS: 92 (FIVE YEARS: 11)

H-INDEX: 14 (FIVE YEARS: 1)

2021 ◽  
Vol 23 (06) ◽  
pp. 840-849
Author(s):  
Nagendra Kumar Jamadagni ◽  
Aniruddh M ◽  
Dr. Govinda Raju M ◽  
Dr. Usha Rani K. R ◽  
...  

All modern-day computers and smartphones come with multi-core CPUs, and the multicore architecture is often heterogeneous in order to maximize computational throughput. These multicore systems exploit thread-level parallelism to deliver higher performance, but they depend on good scheduling algorithms that maximize CPU utilization and minimize wasted and idle cycles. With the rise of streaming services and the multimedia capabilities of smartphones, efficient heterogeneous cores capable of fast multimedia processing are needed, together with efficient scheduling algorithms to drive them. This paper compares several available heterogeneous multi-core scheduling algorithms and determines which is optimal for various codecs.
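As a reminder of what exploiting thread-level parallelism looks like in software, here is a minimal C++ sketch (the workload and the chunked summation are illustrative only and are not taken from the paper): a loop is divided across the hardware threads the runtime reports.

```cpp
// Minimal illustration of thread-level parallelism: a loop is split
// across the hardware threads reported by the runtime. This is a
// generic sketch, not the scheduling algorithms compared in the paper.
#include <algorithm>
#include <iostream>
#include <numeric>
#include <thread>
#include <vector>

int main() {
    const std::size_t n = 1 << 24;
    std::vector<float> data(n, 1.0f);

    unsigned workers = std::max(1u, std::thread::hardware_concurrency());
    std::vector<double> partial(workers, 0.0);
    std::vector<std::thread> pool;

    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&, w] {
            // Each thread sums a contiguous chunk of the input.
            std::size_t begin = n * w / workers;
            std::size_t end   = n * (w + 1) / workers;
            partial[w] = std::accumulate(data.begin() + begin,
                                         data.begin() + end, 0.0);
        });
    }
    for (auto& t : pool) t.join();

    double total = std::accumulate(partial.begin(), partial.end(), 0.0);
    std::cout << "sum = " << total << " using " << workers << " threads\n";
}
```

On a heterogeneous (big.LITTLE-style) CPU, which core each of these threads runs on is decided by the operating system's scheduler; that placement decision is precisely what the scheduling algorithms compared in the paper try to optimize.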


2021 ◽  
Author(s):  
Anita Tino

As the multi-core computing era continues to progress, the need to increase single-thread performance and throughput, and to seamlessly adapt to thread-level parallelism (TLP), remains an important issue. Though the number of cores per processor continues to increase, expected performance gains have lagged. Accordingly, computing systems often include Simultaneously Multi-Threaded (SMT) processors as a compromise between sequential and parallel performance on a single core. These processors effectively improve the throughput and utilization of a core, though often at the expense of single-thread performance as threads per core scale. Applications that require higher single-thread performance must therefore often resort to single-threaded-core multi-processor systems, which incur additional area overhead and power dissipation. In an attempt to improve single- and multi-thread core efficiency, this work introduces the concept of a Configurable Simultaneously Single-Threaded (Multi-)Engine Processor (ConSSTEP). ConSSTEP is a nuanced approach to multi-threaded processors, achieving performance gains and energy efficiency by invoking low-overhead reconfigurable properties with full software compatibility. Experimental results demonstrate that ConSSTEP increases single-thread Instructions Per Cycle (IPC) by up to 1.39x and 2.4x for 2-thread and 4-thread workloads, respectively, improving throughput and providing up to 2x energy efficiency compared to a conventional SMT processor.
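The throughput-versus-single-thread tradeoff described above can be observed with a simple experiment. The sketch below (generic busy-work, arbitrary thread counts, not the ConSSTEP design or its benchmarks) runs the same per-thread workload at increasing thread counts and reports aggregate and per-thread rates; once threads begin sharing physical cores, e.g. as SMT siblings, the per-thread rate drops even though aggregate throughput may still rise.

```cpp
// Sketch of the SMT tradeoff: fixed per-thread work is run at several
// thread counts, and aggregate vs. per-thread rates are reported.
#include <atomic>
#include <chrono>
#include <cstdint>
#include <iostream>
#include <thread>
#include <vector>

static std::uint64_t busy_work(std::uint64_t iters) {
    std::uint64_t x = 88172645463325252ull;      // xorshift64 state
    for (std::uint64_t i = 0; i < iters; ++i) {  // dependent chain of cheap ops
        x ^= x << 13; x ^= x >> 7; x ^= x << 17;
    }
    return x;
}

int main() {
    const std::uint64_t iters = 200000000ull;    // fixed per-thread work
    const unsigned counts[] = {1, 2, 4, 8};      // arbitrary thread counts
    for (unsigned threads : counts) {
        std::atomic<std::uint64_t> sink{0};      // keeps the work from being optimized away
        std::vector<std::thread> pool;
        auto t0 = std::chrono::steady_clock::now();
        for (unsigned i = 0; i < threads; ++i)
            pool.emplace_back([&] {
                sink.fetch_add(busy_work(iters), std::memory_order_relaxed);
            });
        for (auto& t : pool) t.join();
        double secs = std::chrono::duration<double>(
                          std::chrono::steady_clock::now() - t0).count();
        std::cout << threads << " thread(s): "
                  << threads * iters / secs / 1e9 << " G iter/s aggregate, "
                  << iters / secs / 1e9 << " G iter/s per thread"
                  << " (sink " << sink.load() << ")\n";
    }
}
```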



Author(s):  
Jeckson Dellagostin Souza ◽  
Madhavan Manivannan ◽  
Miquel Pericas ◽  
Antonio Carlos Schneider Beck

2020 ◽  
Author(s):  
Gustavo Berned ◽  
Arthur Lorenzon

The exploitation of thread-level parallelism (TLP) has been widely used to improve the performance of applications from different domains. However, many applications do not scale as the number of threads increases; that is, running an application with the maximum number of threads does not necessarily yield the best result in execution time, energy, or EDP (Energy-Delay Product), due to hardware- and software-related issues [Raasch and Reinhardt 2003], [Lorenzon and Filho 2019]. Therefore, methodologies are needed that can find an ideal number of threads for such applications, whether online (searching while the application runs) or offline (searching before the application runs). Online methodologies, however, add overhead to the application's execution, which does not happen with offline approaches [Lorenzon et al. 2018]. Based on this, this work presents a generic methodology to significantly reduce the time spent searching for the ideal number of threads for parallel applications under the offline methodology, by inferring the execution behavior of the parallel applications using only small input data sets.
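A minimal sketch of the offline search referred to above, assuming a placeholder parallel kernel (run_with_threads) and an arbitrary set of candidate thread counts; the authors' methodology for shrinking this search is not reproduced here, and energy/EDP would additionally require hardware energy counters (e.g. RAPL), which are omitted.

```cpp
// Offline search for a good thread count: run the parallel kernel once
// per candidate configuration on a reduced input and keep the fastest.
#include <chrono>
#include <iostream>
#include <thread>
#include <vector>

// Placeholder parallel workload: sum a vector using `threads` threads.
double run_with_threads(unsigned threads, const std::vector<double>& data) {
    std::vector<std::thread> pool;
    std::vector<double> partial(threads, 0.0);
    for (unsigned w = 0; w < threads; ++w)
        pool.emplace_back([&, w] {
            std::size_t lo = data.size() * w / threads;
            std::size_t hi = data.size() * (w + 1) / threads;
            for (std::size_t i = lo; i < hi; ++i) partial[w] += data[i];
        });
    for (auto& t : pool) t.join();
    double s = 0.0;
    for (double p : partial) s += p;
    return s;
}

int main() {
    // Small input stands in for the reduced data sets used to infer
    // the behaviour of the full-size run.
    std::vector<double> small_input(1 << 20, 1.0);
    const unsigned candidates[] = {1, 2, 4, 8, 16};

    unsigned best_threads = 1;
    double best_time = 1e300;
    for (unsigned t : candidates) {
        auto t0 = std::chrono::steady_clock::now();
        volatile double sink = run_with_threads(t, small_input);
        (void)sink;
        double secs = std::chrono::duration<double>(
                          std::chrono::steady_clock::now() - t0).count();
        std::cout << t << " threads: " << secs << " s\n";
        if (secs < best_time) { best_time = secs; best_threads = t; }
    }
    std::cout << "selected thread count: " << best_threads << "\n";
}
```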


2020 ◽  
Vol 21 (1) ◽  
pp. 47-56
Author(s):  
K Indragandhi ◽  
Jawahar P K

Recent embedded devices are equipped with multicore processors, which significantly improve system performance. To utilize all the cores of a multicore processor efficiently, application programs need to be parallelized. This paper proposes an efficient thread-level parallelism (ETLP) scheme and uses a computationally intensive edge detection algorithm for its evaluation. Edge detection is an important step in various real-time applications, such as vehicle detection in traffic control and medical image processing. The main objective of the ETLP scheme is to reduce execution time and increase CPU core utilization. The performance of the ETLP scheme is evaluated against a basic edge detection scheme (BEDS) for different image sizes. The experimental results reveal that the proposed ETLP scheme achieves an efficiency of 49% and 72% for image sizes of 300 x 256 and 1024 x 1024, respectively. Furthermore, the ETLP scheme reduces execution time by 66% for the 1024 x 1024 image when compared with BEDS.
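To illustrate the kind of thread-level decomposition an edge-detection kernel admits, the sketch below partitions the image rows into bands and filters each band with its own thread using a plain Sobel operator. It is a generic decomposition, not the ETLP scheme evaluated in the paper.

```cpp
// Row-partitioned Sobel edge detection: rows are split into bands and
// each band is filtered by its own thread. Bands are disjoint in the
// output and the input is read-only, so no synchronisation is needed.
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <thread>
#include <vector>

void sobel_rows(const std::vector<std::uint8_t>& in, std::vector<std::uint8_t>& out,
                int width, int height, int row_begin, int row_end) {
    auto px = [&](int x, int y) { return static_cast<int>(in[y * width + x]); };
    for (int y = std::max(row_begin, 1); y < std::min(row_end, height - 1); ++y)
        for (int x = 1; x < width - 1; ++x) {
            int gx = -px(x-1,y-1) + px(x+1,y-1) - 2*px(x-1,y) + 2*px(x+1,y)
                     - px(x-1,y+1) + px(x+1,y+1);
            int gy = -px(x-1,y-1) - 2*px(x,y-1) - px(x+1,y-1)
                     + px(x-1,y+1) + 2*px(x,y+1) + px(x+1,y+1);
            int mag = static_cast<int>(std::sqrt(double(gx*gx + gy*gy)));
            out[y * width + x] = static_cast<std::uint8_t>(std::min(mag, 255));
        }
}

int main() {
    const int width = 1024, height = 1024;   // matches the larger test size
    std::vector<std::uint8_t> in(width * height, 128), out(width * height, 0);

    unsigned workers = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        int r0 = height * w / workers;
        int r1 = height * (w + 1) / workers;
        pool.emplace_back([&, r0, r1] { sobel_rows(in, out, width, height, r0, r1); });
    }
    for (auto& t : pool) t.join();
}
```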


In computational biology, motifs are short, recurring patterns in biological sequences that play a central role in the analysis and interpretation of various biological questions such as human disease, gene function, and drug design. The major objectives of the motif search problem are the management, analysis, and interpretation of huge biological sequences using computational techniques from computer science and mathematics. However, motif detection leads to computational problems whose solutions require a substantial amount of time on a uniprocessor machine, and it thus remains a challenging problem. In this chapter, two parallel algorithms are proposed, along with their implementation details, which substantially enhance the performance of the PMSP motif search algorithm. The first approach enhances the existing algorithm by eliminating redundant computation and minimizes execution time by using both process-level and thread-level parallelism in the implementation. The second approach improves on the first: not only is computation time reduced further, but better space utilization is also achieved.
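The combination of process-level and thread-level parallelism mentioned above can be sketched as follows; a naive exact-match count stands in for the PMSP motif search, and the two-level partitioning (one fork per process, std::thread within each process) is illustrative rather than the chapter's implementation.

```cpp
// Two-level parallelism: the sequence set is split across forked worker
// processes, and each process further splits its share across threads.
// A naive exact-match count stands in for the PMSP motif search itself.
#include <atomic>
#include <iostream>
#include <string>
#include <sys/wait.h>
#include <thread>
#include <unistd.h>
#include <vector>

static long count_motif(const std::vector<std::string>& seqs,
                        const std::string& motif,
                        std::size_t begin, std::size_t end) {
    long hits = 0;
    for (std::size_t i = begin; i < end; ++i)
        for (std::size_t p = seqs[i].find(motif); p != std::string::npos;
             p = seqs[i].find(motif, p + 1))
            ++hits;
    return hits;
}

int main() {
    // Toy data set; real inputs would be read from sequence files.
    std::vector<std::string> seqs(64, "ACGTACGTTACGATTACAACGT");
    const std::string motif = "ACGT";
    const int processes = 4, threads_per_proc = 2;

    for (int p = 0; p < processes; ++p) {
        if (fork() == 0) {                       // child: process-level parallelism
            std::size_t lo = seqs.size() * p / processes;
            std::size_t hi = seqs.size() * (p + 1) / processes;

            std::atomic<long> local{0};          // thread-level parallelism inside
            std::vector<std::thread> pool;
            for (int t = 0; t < threads_per_proc; ++t)
                pool.emplace_back([&, t] {
                    std::size_t tlo = lo + (hi - lo) * t / threads_per_proc;
                    std::size_t thi = lo + (hi - lo) * (t + 1) / threads_per_proc;
                    local += count_motif(seqs, motif, tlo, thi);
                });
            for (auto& th : pool) th.join();
            std::cout << "process " << p << " found " << local.load() << " matches\n";
            _exit(0);                            // children report and exit
        }
    }
    while (wait(nullptr) > 0) {}                 // parent reaps all children
    return 0;
}
```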


2019 ◽  
Vol 27 (2) ◽  
pp. 55-66

Parallelization of Non-Serial Polyadic Dynamic Programming (NPDP) on high-throughput manycore architectures, such as NVIDIA GPUs, suffers from load imbalance, i.e. a non-optimal mapping between the sub-problems of NPDP and the processing elements of the GPU. NPDP exhibits non-uniformity in the number of subproblems as well as in computational complexity across phases. In NPDP parallelization, phases are computed sequentially whereas the subproblems of each phase are computed concurrently. It is therefore essential to map the subproblems of each phase effectively onto the processing elements when implementing thread-level parallelism. We propose an adaptive Generalized Mapping Method (GMM) for NPDP parallelization that utilizes the GPU for efficient mapping of subproblems onto processing threads in each phase. The input size and the targeted GPU determine the available computing power and the best mapping for each phase in NPDP parallelization. The performance of GMM is compared with different conventional parallelization approaches. For sufficiently large inputs, our technique outperforms the state-of-the-art conventional parallelization approach and achieves a significant speedup of a factor of 30. We also summarize general heuristics for achieving better gains in NPDP parallelization.
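The phase structure that causes this imbalance is easy to see in a CPU sketch of a classic NPDP kernel, matrix-chain ordering, shown below: phases run sequentially, the subproblems inside a phase are independent and are split across std::thread workers, and the subproblem count shrinks from n-1 down to 1 as phases progress. The sketch is illustrative; it is not the GPU-side GMM kernel.

```cpp
// Phase-wise parallelization of a classic NPDP kernel (matrix-chain
// ordering) with std::thread. Phases are sequential; the subproblems
// of each phase are independent and are split across threads. The
// per-phase subproblem count shrinks from n-1 to 1, which is the
// non-uniformity a GPU mapping scheme has to cope with.
#include <algorithm>
#include <iostream>
#include <limits>
#include <thread>
#include <vector>

int main() {
    const int n = 512;                                  // number of matrices
    std::vector<long long> p(n + 1, 16);                // matrix dimensions
    std::vector<std::vector<long long>> dp(
        n + 1, std::vector<long long>(n + 1, 0));

    unsigned workers = std::max(1u, std::thread::hardware_concurrency());

    for (int len = 2; len <= n; ++len) {                // phases: sequential
        int subproblems = n - len + 1;                  // shrinks each phase
        unsigned used = std::min<unsigned>(workers, subproblems);
        std::vector<std::thread> pool;
        for (unsigned w = 0; w < used; ++w)
            pool.emplace_back([&, w, len, subproblems, used] {
                // Each thread handles a contiguous slice of this phase.
                int lo = 1 + static_cast<int>(subproblems * w / used);
                int hi = 1 + static_cast<int>(subproblems * (w + 1) / used);
                for (int i = lo; i < hi; ++i) {
                    int j = i + len - 1;
                    long long best = std::numeric_limits<long long>::max();
                    for (int k = i; k < j; ++k)
                        best = std::min(best, dp[i][k] + dp[k + 1][j]
                                              + p[i - 1] * p[k] * p[j]);
                    dp[i][j] = best;                    // only reads shorter chains
                }
            });
        for (auto& t : pool) t.join();
    }
    std::cout << "minimal multiplication cost: " << dp[1][n] << "\n";
}
```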

