level parallelism
Recently Published Documents


TOTAL DOCUMENTS: 581 (FIVE YEARS: 81)

H-INDEX: 29 (FIVE YEARS: 4)

2022 · Vol 27 (3) · pp. 1-23
Author(s): Mari-Liis Oldja, Jangryul Kim, Dowhan Jeong, Soonhoi Ha

Although dataflow models are known to thrive at exploiting the task-level parallelism of an application, it is difficult to exploit data parallelism, which is well represented by loop structures, since those structures are not explicitly specified in existing dataflow models. The SDF/L model overcomes this shortcoming by specifying loop structures explicitly in a hierarchical fashion. We introduce a technique for scheduling an application represented by the SDF/L model onto heterogeneous processors. In the proposed method, we explore the mapping of tasks using an evolutionary meta-heuristic and schedule hierarchically in a bottom-up fashion, creating parallel loop schedules at the lower levels first and then reusing them when constructing the schedule at a higher level. The efficiency of the proposed scheduling methodology is verified with benchmark examples and randomly generated SDF/L graphs.
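The bottom-up composition described in this abstract can be pictured with a small sketch. The Python fragment below is a simplified, assumed illustration rather than the authors' algorithm: each hierarchical loop node schedules its body first with a plain greedy list scheduler, and the resulting makespan is reused as a single composite task cost at the enclosing level. The evolutionary mapping exploration and the heterogeneous-processor model of the paper are omitted, and all task costs, iteration counts, and the two-processor platform are hypothetical.

```python
# A simplified, assumed sketch of bottom-up hierarchical scheduling on an
# SDF/L-like hierarchy (not the authors' implementation): inner loop bodies are
# scheduled first, and each resulting makespan is reused as one composite task
# cost at the enclosing level. Loop iterations are treated as sequential
# repetitions here purely for brevity.

def list_schedule(task_costs, num_procs):
    """Greedy list scheduling: longest task first onto the earliest-free
    processor; returns the makespan."""
    finish = [0.0] * num_procs
    for cost in sorted(task_costs, reverse=True):
        p = finish.index(min(finish))          # earliest-available processor
        finish[p] += cost
    return max(finish)

def schedule_node(node, num_procs):
    """Bottom-up: schedule children first, then reuse each child's schedule
    as a single composite task at the current level."""
    if "children" not in node:                 # leaf actor: plain execution cost
        return node["cost"]
    child_costs = [schedule_node(c, num_procs) for c in node["children"]]
    body_makespan = list_schedule(child_costs, num_procs)
    return body_makespan * node.get("iterations", 1)

# Hypothetical hierarchy: one actor plus a 10-iteration loop of three actors.
graph = {"children": [
    {"cost": 4.0},
    {"iterations": 10, "children": [{"cost": 1.0}, {"cost": 2.0}, {"cost": 1.5}]},
]}
print(schedule_node(graph, num_procs=2))       # composite makespan of the top level
```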


Author(s): Tiago Knorst, Julio Vicenzi, Michael G. Jordan, Jonathan H. de Almeida, Guilherme Korol, ...

2021 · Vol 18 (4) · pp. 1-26
Author(s): Aninda Manocha, Tyler Sorensen, Esin Tureci, Opeoluwa Matthews, Juan L. Aragón, ...

Graph structures are a natural representation of important and pervasive data. While graph applications have significant parallelism, their characteristic pointer-indirect loads to neighbor data hinder scalability to large datasets on multicore systems. A scalable and efficient system must tolerate latency while leveraging data parallelism across millions of vertices. Modern Out-of-Order (OoO) cores inherently tolerate a fraction of long latencies, but become clogged when running severely memory-bound applications. Combined with large power/area footprints, this limits their parallel scaling potential and, consequently, the gains that existing software frameworks can achieve. Conversely, accelerator and memory hierarchy designs provide performant hardware specializations, but cannot support diverse application demands. To address these shortcomings, we present GraphAttack, a hardware-software data supply approach that accelerates graph applications on in-order multicore architectures. GraphAttack proposes compiler passes to (1) identify idiomatic long-latency loads and (2) slice programs along these loads into data Producer/Consumer threads to map onto pairs of parallel cores. Each pair shares a communication queue; the Producer asynchronously issues long-latency loads, whose results are buffered in the queue and used by the Consumer. This scheme drastically increases memory-level parallelism (MLP) to mitigate latency bottlenecks. In equal-area comparisons, GraphAttack outperforms OoO cores, do-all parallelism, prefetching, and prior decoupling approaches, achieving a 2.87× speedup and 8.61× gain in energy efficiency across a range of graph applications. These improvements scale; GraphAttack achieves a 3× speedup over 64 parallel cores. Lastly, it has pragmatic design principles; it enhances in-order architectures that are gaining increasing open-source support.
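The Producer/Consumer slicing that GraphAttack performs through compiler passes and hardware queues can be mimicked, loosely, in software. The Python sketch below is an assumed illustration only: one thread plays the Producer, issuing the indirect neighbor-data loads ahead of use and buffering the results in a bounded queue (standing in for the hardware communication queue), while the paired Consumer performs the per-vertex work from buffered data, so load issue is decoupled from load use. The CSR-style arrays, vertex values, and the sum reduction are hypothetical.

```python
# A minimal software sketch (assumed, not GraphAttack's hardware/compiler flow)
# of Producer/Consumer decoupling: the Producer issues the long-latency
# indirect loads and buffers results; the Consumer uses them without stalling.
import threading, queue

indices   = [0, 2, 5, 6]                # CSR-style row offsets (3 vertices)
neighbors = [1, 2, 0, 2, 1, 0]          # neighbor ids
values    = [10.0, 20.0, 30.0]          # per-vertex data reached indirectly

buf = queue.Queue(maxsize=8)            # stands in for the communication queue
SENTINEL = object()

def producer():
    # Issue the idiomatic indirect loads values[neighbors[e]] ahead of use.
    for v in range(len(indices) - 1):
        loaded = [values[neighbors[e]] for e in range(indices[v], indices[v + 1])]
        buf.put((v, loaded))
    buf.put(SENTINEL)

def consumer(out):
    # Consume buffered neighbor data and do the per-vertex work.
    while (item := buf.get()) is not SENTINEL:
        v, loaded = item
        out[v] = sum(loaded)

result = {}
t_p = threading.Thread(target=producer)
t_c = threading.Thread(target=consumer, args=(result,))
t_p.start(); t_c.start(); t_p.join(); t_c.join()
print(result)                           # {0: 50.0, 1: 60.0, 2: 10.0}
```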


2021 · Vol 64 (12) · pp. 36-38
Author(s): Mark D. Hill, Vijay Janapa Reddi

Charging computer scientists to develop the science needed to best achieve the performance and cost goals of accelerator-level parallelism hardware and software.


2021
Author(s): Vincent Dumont, Casey Garner, Anuradha Trivedi, Chelsea Jones, Vidya Ganapati, ...

2021
Author(s): Stijn Eyerman, Wim Heirman, Sam Van Den Steen, Ibrahim Hur

Author(s): Krishan Kumar, Renu

Multithreading is the ability of a central processing unit (CPU), or of a single core within a multi-core processor, to execute multiple processes or threads concurrently, with appropriate operating-system support. This approach differs from multiprocessing: with multithreading, processes and threads share the resources of one or several cores, including the computing units, CPU caches, and translation lookaside buffer (TLB). Whereas multiprocessing systems include multiple complete processing units, multithreading aims to increase the utilization of a single core by exploiting thread-level as well as instruction-level parallelism. The objective of this research is to increase the efficiency of scheduling dependent tasks using enhanced multithreading. Gang scheduling of parallel implicit-deadline periodic task systems on identical multiprocessor platforms is considered; in this scheduling problem, parallel tasks use several processors simultaneously. The first algorithm is based on linear programming and is the first to be proved optimal for the considered gang scheduling problem; furthermore, it runs in polynomial time for a fixed number m of processors, and an efficient implementation is fully detailed. The second algorithm is an approximation algorithm based on a fixed-priority rule that is competitive under resource-augmentation analysis for computing an optimal schedule pattern; precisely, its speedup factor is bounded by (2 − 1/m). Both algorithms are also evaluated through intensive numerical experiments. In our research we enhance the capability of gang scheduling by integrating a multi-core processor and cache model, and simulate its performance in MATLAB.
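As a concrete picture of the gang-scheduling constraint discussed above (not the paper's linear-programming or fixed-priority algorithms), the following Python sketch runs a slot-by-slot simulation in which a parallel task may execute only when all of the processors it needs are simultaneously free. The task parameters, the period-based priority order, and the four-processor platform are illustrative assumptions.

```python
# A minimal, assumed sketch of the gang-scheduling constraint for implicit-deadline
# periodic tasks: a task's threads run together, so it executes in a slot only if
# all of its required processors are free at once.

def gang_schedule(tasks, num_procs, horizon):
    """tasks: dicts with 'name', 'procs' (cores needed per slot),
    'wcet' (slots of work per period), 'period'. Returns per-slot assignments."""
    remaining = {t["name"]: 0 for t in tasks}
    timeline = []
    for slot in range(horizon):
        for t in tasks:                        # release a new job at each period start
            if slot % t["period"] == 0:
                remaining[t["name"]] = t["wcet"]
        free = num_procs
        running = []
        # Shorter period = higher priority (rate-monotonic-style order).
        for t in sorted(tasks, key=lambda t: t["period"]):
            if remaining[t["name"]] > 0 and t["procs"] <= free:
                running.append(t["name"])      # gang: claim all needed cores at once
                free -= t["procs"]
                remaining[t["name"]] -= 1
        timeline.append(running)
    return timeline

tasks = [{"name": "A", "procs": 2, "wcet": 1, "period": 2},
         {"name": "B", "procs": 3, "wcet": 2, "period": 4}]
print(gang_schedule(tasks, num_procs=4, horizon=8))
```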


2021 · Vol 150 (4) · pp. A169-A169
Author(s): Andrew S. Wixom, Kyle Myers, Micah Shepherd
