Parallel Applications: Latest Research Papers

ReuseTracker: Fast Yet Accurate Multicore Reuse Distance Analyzer

ACM Transactions on Architecture and Code Optimization, Vol 19 (1), pp. 1-25, 2022. DOI: 10.1145/3484199

Author(s): Muhammad Aditya Sasongko, Milind Chabbi, Mandana Bagheri Marzijarani, Didem Unat

Keyword(s): Performance Monitoring, State Of The Art, Data Locality, Parallel Applications, Use Case, Memory Location, Reuse Distance, Shared Caches, Code Refactoring, Cache Line

One widely used metric that measures data locality is reuse distance: the number of unique memory locations that are accessed between two consecutive accesses to a particular memory location. State-of-the-art techniques that measure reuse distance in parallel applications rely on simulators or binary instrumentation tools that incur large performance and memory overheads. Moreover, existing sampling-based tools are limited to measuring the reuse distances of a single thread and discard interactions among threads in multi-threaded programs. In this work, we propose ReuseTracker, a fast and accurate reuse distance analyzer that leverages existing hardware features in commodity CPUs. ReuseTracker is designed for multi-threaded programs and takes cache-coherence effects into account. By utilizing hardware features such as performance monitoring units and debug registers, ReuseTracker can accurately profile reuse distance in parallel applications with much lower overheads than existing tools: it introduces only 2.9× runtime and 2.8× memory overheads. Our tool achieves 92% accuracy when verified against a newly developed configurable benchmark that can generate a variety of reuse distance patterns. We demonstrate the tool’s functionality with two use-case scenarios using the PARSEC, Rodinia, and Synchrobench benchmark suites. In these scenarios, ReuseTracker guides code refactoring by detecting spatial reuses in shared caches that are also false sharing, and it successfully predicts whether some benchmarks in these suites can benefit from adjacent cache line prefetch optimization.
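
The definition of reuse distance can be made concrete with a naive, exact computation over a full address trace. This is an illustrative sketch only: ReuseTracker itself samples accesses with performance monitoring units and debug registers instead of scanning a complete trace, and the function names and the 64-byte cache-line size below are assumptions.

```python
from typing import Dict, Hashable, Iterable, List, Optional


def reuse_distances(trace: Iterable[Hashable]) -> List[Optional[int]]:
    """Return, for each access in `trace`, the number of distinct locations
    accessed since the previous access to the same location, or None for a
    first access (an infinite reuse distance)."""
    last_seen: Dict[Hashable, int] = {}  # location -> index of its latest access
    history: List[Hashable] = []         # every access seen so far, in order
    result: List[Optional[int]] = []
    for i, loc in enumerate(trace):
        if loc in last_seen:
            # Distinct locations strictly between the two consecutive accesses.
            result.append(len(set(history[last_seen[loc] + 1 : i])))
        else:
            result.append(None)
        last_seen[loc] = i
        history.append(loc)
    return result


def to_cache_lines(addresses: Iterable[int], line_size: int = 64) -> List[int]:
    """Map byte addresses to cache-line indices (64-byte lines assumed)."""
    return [addr // line_size for addr in addresses]


if __name__ == "__main__":
    print(reuse_distances(["a", "b", "c", "a", "b", "b"]))  # [None, None, None, 2, 2, 0]
    print(reuse_distances(to_cache_lines([0, 8, 64, 0])))   # [None, 0, None, 1]
```

Mapping addresses to cache lines first shows how two different addresses in the same line count as a (spatial) reuse. The scan over the history is quadratic in the trace length, which is the kind of overhead that sampling-based tools aim to avoid.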


Temperature Dependence of The Thermo-Optic Coefficient In 4H-SiC and GaN Slabs At The Wavelength of 1550 nm

2022. DOI: 10.21203/rs.3.rs-1213409/v1

Author(s): Sandro Rao, Elisa Demetra Mallemace, Giuseppe Cocorullo, Giuliana Faggio, Giacomo Messina, ...

Keyword(s): Building Materials, Single Crystal Silicon, Optical Power, Parallel Applications, Optical Parameters, Free Space Optical, Crystal Silicon, High Temperature Electronics, Fabry Perot, Near Future

The refractive index and its variation with temperature, i.e. the thermo-optic coefficient, are basic optical parameters for all semiconductors used in the fabrication of linear and non-linear opto-electronic devices and systems. Recently, 4H single-crystal Silicon Carbide (4H-SiC) and Gallium Nitride (GaN) have emerged as excellent building materials for high-power and high-temperature electronics, so wide parallel applications in photonics can be forecast for the near future, in particular in the infrared telecommunication band of λ = 1500-1600 nm. In this paper, the thermo-optic coefficient (dn/dT) is experimentally measured in 4H-SiC and GaN substrates from room temperature to 480 K at the wavelength of 1550 nm. Specifically, the substrates, which form natural Fabry-Perot etalons, are exploited within a simple hybrid fiber–free-space optical interferometric system to take accurate measurements of the transmitted optical power in that temperature range. For both semiconductors, dn/dT is found to be itself remarkably temperature dependent: quadratically for GaN and almost linearly for 4H-SiC.
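
The measurement principle can be summarized with the textbook relations for a lossless etalon at normal incidence. This is a sketch under those assumptions, not necessarily the authors' exact data analysis. Here L is the substrate thickness, α its linear thermal-expansion coefficient, F the coefficient of finesse, T_tr the transmittance (to avoid a clash with temperature T), and ΔT_f the temperature change that moves the transmitted power through one full fringe.

```latex
% Round-trip phase and Airy transmittance of the etalon
\phi(T) = \frac{4\pi\, n(T)\, L(T)}{\lambda}, \qquad
T_{\mathrm{tr}}(\phi) = \frac{1}{1 + F \sin^{2}(\phi/2)}

% Temperature derivative of the phase
\frac{d\phi}{dT} = \frac{4\pi L}{\lambda}\left(\frac{dn}{dT} + n\,\alpha\right),
\qquad \alpha = \frac{1}{L}\frac{dL}{dT}

% One fringe (\Delta\phi = 2\pi) spans \Delta T_f, so locally
\frac{dn}{dT} \approx \frac{\lambda}{2 L\, \Delta T_f} - n\,\alpha
```

Because dn/dT itself varies with temperature, such a relation would be applied fringe by fringe, giving a local estimate at each temperature.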


Mitigating execution unit contention in parallel applications using instruction‐aware mapping

Concurrency and Computation Practice and Experience, 2021. DOI: 10.1002/cpe.6819

Author(s): Matheus S. Serpa, Eduardo H. M. Cruz, Matthias Diener, Arthur F. Lorenzon, Antonio C. S. Beck, ...

Keyword(s): Parallel Applications, Execution Unit


User-defined Tools for Characterizing Task-Parallel Applications and Predicting Load Imbalance

2021. DOI: 10.1109/acomp53746.2021.00020

Author(s): Minh Thanh Chung, Dieter Kranzlmuller

Keyword(s): Parallel Applications, Load Imbalance, Task Parallel


HEA-PAS: A hybrid energy allocation strategy for parallel applications scheduling on heterogeneous computing systems

Journal of Systems Architecture, pp. 102329, 2021. DOI: 10.1016/j.sysarc.2021.102329

Author(s): Jiwu Peng, Kenli Li, Jianguo Chen, Keqin Li

Keyword(s): Heterogeneous Computing, Energy Allocation, Parallel Applications, Allocation Strategy, Computing Systems, Hybrid Energy, Heterogeneous Computing Systems


Performance Data Visualization of Linux Events on Multicores

2021. DOI: 10.5753/wscad.2021.18516

Author(s): Claudio Scheer, Renato B. Hoffmann, Dalvan Griebler, Isabel H. Manssour, Luiz G. Fernandes

Keyword(s): Data Visualization, Large Volume, Real World, Interactive Visualization, Performance Data, Optimization Process, Parallel Applications, Storage Space, Tool Chain, Real World Application

Profiling tools are essential to understand the behavior of parallel applications and to assist in the optimization process. However, tools such as Perf generate a large amount of data. As a result, they require significant storage space, and the large volume of data is difficult to reason about. Therefore, we propose VisPerf: a tool-chain and an interactive visualization dashboard for Perf data. The VisPerf tool-chain profiles the application and pre-processes the data, reducing the required storage space by about 50 times. Moreover, we used the visualization dashboard to quickly understand the performance of different events and to visualize specific threads and functions of a real-world application.
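
Perf records each sample individually, which is where much of the storage cost comes from. One generic way to shrink such data while keeping per-thread and per-function views is to aggregate samples into counts per time bucket. The sketch below illustrates that idea only; the record layout, the 1 ms bucket size, and the function name are assumptions, not VisPerf's actual pre-processing or file format.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical raw sample layout: (timestamp_ns, thread_id, function, event)
Sample = Tuple[int, int, str, str]


def aggregate(samples: Iterable[Sample], bucket_ns: int = 1_000_000) -> Counter:
    """Collapse individual samples into one count per
    (time bucket, thread, function, event), so storage grows with the number
    of distinct keys rather than with the number of samples."""
    counts: Counter = Counter()
    for ts, tid, func, event in samples:
        counts[(ts // bucket_ns, tid, func, event)] += 1
    return counts


if __name__ == "__main__":
    raw = [
        (0, 1, "main", "cycles"),
        (10, 1, "main", "cycles"),
        (2_000_000, 2, "work", "cache-misses"),
    ]
    summary = aggregate(raw)
    print(len(raw), "samples ->", len(summary), "rows")  # 3 samples -> 2 rows
```

The bucket size trades temporal resolution for space: larger buckets merge more samples into each row.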


Bounding the execution time of parallel applications on unrelated multiprocessors

Real-Time Systems, 2021. DOI: 10.1007/s11241-021-09375-2

Author(s): Petros Voudouris, Per Stenström, Risat Pathan

Keyword(s): High Performance, Parallel Applications, Worst Case, Energy Expenditures, Application Model, Main Challenge, Scheduling Method, Benchmark Suite, Hard Real Time, Time Systems

Heterogeneous multiprocessors can offer high performance at low energy expenditure. However, to use them in hard real-time systems, timing guarantees must be provided, and the main challenge is to determine the worst-case schedule length (also known as the makespan) of an application. Previous works that estimate the makespan focus mainly on the independent-task application model or on the related multiprocessor model, which limits the applicability of the resulting makespan estimates. By contrast, the directed acyclic graph (DAG) application model and the unrelated multiprocessor model are general and can cover most of today’s platforms and applications. In this work, we propose a simple work-conserving method for scheduling the tasks of a DAG, along with two new approaches to finding the makespan. A set of representative OpenMP task-based parallel applications from the BOTS benchmark suite, together with synthetic DAGs, is used to evaluate the proposed method. The empirical results show that the proposed approach calculates a makespan close to that of the exhaustive method, with low pessimism compared to a lower bound on the actual makespan.
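
A work-conserving scheduler never leaves a processor idle while some task is ready. The sketch below simulates one such list scheduler on an unrelated multiprocessor, where each task has its own execution time on each processor, and computes a simple lower bound on the makespan: the larger of the critical path and the total work divided by the number of processors, both using each task's fastest execution time. The task-ordering rule (alphabetical), the processor-selection rule (fastest idle processor), and the function names are assumptions; this is not either of the two makespan approaches proposed in the paper.

```python
import heapq
from typing import Dict, List, Tuple


def simulate_makespan(exec_time: Dict[str, List[float]],
                      preds: Dict[str, List[str]]) -> float:
    """Simulate a work-conserving list scheduler on unrelated processors.
    exec_time[t][p] is the execution time of task t on processor p;
    preds[t] lists the tasks that must finish before t may start (a DAG)."""
    if not exec_time:
        return 0.0
    num_procs = len(next(iter(exec_time.values())))
    waiting = {t: len(preds.get(t, [])) for t in exec_time}  # unfinished predecessors
    succs: Dict[str, List[str]] = {t: [] for t in exec_time}
    for t, ps in preds.items():
        for p in ps:
            succs[p].append(t)
    ready = sorted(t for t, n in waiting.items() if n == 0)
    idle = set(range(num_procs))
    running: List[Tuple[float, str, int]] = []  # min-heap of (finish, task, proc)
    now, finished = 0.0, 0
    while finished < len(exec_time):
        # Work-conserving: start ready tasks as long as some processor is idle.
        while ready and idle:
            task = ready.pop(0)
            proc = min(sorted(idle), key=lambda p: exec_time[task][p])
            idle.remove(proc)
            heapq.heappush(running, (now + exec_time[task][proc], task, proc))
        # Advance time to the next task completion and release its successors.
        now, task, proc = heapq.heappop(running)
        idle.add(proc)
        finished += 1
        for s in succs[task]:
            waiting[s] -= 1
            if waiting[s] == 0:
                ready.append(s)
        ready.sort()
    return now


def makespan_lower_bound(exec_time: Dict[str, List[float]],
                         preds: Dict[str, List[str]]) -> float:
    """max(critical path, total work / processors), using fastest times."""
    if not exec_time:
        return 0.0
    fastest = {t: min(times) for t, times in exec_time.items()}
    num_procs = len(next(iter(exec_time.values())))
    memo: Dict[str, float] = {}

    def path_to(t: str) -> float:
        # Longest chain of fastest times that ends with task t.
        if t not in memo:
            memo[t] = fastest[t] + max((path_to(p) for p in preds.get(t, [])), default=0.0)
        return memo[t]

    critical = max(path_to(t) for t in exec_time)
    return max(critical, sum(fastest.values()) / num_procs)


if __name__ == "__main__":
    times = {"A": [2.0, 4.0], "B": [3.0, 1.0], "C": [2.0, 2.0]}
    deps = {"B": ["A"], "C": ["A"]}
    print(simulate_makespan(times, deps), makespan_lower_bound(times, deps))  # 4.0 4.0
```

A single simulation with worst-case execution times is not by itself a safe upper bound: with list scheduling, a task that finishes early can change later decisions and lengthen the schedule (a scheduling anomaly), which is why dedicated makespan analyses are needed for hard real-time guarantees.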


Specifying and testing GPU workgroup progress models

Proceedings of the ACM on Programming Languages, Vol 5 (OOPSLA), pp. 1-30, 2021. DOI: 10.1145/3485508

Author(s): Tyler Sorensen, Lucas F. Salvador, Harmit Raval, Hugues Evrard, John Wickerson, ...

Keyword(s): Parallel Applications, Gpu Programming, Concurrent Programs, Large Set, Prior Work, Fine Grained, Experimental Campaign, Correct Execution, Programming Support, Litmus Test

As GPU availability has increased and programming support has matured, a wider variety of applications are being ported to these platforms. Many parallel applications contain fine-grained synchronization idioms; as such, their correct execution depends on a degree of relative forward progress between threads (or thread groups). Unfortunately, many GPU programming specifications (e.g. Vulkan and Metal) say almost nothing about relative forward progress guarantees between workgroups. Although prior work has proposed a spectrum of plausible progress models for GPUs, cross-vendor specifications have yet to commit to any model. This work is a collection of tools and experimental data to aid specification designers when considering forward progress guarantees in programming frameworks. As a foundation, we formalize a small parallel programming language that captures the essence of fine-grained synchronization. We then provide a means of formally specifying a progress model, and develop a termination oracle that decides whether a given program is guaranteed to eventually terminate with respect to a given progress model. Next, we formalize a set of constraints that describe concurrent programs that require forward progress to terminate. This allows us to synthesize a large set of 483 progress litmus tests. Using these tests together with the termination oracle, we can determine the expected status of each litmus test (i.e. whether it is guaranteed to eventually terminate) under various progress models. We present a large experimental campaign running the litmus tests across 8 GPUs from 5 different vendors. Our results highlight that GPUs have significantly different termination behaviors under our test suite. Most notably, we find that Apple and ARM GPUs do not support the linear occupancy-bound model, as was hypothesized by prior work.


Smart resource allocation of concurrent execution of parallel applications

Concurrency and Computation Practice and Experience, 2021. DOI: 10.1002/cpe.6600

Author(s): Vinicius S. da Silva, Angelo G. D. Nogueira, Everton Camargo Lima, Hiago M. G. A. Rocha, Matheus S. Serpa, ...

Keyword(s): Resource Allocation, Parallel Applications, Concurrent Execution


Cloak-Reduce Load Balancing Strategy for Mapreduce

International Journal of Computer Science and Information Technology, Vol 13 (4), 2021. DOI: 10.5121/ijcsit.2021.13403

Author(s): Mamadou Diarra, Telesphore Tiendrebeogo

Keyword(s): Load Balancing, Response Time, Large Scale, Distributed Processing, Parallel Applications, Design Load, Load Regulation, New Processing, Processing And Storage, And Storage

The advent of Big Data has brought new processing and storage challenges, which are often solved by distributed processing. Distributed systems are inherently dynamic and unstable, so it is realistic to expect some resources to fail during use. Load balancing and task scheduling are important factors in the performance of parallel applications, hence the need for load balancing algorithms adapted to grid computing. In this paper, we propose a dynamic, hierarchical load balancing strategy at two levels: intra-scheduler load balancing, which avoids using the large-scale communication network, and inter-scheduler load balancing, which regulates the load of the whole system. The strategy improves the average response time of CLOAK-Reduce application tasks with minimal communication. We first focus on three performance indicators: response time, process latency, and running time of MapReduce tasks.
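
The two levels can be illustrated with a toy placement rule. In this sketch, which is an assumption and not the CLOAK-Reduce algorithm, a task stays with its local scheduler unless that scheduler's average worker load exceeds a threshold multiple of the global average, in which case it is forwarded to the least-loaded scheduler (inter-scheduler balancing); within the chosen scheduler it goes to the least-loaded worker (intra-scheduler balancing). The `threshold` parameter and all names are hypothetical.

```python
from typing import Dict, List, Tuple


def assign_task(cost: float, loads: Dict[str, List[float]], local: str,
                threshold: float = 1.25) -> Tuple[str, int]:
    """Place a task of the given cost and update `loads` in place.
    `loads` maps each scheduler name to the current loads of its workers."""
    def avg(s: str) -> float:
        return sum(loads[s]) / len(loads[s])

    total_workers = sum(len(ws) for ws in loads.values())
    global_avg = sum(sum(ws) for ws in loads.values()) / total_workers
    # Inter-scheduler level: leave the local scheduler only when it is
    # overloaded, so most placements avoid the large-scale network.
    target = local
    if avg(local) > threshold * global_avg:
        target = min(loads, key=avg)
    # Intra-scheduler level: least-loaded worker of the chosen scheduler.
    workers = loads[target]
    worker = min(range(len(workers)), key=workers.__getitem__)
    workers[worker] += cost
    return target, worker


if __name__ == "__main__":
    loads = {"s1": [5.0, 4.0], "s2": [1.0, 0.0]}
    print(assign_task(2.0, loads, local="s1"))  # ('s2', 1)
    print(loads)  # {'s1': [5.0, 4.0], 's2': [1.0, 2.0]}
```

The threshold keeps remote placements rare: a scheduler that is only slightly above the global average keeps its tasks local.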

