Impact study of data locality on task-based applications through the Heteroprio scheduler

2019 ◽  
Vol 5 ◽  
pp. e190 ◽  
Author(s):  
Bérenger Bramas

The task-based approach has emerged as a viable way to effectively use modern heterogeneous computing nodes. It allows the development of parallel applications with an abstraction of the hardware by delegating task distribution and load balancing to a dynamic scheduler. In this organization, the scheduler is the most critical component, as it solves the DAG scheduling problem to select the right processing unit for the computation of each task. In this work, we extend our Heteroprio scheduler, originally created to execute the fast multipole method on multi-GPU nodes. We improve Heteroprio by taking data locality into account during task distribution. The main principle is to use different task lists for the different memory nodes and to investigate how locality affinity between the tasks and the different memory nodes can be evaluated without looking at the tasks' dependencies. We evaluate the benefit of our method on two linear algebra applications and a stencil code. We show that simple heuristics can provide significant performance improvement and cut the total memory transfer of an execution by more than half.
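The mechanism sketched in the abstract, per-memory-node task lists plus a dependency-free locality affinity score, is simple enough to illustrate directly. The following Python fragment is a minimal sketch under assumptions of my own: the `Task` class, the `(size, node)` data handles, and the byte-fraction heuristic are hypothetical illustrations, not Heteroprio's actual API. A task's affinity to a memory node is taken as the fraction of its input bytes already resident there, and the task is pushed onto the list of the best-scoring node.

```python
class Task:
    """A task plus its data handles as (size_bytes, memory_node_id) pairs."""
    def __init__(self, name, data_handles):
        self.name = name
        self.data_handles = data_handles

def affinity(task, node_id):
    """Fraction of the task's input bytes already resident on node_id."""
    total = sum(size for size, _ in task.data_handles)
    local = sum(size for size, node in task.data_handles if node == node_id)
    return local / total if total else 0.0

def push_task(task, task_lists):
    """Append the task to the list of the memory node it is most local to."""
    best_node = max(task_lists, key=lambda n: affinity(task, n))
    task_lists[best_node].append(task)

# Two memory nodes (e.g. one per GPU); the task's data is spread across them.
task_lists = {0: [], 1: []}
t = Task("gemm", [(8 << 20, 0), (2 << 20, 1)])  # 8 MiB on node 0, 2 MiB on node 1
push_task(t, task_lists)
assert t in task_lists[0]  # 80% of its bytes live on node 0
```

A worker attached to memory node n would then pop from task_lists[n] first, which is what cuts transfers: tasks tend to run where their data already is.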


Proceedings ◽  
2020 ◽  
Vol 49 (1) ◽  
pp. 43
Author(s):  
Alanna Weisberg ◽  
Julie Le Gall ◽  
Pro Stergiou ◽  
Larry Katz

Maximal ball velocity is a significant performance indicator in many sports, such as baseball. Doppler radar guns are widely assumed to underestimate velocity: a radar gun measures only the velocity component along its line of sight, so accuracy increases as the angle between the gun and the object's direction of travel decreases. The purpose of this study was to investigate the impact of player handedness and radar gun location on the accuracy of measured ball velocity. Throws were analyzed in four conditions: the radar gun on the right side, throwing with the right arm, then with the left arm; and the radar gun on the left side, throwing with the right arm, then with the left arm. Cronbach's alpha for all four conditions showed α-values above 0.97; however, a paired t-test indicated significant differences between the 3D motion analysis and the radar gun. Bland–Altman plots show a high degree of scatter in all conditions. The results suggest that radar gun measurements can be highly inconsistent when compared with 3D motion analysis.
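The cosine effect behind this is a standard relationship: a Doppler gun reports v·cos(θ), where θ is the angle between its line of sight and the ball's path, so the reading is never an overestimate. A small illustrative computation (not data from the study) shows how the underestimate grows with angle:

```python
import math

def radar_reading(true_velocity, angle_deg):
    """Speed a Doppler radar gun reports when its line of sight is offset
    from the object's direction of travel by angle_deg (cosine effect)."""
    return true_velocity * math.cos(math.radians(angle_deg))

true_v = 40.0  # m/s, roughly a 144 km/h throw
for angle in (0, 5, 10, 20):
    measured = radar_reading(true_v, angle)
    error_pct = 100.0 * (true_v - measured) / true_v
    print(f"{angle:2d} deg: {measured:5.2f} m/s ({error_pct:.1f}% underestimate)")
```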


Author(s):  
Stefan Lemvig Glimberg ◽  
Allan Peter Engsig-Karup ◽  
Luke N Olson

The focus of this article is on the parallel scalability of a distributed multigrid framework, known as the DTU Compute GPUlab Library, for execution on graphics processing unit (GPU)-accelerated supercomputers. We demonstrate near-ideal weak scalability for a high-order fully nonlinear potential flow (FNPF) time domain model on the Oak Ridge Titan supercomputer, which is equipped with a large number of many-core CPU-GPU nodes. The high-order finite difference scheme for the solver is implemented to expose data locality and scalability, and the linear Laplace solver is based on an iterative multilevel preconditioned defect correction method designed for high-throughput processing and massive parallelism. In this work, the FNPF discretization is based on a multi-block discretization that allows for large-scale simulations. In this setup, each grid block is based on a logically structured mesh with support for curvilinear representation of horizontal block boundaries, allowing an accurate representation of geometric features such as surface-piercing bottom-mounted structures, for example, the mono-pile foundations demonstrated here. Unprecedented performance and scalability results are presented for a system of equations that has historically been considered too expensive to solve in practical applications. A novel feature of the potential flow model is also demonstrated: a modest number of multigrid restrictions is sufficient for fast convergence, improving overall parallel scalability as the coarse grid problem diminishes. In the numerical benchmarks presented, we demonstrate the use of 8192 modern Nvidia GPUs, enabling large-scale, high-resolution nonlinear marine hydrodynamics applications.
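The outer solver structure named here, defect correction wrapped around a multilevel preconditioner, follows a classical pattern: compute the defect d = b − Ax, apply an approximate solve to it, and correct x. The NumPy sketch below keeps only that skeleton; a cheap Jacobi sweep stands in for the library's actual multigrid cycle, purely as an assumption for illustration.

```python
import numpy as np

def approximate_solve(A, d, sweeps=2):
    """Cheap approximate solve of A e = d (Jacobi); in the real solver this
    slot is filled by one multilevel/multigrid cycle."""
    D_inv = 1.0 / np.diag(A)
    e = np.zeros_like(d)
    for _ in range(sweeps):
        e += D_inv * (d - A @ e)
    return e

def defect_correction(A, b, tol=1e-10, max_iter=200):
    """Preconditioned defect correction: x <- x + M^{-1} (b - A x)."""
    x = np.zeros_like(b)
    for _ in range(max_iter):
        d = b - A @ x                    # defect (residual)
        if np.linalg.norm(d) < tol * np.linalg.norm(b):
            break
        x += approximate_solve(A, d)     # approximate correction
    return x

# Diagonally dominant 1D Laplacian-like test system.
n = 50
A = 4.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x = defect_correction(A, b)
print(np.linalg.norm(b - A @ x))  # residual near machine precision
```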


2018 ◽  
Author(s):  
◽  
Huan Truong

A number of problems in the bioinformatics, systems biology, and computational biology fields require abstracting physical entities to mathematical or computational models. In such studies, the computational paradigms often involve algorithms that can be solved by the Central Processing Unit (CPU). Historically, those algorithms have benefited from advancements in the serial processing capabilities of individual CPU cores. However, that growth has slowed in recent years, as scaling out CPUs has been shown to be both cost-prohibitive and insecure. To overcome this problem, parallel computing approaches that employ the Graphics Processing Unit (GPU) have gained attention as complements to or replacements for traditional CPU approaches. The premise of this research is to investigate the applicability of various parallel computing platforms to several problems in the detection and analysis of homology in biological sequences. I hypothesize that by exploiting the sheer amount of computational power and sequencing data available, it is possible to deduce information from raw sequences without supplying the underlying prior knowledge needed to arrive at an answer. I have developed tools to perform analyses at scales that are traditionally unattainable with general-purpose CPU platforms. I have developed a method to accelerate sequence alignment on the GPU, and I used it to investigate whether the Operational Taxonomic Unit (OTU) classification problem can be improved with this computational power. I have developed a method to accelerate pairwise k-mer comparison on the GPU, and I used it to further develop PolyHomology, a framework that scaffolds shared sequence motifs across large numbers of genomes to illuminate the structure of the regulatory network in yeasts. The results suggest that such an approach to heterogeneous computing can help answer questions in biology and is a viable path to new discoveries.
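The pairwise k-mer comparison mentioned above is easy to state on the CPU; the GPU method in the thesis parallelizes exactly this kind of set arithmetic across many sequence pairs. A minimal sketch (the function names are illustrative, not the thesis's API), using Jaccard similarity of k-mer sets as an alignment-free relatedness proxy:

```python
def kmers(seq, k=8):
    """Set of all length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def kmer_similarity(seq_a, seq_b, k=8):
    """Jaccard similarity of the two sequences' k-mer sets: shared k-mers
    divided by total distinct k-mers."""
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    return len(a & b) / len(a | b) if (a or b) else 0.0

print(kmer_similarity("ACGTACGTACGTAA", "ACGTACGTACGTTT", k=4))
```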


Author(s):  
Prof. Vanita Babanne ◽  
Amol Kajale ◽  
Gaurav Menaria ◽  
Manish Kamble ◽  
Pranav Mundada

Irrigation forms one of the mainstays of agriculture and food production. As a result of outdated strategies in developed and developing countries, much water is wasted in this process. In this article, we present a regulatory model of irrigation management that checks this waste of water by providing a sound irrigation system for farming. The prototype Smart Automatic Irrigation Controller (SAIC) has two operating units, viz. a Wireless Sensor Unit and a Wireless Information Processing Unit. The sensor unit measures climate and soil conditions and calculates the actual water loss due to evapotranspiration. The processing unit takes this calculation and performs the regulatory action required to deliver the right amount of water to the farm. A set of basic rules is encoded in a decision-making table. The model was first developed and then validated by testing its effectiveness. The results showed the potential to compensate for water loss almost completely. The controller achieved a 27% reduction in water use and a 40% increase in crop yield. The prototype is connected to a cloud server for data storage and remote control access. The device is efficient, inexpensive, and easy for end users to operate. The model is new and unique in that it can plan irrigation for all crop types, in all climatic conditions and soil types, by feeding the right combination of soil, climate, and crop growth stage into the inference engine.
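The decision logic described, replacing the water lost to evapotranspiration while gating on soil moisture, can be sketched as a small rule function. Everything below (thresholds, the crop coefficient, the linear scaling between thresholds) is a hypothetical illustration, not the SAIC's actual inference engine:

```python
def irrigation_dose_mm(et0_mm, crop_coeff, soil_moisture_pct,
                       lower_pct=30.0, upper_pct=70.0):
    """Water to apply (mm). Crop water loss is ETc = Kc * ET0; the dose is
    withheld above the upper moisture threshold and scaled in between."""
    etc = crop_coeff * et0_mm
    if soil_moisture_pct >= upper_pct:
        return 0.0                       # soil wet enough: skip irrigation
    if soil_moisture_pct <= lower_pct:
        return etc                       # fully replace the ET loss
    deficit = (upper_pct - soil_moisture_pct) / (upper_pct - lower_pct)
    return etc * deficit                 # partial dose between thresholds

print(irrigation_dose_mm(et0_mm=5.2, crop_coeff=1.1, soil_moisture_pct=45.0))
```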


2002 ◽  
Vol 10 (1) ◽  
pp. 45-53 ◽  
Author(s):  
Jie Tao ◽  
Wolfgang Karl ◽  
Martin Schulz

Shared memory applications running transparently on top of NUMA architectures often face severe performance problems due to bad data locality and excessive remote memory accesses. Optimizations with respect to data locality are therefore necessary, but they require a fundamental understanding of an application's memory access behavior. The necessary information cannot be obtained with simple code instrumentation, due to the implicit nature of the communication handled by the NUMA hardware, the large amount of traffic produced at runtime, and the fine access granularity in shared memory codes. This paper presents an approach that overcomes these problems and thereby enables an easy and efficient optimization process. Based on a low-level hardware monitoring facility coordinated with a comprehensive visualization tool, it enables the generation of memory access histograms capable of showing all memory accesses across the complete address space of an application's working set. This information can be used to identify access hot spots, to understand the dynamic behavior of shared memory applications, and to optimize applications using an application-specific data layout, resulting in significant performance improvements.
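A memory access histogram of the kind described is conceptually a two-key count: bucket every monitored access by address region and by whether it was local or remote, then flag regions dominated by remote traffic. A toy version over a synthetic trace (the paper's monitoring hardware supplies the real trace; the tuple format here is invented for illustration):

```python
from collections import Counter

PAGE = 4096  # bucket accesses by 4 KiB page

def access_histogram(trace):
    """trace: iterable of (address, accessing_node, home_node) samples.
    Returns per-page counts of local vs. remote accesses."""
    hist = Counter()
    for addr, accessing_node, home_node in trace:
        kind = "local" if accessing_node == home_node else "remote"
        hist[(addr // PAGE, kind)] += 1
    return hist

def hot_remote_pages(hist, threshold=0.5):
    """Pages whose remote fraction exceeds threshold: candidates for page
    migration or a better initial data layout."""
    pages = {page for page, _ in hist}
    return [p for p in pages
            if hist[(p, "remote")] > threshold *
               (hist[(p, "remote")] + hist[(p, "local")])]

trace = [(0x1000, 0, 1), (0x1008, 0, 1), (0x1010, 1, 1), (0x2000, 1, 1)]
print(hot_remote_pages(access_histogram(trace)))  # [1]: page 1 is mostly remote
```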


Sensors ◽  
2021 ◽  
Vol 21 (24) ◽  
pp. 8212
Author(s):  
Andrei-Alin Corodescu ◽  
Nikolay Nikolov ◽  
Akif Quddus Khan ◽  
Ahmet Soylu ◽  
Mihhail Matskin ◽  
...  

The emergence of the edge computing paradigm has shifted data processing from centralised infrastructures to heterogeneous and geographically distributed infrastructures. Therefore, data processing solutions must consider data locality to reduce the performance penalties from data transfers among remote data centres. Existing big data processing solutions provide limited support for handling data locality and are inefficient in processing small and frequent events specific to the edge environments. This article proposes a novel architecture and a proof-of-concept implementation for software container-centric big data workflow orchestration that puts data locality at the forefront. The proposed solution considers the available data locality information, leverages long-lived containers to execute workflow steps, and handles the interaction with different data sources through containers. We compare the proposed solution with Argo workflows and demonstrate a significant performance improvement in the execution speed for processing the same data units. Finally, we carry out experiments with the proposed solution under different configurations and analyze individual aspects affecting the performance of the overall solution.
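The core placement decision in such an orchestrator can be reduced to: run each workflow step at the site that already holds most of its input bytes. A deliberately simplified sketch (site names, data sizes, and the byte-count policy are invented for illustration, not the paper's implementation):

```python
def pick_site(step_inputs, data_location, sites):
    """Choose the execution site holding the most input bytes for a step.
    step_inputs: {dataset: size_bytes}; data_location: {dataset: site}."""
    local_bytes = {site: 0 for site in sites}
    for dataset, size in step_inputs.items():
        local_bytes[data_location[dataset]] += size
    return max(local_bytes, key=local_bytes.get)

sites = ["edge-a", "edge-b", "cloud"]
data_location = {"sensor_batch": "edge-a", "model": "cloud"}
step_inputs = {"sensor_batch": 500_000_000, "model": 20_000_000}
print(pick_site(step_inputs, data_location, sites))  # edge-a holds the most bytes
```

Keeping the step's container long-lived at that site, as the article proposes, then amortizes startup cost across the small, frequent events typical of edge workloads.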


2021 ◽  
Vol 911 (1) ◽  
pp. 012082
Author(s):  
Bahtiar ◽  
Muhammad Aqil ◽  
Muhammad Azrai

A farmer-corporation-based seed production system is needed to bring seeds closer to farmers. Its development requires support from several institutions involved in the policy of providing seeds to farmers. This study aims to determine the role of institutions in a farmer-based hybrid maize seed production system. Five institutions were evaluated for their roles: the Indonesian Cereals Research Institute (ICERI) as a source of seeds and technology; the Assessment Institute for Agricultural Technology (AIAT) as assistance for applying technology in the field; the Agricultural Service Office as a policymaker in making seed available to farmers; the Seed Certification and Inspection Center (SCIC) as the supervisor of seed quality; and local seed growers as absorbers of the farmer groups' output. The institutions' roles were observed through socialization activities, field observations, and discussions. The results showed that all institutions provided various forms of support. ICERI provided seeds in a timely manner, at the right quality and in the right quantity, and also conducted regular training and monitoring to instruct farmers in good technology application; farmers considered this very good. AIAT provided field assistance for the application of production technology, but farmers considered these activities inadequate. The staff of the Agricultural Service Office continued to motivate farmers and were deemed adequate. SCIC, as seed supervisor, continued to assist farmers in the field; in addition to monitoring irregularities, it provided guidance to farmers in accordance with the standard operating procedure for hybrid maize seed and was considered very good by farmers. The seed producers who absorbed the farmers' output provided excellent guidance, such as placing quality-control personnel in the field to oversee implementation, absorbing farmers' products at an agreed price, and building a processing unit on site; farmers considered this very good. With adequate support from related institutions, farmers remain motivated to produce hybrid maize seed: the production area, originally only 100 ha in Minahasa district, expanded to surrounding districts, increasing from 253.4 ha in 2019 to 480 ha in 2020.

