PROVABLY CONSISTENT DISTRIBUTED DELAUNAY TRIANGULATION

Abstract. This paper deals with the distributed computation of Delaunay triangulations of massive point sets, motivated mainly by the needs of a scalable out-of-core surface reconstruction workflow for massive urban LIDAR datasets. Such data often correspond to a huge point cloud represented as a set of tiles of relatively homogeneous size. These tiles are the input of our algorithm, which naturally partitions the data across multiple processing elements. The distributed computation and the communication between processing elements are orchestrated efficiently through a decentralized model that represents, manages and locally constructs the triangulation corresponding to each tile. Initially inspired by the star splaying approach, we review the Tile & Merge algorithm for computing Distributed Delaunay Triangulations on the cloud, provide a theoretical proof of correctness of this algorithm, and analyse the performance of our Spark implementation in terms of speedup and strong scaling on both synthetic and real use case datasets. An HPC implementation (e.g. using MPI), left for future work, would benefit from its more efficient message passing paradigm but lose the robustness and failure resilience of our Spark approach.


Reducing communication in algebraic multigrid with multi-step node aware communication

The International Journal of High Performance Computing Applications ◽

10.1177/1094342020925535 ◽

2020 ◽

Vol 34 (5) ◽

pp. 547-561

Author(s):

Amanda Bienz ◽

William D Gropp ◽

Luke N Olson

Keyword(s):

Message Passing ◽

Message Passing Interface ◽

Parallel Implementation ◽

Algebraic Multigrid ◽

Sparse Linear Systems ◽

Parallel Scalability ◽

Strong Scaling ◽

The Cost ◽

Communication Schedule ◽

Inter Process Communication

Algebraic multigrid (AMG) is often viewed as a scalable [Formula: see text] solver for sparse linear systems. Yet, AMG lacks parallel scalability due to increasingly large costs associated with communication, both in the initial construction of a multigrid hierarchy and in the iterative solve phase. This work introduces a parallel implementation of AMG that reduces the cost of communication, yielding improved parallel scalability. It is common in Message Passing Interface (MPI), particularly in the MPI-everywhere approach, to arrange inter-process communication, so that communication is transported regardless of the location of the send and receive processes. Performance tests show notable differences in the cost of intra- and internode communication, motivating a restructuring of communication. In this case, the communication schedule takes advantage of the less costly intra-node communication, reducing both the number and the size of internode messages. Node-centric communication extends to the range of components in both the setup and solve phase of AMG, yielding an increase in the weak and strong scaling of the entire method.
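The effect of a node-aware schedule on internode traffic can be illustrated with a small model. The sketch below is not the authors' implementation and uses no MPI calls; it assumes ranks are numbered consecutively within a node, and the function names, the `ranks_per_node` parameter and the message format `(source rank, destination rank, set of value indices)` are all hypothetical. It compares direct rank-to-rank sending with a schedule in which all traffic between two nodes is combined into one message and a value needed by several ranks on the destination node crosses the network only once.

```python
from collections import defaultdict


def node_of(rank, ranks_per_node):
    """Node index of a rank, assuming consecutive rank placement (an assumption)."""
    return rank // ranks_per_node


def standard_internode(messages, ranks_per_node):
    """Return (message count, value count) crossing nodes when every rank sends directly."""
    count, size = 0, 0
    for src, dst, indices in messages:
        if node_of(src, ranks_per_node) != node_of(dst, ranks_per_node):
            count += 1
            size += len(indices)
    return count, size


def node_aware_internode(messages, ranks_per_node):
    """Return (message count, value count) when traffic is aggregated per node pair.

    Each (source rank, index) value is sent at most once per destination node,
    and everything bound for the same node travels in a single message.
    """
    per_node_pair = defaultdict(set)
    for src, dst, indices in messages:
        a, b = node_of(src, ranks_per_node), node_of(dst, ranks_per_node)
        if a != b:
            per_node_pair[(a, b)].update((src, i) for i in indices)
    count = len(per_node_pair)
    size = sum(len(values) for values in per_node_pair.values())
    return count, size


# Hypothetical traffic pattern: 4 ranks per node, ranks 0-3 on node 0, ranks 4-7 on node 1.
messages = [
    (0, 4, {1, 2}),
    (0, 5, {2, 3}),  # value 2 of rank 0 is also needed by rank 4
    (1, 4, {7}),
    (0, 1, {9}),     # intra-node, never counted as internode traffic
]
print(standard_internode(messages, 4))
print(node_aware_internode(messages, 4))
```

In a real schedule the aggregated data would still have to be gathered and redistributed within each node, which is the cheaper intra-node step the abstract refers to; that step is not modelled here.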


Comparing Approaches for Evaluating Digital Interventions on the Shop Floor

Technologies ◽

10.3390/technologies6040116 ◽

2018 ◽

Vol 6 (4) ◽

pp. 116 ◽

Cited By ~ 1

Author(s):

Francisco Lacueva-Pérez ◽

Lea Hannola ◽

Jan Nierhoff ◽

Stelios Damalas ◽

Soumyajit Chatterjee ◽

...

Keyword(s):

Industry 4.0 ◽

Shop Floor ◽

Use Case ◽

Individual Level ◽

Ict System ◽

The One ◽

Comparison Of The Results ◽

Future Work ◽

Self Learning ◽

Smart Factories

The introduction of innovative digital tools for supporting manufacturing processes has far-reaching effects at an organizational and individual level due to the development of Industry 4.0. The FACTS4WORKERS project funded by H2020, i.e., Worker-Centric Workplaces in Smart Factories, aims to develop user-centered assistance systems in order to demonstrate their impact and applicability at the shop floor. To achieve this, understanding how to develop such tools is as important as assessing if advantages can be derived from the ICT system created. This study introduces the technology of a workplace solution linked to the industrial challenge of self-learning manufacturing workplaces. Subsequently, a two-step approach to evaluate the presented system is discussed, consisting of the one used in FACTS4WORKERS and the one used in the “Heuristics for Industry 4.0” project. Both approaches and the use case are introduced as a base for presenting the comparison of the results collected in this paper. The comparison of the results for the presented use case is extended with the results for the rest of the FACTS4WORKERS use cases and with future work in the framework.


Distributed CCS

DAIMI Report Series ◽

10.7146/dpb.v20i356.6586 ◽

1991 ◽

Vol 20 (356) ◽

Author(s):

Padmanabhan Krishnan

Keyword(s):

Message Passing ◽

Operational Semantics ◽

Distributed Computation ◽

Virtual Node

In this paper we describe a technique to extend a process language such as CCS which does not model many aspects of distributed computation to one which does. The idea is to use a concept of location which represents a virtual node. Processes at different locations can evolve independently. Furthermore, communication between the processes at different locations occurs via explicit message passing. We extend CCS with locations and message passing primitives and present its operational semantics. We show that the equivalences induced by the new semantics and its properties are similar to the equivalences in CCS. We also show how the semantics of configuration and routing can be handled.


Parallel Computing for Rolling Mill Process with a Numerical Treatment of the LQR Problem

Computer and Electronic Sciences: Theory and Applications ◽

10.17981/cesta.01.01.2020.02 ◽

2020 ◽

Vol 1 (1) ◽

pp. 11-30

Author(s):

John Anderson Gómez Múnera ◽

Alejandro Giraldo Quintero

Keyword(s):

Parallel Computing ◽

Rolling Mill ◽

Application Programming Interface ◽

Processing Elements ◽

Lqr Problem ◽

Application Programming ◽

Multiple Processing ◽

Mill Process ◽

Programming Interface

The considerable increase in the computational demands of optimal control problems has in many cases exceeded the computing capacity available to handle complex systems in real time. For this reason, this article studies alternatives such as parallel computing, in which the problem is solved by distributing the tasks among several processors in order to accelerate the computation, and analyzes how the total calculation time decreases as the number of processors is gradually increased. We explore the use of these methods with a case study of a rolling mill process, making use of the strategy of updating the Phase Finals values for the construction of the final penalty matrix for the solution of the differential Riccati equation. In addition, the order of the problem studied is increased gradually to compare the improvements achieved in models of larger dimension. Parallel computing alternatives are also studied through multiple processing elements within a single machine or in a cluster via OpenMP, an application programming interface (API) that allows the creation of shared-memory programs.


MANIPULATIONS OF OCTREES AND QUADTREES ON MULTIPROCESSORS

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001494000218 ◽

1994 ◽

Vol 08 (02) ◽

pp. 439-455

Author(s):

VIPIN CHAUDHARY ◽

K. KUMARI ◽

P. ARUNACHALAM ◽

J.K. AGGARWAL

Keyword(s):

Load Balancing ◽

Shared Memory ◽

Data Structures ◽

Hierarchical Data ◽

Memory Architecture ◽

Uniform Load ◽

New Approach ◽

Processing Elements ◽

Powerful Means ◽

Multiple Processing

Octrees offer a powerful means for representing and manipulating 3-D objects. This paper presents an implementation of octree manipulations using a new approach on a shared memory architecture. Octrees are hierarchical data structures used to model 3-D objects. The manipulation of these data structures involves performing independent computations on each node of the octree. Octrees are much easier to deal with than other forms of representation used to model 3-D objects, especially where extensive manipulations are involved. When these operations are distributed among multiple processing elements (PEs) and executed simultaneously, a significant speedup may be achieved. The manipulations described in this paper, such as complement, union and intersection, along with other operations such as finding the volume and centroid, are implemented on the Sequent Balance multiprocessor. In this approach the PEs are allocated dynamically, resulting in uniform load balancing among them. The experimental results presented illustrate the feasibility of the approach. Although this evaluation was originally done for shared memory machines, it will provide insight for the evaluation of other architectures.
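The octree manipulations named above can be sketched sequentially as follows. This is a minimal illustration under assumed conventions, not the paper's implementation: a node is the string "F" (full), "E" (empty), or a tuple of eight child nodes, and all function names are hypothetical. The recursive calls on the eight children are independent of each other, which is the property that allows such work to be shared among processing elements.

```python
FULL, EMPTY = "F", "E"


def simplify(children):
    """Collapse eight identical leaf children into a single leaf."""
    if all(c == FULL for c in children):
        return FULL
    if all(c == EMPTY for c in children):
        return EMPTY
    return tuple(children)


def complement(tree):
    """Swap full and empty space throughout the tree."""
    if tree == FULL:
        return EMPTY
    if tree == EMPTY:
        return FULL
    return tuple(complement(c) for c in tree)


def union(a, b):
    """Space occupied by either object."""
    if a == FULL or b == FULL:
        return FULL
    if a == EMPTY:
        return b
    if b == EMPTY:
        return a
    return simplify([union(x, y) for x, y in zip(a, b)])


def intersection(a, b):
    """Space occupied by both objects."""
    if a == EMPTY or b == EMPTY:
        return EMPTY
    if a == FULL:
        return b
    if b == FULL:
        return a
    return simplify([intersection(x, y) for x, y in zip(a, b)])


def volume(tree, cell_volume=1.0):
    """Occupied volume, where cell_volume is the volume of the current cell."""
    if tree == FULL:
        return cell_volume
    if tree == EMPTY:
        return 0.0
    return sum(volume(c, cell_volume / 8) for c in tree)


# Two hypothetical objects, each filling one octant of the unit cube.
a = (FULL,) + (EMPTY,) * 7
b = (EMPTY, FULL) + (EMPTY,) * 6
print(volume(union(a, b)), intersection(a, b), volume(complement(a)))
```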


Parallel and Scalable Precise Clustering for Homologous Protein Discovery

10.1101/751214 ◽

2019 ◽

Cited By ~ 1

Author(s):

Stuart Byma ◽

Akash Dhasade ◽

Adrian Altenhoff ◽

Christophe Dessimoz ◽

James R. Larus

Keyword(s):

Parallel Implementation ◽

Protein Sequences ◽

Distributed Computation ◽

Amino Acid Sequences ◽

Homologous Proteins ◽

Homologous Protein ◽

Strong Scaling ◽

Protein Discovery ◽

Similar Elements ◽

Similar Amino Acid

Abstract. This paper presents a new, parallel implementation of clustering and demonstrates its utility in greatly speeding up the process of identifying homologous proteins. Clustering is a technique to reduce the number of comparisons needed to find similar pairs in a set of n elements such as protein sequences. Precise clustering ensures that each pair of similar elements appears together in at least one cluster, so that similarities can be identified by all-to-all comparison in each cluster rather than on the full set. This paper introduces ClusterMerge, a new algorithm for precise clustering that uses transitive relationships among the elements to enable parallel and scalable implementations of this approach. We apply ClusterMerge to the important problem of finding similar amino acid sequences in a collection of proteins. ClusterMerge identifies 99.8% of similar pairs found by a full O(n²) comparison, with only half as many operations. More importantly, ClusterMerge is highly amenable to parallel and distributed computation. Our implementation achieves a speedup of 604× on 768 cores (1400× faster than a comparable single-threaded clustering implementation), a strong scaling efficiency of 90%, and a weak scaling efficiency of nearly 100%.
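The way a precise clustering is used once it exists can be shown with a short sketch. It does not reproduce ClusterMerge, which builds the clusters; it only assumes that some set of possibly overlapping clusters is already given and performs the all-to-all comparison inside each one. The function name, the toy similarity test and the sample data are hypothetical.

```python
from itertools import combinations


def similar_pairs_in_clusters(clusters, is_similar):
    """Find similar pairs by all-to-all comparison inside each cluster.

    Clusters may overlap; a pair that occurs in several clusters is compared once.
    Returns the set of similar pairs and the number of comparisons performed.
    """
    seen = set()
    pairs = set()
    comparisons = 0
    for cluster in clusters:
        for a, b in combinations(sorted(cluster), 2):
            if (a, b) in seen:
                continue
            seen.add((a, b))
            comparisons += 1
            if is_similar(a, b):
                pairs.add((a, b))
    return pairs, comparisons


# Toy data: integers stand in for sequences, and "similar" means differing by at most 1.
clusters = [{1, 2, 3}, {3, 4}, {10, 11}]
pairs, comparisons = similar_pairs_in_clusters(clusters, lambda a, b: abs(a - b) <= 1)
print(sorted(pairs), comparisons)
```

For these six elements a full all-to-all comparison would need 15 comparisons, whereas the clustered version performs 5 and still finds every similar pair, because the toy clustering is precise for this similarity test.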


Checking Application Level Properties Using Assertion Synthesis

Volume 9: 15th IEEE/ASME International Conference on Mechatronic and Embedded Systems and Applications ◽

10.1115/detc2019-97950 ◽

2019 ◽

Author(s):

Matthias Wenzl ◽

Peter Roessler ◽

Andreas Puhm

Keyword(s):

Embedded System ◽

Integrated Circuit ◽

High Speed ◽

Logic Synthesis ◽

Automatic Generation ◽

Use Case ◽

Property Specification ◽

Digital Hardware ◽

Circuit Description ◽

Future Work

Abstract. This work presents a proof of concept of a new approach to the automatic generation of digital hardware that is able to check application-level properties of an embedded system, such as faulty system behavior, at runtime. The approach makes use of assertion-based verification setups, which today are very common in the area of digital hardware design but focus solely on logic simulation. To this end, a PSL-to-VHDL compiler is introduced that generates VHDL (Very High Speed Integrated Circuit Hardware Description Language) code from PSL (Property Specification Language) assertions, which can be further processed by a traditional digital logic synthesis tool. That way, runtime checker units can be generated automatically with little effort because the assertion-based test benches already exist. Furthermore, a model railway demonstrator is presented as an example of a safety-critical application to prove the proposed tool flow on a use case. Implementation results based on that use case are discussed. Finally, the paper concludes with a brief outlook on related future work of the authors.


Parallelizing a serial code: open–source module, EZ Parallel 1.0, and geophysics examples

10.5194/gmd-2020-257 ◽

2020 ◽

Author(s):

Jason Louis Turner ◽

Samuel N. Stechmann

Keyword(s):

Parallel Computing ◽

Finite Difference ◽

Message Passing ◽

Message Passing Interface ◽

Three Dimensional ◽

Computer Architectures ◽

Strong Scaling ◽

Pseudo Spectral ◽

Fortran Programming Language ◽

Single Processor

Abstract. Parallel computing can offer substantial speedup of numerical simulations in comparison to serial computing, as parallel computing uses many processors simultaneously rather than a single processor. However, it typically also requires substantial time and effort to convert a serial code into a parallel code. Here, a new module is developed to reduce the time and effort required to parallelize a serial code. The tested version of the module is written in the Fortran programming language, while the framework could also be extended to other languages (C++, Python, Julia, etc.). The Message Passing Interface is used to allow for either shared-memory or distributed-memory computer architectures. The software is designed for solving partial differential equations on a rectangular two-dimensional or three-dimensional domain, using finite difference, finite volume, pseudo-spectral, or other similar numerical methods. Examples are provided for two idealized models of atmospheric and oceanic fluid dynamics: the two-level quasi-geostrophic equations, and the stochastic heat equation as a model for turbulent advection–diffusion of either water vapor and clouds or sea surface height variability. In tests of the parallelized code, the strong scaling efficiency for the finite difference code is seen to be roughly 80 % to 90 %, which is achieved by adding only roughly 10 new lines to the serial code. Therefore, EZ Parallel provides great benefits with minimal additional effort.
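For readers unfamiliar with the metric, strong scaling efficiency is commonly defined as the speedup on a fixed problem size divided by the number of processes. The sketch below uses that common definition; the abstract does not state the exact baseline used in the paper, and the timings shown are invented purely for illustration.

```python
def speedup(t_serial, t_parallel):
    """Ratio of serial to parallel wall-clock time for the same problem size."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("timings must be positive")
    return t_serial / t_parallel


def strong_scaling_efficiency(t_serial, t_parallel, processes):
    """Speedup divided by the process count; 1.0 would be ideal scaling."""
    if processes < 1:
        raise ValueError("processes must be at least 1")
    return speedup(t_serial, t_parallel) / processes


# Hypothetical timings: 100 s on one process, 15 s on eight processes.
print(round(strong_scaling_efficiency(100.0, 15.0, 8), 3))
```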


XENet: Using a new graph convolution to accelerate the timeline for protein design on quantum computers

PLoS Computational Biology ◽

10.1371/journal.pcbi.1009037 ◽

2021 ◽

Vol 17 (9) ◽

pp. e1009037

Author(s):

Jack B. Maguire ◽

Daniele Grattarola ◽

Vikram Khipple Mulligan ◽

Eugene Klyshko ◽

Hans Melo

Keyword(s):

Protein Design ◽

Message Passing ◽

Quantum Algorithms ◽

Protein Structures ◽

Solution Space ◽

Search Space ◽

Side Chain ◽

Quantum Computers ◽

Use Case ◽

Sequence Design

Graph representations are traditionally used to represent protein structures in sequence design protocols in which the protein backbone conformation is known. This infrequently extends to machine learning projects: existing graph convolution algorithms have shortcomings when representing protein environments. One reason for this is the lack of emphasis on edge attributes during message-passing operations. Another reason is the traditionally shallow nature of graph neural network architectures. Here we introduce an improved message-passing operation that is better equipped to model local kinematics problems such as protein design. Our approach, XENet, pays special attention to both incoming and outgoing edge attributes. We compare XENet against existing graph convolutions in an attempt to decrease rotamer sample counts in Rosetta’s rotamer substitution protocol, used for protein side-chain optimization and sequence design. This use case is motivating because it both reduces the size of the search space for classical side-chain optimization algorithms, and allows larger protein design problems to be solved with quantum algorithms on near-term quantum computers with limited qubit counts. XENet outperformed competing models while also displaying a greater tolerance for deeper architectures. We found that XENet was able to decrease rotamer counts by 40% without loss in quality. This decreased the memory consumption for classical pre-computation of rotamer energies in our use case by more than a factor of 3, the qubit consumption for an existing sequence design quantum algorithm by 40%, and the size of the solution space by a factor of 165. Additionally, XENet displayed an ability to handle deeper architectures than competing convolutions.


Point Cloud Video Streaming: Challenges and Solutions

10.36227/techrxiv.13138940.v2 ◽

2021 ◽

Author(s):

Zhi Liu ◽

Qiyue Li ◽

Xianfu Chen ◽

Celimuge Wu ◽

Susumu Ishihara ◽

...

Keyword(s):

Video Streaming ◽

Point Cloud ◽

Wireless Transmission ◽

Future Research ◽

Use Case ◽

Viewing Angle ◽

Video Technology ◽

Research Directions ◽

Future Research Directions ◽

Preliminary Study

Volumetric video (or hologram video), the medium for representing natural content in VR/AR/MR, is presumably the next generation of video technology and a typical use case for 5G and beyond wireless communications. To realize volumetric video applications, efficient volumetric video streaming is in critical demand. This article responds to the challenges of, and proposes solutions to, wireless transmission systems for point cloud video, which is the most popular and favored way to represent volumetric media and differs significantly from other types of video. In particular, we first introduce point cloud video technology and its applications, and then discuss the challenges of and solutions to point cloud video streaming, including encoding, tiling, viewing angle prediction, decoding, quality assessment and transmission optimization. Furthermore, we explain a prototype of an MPEG DASH-based point cloud video streaming system as a preliminary study, along with more simulation results to verify its performance. Finally, we identify future research directions for providing high-quality point cloud video streaming.
