New communication strategies in NEMO

One of the main bottlenecks for NEMO scalability is the time spent on communications. Two complementary strategies are proposed here to reduce both the frequency and the duration of communications: replacing multiple point-to-point exchanges with MPI3 neighbourhood collective communications, and increasing the size of the halo region.

NEMO updates the lateral boundary conditions using four point-to-point MPI communications (north, south, east and west) for each MPI domain. The model completes the east-west exchange before performing the north-south communications. This ordering of the exchanges preserves both the 5-point and the 9-point stencils. MPI3 neighbourhood collectives provide a way to define sub-communicators over which collective communications are performed. Two different sub-communicators can be defined to support the two stencils. A single MPI message is built for all neighbours, instead of four separate messages, before calling the collective communication; the received message is then used to update the halo region, following the order of the neighbours in the sub-communicator.

The new communication strategy has been tested on two computational kernels (one for the 5-point stencil and one for the 9-point stencil), selected among the most computationally relevant routines. Preliminary tests, performed on a domain of 3000x2000x31 grid points on the Zeus Intel Xeon Gold 6154 machine available at CMCC, show a gain in communication time of up to 31% on 2016 cores for the 5-point stencil use case. The improvement is smaller when communications with processes on the diagonal are activated; however, a modest gain is still achieved, depending on the number of cores.

On the other hand, the analysis of some NEMO routines shows that exchanging more than one row/column of halo would allow communications to be moved outside the routine while preserving data dependencies.
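
The effect of exchanging more than one halo row/column can be illustrated with a minimal, MPI-free sketch. It is not NEMO code: the grid size, the periodic 2 x 2 decomposition, the integer 5-point sum and all function names are assumptions chosen for illustration, and the "exchange" is simulated by direct copies between local arrays. It fills a halo of width 2 in the order described above (east-west first, then north-south over full rows, which also fills the corners), applies two stencil sweeps with no communication in between, and compares the result with a single-domain reference.

```python
H = 2   # halo width (assumed: two rows/columns, as in the extended-halo case)
N = 4   # interior points per subdomain side (illustrative)
P = 2   # subdomains per side of a periodic P x P process grid (illustrative)


def initial_grid():
    """Deterministic integer test field on the global (N*P) x (N*P) grid."""
    n = N * P
    return [[(7 * i + 3 * j * j + i * j) % 11 for j in range(n)] for i in range(n)]


def stencil5(a, i, j):
    """Integer 5-point stencil: centre plus its four direct neighbours."""
    return a[i][j] + a[i - 1][j] + a[i + 1][j] + a[i][j - 1] + a[i][j + 1]


def global_sweeps(g, sweeps):
    """Reference: apply the stencil on the undecomposed periodic grid."""
    n = len(g)
    for _ in range(sweeps):
        g = [[g[i][j] + g[(i - 1) % n][j] + g[(i + 1) % n][j]
              + g[i][(j - 1) % n] + g[i][(j + 1) % n]
              for j in range(n)] for i in range(n)]
    return g


def scatter(g):
    """Split the global grid into P*P local arrays with a zeroed halo."""
    w = N + 2 * H
    local = {}
    for pi in range(P):
        for pj in range(P):
            a = [[0] * w for _ in range(w)]
            for i in range(N):
                for j in range(N):
                    a[H + i][H + j] = g[pi * N + i][pj * N + j]
            local[(pi, pj)] = a
    return local


def exchange(local):
    """Simulated halo update: east-west first, then north-south."""
    # Phase 1: east-west, interior rows only.
    for (pi, pj), a in local.items():
        west = local[(pi, (pj - 1) % P)]
        east = local[(pi, (pj + 1) % P)]
        for i in range(H, H + N):
            for k in range(H):
                a[i][k] = west[i][N + k]          # west's last H interior columns
                a[i][H + N + k] = east[i][H + k]  # east's first H interior columns
    # Phase 2: north-south over full rows, so the halo columns filled in
    # phase 1 travel with them and the corners are filled as well.
    for (pi, pj), a in local.items():
        north = local[((pi - 1) % P, pj)]
        south = local[((pi + 1) % P, pj)]
        for k in range(H):
            a[k] = list(north[N + k])
            a[H + N + k] = list(south[H + k])


def local_two_sweeps(a):
    """Two stencil sweeps using only local data (no exchange in between)."""
    b = [row[:] for row in a]
    for i in range(H - 1, H + N + 1):      # sweep 1: interior widened by one point
        for j in range(H - 1, H + N + 1):
            b[i][j] = stencil5(a, i, j)
    c = [row[:] for row in b]
    for i in range(H, H + N):              # sweep 2: interior only
        for j in range(H, H + N):
            c[i][j] = stencil5(b, i, j)
    return c


def gather(local):
    """Reassemble the interiors of all subdomains into one global grid."""
    n = N * P
    g = [[0] * n for _ in range(n)]
    for (pi, pj), a in local.items():
        for i in range(N):
            for j in range(N):
                g[pi * N + i][pj * N + j] = a[H + i][H + j]
    return g


def run():
    g0 = initial_grid()
    local = scatter(g0)
    exchange(local)  # one exchange for two sweeps
    local = {key: local_two_sweeps(a) for key, a in local.items()}
    return gather(local), global_sweeps(g0, 2)


if __name__ == "__main__":
    distributed, reference = run()
    print(distributed == reference)
```

In this sketch a single exchange of two halo rows/columns replaces the two one-row exchanges that two sweeps would otherwise need. The first sweep reads corner halo points, which is why the north-south phase copies full rows after the east-west phase has completed.
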
A wider halo reduces the frequency of message exchanges while increasing the message size at each exchange. It also allows some optimisation strategies (e.g. loop fusion, tiling) to be adopted to improve data locality. Even on its own, the wider halo brings improvements for some kernels, such as the MUSCL advection scheme, which shows a gain of ~23% in execution time when comparing the original version with the new one, in which the halo is extended to 2 lines and the communication is moved outside the computing region.

The current work has been performed according to the NEMO development strategy plan, defined by the NEMO Consortium, which establishes the priorities of the design strategies to reduce the bottlenecks to scalability and the time to solution.

Acknowledgments

This work is co-funded by the EU H2020 IS-ENES project Phase 3 (ISENES3) under Grant Agreement number 824084.

On the solvability of routing multiple point-to-point paths in manhattan meshes

Proceedings of the 2020 Genetic and Evolutionary Computation Conference Companion ◽

10.1145/3377929.3398098 ◽

2020 ◽

Author(s):

Reitze Jansen ◽

Yannick Vinkesteijn ◽

Daan van den Berg

Keyword(s):

Multiple Point ◽

Point To Point

Coupling of HV Distributions Systems through multiple point-to-point-DC-connections

15th IET International Conference on AC and DC Power Transmission (ACDC 2019) ◽

10.1049/cp.2019.0091 ◽

2019 ◽

Cited By ~ 2

Author(s):

S. Schlegel ◽

D. Westermann

Keyword(s):

Multiple Point ◽

Point To Point

MeDLey: from point-to-point to collective communications

Proceedings Fourth International Conference/Exhibition on High Performance Computing in the Asia-Pacific Region ◽

10.1109/hpc.2000.846526 ◽

2000 ◽

Author(s):

T. Es-Sqalli ◽

E. Fleury ◽

E. Dillon ◽

J. Guyard

Keyword(s):

Collective Communications ◽

Point To Point

Motion planning for a mobile manipulator to execute a multiple point-to-point task

Proceedings of IEEE/RSJ International Conference on Intelligent Robots and Systems. IROS '96 ◽

10.1109/iros.1996.571044 ◽

2002 ◽

Author(s):

Jae-Kyung Lee ◽

Seung Ho Kim ◽

Hyung Suck Cho

Keyword(s):

Motion Planning ◽

Mobile Manipulator ◽

Multiple Point ◽

Point To Point

Distributed Architecture for Unmanned Vehicle Services

Sensors ◽

10.3390/s21041477 ◽

2021 ◽

Vol 21 (4) ◽

pp. 1477

Author(s):

João Ramos ◽

Roberto Ribeiro ◽

David Safadinho ◽

João Barroso ◽

Carlos Rabadão ◽

...

Keyword(s):

Remote Control ◽

Mobile Networks ◽

Modular Design ◽

Unmanned Vehicles ◽

Local Network ◽

The Internet ◽

Multiple Point ◽

Distributed Architecture ◽

Point To Point ◽

Technical Effort

The demand for online services is increasing. Services that would require a long time to understand, use and master are becoming as transparent as possible to users, who tend to focus only on the final goals. Combined with the advantages of unmanned vehicles (UV), from the unmanned factor to the reduced size and costs, we found an opportunity to bring to users a wide variety of services supported by UV, through the Internet of Unmanned Vehicles (IoUV). Current solutions were analyzed and we discussed scalability and genericity as the principal concerns. Then, we proposed a solution that combines several services and UVs, available from anywhere at any time, from a cloud platform. The solution considers a cloud distributed architecture, composed of users, services, vehicles and a platform, interconnected through the Internet. Each vehicle provides the platform with an abstract and generic interface for the essential commands. This modular design therefore makes it easier to create new services and to reuse the different vehicles. To confirm the feasibility of the solution we implemented a prototype considering a cloud-hosted platform and the integration of custom-built small-sized cars, a custom-built quadcopter, and a commercial Vertical Take-Off and Landing (VTOL) aircraft. To validate the prototype and the vehicles' remote control, we created several services accessible via a web browser and controlled through a computer keyboard. We tested the solution in a local network, remote networks and mobile networks (i.e., 3G and Long-Term Evolution (LTE)) and proved the benefits of decentralizing the communications into multiple point-to-point links for the remote control. Consequently, the solution can provide scalable UV-based services, with low technical effort, for anyone at any time and anywhere.

Iterative Learning Control for Multiple Point-to-Point Tracking Application

IEEE Transactions on Control Systems Technology ◽

10.1109/tcst.2010.2051670 ◽

2011 ◽

Vol 19 (3) ◽

pp. 590-600 ◽

Cited By ~ 60

Author(s):

Chris T. Freeman ◽

Zhonglun Cai ◽

Eric Rogers ◽

Paul L. Lewin

Keyword(s):

Iterative Learning Control ◽

Learning Control ◽

Iterative Learning ◽

Multiple Point ◽

Point Tracking ◽

Point To Point

Study on progress threads placement and dedicated cores for overlapping MPI nonblocking collectives on manycore processor

The International Journal of High Performance Computing Applications ◽

10.1177/1094342019860184 ◽

2019 ◽

Vol 33 (6) ◽

pp. 1240-1254 ◽

Cited By ~ 1

Author(s):

Alexandre Denis ◽

Julien Jaeger ◽

Emmanuel Jeannot ◽

Marc Pérache ◽

Hugo Taboada

Keyword(s):

The Other ◽

Trade Off ◽

Manycore Processors ◽

Narrow Part ◽

Collective Communications ◽

Computing Framework ◽

A Chain ◽

Point To Point ◽

Mpi Implementation ◽

The Cost

To amortize the cost of MPI collective operations, nonblocking collectives have been proposed so as to allow communications to be overlapped with computation. Unfortunately, collective communications are more CPU-hungry than point-to-point communications and running them in a communication thread on a dedicated CPU core makes them slow. On the other hand, running collective communications on the application cores leads to no overlap. In this article, we propose placement algorithms for progress threads that do not degrade performance when running on cores dedicated to communications to get communication/computation overlap. We first show that even simple collective operations, such as those based on a chain topology, are not straightforward to make progress in background on a dedicated core. Then, we propose an algorithm for tree-based collective operations that splits the tree between communication cores and application cores. To get the best of both worlds, the algorithm runs the short but heavy part of the tree on application cores, and the long but narrow part of the tree on one or several communication cores, so as to get a trade-off between overlap and absolute performance. We provide a model to study and predict its behavior and to tune its parameters. We implemented both algorithms in the multiprocessor computing framework, which is a thread-based MPI implementation. We have run benchmarks on manycore processors such as the KNL and Skylake and get good results for both performance and overlap.

Filamentous active matter: Band formation, bending, buckling, and defects

Science Advances ◽

10.1126/sciadv.aaw9975 ◽

2020 ◽

Vol 6 (30) ◽

pp. eaaw9975 ◽

Cited By ~ 1

Author(s):

Gerard A. Vliegenthart ◽

Arvind Ravichandran ◽

Marisol Ripoll ◽

Thorsten Auth ◽

Gerhard Gompper

Keyword(s):

Molecular Motors ◽

Persistence Length ◽

Topological Defects ◽

Domain Size ◽

Self Organization ◽

Design Strategies ◽

Band Formation ◽

Microscopic Interactions ◽

Large Length ◽

Microscopy Techniques

Motor proteins drive persistent motion and self-organization of cytoskeletal filaments. However, state-of-the-art microscopy techniques and continuum modeling approaches focus on large length and time scales. Here, we perform component-based computer simulations of polar filaments and molecular motors linking microscopic interactions and activity to self-organization and dynamics from the filament level up to the mesoscopic domain level. Dynamic filament cross-linking and sliding and excluded-volume interactions promote formation of bundles at small densities and of active polar nematics at high densities. A buckling-type instability sets the size of polar domains and the density of topological defects. We predict a universal scaling of the active diffusion coefficient and the domain size with activity, and its dependence on parameters like motor concentration and filament persistence length. Our results provide a microscopic understanding of cytoplasmic streaming in cells and help to develop design strategies for novel engineered active materials.

Iterative Learning Control for multiple point-to-point tracking

Proceedings of the 48h IEEE Conference on Decision and Control (CDC) held jointly with 2009 28th Chinese Control Conference ◽

10.1109/cdc.2009.5399918 ◽

2009 ◽

Cited By ~ 3

Author(s):

Chris T Freeman ◽

Zhonglun Cai ◽

Paul L Lewin ◽

Eric Rogers

Keyword(s):

Iterative Learning Control ◽

Learning Control ◽

Iterative Learning ◽

Multiple Point ◽

Point Tracking ◽

Point To Point

Probability waves: pattern-based p-value correction in mass univariate analysis between two event-related potential waves

10.1101/2019.12.12.873570 ◽

2019 ◽

Author(s):

Dimitri Marques Abramov

Keyword(s):

Type I Error ◽

Univariate Analysis ◽

Event Related Potential ◽

New Method ◽

Multiple Point ◽

P Value ◽

Type I ◽

P Values ◽

Two Samples ◽

Point To Point

Background: Methods for p-value correction are criticized for either increasing Type II error or improperly reducing Type I error. This problem is worse when dealing with hundreds or thousands of paired comparisons between waves or images which are performed point-to-point. This text considers patterns in probability vectors resulting from multiple point-to-point comparisons between two ERP waves (mass univariate analysis) to correct p-values. These patterns (probability waves) mirror ERP waveshapes and might be indicators of consistency in statistical differences. New method: In order to compute and analyze these patterns, we convolved the decimal logarithm of the probability vector (p') using a Gaussian vector with size compatible with the ERP periods observed. To verify the consistency of this method, we also calculated mean amplitudes of late ERPs from Pz (P300 wave) and O1 electrodes in two samples, respectively of typical and ADHD subjects. Results: The present method reduces the range of p'-values that did not show covariance with neighbors (that is, that are likely random differences, type I errors), while preserving the amplitude of probability waves, in accordance with the difference between respective mean amplitudes. Comparison with existing methods: The positive-FDR resulted in a different profile of corrected p-values, which is not consistent with expected results or differences between mean amplitudes of the analyzed ERPs. Conclusion: The present new method seems to be biologically and statistically more suitable to correct p-values in mass univariate analysis of ERP waves.
