Resiliency in numerical algorithm design for extreme scale simulations

Author(s):  
Emmanuel Agullo ◽  
Mirco Altenbernd ◽  
Hartwig Anzt ◽  
Leonardo Bautista-Gomez ◽  
Tommaso Benacchio ◽  
...  

This work is based on the seminar titled ‘Resiliency in Numerical Algorithm Design for Extreme Scale Simulations’, held March 1–6, 2020, at Schloss Dagstuhl and attended by all the authors. Advanced supercomputing is characterized by very high computation speeds, achieved at the cost of an enormous amount of resources and energy. A typical large-scale computation running for 48 h on a system consuming 20 MW, as predicted for exascale systems, would consume a million kWh, corresponding to about 100k Euro in energy cost for executing 10²³ floating-point operations. It is clearly unacceptable to lose the whole computation if any of the several million parallel processes fails during the execution. Moreover, if a single operation suffers from a bit-flip error, should the whole computation be declared invalid? What about the notion of reproducibility itself: should this core paradigm of science be revised and refined for results that are obtained by large-scale simulation? Naive versions of conventional resilience techniques will not scale to the exascale regime: with a main memory footprint of tens of petabytes, synchronously writing checkpoint data all the way to background storage at frequent intervals will create intolerable overheads in runtime and energy consumption. Forecasts show that the mean time between failures could be lower than the time to recover from such a checkpoint, so that large calculations at scale might not make any progress if robust alternatives are not investigated. More advanced resilience techniques must be devised. The key may lie in exploiting both advanced system features and specific application knowledge. Research will face two essential questions: (1) what are the reliability requirements for a particular computation, and (2) how do we best design the algorithms and software to meet these requirements? While the analysis of use cases can help understand the particular reliability requirements, the construction of remedies is currently wide open. One avenue would be to refine and improve on system- or application-level checkpointing and rollback strategies in case an error is detected. Developers might use fault notification interfaces and flexible runtime systems to respond to node failures in an application-dependent fashion. Novel numerical algorithms or more stochastic computational approaches may be required to meet accuracy requirements in the face of undetectable soft errors. These ideas constituted an essential topic of the seminar. The goal of this Dagstuhl Seminar was to bring together a diverse group of scientists with expertise in exascale computing to discuss novel ways to make applications resilient against detected and undetected faults. In particular, participants explored the role that algorithms and applications play in the holistic approach needed to tackle this challenge. This article gathers a broad range of perspectives on the role of algorithms, applications and systems in achieving resilience for extreme scale simulations. The ultimate goal is to spark novel ideas and encourage the development of concrete solutions for achieving such resilience holistically.
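As a rough check on the scale described above, the following minimal Python sketch reproduces the back-of-envelope arithmetic (the sustained rate of 10^18 flop/s and the energy price of 0.10 Euro per kWh are illustrative assumptions, not figures from the seminar report):

```python
# Back-of-envelope figures for a 48 h run on an assumed 20 MW exascale system.
# The sustained rate (1e18 flop/s) and energy price (0.10 EUR/kWh) are
# illustrative assumptions, not values taken from the seminar report.

power_mw = 20          # system power draw in megawatts
hours = 48             # wall-clock time of the computation
flop_rate = 1e18       # assumed sustained floating-point rate (flop/s)
eur_per_kwh = 0.10     # assumed energy price

energy_kwh = power_mw * 1_000 * hours          # 960,000 kWh, i.e. ~1 million kWh
cost_eur = energy_kwh * eur_per_kwh            # ~96,000 EUR, i.e. ~100k EUR
total_flops = flop_rate * hours * 3600         # ~1.7e23 floating-point operations

print(f"energy : {energy_kwh:,.0f} kWh")
print(f"cost   : {cost_eur:,.0f} EUR")
print(f"flops  : {total_flops:.2e}")
```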

2011 ◽  
Vol 21 (02) ◽  
pp. 111-132 ◽  
Author(s):  
FRANCK CAPPELLO ◽  
HENRI CASANOVA ◽  
YVES ROBERT

An alternative to classical fault-tolerant approaches for large-scale clusters is failure avoidance, by which the occurrence of a fault is predicted and a preventive measure is taken. We develop analytical performance models for two types of preventive measures: preventive checkpointing and preventive migration. We instantiate these models for platform scenarios representative of current and future technology trends. We find that preventive migration is the better approach in the short term by orders of magnitude. However, in the longer term, both approaches have comparable merit, with a marginal advantage for preventive checkpointing. We also develop an analytical model of the performance of fault tolerance based on periodic checkpointing and compare this approach to both failure avoidance techniques. We find that this comparison is sensitive to the nature of the stochastic distribution of the time between failures, and that failure avoidance is likely inferior to fault tolerance in the long term. Regardless, our results show that each approach is likely to achieve poor utilization for large-scale platforms (e.g., 2²⁰ nodes) unless the mean time between failures is large. We show how bounding parallel job size improves utilization, but conclude that achieving good utilization in future large-scale platforms will require a combination of techniques.
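For readers unfamiliar with the trade-off being modelled, the sketch below evaluates the classic first-order periodic-checkpointing model (the Young/Daly approximation); it is not the analytical model developed in the paper, and the checkpoint cost and MTBF values are illustrative assumptions only:

```python
# First-order model of periodic checkpointing (Young/Daly approximation).
# This is the textbook model, shown only to illustrate the trade-off the
# paper analyses; it is not the exact analytical model developed there.
import math

def waste(period_s, ckpt_cost_s, mtbf_s):
    """Approximate fraction of time lost to checkpointing and re-execution."""
    # checkpoint overhead per period + expected rollback work lost per failure
    return ckpt_cost_s / period_s + period_s / (2.0 * mtbf_s)

def optimal_period(ckpt_cost_s, mtbf_s):
    """Young/Daly optimal checkpoint interval: sqrt(2 * C * MTBF)."""
    return math.sqrt(2.0 * ckpt_cost_s * mtbf_s)

# Illustrative numbers: a 30-minute checkpoint on a platform whose
# system-level MTBF has shrunk to 4 hours (values are assumptions).
C, MTBF = 30 * 60, 4 * 3600
T_opt = optimal_period(C, MTBF)
print(f"optimal period: {T_opt/3600:.2f} h, waste: {waste(T_opt, C, MTBF):.1%}")
```

With these assumed numbers the optimal period is about two hours and the model predicts roughly 50% of the machine time is wasted, which illustrates the paper's point that poor utilization is likely at scale unless the mean time between failures is large.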


2011 ◽  
Vol 64 (12) ◽  
pp. 2362-2369 ◽  
Author(s):  
L. Werbeloff ◽  
R. Brown

The unprecedented water scarcity in Australia coincides with the adoption of a new urban water rhetoric. The ‘Security through Diversity’ strategy has been adopted in a number of Australian cities as a new and innovative approach to urban water management. Although this strategy offers a more holistic approach to urban water management, in practice it is largely being interpreted and implemented in a way that maintains the historical dependence on large-scale, centralised water infrastructure and therefore perpetuates existing urban water vulnerabilities. This research explores the implementation of Security through Diversity as the new water scarcity response strategy in the cities of Perth and Melbourne. A qualitative study with over sixty-five urban water practitioners reveals that practitioners have absorbed the new Security through Diversity language whilst maintaining the existing problem and solution framework for urban water management. This can be explained in terms of an entrenched technological path dependency and cognitive lock-in that prevents practitioners from more comprehensively engaging with the complexities of the Security through Diversity strategy, ultimately perpetuating the existing vulnerability of our cities. This paper suggests that greater engagement with the underlying purpose of the Security through Diversity strategy is a necessary first step to overcome the constraints of the traditional technological paradigm and more effectively reduce the continued vulnerability of Australian cities.


2005 ◽  
Vol 50 (166) ◽  
pp. 193-217
Author(s):  
Krstan Malesevic

The (post)modern economy undoubtedly finds itself at the center of a large-scale, radically contradictory, and uncertain transformation of the world. Together with (post)modern technologies it forms the dominant core of the globalizing processes often referred to as globalization. The key features, and especially the accumulated consequences, of these processes pose a challenge for scientific and theoretical thought in the form of essential questions and dilemmas which are, in the last instance, tied to the impact of globalization on the quality and meaning of human life. This problem relates as much to individuals as it does to different social groups and human communities, that is, to humanity as such. This paper attempts to problematise the contradictory relationship between the global corporate economy as an instrumental value and human liberty as a substantive, i.e. the highest, value in itself (summum bonum), which gives meaning and dignity to human life. If the economy in one form or another covers most of human practical activity, then it can certainly have a decisive impact on the most fundamental value of human life, the value of freedom (individual, general, internal and external). Of course, the impact of the economy can act either way: as an encouragement or, as often happens, as a deterrent to the expansion of human freedom. This paper aims to briefly indicate some causes, characteristics and consequences of global economic processes which, somewhat paradoxically, contribute more to narrowing than to opening spaces of human liberty, or simply generate a proliferation of the "hedonism of unfreedom". Is this another case of the "surplus of knowledge" and "deficit of wisdom" that so strongly characterize our age, or something else?


Author(s):  
Z. Li ◽  
W. Zhang ◽  
J. Shan

Abstract. Building models are conventionally reconstructed from building roof points via planar segmentation, followed by a topology graph that groups the planes together. Roof edges and vertices are then mathematically represented by intersecting the segmented planes. Technically, such a solution is based on sequential local fitting, i.e., the entire data of one building do not simultaneously participate in determining the building model. As a consequence, the solution lacks topological integrity and geometric rigor. Fundamentally different from this traditional approach, we propose a holistic parametric reconstruction method that takes the entire point cloud of one building into consideration simultaneously. In our work, building models are reconstructed from predefined parametric (roof) primitives. We first use a well-designed deep neural network to segment and identify primitives in the given building point clouds. A holistic optimization strategy is then introduced to simultaneously determine the parameters of each segmented primitive. In the last step, the optimal parameters are used to generate a watertight building model in CityGML format. The airborne LiDAR dataset RoofN3D with predefined roof types is used for our tests. It is shown that PointNet++ applied to the entire dataset can achieve an accuracy of 83% for primitive classification. For a subset of 910 buildings in RoofN3D, the holistic approach is then used to determine the parameters of the primitives and reconstruct the buildings. The achieved overall reconstruction quality is 0.08 m in point-to-surface distance, or 0.7 times the RMSE of the input LiDAR points. This study demonstrates the efficiency and capability of the proposed approach and its potential to handle large-scale urban point clouds.
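The idea of holistic fitting, where every point of a building contributes to one optimization problem for a primitive's parameters, can be illustrated with a toy example. The sketch below fits a symmetric gable-roof surface to a synthetic roof point cloud with scipy.optimize.least_squares; the primitive, its parametrization and the data are assumptions made for illustration and are not the authors' formulation or the RoofN3D dataset:

```python
# Toy illustration of "holistic" parametric fitting: every roof point
# participates in one least-squares problem for the primitive's parameters.
# The gable parametrization and the synthetic data are assumptions made
# for this sketch; they are not the paper's actual model or dataset.
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)

# Synthetic gable roof: ridge at y = 5 m, ridge height 12 m, slope 0.4.
# x is kept only to make the samples look like a 3D point cloud; this
# symmetric primitive varies in y and z alone.
x = rng.uniform(0, 20, 2000)
y = rng.uniform(0, 10, 2000)
z = 12.0 - 0.4 * np.abs(y - 5.0) + rng.normal(0, 0.05, y.size)  # LiDAR-like noise

def residuals(p):
    z_ridge, y_ridge, slope = p
    return z - (z_ridge - slope * np.abs(y - y_ridge))

fit = least_squares(residuals, x0=[10.0, 4.0, 0.2])
z_ridge, y_ridge, slope = fit.x
rms = np.sqrt(np.mean(fit.fun ** 2))   # vertical point-to-surface RMS distance
print(f"ridge height {z_ridge:.2f} m, ridge line y = {y_ridge:.2f} m, "
      f"slope {slope:.2f}, RMS {rms:.3f} m")
```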


2020 ◽  
Vol 76 (5) ◽  
pp. 1019-1031 ◽  
Author(s):  
Jennifer Edmond ◽  
Francesca Morselli

Purpose – This paper proposes a new perspective on the enormous and unresolved challenge to existing practices of publication and documentation posed by the outputs of digital research projects in the humanities, where much good work is being lost due to resource or technical challenges.
Design/methodology/approach – The paper documents and analyses both the existing literature on promoting sustainability for the outputs of digital humanities projects and the innovative approach of a single large-scale project.
Findings – The findings of the research presented show that sustainability planning for large-scale research projects needs to consider data and technology but also community, communications and process knowledge simultaneously. In addition, it should focus not only on a project as a collection of tangible and intangible assets, but also on the potential user base for these assets and what these users consider valuable about them.
Research limitations/implications – The conclusions of the paper have been formulated in the context of one specific project. As such, it may amplify the specificities of this project in its results.
Practical implications – An approach to project sustainability following the recommendations outlined in this paper would include a number of uncommon features, such as a longer development horizon, a wider perspective on project results, and an audit of tacit and explicit knowledge.
Social implications – These results can ultimately preserve public investment in projects.
Originality/value – This paper supplements more reductive models for project sustainability with a more holistic approach that others may learn from in mapping and sustaining user value for their projects over the medium to long term.


2019 ◽  
Vol 11 (21) ◽  
pp. 2508 ◽  
Author(s):  
Argyro-Maria Boutsi ◽  
Charalabos Ioannidis ◽  
Sofia Soile

The evolution of high-quality 3D archaeological representations from niche products to integrated online media has not yet been completed. Digital archives in the field often lack multimodal data interoperability, user interaction and intelligibility. A web-based cultural heritage archive that compensates for these issues is presented in this paper. Multi-resolution 3D models constitute the core of the visualization, on top of which supporting documentation data and multimedia content are spatially and logically connected. Our holistic approach focuses on the dynamic manipulation of the 3D scene through the development of advanced navigation mechanisms and information retrieval tools. Users parse the multimodal content in a geo-referenced way through interactive annotation systems over cultural points of interest and automatic narrative tours. Multiple 3D and 2D viewpoints are enabled in real time to support data inspection. The implementation exploits front-end programming languages, 3D graphics libraries and visualization frameworks to handle asynchronous operations efficiently and preserve the accuracy of the initial assets. The choice of Greece’s Meteora, a UNESCO World Heritage Site, as a case study demonstrates the platform’s applicability to complex geometries and large-scale historical environments.


2019 ◽  
Vol 9 (21) ◽  
pp. 4541
Author(s):  
Syed Asif Raza Shah ◽  
Seo-Young Noh

Large scientific experimental facilities are currently generating a tremendous amount of data. In recent years, significant growth of scientific data analysis has been observed across scientific research centers. Scientific experimental facilities are producing an unprecedented amount of data and facing new challenges in transferring large data sets across continents. In particular, data transfer now plays an important role in new scientific discoveries. The performance of a distributed scientific environment is highly dependent on high-performance, adaptive, and robust network service infrastructures. To support large-scale data transfer for extreme-scale distributed science, there is a need for high-performance, scalable, end-to-end, and programmable networks that enable scientific applications to use the networks efficiently. We worked on the AmoebaNet solution to address the problems of a dynamic programmable network for bulk data transfer in extreme-scale distributed science environments. A major goal of the AmoebaNet project is to apply software-defined networking (SDN) technology to provide an “application-aware” network that facilitates bulk data transfer. We have prototyped AmoebaNet’s SDN-enabled network service, which allows applications to dynamically program the networks at run time for bulk data transfers. In this paper, we evaluate the AmoebaNet solution with real-world test cases and show how it can efficiently and dynamically use networks for bulk data transfer in large-scale scientific environments.
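AmoebaNet's actual programming interface is not reproduced here. The sketch below only illustrates the general pattern of an application programming the network at run time: it asks an SDN controller's northbound REST interface for a bandwidth-guaranteed path before a bulk transfer and releases it afterwards. The controller address, endpoints and JSON fields are hypothetical placeholders:

```python
# Hypothetical illustration of run-time network programming from an
# application: request a path with a bandwidth guarantee before a bulk
# transfer, then release it. The controller URL and the JSON schema are
# placeholders, not AmoebaNet's real API.
import requests

CONTROLLER = "http://sdn-controller.example.org:8080"    # hypothetical address

def request_path(src_host, dst_host, gbps, duration_s):
    reply = requests.post(
        f"{CONTROLLER}/paths",                            # hypothetical endpoint
        json={"src": src_host, "dst": dst_host,
              "bandwidth_gbps": gbps, "duration_s": duration_s},
        timeout=10,
    )
    reply.raise_for_status()
    return reply.json()["path_id"]                        # hypothetical field

def release_path(path_id):
    requests.delete(f"{CONTROLLER}/paths/{path_id}", timeout=10).raise_for_status()

# path_id = request_path("dtn1.site-a.org", "dtn7.site-b.org", gbps=40, duration_s=3600)
# ... run the bulk data transfer ...
# release_path(path_id)
```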


Processes ◽  
2020 ◽  
Vol 8 (7) ◽  
pp. 810
Author(s):  
Jade Gesare Abuga ◽  
Tiri Chinyoka

The flow of viscoelastic fluids may, under certain conditions, exhibit shear-banding characteristics that result from their susceptibility to unusual flow instabilities. In this work, we explore both shear-banding mechanisms discussed in the literature, namely constitutive instabilities and flow-induced inhomogeneities. Shear banding due to constitutive instabilities is modelled via either the Johnson–Segalman or the Giesekus constitutive model. Shear banding due to flow-induced inhomogeneities is modelled via the Rolie–Poly constitutive model. The Rolie–Poly constitutive equation is chosen because it captures, precisely, the shear rheometry of polymer solutions over a large range of strain rates. For the Rolie–Poly approach, we use the two-fluid model wherein the stress dynamics are coupled with concentration equations. We follow a computational analysis approach via an efficient and versatile numerical algorithm. The numerical algorithm is based on the Finite Volume Method (FVM) and is implemented in the open-source software package OpenFOAM. The efficiency of our numerical algorithms is enhanced via two possible stabilization techniques, namely the Log-Conformation Reformulation (LCR) and the Discrete Elastic Viscous Stress Splitting (DEVSS) methodologies. We demonstrate that our stabilized numerical algorithms accurately simulate these complex (shear-banded) flows of complex (viscoelastic) fluids. Verification of the shear-banding results via both the Giesekus and Johnson–Segalman models shows good agreement with the existing literature using the DEVSS technique. A comparison of the Rolie–Poly two-fluid model results with the existing literature for the concentration and velocity profiles also shows good agreement.
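The constitutive-instability mechanism can be made concrete with the steady simple-shear flow curve of the Johnson–Segalman model plus a Newtonian solvent, for which the polymer shear stress takes the standard form η_p γ̇ / (1 + (1 − a²) λ² γ̇²). The short sketch below, using illustrative parameter values rather than those of the paper and plain NumPy rather than the OpenFOAM implementation, evaluates this flow curve and reports the shear-rate window where total stress decreases with shear rate, i.e. the region where homogeneous flow is unstable and bands can form:

```python
# Steady simple-shear flow curve of the Johnson-Segalman model plus a
# Newtonian solvent. A decreasing branch of total stress vs shear rate
# signals the constitutive instability behind shear banding.
# Parameter values are illustrative, not those of the paper; the paper's
# simulations use OpenFOAM, not this 0-D evaluation.
import numpy as np

lam = 1.0        # relaxation time lambda
eta_p = 1.0      # polymer viscosity
eta_s = 0.05     # solvent viscosity (small enough for a non-monotone curve)
a = 0.8          # slip parameter of the Gordon-Schowalter derivative

gdot = np.logspace(-2, 2, 400)                      # shear rates
sigma_p = eta_p * gdot / (1.0 + (1.0 - a**2) * (lam * gdot) ** 2)
sigma = eta_s * gdot + sigma_p                      # total shear stress

falling = np.diff(sigma) < 0                        # decreasing branch
if falling.any():
    lo, hi = gdot[:-1][falling].min(), gdot[:-1][falling].max()
    print(f"stress decreases for shear rates in about [{lo:.2f}, {hi:.2f}] 1/s")
else:
    print("flow curve is monotone for these parameters")
```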

