RDMTk

Author(s):  
Vinay Gavirangaswamy ◽  
Aakash Gupta ◽  
Mark Terwilliger ◽  
Ajay Gupta

Research into risky decision making (RDM) has become a multidisciplinary effort. Conversations cut across fields such as psychology, economics, insurance, and marketing. This broad interest highlights the necessity for collaborative investigation of RDM to understand and manipulate the situations within which it manifests. A holistic understanding of RDM has been impeded by the independent development of diverse RDM research methodologies across different fields. There is no software specific to RDM that combines paradigms and analytical tools based on recent developments in high-performance computing technologies. This paper presents a toolkit called RDMTk, developed specifically for the study of risky decision making. RDMTk provides a free environment that can be used to manage globally distributed experiments while fostering collaborative research. The incorporation of machine learning and high-performance computing (HPC) technologies in the toolkit further opens additional possibilities, such as scalable algorithms for the big data problems arising from global-scale experiments.
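The abstract does not describe RDMTk's paradigms or APIs, but as a rough illustration of the kind of model-based analysis an RDM toolkit supports, the sketch below scores a two-outcome gamble with a prospect-theory-style value and weighting function. The function names and parameter values are hypothetical literature-style defaults, not part of RDMTk, and the weighting is the simplified (non-cumulative) form.

```python
# Illustrative only: a minimal prospect-theory-style valuation of a two-outcome
# gamble, the kind of analysis an RDM toolkit might support. Parameters
# (alpha=0.88, lambda=2.25, gamma=0.61) are common literature estimates used
# here as placeholders, not RDMTk defaults.

def value(x, alpha=0.88, lam=2.25):
    """Value function: concave for gains, convex and steeper for losses."""
    return x ** alpha if x >= 0 else -lam * ((-x) ** alpha)

def weight(p, gamma=0.61):
    """Inverse-S probability weighting function."""
    return p ** gamma / ((p ** gamma + (1 - p) ** gamma) ** (1 / gamma))

def gamble_value(outcomes):
    """Subjective value of a gamble given as [(payoff, probability), ...]."""
    return sum(weight(p) * value(x) for x, p in outcomes)

# A risky option (50% chance of 100, 50% chance of -50) vs. a sure 10.
risky = gamble_value([(100, 0.5), (-50, 0.5)])
sure = gamble_value([(10, 1.0)])
print(f"risky={risky:.2f}, sure={sure:.2f}, choose {'risky' if risky > sure else 'sure'}")
```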

2019 ◽  
Author(s):  
Andreas Müller ◽  
Willem Deconinck ◽  
Christian Kühnlein ◽  
Gianmarco Mengaldo ◽  
Michael Lange ◽  
...  

Abstract. In the simulation of complex multi-scale flow problems, such as those arising in weather and climate modelling, one of the biggest challenges is to satisfy operational requirements in terms of time-to-solution and energy-to-solution without compromising the accuracy and stability of the calculation. These competing factors require the development of state-of-the-art algorithms that can optimally exploit the targeted underlying hardware and efficiently deliver the extreme computational capabilities typically required in operational forecast production. These algorithms should (i) minimise the energy footprint along with the time required to produce a solution, (ii) maintain a satisfactory level of accuracy, and (iii) be numerically stable and resilient in case of hardware or software failure. The European Centre for Medium-Range Weather Forecasts (ECMWF) is leading a project called ESCAPE (Energy-efficient SCalable Algorithms for weather Prediction on Exascale supercomputers), funded by Horizon 2020 (H2020) under the Future and Emerging Technologies in High Performance Computing (FET-HPC) initiative. The goal of the ESCAPE project is to develop a sustainable strategy for evolving weather and climate prediction models towards next-generation computing technologies. The project partners combine the expertise of leading European regional forecasting consortia, university research, experienced high-performance computing centres, and hardware vendors. This paper presents an overview of results obtained in the ESCAPE project, in which weather prediction models have been broken down into smaller building blocks called dwarfs. The participating weather prediction models are IFS (Integrated Forecasting System); ALARO, a combination of AROME (Application de la Recherche à l'Opérationnel a Meso-Echelle) and ALADIN (Aire Limitée Adaptation Dynamique Développement International); and COSMO-EULAG, a combination of COSMO (Consortium for Small-scale Modeling) and EULAG (Eulerian/semi-Lagrangian fluid solver). The dwarfs are analysed and optimised in terms of computing performance for different hardware architectures (mainly Intel Skylake CPUs, NVIDIA GPUs, and Intel Xeon Phi). The ESCAPE project also includes the development of new algorithms that are specifically designed for better energy efficiency and improved portability through domain-specific languages. In addition, the modularity of the algorithmic framework naturally allows testing different existing numerical approaches and their interplay with the emerging heterogeneous hardware landscape. Throughout the paper, we compare different numerical techniques for solving the main building blocks that constitute weather models, in terms of energy efficiency and performance, on a variety of computing technologies.
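As a rough illustration of the time-to-solution / energy-to-solution trade-off that such dwarf benchmarks quantify, the sketch below times a batched FFT kernel (a loose stand-in for a spectral-transform building block) and derives an energy estimate from an assumed constant power draw. This is not ESCAPE code; the kernel, problem sizes, and the 300 W figure are assumptions, and real measurements would use hardware power counters instead.

```python
# Toy "dwarf"-style benchmark, not ESCAPE code: times a batched FFT kernel and
# converts time-to-solution into an estimated energy-to-solution using an
# assumed constant node power draw (a placeholder for RAPL/GPU power sensors).
import time
import numpy as np

ASSUMED_NODE_POWER_W = 300.0  # placeholder average node power, not measured

def benchmark_fft(n_fields=64, n_points=4096, repeats=20):
    rng = np.random.default_rng(0)
    data = rng.standard_normal((n_fields, n_points))
    t0 = time.perf_counter()
    for _ in range(repeats):
        spectra = np.fft.rfft(data, axis=1)                # forward transform
        data = np.fft.irfft(spectra, n=n_points, axis=1)   # inverse transform
    elapsed = time.perf_counter() - t0
    return elapsed, elapsed * ASSUMED_NODE_POWER_W         # seconds, est. joules

for n in (1024, 4096, 16384):
    t, e = benchmark_fft(n_points=n)
    print(f"n={n:6d}  time-to-solution={t:7.3f} s  est. energy={e:8.1f} J")
```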


2001 ◽  
Vol 22 (11) ◽  
pp. 2171-2191 ◽  
Author(s):  
S. N. V. Kalluri ◽  
Z. Zhang ◽  
J. Jájá ◽  
S. Liang ◽  
J. R. G. Townshend

2018 ◽  
Vol 4 ◽  
pp. e164 ◽  
Author(s):  
Anne E. Thessen ◽  
Jorrit H. Poelen ◽  
Matthew Collins ◽  
Jen Hammock

Biodiversity information is made available through numerous databases that each have their own data models, web services, and data types. Combining data across databases leads to new insights, but is not easy because each database uses its own system of identifiers. In the absence of stable and interoperable identifiers, databases are often linked using taxonomic names. This labor-intensive, error-prone, and lengthy process relies on accessible versions of nomenclatural authorities and fuzzy-matching algorithms. To approach the challenge of linking diverse data, more than technology is needed. New social collaborations like the Global Unified Open Data Architecture (GUODA), which combines the skills of computer engineers from iDigBio, server resources from the Advanced Computing and Information Systems (ACIS) Lab, global-scale data presentation from EOL, and independent developers and researchers, are what is needed to make concrete progress on finding relationships between biodiversity datasets. This paper discusses a technical solution developed by the GUODA collaboration for faster linking across databases, with a use case linking Wikidata and the Global Biotic Interactions database (GloBI). The GUODA infrastructure is a 12-node high-performance computing cluster with about 192 threads, 12 TB of storage, and 288 GB of memory. Using GUODA, 20 GB of compressed JSON from Wikidata was processed and linked to GloBI in about 10–11 min. Instead of comparing name strings or relying on a single identifier, Wikidata and GloBI were linked by comparing graphs of biodiversity identifiers external to each system. This method added 119,957 Wikidata links to GloBI, an increase of 13.7% in all outgoing name links in GloBI. Wikidata and GloBI were compared to the Open Tree of Life Reference Taxonomy to examine consistency and coverage. The process of parsing the Wikidata, Open Tree of Life Reference Taxonomy, and GloBI archives and calculating consistency metrics was done in minutes on the GUODA platform. As a model collaboration, GUODA has the potential to revolutionize biodiversity science by bringing diverse, technically minded people together with high-performance computing resources that are accessible from a laptop or desktop. However, participating in such a collaboration still requires basic programming skills.
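The linking approach described here, matching records on shared external identifiers rather than fuzzy-matching name strings, can be illustrated in a few lines of Python. The sketch below is a minimal, single-machine illustration with made-up sample records and identifier values; it is not the GUODA Spark workflow.

```python
# Minimal sketch of identifier-based linking, not the GUODA/Spark pipeline:
# two records are linked when they share an external identifier (e.g. an NCBI
# or ITIS taxon ID) instead of being matched by name strings. Sample records
# and identifier values are invented for illustration.
from collections import defaultdict

wikidata_items = [
    {"id": "Q140",   "external_ids": {"NCBI:9689", "ITIS:183803"}},
    {"id": "Q25347", "external_ids": {"NCBI:9606"}},
]
globi_taxa = [
    {"name": "Panthera leo",   "external_ids": {"NCBI:9689", "GBIF:5219404"}},
    {"name": "Apis mellifera", "external_ids": {"NCBI:7460"}},
]

# Index one side of the identifier graph by its external identifiers.
index = defaultdict(list)
for item in wikidata_items:
    for ext in item["external_ids"]:
        index[ext].append(item["id"])

# Any shared identifier yields a link between a GloBI taxon and a Wikidata item.
links = [
    (taxon["name"], wd_id)
    for taxon in globi_taxa
    for ext in taxon["external_ids"]
    for wd_id in index.get(ext, [])
]
print(links)  # [('Panthera leo', 'Q140')]
```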


Author(s):  
M. Dunaievskyi ◽  
O. Lefterov ◽  
V. Bolshakov

Introduction. Outbreaks of infectious diseases, and the COVID-19 pandemic in particular, pose a serious public health challenge. Every challenge also presents opportunities, and today those opportunities include information technology, decision-making systems, and best practices of proactive, data-driven management and control built on modern methods of data analysis and modeling. The article reviews the prospects for using publicly available software to model epidemiological trends. Strengths and weaknesses, main characteristics, and possible areas of application are considered. The purpose of the article is to review publicly available epidemiological software, describe the situations in which each approach is useful, categorize the underlying models and assess their effectiveness, and note the prospects of high-performance computing for modeling the spread of epidemics. Results. Although deterministic models are ready for practical use without specific additional configuration, they fall short of the other groups in terms of functionality. To obtain results from stochastic and agent-oriented models, the user must first specify the epidemic model, which requires deeper knowledge of epidemiology, a good understanding of the statistical basis, and familiarity with the assumptions on which the model rests. Among the software considered, EMOD (Epidemiological MODelling software) from the Institute for Disease Modeling leads in functionality. Conclusions. A relatively wide set of software is freely available; much of it was originally developed by epidemic-control institutions for internal decision-making and later opened to the public. In general, these programs have been adapted to broaden their practical application, focused on specific classes of problems, and made suitable for adaptive use. The software in the deterministic group is sufficiently informative and convenient to use, but such models have a rather narrow functional focus. Stochastic models provide more functionality but lose some ease of use. Agent-oriented models offer the most functionality, although using them effectively requires the ability to write program code. Keywords: epidemiological software, deterministic modeling, stochastic modeling, agent-oriented modeling, high performance computing, decision making systems.
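For context on what the "deterministic" group of tools computes, the sketch below integrates a classic SIR compartmental model with SciPy. It is a minimal illustration, not code from EMOD or any of the packages reviewed, and all parameter values are assumptions chosen for the example.

```python
# Minimal deterministic SIR model, illustrating the "deterministic" group of
# epidemiological tools; not EMOD or any other reviewed package. Parameter
# values (beta, gamma, population size) are arbitrary examples.
import numpy as np
from scipy.integrate import solve_ivp

def sir(t, y, beta, gamma):
    s, i, r = y
    n = s + i + r
    return [-beta * s * i / n, beta * s * i / n - gamma * i, gamma * i]

N, I0 = 1_000_000, 10          # population and initial infections (assumed)
beta, gamma = 0.3, 0.1         # transmission and recovery rates (assumed)
t_eval = np.linspace(0, 180, 181)
sol = solve_ivp(sir, (0, 180), [N - I0, I0, 0], args=(beta, gamma), t_eval=t_eval)

peak_day = int(t_eval[np.argmax(sol.y[1])])
print(f"R0={beta/gamma:.1f}, epidemic peaks around day {peak_day} "
      f"with {sol.y[1].max():,.0f} simultaneous infections")
```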


2011 ◽  
Author(s):  
Jared Hotaling ◽  
Jerry Busemeyer ◽  
Richard Shiffrin
