Low Energy Consumption on Post-Moore Platforms for HPC Research

The increase in computational capacity has aided exploration, production, and research, enabling applications that were infeasible only a few years ago. This increase has ushered in a new era, known as the Post-Moore Era, along with a wide range of promising devices: Single Board Computers (SBCs) and Personal Computers (PCs) now achieve performance that a decade ago was found only in servers. This work presents high-performance computing devices with low monetary and energy costs that meet the needs of research in Artificial Intelligence (AI) applications, in-situ data analysis, and simulations that can be deployed at large scale. These devices are compared in several tests, which highlight advantages such as their performance per watt and small form factor, among others.
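Performance per watt, the efficiency metric highlighted in this comparison, is sustained throughput divided by average power draw. The following is a minimal Python sketch assuming throughput is reported in GFLOP/s and power in watts. The `Device` class, the device names, and every figure in it are hypothetical illustrations, not measurements from this work.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical benchmark record for one device (illustrative values only)."""
    name: str
    gflops: float     # sustained throughput, GFLOP/s (assumed benchmark output)
    avg_watts: float  # average power draw during the run, in watts

    def gflops_per_watt(self) -> float:
        """Energy efficiency: sustained GFLOP/s per watt of average power."""
        if self.avg_watts <= 0:
            raise ValueError("average power must be positive")
        return self.gflops / self.avg_watts

    def energy_joules(self, runtime_s: float) -> float:
        """Energy consumed by a run: average power (W) times runtime (s)."""
        return self.avg_watts * runtime_s


def rank_by_efficiency(devices):
    """Sort devices from most to least GFLOP/s per watt."""
    return sorted(devices, key=lambda d: d.gflops_per_watt(), reverse=True)


if __name__ == "__main__":
    # Made-up numbers chosen only to show the calculation.
    devices = [
        Device("sbc-example", gflops=12.0, avg_watts=6.0),
        Device("pc-example", gflops=150.0, avg_watts=100.0),
        Device("server-example", gflops=1000.0, avg_watts=400.0),
    ]
    for d in rank_by_efficiency(devices):
        print(f"{d.name}: {d.gflops_per_watt():.2f} GFLOP/s per W")
```

This ranking measures efficiency only. A device can lead on GFLOP/s per watt while delivering far less absolute throughput, so both numbers are worth reporting together.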


Leveraging High Performance Computing for Managing Large and Evolving Data Collections

International Journal of Digital Curation ◽

10.2218/ijdc.v9i2.331 ◽

2014 ◽

Vol 9 (2) ◽

pp. 17-27 ◽

Cited By ~ 6

Author(s):

Ritu Arora ◽

Maria Esteva ◽

Jessica Trelogan

Keyword(s):

Data Management ◽

High Performance Computing ◽

High Performance ◽

Large Scale ◽

Research Process ◽

Open Science ◽

Test Case ◽

Growth Data ◽

Data Types ◽

Performance Computing

The process of developing a digital collection in the context of a research project often involves a pipeline pattern during which data growth, data types, and data authenticity need to be assessed iteratively in relation to the different research steps and in the interest of archiving. Throughout a project’s lifecycle, curators organize newly generated data, clean and integrate legacy data when it exists, and decide what data will be preserved for the long term. Although these actions should be part of a well-oiled data management workflow, there are practical challenges in doing so if the collection is very large and heterogeneous, or is accessed by several researchers simultaneously. There is a need for data management solutions that can help curators with efficient and on-demand analyses of their collection so that they remain well-informed about its evolving characteristics. In this paper, we describe our efforts towards developing a workflow to leverage open science High Performance Computing (HPC) resources for routinely and efficiently conducting data management tasks on large collections. We demonstrate that HPC resources and techniques can significantly reduce the time for accomplishing critical data management tasks, and enable dynamic archiving throughout the research process. We use a large archaeological data collection with a long and complex formation history as our test case. We share our experiences in adopting open science HPC resources for large-scale data management, which entails understanding the usage of the open source HPC environment and training users. These experiences can be generalized to meet the needs of other data curators working with large collections.
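One routine data management task that parallelizes well is assessing authenticity through fixity checksums: hashing every file concurrently, then diffing two snapshots to see what was added, removed, or changed. The following is a minimal Python sketch under the assumption that SHA-256 manifests are an acceptable fixity record. The function names are hypothetical and do not describe the authors' workflow. Threads suffice here because CPython's `hashlib` releases the GIL while hashing large buffers.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Dict


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file in 1 MiB chunks and return its hex SHA-256 digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: str, workers: int = 8) -> Dict[str, str]:
    """Hash every file under root concurrently; keys are POSIX paths relative to root."""
    root_path = Path(root)
    files = sorted(p for p in root_path.rglob("*") if p.is_file())
    with ThreadPoolExecutor(max_workers=workers) as pool:
        digests = list(pool.map(sha256_of, files))
    return {p.relative_to(root_path).as_posix(): d for p, d in zip(files, digests)}


def diff_manifests(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, list]:
    """Classify files as added, removed, or changed between two snapshots."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return {"added": added, "removed": removed, "changed": changed}
```

Running `build_manifest` at each research milestone and storing the result gives curators an on-demand view of how the collection has evolved since the previous snapshot.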


MetaMap: An atlas of metatranscriptomic reads in human disease-related RNA-seq data

10.1101/269092 ◽

2018 ◽

Cited By ~ 1

Author(s):

LM Simon ◽

S Karg ◽

AJ Westermann ◽

M Engel ◽

AHA Elbehery ◽

...

Keyword(s):

High Performance Computing ◽

Human Disease ◽

High Performance ◽

Large Scale ◽

Expression Patterns ◽

Rna Seq ◽

Wide Range ◽

Eukaryotic Gene ◽

Public Repositories ◽

Performance Computing

Background: With the advent of the age of big data in bioinformatics, large volumes of data and high performance computing power enable researchers to perform re-analyses of publicly available datasets at an unprecedented scale. A growing number of studies implicate the microbiome in both normal human physiology and a wide range of diseases. RNA sequencing technology (RNA-seq) is commonly used to infer global eukaryotic gene expression patterns under defined conditions, including human disease-related contexts, but its generic nature also enables the detection of microbial and viral transcripts.

Findings: We developed a bioinformatic pipeline to screen existing human RNA-seq datasets for the presence of microbial and viral reads by re-inspecting the non-human-mapping read fraction. We validated this approach by recapitulating the outcomes of 6 independent controlled infection experiments in cell line models and by comparison with an alternative metatranscriptomic mapping strategy. We then applied the pipeline to close to 150 terabytes of publicly available raw RNA-seq data from >17,000 samples from >400 studies relevant to human disease, using state-of-the-art high performance computing systems. The resulting data of this large-scale re-analysis are made available in the presented MetaMap resource.

Conclusions: Our results demonstrate that common human RNA-seq data, including those archived in public repositories, might contain valuable information to correlate microbial and viral detection patterns with diverse diseases. The presented MetaMap database thus provides a rich resource for hypothesis generation regarding the role of the microbiome in human disease.
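The core screening idea is to keep the reads that did *not* align to the human reference and pass only that fraction on to microbial and viral classification. The following is a minimal Python sketch of that filtering step. It assumes plain 4-line FASTQ input and a set of host-mapped read IDs that would, in practice, come from an aligner's output. The `Read`, `parse_fastq`, and `non_host_reads` names are hypothetical, and this is not the MetaMap pipeline itself.

```python
from typing import Iterable, Iterator, List, NamedTuple, Set


class Read(NamedTuple):
    read_id: str
    sequence: str
    quality: str


def parse_fastq(lines: Iterable[str]) -> Iterator[Read]:
    """Parse 4-line FASTQ records (assumes sequences are not line-wrapped)."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError(f"malformed FASTQ header: {header!r}")
        try:
            seq = next(it).rstrip("\n")
            next(it)  # '+' separator line
            qual = next(it).rstrip("\n")
        except StopIteration:
            raise ValueError("truncated FASTQ record") from None
        # The read ID is the first whitespace-delimited token after '@'.
        yield Read(header[1:].split()[0], seq, qual)


def non_host_reads(reads: Iterable[Read], host_mapped_ids: Set[str]) -> List[Read]:
    """Keep only reads whose IDs did not align to the host (human) reference."""
    return [r for r in reads if r.read_id not in host_mapped_ids]
```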


Global soil moisture data derived through machine learning trained with in-situ measurements

Scientific Data ◽

10.1038/s41597-021-00964-1 ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Sungmin O. ◽

Rene Orth

Keyword(s):

Machine Learning ◽

Soil Moisture ◽

Large Scale ◽

Short Term Memory ◽

Temporal Dynamics ◽

Soil Moisture Data ◽

Wide Range ◽

Global Soil

While soil moisture information is essential for a wide range of hydrologic and climate applications, spatially continuous soil moisture data is only available from satellite observations or model simulations. Here we present a global, long-term dataset of soil moisture derived through machine learning trained with in-situ measurements, SoMo.ml. We train a Long Short-Term Memory (LSTM) model to extrapolate daily soil moisture dynamics in space and in time, based on in-situ data collected from more than 1,000 stations across the globe. SoMo.ml provides multi-layer soil moisture data (0–10 cm, 10–30 cm, and 30–50 cm) at 0.25° spatial and daily temporal resolution over the period 2000–2019. The performance of the resulting dataset is evaluated through cross validation and inter-comparison with existing soil moisture datasets. SoMo.ml performs especially well in terms of temporal dynamics, making it particularly useful for applications requiring time-varying soil moisture, such as anomaly detection and memory analyses. SoMo.ml complements the existing suite of modelled and satellite-based datasets given its distinct derivation, to support large-scale hydrological, meteorological, and ecological analyses.
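A 0.25° global grid has 180 / 0.25 = 720 latitude rows and 360 / 0.25 = 1440 longitude columns. The following is a minimal Python sketch that maps a station's coordinates to its grid cell, which is the step needed to compare in-situ measurements with a gridded product. The layout (row 0 at 90°N, column 0 at 180°W) is an assumption for illustration and may not match the SoMo.ml file conventions.

```python
from typing import Tuple


def grid_shape(res_deg: float = 0.25) -> Tuple[int, int]:
    """(rows, cols) of a global regular latitude/longitude grid."""
    return round(180 / res_deg), round(360 / res_deg)


def latlon_to_cell(lat: float, lon: float, res_deg: float = 0.25) -> Tuple[int, int]:
    """Map a point to (row, col).

    Assumed layout: row 0 starts at 90°N and col 0 at 180°W.
    """
    if not -90.0 <= lat <= 90.0:
        raise ValueError("latitude must lie in [-90, 90]")
    lon = ((lon + 180.0) % 360.0) - 180.0  # wrap longitude into [-180, 180)
    n_rows, n_cols = grid_shape(res_deg)
    row = min(int((90.0 - lat) // res_deg), n_rows - 1)  # keep 90°S in the last row
    col = min(int((lon + 180.0) // res_deg), n_cols - 1)  # guard float rounding near 180°E
    return row, col
```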


News & Trends - Is high-performance computing entering a new era?

IEEE Internet Computing ◽

10.1109/mic.2004.1273479 ◽

2004 ◽

Vol 8 (2) ◽

pp. 9-11

Author(s):

G. Goth

Keyword(s):

High Performance Computing ◽

High Performance ◽

New Era ◽

Performance Computing


Flux rope axis geometry of magnetic clouds deduced from in situ data

Proceedings of the International Astronomical Union ◽

10.1017/s1743921313011071 ◽

2013 ◽

Vol 8 (S300) ◽

pp. 265-268

Author(s):

Miho Janvier ◽

Pascal Démoulin ◽

Sergio Dasso

Keyword(s):

Interplanetary Medium ◽

Magnetic Cloud ◽

Flux Rope ◽

Magnetic Clouds ◽

Direct Integration ◽

Axis Orientation ◽

In Situ Data ◽

Geometrical Features ◽

Wide Range

Magnetic clouds (MCs) consist of flux ropes that are ejected from the low solar corona during eruptive flares. Following their ejection, they propagate in the interplanetary medium, where they can be detected by in situ instruments and heliospheric imagers onboard spacecraft. Although in situ measurements provide a wide range of data, they depict the MC only along the single trajectory of the spacecraft crossing it. As such, direct 3D measurements of MC characteristics are impossible. From a statistical analysis of a large set of MCs detected at 1 AU by the Wind spacecraft, we propose different methods to deduce the most probable magnetic cloud axis shape. These methods include the comparison of synthetic distributions with observed distributions of the axis orientation, as well as the direct integration of the observed probability distribution to deduce the global MC axis shape. The overall shape given by these two methods is then compared with 2D heliospheric images of a propagating MC, and we find similar geometrical features.
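One generic way to compare a synthetic distribution of axis orientations with an observed one is the two-sample Kolmogorov–Smirnov statistic D, the largest gap between the two empirical cumulative distributions. The abstract does not state which comparison measure the authors used. The following Python sketch illustrates the idea, and the candidate shape names and angle values in it are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, Sequence


def ks_statistic(sample_a: Sequence[float], sample_b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: max |F_a(x) - F_b(x)| over all x."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # Both empirical CDFs only jump at observed values, so checking those suffices.
    for x in a + b:
        f_a = bisect_right(a, x) / len(a)
        f_b = bisect_right(b, x) / len(b)
        d = max(d, abs(f_a - f_b))
    return d


def best_matching_model(observed: Sequence[float],
                        synthetic: Dict[str, Sequence[float]]) -> str:
    """Name of the synthetic distribution closest to the observed one (smallest D)."""
    return min(synthetic, key=lambda name: ks_statistic(observed, synthetic[name]))
```

In use, each candidate axis shape would generate its own synthetic sample of orientation angles, and the shape with the smallest D would be retained as the most probable one.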


Coupled Thermo-Hydro-Geochemical Models of Engineered Barrier Systems: The Febex Project

MRS Proceedings ◽

10.1557/proc-663-561 ◽

2000 ◽

Vol 663 ◽

Cited By ~ 2

Author(s):

J. Samper ◽

R. Juncosa ◽

V. Navarro ◽

J. Delgado ◽

L. Montenegro ◽

...

Keyword(s):

Large Scale ◽

Reference Model ◽

Numerical Models ◽

Full Scale ◽

Small Scale ◽

Model Parameters ◽

In Situ Test ◽

Wide Range ◽

Engineered Barrier

FEBEX (Full-scale Engineered Barrier EXperiment) is a demonstration and research project dealing with the bentonite engineered barrier designed for sealing and containment of waste in a high level radioactive waste repository (HLWR). It includes two main experiments: an in situ full-scale test performed at the Grimsel Test Site (GTS) and a mock-up test operating since February 1997 at CIEMAT facilities in Madrid (Spain) [1,2,3]. One of the objectives of FEBEX is the development and testing of conceptual and numerical models for the thermal, hydrodynamic, and geochemical (THG) processes expected to take place in engineered clay barriers. A significant improvement in coupled THG modeling of the clay barrier has been achieved, both in terms of a better understanding of THG processes and of more sophisticated THG computer codes. The ability of these models to reproduce the observed THG patterns over a wide range of THG conditions enhances confidence in their predictive capabilities. Numerical THG models of heating and hydration experiments performed on small-scale lab cells provide excellent results for temperatures, water inflow, and final water content in the cells [3]. Calculated concentrations at the end of the experiments reproduce most of the patterns of measured data. In general, the fit of concentrations of dissolved species is better than that of exchanged cations. These models were later used to simulate the evolution of the large-scale experiments (in situ and mock-up). Some thermo-hydrodynamic hypotheses and bentonite parameters were slightly revised during TH calibration of the mock-up test. The results of the reference model simultaneously reproduce the observed water inflows and bentonite temperatures and relative humidities. Although the model is highly sensitive to one-at-a-time variations in model parameters, the possibility of parameter combinations leading to similar fits cannot be precluded.

The TH model of the “in situ” test is based on the same bentonite TH parameters and assumptions as for the “mock-up” test. Granite parameters were slightly modified during the calibration process in order to reproduce the observed thermal and hydrodynamic evolution. The reference model properly captures relative humidities and temperatures in the bentonite [3]. It also reproduces the observed spatial distribution of water pressures and temperatures in the granite. Once the TH aspects of the model had been calibrated, predictions of the THG evolution of both tests were performed. Data from the dismantling of the in situ test, which is planned for the summer of 2001, will provide a unique opportunity to test and validate current THG models of the EBS.
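A one-at-a-time (OAT) sensitivity analysis perturbs each model parameter individually while holding the others at their baseline values, then records the relative change in output. The following Python sketch shows the procedure. The `toy_inflow` model and its parameter names are hypothetical stand-ins, not the FEBEX THG codes.

```python
from typing import Callable, Dict

Params = Dict[str, float]


def one_at_a_time(model: Callable[[Params], float], base: Params,
                  rel_step: float = 0.1) -> Params:
    """Relative output change when each parameter alone is scaled by (1 + rel_step)."""
    y0 = model(base)
    if y0 == 0:
        raise ValueError("baseline output must be non-zero")
    effects = {}
    for name, value in base.items():
        perturbed = dict(base)          # copy so other parameters stay at baseline
        perturbed[name] = value * (1.0 + rel_step)
        effects[name] = (model(perturbed) - y0) / y0
    return effects


def toy_inflow(p: Params) -> float:
    """Hypothetical stand-in model: inflow proportional to permeability times gradient."""
    return p["permeability"] * p["gradient"]
```

Because OAT moves only one parameter at a time, it cannot reveal compensating parameter combinations that yield similar fits, which is exactly the limitation noted in the abstract. Detecting those would require varying parameters jointly.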


Accelerating Large-Scale Data Analysis by Offloading to High-Performance Computing Libraries using Alchemist

Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining ◽

10.1145/3219819.3219927 ◽

2018 ◽

Cited By ~ 2

Author(s):

Alex Gittens ◽

Kai Rothauge ◽

Shusen Wang ◽

Michael W. Mahoney ◽

Lisa Gerhardt ◽

...

Keyword(s):

Data Analysis ◽

High Performance Computing ◽

High Performance ◽

Large Scale ◽

Large Scale Data ◽

Performance Computing ◽

Scale Data


Fast Playback Framework for Analysis of Ground-Based Doppler Radar Observations Using MapReduce Technology

Journal of Atmospheric and Oceanic Technology ◽

10.1175/jtech-d-15-0118.1 ◽

2016 ◽

Vol 33 (4) ◽

pp. 621-634 ◽

Cited By ~ 4

Author(s):

Jingyin Tang ◽

Corene J. Matyas

Keyword(s):

Spatial Analysis ◽

High Performance ◽

Large Scale ◽

Doppler Radar ◽

Weather Events ◽

Data Architecture ◽

Research Grade ◽

High Performance Computing Cluster ◽

Time Systems ◽

Performance Computing

The creation of a 3D mosaic is often the first step when using the high-spatial- and temporal-resolution data produced by ground-based radars. Efficient yet accurate methods are needed to mosaic data from dozens of radars to better understand the precipitation processes in synoptic-scale systems such as tropical cyclones. Research-grade radar mosaic methods for analyzing historical weather events should use data from both sides of a moving temporal window and process them in a flexible data architecture that is not available in most stand-alone software tools or real-time systems. These historical analyses therefore require a different strategy, one that optimizes flexibility and scalability by removing time constraints from the design. This paper presents a MapReduce-based playback framework that uses Apache Spark’s computational engine to interpolate large volumes of radar reflectivity and velocity data onto 3D grids. Although designed to run on a high-performance computing cluster, these methods can also be executed on a low-end machine. A protocol is designed to enable interoperability with GIS and spatial analysis functions in this framework. Open-source software is used to enhance radar usability in the nonspecialist community. Case studies of a tropical cyclone landfall show the framework’s capability to efficiently create a large-scale, high-resolution 3D radar mosaic with integrated GIS functions for spatial analysis.
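The MapReduce pattern behind such gridding can be shown in plain Python. The map step tags each radar observation with the 3D cell that contains it, the shuffle step groups observations by cell, and the reduce step merges partial (sum, count) pairs. The following sketch makes several assumptions: points are already in Cartesian kilometres, the cell sizes are hypothetical, and each cell's value is a simple average of the observations that fall in it, which is cruder than the interpolation the paper describes. Reflectivity is converted from dBZ to linear units before averaging. In PySpark, the same `combine` function could be passed to `RDD.reduceByKey`.

```python
import math
from typing import Dict, Iterable, Tuple

Point = Tuple[float, float, float, float]  # (x_km, y_km, z_km, reflectivity_dbz)
Cell = Tuple[int, int, int]


def map_point(p: Point, dx: float, dz: float) -> Tuple[Cell, Tuple[float, int]]:
    """Map step: emit (cell, (linear reflectivity, 1)) for the cell containing p."""
    x, y, z, dbz = p
    cell = (int(x // dx), int(y // dx), int(z // dz))
    return cell, (10.0 ** (dbz / 10.0), 1)  # dBZ -> linear Z before averaging


def combine(a: Tuple[float, int], b: Tuple[float, int]) -> Tuple[float, int]:
    """Reduce step: merge partial (sum, count) pairs; associative and commutative."""
    return a[0] + b[0], a[1] + b[1]


def grid_mean_dbz(points: Iterable[Point], dx: float = 1.0,
                  dz: float = 0.5) -> Dict[Cell, float]:
    """Shuffle by cell key, reduce, then convert each cell mean back to dBZ."""
    acc: Dict[Cell, Tuple[float, int]] = {}
    for cell, pair in (map_point(p, dx, dz) for p in points):
        acc[cell] = combine(acc[cell], pair) if cell in acc else pair
    return {cell: 10.0 * math.log10(s / n) for cell, (s, n) in acc.items()}
```

Because `combine` is associative and commutative, partial sums can be computed on any partition of the data in any order, which is what lets the same logic scale from a single low-end machine to a cluster.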


High-Performance Computing Framework Based on Distributed Systems for Large-Scale Neurophysiological Data

10.21203/rs.3.rs-136986/v1 ◽

2021 ◽

Author(s):

Mohsen Hadianpour ◽

Ehsan Rezayat ◽

Mohammad-Reza Dehaqani

Keyword(s):

High Performance Computing ◽

High Performance ◽

Large Scale ◽

Electrophysiological Recording ◽

Neural Data ◽

Data Framework ◽

Neurophysiological Data ◽

Computing Framework ◽

Performance Computing ◽

Neuroscience Community

Due to drastic progress in neurophysiological recording technologies, neuroscientists face various complexities in dealing with unstructured, large-scale neural data. In the neuroscience community, these complexities can create serious bottlenecks in storing, sharing, and processing neural datasets. In this article, we develop a distributed high-performance computing (HPC) framework called "Big neuronal data framework" (BNDF) to overcome these complexities. BNDF is based on the open-source big data frameworks Hadoop and Spark, which provide a flexible and scalable structure. We examined BNDF on three different large-scale electrophysiological recording datasets from nonhuman primate brains. Our results showed faster runtimes and good scalability owing to the distributed nature of BNDF. We compared BNDF with a widely used platform, MATLAB, using equivalent computational resources. Compared with other similar methods, BNDF performs spike sorting, a common neuroscience application, more than five times faster.
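Spike sorting typically begins with spike detection on each recording channel independently, which is why it distributes naturally across a cluster. The following is a minimal Python sketch of that first stage. It assumes a widely used threshold rule, -k·σₙ with the robust noise estimate σₙ = median(|x|) / 0.6745, plus a refractory gap measured in samples. The abstract does not say which detection rule BNDF uses, and `detect_spikes` and `detect_all` are hypothetical names.

```python
import statistics
from typing import Dict, List, Sequence


def detect_spikes(signal: Sequence[float], k: float = 4.0,
                  refractory: int = 30) -> List[int]:
    """Indices of negative-going threshold crossings.

    Threshold = -k * sigma_n, where sigma_n = median(|x|) / 0.6745 is a robust
    noise estimate; a refractory gap (in samples) suppresses duplicate detections.
    """
    sigma_n = statistics.median(abs(x) for x in signal) / 0.6745
    threshold = -k * sigma_n
    spikes: List[int] = []
    last = -refractory - 1
    for i, x in enumerate(signal):
        if x < threshold and i - last > refractory:
            spikes.append(i)
            last = i
    return spikes


def detect_all(channels: Dict[str, Sequence[float]]) -> Dict[str, List[int]]:
    """Channels are independent, so this loop is the part a cluster would distribute."""
    return {name: detect_spikes(sig) for name, sig in channels.items()}
```

In a Hadoop or Spark setting, each channel (or time chunk of a channel) would become a separate task. The later clustering of spike waveforms into units is not shown here.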


Measuring and tuning energy efficiency on large scale high performance computing platforms

10.2172/1035312 ◽

2011 ◽

Cited By ~ 1

Author(s):

James H. Laros III

Keyword(s):

Energy Efficiency ◽

High Performance Computing ◽

High Performance ◽

Large Scale ◽

Computing Platforms ◽

Performance Computing
