DAPT: A Package Enabling Distributed Automated Parameter Testing

Gigabyte ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Ben Duggan ◽  
John Metzcar ◽  
Paul Macklin

Modern agent-based models (ABMs) and other simulation models require evaluation and testing of many different parameters. Managing that testing for large-scale parameter sweeps (grid searches), as well as storing simulation data, requires multiple, potentially customizable steps that may vary across simulations. Furthermore, parameter testing, processing, and analysis are slowed if simulation and processing jobs cannot be shared across teammates or computational resources. While high-performance computing (HPC) has become increasingly available, models can often be tested faster with the use of multiple computers and HPC resources. To address these issues, we created the Distributed Automated Parameter Testing (DAPT) Python package. By hosting parameters in an online (and often free) “database”, multiple individuals can run parameter sets simultaneously in a distributed fashion, enabling ad hoc crowdsourcing of computational power. Combining this with a flexible, scriptable tool set, teams can evaluate models and assess their underlying hypotheses quickly. Here, we describe DAPT and provide an example demonstrating its use.
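The core pattern is easy to sketch in plain Python: a shared table holds one row per parameter set, and each worker atomically claims the next waiting row, runs the model, and marks it finished. The sketch below uses a local SQLite file as the shared "database" and a placeholder run_model function; the schema, column names, and helper functions are illustrative assumptions, not DAPT's actual API (DAPT itself targets shared online backends so teammates on different machines can pull from the same queue).

```python
# Minimal sketch of the distributed-claim pattern, assuming a hypothetical
# two-parameter model. This is NOT DAPT's API, only the idea behind it.
import itertools
import sqlite3

def run_model(a, b):
    # Placeholder for the actual simulation (e.g., launching an ABM binary).
    print(f"running model with param_a={a}, param_b={b}")

conn = sqlite3.connect("params.db", isolation_level=None)  # autocommit mode
conn.execute("CREATE TABLE IF NOT EXISTS params ("
             "id INTEGER PRIMARY KEY, param_a REAL, param_b REAL, "
             "status TEXT DEFAULT 'waiting')")

# Seed a small grid sweep once (normally done by one team member).
if conn.execute("SELECT COUNT(*) FROM params").fetchone()[0] == 0:
    grid = list(itertools.product([0.1, 0.5, 1.0], [10.0, 100.0]))
    conn.executemany("INSERT INTO params (param_a, param_b) VALUES (?, ?)", grid)

def claim_next(conn):
    """Atomically claim the next waiting parameter set, or return None."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock up front, so no
    row = conn.execute(              # two workers can claim the same row
        "SELECT id, param_a, param_b FROM params "
        "WHERE status = 'waiting' LIMIT 1").fetchone()
    if row is not None:
        conn.execute("UPDATE params SET status = 'running' WHERE id = ?",
                     (row[0],))
    conn.execute("COMMIT")
    return row

while (job := claim_next(conn)) is not None:
    job_id, a, b = job
    run_model(a, b)
    conn.execute("UPDATE params SET status = 'finished' WHERE id = ?", (job_id,))
```

Running this script on several machines against the same table is what turns a grid search into an ad hoc distributed sweep; swapping the SQLite file for a shared online table is the step DAPT automates.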


2018 ◽  
Vol 106 (4) ◽  
Author(s):  
Jean-Paul Courneya ◽  
Alexa Mayo

Despite having an ideal setup in their labs for wet work, researchers often lack the computational infrastructure to analyze the magnitude of data that result from “-omics” experiments. In this innovative project, the library supports analysis of high-throughput data from global molecular profiling experiments by offering a high-performance computer with open source software along with expert bioinformationist support. The audience for this new service is faculty, staff, and students for whom using the university’s large-scale CORE computational resources is not warranted because these resources exceed the needs of smaller projects. In the library’s approach, users are empowered to analyze high-throughput data that they could not otherwise analyze on their own computers. To develop the project, the library’s bioinformationist identified the ideal computing hardware and a group of open source bioinformatics software to provide analysis options for experimental data such as scientific images, sequence reads, and flow cytometry files. To close the loop between learning and practice, the bioinformationist developed self-guided learning materials and workshops or consultations on topics such as the National Center for Biotechnology Information’s BLAST, Bioinformatics on the Cloud, and ImageJ. Researchers apply the data analysis techniques that they learned in the classroom in an ideal computing environment.


2019 ◽  
Author(s):  
Jonathan Ozik ◽  
Nicholson Collier ◽  
Randy Heiland ◽  
Gary An ◽  
Paul Macklin

We present an integrated framework for enabling dynamic exploration of design spaces for cancer immunotherapies with detailed dynamical simulation models on high-performance computing resources. Our framework combines PhysiCell, an open source agent-based simulation platform for cancer and other multicellular systems, and EMEWS, an open source platform for extreme-scale model exploration. We build an agent-based model of immunosurveillance against heterogeneous tumours, which includes spatial dynamics of stochastic tumour-immune contact interactions. We implement active learning and genetic algorithms using high-performance computing workflows to adaptively sample the model parameter space and iteratively discover optimal cancer regression regions within biological and clinical constraints.
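As a concrete illustration of the adaptive-sampling side, the sketch below runs a bare-bones genetic algorithm over a two-parameter space. The fitness function is a toy stand-in: in the actual workflow each evaluation would be a full PhysiCell simulation scored for tumour regression and dispatched through EMEWS to HPC workers. The parameter names, ranges, and operator choices here are illustrative assumptions, not the paper's configuration.

```python
# A minimal genetic-algorithm sketch, assuming a hypothetical two-parameter
# immunotherapy design space (kill_rate, attach_time).
import random

def fitness(params):
    # Toy stand-in: the real objective scores a full simulation run
    # for tumour regression within biological and clinical constraints.
    kill_rate, attach_time = params
    return -(kill_rate - 0.6) ** 2 - (attach_time - 15.0) ** 2 / 100.0

def crossover(a, b):
    # Uniform crossover: take each parameter from one parent at random.
    return tuple(random.choice(pair) for pair in zip(a, b))

def mutate(p, scale=0.1):
    # Gaussian perturbation proportional to each parameter's magnitude.
    return tuple(x + random.gauss(0, scale * abs(x) + 1e-3) for x in p)

pop = [(random.uniform(0, 1), random.uniform(5, 30)) for _ in range(20)]
for gen in range(10):
    elite = sorted(pop, key=fitness, reverse=True)[:5]   # keep the best sets
    children = [mutate(crossover(random.choice(elite), random.choice(elite)))
                for _ in range(len(pop) - len(elite))]
    pop = elite + children
    print(f"gen {gen}: best fitness {max(map(fitness, pop)):.4f}")
print("best parameters:", max(pop, key=fitness))
```

In an HPC workflow the children of each generation would be evaluated concurrently, since the fitness calls are independent; that independence is what makes population-based methods a natural fit for extreme-scale model exploration.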


2016 ◽  
Vol 33 (4) ◽  
pp. 621-634 ◽  
Author(s):  
Jingyin Tang ◽  
Corene J. Matyas

The creation of a 3D mosaic is often the first step when using the high-spatial- and temporal-resolution data produced by ground-based radars. Efficient yet accurate methods are needed to mosaic data from dozens of radars to better understand the precipitation processes in synoptic-scale systems such as tropical cyclones. Research-grade radar mosaic methods of analyzing historical weather events should utilize data from both sides of a moving temporal window and process them in a flexible data architecture that is not available in most stand-alone software tools or real-time systems. Thus, these historical analyses require a different strategy for optimizing flexibility and scalability by removing time constraints from the design. This paper presents a MapReduce-based playback framework using Apache Spark’s computational engine to interpolate large volumes of radar reflectivity and velocity data onto 3D grids. Designed to be friendly to use on a high-performance computing cluster, these methods may also be executed on a low-end configured machine. A protocol is designed to enable interoperability with GIS and spatial analysis functions in this framework. Open-source software is utilized to enhance radar usability in the nonspecialist community. Case studies during a tropical cyclone landfall show this framework’s capability of efficiently creating a large-scale, high-resolution 3D radar mosaic with the integration of GIS functions for spatial analysis.
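The map-reduce structure of the gridding step can be sketched in a few lines of PySpark: each radar gate maps to the key of the 3D cell it falls in, and a reduce combines the values landing in each cell. Simple binning and direct averaging stand in here for the paper's actual interpolation, and the coordinates, grid spacing, and sample values are made-up assumptions.

```python
# A minimal PySpark sketch of map-reduce 3D gridding, assuming toy gates
# of the form (x, y, z, reflectivity). Not the paper's framework.
from pyspark import SparkContext

CELL = 1.0  # grid spacing in km (illustrative)

def to_cell(gate):
    # Map phase: bin each gate into the 3D grid cell containing it.
    x, y, z, dbz = gate
    key = (int(x // CELL), int(y // CELL), int(z // CELL))
    return key, (dbz, 1)

sc = SparkContext("local[*]", "radar-mosaic-sketch")
gates = sc.parallelize([
    (10.2, 4.1, 2.0, 35.0),   # gate from radar A
    (10.4, 4.3, 2.1, 37.0),   # nearby gate from radar B, same cell
    (52.0, 9.9, 1.0, 18.0),
])
# Reduce phase: sum values and counts per cell, then take the mean.
# (Averaging dBZ directly is a simplification; operational methods
# typically average reflectivity in linear units.)
mosaic = (gates.map(to_cell)
               .reduceByKey(lambda a, b: (a[0] + b[0], a[1] + b[1]))
               .mapValues(lambda s: s[0] / s[1]))
print(mosaic.collect())
sc.stop()
```

Because each gate is processed independently in the map phase and the per-cell reduction is associative, the same code scales from a laptop ("local[*]") to a cluster without structural changes, which is the flexibility the playback framework exploits.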


2021 ◽  
Author(s):  
Mohsen Hadianpour ◽  
Ehsan Rezayat ◽  
Mohammad-Reza Dehaqani

Due to rapid progress and improvement in neurophysiological recording technologies, neuroscientists have faced various complexities in dealing with unstructured large-scale neural data. In the neuroscience community, these complexities can create serious bottlenecks in storing, sharing, and processing neural datasets. In this article, we developed a distributed high-performance computing (HPC) framework called the Big Neuronal Data Framework (BNDF) to overcome these complexities. BNDF is based on the open-source big data frameworks Hadoop and Spark, providing a flexible and scalable structure. We examined BNDF on three different large-scale electrophysiological recording datasets from nonhuman primates’ brains. Our results exhibited faster runtimes and scalability due to the distributed nature of BNDF. We compared BNDF to a widely used platform, MATLAB, given equitable computational resources. Compared with other similar methods, BNDF provides more than five times faster performance in spike sorting, a common neuroscience application.
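Part of why spike sorting distributes well is that its first stage, spike detection, is independent across recording channels. The PySpark sketch below runs threshold-crossing detection per channel using a standard robust noise estimate; it illustrates the general pattern only, and the fake data, threshold rule, and function names are assumptions rather than BNDF's implementation.

```python
# A minimal per-channel spike-detection sketch on Spark, assuming toy
# (channel_id, trace) pairs. Not BNDF's API.
import numpy as np
from pyspark import SparkContext

def detect_spikes(channel):
    """Find negative threshold crossings, the detection stage of sorting."""
    cid, trace = channel
    trace = np.asarray(trace)
    sigma = np.median(np.abs(trace)) / 0.6745   # robust noise estimate
    thr = -4.0 * sigma
    crossings = np.flatnonzero((trace[1:] < thr) & (trace[:-1] >= thr)) + 1
    return cid, crossings.tolist()

sc = SparkContext("local[*]", "spike-detection-sketch")
rng = np.random.default_rng(0)
# Fake 1 s recordings at 30 kHz for 8 channels; in a real deployment the
# traces would be read from distributed storage rather than generated here.
channels = [(c, rng.normal(0.0, 10.0, 30_000).tolist()) for c in range(8)]
per_channel = sc.parallelize(channels).map(detect_spikes).collect()
for cid, idx in per_channel:
    print(f"channel {cid}: {len(idx)} candidate spikes")
sc.stop()
```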


2017 ◽  
Vol 33 (2) ◽  
pp. 119-130
Author(s):  
Vinh Van Le ◽  
Hoai Van Tran ◽  
Hieu Ngoc Duong ◽  
Giang Xuan Bui ◽  
Lang Van Tran

Metagenomics is a powerful approach for studying environmental samples that does not require the isolation and cultivation of individual organisms. One of the essential tasks in a metagenomic project is to identify the origin of reads, referred to as taxonomic assignment. Because each metagenomic project has to analyze large-scale datasets, taxonomic assignment is highly computation-intensive. This study proposes a parallel algorithm for the taxonomic assignment problem, called SeMetaPL, which aims to deal with this computational challenge. The proposed algorithm is evaluated with both simulated and real datasets on a high-performance computing system. Experimental results demonstrate that the algorithm achieves good performance and utilizes the system’s resources efficiently. The software implementing the algorithm and all test datasets can be downloaded at http://it.hcmute.edu.vn/bioinfo/metapro/SeMetaPL.html.
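Taxonomic assignment parallelizes naturally because reads are independent of one another. The toy sketch below assigns each read to a taxon by majority vote over exact k-mer lookups in a tiny reference table, fanning the reads out with multiprocessing; the reference, k, and voting rule are hypothetical stand-ins, not SeMetaPL's actual method.

```python
# A toy sketch of parallel taxonomic assignment, assuming a hypothetical
# k-mer -> taxon reference table. Illustrates the parallel structure only.
from collections import Counter
from multiprocessing import Pool

K = 4
REF = {  # hypothetical k-mer -> taxon index, built from reference genomes
    "ACGT": 0, "CGTA": 0, "GTAC": 1, "TACG": 1,
}

def assign(read):
    """Vote each read to the taxon whose k-mers it matches most often."""
    votes = Counter(REF[read[i:i + K]] for i in range(len(read) - K + 1)
                    if read[i:i + K] in REF)
    return votes.most_common(1)[0][0] if votes else None  # None: unassigned

if __name__ == "__main__":
    reads = ["ACGTACGT", "GTACGTAC", "TTTTTTTT"]
    with Pool() as pool:
        print(pool.map(assign, reads))   # e.g. [0, 1, None]
```

Because each call to assign touches only its own read and the shared read-only reference, the work splits cleanly across cores or cluster nodes, which is the property a parallel algorithm like SeMetaPL exploits at scale.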


Author(s):  
Adrian Jackson ◽  
Michèle Weiland

This chapter describes experiences using Cloud infrastructures for scientific computing, both for serial and parallel computing. Amazon’s High Performance Computing (HPC) Cloud computing resources were compared to traditional HPC resources to quantify performance, as well as to assess the complexity and cost of using the Cloud. Furthermore, a shared Cloud infrastructure is compared to standard desktop resources for scientific simulations. Whilst this is only a small-scale evaluation of these Cloud offerings, it does allow some conclusions to be drawn: in particular, the Cloud currently cannot match the parallel performance of dedicated HPC machines for large-scale parallel programs, but it can match the serial performance of standard computing resources for serial and small-scale parallel programs. Also, the shared Cloud infrastructure cannot match dedicated computing resources on low-level benchmarks, although for an actual scientific code, performance is comparable.

