Experimenting with reproducibility in bioinformatics

2017 ◽  
Author(s):  
Yang-Min Kim ◽  
Jean-Baptiste Poline ◽  
Guillaume Dumas

Abstract
Reproducibility has been shown to be limited in many scientific fields. Although it is a fundamental tenet of scientific activity, the related issues of reusability of scientific data are poorly documented. Here, we present a case study of our attempt to reproduce a promising bioinformatics method [1] and illustrate the challenges of using a published method for which code and data were available. First, we tried to re-run the analysis with the code and data provided by the authors. Second, we reimplemented the method in Python to avoid dependency on a MATLAB licence and to ease execution of the code on a high-performance computing cluster (HPCC). Third, we assessed the reusability of our reimplementation and the quality of our documentation. We then experimented with our own software and tested how easy it would be to start from our implementation to reproduce the results, hence attempting to estimate the robustness of the reproducibility. Finally, drawing on this case study and other observations, we propose solutions to improve reproducibility and research efficiency at both the individual and collective level.
Availability
The latest version of StratiPy (Python), with two examples of reproducibility, is available on GitHub [2].
Contact
[email protected]
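A central step in this kind of study is checking that a reimplementation agrees with the reference output. A minimal sketch of such a check, assuming results can be compared as numeric arrays (the function and file-free synthetic data here are illustrative, not StratiPy's actual test suite):

```python
import numpy as np

def results_match(reference, reimplementation, rtol=1e-5, atol=1e-8):
    """Return True if two result matrices agree within tolerance.

    Bitwise equality is too strict when comparing MATLAB output with
    a Python port (different BLAS backends and summation orders), so
    a relative/absolute tolerance is used instead.
    """
    reference = np.asarray(reference, dtype=float)
    reimplementation = np.asarray(reimplementation, dtype=float)
    if reference.shape != reimplementation.shape:
        return False
    return bool(np.allclose(reference, reimplementation, rtol=rtol, atol=atol))

# Synthetic stand-ins for a reference result and a ported result:
ref = np.linspace(0.0, 1.0, 100).reshape(10, 10)
port = ref + 1e-9  # tiny numerical drift from a different backend
print(results_match(ref, port))       # True
print(results_match(ref, ref + 0.1))  # False
```

Choosing the tolerances is itself a reproducibility decision: too tight and legitimate floating-point drift fails the check, too loose and real bugs pass.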

2001 ◽  
Vol 11 (02n03) ◽  
pp. 187-202 ◽  
Author(s):  
DORIAN C. ARNOLD ◽  
SATHISH S. VADHIYAR ◽  
JACK J. DONGARRA

Great advances in high-performance computing have given rise to scientific applications that place large demands on software and hardware infrastructures for both computational and data services. These trends have made it necessary for distributed-systems developers, who once treated these elements separately, to acknowledge that computational and data services are tightly coupled and must be addressed simultaneously. In this article, we compile and discuss several strategies and techniques, such as co-scheduling and co-allocation of computational and data services, dynamic storage capabilities, and quality of service, that can help resolve some of the aforementioned issues. We present our interactions with a distributed computing system, NetSolve, and a distributed storage infrastructure, IBP, as a case study of how some of these techniques can be effectively deployed, and we offer experimental evidence from early prototypes that validates our motivation and direction.


2021 ◽  
Vol 32 (8) ◽  
pp. 2035-2048
Author(s):  
Mochamad Asri ◽  
Dhairya Malhotra ◽  
Jiajun Wang ◽  
George Biros ◽  
Lizy K. John ◽  
...  

2016 ◽  
Vol 33 (4) ◽  
pp. 621-634 ◽  
Author(s):  
Jingyin Tang ◽  
Corene J. Matyas

Abstract
The creation of a 3D mosaic is often the first step when using the high-spatial- and temporal-resolution data produced by ground-based radars. Efficient yet accurate methods are needed to mosaic data from dozens of radars to better understand the precipitation processes in synoptic-scale systems such as tropical cyclones. Research-grade radar mosaic methods for analyzing historical weather events should utilize data from both sides of a moving temporal window and process them in a flexible data architecture that is not available in most stand-alone software tools or real-time systems. Thus, these historical analyses require a different strategy for optimizing flexibility and scalability by removing time constraints from the design. This paper presents a MapReduce-based playback framework using Apache Spark's computational engine to interpolate large volumes of radar reflectivity and velocity data onto 3D grids. Designed to run on a high-performance computing cluster, these methods may also be executed on a low-end machine. A protocol is designed to enable interoperability with GIS and spatial analysis functions in this framework. Open-source software is utilized to enhance radar usability in the nonspecialist community. Case studies during a tropical cyclone landfall show this framework's capability of efficiently creating a large-scale high-resolution 3D radar mosaic with the integration of GIS functions for spatial analysis.
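The MapReduce pattern behind such a mosaicking pipeline can be sketched in plain Python (a schematic stand-in for Spark's map/reduceByKey; the pre-gridded records and cell indices are hypothetical simplifications of the actual polar-to-Cartesian interpolation):

```python
from collections import defaultdict

def map_sweep(sweep):
    """Map one radar sweep to (grid_cell, reflectivity) pairs.

    A real implementation would interpolate polar radar bins onto
    3D Cartesian grid cells; here each record already carries its
    target cell index for brevity.
    """
    for cell, dbz in sweep:
        yield cell, dbz

def reduce_by_cell(pairs):
    """Average overlapping observations per grid cell, as when
    mosaicking data from several radars covering the same volume."""
    sums, counts = defaultdict(float), defaultdict(int)
    for cell, dbz in pairs:
        sums[cell] += dbz
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}

sweeps = [
    [((10, 4, 2), 35.0), ((10, 5, 2), 40.0)],  # radar A
    [((10, 4, 2), 37.0), ((11, 4, 2), 20.0)],  # radar B, one overlapping cell
]
pairs = (pair for sweep in sweeps for pair in map_sweep(sweep))
mosaic = reduce_by_cell(pairs)
print(mosaic[(10, 4, 2)])  # 36.0: mean of the two overlapping radars
```

In Spark the same shape appears as `rdd.flatMap(map_sweep)` followed by an aggregation by key, which is what lets the playback framework scale from a laptop to a cluster without changing the logic.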


F1000Research ◽  
2017 ◽  
Vol 6 ◽  
pp. 2060
Author(s):  
Aleksandr Agafonov ◽  
Kimmo Mattila ◽  
Cuong Duong Tuan ◽  
Lars Tiede ◽  
Inge Alexander Raknes ◽  
...  

META-pipe is a complete service for the analysis of marine metagenomic data. It provides assembly of high-throughput sequence data, functional annotation of predicted genes, and taxonomic profiling. The functional annotation is computationally demanding and is therefore currently run on a high-performance computing cluster in Norway. However, additional compute resources are necessary to open the service to all ELIXIR users. We describe our approach for setting up and executing the functional analysis of META-pipe on additional academic and commercial clouds. Our goal is to provide a powerful analysis service that is easy to use and to maintain. Our design therefore uses a distributed architecture in which we combine central servers with multiple distributed backends that execute the computationally intensive jobs. We believe our experiences developing and operating META-pipe provide a useful model for others who plan to offer a portal-based data analysis service in ELIXIR, and for other organizations with geographically distributed compute and storage resources.
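The central-server-plus-distributed-backends design described above can be sketched as a work queue that backends pull from (all names and the threading stand-in for real remote backends are hypothetical, not META-pipe's actual code):

```python
import queue
import threading

class JobServer:
    """Minimal sketch of a central server that queues compute-heavy
    jobs for whichever backend (HPC cluster, academic or commercial
    cloud) pulls work next."""

    def __init__(self):
        self.jobs = queue.Queue()
        self.results = {}

    def submit(self, job_id, payload):
        self.jobs.put((job_id, payload))

    def backend_worker(self, name):
        # Each backend pulls jobs and posts results back, so the
        # central server never needs to know backend internals.
        while True:
            try:
                job_id, payload = self.jobs.get_nowait()
            except queue.Empty:
                return
            self.results[job_id] = (name, f"annotated:{payload}")

server = JobServer()
for i, contig in enumerate(["seq-a", "seq-b", "seq-c"]):
    server.submit(i, contig)

workers = [threading.Thread(target=server.backend_worker, args=(n,))
           for n in ("hpc-norway", "cloud-1")]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(sorted(server.results))  # all three jobs completed
```

The pull model is what makes adding a new cloud backend cheap: the server exposes one queue, and any backend that can authenticate and pull jobs can join.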


2014 ◽  
Vol 17 (2) ◽  
Author(s):  
Germán Bianchini ◽  
Paola Caymes Scutari

Forest fires are a major risk factor with strong eco-environmental and socio-economic impact, which is why their study and modeling are very important. However, the models frequently have a certain level of uncertainty in some input parameters, which must be approximated or estimated because of the difficulty of accurately measuring the conditions of the phenomenon in real time. This has resulted in the development of several methods for uncertainty reduction, whose trade-off between accuracy and complexity can vary significantly. The Evolutionary-Statistical System (ESS) is a method that aims to reduce this uncertainty by combining statistical analysis, high-performance computing (HPC), and Parallel Evolutionary Algorithms (PEAs). The PEAs use several parameters that require adjustment and that determine the quality of their results. Calibrating these parameters is crucial for reaching good performance and improving the system's output. This paper presents an empirical study of parameter tuning to evaluate the effectiveness of different configurations and the impact of their use in forest fire prediction.
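Empirical parameter tuning of this kind can be sketched as a grid search that averages several stochastic runs per configuration (the toy error surface below is purely illustrative; a real evaluation would run the fire-spread prediction itself):

```python
import random

def prediction_error(params, rng):
    """Hypothetical stand-in for one ESS prediction run; lower is
    better. A toy surface with its optimum near mutation=0.1,
    popsize=80, plus small noise mimicking stochastic PEA runs."""
    mutation, popsize = params
    return ((mutation - 0.1) ** 2
            + ((popsize - 80) / 100) ** 2
            + rng.gauss(0, 1e-4))

def tune(grid, runs=5, seed=42):
    """Average `runs` stochastic evaluations per configuration and
    return the best-performing parameter setting."""
    rng = random.Random(seed)
    scored = []
    for params in grid:
        avg = sum(prediction_error(params, rng) for _ in range(runs)) / runs
        scored.append((avg, params))
    return min(scored)[1]

# Candidate PEA configurations: (mutation rate, population size).
grid = [(m, p) for m in (0.05, 0.1, 0.2) for p in (40, 80, 160)]
print(tune(grid))
```

Averaging multiple runs per configuration matters because a single PEA run is stochastic: one lucky run can make a poor configuration look good.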


Author(s):  
T Van Zwijnsvoorde ◽  
M Vantorre

Container traffic and individual ship sizes have increased dramatically over the last decades, testing the existing harbour infrastructure to its limits. An important aspect of the safety of a berthed vessel is the quality of its mooring configuration. A case study is presented in which an 18,000 TEU container vessel is moored at a quay. The motions of the moored vessel and the forces in its lines due to ship passages are simulated using the potential-flow software ROPES and the UGent in-house package Vlugmoor. The focus is on the mooring plan (an operational parameter) and the characteristics of the individual lines (a design parameter).

