Experimenting with reproducibility in bioinformatics

2017 ◽  
Author(s):  
Yang-Min Kim ◽  
Jean-Baptiste Poline ◽  
Guillaume Dumas

Abstract
Reproducibility has been shown to be limited in many scientific fields. Although it is a fundamental tenet of scientific activity, the related issues of reusability of scientific data are poorly documented. Here, we present a case study of our attempt to reproduce a promising bioinformatics method [1] and illustrate the challenges of using a published method for which code and data were available. First, we tried to re-run the analysis with the code and data provided by the authors. Second, we reimplemented the method in Python to avoid dependency on a MATLAB licence and to ease execution of the code on a high-performance computing cluster (HPCC). Third, we assessed the reusability of our reimplementation and the quality of our documentation. We then experimented with our own software and tested how easy it would be to start from our implementation to reproduce the results, hence attempting to estimate the robustness of the reproducibility. Finally, drawing on this case study and other observations, we propose solutions to improve reproducibility and research efficiency at both the individual and collective level.
Availability
The latest version of StratiPy (Python), with two examples of reproducibility, is available on GitHub [2].
Contact
[email protected]
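A central step in this kind of study is checking that a reimplementation agrees with the reference output. A minimal sketch of such a check, assuming results can be compared as numeric arrays (the function and file-free synthetic data here are illustrative, not StratiPy's actual test suite):

```python
import numpy as np

def results_match(reference, reimplementation, rtol=1e-5, atol=1e-8):
    """Return True if two result matrices agree within tolerance.

    Bitwise equality is too strict when comparing MATLAB output with
    a Python port (different BLAS backends and summation orders), so
    a relative/absolute tolerance is used instead.
    """
    reference = np.asarray(reference, dtype=float)
    reimplementation = np.asarray(reimplementation, dtype=float)
    if reference.shape != reimplementation.shape:
        return False
    return bool(np.allclose(reference, reimplementation, rtol=rtol, atol=atol))

# Synthetic stand-ins for a reference result and a ported result:
ref = np.linspace(0.0, 1.0, 100).reshape(10, 10)
port = ref + 1e-9  # tiny numerical drift from a different backend
print(results_match(ref, port))       # True
print(results_match(ref, ref + 0.1))  # False
```

Choosing the tolerances is itself a reproducibility decision: too tight and legitimate floating-point drift fails the check, too loose and real bugs pass.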

2001 ◽  
Vol 11 (02n03) ◽  
pp. 187-202 ◽  
Author(s):  
DORIAN C. ARNOLD ◽  
SATHISH S. VADHIYAR ◽  
JACK J. DONGARRA

Great advances in high-performance computing have given rise to scientific applications that place large demands on software and hardware infrastructures for both computational and data services. These trends have made it necessary for distributed-systems developers, who once treated these elements separately, to acknowledge that computational and data services are tightly coupled and must be addressed simultaneously. In this article, we compile and discuss several strategies and techniques, such as co-scheduling and co-allocation of computational and data services, dynamic storage capabilities, and quality of service, that can help resolve some of the aforementioned issues. We present our interactions with a distributed computing system, NetSolve, and a distributed storage infrastructure, IBP, as a case study of how some of these techniques can be effectively deployed, and we offer experimental evidence from early prototypes that validates our motivation and direction.


2021 ◽  
Vol 32 (8) ◽  
pp. 2035-2048
Author(s):  
Mochamad Asri ◽  
Dhairya Malhotra ◽  
Jiajun Wang ◽  
George Biros ◽  
Lizy K. John ◽  
...  

2016 ◽  
Vol 33 (4) ◽  
pp. 621-634 ◽  
Author(s):  
Jingyin Tang ◽  
Corene J. Matyas

Abstract
The creation of a 3D mosaic is often the first step when using the high-spatial- and temporal-resolution data produced by ground-based radars. Efficient yet accurate methods are needed to mosaic data from dozens of radars to better understand the precipitation processes in synoptic-scale systems such as tropical cyclones. Research-grade radar mosaic methods for analyzing historical weather events should utilize data from both sides of a moving temporal window and process them in a flexible data architecture that is not available in most stand-alone software tools or real-time systems. Thus, these historical analyses require a different strategy for optimizing flexibility and scalability by removing time constraints from the design. This paper presents a MapReduce-based playback framework using Apache Spark's computational engine to interpolate large volumes of radar reflectivity and velocity data onto 3D grids. Designed to run on a high-performance computing cluster, these methods may also be executed on a low-end machine. A protocol is designed to enable interoperability with GIS and spatial analysis functions in this framework. Open-source software is utilized to enhance radar usability in the nonspecialist community. Case studies during a tropical cyclone landfall show this framework's capability of efficiently creating a large-scale high-resolution 3D radar mosaic with the integration of GIS functions for spatial analysis.
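The MapReduce pattern behind such a mosaicking pipeline can be sketched in plain Python (a schematic stand-in for Spark's map/reduceByKey; the pre-gridded records and cell indices are hypothetical simplifications of the actual polar-to-Cartesian interpolation):

```python
from collections import defaultdict

def map_sweep(sweep):
    """Map one radar sweep to (grid_cell, reflectivity) pairs.

    A real implementation would interpolate polar radar bins onto
    3D Cartesian grid cells; here each record already carries its
    target cell index for brevity.
    """
    for cell, dbz in sweep:
        yield cell, dbz

def reduce_by_cell(pairs):
    """Average overlapping observations per grid cell, as when
    mosaicking data from several radars covering the same volume."""
    sums, counts = defaultdict(float), defaultdict(int)
    for cell, dbz in pairs:
        sums[cell] += dbz
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}

sweeps = [
    [((10, 4, 2), 35.0), ((10, 5, 2), 40.0)],  # radar A
    [((10, 4, 2), 37.0), ((11, 4, 2), 20.0)],  # radar B, one overlapping cell
]
pairs = (pair for sweep in sweeps for pair in map_sweep(sweep))
mosaic = reduce_by_cell(pairs)
print(mosaic[(10, 4, 2)])  # 36.0: mean of the two overlapping radars
```

In Spark the same shape appears as `rdd.flatMap(map_sweep)` followed by an aggregation by key, which is what lets the playback framework scale from a laptop to a cluster without changing the logic.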


F1000Research ◽  
2017 ◽  
Vol 6 ◽  
pp. 2060
Author(s):  
Aleksandr Agafonov ◽  
Kimmo Mattila ◽  
Cuong Duong Tuan ◽  
Lars Tiede ◽  
Inge Alexander Raknes ◽  
...  

META-pipe is a complete service for the analysis of marine metagenomic data. It provides assembly of high-throughput sequence data, functional annotation of predicted genes, and taxonomic profiling. The functional annotation is computationally demanding and is therefore currently run on a high-performance computing cluster in Norway. However, additional compute resources are necessary to open the service to all ELIXIR users. We describe our approach for setting up and executing the functional analysis of META-pipe on additional academic and commercial clouds. Our goal is to provide a powerful analysis service that is easy to use and to maintain. Our design therefore uses a distributed architecture in which we combine central servers with multiple distributed backends that execute the computationally intensive jobs. We believe our experiences developing and operating META-pipe provide a useful model for others who plan to offer a portal-based data analysis service in ELIXIR, and for other organizations with geographically distributed compute and storage resources.
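The central-server-plus-distributed-backends design described above can be sketched as a work queue that backends pull from (all names and the threading stand-in for real remote backends are hypothetical, not META-pipe's actual code):

```python
import queue
import threading

class JobServer:
    """Minimal sketch of a central server that queues compute-heavy
    jobs for whichever backend (HPC cluster, academic or commercial
    cloud) pulls work next."""

    def __init__(self):
        self.jobs = queue.Queue()
        self.results = {}

    def submit(self, job_id, payload):
        self.jobs.put((job_id, payload))

    def backend_worker(self, name):
        # Each backend pulls jobs and posts results back, so the
        # central server never needs to know backend internals.
        while True:
            try:
                job_id, payload = self.jobs.get_nowait()
            except queue.Empty:
                return
            self.results[job_id] = (name, f"annotated:{payload}")

server = JobServer()
for i, contig in enumerate(["seq-a", "seq-b", "seq-c"]):
    server.submit(i, contig)

workers = [threading.Thread(target=server.backend_worker, args=(n,))
           for n in ("hpc-norway", "cloud-1")]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(sorted(server.results))  # all three jobs completed
```

The pull model is what makes adding a new cloud backend cheap: the server exposes one queue, and any backend that can authenticate and pull jobs can join.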


2014 ◽  
Vol 17 (2) ◽  
Author(s):  
Germán Bianchini ◽  
Paola Caymes Scutari

Forest fires are a major risk factor with strong eco-environmental and socio-economic impact, which is why their study and modeling are very important. However, the models frequently have a certain level of uncertainty in some input parameters, which must be approximated or estimated because of the difficulty of accurately measuring the conditions of the phenomenon in real time. This has resulted in the development of several methods for uncertainty reduction, whose trade-off between accuracy and complexity can vary significantly. The Evolutionary-Statistical System (ESS) is a method that aims to reduce this uncertainty by combining statistical analysis, high-performance computing (HPC), and Parallel Evolutionary Algorithms (PEAs). The PEAs use several parameters that require adjustment and that determine the quality of their results. Calibrating these parameters is crucial for reaching good performance and improving the system's output. This paper presents an empirical study of parameter tuning to evaluate the effectiveness of different configurations and the impact of their use in forest fire prediction.
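Empirical parameter tuning of this kind can be sketched as a grid search that averages several stochastic runs per configuration (the toy error surface below is purely illustrative; a real evaluation would run the fire-spread prediction itself):

```python
import random

def prediction_error(params, rng):
    """Hypothetical stand-in for one ESS prediction run; lower is
    better. A toy surface with its optimum near mutation=0.1,
    popsize=80, plus small noise mimicking stochastic PEA runs."""
    mutation, popsize = params
    return ((mutation - 0.1) ** 2
            + ((popsize - 80) / 100) ** 2
            + rng.gauss(0, 1e-4))

def tune(grid, runs=5, seed=42):
    """Average `runs` stochastic evaluations per configuration and
    return the best-performing parameter setting."""
    rng = random.Random(seed)
    scored = []
    for params in grid:
        avg = sum(prediction_error(params, rng) for _ in range(runs)) / runs
        scored.append((avg, params))
    return min(scored)[1]

# Candidate PEA configurations: (mutation rate, population size).
grid = [(m, p) for m in (0.05, 0.1, 0.2) for p in (40, 80, 160)]
print(tune(grid))
```

Averaging multiple runs per configuration matters because a single PEA run is stochastic: one lucky run can make a poor configuration look good.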


Author(s):  
T Van Zwijnsvoorde ◽  
M Vantorre

Container traffic and individual ship sizes have increased dramatically over the last decades, testing the existing harbour infrastructure to its limits. An important aspect of the safety of a berthed vessel is the quality of its mooring configuration. A case study is presented in which an 18,000 TEU container vessel is moored at a quay. The motions of the moored vessel and the forces in its lines due to ship passages are simulated using the potential-flow software ROPES and the UGent in-house package Vlugmoor. The focus is on the mooring plan (an operational parameter) and the characteristics of the individual lines (a design parameter).

