parallel workflow
Recently Published Documents


TOTAL DOCUMENTS

41
(FIVE YEARS 19)

H-INDEX

5
(FIVE YEARS 2)

2022 ◽  
Vol 9 ◽  
Author(s):  
Arnau Folch ◽  
Leonardo Mingari ◽  
Andrew T. Prata

Operational forecasting of volcanic ash and SO2 clouds is challenging because of the large uncertainties that typically exist in the eruption source term and in the mass removal mechanisms acting downwind. Current operational forecast systems rely on single-run deterministic scenarios that account neither for uncertainties in model inputs nor for how these propagate in time during transport. An ensemble-based forecast strategy has been implemented in the FALL3D-8.1 atmospheric dispersal model to configure, execute, and post-process an arbitrary number of ensemble members in a parallel workflow. In addition to intra-member model domain decomposition, a set of inter-member communicators defines a higher level of code parallelism to enable future incorporation of model data assimilation cycles. The ensemble post-process task automatically generates two types of standard products. On the one hand, deterministic forecast products result from some combination of the ensemble members (e.g., ensemble mean or ensemble median), with forecast uncertainty quantified by the ensemble spread. On the other hand, probabilistic products can be built from the percentage of members that satisfy a given threshold condition. The novel aspect of FALL3D-8.1 is the automation of the ensemble-based workflow, including an optional model validation. To this end, categorical forecast diagnostic metrics originally defined for deterministic forecasts are generalised here to probabilistic forecasts, giving a single set of skill scores valid in both deterministic and probabilistic contexts. Ensemble-based deterministic and probabilistic approaches are compared using different types of observation datasets (satellite cloud detection and retrieval, and deposit thickness observations) for the July 2018 Ambae eruption in the Vanuatu archipelago and the April 2015 Calbuco eruption in Chile.
Both ensemble-based approaches outperform single-run simulations in all categorical metrics, but no clear conclusion can be drawn as to which of the two is the better option.
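The two product types described above can be illustrated with a short NumPy sketch. It assumes each ensemble member has already been reduced to a 2-D field (for example, column mass load) and stacked into one array; `ensemble_products` and `categorical_scores` are hypothetical names, not FALL3D routines. The scores shown are the standard contingency-table metrics (POD, FAR, CSI) for yes/no forecasts; the paper's specific metric set and its probabilistic generalisation are not reproduced here.

```python
import numpy as np

def ensemble_products(members, threshold):
    """members: array of shape (n_members, ny, nx), one field per member (assumed layout)."""
    members = np.asarray(members, dtype=float)
    mean = members.mean(axis=0)                    # deterministic product: ensemble mean
    median = np.median(members, axis=0)            # deterministic product: ensemble median
    spread = members.std(axis=0, ddof=1)           # uncertainty: sample std across members
    # Probabilistic product: percentage of members exceeding the threshold in each cell
    exceed_pct = 100.0 * (members > threshold).mean(axis=0)
    return mean, median, spread, exceed_pct

def categorical_scores(forecast_yes, observed_yes):
    """Standard yes/no contingency metrics; returns nan if a denominator is zero."""
    f = np.asarray(forecast_yes, dtype=bool)
    o = np.asarray(observed_yes, dtype=bool)
    hits = np.sum(f & o)
    misses = np.sum(~f & o)
    false_alarms = np.sum(f & ~o)
    pod = hits / (hits + misses)                   # probability of detection
    far = false_alarms / (hits + false_alarms)     # false alarm ratio
    csi = hits / (hits + misses + false_alarms)    # critical success index
    return pod, far, csi
```

To score a probabilistic product with the same function, one could (as an assumption) treat cells where `exceed_pct >= 50` as "yes" and compare them with a satellite detection mask.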


2021 ◽  
Vol 12 ◽  
Author(s):  
Christian Brandt ◽  
Sebastian Krautwurst ◽  
Riccardo Spott ◽  
Mara Lohde ◽  
Mateusz Jundzill ◽  
...  

In response to the SARS-CoV-2 pandemic, sequencing efforts have increased greatly worldwide to track and trace ongoing viral evolution. Technologies such as nanopore sequencing via the ARTIC protocol are used to reliably generate genomes from raw sequencing data, a crucial basis for molecular surveillance. However, for many labs that perform SARS-CoV-2 sequencing, bioinformatics is still a major bottleneck, especially if hundreds of samples need to be processed on a recurring basis. Pipelines developed for short-read data cannot be applied to nanopore data. Therefore, specific long-read tools and parameter settings need to be orchestrated to enable accurate genotyping and robust reference-based reconstruction of SARS-CoV-2 genomes from nanopore data. Here we present poreCov, a highly parallel workflow written in Nextflow that uses containers to wrap all the tools needed by a routine SARS-CoV-2 sequencing lab into one program. Its ease of installation, combined with concise summary reports that clearly highlight all relevant information, enables rapid and reliable analysis of hundreds of SARS-CoV-2 raw sequence data sets or genomes. poreCov is freely available on GitHub under the GNUv3 license: github.com/replikation/poreCov.
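The per-sample parallelism that a workflow like this relies on can be sketched in plain Python. This is not poreCov code: poreCov orchestrates containerized tools in Nextflow, which runs independent per-sample processes in parallel. Here a thread pool stands in for that scheduling, and each sample is reduced to a hypothetical simplified pileup (one dict of base counts per reference position). The depth threshold `min_depth=20` used to mask low-coverage positions as `N` is an assumption chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def masked_consensus(pileup, min_depth=20):
    """pileup: list of dicts (base -> read count), one per reference position (hypothetical format).
    Positions with fewer than min_depth reads are masked with 'N'."""
    bases = []
    for counts in pileup:
        depth = sum(counts.values())
        if depth < min_depth:
            bases.append("N")
        else:
            bases.append(max(counts, key=counts.get))  # majority base
    return "".join(bases)

def summarize(sample):
    """Build one sample's consensus and report the percentage of masked positions."""
    name, pileup = sample
    consensus = masked_consensus(pileup)
    pct_n = 100.0 * consensus.count("N") / len(consensus)
    return name, consensus, pct_n

def process_samples(samples, workers=4):
    """Process independent samples concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(summarize, samples))
```

A summary report would then list each sample name with its masked fraction, making low-quality genomes easy to spot across hundreds of runs.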


2021 ◽  
Vol 7 ◽  
pp. e527
Author(s):  
Renan Souza ◽  
Vitor Silva ◽  
Alexandre A. B. Lima ◽  
Daniel de Oliveira ◽  
Patrick Valduriez ◽  
...  

Complex scientific experiments from various domains are typically modeled as workflows and executed on large-scale machines using a Parallel Workflow Management System (WMS). Since such executions usually last for hours or days, some WMSs provide user steering support, i.e., they allow users to run data analyses and, depending on the results, adapt the workflows at runtime. A challenge in designing parallel execution control is to manage workflow data for efficient execution while also supporting user steering. Data access for high scalability is typically transaction-oriented, whereas data analysis is oriented toward online analytical processing; managing such hybrid workloads makes the challenge even harder. In this work, we present SchalaDB, an architecture with a set of design principles and techniques based on distributed in-memory data management for efficient workflow execution control and user steering. We propose a distributed data design for scalable workflow task scheduling and high availability, driven by a parallel and distributed in-memory DBMS. To evaluate our proposal, we develop d-Chiron, a WMS designed according to SchalaDB’s principles. We carry out an extensive experimental evaluation on an HPC cluster with up to 960 computing cores. Among other analyses, we show that even while data analyses for user steering are running, SchalaDB’s overhead is negligible for workloads composed of hundreds of concurrent tasks on shared data. Our results encourage workflow engine developers to follow a parallel and distributed data-oriented approach not only for scheduling and monitoring but also for user steering.
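The hybrid workload described above (transactional task scheduling mixed with analytical steering queries on the same data) can be illustrated on a single node. This sketch uses Python's built-in `sqlite3` with an in-memory database as a stand-in; it is not SchalaDB's distributed in-memory DBMS, and the `task` schema and function names are hypothetical, not d-Chiron's.

```python
import sqlite3

def make_db():
    # Autocommit mode; claim_task issues its own BEGIN/COMMIT
    db = sqlite3.connect(":memory:", isolation_level=None)
    db.execute("""CREATE TABLE task (
        id INTEGER PRIMARY KEY,
        activity TEXT NOT NULL,
        param REAL NOT NULL,
        status TEXT NOT NULL DEFAULT 'READY',
        worker TEXT,
        result REAL)""")
    return db

def claim_task(db, worker):
    """Transactional access: atomically move the oldest READY task to RUNNING."""
    db.execute("BEGIN IMMEDIATE")
    try:
        row = db.execute(
            "SELECT id, param FROM task WHERE status = 'READY' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            db.execute("UPDATE task SET status = 'RUNNING', worker = ? WHERE id = ?",
                       (worker, row[0]))
        db.execute("COMMIT")
    except Exception:
        db.execute("ROLLBACK")
        raise
    return row

def finish_task(db, task_id, result):
    db.execute("UPDATE task SET status = 'FINISHED', result = ? WHERE id = ?",
               (result, task_id))

def steering_query(db):
    """Analytical query a user might run while the workflow is executing."""
    return db.execute("""SELECT activity, status, COUNT(*), AVG(result)
                         FROM task GROUP BY activity, status
                         ORDER BY activity, status""").fetchall()

def adapt(db, activity, max_param):
    """Example steering action: cancel queued tasks whose parameter exceeds a bound."""
    cur = db.execute("""UPDATE task SET status = 'CANCELLED'
                        WHERE activity = ? AND status = 'READY' AND param > ?""",
                     (activity, max_param))
    return cur.rowcount
```

Because scheduling and steering touch the same table, steering decisions take effect on the very next `claim_task` call; the cost is that analytical queries compete with scheduling transactions, which is the contention SchalaDB's distributed design aims to keep small.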


Author(s):  
Darawan Rinchai ◽  
Jessica Roelands ◽  
Mohammed Toufiq ◽  
Wouter Hendrickx ◽  
Matthew C Altman ◽  
...  

Motivation: We previously described the construction and characterization of generic and reusable blood transcriptional module repertoires. More recently we released a third iteration (the “BloodGen3” module repertoire), which comprises 382 functionally annotated gene sets (modules) and encompasses 14,168 transcripts. Custom bioinformatic tools are needed to support downstream analysis, visualization and interpretation relying on such fixed module repertoires.
Results: We have developed, and describe here, an R package, BloodGen3Module. Its functions allow group comparison analyses to be performed at the module level and the results to be displayed as annotated fingerprint grid plots. A parallel workflow for computing module repertoire changes for individual samples rather than groups of samples is also available; these results are displayed as fingerprint heatmaps. An illustrative case is used to demonstrate the steps involved in generating blood transcriptome repertoire fingerprints of septic patients. Taken together, this resource could facilitate the analysis and interpretation of changes in blood transcript abundance observed across a wide range of pathological and physiological states.
Availability: The BloodGen3Module package and documentation are freely available from GitHub: https://github.com/Drinchai/BloodGen3Module
Supplementary information: Supplementary data are available at Bioinformatics online.
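The individual-sample workflow mentioned above can be sketched in NumPy. BloodGen3Module itself is an R package, and its exact per-sample criterion is not given here; this sketch assumes that a gene counts as increased (or decreased) in a sample when its value lies more than `k` control standard deviations above (or below) the control mean, with `k = 2.0` as an illustrative default. The function name and data layout are hypothetical.

```python
import numpy as np

def individual_fingerprints(expr, genes, modules, control_idx, k=2.0):
    """expr: (n_genes, n_samples) expression matrix; genes: row names;
    modules: dict module name -> list of gene names; control_idx: control columns.
    Returns (module names, array of %increased - %decreased per module and sample)."""
    expr = np.asarray(expr, dtype=float)
    ctrl = expr[:, control_idx]
    mu = ctrl.mean(axis=1, keepdims=True)
    sd = ctrl.std(axis=1, ddof=1, keepdims=True)
    up = expr > mu + k * sd                 # assumed per-sample "increased" rule
    down = expr < mu - k * sd               # assumed per-sample "decreased" rule
    row = {g: i for i, g in enumerate(genes)}
    names = sorted(modules)
    out = np.zeros((len(names), expr.shape[1]))
    for m, name in enumerate(names):
        idx = [row[g] for g in modules[name] if g in row]   # skip genes absent from the data
        if idx:
            out[m] = 100.0 * (up[idx].mean(axis=0) - down[idx].mean(axis=0))
    return names, out
```

Each column of the returned array is one sample's fingerprint, so the matrix can be passed directly to a heatmap plotting routine.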


2021 ◽  
Author(s):  
Guanghong Zuo

CVTree is an alignment-free algorithm to infer phylogenetic relationships from genome sequences. It has been successfully applied to study the phylogeny and taxonomy of viruses, prokaryotes, and fungi based on whole genomes, as well as of chloroplasts, mitochondria, and metagenomes. Here we present standalone software for the CVTree algorithm, in which a parallel workflow for the algorithm was designed. Based on this workflow, new alignment-free methods were also implemented. By examining the phylogeny and taxonomy of 13,903 prokaryotes based on 16S rRNA sequences, we show that the CVTree software is an efficient and effective tool for studying phylogeny and taxonomy based on genome sequences.
Availability: https://github.com/ghzuo/cvtree
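The core of a composition-vector approach can be sketched in a few lines. This follows the classic CVTree formulation as commonly described: each K-mer count is compared with the count predicted by a Markov model built from (K-1)- and (K-2)-mer counts, and the distance is derived from the cosine of the resulting vectors. The DNA alphabet, function names, and toy sequences are assumptions for illustration; the standalone software's parallel workflow and its newer methods are not reproduced.

```python
import math
from collections import Counter
from itertools import product

def kmer_counts(seq, k):
    """Count overlapping substrings of length k (k = 0 gives len(seq) + 1 empty strings)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def composition_vector(seq, k, alphabet="ACGT"):
    """Relative deviation of each K-mer count from its Markov-model prediction."""
    if k < 2 or len(seq) < k:
        raise ValueError("need k >= 2 and a sequence at least k long")
    L = len(seq)
    fk, fk1, fk2 = kmer_counts(seq, k), kmer_counts(seq, k - 1), kmer_counts(seq, k - 2)
    norm = (L - k + 1) * (L - k + 3) / (L - k + 2) ** 2   # converts counts to frequencies
    vec = {}
    for letters in product(alphabet, repeat=k):
        w = "".join(letters)
        middle = fk2[w[1:-1]]
        if middle == 0:
            continue
        f0 = fk1[w[:-1]] * fk1[w[1:]] / middle * norm      # predicted count of w
        if f0 > 0:
            vec[w] = (fk[w] - f0) / f0
    return vec

def cv_distance(a, b):
    """D = (1 - cosine correlation) / 2, which lies in [0, 1]."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    if na == 0 or nb == 0:
        raise ValueError("zero composition vector")
    return (1.0 - dot / (na * nb)) / 2.0
```

Each sequence's vector can be computed independently, which is what makes the all-against-all distance matrix easy to parallelize; a tree would then be built from that matrix with a method such as neighbor joining.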


2020 ◽  
Author(s):  
Darawan Rinchai ◽  
Jessica Roelands ◽  
Wouter Hendrickx ◽  
Matthew C. Altman ◽  
Davide Bedognetti ◽  
...  

Transcriptional modules have been widely used for the analysis, visualization and interpretation of transcriptome data. We have previously described the construction and characterization of generic and reusable blood transcriptional module repertoires. The third and latest version, which we recently made available, comprises 382 functionally annotated gene sets (modules) and encompasses 14,168 transcripts. We developed R scripts for performing module repertoire analyses and custom fingerprint visualization, which are made available here along with detailed descriptions. An illustrative public transcriptome dataset and corresponding intermediate output files are also included as supplementary material. Briefly, module repertoire analysis and visualization involve three steps: first, annotating the gene expression data matrix with module membership information; second, running statistical tests to determine, for each module, the proportion of its constituent genes that are differentially expressed; and third, expressing the results “at the module level” as the percentage of genes increased or decreased and plotting them in a fingerprint grid format. A parallel workflow has been developed for computing module repertoire changes for individual samples rather than groups of samples; these results are plotted in a heatmap format. The use case presented illustrates the steps involved in generating blood transcriptome repertoire fingerprints of septic patients at both the group and individual levels.
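The three group-level steps above can be sketched in NumPy. The authors' scripts are in R; this Python version is only an illustration under stated assumptions: a row-wise Welch t statistic with a two-sided p-value from a normal approximation (no multiple-testing correction), and `alpha = 0.05` as the significance cut-off. Function names and the data layout are hypothetical.

```python
import math
import numpy as np

def welch_normal_p(a, b):
    """Row-wise Welch t statistics and two-sided p-values (normal approximation)."""
    ma, mb = a.mean(axis=1), b.mean(axis=1)
    va, vb = a.var(axis=1, ddof=1), b.var(axis=1, ddof=1)
    t = (ma - mb) / np.sqrt(va / a.shape[1] + vb / b.shape[1])
    p = np.array([math.erfc(abs(x) / math.sqrt(2.0)) for x in t])
    return t, p

def group_fingerprint(expr, genes, modules, case_idx, ctrl_idx, alpha=0.05):
    """Returns {module: (% genes increased, % genes decreased)} for cases vs controls."""
    expr = np.asarray(expr, dtype=float)
    # Step 2: per-gene statistical test, cases vs controls
    t, p = welch_normal_p(expr[:, case_idx], expr[:, ctrl_idx])
    up = (p < alpha) & (t > 0)
    down = (p < alpha) & (t < 0)
    row = {g: i for i, g in enumerate(genes)}
    result = {}
    for name, members in modules.items():
        # Step 1: module membership annotation (genes missing from the data are skipped)
        idx = [row[g] for g in members if g in row]
        if idx:
            # Step 3: express results at the module level as percentages
            result[name] = (100.0 * up[idx].mean(), 100.0 * down[idx].mean())
    return result
```

A single value per module for the fingerprint grid could then be taken as, for example, the percentage increased minus the percentage decreased; that summary is also an assumption here.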


2020 ◽  
Author(s):  
Bing Gong ◽  
Severin Hußmann ◽  
Amirpasha Mozaffari ◽  
Jan Vogelsang ◽  
Martin Schultz

This study explores the adaptation of state-of-the-art deep learning architectures for video frame prediction to weather and climate applications. A proof-of-concept case study was performed to predict surface temperature fields over Europe for up to 20 hours ahead based on ERA5 reanalysis data. Initial results were obtained with PredNet and a GAN-based architecture, using various combinations of temperature, surface pressure, and 500 hPa geopotential as inputs. The results show that the GAN-based architecture outperforms PredNet. To facilitate massive data processing and the testing of various deep learning architectures, we have developed a containerized parallel workflow covering the full life cycle of the application: data extraction, data pre-processing, training, post-processing, and visualisation of results. PredNet training was parallelized on the JUWELS supercomputer at the Jülich Supercomputing Centre (JSC), and its training scalability was also evaluated.
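The data pre-processing stage of such a workflow typically turns gridded time series into input/target frame sequences. The NumPy sketch below assumes a hypothetical layout of `(time, lat, lon, channels)` with the input variables stacked as channels, hourly steps, 10 input frames, and 20 target frames; these choices are illustrative and not taken from the study's pipeline. Normalization statistics come only from an initial training period, so no information leaks from later data.

```python
import numpy as np

def normalize_channels(fields, train_steps):
    """Per-channel z-score using statistics from the first train_steps time steps only."""
    fields = np.asarray(fields, dtype=np.float64)
    train = fields[:train_steps]
    mean = train.mean(axis=(0, 1, 2))      # one value per channel
    std = train.std(axis=(0, 1, 2))
    return (fields - mean) / std, mean, std

def make_sequences(fields, n_input, n_target):
    """Slide a window over time: inputs (n, n_input, ...) and targets (n, n_target, ...)."""
    fields = np.asarray(fields)
    total = n_input + n_target
    n = fields.shape[0] - total + 1
    if n <= 0:
        raise ValueError("time series shorter than one input + target window")
    windows = np.stack([fields[i:i + total] for i in range(n)])
    return windows[:, :n_input], windows[:, n_input:]
```

For example, 40 hourly fields with `n_input=10` and `n_target=20` yield 11 overlapping training samples; each window is independent, so this step parallelizes naturally across time ranges or files.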

