BioWorkbench: a high-performance framework for managing and analyzing bioinformatics experiments

Scheduling and Resource Provisioning Algorithms for Scientific Workflows on Commercial Clouds

10.26686/wgtn.17071976 ◽

2021 ◽

Author(s):

Vahid Arabnejad

Keyword(s):

High Performance ◽

Large Scale ◽

Cost Model ◽

Universal Access ◽

Scientific Workflow ◽

Resource Provisioning ◽

Scheduling Problem ◽

Computing Paradigm ◽

Computationally Intensive ◽

Workflow Tasks

Basic science is becoming ever more computationally intensive, increasing the need for large-scale compute and storage resources, whether within a High-Performance Computer cluster or, more recently, within the cloud. Commercial clouds have increasingly become a viable platform for hosting scientific analyses and computation due to their elasticity, the recent introduction of specialist hardware, and their pay-as-you-go cost model. This computing paradigm therefore presents a low-capital, low-barrier alternative to operating dedicated eScience infrastructure. Indeed, commercial clouds now enable universal access to capabilities previously available only to large, well-funded research groups. While the potential benefits of cloud computing are clear, there are still significant technical hurdles associated with obtaining the best execution efficiency whilst trading off cost. In most cases, large-scale scientific computation is represented as a workflow for scheduling and runtime provisioning. Such scheduling becomes an even more challenging problem on cloud systems due to the dynamic nature of the cloud, in particular the elasticity, the pricing models (both static and dynamic), the non-homogeneous resource types and the vast array of services. This mapping of workflow tasks onto a set of provisioned instances is an example of the general scheduling problem and is NP-complete. In addition, certain runtime constraints, the most typical being the cost of the computation and the time that the computation requires to complete, must be met. This thesis addresses 'the scientific workflow scheduling problem in cloud', which is to schedule workflow tasks on cloud resources in a way that users meet their defined constraints such as budget and deadline, and providers maximize profits and resource utilization. Moreover, it explores different mechanisms and strategies for distributing defined constraints over a workflow and investigates their impact on the overall cost of the resulting schedule.
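
To make the idea of distributing a constraint over a workflow concrete, the following is a minimal sketch of one possible strategy: splitting an overall deadline across workflow levels in proportion to the slowest task in each level. It is an illustrative assumption, not the algorithm proposed in the thesis; the function name `distribute_deadline`, the task names, and the runtimes are all hypothetical.

```python
from collections import defaultdict

def distribute_deadline(runtimes, parents, deadline):
    """Split a workflow deadline into per-task sub-deadlines, level by level.

    Hypothetical illustration only; not the method from the thesis.
    """
    # Level of a task = 1 + the deepest level among its parents
    # (entry tasks, which have no parents, sit at level 0).
    levels = {}

    def level_of(task):
        if task not in levels:
            levels[task] = 1 + max(
                (level_of(p) for p in parents.get(task, [])), default=-1
            )
        return levels[task]

    for task in runtimes:
        level_of(task)

    # The slowest task in a level determines how long that level needs.
    level_time = defaultdict(float)
    for task, lvl in levels.items():
        level_time[lvl] = max(level_time[lvl], runtimes[task])

    total = sum(level_time.values())
    sub_deadline, elapsed = {}, 0.0
    for lvl in sorted(level_time):
        # Each level receives a share of the deadline proportional to its time.
        elapsed += deadline * level_time[lvl] / total
        for task, task_lvl in levels.items():
            if task_lvl == lvl:
                sub_deadline[task] = elapsed
    return sub_deadline

# Hypothetical diamond-shaped workflow: a -> (b, c) -> d, runtimes in hours.
runtimes = {"a": 2.0, "b": 4.0, "c": 1.0, "d": 2.0}
parents = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}
sub_deadlines = distribute_deadline(runtimes, parents, deadline=16.0)
```

A scheduler could then pick, for each task, the cheapest instance type that still finishes before the task's sub-deadline. A budget could be distributed in the same proportional way.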

An effective drug-disease associations prediction model based on graphic representation learning over multi-biomolecular network

BMC Bioinformatics ◽

10.1186/s12859-021-04553-2 ◽

2022 ◽

Vol 23 (1) ◽

Author(s):

Hanjing Jiang ◽

Yabing Huang

Keyword(s):

High Performance ◽

Large Scale ◽

Representation Learning ◽

Biological Data ◽

Graph Representation ◽

Data Set ◽

Validation Experiment ◽

Biomolecular Network ◽

Disease Associations ◽

Drug Reposition

Background: Drug-disease associations (DDAs) can provide important information for exploring the potential efficacy of drugs. However, few DDAs have so far been verified by experiments. Previous evidence indicates that combining information sources is conducive to the discovery of new DDAs. How to integrate different biological data sources and identify the most effective drugs for a certain disease based on drug-disease coupled mechanisms is still a challenging problem. Results: In this paper, we propose a novel computational model for DDA prediction based on graph representation learning over a multi-biomolecular network (GRLMN). More specifically, we first constructed a large-scale molecular association network (MAN) by integrating the associations among drugs, diseases, proteins, miRNAs, and lncRNAs. Then, a graph embedding model was used to learn vector representations for all drugs and diseases in the MAN. Finally, the combined features were fed to a random forest (RF) model to predict new DDAs. The proposed model was evaluated on the SCMFDD-S data set using five-fold cross-validation. Experimental results showed that the GRLMN model achieved an area under the ROC curve (AUC) of 87.9%, outperforming all previous works in terms of both accuracy and AUC on the benchmark dataset. To further verify the high performance of GRLMN, we carried out two case studies for two common diseases. In the ranking of drugs predicted to be related to certain diseases (such as kidney disease and fever), 15 of the top 20 drugs have been experimentally confirmed. Conclusions: The experimental results show that our model performs well in the prediction of DDAs. GRLMN is an effective prioritization tool for screening reliable DDAs for follow-up studies concerning their participation in drug repositioning.
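
For readers unfamiliar with the AUC metric reported above, here is a minimal sketch of how it can be computed from predicted scores. It uses the rank interpretation (the probability that a randomly chosen positive pair is scored above a randomly chosen negative pair). The labels and scores are made-up toy values, not data from the paper, and `roc_auc` is a hypothetical helper name.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))

# Toy example: 1 = known drug-disease association, 0 = no known association.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = roc_auc(labels, scores)  # 8 of the 9 positive/negative pairs are ordered correctly
```

In a five-fold cross-validation setting, a value like this would typically be computed on each held-out fold and then averaged.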

An Integrative Framework for Stakeholder Engagement Using the Basin Futures Platform

Water ◽

10.3390/w12092398 ◽

2020 ◽

Vol 12 (9) ◽

pp. 2398 ◽

Cited By ~ 1

Author(s):

Jackie O’Sullivan ◽

Carmel Pollino ◽

Peter Taylor ◽

Ashmita Sengupta ◽

Amit Parashar

Keyword(s):

Stakeholder Engagement ◽

Web Application ◽

High Performance ◽

Large Scale ◽

Agricultural Development ◽

Environmental Flows ◽

Entry Level ◽

Integrative Framework ◽

Significant Research ◽

Research Challenge

Water resources are under growing pressure globally, and better basin planning is crucial to alleviate current and future water scarcity issues. Communicating the complex interconnections and needs of natural and human systems is a significant research challenge. With advances in cyberinfrastructure allowing for innovative approaches to basin planning, this same technology can also facilitate better stakeholder engagement. The potential benefits of using digital basin planning platforms for stakeholder engagement are immense; yet, there is limited guidance on how best to use these platforms for more effective stakeholder engagement in water-related issues and projects. We detail our digital platform, Basin Futures, and highlight the potential uses for stakeholder engagement through an integrative framework across different assessment levels. Basin Futures is a web application and entry-level modelling tool that aims to support rapid and exploratory basin planning globally. As a cloud-based tool, it brings together high-performance computing and large-scale global datasets to make data analysis accessible and efficient. We explore the potential use of the tool through three case studies covering agricultural development, transboundary water-sharing agreements and the allocation of water for environmental flows.

Provenance- and machine learning-based recommendation of parameter values in scientific workflows

PeerJ Computer Science ◽

10.7717/peerj-cs.606 ◽

2021 ◽

Vol 7 ◽

pp. e606

Author(s):

Daniel Silva Junior ◽

Esther Pacitti ◽

Aline Paes ◽

Daniel de Oliveira

Keyword(s):

Machine Learning ◽

High Performance ◽

User Preferences ◽

Scientific Workflows ◽

Machine Learning Techniques ◽

Provenance Data ◽

Learning Techniques ◽

Parameter Values ◽

And Storage ◽

Composition Monitoring

Scientific Workflows (SWfs) have revolutionized how scientists in various domains of science conduct their experiments. The management of SWfs is performed by complex tools that provide support for workflow composition, monitoring, execution, and the capture and storage of the data generated during execution. In some cases, they also provide components to ease the visualization and analysis of the generated data. During the workflow's composition phase, programs must be selected to perform the activities defined in the workflow specification. These programs often require additional parameters that serve to adjust the program's behavior according to the experiment's goals. Consequently, workflows commonly have many parameters to be manually configured, in many cases more than one hundred. Choosing wrong parameter values can cause workflow executions to crash or to produce undesired results. As the execution of data- and compute-intensive workflows is commonly performed in a high-performance computing environment (e.g., a cluster, a supercomputer, or a public cloud), an unsuccessful execution amounts to a waste of time and resources. In this article, we present FReeP (Feature Recommender from Preferences), a parameter value recommendation method that is designed to suggest values for workflow parameters, taking into account past user preferences. FReeP is based on Machine Learning techniques, particularly Preference Learning. FReeP is composed of three algorithms: two of them recommend the value for one parameter at a time, and the third makes recommendations for n parameters at once. The experimental results obtained with provenance data from two broadly used workflows showed FReeP's usefulness in recommending values for one parameter. Furthermore, the results indicate the potential of FReeP to recommend values for n parameters in scientific workflows.
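
As a rough illustration of recommending one parameter value from provenance data, the sketch below filters past runs by the user's stated preferences and returns the most common value of the target parameter among them. This is a deliberately simplified stand-in, not any of the three FReeP algorithms (which rely on Preference Learning); the function `recommend` and the parameter names `aligner`, `model`, and `bootstrap` are hypothetical.

```python
from collections import Counter

def recommend(provenance, preferences, target):
    """Suggest a value for `target` from past runs that match the preferences.

    Hypothetical majority-vote baseline; not the FReeP method itself.
    """
    matching = [
        run for run in provenance
        if all(run.get(name) == value for name, value in preferences.items())
    ]
    # Fall back to all past runs when no run matches every preference.
    candidates = matching or provenance
    votes = Counter(run[target] for run in candidates if target in run)
    if not votes:
        return None
    return votes.most_common(1)[0][0]

# Made-up provenance records: one dictionary of parameter values per past run.
provenance = [
    {"aligner": "mafft", "model": "WAG", "bootstrap": 100},
    {"aligner": "mafft", "model": "WAG", "bootstrap": 100},
    {"aligner": "mafft", "model": "JTT", "bootstrap": 500},
    {"aligner": "muscle", "model": "WAG", "bootstrap": 500},
]
suggestion = recommend(provenance, {"aligner": "mafft", "model": "WAG"}, "bootstrap")
```

A learned model would generalize beyond exact matches, which is where a simple vote like this one falls short.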

Entrepreneurial Student Program Information System at Universitas Negeri Padang

Voteteknika (Vocational Teknik Elektronika dan Informatika) ◽

10.24036/voteteknika.v7i2.104400 ◽

2019 ◽

Vol 7 (2) ◽

pp. 138

Author(s):

Wahidin Saputra ◽

Elfi Tasrif

Keyword(s):

Higher Education ◽

Information System ◽

Data Storage ◽

Web Application ◽

High Performance ◽

Large Scale ◽

Ministry Of Education ◽

Application Development ◽

Directorate General ◽

Student Program

Entrepreneurship plays an important role in Indonesia's development because entrepreneurs help to overcome various problems of national economic development, such as poverty and high unemployment. The Entrepreneurial Student Program is a program of the Ministry of Education and Culture's Directorate General of Higher Education that is implemented and developed by universities. Padang State University is one of the universities that has received assistance from the Directorate General of Higher Education. The Entrepreneurial Student Program Information System has a web-based interface, through which students and other users can access information about the Entrepreneurial Student Program anytime and anywhere. Proposers can submit their proposals directly through the system, which also facilitates data storage for the managers of the Entrepreneurial Student Program. The system is built using the Yii2 Framework, a component-based PHP framework with high performance for large-scale web application development. Keywords: Information Systems, Entrepreneurial Student Programs, Yii2 Framework

User Steering Support in Large-scale Workflows

10.5753/sbbd_estendido.2021.18185 ◽

2021 ◽

Author(s):

Renan Souza ◽

Marta Mattoso ◽

Patrick Valduriez

Keyword(s):

Performance Indicators ◽

High Performance ◽

Large Scale ◽

Provenance Data ◽

Management Concepts ◽

Data Files ◽

The Impact ◽

Computing Machines ◽

Performance Computing ◽

Fine Tune

Large-scale workflows that execute on High-Performance Computing machines need to be dynamically steered by users. This means that users analyze big data files, assess key performance indicators, fine-tune parameters, and evaluate the tuning impacts while the workflows generate multiple files, which is challenging. If one does not keep track of such interactions (called user steering actions), it may be impossible to understand the consequences of steering actions and to reproduce the results. This thesis proposes a generic approach to enable tracking user steering actions by characterizing, capturing, relating, and analyzing them by leveraging provenance data management concepts. Experiments with real users show that the approach enabled the understanding of the impact of steering actions while incurring negligible overhead.

bíogo: a simple high-performance bioinformatics toolkit for the Go language

10.1101/005033 ◽

2014 ◽

Cited By ~ 6

Author(s):

R Daniel Kortschak ◽

David L Adelson

Keyword(s):

High Performance ◽

Large Scale ◽

Large Data ◽

Biological Data ◽

Data Sets ◽

Barriers To Entry ◽

Data Types ◽

Concurrent Processing ◽

Computationally Intensive ◽

And Performance

bíogo is a framework designed to ease development and maintenance of computationally intensive bioinformatics applications. The library is written in the Go programming language, a garbage-collected, strictly typed, compiled language with built-in support for concurrent processing and performance comparable to C and Java. It provides a variety of data types and utility functions to facilitate manipulation and analysis of large-scale genomic and other biological data. bíogo uses a concise and expressive syntax, lowering the barriers to entry for researchers needing to process large data sets with custom analyses while retaining computational safety and ease of code review. We believe bíogo provides an excellent environment for training and research in computational biology because of its combination of strict typing, simple and expressive syntax, and high performance.

High-Performance Computing on Very Large-Scale Biological Data

Current Synthetic and Systems Biology ◽

10.4172/2332-0737.1000e117 ◽

2015 ◽

Vol 03 (01) ◽

Keyword(s):

High Performance Computing ◽

High Performance ◽

Large Scale ◽

Biological Data ◽

Performance Computing

Work in progress — Integration of the scientific workflow paradigm into high performance computing and large scale data management curricula

2010 IEEE Frontiers in Education Conference (FIE) ◽

10.1109/fie.2010.5673235 ◽

2010 ◽

Author(s):

Brandeis Marshall ◽

John Springer ◽

Thomas Hacker

Keyword(s):

Data Management ◽

High Performance Computing ◽

High Performance ◽

Large Scale ◽

Scientific Workflow ◽

Work In Progress ◽

Large Scale Data ◽

Performance Computing ◽

Scale Data

Scheduling and Resource Provisioning Algorithms for Scientific Workflows on Commercial Clouds

10.26686/wgtn.17071976.v1 ◽

2021 ◽

Author(s):

Vahid Arabnejad

Keyword(s):

High Performance ◽

Large Scale ◽

Cost Model ◽

Universal Access ◽

Scientific Workflow ◽

Resource Provisioning ◽

Scheduling Problem ◽

Computing Paradigm ◽

Computationally Intensive ◽

Workflow Tasks

Basic science is becoming ever more computationally intensive, increasing the need for large-scale compute and storage resources, whether within a High-Performance Computer cluster or, more recently, within the cloud. Commercial clouds have increasingly become a viable platform for hosting scientific analyses and computation due to their elasticity, the recent introduction of specialist hardware, and their pay-as-you-go cost model. This computing paradigm therefore presents a low-capital, low-barrier alternative to operating dedicated eScience infrastructure. Indeed, commercial clouds now enable universal access to capabilities previously available only to large, well-funded research groups. While the potential benefits of cloud computing are clear, there are still significant technical hurdles associated with obtaining the best execution efficiency whilst trading off cost. In most cases, large-scale scientific computation is represented as a workflow for scheduling and runtime provisioning. Such scheduling becomes an even more challenging problem on cloud systems due to the dynamic nature of the cloud, in particular the elasticity, the pricing models (both static and dynamic), the non-homogeneous resource types and the vast array of services. This mapping of workflow tasks onto a set of provisioned instances is an example of the general scheduling problem and is NP-complete. In addition, certain runtime constraints, the most typical being the cost of the computation and the time that the computation requires to complete, must be met. This thesis addresses 'the scientific workflow scheduling problem in cloud', which is to schedule workflow tasks on cloud resources in a way that users meet their defined constraints such as budget and deadline, and providers maximize profits and resource utilization. Moreover, it explores different mechanisms and strategies for distributing defined constraints over a workflow and investigates their impact on the overall cost of the resulting schedule.
