Laniakea: an open solution to provide Galaxy “on-demand” instances over heterogeneous cloud infrastructures

Abstract Background: Galaxy is rapidly becoming the de facto standard among workflow managers for bioinformatics. A rich feature set, overall flexibility, and a thriving community of enthusiastic users are among the main factors contributing to the popularity of Galaxy and Galaxy-based applications. One of the main advantages of Galaxy is that it provides access to sophisticated analysis pipelines, e.g., those involving numerous steps and large data sets, even to users lacking computer proficiency, while at the same time improving reproducibility and facilitating teamwork and data sharing among researchers. Although several public Galaxy services are currently available, these resources are often overloaded with a large number of jobs and offer little or no customization options to end users. Moreover, there are scenarios where a private Galaxy instance is a more viable alternative, including, but not limited to, heavy workloads, data privacy concerns, or particular customization needs. In such cases, a cloud-based virtual Galaxy instance can overcome the typical burdens of managing the local hardware and software infrastructure needed to run and maintain a production-grade Galaxy service. Results: Here we present Laniakea, a robust and feature-rich software suite which can be deployed on any scientific or commercial Cloud infrastructure in order to provide a “Galaxy on demand” Platform as a Service (PaaS). Laying its foundations on the INDIGO-DataCloud middleware, which has been developed to accommodate the needs of a large number of scientific communities, Laniakea can be deployed and provisioned over multiple architectures by private or public e-infrastructures. The end user interacts with Laniakea through a front-end that allows a general setup of the Galaxy instance; Laniakea then takes charge of deploying both the virtual hardware and all the software components. At the end of the process the user has access to a private, production-grade, yet fully customizable Galaxy virtual instance. Laniakea supports the deployment of plain or cluster-backed Galaxy instances, shared reference data volumes, encrypted data volumes, and rapid development of novel Galaxy flavours, that is, Galaxy configurations tailored for specific tasks. As a proof of concept, we provide a demo Laniakea instance hosted at an ELIXIR-IT Cloud facility. Conclusions: The migration of scientific computational services towards virtualization and e-infrastructures is one of the most visible trends of our times. Laniakea provides Cloud administrators with a ready-to-use software suite that enables them to offer Galaxy, a popular workflow manager for bioinformatics, as an on-demand PaaS to their users. We believe that Laniakea can contribute to making the many advantages of using Galaxy accessible to a broader user base by removing most of the burdens involved in running a private instance. Finally, Laniakea’s design is sufficiently general and modular that it could be easily adapted to support different services and platforms beyond Galaxy.

Laniakea: an open solution to provide Galaxy “on-demand” instances over heterogeneous cloud infrastructures

GigaScience ◽

10.1093/gigascience/giaa033 ◽

2020 ◽

Vol 9 (4) ◽

Cited By ~ 1

Author(s):

Marco Antonio Tangaro ◽

Giacinto Donvito ◽

Marica Antonacci ◽

Matteo Chiara ◽

Pietro Mandreoli ◽

...

Keyword(s):

Data Privacy ◽

Optimal Solution ◽

Cloud Services ◽

Single Server ◽

Software Infrastructure ◽

Public And Private ◽

Complete Control ◽

Platform As A Service ◽

On Demand ◽

Demand Service

Abstract Background While the popular workflow manager Galaxy is currently made available through several publicly accessible servers, there are scenarios where users can be better served by full administrative control over a private Galaxy instance, including, but not limited to, concerns about data privacy, customisation needs, prioritisation of particular job types, tools development, and training activities. In such cases, a cloud-based Galaxy virtual instance represents an alternative that equips the user with complete control over the Galaxy instance itself without the burden of the hardware and software infrastructure involved in running and maintaining a Galaxy server. Results We present Laniakea, a complete software solution to set up a “Galaxy on-demand” platform as a service. Building on the INDIGO-DataCloud software stack, Laniakea can be deployed over common cloud architectures usually supported both by public and private e-infrastructures. The user interacts with a Laniakea-based service through a simple front-end that allows a general setup of a Galaxy instance, and then Laniakea takes care of the automatic deployment of the virtual hardware and the software components. At the end of the process, the user gains access with full administrative privileges to a private, production-grade, fully customisable, Galaxy virtual instance and to the underlying virtual machine (VM). Laniakea features deployment of single-server or cluster-backed Galaxy instances, sharing of reference data across multiple instances, data volume encryption, and support for VM image-based, Docker-based, and Ansible recipe-based Galaxy deployments. A Laniakea-based Galaxy on-demand service, named Laniakea@ReCaS, is currently hosted at the ELIXIR-IT ReCaS cloud facility. Conclusions Laniakea offers to scientific e-infrastructures a complete and easy-to-use software solution to provide a Galaxy on-demand service to their users. 
Laniakea-based cloud services will help in making Galaxy more accessible to a broader user base by removing most of the burdens involved in deploying and running a Galaxy service. In turn, this will facilitate the adoption of Galaxy in scenarios where classic public instances do not represent an optimal solution. Finally, the implementation of Laniakea can be easily adapted and expanded to support different services and platforms beyond Galaxy.

The DODAS Experience on the EGI Federated Cloud

EPJ Web of Conferences ◽

10.1051/epjconf/202024507033 ◽

2020 ◽

Vol 245 ◽

pp. 07033

Author(s):

Daniele Spiga ◽

Enol Fernandez ◽

Vincenzo Spinoso ◽

Diego Ciangottini ◽

Mirco Tracolli ◽

...

Keyword(s):

Information Discovery ◽

Comprehensive Overview ◽

Platform As A Service ◽

Single Sign On ◽

Computing Platform ◽

On Demand ◽

Correct Function ◽

High Level ◽

Federated Identity ◽

Cloud Infrastructures

The EGI Cloud Compute service offers a multi-cloud IaaS federation that brings together research clouds as a scalable computing platform for research, accessible with OpenID Connect Federated Identity. The federation is not limited to single sign-on; it also introduces features to facilitate the portability of applications across providers: i) a common VM image catalogue with VM image replication to ensure these images will be available at providers whenever needed; ii) a GraphQL information discovery API to understand the capacities and capabilities available at each provider; and iii) integration with orchestration tools (such as Infrastructure Manager) to abstract the federation and facilitate using heterogeneous providers. EGI also monitors the correct function of every provider and collects usage information across all the infrastructure. DODAS (Dynamic On Demand Analysis Service) is an open-source Platform-as-a-Service tool that allows software applications to be deployed over heterogeneous and hybrid clouds. DODAS is one of the so-called Thematic Services of the EOSC-hub project, and it instantiates on-demand container-based clusters offering a high level of abstraction to users, allowing them to exploit distributed cloud infrastructures with very limited knowledge of the underlying technologies. This work presents a comprehensive overview of DODAS integration with the EGI Cloud Federation, reporting the experience of the integration with the CMS Experiment submission infrastructure system.

Mask R-CNN Based C. Elegans Detection with a DIY Microscope

Biosensors ◽

10.3390/bios11080257 ◽

2021 ◽

Vol 11 (8) ◽

pp. 257

Author(s):

Sebastian Fudickar ◽

Eike Jannik Nustede ◽

Eike Dreyer ◽

Julia Bornhorst

Keyword(s):

Cell Biology ◽

High Throughput Screening ◽

Low Cost ◽

Image Acquisition ◽

Rapid Development ◽

Model Organism ◽

Large Data ◽

Data Set ◽

C Elegans ◽

Do It Yourself

Caenorhabditis elegans (C. elegans) is an important model organism for studying molecular genetics, developmental biology, neuroscience, and cell biology. Advantages of the model organism include its rapid development and aging, easy cultivation, and genetic tractability. C. elegans has been proven to be a well-suited model to study toxicity with identified toxic compounds closely matching those observed in mammals. For phenotypic screening, especially the worm number and the locomotion are of central importance. Traditional methods such as human counting or analyzing high-resolution microscope images are time-consuming and rather low throughput. The article explores the feasibility of low-cost, low-resolution do-it-yourself microscopes for image acquisition and automated evaluation by deep learning methods to reduce cost and allow high-throughput screening strategies. An image acquisition system is proposed within these constraints and used to create a large data-set of whole Petri dishes containing C. elegans. By utilizing the object detection framework Mask R-CNN, the nematodes are located, classified, and their contours predicted. The system has a precision of 0.96 and a recall of 0.956, resulting in an F1-Score of 0.958. Considering only correctly located C. elegans with an AP@0.5 IoU, the system achieved an average precision of 0.902 and a corresponding F1 Score of 0.906.
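
As a quick consistency check on the figures above, the F1 score is the harmonic mean of precision and recall. The short sketch below is not taken from the paper; the function name `f1_score` is purely illustrative, and only the precision (0.96) and recall (0.956) values come from the abstract.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        # Convention assumed here: F1 is 0 when both inputs are 0.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
f1 = f1_score(0.96, 0.956)
print(round(f1, 3))  # prints 0.958
```

The result agrees with the reported F1-Score of 0.958. The second F1 value (0.906) cannot be rechecked this way, because the abstract does not give the matching recall.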

Resource-Aware Network Topology Management Framework

10.20944/preprints201905.0174.v2 ◽

2020 ◽

Author(s):

Aaqif Afzaal Abbasi ◽

Shahab Shamshirband ◽

Mohammed A. A. Al-qaness ◽

Almas Abbasi ◽

Nashat T. AL-Jallad ◽

...

Keyword(s):

Network Reliability ◽

Service Level ◽

Cloud Infrastructure ◽

Topology Management ◽

Management Framework ◽

Provider Network ◽

Computing Services ◽

Cloud Infrastructures ◽

Resource Aware ◽

Path Computation Element

Cloud infrastructure provides computing services where computing resources can be adjusted on demand. However, the adoption of cloud infrastructures raises concerns such as reliance on the service provider's network, reliability, and compliance with service level agreements (SLAs). Software-defined networking (SDN) is a networking concept that separates a network’s data plane from its control plane. This concept improves networking behavior. In this paper, we present an SDN-enabled resource-aware topology framework. The proposed framework employs SLA compliance, a Path Computation Element (PCE), and fair-share loading to achieve better topology features. We also present an evaluation showcasing the potential of our framework.

Privacy-Preserving Efficient Data Retrieval in IoMT Based on Low-Cost Fog Computing

Complexity ◽

10.1155/2021/6211475 ◽

2021 ◽

Vol 2021 ◽

pp. 1-13

Author(s):

Na Wang ◽

Yuanyuan Cai ◽

Junsong Fu ◽

Jie Xu

Keyword(s):

Energy Cost ◽

Data Privacy ◽

Low Cost ◽

Rapid Development ◽

Fog Computing ◽

Bat Algorithm ◽

Data Retrieval ◽

High Energy ◽

Physiological Data ◽

Range Tree

The rapid development of the Internet of Medical Things (IoMT) is remarkable. However, IoMT faces many problems, including privacy disclosure, long delays in servicing orders, low retrieval efficiency of medical data, and the high energy cost of fog computing. To address these, this paper proposes a data privacy protection and efficient retrieval scheme for IoMT based on low-cost fog computing. First, a fog computing system is placed between a cloud server and medical workers to process the medical workers' data retrieval requests and their orders for controlling medical devices. At the same time, it preprocesses the physiological data of patients uploaded by IoMT, collates them into various data sets, and transmits them to medical institutions. This makes the entire execution process low-latency and efficient. Second, because multidimensional physiological data are of great value, we use ciphertext retrieval to protect the privacy of patient data. In addition, this paper uses a range tree to build an index for storing physiological data vectors, and a range retrieval method is proposed to improve data search efficiency. Finally, a bat algorithm (BA) is designed to allocate cost across a fog server group for significant energy cost reduction. Extensive experiments are conducted to demonstrate the efficiency of the proposed scheme.

Independent Task Scheduling in Heterogeneous System

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.d9560.118419 ◽

2019 ◽

Vol 8 (4) ◽

pp. 10093-10099

Keyword(s):

Cloud Computing ◽

Energy Consumption ◽

Task Scheduling ◽

Rapid Development ◽

Network Connectivity ◽

Variable Number ◽

Cloud Infrastructure ◽

Storage Devices ◽

Quantum Genetic Algorithm ◽

Test Scenarios

Recently, rapid development in processing speeds, fast storage devices, and better network connectivity has accelerated the popularization of cloud computing. Cloud computing is an on-demand service which provides users with high-end servers, storage, and processing capabilities, where the user need not be concerned with the underlying infrastructure. Although there are abundant resources in the cloud infrastructure, task scheduling plays a crucial role in the efficient execution of tasks. Task scheduling results in better performance (throughput) of the system along with better resource utilization, which ultimately results in reduced energy consumption. At any given time, a processor should never be in an idle state, as it still consumes some amount of energy. In this paper, the use of a Quantum Genetic Algorithm has led to a reduction in energy consumption. The objective is to find a scheduling sequence which can be implemented in a cloud computing environment. Along with minimizing energy consumption, the algorithm helps reduce the makespan of a processor as well. The results show a decrease in energy consumption of 10-15% under different test scenarios involving a variable number of tasks, processors, and iterations (generations) for which the algorithm was run. The algorithm converges to the desired result within 10-15 iterations, as can be seen from the results published in this paper.

Characterizing PaaS Solutions Enabling Cloud Federations

Advances in Systems Analysis, Software Engineering, and High Performance Computing - Developing Interoperable and Federated Cloud Architecture ◽

10.4018/978-1-5225-0153-4.ch004 ◽

2016 ◽

pp. 91-117

Author(s):

Tamas Pflanzner ◽

Roland Tornyai ◽

Ákos Zoltán Gorácz ◽

Attila Kertesz

Keyword(s):

Application Development ◽

Platform As A Service ◽

Cloud Application ◽

Flexible Resource ◽

Recent Trends ◽

Developer Tools ◽

Multiple Clouds ◽

Cloud Infrastructures ◽

Academic Area

Cloud Computing has opened new ways of flexible resource provision for businesses migrating IT applications and data to the cloud to respond to new demands from customers. Recently, many businesses have planned to take advantage of this flexible resource provision. Cloud Federations envisage a distributed, heterogeneous environment consisting of various cloud infrastructures by aggregating different IaaS provider capabilities coming from both the commercial and academic areas. Recent solutions hide the diversity of multiple clouds and form a unified federation on top of them. Many approaches follow recent trends in cloud application development and offer federation capabilities at the platform level, thus creating Platform-as-a-Service solutions. In this chapter the authors investigate capabilities of PaaS solutions and present a classification of these tools: what levels of developer experience they offer, what types of APIs and developer tools they support, and what web GUIs they provide. Developer experience is measured by creating and executing sample applications with these PaaS tools.

Bought and Sold: Exploring the Effects of Big Data on User Agency and Commodification

10.32920/ryerson.14657883.v1 ◽

2021 ◽

Author(s):

Kristia M. Pavlakos

Keyword(s):

Social Sciences ◽

Big Data ◽

Data Privacy ◽

Large Data ◽

Data Sets ◽

Privacy Regulation ◽

Scholarly Literature ◽

User Interests ◽

The Social ◽

The Relationship

Big Data¹ is a phenomenon that has been increasingly studied in the academy in recent years, especially in technological and scientific contexts. However, it is still a relatively new field of academic study; because it has been previously considered in mainly technological contexts, more attention needs to be drawn to the contributions made in Big Data scholarship in the social sciences by scholars like Omar Tene and Jules Polonetsky, Bart Custers, Kate Crawford, Nick Couldry, and Jose van Dijk. The purpose of this Major Research Paper is to gain insight into the issues surrounding privacy and user rights, roles, and commodification in relation to Big Data in a social sciences context. The term “Big Data” describes the collection, aggregation, and analysis of large data sets. While corporations are usually responsible for the analysis and dissemination of the data, most of this data is user generated, and there must be considerations regarding the user’s rights and roles. In this paper, I raise three main issues that shape the discussion: how users can be more active agents in data ownership, how consent measures can be made to actively reflect user interests instead of focusing on benefitting corporations, and how user agency can be preserved. Through an analysis of social sciences scholarly literature on Big Data, privacy, and user commodification, I wish to determine how these concepts are being discussed, where there have been advancements in privacy regulation and the prevention of user commodification, and where there is a need to improve these measures. In doing this, I hope to discover a way to better facilitate the relationship between data collectors and analysts, and user-generators.
¹ While there is no definitive resolution as to whether or not to capitalize the term “Big Data”, in capitalizing it I chose to conform with such authors as boyd and Crawford (2012), Couldry and Turow (2014), and Dalton and Thatcher (2015), who do so in the scholarly literature.

Enhanced Integrity Checking for Preserve Data Owner and User Level Privacy Using Dual Cryptography Approach

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit195346 ◽

2019 ◽

pp. 138-146

Author(s):

Poovizhi. M ◽

Raja. G

Keyword(s):

Data Storage ◽

Data Privacy ◽

Capital Expenditure ◽

Data Access ◽

Third Party ◽

Configurable Computing ◽

Local Data ◽

Cloud Data ◽

On Demand ◽

Integrity Checking

Using Cloud Storage, users can remotely store their data and enjoy on-demand, high-quality applications and services from a shared pool of configurable computing resources, without the burden of local data storage and maintenance. However, the fact that users no longer have physical possession of the outsourced data makes data integrity protection in Cloud Computing a formidable task, especially for users with constrained computing resources. From the users’ perspective, including both individuals and IT systems, storing data remotely in the cloud in a flexible on-demand manner brings tempting benefits: relief from the burden of storage management, universal data access independent of geographical location, and avoidance of capital expenditure on hardware, software, and personnel maintenance. To securely introduce an effective Sanitizer and third party auditor (TPA), the following two fundamental requirements have to be met: 1) the TPA should be able to audit the cloud data storage efficiently without demanding a local copy of the data, and should introduce no additional on-line burden to the cloud user; 2) the third party auditing process should introduce no new vulnerabilities with respect to user data privacy. In this project, we utilize and uniquely combine public auditing protocols with a double encryption approach to achieve a privacy-preserving public cloud data auditing system, which meets all integrity checking requirements without any leakage of data. To support efficient handling of multiple auditing tasks, we further explore the technique of online signatures to extend our main result into a multi-user setting, where the TPA can perform multiple auditing tasks simultaneously. We implement a double encryption algorithm to encrypt the data twice and store it on a cloud server in Electronic Health Record applications.

Efficient Indexing RDF Query Algorithm for Big Data

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.441.691 ◽

2013 ◽

Vol 441 ◽

pp. 691-694

Author(s):

Yi Qun Zeng ◽

Jing Bin Wang

Keyword(s):

Large Scale ◽

Rapid Development ◽

Large Data ◽

Index Structure ◽

Data Query ◽

Large Scale Data ◽

Tree Index ◽

Rdf Data ◽

Query Algorithm ◽

Scale Data

With the rapid development of information technology, data is growing explosively, and how to deal with large-scale data is becoming more and more important. Based on the characteristics of RDF data, we propose to compress RDF data. We construct an index structure called the PAR-Tree Index, and then use the MapReduce parallel computing framework together with the PAR-Tree Index to execute queries. Experimental results show that the algorithm can improve the efficiency of large data queries.
