Scheduling Data Intensive Scientific Workflows in Cloud Environment Using Nature Inspired Algorithms

Author(s):  
Shikha Mehta ◽  
Parmeet Kaur

Workflows are a commonly used model for describing applications that consist of computational tasks with data or control-flow dependencies. They are used in domains such as bioinformatics, astronomy, and physics for data-driven scientific applications. Executing data-intensive workflow applications in a reasonable amount of time demands a high-performance computing environment. Cloud computing is a way of purchasing computing resources on demand through virtualization technologies. It provides the infrastructure to build and run workflow applications, offered as 'Infrastructure as a Service.' However, workflows must be scheduled on the cloud in a way that reduces the cost of leasing resources. Scheduling tasks on resources is an NP-hard problem, which makes meta-heuristic algorithms a natural choice. This chapter presents the application of three nature-inspired algorithms to the workflow scheduling problem on the cloud: particle swarm optimization, the shuffled frog leaping algorithm, and the grey wolf optimization algorithm. Simulation results demonstrate the efficacy of the suggested algorithms.
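
As a rough illustration of how a particle-based meta-heuristic can be applied to workflow scheduling, the sketch below encodes a task-to-VM mapping as a particle position and scores it by total leasing cost. The task runtimes, VM prices, and the fitness function are hypothetical placeholders, not the chapter's actual formulation, and workflow dependencies are omitted for brevity.

```python
# Minimal sketch of particle swarm optimization applied to workflow scheduling.
# Assumption: a schedule is a vector mapping each task to a VM index, and
# fitness is simply total execution cost; data/control dependencies that a
# real workflow scheduler must respect are omitted here.
import random

TASK_RUNTIME = [4.0, 2.0, 6.0, 3.0, 5.0]      # hypothetical task lengths (hours)
VM_COST_PER_HOUR = [1.0, 2.5, 4.0]            # hypothetical VM prices
N_TASKS, N_VMS = len(TASK_RUNTIME), len(VM_COST_PER_HOUR)

def fitness(position):
    """Total leasing cost of running each task on its assigned VM."""
    return sum(TASK_RUNTIME[t] * VM_COST_PER_HOUR[int(round(position[t])) % N_VMS]
               for t in range(N_TASKS))

def pso(n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5):
    pos = [[random.uniform(0, N_VMS - 1) for _ in range(N_TASKS)] for _ in range(n_particles)]
    vel = [[0.0] * N_TASKS for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    gbest = min(pbest, key=fitness)
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(N_TASKS):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], 0), N_VMS - 1)
            if fitness(pos[i]) < fitness(pbest[i]):
                pbest[i] = pos[i][:]
        gbest = min(pbest, key=fitness)
    return gbest, fitness(gbest)

if __name__ == "__main__":
    schedule, cost = pso()
    print("best task-to-VM mapping:", [int(round(x)) for x in schedule], "cost:", cost)
```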

Author(s):  
Chun-Yuan Lin ◽  
Jin Ye ◽  
Che-Lun Hung ◽  
Chung-Hung Wang ◽  
Min Su ◽  
...  

Current high-end graphics processing units (GPUs), such as the NVIDIA Tesla, Fermi, and Kepler series cards, which contain up to thousands of cores per chip, are widely used in high-performance computing. These GPU cards (desktop GPUs) must be installed in personal computers or servers together with desktop CPUs; moreover, the cost and power consumption of constructing a high-performance computing platform from such desktop CPUs and GPUs are high. NVIDIA has released the Tegra K1, known as the Jetson TK1, an embedded board containing four ARM Cortex-A15 CPU cores and 192 CUDA cores (a Kepler GPU), with the advantages of low cost, low power consumption, and high applicability for embedded applications. The NVIDIA Jetson TK1 has thus become a new research direction. Hence, in this paper, a bioinformatics platform was constructed on the NVIDIA Jetson TK1. The ClustalWtk and MCCtk tools, for sequence alignment and compound comparison respectively, were designed on this platform. Moreover, web and mobile services with user-friendly interfaces were provided for these two tools. The experimental results showed that the cost-performance ratio of the NVIDIA Jetson TK1 is higher than that of an Intel Xeon E5-2650 CPU and an NVIDIA Tesla K20m GPU card.
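
The cost-performance comparison can be made concrete with a small calculation. The figures below are hypothetical placeholders, not the paper's measurements; they only show how a throughput-per-dollar (or per-watt) ratio is formed and compared across platforms.

```python
# Illustrative cost-performance calculation; all numbers are hypothetical
# placeholders, not measurements reported in the paper.
platforms = {
    "Jetson TK1 (ARM + Kepler GPU)": {"throughput": 1.0, "cost_usd": 200.0, "power_w": 10.0},
    "Xeon E5-2650 + Tesla K20m":     {"throughput": 6.0, "cost_usd": 5000.0, "power_w": 320.0},
}

for name, p in platforms.items():
    perf_per_dollar = p["throughput"] / p["cost_usd"]   # higher is better
    perf_per_watt = p["throughput"] / p["power_w"]      # higher is better
    print(f"{name}: perf/$ = {perf_per_dollar:.4f}, perf/W = {perf_per_watt:.4f}")
```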


Author(s):  
Nikos Karacapilidis ◽  
Manolis Tzagarakis ◽  
Spyros Christodoulou ◽  
Georgia Tsiliki

This paper reports on a Web 2.0 tool that aims to facilitate and augment collaboration and decision making in data-intensive and cognitively complex biomedical settings. The proposed tool exploits prominent high-performance computing paradigms and large-scale data processing technologies to meaningfully search, analyze and aggregate data residing in diverse, extremely large and rapidly evolving sources. It can be viewed as an innovative workbench that incorporates and orchestrates a set of interoperable services, reducing the data-intensiveness and complexity overload at critical decision points to a manageable level and thus allowing stakeholders to be more productive and to concentrate on creative activities. Through a particular collaboration scenario, we explore various possibilities and challenges of managing biomedical collaboration with the proposed tool. Much attention is given to the increase in volume, rate of production and complexity of the associated data types.


2019 ◽  
Author(s):  
Michael A Bekos ◽  
Henry Förster ◽  
Christian Geckeler ◽  
Lukas Holländer ◽  
Michael Kaufmann ◽  
...  

Abstract The crossing resolution of a non-planar drawing of a graph is the minimum angle formed by any pair of crossing edges. Recent experiments suggest that the larger the crossing resolution, the easier it is to read and interpret a drawing of a graph. However, maximizing the crossing resolution turns out to be an NP-hard problem in general, and only heuristic algorithms are known, mainly based on appropriately adjusted force-directed algorithms. In this paper, we propose a new heuristic algorithm for the crossing resolution maximization problem and experimentally compare it against the known approaches from the literature. Our experimental evaluation indicates that the new heuristic produces drawings with better crossing resolution, but this comes at the cost of a slightly higher edge-length ratio, especially when the input graph is large.
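
To make the objective concrete, the sketch below computes the crossing resolution of a straight-line drawing: for every pair of properly crossing edges it takes the angle between them and returns the minimum. The drawing format and helper names are assumptions for illustration, not the paper's implementation or its heuristic.

```python
# Sketch: computing the crossing resolution of a straight-line drawing.
# A drawing is given as vertex coordinates plus an edge list; helper names
# are illustrative, not taken from the paper.
import math

def _segments_cross(p1, p2, p3, p4):
    """True if open segments p1p2 and p3p4 properly cross."""
    def orient(a, b, c):
        return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    d1, d2 = orient(p3, p4, p1), orient(p3, p4, p2)
    d3, d4 = orient(p1, p2, p3), orient(p1, p2, p4)
    return (d1 * d2 < 0) and (d3 * d4 < 0)

def crossing_resolution(coords, edges):
    """Minimum angle (degrees) over all pairs of crossing edges; 90 if crossing-free."""
    best = 90.0
    for i in range(len(edges)):
        for j in range(i + 1, len(edges)):
            (a, b), (c, d) = edges[i], edges[j]
            if len({a, b, c, d}) < 4:          # shared endpoint: not a crossing
                continue
            p1, p2, p3, p4 = coords[a], coords[b], coords[c], coords[d]
            if _segments_cross(p1, p2, p3, p4):
                u = (p2[0] - p1[0], p2[1] - p1[1])
                v = (p4[0] - p3[0], p4[1] - p3[1])
                ang = math.degrees(math.acos(abs(u[0] * v[0] + u[1] * v[1]) /
                                             (math.hypot(*u) * math.hypot(*v))))
                best = min(best, ang)
    return best

# Example: two perpendicular crossing segments give a crossing resolution of 90 degrees.
coords = {0: (0, 0), 1: (1, 1), 2: (1, 0), 3: (0, 1)}
edges = [(0, 1), (2, 3)]
print(crossing_resolution(coords, edges))   # 90.0
```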


Author(s):  
Mark Freshley ◽  
Paul Dixon ◽  
Paul Black ◽  
Bruce Robinson ◽  
Tom Stockton ◽  
...  

The U.S. Department of Energy (USDOE) Office of Environmental Management (EM), Office of Soil and Groundwater (EM-12), is supporting development of the Advanced Simulation Capability for Environmental Management (ASCEM). ASCEM is a state-of-the-art scientific tool and approach currently aimed at understanding and predicting contaminant fate and transport in natural and engineered systems. ASCEM is a modular, open-source, high-performance computing tool. It will be used to facilitate integrated approaches to modeling and site characterization, and to provide robust and standardized assessments of performance and risk for EM cleanup and closure activities. The ASCEM project continues to make significant progress in developing capabilities, with the current emphasis on integrating capabilities in FY12. Capability development is occurring for both the Platform and Integrated Toolsets and the High-Performance Computing (HPC) multiprocess simulator. The Platform capabilities provide the user interface and tools for end-to-end model development, starting with definition of the conceptual model, management of data for model input, model calibration and uncertainty analysis, and processing of model output, including visualization. The HPC capabilities target increased functionality of process model representations, toolsets for interaction with the Platform, and verification and model confidence testing. The integration of the Platform and HPC capabilities was tested and evaluated for EM applications in a set of demonstrations as part of the Site Applications Thrust Area activities in 2012. The current maturity of the ASCEM computational and analysis capabilities has afforded the opportunity for collaborative efforts to develop decision analysis tools to support and optimize radioactive waste disposal. Recent advances in computerized decision analysis frameworks provide an ideal opportunity to bring this capability into ASCEM, allowing radioactive waste disposal to be evaluated in terms of decision needs such as disposal, closure, and maintenance. Decision models will be used in ASCEM to identify information and data needs, and the model refinements that might be necessary to effectively reduce uncertainty in waste disposal decisions. Decision analysis models start with tools for framing the problem, and continue with modeling both the science side of the problem (for example, inventories, source terms, fate and transport, receptors, and risk) and the cost side, which could include the costs of implementing any action that is chosen (e.g., for disposal or closure) and the values associated with those actions. The cost side of the decision problem covers economic, environmental and societal costs, corresponding to the three pillars of sustainability. These tools will facilitate stakeholder-driven decision analysis to support optimal, sustainable solutions in ASCEM.


2013 ◽  
Vol 2013 ◽  
pp. 1-6 ◽  
Author(s):  
Ying-Chih Lin ◽  
Chin-Sheng Yu ◽  
Yen-Jen Lin

Recent progress in high-throughput instrumentation has led to an astonishing growth in both the volume and the complexity of biomedical data collected from various sources. This planet-scale data brings serious challenges to storage and computing technologies. Cloud computing is an attractive alternative because it simultaneously provides scalable storage and high-performance computing for large-scale data. This work briefly introduces data-intensive computing systems and summarizes existing cloud-based resources in bioinformatics. These developments and applications should help biomedical researchers make the vast amount of diverse data meaningful and usable.


2012 ◽  
pp. 841-861
Author(s):  
Chao-Tung Yang ◽  
Wen-Chung Shih

Biology databases are diverse and massive. As a result, researchers must compare each sequence with vast numbers of other sequences. Comparison, whether of structural features or protein sequences, is vital in bioinformatics. These activities require high-speed, high-performance computing power to search through and analyze large amounts of data, as well as industrial-strength databases to perform a range of data-intensive computing functions. Grid computing and cluster computing meet these requirements. Biological data exist in various web services that help biologists search for and extract useful information, but the data formats produced are heterogeneous, and powerful tools are needed to handle the complex and difficult task of integrating them. This paper reviews the relevant technologies and presents an approach to this problem using cluster and grid computing. The authors implement an experimental distributed computing application for bioinformatics, consisting of basic high-performance computing environments (Grid and PC Cluster systems), user portals with graphical interfaces that enable biologists to benefit directly from high-performance technology, and a translation tool for converting biology data into XML format.
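
As an illustration of the kind of translation step described above, the sketch below converts FASTA-formatted sequence records into a simple XML document. The choice of FASTA input and the element names are assumptions made for the example; the paper's actual translation tool may target different formats and schemas.

```python
# Sketch: converting FASTA records into XML. The element names and the use of
# FASTA as input are illustrative assumptions, not the paper's schema.
import xml.etree.ElementTree as ET

def fasta_to_xml(fasta_text):
    root = ET.Element("sequences")
    header, chunks = None, []
    def flush():
        # Emit one <sequence> element for the record accumulated so far.
        if header is not None:
            rec = ET.SubElement(root, "sequence", id=header.split()[0])
            ET.SubElement(rec, "description").text = header
            ET.SubElement(rec, "residues").text = "".join(chunks)
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            flush()
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    flush()
    return ET.tostring(root, encoding="unicode")

example = """>seq1 test protein
MKTAYIAKQR
QISFVKSHFS
>seq2 another record
GATTACA"""
print(fasta_to_xml(example))
```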


Author(s):  
M. B. Giles ◽  
I. Reguly

High-performance computing has evolved remarkably over the past 20 years, and that progress is likely to continue. However, in recent years, this progress has been achieved through greatly increased hardware complexity with the rise of multicore and manycore processors, and this is affecting the ability of application developers to achieve the full potential of these systems. This article outlines the key developments on the hardware side, both in the recent past and in the near future, with a focus on two key issues: energy efficiency and the cost of moving data. It then discusses the much slower evolution of system software, and the implications of all of this for application developers.


Author(s):  
Rohit Kumar Sachan ◽  
Dharmender Singh Kushwaha

Background: Nature-Inspired Algorithms (NIAs) are among the most effective ways to solve advanced engineering and real-world optimization problems. Over the last few decades, researchers have proposed an immense number of NIAs, which draw their inspiration from natural phenomena. A young researcher attempting to solve a problem using NIAs is bogged down by the plethora of proposals that exist today; not every algorithm is suited to every kind of problem, and some score over others. Objective: This paper presents a comprehensive study of seven NIAs that have new and unique sources of inspiration. The study should make it easy for any new entrant to understand the fundamentals of NIAs. Conclusion: We classify NIAs as natural evolution-based, swarm intelligence-based, biology-based, science-based and others. In this survey, well-established and relatively new NIAs, namely the Shuffled Frog Leaping Algorithm (SFLA), Firefly Algorithm (FA), Gravitational Search Algorithm (GSA), Flower Pollination Algorithm (FPA), Water Cycle Algorithm (WCA), Jaya Algorithm and Anti-Predatory NIA (APNIA), have been studied. The study presents a theoretical perspective on NIAs in a simplified form, based on their sources of inspiration, mathematical formulations, control parameters, features, variants and the application areas where these algorithms have been successfully applied.
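
As one concrete example of the mathematical formulations such a survey covers, the sketch below implements the parameter-free update rule of the Jaya algorithm, X' = X + r1(X_best - |X|) - r2(X_worst - |X|), on a toy sphere function. The benchmark function, bounds, and population settings are illustrative placeholders.

```python
# Minimal sketch of the Jaya algorithm's parameter-free update rule on a toy
# sphere function; the objective, bounds, and sizes are illustrative placeholders.
import random

def sphere(x):
    return sum(v * v for v in x)

def jaya(dim=5, pop_size=20, iters=200, lo=-10.0, hi=10.0):
    pop = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(iters):
        best = min(pop, key=sphere)
        worst = max(pop, key=sphere)
        for i, x in enumerate(pop):
            cand = []
            for j in range(dim):
                r1, r2 = random.random(), random.random()
                # Jaya rule: move toward the best solution and away from the worst.
                v = x[j] + r1 * (best[j] - abs(x[j])) - r2 * (worst[j] - abs(x[j]))
                cand.append(min(max(v, lo), hi))
            if sphere(cand) < sphere(x):      # greedy acceptance
                pop[i] = cand
    best = min(pop, key=sphere)
    return best, sphere(best)

if __name__ == "__main__":
    solution, value = jaya()
    print("best solution:", solution, "objective:", value)
```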


Author(s):  
Geetha J. ◽  
Uday Bhaskar N ◽  
Chenna Reddy P.

Data-intensive systems aim to process "big" data efficiently. Several data processing engines, modeled around the MapReduce paradigm, have evolved over the past decade. This article explores Hadoop's MapReduce engine and proposes techniques to obtain a higher level of optimization by borrowing concepts from the world of high-performance computing; consequently, the power consumed and the heat generated are lowered. The article designs a system with a pipelined dataflow, in contrast to the existing unregulated "bursty" flow of network traffic; the ability to carry out Map and Reduce tasks in parallel; and the incorporation of modern high-performance computing concepts through Remote Direct Memory Access (RDMA). To establish the claim of increased performance, the authors provide an algorithm for RoCE-enabled MapReduce and a mathematical derivation contrasting its runtime with that of vanilla Hadoop. The article shows mathematically that the proposed system runs 1.67 times faster than the vanilla version of Hadoop.
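
A toy runtime model can illustrate the kind of comparison such a derivation makes: if map, shuffle, and reduce phases run strictly one after another ("bursty") versus overlapping in a pipeline, the pipelined total approaches the length of the bottleneck phase rather than the sum of all phases. The phase durations and the model itself are illustrative assumptions, not the article's derivation; the 1.67x figure comes from the authors' own analysis.

```python
# Toy runtime model contrasting sequential ("bursty") and pipelined execution of
# map/shuffle/reduce phases over several data waves. Phase durations are
# hypothetical; this is not the article's actual derivation.
def sequential_runtime(map_t, shuffle_t, reduce_t, waves):
    # Each wave finishes all three phases before the next wave starts.
    return waves * (map_t + shuffle_t + reduce_t)

def pipelined_runtime(map_t, shuffle_t, reduce_t, waves):
    # Phases overlap across waves: fill the pipeline once, then the
    # bottleneck phase paces every remaining wave.
    bottleneck = max(map_t, shuffle_t, reduce_t)
    return (map_t + shuffle_t + reduce_t) + (waves - 1) * bottleneck

if __name__ == "__main__":
    m, s, r, w = 10.0, 6.0, 8.0, 20          # hypothetical seconds per phase, 20 waves
    seq = sequential_runtime(m, s, r, w)
    pipe = pipelined_runtime(m, s, r, w)
    print(f"sequential: {seq}s, pipelined: {pipe}s, speedup: {seq / pipe:.2f}x")
```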

