Fault-Tolerant and Data-Intensive Resource Scheduling and Management for Scientific Applications in Cloud Computing

Cloud computing is a fully fledged, mature and flexible computing paradigm that provides services to scientific and business applications in a subscription-based environment. Scientific applications such as Montage and CyberShake are organized as scientific workflows with data-intensive and compute-intensive tasks, and they have some special characteristics. Their tasks are executed through integration, disintegration, pipelining, and parallelism, which calls for special attention to task management and to data-oriented resource scheduling and management. Tasks executed in a pipeline are bottlenecks: the failure of any one of them renders the whole execution futile, so fault-tolerant execution is required. Tasks executed in parallel require similar instances of cloud resources, so cluster-based execution may improve system performance in terms of make-span and execution cost. Therefore, this research work presents a cluster-based, fault-tolerant and data-intensive (CFD) scheduling strategy for scientific applications in cloud environments. The CFD strategy addresses the data intensiveness of scientific workflow tasks with cluster-based, fault-tolerant mechanisms. The Montage scientific workflow is used as the simulation case, and the results of the CFD strategy were compared with three well-known heuristic scheduling policies: (a) MCT, (b) Max-min, and (c) Min-min. The simulation results showed that the CFD strategy reduced the make-span by 14.28%, 20.37%, and 11.77%, respectively, compared with these three policies. Similarly, the CFD strategy reduced the execution cost by 1.27%, 5.3%, and 2.21%, respectively. With the CFD strategy, the SLA is not violated with regard to time and cost constraints, whereas the existing policies violate it numerous times.
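
The three baselines named above are classic list-scheduling heuristics for independent tasks. The following is a minimal Python sketch of them, assuming an expected-time-to-compute (ETC) matrix with made-up values and ignoring workflow dependencies, data transfers, and fault tolerance; it is not the CFD strategy itself.

```python
from typing import Dict, List, Tuple

# etc[t][m] = estimated time to compute task t on machine m (hypothetical values).
ETC = List[List[float]]
Schedule = Tuple[Dict[int, int], float]  # (task -> machine, make-span)


def mct(etc: ETC) -> Schedule:
    """MCT: take tasks in arrival order and put each one on the machine
    where it would complete earliest, given the work already assigned."""
    ready = [0.0] * len(etc[0])  # time at which each machine becomes free
    assign: Dict[int, int] = {}
    for t, row in enumerate(etc):
        m = min(range(len(ready)), key=lambda j: ready[j] + row[j])
        ready[m] += row[m]
        assign[t] = m
    return assign, max(ready)


def _batch(etc: ETC, pick_largest: bool) -> Schedule:
    """Shared loop of Min-min (pick_largest=False) and Max-min (True)."""
    ready = [0.0] * len(etc[0])
    unscheduled = set(range(len(etc)))
    assign: Dict[int, int] = {}
    while unscheduled:
        # Earliest completion time of every unscheduled task and where it occurs.
        best = {}
        for t in unscheduled:
            m = min(range(len(ready)), key=lambda j: ready[j] + etc[t][j])
            best[t] = (ready[m] + etc[t][m], m)
        # Min-min commits the task that can finish soonest; Max-min commits the
        # task whose earliest completion is latest, so long tasks go first.
        choose = max if pick_largest else min
        t = choose(sorted(best), key=lambda k: best[k][0])
        finish, m = best[t]
        ready[m] = finish
        assign[t] = m
        unscheduled.remove(t)
    return assign, max(ready)


def min_min(etc: ETC) -> Schedule:
    return _batch(etc, pick_largest=False)


def max_min(etc: ETC) -> Schedule:
    return _batch(etc, pick_largest=True)


etc = [[2, 3], [2, 3], [6, 8]]  # three tasks, two machines
for name, policy in [("MCT", mct), ("Min-min", min_min), ("Max-min", max_min)]:
    print(name, policy(etc))
```

On this toy matrix, Max-min places the long task first and reaches a make-span of 6, while MCT and Min-min both end at 8; on other inputs the ranking can differ.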

A Novel Completion-Time-Minimization Scheduling Approach of Scientific Workflows Over Heterogeneous Cloud Computing Systems

International Journal of Web Services Research, Vol 16 (4), pp. 1-20, 2019. DOI: 10.4018/ijwsr.2019100101

Author(s): S. Sabahat H. Bukhari, Yunni Xia

Keyword(s): Cloud Computing; Time Management; Completion Time; Large Scale; Scientific Workflow; Scientific Workflows; Transmission Delays; Computing Paradigm; Heterogeneous Cloud; The Impact

The cloud computing paradigm provides an ideal platform for supporting large-scale scientific-workflow-based applications over the internet. However, the scheduling and execution of scientific workflows still face various challenges, such as cost and response-time management, which involve handling the acquisition delays of physical servers and minimizing the overall completion time of workflows. A careful investigation of existing methods shows that most of them consider only the static performance of physical machines (PMs) and ignore the impact of resource acquisition delays in their scheduling models. In this article, the authors present a meta-heuristic-based method for scheduling scientific workflows that aims to reduce workflow completion time by appropriately managing the acquisition delays and the transmission delays required for inter-PM communication. The authors also carry out extensive case studies based on real-world commercial clouds and multiple workflow templates. Experimental results clearly show that the proposed method outperforms state-of-the-art methods such as ICPCP, CEGA, and JIT-C in terms of workflow completion time.
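
To make the role of acquisition and transmission delays concrete, the following Python sketch evaluates the completion time of a workflow DAG for a fixed task-to-PM placement. The cost model is an assumption for illustration: every PM is requested at time 0 and becomes usable after a fixed `acquisition_delay`, data passed between tasks on different PMs incurs a per-edge transfer time, and tasks on the same PM run one at a time. It shows the kind of objective a meta-heuristic could evaluate, not the authors' method; all function and parameter names are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, Set, Tuple


def completion_time(
    exec_time: Dict[str, float],             # task -> runtime on its assigned PM
    transfer: Dict[Tuple[str, str], float],  # (parent, child) -> inter-PM transfer time
    placement: Dict[str, str],               # task -> PM identifier
    acquisition_delay: float,                # time before a newly acquired PM is usable
) -> float:
    preds: Dict[str, Set[str]] = {t: set() for t in exec_time}
    for parent, child in transfer:
        preds[child].add(parent)
    pm_free: Dict[str, float] = {}  # when each PM can start its next task
    finish: Dict[str, float] = {}
    # Tasks sharing a PM run one at a time, in topological order.
    for t in TopologicalSorter(preds).static_order():
        pm = placement[t]
        available = pm_free.get(pm, acquisition_delay)
        # Data from a parent on the same PM is local; otherwise it pays the transfer delay.
        inputs_ready = max(
            (finish[p] + (0.0 if placement[p] == pm else transfer[(p, t)]) for p in preds[t]),
            default=0.0,
        )
        finish[t] = max(available, inputs_ready) + exec_time[t]
        pm_free[pm] = finish[t]
    return max(finish.values())


# Diamond workflow a -> {b, c} -> d with hypothetical runtimes and transfer times.
exec_time = {"a": 1.0, "b": 2.0, "c": 2.0, "d": 1.0}
transfer = {("a", "b"): 0.5, ("a", "c"): 0.5, ("b", "d"): 0.5, ("c", "d"): 0.5}
split = {"a": "pm1", "b": "pm1", "c": "pm2", "d": "pm1"}
single = {t: "pm1" for t in exec_time}
print(completion_time(exec_time, transfer, split, acquisition_delay=2.0))
print(completion_time(exec_time, transfer, single, acquisition_delay=2.0))
```

Placing `c` on a second PM lets it overlap with `b`, so the split placement finishes at 7.0 versus 8.0 on a single PM, even after paying the transfer delays.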

Extended Balanced Scheduler with Clustering and Replication for Data Intensive Scientific Workflow Applications in Cloud Computing

Journal of Electronic Research and Application, Vol 2 (3), 2018. DOI: 10.26689/jera.v2i3.380

Author(s): Satwinder Kaur, Mehak Aggarwal

Keyword(s): Cloud Computing; Task Scheduling; Execution Time; Performance Metrics; Scheduling Algorithm; Research Work; Scientific Workflow; IT Services; Data Intensive; Fine Grain

Cloud computing is an advanced computing model through which many applications, data, and countless IT services are provided over the Internet. Task scheduling plays a crucial role in cloud computing systems. The task scheduling problem can be viewed as the search for an optimal mapping (assignment) of the subtasks of different tasks onto the available resources so that the desired goals for the tasks are achieved. As the number of cloud users grows, more tasks need to be scheduled, and the cloud's performance depends on the task scheduling algorithms used. Numerous algorithms have been proposed in the past to solve the task scheduling problem for heterogeneous networks of computers. Existing research proposes various energy-aware and deadline-aware task scheduling methods for data-intensive applications. A scientific workflow combines fine-grained and coarse-grained tasks, and every task scheduled to a VM incurs system overhead. When many fine-grained tasks execute in a scientific workflow, the scheduling overhead increases. To overcome this overhead, multiple small tasks are combined into larger tasks, which decreases the scheduling overhead and improves the execution time of the workflow. Horizontal clustering is used to cluster the fine-grained tasks, and it is further combined with a replication technique. The proposed scheduling algorithm improves performance metrics such as execution time and cost. This research can be extended further with improved clustering techniques and replication methods.
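
Horizontal clustering, as described above, merges tasks that sit on the same level of the workflow into a single job so that the per-job scheduling overhead is paid fewer times. A minimal Python sketch follows; the `cluster_size` parameter, the fixed per-job overhead, and the toy workflow are assumptions, and the replication step is not modelled.

```python
from collections import defaultdict
from typing import Dict, List, Set


def task_levels(preds: Dict[str, Set[str]]) -> Dict[str, int]:
    """Level = length of the longest path from an entry task (entry tasks are level 1)."""
    memo: Dict[str, int] = {}

    def depth(t: str) -> int:
        if t not in memo:
            memo[t] = 1 + max((depth(p) for p in preds[t]), default=0)
        return memo[t]

    for t in preds:
        depth(t)
    return memo


def horizontal_clusters(preds: Dict[str, Set[str]], cluster_size: int) -> List[List[str]]:
    """Merge tasks on the same level into jobs of at most `cluster_size` tasks."""
    by_level: Dict[int, List[str]] = defaultdict(list)
    for t, level in task_levels(preds).items():
        by_level[level].append(t)
    jobs: List[List[str]] = []
    for level in sorted(by_level):
        tasks = sorted(by_level[level])
        for i in range(0, len(tasks), cluster_size):
            jobs.append(tasks[i:i + cluster_size])
    return jobs


# Four fine-grained entry tasks, two intermediate tasks, one exit task.
preds = {
    "p1": set(), "p2": set(), "p3": set(), "p4": set(),
    "d1": {"p1", "p2"}, "d2": {"p3", "p4"},
    "m": {"d1", "d2"},
}
jobs = horizontal_clusters(preds, cluster_size=2)
overhead_per_job = 1.5  # hypothetical scheduling overhead per dispatched job
print(jobs)
print(len(preds) * overhead_per_job, len(jobs) * overhead_per_job)
```

Here seven tasks become four jobs, so the hypothetical overhead drops from 10.5 to 6.0 time units. If the tasks inside one job run sequentially, a larger `cluster_size` reduces overhead but also reduces parallelism, which is the trade-off a clustering policy has to balance.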

Current Trends in Cloud Computing for Data Science Experiments

International Journal of Cloud Applications and Computing, Vol 11 (4), pp. 80-99, 2021. DOI: 10.4018/ijcac.2021100105

Author(s): Syed Imran Jami, Siraj Munir

Keyword(s): Cloud Computing; Resource Sharing; Data Science; Job Scheduling; Resource Scheduling; Data Intensive; Organizational Issues; Current Trends; Recent Trends; And Storage

Recent data-intensive experiments require extensive computing and storage resources, which are now provided by the cloud. Industry experts and researchers use cloud-based services and resources to obtain analytics on their data and to avoid inter-organizational issues such as the power overhead on local machines and the cost of maintaining and running infrastructure. This article provides a detailed review of selected metrics for cloud computing according to the requirements of data science and big data, namely (1) load balancing, (2) resource scheduling, (3) resource allocation, (4) resource sharing, and (5) job scheduling. The major contribution of this review is that it covers these metrics collectively, which is the first attempt towards evaluating the latest systems in the context of data science. The detailed analysis shows that cloud computing needs further research in its association with data-intensive experiments, with emphasis on resource scheduling.

Resource Scheduling in Cloud Computing Based on a Hybridized Whale Optimization Algorithm

Applied Sciences, Vol 9 (22), pp. 4893, 2019. DOI: 10.3390/app9224893. Cited by ~15

Author(s): Ivana Strumberger, Nebojsa Bacanin, Milan Tuba, Eva Tuba

Keyword(s): Cloud Computing; Swarm Intelligence; Optimization Algorithm; System Performance; Resource Scheduling; Whale Optimization Algorithm; Cloud System; Data Sets; Computing Paradigm; Whale Optimization

The cloud computing paradigm, as a novel platform for delivering computing resources, has significantly impacted society with the concept of on-demand resource utilization through virtualization technology. Virtualization allows available physical resources to be used so that multiple end-users can share the same underlying hardware infrastructure. In cloud computing, many challenges arise from the expectations of both clients and providers. One of the most important non-deterministic polynomial-time hard (NP-hard) challenges in cloud computing is resource scheduling, due to its critical impact on cloud system performance. Previous research in this domain has shown that metaheuristics can substantially improve cloud system performance when used as scheduling algorithms. This paper introduces a hybridized whale optimization algorithm, which falls into the category of swarm intelligence metaheuristics, adapted for tackling the resource scheduling problem in cloud environments. To evaluate the performance of the proposed approach more precisely, the original whale optimization algorithm was also adapted for resource scheduling. The efficiency of any swarm intelligence algorithm depends heavily on the balance between its two most important mechanisms, exploitation and exploration; accordingly, the original whale optimization algorithm was enhanced by addressing its weaknesses, namely an inappropriately adjusted exploitation–exploration trade-off and premature convergence. The proposed hybrid algorithm was first tested on a standard set of bound-constrained benchmarks to evaluate its performance more accurately. Afterwards, simulations were performed using two different resource scheduling models in cloud computing, with both real and artificial data sets, on the robust CloudSim platform.
The hybrid whale optimization algorithm was compared with other state-of-the-art metaheuristics and heuristics, as well as with the original whale optimization algorithm, in all conducted experiments. The results of all simulations indicate that the proposed hybrid whale optimization algorithm, on average, outperforms the original version as well as the other heuristics and metaheuristics. The proposed algorithm thus improves the handling of the resource scheduling problem in cloud computing and enhances the original whale optimization implementation.
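
The abstract does not give the hybrid algorithm's details, so the following is only a sketch of the original whale optimization algorithm (Mirjalili and Lewis, 2016) that the hybrid builds on, applied to a continuous bound-constrained benchmark. It shows the exploration/exploitation switch the abstract refers to: the coefficient `a` decreases linearly from 2 towards 0, and since `|A| <= a`, moves relative to a random whale (exploration, `|A| >= 1`) become rarer and finally impossible. The parameter defaults and the sphere test function are assumptions, and the mapping from positions to VM assignments is not shown.

```python
import math
import random
from typing import Callable, List, Tuple


def woa_minimize(
    f: Callable[[List[float]], float],
    dim: int,
    lower: float,
    upper: float,
    n_whales: int = 30,
    iters: int = 300,
    b: float = 1.0,  # shape constant of the logarithmic spiral
    seed: int = 0,
) -> Tuple[List[float], float]:
    rng = random.Random(seed)

    def clip(v: float) -> float:
        return min(max(v, lower), upper)

    whales = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(n_whales)]
    best = list(min(whales, key=f))
    best_val = f(best)
    for it in range(iters):
        a = 2.0 - 2.0 * it / iters  # decreases linearly from 2 towards 0
        for i in range(n_whales):
            x = whales[i]
            A = 2.0 * a * rng.random() - a
            C = 2.0 * rng.random()
            p = rng.random()
            ell = rng.uniform(-1.0, 1.0)  # spiral parameter in [-1, 1]
            if p < 0.5:
                # |A| < 1: exploit by encircling the best whale;
                # |A| >= 1: explore by moving relative to a random whale.
                ref = best if abs(A) < 1.0 else whales[rng.randrange(n_whales)]
                new = [ref[j] - A * abs(C * ref[j] - x[j]) for j in range(dim)]
            else:
                # Bubble-net attack: logarithmic spiral around the best whale.
                new = [abs(best[j] - x[j]) * math.exp(b * ell) * math.cos(2.0 * math.pi * ell) + best[j]
                       for j in range(dim)]
            whales[i] = [clip(v) for v in new]
            val = f(whales[i])
            if val < best_val:
                best, best_val = list(whales[i]), val
    return best, best_val


def sphere(v: List[float]) -> float:
    return sum(t * t for t in v)


best, val = woa_minimize(sphere, dim=5, lower=-10.0, upper=10.0, seed=1)
print(val)
```

A scheduling variant would additionally decode each position vector into a task-to-VM mapping (for example, by rounding each coordinate to a VM index); that decoding step is an assumption and is left out here.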

An Exotic IWD - SVR Based Approach for Failure Prognostication in Cloud-Based Scientific Workflows

2021. DOI: 10.21203/rs.3.rs-716843/v1

Author(s): Sridevi S, Jeevaa Katiravan

Keyword(s): Large Scale; Performance Metrics; Prediction Models; Fault Tolerant; Scientific Workflow; Scientific Workflows; Support Vector; Learning Approaches; Task Failure; Proactive Measures

Scientific workflows are attracting growing attention in sophisticated large-scale scientific problem-solving environments. Because the tasks of workflow-based applications depend on one another, even a single task failure can drastically affect the reliability of the overall system. Hence, proactive measures, rather than reactive fault-tolerant approaches, are vital in scientific workflows. This work proposes an Exotic Intelligent Water Drops - Support Vector Regression (IWD-SVR) based approach for task failure prognostication, which facilitates proactive fault tolerance in scientific workflow applications. The failure prediction models in this study are implemented with SVR-based machine learning approaches, their prediction accuracy is optimized with the Intelligent Water Drops Algorithm (IWDA), and various performance metrics were evaluated. The experimental results show that the proposed approach performs better than other existing techniques.

The PBase Scientific Workflow Provenance Repository

International Journal of Digital Curation, Vol 9 (2), pp. 28-38, 2014. DOI: 10.2218/ijdc.v9i2.332. Cited by ~16

Author(s): Víctor Cuevas-Vicenttín, Parisa Kianmajd, Bertram Ludäscher, Paolo Missier, Fernando Chirigati, ...

Keyword(s): Scientific Workflow; Scientific Workflows; Data Reuse; Data Intensive; Research Collaborations; Provenance Data; Scientific Experiments; History Of; Scientific Results; User Friendly

Scientific workflows and their supporting systems are becoming increasingly popular for compute-intensive and data-intensive scientific experiments. The advantages scientific workflows offer include rapid and easy workflow design, software and data reuse, scalable execution, sharing and collaboration, and other advantages that altogether facilitate “reproducible science”. In this context, provenance – information about the origin, context, derivation, ownership, or history of some artifact – plays a key role, since scientists are interested in examining and auditing the results of scientific experiments. However, in order to perform such analyses on scientific results as part of extended research collaborations, an adequate environment and tools are required. Concretely, the need arises for a repository that will facilitate the sharing of scientific workflows and their associated execution traces in an interoperable manner, also enabling querying and visualization. Furthermore, such functionality should be supported while taking performance and scalability into account. With this purpose in mind, we introduce PBase: a scientific workflow provenance repository implementing the ProvONE proposed standard, which extends the emerging W3C PROV standard for provenance data with workflow specific concepts. PBase is built on the Neo4j graph database, thus offering capabilities such as declarative and efficient querying. Our experiences demonstrate the power gained by supporting various types of queries for provenance data. In addition, PBase is equipped with a user friendly interface tailored for the visualization of scientific workflow provenance data, making the specification of queries and the interpretation of their results easier and more effective.

Cost Effective Heuristic workflow scheduling algorithm in Cloud under Deadline Constraint

Recent Patents on Computer Science, Vol 12, 2019. DOI: 10.2174/2213275912666190822113039

Author(s): Jasraj Meena, Manu Vardhan

Keyword(s): Cloud Computing; Scheduling Algorithm; Cost Effective; Scientific Workflow; Scientific Workflows; Workflow Scheduling; Performance Variation; Acquisition Delay; Very High; Deadline Constraint

Cloud computing is used to deliver IT resources over the internet. Owing to its popularity, most scientific workflows are now shifting to this environment. Many algorithms have been proposed in the literature to schedule scientific workflows in the cloud, but their execution cost is very high and they do not meet the user-defined deadline constraint. This paper focuses on satisfying the user-defined deadline of a scientific workflow while minimizing the total execution cost. To achieve this, we propose a Cost-Effective under Deadline (CEuD) constraint workflow scheduling algorithm. The proposed CEuD algorithm considers all the essential features of the cloud and resolves major issues such as performance variation and acquisition delay. We compared the proposed CEuD algorithm with existing algorithms from the literature on scientific workflows (namely Montage, Epigenomics, and CyberShake) and obtained better results in minimizing the overall execution cost of the workflow while satisfying the user-defined deadline.
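
The abstract does not describe CEuD's steps, so the following Python sketch only illustrates how the two cloud features it names, performance variation and acquisition delay, can enter a deadline-constrained cost minimization. It is a simple greedy heuristic under stated assumptions: the workflow is a single chain (pipeline), all VMs are requested up front so the acquisition delay is paid once, billing is pay-per-use without rounding, performance variation is a fixed worst-case slowdown, the slowest VM type is assumed cheapest, and the VM catalogue is hypothetical.

```python
from typing import List, Optional, Tuple

VmType = Tuple[str, float, float]  # (name, relative speed, price per unit time)


def plan_pipeline(
    works: List[float],
    vm_types: List[VmType],
    deadline: float,
    acquisition_delay: float,
    variation: float = 0.0,
) -> Optional[Tuple[List[str], float, float]]:
    types = sorted(vm_types, key=lambda v: v[1])  # slowest (assumed cheapest) first
    level = [0] * len(works)  # index into `types` for each task

    def runtime(i: int, k: int) -> float:
        # Performance variation modelled as a worst-case fractional slowdown.
        return works[i] / (types[k][1] * (1.0 - variation))

    def cost(i: int, k: int) -> float:
        return runtime(i, k) * types[k][2]  # pay-per-use, no billing rounding

    def makespan() -> float:
        # Tasks form a chain, so runtimes add up after the acquisition delay.
        return acquisition_delay + sum(runtime(i, k) for i, k in enumerate(level))

    while makespan() > deadline:
        upgradable = [i for i, k in enumerate(level) if k + 1 < len(types)]
        if not upgradable:
            return None  # even the fastest types miss the deadline

        def time_per_cost(i: int) -> float:
            k = level[i]
            saved = runtime(i, k) - runtime(i, k + 1)
            extra = cost(i, k + 1) - cost(i, k)
            return saved / extra if extra > 0 else float("inf")

        # Upgrade the task that buys the most time per extra unit of cost.
        level[max(upgradable, key=time_per_cost)] += 1

    total_cost = sum(cost(i, k) for i, k in enumerate(level))
    return [types[k][0] for k in level], total_cost, makespan()


vm_types = [("small", 1.0, 1.0), ("large", 2.0, 2.5)]  # hypothetical catalogue
print(plan_pipeline([4, 2, 6], vm_types, deadline=10, acquisition_delay=1))
print(plan_pipeline([4, 2, 6], vm_types, deadline=5, acquisition_delay=1))
```

With a 10-unit deadline, two tasks are upgraded to the faster type and the chain finishes exactly at the deadline for a cost of 13.5; with a 5-unit deadline even the fastest types finish at 7, so the function returns `None`.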

Design of a Fault Tolerant Strategy for Resource Scheduling in Cloud Environment

International Journal of Engineering and Advanced Technology - Regular Issue, Vol 9 (1), pp. 5121-5128, 2019. DOI: 10.35940/ijeat.a1519.109119

Keyword(s): Cloud Computing; Failure Rate; Virtual Machines; Fault Tolerant; Previous History; Resource Scheduling; Efficient Manner; Computing Power; Efficient Resource

Cloud computing supports the technological needs of industry and underpins many other technologies. At the same time, the demand of recent technologies for computing power and storage is growing drastically. Cloud computing, which serves these technologies, must be developed with advancements that improve performance in support of technologies such as blockchain and big data. Cloud resources must therefore be allocated wisely to meet the need for extraordinary computing power. In this paper, an efficient resource allocation strategy (FTVMA) is introduced that creates effective virtual machines (VMs) and allocates them efficiently by considering failure rates, the previous failure history of each VM, and execution efficiency as part of scheduling. There are many reasons for cloudlet failure in VMs, among them VM overloading and VM non-availability. The introduced FTVMA algorithm considers the failure rate of the physical machine, the load of virtual machines, and the cost priority of the tasks in order to achieve the user's Quality of Service (QoS) and Quality of Experience (QoE). The FTVMA methodology proposed in this paper works better for computation-intensive VMs and is tested in the CloudSim environment. The QoS metrics used to measure the performance of the proposed algorithm are makespan and VM utilization; the QoE metrics are priority miss rate and failure rate. The results are compared with existing resource scheduling algorithms, and the proposed algorithm performs better in terms of both QoS and QoE.
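
The abstract lists the factors FTVMA weighs (physical-machine failure rate, VM load, and task priority) but not how they are combined. One plausible combination, sketched here in Python as an assumption rather than the paper's formula, estimates each VM's expected finish time under independent retries (expected attempts `1 / (1 - p)` for failure probability `p`) and restricts high-priority tasks to VMs below a failure threshold. The `VM` fields, `pick_vm`, and the 0.05 threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VM:
    name: str
    mips: float          # processing speed
    queued_work: float   # work already assigned to this VM (its load)
    failure_prob: float  # per-attempt failure probability, e.g. from failure history


def expected_finish(vm: VM, length: float) -> float:
    wait = vm.queued_work / vm.mips
    run = length / vm.mips
    # Assumes a failed attempt costs a full run and attempts fail independently,
    # so the expected number of attempts is 1 / (1 - p).
    return wait + run / (1.0 - vm.failure_prob)


def pick_vm(vms: List[VM], length: float, high_priority: bool,
            max_failure_high: float = 0.05) -> Optional[VM]:
    pool = [v for v in vms if v.failure_prob < 1.0]
    if high_priority:
        # High-priority tasks only use sufficiently reliable VMs, if any exist.
        pool = [v for v in pool if v.failure_prob <= max_failure_high] or pool
    if not pool:
        return None
    best = min(pool, key=lambda v: expected_finish(v, length))
    best.queued_work += length  # update the load seen by later decisions
    return best


vms = [VM("A", mips=1000, queued_work=0, failure_prob=0.2),
       VM("B", mips=500, queued_work=0, failure_prob=0.0)]
print(pick_vm(vms, 1000, high_priority=False).name)
print(pick_vm(vms, 1000, high_priority=True).name)
```

A normal-priority task goes to the faster but less reliable VM `A` (expected 1.25 versus 2.0 time units), while a high-priority task is sent to the slower, failure-free VM `B`.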

Design of Energy Aware Scheduling Algorithm for Executing Scientific Workflows in Cloud

International Journal of Engineering and Advanced Technology - Regular Issue, Vol 9 (2), pp. 1305-1311, 2019. DOI: 10.35940/ijeat.a2013.129219

Keyword(s): Energy Efficient; Virtual Machines; Scheduling Algorithm; Scientific Workflow; Scientific Workflows; Cloud Environment; Scientific Applications; Energy Aware; Efficient Manner; The Given

The use of cloud computing and its resources for executing scientific workflows is in rapidly increasing demand. Scientific applications are generally large in scale; even a single scientific workflow includes a large number of complex tasks. These tasks can be executed successfully only by deploying them on cloud virtual machines, because only a cloud environment can provide a very large number of computing assets. In the cloud, every processing resource is provided as a virtual machine. Any scientific workflow deployed in the cloud needs a large number of virtual machines, so a huge amount of computational energy is spent by the virtual machines to execute multifaceted scientific workflows. Hence there is a need to use cloud resources in an energy-efficient way. However, if the virtual machines are scheduled in an energy-efficient manner, the makespan of the workflow increases, and makespan is an important parameter for completing the workflow within its deadline. Executing scientific workflows in an energy-efficient way with reduced makespan has therefore become a major issue for researchers. It is also very challenging to execute a scientific workflow within the given deadlines of the tasks in the workflow. To address these issues, a new energy-aware workflow scheduling algorithm with improved makespan is proposed and designed for executing different scientific applications in the cloud environment.
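
The energy/makespan tension described above can be illustrated with a small Python sketch that uses dynamic voltage and frequency scaling (DVFS), a technique the abstract does not mention and which is swapped in here purely for illustration. It assumes a power model `P(f) = p_static + k * f**3`, so running slower can save energy but lengthens execution; the hypothetical `pick_frequency` function picks the lowest-energy frequency that still meets a task's deadline.

```python
from typing import List, Optional, Tuple


def energy_and_time(work: float, freq: float, p_static: float, k: float) -> Tuple[float, float]:
    # Assumed power model: static power plus dynamic power growing with freq**3.
    t = work / freq
    return (p_static + k * freq ** 3) * t, t


def pick_frequency(work: float, freqs: List[float], deadline: float,
                   p_static: float = 1.0, k: float = 1.0) -> Optional[Tuple[float, float, float]]:
    """Return (energy, time, frequency) of the lowest-energy setting meeting the deadline."""
    feasible = []
    for f in freqs:
        e, t = energy_and_time(work, f, p_static, k)
        if t <= deadline:
            feasible.append((e, t, f))
    return min(feasible) if feasible else None


for deadline in (5.0, 3.0, 1.0):
    print(deadline, pick_frequency(4.0, [1.0, 2.0], deadline))
```

For 4 units of work, the slow setting uses 8 energy units and finishes at time 4; a deadline of 3 forces the fast setting, which finishes at time 2 but uses 18 units; a deadline of 1 is infeasible.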

Autonomic fault tolerant scheduling approach for scientific workflows in Cloud computing

Concurrent Engineering, Vol 23 (1), pp. 27-39, 2015. DOI: 10.1177/1063293x14567783. Cited by ~30

Author(s): Anju Bala, Inderveer Chana

Keyword(s): Cloud Computing; Fault Tolerant; Scientific Workflows
