Hybrid Balanced Task Clustering Algorithm for Scientific Workflows in Cloud Computing

2019 ◽  
Vol 20 (2) ◽  
pp. 237-258
Author(s):  
Avinash Kaur ◽  
Pooja Gupta ◽  
Manpreet Singh

A scientific workflow is a composition of both coarse-grained and fine-grained computational tasks with varying execution requirements. Scientific workflows involve large-scale data transfer, so efficient techniques are required to reduce the workflow makespan. Task clustering is an efficient technique in such a scenario: multiple tasks with short execution times are combined into a single cluster that is executed on one resource. This reduces scheduling overheads in scientific workflows and thus improves performance. However, available task clustering methods cluster tasks horizontally without considering the structure of the workflow. We propose a hybrid balanced task clustering algorithm that uses the impact factor of workflow tasks along with the workflow structure. In this technique, tasks are considered for clustering either vertically or horizontally based on the value of the impact factor. This minimizes system overheads and the makespan of workflow execution. A simulation-based evaluation on real workflows shows that the proposed algorithm recommends efficient clusters, improving workflow makespan by 5-10% depending on the type of workflow used.
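
As a rough illustration of the horizontal-versus-vertical decision described above, the Python sketch below clusters a toy workflow DAG. The impact-factor formula used here (the share of tasks reachable from a given task) is only a stand-in assumption, since the abstract does not define the actual metric.

```python
# Hedged sketch of horizontal vs. vertical task clustering driven by a
# stand-in "impact factor"; the paper's real metric is not reproduced.
from collections import defaultdict

def descendants(task, children):
    """All tasks reachable from `task` in the workflow DAG."""
    seen, stack = set(), [task]
    while stack:
        for c in children[stack.pop()]:
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen

def cluster_tasks(tasks, children, depth, threshold=0.5):
    """Merge along parent-child edges (vertical) when the stand-in impact
    factor is high, otherwise merge same-depth siblings (horizontal)."""
    n = len(tasks)
    clusters = set()
    for t in tasks:
        impact = len(descendants(t, children)) / max(n - 1, 1)  # stand-in metric
        if impact >= threshold and children[t]:
            clusters.add(tuple([t] + sorted(children[t])))       # vertical merge
        else:
            clusters.add(tuple(sorted(u for u in tasks if depth[u] == depth[t])))  # horizontal merge
    return [list(c) for c in clusters]

# toy workflow: t0 -> {t1, t2} -> t3
children = defaultdict(list, {"t0": ["t1", "t2"], "t1": ["t3"], "t2": ["t3"]})
depth = {"t0": 0, "t1": 1, "t2": 1, "t3": 2}
print(cluster_tasks(list(depth), children, depth))
```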

2019 ◽  
Vol 16 (4) ◽  
pp. 1-20
Author(s):  
S. Sabahat H. Bukhari ◽  
Yunni Xia

The cloud computing paradigm provides an ideal platform for supporting large-scale scientific-workflow-based applications over the internet. However, the scheduling and execution of scientific workflows still face various challenges, such as cost and response-time management, which aim at handling acquisition delays of physical servers and minimizing the overall completion time of workflows. A careful investigation of existing methods shows that most approaches assume static performance of physical machines (PMs) and ignore the impact of resource acquisition delays in their scheduling models. In this article, the authors present a meta-heuristic-based method for scheduling scientific workflows that aims at reducing workflow completion time by appropriately managing the acquisition and transmission delays required for inter-PM communication. The authors also carry out extensive case studies based on real-world commercial clouds and multiple workflow templates. Experimental results clearly show that the proposed method outperforms state-of-the-art ones such as ICPCP, CEGA, and JIT-C in terms of workflow completion time.
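
A minimal sketch of the kind of completion-time model the abstract alludes to, in which a task's finish time accounts for the acquisition delay of a newly started PM and the transmission delay of data arriving from parents placed on other PMs. The function, its parameters, and the toy values are illustrative assumptions, not the paper's formulation.

```python
# Hedged sketch: finish time of a task under acquisition and transmission delays.
def finish_time(task, assignment, runtime, parents, data, bandwidth,
                acquisition_delay, pm_ready, finish):
    pm = assignment[task]
    # a PM that has not been started yet pays the acquisition delay once
    ready = pm_ready.get(pm, acquisition_delay)
    # wait for parent outputs; cross-PM edges add transmission time
    for p in parents.get(task, []):
        arrival = finish[p] + (data[(p, task)] / bandwidth
                               if assignment[p] != pm else 0.0)
        ready = max(ready, arrival)
    finish[task] = ready + runtime[task]
    pm_ready[pm] = finish[task]
    return finish[task]

# toy chain t1 -> t2 placed on two different PMs
runtime = {"t1": 5.0, "t2": 3.0}
parents = {"t2": ["t1"]}
data = {("t1", "t2"): 10.0}
assign = {"t1": "pm1", "t2": "pm2"}
pm_ready, finish = {}, {}
for t in ["t1", "t2"]:
    finish_time(t, assign, runtime, parents, data, bandwidth=2.0,
                acquisition_delay=1.0, pm_ready=pm_ready, finish=finish)
print(finish)  # t1 finishes at 6.0, t2 at 14.0 in this toy setting
```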


2019 ◽  
Vol 491 (2) ◽  
pp. 1600-1621
Author(s):  
Yi Mao ◽  
Jun Koda ◽  
Paul R Shapiro ◽  
Ilian T Iliev ◽  
Garrelt Mellema ◽  
...  

Cosmic reionization was driven by the imbalance between early sources and sinks of ionizing radiation, both of which were dominated by small-scale structure and are thus usually treated in cosmological reionization simulations by subgrid modelling. The recombination rate of intergalactic hydrogen is customarily boosted by a subgrid clumping factor, ⟨n²⟩/⟨n⟩², which corrects for unresolved fluctuations in gas density n on scales below the grid spacing of coarse-grained simulations. We investigate in detail the impact of this inhomogeneous subgrid clumping on reionization and its observables, as follows: (1) Previous attempts generally underestimated the clumping factor because of insufficient mass resolution. We perform a high-resolution N-body simulation that resolves haloes down to the pre-reionization Jeans mass to derive the time-dependent, spatially varying local clumping factor and a fitting formula for its correlation with local overdensity. (2) We then perform a large-scale N-body and radiative transfer simulation that accounts for this inhomogeneous subgrid clumping by applying this clumping factor-overdensity correlation. Boosting recombination significantly slows the expansion of ionized regions, which delays completion of reionization and suppresses 21 cm power spectra on large scales in the later stages of reionization. (3) We also consider a simplified prescription in which the globally averaged, time-evolving clumping factor from the same high-resolution N-body simulation is instead applied uniformly to all cells in the reionization simulation. Observables computed with this model agree fairly well with those from the inhomogeneous clumping model, e.g. predicting 21 cm power spectra to within 20 per cent error, suggesting it may be a useful approximation.
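
For reference, the clumping factor ⟨n²⟩/⟨n⟩² can be computed per coarse cell directly from a fine-grid density field. The short numpy sketch below does this for a made-up lognormal field and arbitrary grid sizes; the simulation's actual resolutions and density distribution are not reproduced here.

```python
# Illustrative computation of the subgrid clumping factor C = <n^2>/<n>^2.
import numpy as np

rng = np.random.default_rng(0)
fine = rng.lognormal(mean=0.0, sigma=1.0, size=(64, 64, 64))  # toy fine-grid gas density

def clumping_factor(fine, coarse=8):
    """Coarse-grain a fine density grid and return C = <n^2>/<n>^2 per coarse cell."""
    f = coarse  # number of fine cells per coarse cell along each axis
    n = fine.reshape(fine.shape[0] // f, f,
                     fine.shape[1] // f, f,
                     fine.shape[2] // f, f)
    mean_n = n.mean(axis=(1, 3, 5))
    mean_n2 = (n ** 2).mean(axis=(1, 3, 5))
    return mean_n2 / mean_n ** 2

C = clumping_factor(fine)
print(C.shape, C.mean())  # (8, 8, 8); C >= 1 by Cauchy-Schwarz
```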


2021 ◽  
Author(s):  
Sridevi S ◽  
Jeevaa Katiravan

Scientific workflows are receiving growing attention in sophisticated, large-scale scientific problem-solving environments. Even a single task failure in a workflow-based application can drastically affect the reliability of the overall system because of task dependencies. Hence, proactive measures, rather than purely reactive fault-tolerance approaches, are vital in scientific workflows. This work explores the design of an Exotic Intelligent Water Drops-Support Vector Regression based approach for task-failure prognostication, which facilitates proactive fault tolerance in scientific workflow applications. The failure-prediction models in this study are implemented with SVR-based machine-learning approaches, their prediction accuracy is optimized by the IWD algorithm (IWDA), and various performance metrics are evaluated. The experimental results show that the proposed approach performs better than existing techniques.
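
The sketch below shows an SVR-based failure predictor of the general kind described, using scikit-learn on synthetic task metrics. The features, the data, and the hyper-parameter search are illustrative assumptions; in particular, the IWD-based optimisation is replaced here by a plain grid search for brevity.

```python
# Hedged sketch of an SVR-based task-failure predictor (grid search stands in
# for the paper's Intelligent Water Drops hyper-parameter optimisation).
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(1)
# synthetic task metrics: [cpu load, memory use, retries]; target = failure risk
X = rng.random((200, 3))
y = 0.6 * X[:, 0] + 0.3 * X[:, 2] + 0.1 * rng.random(200)

search = GridSearchCV(SVR(kernel="rbf"),
                      {"C": [0.1, 1, 10], "gamma": ["scale", 0.1, 1.0]},
                      cv=5, scoring="neg_mean_squared_error")
search.fit(X, y)
print(search.best_params_, -search.best_score_)
```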


2017 ◽  
Vol 15 (06) ◽  
pp. 1740006 ◽  
Author(s):  
Mohammad Arifur Rahman ◽  
Nathan LaPierre ◽  
Huzefa Rangwala ◽  
Daniel Barbara

Metagenomics is the collective sequencing of co-existing microbial communities, which are ubiquitous across various clinical and ecological environments. Due to the large volume of random short sequences (reads) obtained from community sequencing, analyzing the diversity, abundance, and functions of the different organisms within these communities is challenging. We present a fast and scalable clustering algorithm for analyzing large-scale metagenome sequence data. Our approach achieves efficiency by partitioning the large number of sequence reads into groups (called canopies) using hashing. These canopies are then refined using state-of-the-art sequence clustering algorithms. This canopy-clustering (CC) algorithm can be used as a pre-processing phase for computationally expensive clustering algorithms. We use and compare three hashing schemes for canopy construction with five popular and state-of-the-art sequence clustering methods. We evaluate our clustering algorithm on synthetic and real-world 16S and whole-metagenome benchmarks. We demonstrate the ability of our proposed approach to determine meaningful Operational Taxonomic Units (OTUs) and observe significant speedups in run time when compared to different clustering algorithms. We also make our source code publicly available on GitHub.
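
A minimal sketch of the canopy-then-refine idea: reads are first bucketed by a cheap hash (here the lexicographically smallest k-mer, a stand-in for the paper's hashing schemes), and only then does a more expensive clustering run inside each canopy. The reads, k, and the mismatch threshold are toy choices, not the paper's parameters.

```python
# Hedged canopy-clustering sketch: hash-based bucketing followed by refinement.
from collections import defaultdict

def min_kmer(read, k=4):
    """Cheap hash of a read: its lexicographically smallest k-mer."""
    return min(read[i:i + k] for i in range(len(read) - k + 1))

def canopies(reads, k=4):
    buckets = defaultdict(list)
    for r in reads:
        buckets[min_kmer(r, k)].append(r)
    return buckets.values()

def refine(canopy, max_mismatch=2):
    """Greedy single-pass clustering inside a canopy (placeholder for a
    state-of-the-art sequence clusterer)."""
    clusters = []
    for r in canopy:
        for c in clusters:
            rep = c[0]
            if len(rep) == len(r) and sum(a != b for a, b in zip(rep, r)) <= max_mismatch:
                c.append(r)
                break
        else:
            clusters.append([r])
    return clusters

reads = ["ACGTACGTACGT", "ACGTACGTACTT", "GGGGTTTTAAAA", "GGGTTTTTAAAA"]
print([refine(c) for c in canopies(reads)])
```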


2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Qingxue Qin ◽  
Guangmei Xu ◽  
Jin Zhou ◽  
Rongrong Wang ◽  
Hui Jiang ◽  
...  

The guided filter is an explicit image filtering method that smooths “flat patch” regions while preserving edges in “high variance” regions. Recently, the guided filter has been successfully incorporated into the fuzzy c-means (FCM) process to boost clustering results on noisy images. However, the adaptability of existing guided filter-based FCM methods to different images is limited because the factor ε of the guided filter is fixed to a scalar. To solve this issue, this paper proposes a new guided filter-based FCM method (IFCM_GF), in which the guidance image of the guided filter is adjusted by a newly defined influence factor ρ. By dynamically changing the influence factor ρ, IFCM_GF achieves excellent segmentation results on various noisy images. Furthermore, to improve the segmentation accuracy on images with heavy noise and to simplify the selection of the influence factor ρ, we further propose a morphological reconstruction-based improved FCM clustering algorithm with guided filter (MRIFCM_GF). In this approach, the original noisy image is reconstructed by morphological reconstruction (MR) before clustering, and IFCM_GF is performed on the reconstructed image using the adjusted guidance image. Because MR removes noise efficiently, MRIFCM_GF achieves better segmentation results than IFCM_GF on images with heavy noise, and the selection of its influence factor is simple. Experiments demonstrate the effectiveness of the presented methods.
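
For concreteness, the sketch below implements the standard box-filter guided filter (He et al.) together with a hypothetical "adjusted guidance" step in which ρ blends the noisy image with a median-filtered estimate. The blending rule is only a guess at how the guidance image might be adjusted; the abstract does not give the actual formula.

```python
# Standard guided filter plus a hypothetical rho-adjusted guidance image.
import numpy as np
from scipy.ndimage import uniform_filter, median_filter

def guided_filter(I, p, radius=4, eps=1e-2):
    """Box-filter guided filter: I is the guidance image, p the input image."""
    size = 2 * radius + 1
    mean_I = uniform_filter(I, size)
    mean_p = uniform_filter(p, size)
    var_I = uniform_filter(I * I, size) - mean_I ** 2
    cov_Ip = uniform_filter(I * p, size) - mean_I * mean_p
    a = cov_Ip / (var_I + eps)
    b = mean_p - a * mean_I
    return uniform_filter(a, size) * I + uniform_filter(b, size)

def adjusted_guidance(noisy, rho=0.7):
    # hypothetical adjustment: blend raw image with a denoised estimate
    return rho * noisy + (1.0 - rho) * median_filter(noisy, size=3)

rng = np.random.default_rng(2)
img = np.clip(np.tile(np.linspace(0, 1, 64), (64, 1))
              + 0.1 * rng.standard_normal((64, 64)), 0, 1)
out = guided_filter(adjusted_guidance(img), img)
print(out.shape, round(float(img.std()), 3), round(float(out.std()), 3))
```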


2017 ◽  
Author(s):  
Debajyoti Sinha ◽  
Akhilesh Kumar ◽  
Himanshu Kumar ◽  
Sanghamitra Bandyopadhyay ◽  
Debarka Sengupta

Droplet-based single-cell transcriptomics has recently enabled parallel screening of tens of thousands of single cells. Clustering methods that scale to such high-dimensional data without compromising accuracy are scarce. We exploit Locality Sensitive Hashing, an approximate nearest-neighbour search technique, to develop a de novo clustering algorithm for large-scale single-cell data. On a number of real datasets, dropClust outperformed the existing best-practice methods in terms of execution time, clustering accuracy, and detectability of minor cell sub-types.
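
A minimal random-projection LSH sketch in the spirit of this approach: cells whose expression vectors fall on the same side of a set of random hyperplanes share a hash code and land in the same bucket, which can then seed a cheaper fine-grained clustering. The matrix sizes and the number of hyperplanes are arbitrary; this is not dropClust itself.

```python
# Illustrative random-projection LSH bucketing of single-cell profiles.
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(3)
cells = rng.standard_normal((5000, 100))        # toy expression matrix (cells x genes)
planes = rng.standard_normal((100, 12))         # 12 random hyperplanes

codes = (cells @ planes > 0).astype(np.uint8)   # sign pattern = hash code
buckets = defaultdict(list)
for i, code in enumerate(map(tuple, codes)):
    buckets[code].append(i)

print(len(buckets), max(len(v) for v in buckets.values()))
```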


2018 ◽  
Vol 56 (2) ◽  
pp. 246
Author(s):  
Phan Thanh Toan ◽  
Nguyen The Loc

Nowadays, people are connected to the Internet and use different cloud solutions to store, process, and deliver data. The cloud consists of a collection of virtual servers that promise to provision computational and storage resources on demand. Workflow data is becoming a ubiquitous term in both science and technology, and there is a strong need for new tools and techniques to process and analyze large-scale, complex datasets that are growing exponentially. A scientific workflow is a sequence of connected tasks with large data transfers from parent tasks to child tasks. Workflow scheduling is the activity of assigning tasks to servers for execution while satisfying resource constraints, and it is an NP-hard problem. In this paper, we propose a scheduling algorithm for workflow data that is derived from the branch-and-bound algorithm.
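
The sketch below shows a branch-and-bound search of the general kind mentioned: independent tasks are assigned to servers to minimise makespan, with a simple averaging lower bound used for pruning. Task dependencies and data-transfer costs are ignored here, so this illustrates the search scheme rather than the paper's algorithm.

```python
# Hedged branch-and-bound sketch for task-to-server assignment (makespan).
def branch_and_bound(runtimes, n_servers):
    best = [float("inf"), None]                 # [best makespan, assignment]

    def recurse(i, loads, assign):
        if i == len(runtimes):
            if max(loads) < best[0]:
                best[0], best[1] = max(loads), assign[:]
            return
        remaining = sum(runtimes[i:])
        # lower bound: current worst load, or remaining work spread evenly
        bound = max(max(loads), (sum(loads) + remaining) / n_servers)
        if bound >= best[0]:
            return                              # prune this branch
        for s in range(n_servers):
            loads[s] += runtimes[i]
            assign.append(s)
            recurse(i + 1, loads, assign)
            assign.pop()
            loads[s] -= runtimes[i]

    recurse(0, [0.0] * n_servers, [])
    return best

runtimes = [4, 7, 2, 5, 3, 6]                   # toy task runtimes
print(branch_and_bound(runtimes, n_servers=2))  # -> [14.0, assignment]
```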


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Eric Breitbarth ◽  
Wendelin Groß ◽  
Alexander Zienau

Purpose: This paper studies a concept for protecting vulnerable population groups during pandemics using direct home deliveries of essential supplies, from a distribution-logistics perspective. The purpose is to evaluate feasible and resource-efficient home delivery strategies, including collaboration between retailers and logistics service providers, based on a practical application. Design/methodology/approach: A food home delivery concept for urban areas during pandemics is mathematically modeled. All seniors living in a district of Berlin, Germany, represent the vulnerable population supplied from a grocery distribution center. A capacitated vehicle routing problem (CVRP) is developed in combination with a k-means clustering algorithm. To manage this large-scale problem efficiently, mixed-integer programming (MIP) is used. The impact of collaboration and additional delivery scenarios is examined with a sensitivity analysis. Findings: Roughly 45 medically vulnerable persons can be served by one delivery vehicle in the baseline scenario. Operational measures allow a drastic decrease in required resources by reducing service quality. In this way, home delivery for the vulnerable population of Berlin can be achieved. This requires collaboration between grocery and parcel services and public authorities, as well as overcoming the accompanying challenges. Originality/value: Developing a home delivery concept for providing essential goods to urban vulnerable groups during pandemics creates special value. Formulating a large-scale CVRP with variable fleet size in combination with a clustering algorithm contributes to the originality.
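
A cluster-first, route-second sketch in the spirit of this approach: delivery stops are grouped with k-means so that each cluster holds roughly 45 stops (the baseline vehicle capacity reported above), and each cluster is then routed with a nearest-neighbour heuristic standing in for the MIP. Coordinates and sizes are made up.

```python
# Hedged cluster-first, route-second illustration (k-means + greedy routing).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
stops = rng.uniform(0, 10, size=(450, 2))                  # toy delivery coordinates
k = int(np.ceil(len(stops) / 45))                          # ~45 stops per vehicle
labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(stops)

def nearest_neighbour_route(points):
    """Greedy route over one cluster, standing in for the exact MIP routing."""
    route, todo = [0], set(range(1, len(points)))
    while todo:
        last = points[route[-1]]
        nxt = min(todo, key=lambda j: np.linalg.norm(points[j] - last))
        route.append(nxt)
        todo.remove(nxt)
    return route

for c in range(k):
    cluster = stops[labels == c]
    route = nearest_neighbour_route(cluster)
    length = sum(np.linalg.norm(cluster[route[i]] - cluster[route[i + 1]])
                 for i in range(len(route) - 1))
    print(f"vehicle {c}: {len(cluster)} stops, route length {length:.1f}")
```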


2020 ◽  
Vol 3 (3) ◽  
pp. 191-200
Author(s):  
M. Syamsuddin Wisnubroto ◽  
Marsudi Siburian ◽  
Febri Dwi Irawati

Proteins interact with other proteins, DNA, and other molecules, forming large-scale protein interaction networks, and clustering methods are needed to analyze them easily. The regularized Markov clustering (RMCL) algorithm is an improvement of MCL in which the expansion operation is replaced by a new operation that updates the flow distribution of each node. To reduce the weaknesses of RMCL optimization, the Pigeon-Inspired Optimization (PIO) algorithm is used to set the inflation parameter. Simulation on the SARS-CoV-2 (COVID-19) protein interaction data yields 42 proteins as cluster centers and 8 protein pairs interacting with each other. Proteins of COVID-19 that interact with 20 or more proteins are ORF8, NSP13, NSP7, M, N, ORF9C, NSP8, and NSP1. Their interactions might be used as targets for drug research.
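
For orientation, the sketch below runs plain Markov clustering (expansion plus inflation) on a toy interaction graph; the regularized variant and the pigeon-inspired tuning of the inflation parameter are not reproduced, so this only anchors the terms used in the abstract.

```python
# Minimal MCL sketch (expansion + inflation) on a toy interaction graph.
import numpy as np

def mcl(adj, inflation=2.0, iters=50):
    M = adj + np.eye(len(adj))                 # add self-loops
    M = M / M.sum(axis=0)                      # column-normalise
    for _ in range(iters):
        M = M @ M                              # expansion
        M = M ** inflation                     # inflation
        M = M / M.sum(axis=0)
    # surviving rows act as cluster centres ("attractors"); de-duplicate them
    return sorted({tuple(np.nonzero(row > 1e-6)[0].tolist())
                   for row in M if row.sum() > 1e-6})

# two triangles joined by a single edge
adj = np.array([[0, 1, 1, 0, 0, 0],
                [1, 0, 1, 0, 0, 0],
                [1, 1, 0, 1, 0, 0],
                [0, 0, 1, 0, 1, 1],
                [0, 0, 0, 1, 0, 1],
                [0, 0, 0, 1, 1, 0]], dtype=float)
print(mcl(adj))   # typically two clusters: (0, 1, 2) and (3, 4, 5)
```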


2017 ◽  
Vol 117 (6) ◽  
pp. 530-539 ◽  
Author(s):  
Afzal Sheikh ◽  
Sunil Vadera ◽  
Michael Ravey ◽  
Gary Lovatt ◽  
Grace Kelly

Purpose: Over 200,000 young people in the UK embark on a smoking career annually, so continued effort is required to understand the types of interventions that are most effective in changing teenagers’ perceptions about smoking. Several authors have proposed the use of social norms programmes, in which correcting misconceptions about what is considered normal behaviour leads to improved behaviours. There are a limited number of studies showing the effectiveness of such programmes for changing teenagers’ perceptions of smoking habits, and hence this paper reports results from one of the largest social norms programmes, which used a variety of interventions aimed at improving teenagers’ perceptions of smoking. Design/methodology/approach: A range of interventions was adopted across 57 programmes with year-nine students, ranging from passive interventions such as posters and banners to active interventions such as student apps and enterprise days. Each programme consisted of a baseline survey, followed by interventions and a repeat survey to calculate the change in perception. A clustering algorithm was also used to reveal the impact of combinations of interventions. Findings: The study reveals three main findings: the use of social norms is an effective means of changing perceptions; the level of interventions and the change in perceptions are positively correlated; and the most effective combinations of interventions include interactive feedback assemblies, enterprise days, parent and student apps, and newsletters to parents. Originality/value: The paper presents results from one of the largest social norms programmes aimed at improving young people’s perceptions and the first to use clustering methods to reveal the impact of combinations of interventions.

