EFFICIENT LARGE-SCALE SERVICE CLUSTERING VIA SPARSE FUNCTIONAL REPRESENTATION AND ACCELERATED OPTIMIZATION

2013 ◽  
Vol 22 (04) ◽  
pp. 1341001 ◽  
Author(s):  
QI YU

Clustering techniques offer a systematic approach to organizing the diverse and rapidly increasing number of Web services by assigning related services to homogeneous service communities. However, the ever-increasing number of Web services poses key challenges for building large-scale service communities. In this paper, we tackle the scalability issue in service clustering, aiming to accurately and efficiently discover service communities over very large sets of services. A key observation is that service descriptions are usually represented by long but very sparse term vectors, as each service is only described by a limited number of terms. This inspires us to seek a new service representation that is economical to store, efficient to process, and intuitive to interpret, and that enables service clustering to scale to massive numbers of services. More specifically, a set of anchor services is identified, which allows each service to be represented as a linear combination of a small number of anchor services. In this way, the large number of services is encoded in a much more compact anchor service space. Although service clustering can be performed much more efficiently in the compact anchor service space, discovering anchor services from large-scale service descriptions may incur high computational cost. We develop principled optimization strategies for efficient anchor service discovery. Extensive experiments are conducted on real-world service data to assess both the effectiveness and efficiency of the proposed approach. Results on a dataset with over 3,700 Web services clearly demonstrate the good scalability of the sparse functional representation and the efficiency of the optimization algorithms for anchor service discovery.
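
To illustrate the core idea (not the paper's actual algorithm), the Python sketch below represents sparse TF-IDF service vectors as non-negative sparse combinations of anchor vectors and then clusters in that compact space. The anchor selection (k-means centroids) and the sparse coding step (Lasso) are illustrative stand-ins for the paper's anchor-discovery and optimization strategies, and the service descriptions are made up.

# Minimal sketch of anchor-based sparse representation for service clustering.
# NOTE: anchor selection (k-means centroids) and sparse coding (Lasso) are
# illustrative stand-ins; the paper's own anchor-discovery optimization differs.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.linear_model import Lasso

descriptions = [
    "weather forecast by city name",
    "currency exchange rate conversion",
    "city temperature and humidity report",
    "convert amount between currencies",
]

# Long, sparse term vectors (one per service).
X = TfidfVectorizer().fit_transform(descriptions).toarray()

# Hypothetical anchor services: here simply k-means centroids in term space.
n_anchors = 2
anchors = KMeans(n_clusters=n_anchors, n_init=10, random_state=0).fit(X).cluster_centers_

# Encode each service as a sparse non-negative combination of anchors.
codes = np.zeros((X.shape[0], n_anchors))
for i, x in enumerate(X):
    lasso = Lasso(alpha=0.01, positive=True, max_iter=10000)
    lasso.fit(anchors.T, x)          # x is approximated by anchors.T @ code
    codes[i] = lasso.coef_

# Clustering now runs in the compact anchor space instead of the full term space.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(codes)
print(labels)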

2006 ◽  
Vol 04 (03) ◽  
pp. 639-647 ◽  
Author(s):  
ELEAZAR ESKIN ◽  
RODED SHARAN ◽  
ERAN HALPERIN

The common approaches for haplotype inference from genotype data are targeted toward phasing short genomic regions. Longer regions are often tackled in a heuristic manner, due to the high computational cost. Here, we describe a novel approach for phasing genotypes over long regions, which is based on combining information from local predictions on short, overlapping regions. The phasing is done in a way that maximizes a natural maximum-likelihood criterion, which, among other things, takes into account the physical distance between neighboring single nucleotide polymorphisms. The approach is very efficient; it has been applied to several large-scale datasets and shown to be successful in two recent benchmarking studies (Zaitlen et al., in press; Marchini et al., in preparation). Our method is publicly available via a webserver at .
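
The following is a rough, hypothetical sketch of the stitching idea only: local phasings of short overlapping windows are combined by choosing, for each new window, the orientation that best agrees with the haplotypes assembled so far. The authors' actual method maximizes a likelihood criterion that also weights by the physical distance between SNPs, which this toy example omits.

# Rough illustrative sketch (not the authors' algorithm): stitch phased haplotypes
# from short overlapping windows by choosing, for each new window, the orientation
# that best agrees with the already-assembled haplotype over the shared SNPs.
# Assumes haplotypes are 0/1 lists and consecutive windows overlap by `overlap` SNPs.

def agreement(a, b):
    """Number of matching alleles between two equal-length haplotype segments."""
    return sum(x == y for x, y in zip(a, b))

def stitch(windows, overlap):
    hap1, hap2 = list(windows[0][0]), list(windows[0][1])
    for w1, w2 in windows[1:]:
        tail1, tail2 = hap1[-overlap:], hap2[-overlap:]
        keep = agreement(tail1, w1[:overlap]) + agreement(tail2, w2[:overlap])
        flip = agreement(tail1, w2[:overlap]) + agreement(tail2, w1[:overlap])
        if flip > keep:          # flipped orientation fits the assembled phase better
            w1, w2 = w2, w1
        hap1 += w1[overlap:]
        hap2 += w2[overlap:]
    return hap1, hap2

# Two overlapping windows, each giving a local phasing of 4 SNPs with a 2-SNP overlap.
windows = [([0, 1, 0, 1], [1, 0, 1, 0]),
           ([1, 0, 0, 0], [0, 1, 1, 1])]
print(stitch(windows, overlap=2))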


In Service-Oriented Architecture (SOA), web services play an important role. Web services are web application components that can be published, found, and used on the Web; they also enable machine-to-machine communication over a network. Cloud computing and distributed computing have brought a large number of web services onto the WWW. Web service composition is the process of combining two or more web services to satisfy user requirements. The tremendous increase in the number of services and the complexity of user requirement specifications make web service composition a challenging task. Automated service composition is a technique in which web service composition is performed automatically with minimal or no human intervention. In this paper, we propose an approach to web service composition for large-scale environments that takes QoS parameters into account. We use stacked autoencoders to learn features of web services, and a Recurrent Neural Network (RNN) leverages the learned features to predict new compositions. Experimental results show the efficiency and scalability of the approach. The use of a deep learning algorithm in web service composition leads to a high success rate and low computational cost.
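
As a rough sketch of this kind of pipeline (not the paper's exact architecture), the code below trains a small stacked autoencoder on toy term vectors to obtain compact service features and feeds a partial composition sequence of those features into a GRU-based RNN that scores candidate next services. All dimensions, layer sizes, and the toy data are assumptions.

# Hedged sketch: a stacked autoencoder learns compact features of web-service term
# vectors, and a GRU-based RNN scores which service is likely to come next in a
# composition sequence. Untrained RNN shown for brevity; training data is random.
import torch
import torch.nn as nn

N_SERVICES, N_TERMS, FEAT = 50, 100, 16

class StackedAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(N_TERMS, 64), nn.ReLU(),
                                     nn.Linear(64, FEAT), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(FEAT, 64), nn.ReLU(),
                                     nn.Linear(64, N_TERMS))
    def forward(self, x):
        return self.decoder(self.encoder(x))

class NextServiceRNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(FEAT, 32, batch_first=True)
        self.out = nn.Linear(32, N_SERVICES)   # score every candidate service
    def forward(self, feats):                  # feats: (batch, seq_len, FEAT)
        h, _ = self.rnn(feats)
        return self.out(h[:, -1])              # prediction from the last step

# Toy data: random sparse-ish term vectors, one vector per service.
term_vectors = (torch.rand(N_SERVICES, N_TERMS) > 0.9).float()

ae = StackedAutoencoder()
ae_opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
for _ in range(200):                           # unsupervised reconstruction training
    ae_opt.zero_grad()
    loss = nn.functional.mse_loss(ae(term_vectors), term_vectors)
    loss.backward()
    ae_opt.step()

with torch.no_grad():
    feats = ae.encoder(term_vectors)           # compact service features

rnn = NextServiceRNN()
seq = feats[[0, 3, 7]].unsqueeze(0)            # a hypothetical partial composition
scores = rnn(seq)
print("most likely next service:", scores.argmax(dim=1).item())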


2020 ◽  
Vol 17 (4) ◽  
pp. 32-54
Author(s):  
Banage T. G. S. Kumara ◽  
Incheon Paik ◽  
Yuichi Yaguchi

With the large number of web services now available via the internet, web service discovery has become a challenging and time-consuming task. Organizing web services into similar clusters is a very efficient approach to reducing the search space. A principal issue in clustering is computing the semantic similarity between services. Current approaches do not consider the domain-specific context when measuring similarity, and this has affected their clustering performance. This paper proposes a context-aware similarity (CAS) method that learns domain context by machine learning to produce models of context for terms retrieved from the web. To visually analyze the effect of domain context on the clustering results, the clustering approach applies a spherical associated-keyword-space algorithm. The CAS method analyzes the hidden semantics of services within a particular domain, and this awareness of service context helps to find cluster tensors that characterize the cluster elements. Experimental results show that the clustering approach works efficiently.
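
A purely illustrative reading of the idea: similarity is computed over term vectors whose weights are re-scaled by a domain-relevance model, so that domain-specific terms dominate the comparison. In the sketch below the domain_relevance weights, vocabulary, and services are hand-set assumptions; the actual CAS method learns such context models from web data by machine learning.

# Illustrative sketch only: context-aware cosine similarity with domain-weighted terms.
import numpy as np

vocab = ["book", "hotel", "flight", "price", "weather"]
domain_relevance = np.array([0.9, 1.0, 1.0, 0.6, 0.1])   # hypothetical travel-domain model

def service_vector(terms):
    return np.array([terms.count(t) for t in vocab], dtype=float)

def context_aware_similarity(a, b):
    wa, wb = a * domain_relevance, b * domain_relevance    # re-weight by domain context
    denom = np.linalg.norm(wa) * np.linalg.norm(wb)
    return float(wa @ wb / denom) if denom else 0.0

s1 = service_vector(["book", "hotel", "price"])
s2 = service_vector(["book", "flight", "price"])
print(context_aware_similarity(s1, s2))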


Author(s):  
Jianxiao Liu ◽  
Feng Liu ◽  
Xiaoxia Li ◽  
Keqing He ◽  
Yutao Ma ◽  
...  

In the era of service-oriented software engineering (SOSE), service clustering is used to organize Web services, and it can help to enhance the efficiency and accuracy of service discovery. To improve the efficiency and accuracy of service clustering, this paper uses the self-join operation in a relational database (RDB) to realize Web service clustering. After storing service information, it performs self-join operations on the Input, Output, Precondition, and Effect (IOPE) tables of Web services, which enhances the efficiency of computing service similarity. The semantic reasoning relationships between concepts and the concept status path are used in the calculation, which improves its accuracy. Finally, we use experiments to validate the effectiveness of the proposed methods.
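
A minimal sketch of the self-join idea, using SQLite from Python: service Output concepts are stored in one table, and a single self-join on matching concepts yields candidate pairs of similar services. The table name, service names, and concepts are made up, and the sketch uses exact concept matches where the paper applies semantic reasoning between concepts and covers all four IOPE tables.

# Store Output concepts per service, then self-join to count shared concepts per pair.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE service_output (service TEXT, concept TEXT)")
conn.executemany("INSERT INTO service_output VALUES (?, ?)", [
    ("WeatherByCity",  "Temperature"),
    ("WeatherByCity",  "Humidity"),
    ("CityClimate",    "Temperature"),
    ("CurrencyRates",  "ExchangeRate"),
])

# Self-join: count shared Output concepts for each pair of distinct services.
rows = conn.execute("""
    SELECT a.service, b.service, COUNT(*) AS shared
    FROM service_output a JOIN service_output b
      ON a.concept = b.concept AND a.service < b.service
    GROUP BY a.service, b.service
    ORDER BY shared DESC
""").fetchall()

for s1, s2, shared in rows:
    print(f"{s1} ~ {s2}: {shared} shared output concept(s)")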


2019 ◽  
Vol 34 (1) ◽  
pp. 101-123 ◽  
Author(s):  
Taito Lee ◽  
Shin Matsushima ◽  
Kenji Yamanishi

We consider the class of linear predictors over all logical conjunctions of binary attributes, which we refer to as the class of combinatorial binary models (CBMs) in this paper. CBMs offer high knowledge interpretability, but naïve learning of them from labeled data requires computational cost that grows exponentially with the length of the conjunctions. On the other hand, for large-scale datasets, long conjunctions are effective for learning predictors. To overcome this computational difficulty, we propose an algorithm, GRAfting for Binary datasets (GRAB), which efficiently learns CBMs within the L1-regularized loss minimization framework. The key idea of GRAB is to adopt weighted frequent itemset mining for the most time-consuming step of the grafting algorithm, which is designed to solve large-scale L1-RERM problems by an iterative approach. Furthermore, we experimentally show that linear predictors of CBMs are effective in terms of prediction accuracy and knowledge discovery.
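
To make the model class concrete, the sketch below builds a tiny CBM by naively enumerating all conjunctions up to length 2 over binary attributes and fitting an L1-regularized linear model over them; the surviving conjunctions are the interpretable rules. This exhaustive enumeration is exactly what GRAB avoids by growing features with weighted frequent itemset mining inside the grafting loop, so the sketch illustrates the representation, not the algorithm. The data and the conjunction length limit are assumptions.

# Toy CBM: explicit conjunction features + L1-regularized logistic regression.
from itertools import combinations
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 6))              # binary attributes
y = ((X[:, 0] & X[:, 2]) | X[:, 4]).astype(int)    # target depends on a conjunction

def conjunction_features(X, max_len=2):
    """Naively enumerate all conjunctions up to max_len (feasible only for tiny data)."""
    cols, names = [], []
    for L in range(1, max_len + 1):
        for combo in combinations(range(X.shape[1]), L):
            cols.append(np.all(X[:, list(combo)], axis=1).astype(int))
            names.append("&".join(f"x{i}" for i in combo))
    return np.column_stack(cols), names

F, names = conjunction_features(X)
model = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(F, y)

# Sparse, interpretable output: the surviving conjunctions and their weights.
for name, w in zip(names, model.coef_[0]):
    if abs(w) > 1e-6:
        print(f"{name}: {w:.2f}")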


F1000Research ◽  
2017 ◽  
Vol 5 ◽  
pp. 1987 ◽  
Author(s):  
Jasper J. Koehorst ◽  
Edoardo Saccenti ◽  
Peter J. Schaap ◽  
Vitor A. P. Martins dos Santos ◽  
Maria Suarez-Diez

A functional comparative genome analysis is essential to understand the mechanisms underlying bacterial evolution and adaptation. Detection of functional orthologs using standard global sequence similarity methods faces several problems: the need to define arbitrary acceptance thresholds for similarity and alignment length, lateral gene acquisition, and the high computational cost of finding bi-directional best matches at a large scale. We investigated the use of protein domain architectures for large-scale functional comparative analysis as an alternative method. The performance of both approaches was assessed through functional comparison of 446 bacterial genomes sampled at different taxonomic levels. We show that protein domain architectures provide a fast and efficient alternative to methods based on sequence similarity for identifying groups of functionally equivalent proteins within and across taxonomic boundaries, and the approach is suitable for large-scale comparative analysis. Running both methods in parallel pinpoints potential functional adaptations that may add to bacterial fitness.
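
A minimal sketch of why the architecture-based approach is cheap: each protein is reduced to its ordered tuple of domain identifiers, and proteins are grouped by identical architecture with a dictionary lookup rather than an all-vs-all alignment. The protein and domain identifiers below are made-up examples.

# Group proteins by identical (hypothetical) domain architectures instead of aligning sequences.
from collections import defaultdict

proteins = {
    "genomeA_p1": ("PF00069", "PF07714"),   # example kinase-like architecture
    "genomeB_p9": ("PF00069", "PF07714"),
    "genomeA_p2": ("PF00005",),             # example transporter domain
    "genomeC_p4": ("PF00005",),
    "genomeB_p3": ("PF00072", "PF00512"),
}

groups = defaultdict(list)
for protein, architecture in proteins.items():
    groups[architecture].append(protein)    # grouping is a dict lookup, not an alignment

for architecture, members in groups.items():
    print(" | ".join(architecture), "->", members)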


2021 ◽  
Vol 71 ◽  
pp. 667-695
Author(s):  
Ye Zhu ◽  
Kai Ming Ting

This paper presents a new insight into improving the performance of t-distributed Stochastic Neighbour Embedding (t-SNE) by using the Isolation kernel instead of the Gaussian kernel. The Isolation kernel outperforms the Gaussian kernel in two aspects. First, the use of the Isolation kernel in t-SNE overcomes the drawback of misrepresenting some structures in the data, which often occurs when the Gaussian kernel is applied in t-SNE. This is because the Gaussian kernel determines each local bandwidth based on one local point only, while the Isolation kernel is derived directly from the data based on space partitioning. Second, the use of the Isolation kernel yields a more efficient similarity computation because the data-dependent Isolation kernel has only one parameter that needs to be tuned. In contrast, the use of the data-independent Gaussian kernel increases the computational cost by requiring n bandwidths to be determined for a dataset of n points. As the root cause of these deficiencies in t-SNE is the Gaussian kernel, we show that simply replacing the Gaussian kernel with the Isolation kernel in t-SNE significantly improves the quality of the final visualisation output (without creating misrepresented structures) and removes one key obstacle that prevents t-SNE from processing large datasets. Moreover, the Isolation kernel enables t-SNE to deal with large-scale datasets in less runtime without trading off accuracy, unlike existing methods for speeding up t-SNE.
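
As a rough illustration of the kernel itself (not the authors' implementation), the sketch below estimates an Isolation-kernel-style similarity: build t random partitionings, each defined by psi sampled points acting as nearest-sample Voronoi cells, and score two points by the fraction of partitionings in which they share a cell. Only psi needs tuning, in contrast to fitting one Gaussian bandwidth per point; the data and parameter values here are assumptions.

# Isolation-kernel-style similarity via random nearest-sample partitionings (illustrative).
import numpy as np

def isolation_similarity(X, x_idx, y_idx, psi=8, t=200, seed=0):
    rng = np.random.default_rng(seed)
    same = 0
    for _ in range(t):
        centres = X[rng.choice(len(X), size=psi, replace=False)]   # one random partitioning
        cell_x = np.argmin(np.linalg.norm(centres - X[x_idx], axis=1))
        cell_y = np.argmin(np.linalg.norm(centres - X[y_idx], axis=1))
        same += (cell_x == cell_y)
    return same / t

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(6, 1, (50, 2))])   # two toy clusters
print("same cluster:     ", isolation_similarity(X, 0, 1))
print("different clusters:", isolation_similarity(X, 0, 60))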


F1000Research ◽  
2016 ◽  
Vol 5 ◽  
pp. 1987 ◽  
Author(s):  
Jasper J. Koehorst ◽  
Edoardo Saccenti ◽  
Peter J. Schaap ◽  
Vitor A. P. Martins dos Santos ◽  
Maria Suarez-Diez

A functional comparative genome analysis is essential to understand the mechanisms underlying bacterial evolution and adaptation. Detection of functional orthologs using standard global sequence similarity methods faces several problems: the need to define arbitrary acceptance thresholds for similarity and alignment length, lateral gene acquisition, and the high computational cost of finding bi-directional best matches at a large scale. We investigated the use of protein domain architectures for large-scale functional comparative analysis as an alternative method. The performance of both approaches was assessed through functional comparison of 446 bacterial genomes sampled at different taxonomic levels. We show that protein domain architectures provide a fast and efficient alternative to methods based on sequence similarity for identifying groups of functionally equivalent proteins within and across taxonomic boundaries. As the computational cost scales linearly, and not quadratically, with the number of genomes, the approach is suitable for large-scale comparative analysis. Running both methods in parallel pinpoints potential functional adaptations that may add to bacterial fitness.


Author(s):  
Sreeparna Mukherjee ◽  
Asoke Nath

The success of the web has depended on the fact that it is simple and ubiquitous. Over the years, the web has evolved to become not only a repository for accessing information but also a repository for software components. This transformation has resulted in increased business needs; together with the availability of huge volumes of data and the continuous evolution of Web service functions, this drives the need to apply data mining in the Web service domain. Here we focus on applying various data mining techniques to cluster web services and thereby improve the Web service discovery process. We conclude with the various challenges faced in mining web services.

