EFFICIENT LARGE-SCALE SERVICE CLUSTERING VIA SPARSE FUNCTIONAL REPRESENTATION AND ACCELERATED OPTIMIZATION
Clustering techniques offer a systematic approach to organizing the diverse and fast-growing body of Web services by assigning relevant services to homogeneous service communities. However, the ever-increasing number of Web services poses key challenges for building large-scale service communities. In this paper, we tackle the scalability issue in service clustering, aiming to accurately and efficiently discover service communities over very large collections of services. A key observation is that service descriptions are usually represented by long but very sparse term vectors, as each service is described by only a limited number of terms. This inspires us to seek a new service representation that is economical to store, efficient to process, and intuitive to interpret. This new representation enables service clustering to scale to a massive number of services. More specifically, we identify a set of anchor services such that each service can be represented as a linear combination of a small number of anchor services. In this way, the large number of services is encoded in a much more compact anchor service space. Although service clustering can be performed much more efficiently in the compact anchor service space, discovering anchor services from large-scale service descriptions may incur high computational cost. We therefore develop principled optimization strategies for efficient anchor service discovery. Extensive experiments are conducted on real-world service data to assess both the effectiveness and efficiency of the proposed approach. Results on a dataset with over 3,700 Web services clearly demonstrate the good scalability of the sparse functional representation and the efficiency of the optimization algorithms for anchor service discovery.
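To make the idea of encoding services in an anchor service space concrete, the following is a minimal illustrative sketch, not the optimization procedure developed in this paper. It assumes the anchors are already given (here a hand-written toy matrix `A`), picks for each service the `s` most similar anchors by cosine similarity, and fits the combination weights by ordinary least squares. The function name `encode_with_anchors`, the parameter `s`, and the toy data are all hypothetical.

```python
import numpy as np

def encode_with_anchors(X, A, s=2):
    """Represent each row of X as a linear combination of at most s rows of A.

    X: (n_services, n_terms) term matrix.
    A: (n_anchors, n_terms) anchor matrix (assumed given; hypothetical here).
    Returns W of shape (n_services, n_anchors) with at most s nonzeros per row.
    """
    n, k = X.shape[0], A.shape[0]
    W = np.zeros((n, k))
    # Cosine similarity between every service and every anchor.
    Xn = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    An = A / np.maximum(np.linalg.norm(A, axis=1, keepdims=True), 1e-12)
    sim = Xn @ An.T
    for i in range(n):
        # Keep only the s most similar anchors for service i.
        idx = np.argsort(-sim[i])[:s]
        # Least-squares weights so that X[i] is approximated by coef @ A[idx].
        coef, *_ = np.linalg.lstsq(A[idx].T, X[i], rcond=None)
        W[i, idx] = coef
    return W

# Toy data: 3 anchors and 3 services over a 6-term vocabulary.
A = np.array([[1., 1., 0., 0., 0., 0.],
              [0., 0., 1., 1., 0., 0.],
              [0., 0., 0., 0., 1., 1.]])
X = np.array([[2., 2., 0., 0., 0., 0.],
              [0., 0., 1., 1., 1., 1.],
              [0., 0., 0., 0., 3., 3.]])

W = encode_with_anchors(X, A, s=2)
X_hat = W @ A  # reconstruction of the services from the anchor space
```

In this sketch each service is stored as a row of `W` with at most `s` nonzero weights, so a clustering algorithm could operate on `W` (one column per anchor) rather than on the full term matrix. How the anchors themselves are discovered efficiently is the subject of the paper and is not modeled here.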