Clustering Columns of the Wide-Table in Cloud Computing
Data-centric web applications are an emerging trend in the information society. Cloud computing platforms currently adopt column-oriented wide tables to represent the heterogeneous structured data of these applications. The wide table reduces wasted storage space but slows down queries. This paper implements hybrid partition on access frequency (HPAF) to partition a wide table both horizontally and vertically. It uses a variant of consistent hashing to dynamically partition a wide table horizontally across multiple storage nodes according to each node's performance. It uses entropy to represent the reduction in the number of data blocks accessed when reading from one table with N columns rather than from N single-column tables. Drawing on the second law of thermodynamics, the paper designs an entropy-increasing clustering algorithm to classify the columns of a wide table. The algorithm finds the clustering, consisting of multiple classes, that saves the most access time. The paper also implements an algorithm for structured queries across multiple materialized views. Finally, the paper compares the query performance and storage efficiency of this strategy with those of single-column storage.
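The horizontal partitioning step can be illustrated with a minimal sketch of performance-weighted consistent hashing. This is not the paper's variant, whose details the abstract does not give. It assumes one common approach in which each node receives virtual ring positions in proportion to a performance weight. The names `WeightedHashRing`, `base_vnodes`, `add_node`, `remove_node`, and `locate` are hypothetical and chosen only for illustration.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a stable integer position on the ring."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class WeightedHashRing:
    """Illustrative ring: faster nodes get more virtual positions (assumption)."""

    def __init__(self, base_vnodes: int = 100):
        self.base_vnodes = base_vnodes  # virtual positions for weight 1.0
        self._points = []               # sorted ring positions
        self._owner = {}                # ring position -> node name

    def add_node(self, node: str, weight: float = 1.0) -> None:
        # The weight stands in for a node's measured performance.
        count = max(1, int(self.base_vnodes * weight))
        for i in range(count):
            point = _hash(f"{node}#{i}")
            if point in self._owner:    # skip the (unlikely) hash collision
                continue
            bisect.insort(self._points, point)
            self._owner[point] = node

    def remove_node(self, node: str) -> None:
        # Only rows owned by the removed node move to a new owner.
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def locate(self, row_key: str) -> str:
        """Return the node that stores the row with this key."""
        if not self._points:
            raise LookupError("ring has no nodes")
        idx = bisect.bisect_right(self._points, _hash(row_key))
        return self._owner[self._points[idx % len(self._points)]]


ring = WeightedHashRing()
ring.add_node("slow-node", weight=1.0)
ring.add_node("fast-node", weight=3.0)
placement = {f"row-{i}": ring.locate(f"row-{i}") for i in range(10000)}
```

In this sketch a node with three times the weight holds roughly three times as many rows, and removing a node relocates only the rows that node held, which is the property that makes the horizontal partition cheap to adjust dynamically.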