Distributed Algorithms for Finding Meta-Paths of a Large Heterogeneous Information Network on Cloud
Meta-paths are an important concept in heterogeneous information networks (HINs). They are used in many tasks, such as information retrieval, decision making, and product recommendation. Normally, meta-paths are proposed by human experts. Recent work on meta-path discovery has proposed in-memory solutions that run on a single computer. A large HIN, however, cannot be loaded entirely into the memory of one machine. In this chapter, the authors propose distributed algorithms to discover meta-paths of large HINs on the cloud. They develop distributed algorithms to discover the significant meta-paths, maximal significant meta-paths, and top-k meta-paths between two vertices of an HIN. Calculating the support of meta-paths or performing breadth-first search can be computationally costly in very large HINs. The distributed algorithms therefore use the GraphFrames library of Apache Spark in a cloud computing environment to query large HINs efficiently. The authors conduct experiments on a large DBLP dataset to demonstrate the performance of the algorithms on the cloud.
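
To make the notion of a meta-path between two vertices concrete, the following is a minimal single-machine sketch in plain Python. It is not the chapter's distributed algorithm and does not use GraphFrames. The toy DBLP-like graph, the names `VERTEX_TYPE`, `EDGES`, and `meta_paths_between`, and the use of a simple path-instance count as a stand-in for support are all assumptions made for illustration; the chapter's own definition of support may differ.

```python
from collections import Counter, deque

# Hypothetical toy HIN: vertex id -> vertex type.
VERTEX_TYPE = {
    "a1": "Author", "a2": "Author",
    "p1": "Paper", "p2": "Paper", "p3": "Paper", "p4": "Paper",
    "v1": "Venue",
}

# Hypothetical undirected edges: author-paper and paper-venue links.
EDGES = [
    ("a1", "p1"), ("a2", "p1"),
    ("a1", "p4"), ("a2", "p4"),
    ("a1", "p2"), ("p2", "v1"),
    ("a2", "p3"), ("p3", "v1"),
]


def build_adjacency(edges):
    """Build an undirected adjacency map from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def meta_paths_between(source, target, max_len):
    """Breadth-first enumeration of simple paths from source to target.

    Returns a Counter mapping each meta-path (the tuple of vertex types
    along a path) to the number of path instances that follow it.
    max_len bounds the number of edges in a path.
    """
    adj = build_adjacency(EDGES)
    counts = Counter()
    queue = deque([(source,)])
    while queue:
        path = queue.popleft()
        last = path[-1]
        if last == target and len(path) > 1:
            # Record the type sequence of this path instance.
            counts[tuple(VERTEX_TYPE[v] for v in path)] += 1
            continue
        if len(path) - 1 >= max_len:
            continue  # do not extend paths beyond max_len edges
        for nxt in sorted(adj.get(last, ())):
            if nxt not in path:  # keep paths simple (no repeated vertex)
                queue.append(path + (nxt,))
    return counts


if __name__ == "__main__":
    counts = meta_paths_between("a1", "a2", max_len=4)
    # Rank meta-paths by instance count, as a rough analogue of top-k.
    for meta_path, n in counts.most_common(2):
        print(" - ".join(meta_path), n)
```

On this toy graph, the Author-Paper-Author meta-path has two instances (via the two co-authored papers) and Author-Paper-Venue-Paper-Author has one. Enumerating paths this way grows quickly with graph size and path length, which is the cost that motivates a distributed approach for large HINs.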