Matching Document Pairs using Multi-Feature Semantic Fusion Based on Knowledge Graph
Abstract Discriminating the homology and heterogeneity of two documents is an important and difficult step in information retrieval. Apart from manual verification, which incurs a high human-resource cost, existing methods mainly focus on word-based document duplicate checking or sentence-pair matching. Word-based duplicate checking cannot judge the similarity of two documents at the semantic level, and sentence-pair matching methods cannot effectively mine semantic information from long texts, which are common among retrieval results. A concept-based Multi-Feature Semantic Fusion Model (MFSFM) is proposed. It employs multi-feature enhanced semantics to construct a concept map that represents the document, and employs a multi-convolution mixed residual CNN module to introduce a local attention mechanism that improves sensitivity to conceptual boundary information. To verify the feasibility of the proposed MFSFM based on concept maps, two multi-feature document datasets are constructed, each consisting of about 500 actual scientific and technological project feasibility reports. Experimental results on these datasets show that the proposed MFSFM converges quickly while exceeding the latest natural language matching methods in accuracy.