Approaches to assessing the semantic similarity and future citation of publications by identifying informative terms with predictive properties
The article discusses new approaches to assessing the semantic similarity of documents in a vector space that take into account statistically significant and informative terms. Informative terms reflect the current state of research in a given field. To select them, an algorithm for calculating the impact factor of a term (IFT) is proposed. It is shown that informative terms make it possible both to evaluate the semantic similarity of texts and to predict future citations. The developed methods for assessing the semantic similarity and future impact of scientific publications can be used within “Predictive optimization”, a modern technology for making decisions based on forecasts. Bibliometric indicators often play an important role in evaluating research activity and the work of individual scientists. However, citation-based indicators are problematic for determining the impact of recent publications: two years after publication, most articles have received only a few citations. The probability of future citation can instead be predicted using the proposed indicator, IFT.
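
As a rough illustration of comparing documents in a vector space restricted to informative terms, the following Python sketch computes the cosine similarity of two texts. It is a minimal sketch under stated assumptions, not the article's method: the set of informative terms is hypothetical and supplied by hand (the IFT algorithm for selecting terms is not reproduced here), and whitespace tokenization, raw term counts as weights, and cosine similarity as the measure are all illustrative choices.

```python
import math
from collections import Counter


def term_vector(text, informative_terms):
    """Build a sparse count vector that keeps only informative terms.

    Tokenization by whitespace and raw counts as weights are assumptions
    made for this sketch.
    """
    tokens = text.lower().split()
    return Counter(t for t in tokens if t in informative_terms)


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity of two sparse term vectors (0.0 if either is empty)."""
    shared = vec_a.keys() & vec_b.keys()
    dot = sum(vec_a[t] * vec_b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical informative terms; in the article these would be selected
# by the proposed impact factor of a term.
informative_terms = {"citation", "similarity", "bibliometric"}

doc_a = "semantic similarity and citation prediction"
doc_b = "citation analysis with bibliometric indicators"

score = cosine_similarity(
    term_vector(doc_a, informative_terms),
    term_vector(doc_b, informative_terms),
)
print(round(score, 3))
```

Here the two texts share one of their two informative terms each, so the similarity is about 0.5. Words outside the informative set do not influence the score at all.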