lsemantica: A command for text similarity based on latent semantic analysis

Author(s):  
Carlo Schwarz

In this article, I present the lsemantica command, which implements latent semantic analysis in Stata. Latent semantic analysis is a machine learning algorithm for comparing the similarity of words and texts; it uses truncated singular value decomposition to uncover the hidden semantic relationships between words and texts. lsemantica provides a simple command for latent semantic analysis as well as complementary commands for text similarity comparison.
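To make the technique concrete, here is a minimal Python sketch of latent semantic analysis in general. It does not reproduce the lsemantica Stata interface. It assumes a raw term-document count matrix (lsemantica may weight terms differently), truncates the SVD to k latent dimensions, and compares documents by cosine similarity in that reduced space. The function names lsa_document_vectors and cosine_similarity_matrix and the toy corpus are hypothetical.

```python
import numpy as np

def lsa_document_vectors(docs, k):
    """Map each document to a k-dimensional latent semantic vector (hypothetical helper)."""
    # Vocabulary and term-document count matrix: rows are terms, columns are documents.
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.lower().split():
            X[index[w], j] += 1.0
    # Thin SVD; singular values come back in descending order.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Truncation: keep the k largest singular values.
    # Rows of V_k * Sigma_k are the document coordinates in the latent space.
    return Vt[:k].T * s[:k]

def cosine_similarity_matrix(V):
    """Pairwise cosine similarity between the rows of V."""
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    Vn = V / np.where(norms == 0, 1.0, norms)  # guard against zero vectors
    return Vn @ Vn.T

docs = [
    "cats chase mice",
    "dogs chase cats",
    "stocks fell sharply",
    "stocks rallied",
]
S = cosine_similarity_matrix(lsa_document_vectors(docs, k=2))
print(np.round(S, 2))
```

Because the two pairs of documents share no terms, k = 2 keeps one latent dimension per pair: documents 0 and 1 end up with a cosine similarity of about 1 even though each contains a word ("mice", "dogs") that the other lacks, while documents from different pairs are close to 0.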

Automatic text summarization for a resource-poor language is a challenging task. Unsupervised extractive techniques are often preferred for such languages because of the scarcity of resources. Latent Semantic Analysis (LSA) is an unsupervised technique that can be used to automatically identify semantically important sentences in a text document. Two methods based on LSA have been evaluated on two datasets of a resource-poor language, applying Singular Value Decomposition (SVD) to different vector-space models. The performance of the methods is evaluated using ROUGE-L scores obtained by comparing the system-generated summaries with human-generated model summaries. Both methods perform better on shorter documents than on longer ones.
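The abstract does not specify which two LSA methods were compared. As a hedged illustration, the sketch below implements one commonly described LSA selection rule: build a term-sentence matrix, take its SVD, and for each of the top k right singular vectors (latent topics) pick the not-yet-chosen sentence with the largest absolute weight. It also includes a simplified sentence-level ROUGE-L F-measure based on the longest common subsequence (LCS) of tokens; full ROUGE toolkits add tokenization, stemming and summary-level LCS handling that this sketch omits. The function names lsa_summarize and rouge_l_f and the example sentences are hypothetical.

```python
import numpy as np

def lsa_summarize(sentences, k):
    """Pick up to k sentences: the strongest sentence for each of the top-k latent topics."""
    vocab = sorted({w for s in sentences for w in s.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    # Term-sentence matrix: rows are terms, columns are sentences.
    A = np.zeros((len(vocab), len(sentences)))
    for j, s in enumerate(sentences):
        for w in s.lower().split():
            A[index[w], j] += 1.0
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    chosen = []
    for topic in Vt[:k]:
        # Singular vectors are defined only up to sign, so rank sentences by magnitude.
        for j in np.argsort(-np.abs(topic)).tolist():
            if j not in chosen:
                chosen.append(j)
                break
    # Return the selected sentences in their original document order.
    return [sentences[j] for j in sorted(chosen)]

def rouge_l_f(candidate, reference, beta=1.0):
    """Simplified ROUGE-L F-measure from the LCS of whitespace tokens."""
    c, r = candidate.lower().split(), reference.lower().split()
    # Dynamic-programming table for the LCS length.
    dp = [[0] * (len(r) + 1) for _ in range(len(c) + 1)]
    for i in range(1, len(c) + 1):
        for j in range(1, len(r) + 1):
            if c[i - 1] == r[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    lcs = dp[len(c)][len(r)]
    if lcs == 0:
        return 0.0
    recall, precision = lcs / len(r), lcs / len(c)
    return (1 + beta**2) * recall * precision / (recall + beta**2 * precision)

sentences = [
    "the river flooded the valley",
    "heavy rain caused the river to flood",
    "the festival opened with music",
    "music and dance filled the festival",
]
summary = lsa_summarize(sentences, k=2)
print(summary)
print(round(rouge_l_f(" ".join(summary), "heavy rain flooded the river valley"), 3))
```

Selecting one sentence per latent topic is what makes the rule extractive and unsupervised: no labelled summaries are needed until the evaluation step, where ROUGE-L compares the output against a human model summary.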


2018, Vol. 13, pp. 174830181881360
Author(s):
Zhenyu Zhao
Riguang Lin
Zehong Meng
Guoqiang He
Lei You
...

This paper presents a modified truncated singular value decomposition (TSVD) method for solving ill-posed problems, in which the solution takes a slightly different form. Both theoretical and numerical results show that the new method overcomes the limitations of the classical TSVD method while requiring very little additional computation.
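The abstract does not give the modified solution form, so the sketch below shows only the classical TSVD baseline that the paper improves on: x_k = sum over i = 1..k of (u_iᵀ b / σ_i) v_i, which discards the small singular values that amplify noise. The Hilbert test matrix, the noise level and the truncation level k = 5 are illustrative assumptions, not values from the paper, and tsvd_solve is a hypothetical helper name.

```python
import numpy as np

def tsvd_solve(A, b, k):
    """Classical truncated-SVD solution of A x = b using the k largest singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Coefficients u_i^T b / sigma_i for i = 1..k (singular values are sorted descending).
    coeffs = (U[:, :k].T @ b) / s[:k]
    # Combine the corresponding right singular vectors v_i.
    return Vt[:k].T @ coeffs

n = 12
i = np.arange(n)
A = 1.0 / (i[:, None] + i[None, :] + 1.0)        # Hilbert matrix: severely ill-conditioned
x_true = np.ones(n)
rng = np.random.default_rng(0)
b = A @ x_true + 1e-6 * rng.standard_normal(n)    # right-hand side with small additive noise

x_naive = np.linalg.solve(A, b)                   # unregularized solve amplifies the noise
x_tsvd = tsvd_solve(A, b, k=5)
err_naive = np.linalg.norm(x_naive - x_true)
err_tsvd = np.linalg.norm(x_tsvd - x_true)
print(f"naive error: {err_naive:.3e}, TSVD error (k=5): {err_tsvd:.3e}")
```

The choice of k trades truncation error (information lost in the discarded components) against noise amplification (division by small σ_i); the limitations of this classical trade-off are what the modified method is reported to address.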

