A New LSA and Entropy-Based Approach for Automatic Text Document Summarization
Automatic text document summarization is active research area in text mining field. In this article, the authors are proposing two new approaches (three models) for sentence selection, and a new entropy-based summary evaluation criteria. The first approach is based on the algebraic model, Singular Value Decomposition (SVD), i.e. Latent Semantic Analysis (LSA) and model is termed as proposed_model-1, and Second Approach is based on entropy that is further divided into proposed_model-2 and proposed_model-3. In first proposed model, the authors are using right singular matrix, and second & third proposed models are based on Shannon entropy. The advantage of these models is that these are not a Length dominating model, giving better results, and low redundancy. Along with these three new models, an entropy-based summary evaluation criteria is proposed and tested. They are also showing that their entropy based proposed models statistically closer to DUC-2002's standard/gold summary. In this article, the authors are using a dataset taken from Document Understanding Conference-2002.