Automatic Text Document Summarization Based on Machine Learning

Author(s):  
Gabriel Silva ◽  
Rafael Ferreira ◽  
Rafael Dueire Lins ◽  
Luciano Cabral ◽  
Hilário Oliveira ◽  
...  
Author(s):  
Chandra Yadav ◽  
Aditi Sharan

Automatic text document summarization is an active research area in the field of text mining. In this article, the authors propose two new approaches (three models) for sentence selection and a new entropy-based summary evaluation criterion. The first approach is based on an algebraic model, Singular Value Decomposition (SVD), i.e. Latent Semantic Analysis (LSA), and is termed proposed_model-1. The second approach is based on entropy and is further divided into proposed_model-2 and proposed_model-3. The first proposed model uses the right singular matrix, while the second and third proposed models are based on Shannon entropy. The advantages of these models are that they are not length-dominated, give better results, and have low redundancy. Along with these three new models, an entropy-based summary evaluation criterion is proposed and tested. The authors also show that their entropy-based models are statistically closer to the DUC-2002 standard (gold) summaries. The dataset used in this article is taken from the Document Understanding Conference 2002 (DUC-2002).
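
The abstract does not spell out how Shannon entropy enters the sentence scoring, so the following is only a minimal sketch of the underlying quantity, not the authors' proposed_model-2 or proposed_model-3. It computes the entropy of a sentence's word distribution and ranks sentences by it. The function names, the whitespace tokenization, and the "higher entropy first" ranking rule are all assumptions made for illustration.

```python
import math
from collections import Counter


def shannon_entropy(tokens):
    """Shannon entropy (in bits) of the empirical distribution of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def rank_sentences_by_entropy(sentences):
    """Return sentence indices ordered by descending word entropy.

    Hypothetical ranking rule: the source abstract does not state how
    entropy is turned into a selection score.
    """
    scores = [shannon_entropy(s.lower().split()) for s in sentences]
    return sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    docs = ["a a a", "a b c d", "a b"]
    print(rank_sentences_by_entropy(docs))
```

Note that raw word entropy tends to grow with sentence length, so a model that claims not to be length-dominated would presumably need some normalization that this sketch leaves out.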


Automatic text summarization is a technique for generating a short and accurate summary of a longer text document. Text summarization can be classified based on the number of input documents (single-document and multi-document summarization) and based on the characteristics of the summary generated (extractive and abstractive summarization). Multi-document summarization is the automatic process of creating a relevant, informative and concise summary from a cluster of related documents. This paper presents a detailed survey of the existing literature on the various approaches to text summarization. A few of the most popular approaches, such as graph-based, cluster-based and deep learning-based summarization techniques, are discussed along with the evaluation metrics, which can provide insight for future researchers.


2018 ◽  
Vol 8 (3) ◽  
pp. 14-32 ◽  
Author(s):  
Chandra Shakhar Yadav ◽  
Aditi Sharan

This article proposes a new concept, the Lexical Network, for automatic text document summarization. Instead of a number of chains, the authors obtain a network of sentences, called a Lexical Network and termed LexNetwork. This network is created between sentences based on different lexical and semantic relations. In this network, nodes represent sentences and edges represent the strength between two sentences, where strength means the number of relations present between the two sentences. The importance of each sentence is decided based on different centrality measures, and the most important sentences are extracted for the summary. Word sense disambiguation (WSD) is done with the Simple Lesk technique, and a cosine-similarity threshold (Ɵ, TH) is used as a post-processing task. The authors suggest that a cosine-similarity threshold of 10% is better than 5%, and that an eigenvalue-based centrality measure is better for the summarization process. Finally, for comparison, they use the Semantrica-Lexalytics system.
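
The idea of scoring sentences by centrality in a weighted sentence network can be sketched as follows. This is an illustration under stated assumptions, not the LexNetwork implementation: the edge weight here is simply the number of distinct words two sentences share, standing in for the count of lexical and semantic relations, and the function names are hypothetical. Eigenvector centrality is computed by power iteration on the adjacency matrix plus the identity, a standard shift that keeps the iteration from oscillating on bipartite-like graphs without changing the leading eigenvector.

```python
import math


def build_network(sentences):
    """Weighted adjacency matrix; weight = number of distinct shared words.

    Assumption: shared-word counts stand in for the lexical and semantic
    relations used in the original work.
    """
    words = [set(s.lower().split()) for s in sentences]
    n = len(sentences)
    return [
        [len(words[i] & words[j]) if i != j else 0 for j in range(n)]
        for i in range(n)
    ]


def eigenvector_centrality(adj, iterations=100):
    """Power iteration on (A + I), normalized to unit length each step."""
    n = len(adj)
    scores = [1.0] * n
    for _ in range(iterations):
        new = [
            scores[i] + sum(adj[i][j] * scores[j] for j in range(n))
            for i in range(n)
        ]
        norm = math.sqrt(sum(x * x for x in new))
        if norm == 0:
            return scores
        scores = [x / norm for x in new]
    return scores


if __name__ == "__main__":
    sents = ["alpha beta gamma", "alpha x y", "beta z w", "gamma q r"]
    centrality = eigenvector_centrality(build_network(sents))
    best = max(range(len(sents)), key=lambda i: centrality[i])
    print(best, centrality)
```

In this toy input the first sentence shares a word with every other sentence, so it ends up with the highest centrality and would be the first candidate for extraction.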


Author(s):  
Etana Fikadu

The aim of this study is to find the optimal method for classifying Afaan Oromo text among different classifiers, using the same number of text documents. Automatic text classification has been needed in many fields for a long time, and many methods are used to classify text. The performance of the classifiers used in this study is measured in terms of recall, precision and F-measure. Finally, we compare the efficiency of the Bayesian Network, Naïve Bayesian, IBK and SMO classifiers on Afaan Oromo text. Experimental results on the same set of Afaan Oromo documents used previously show that SMO slightly outperforms the other methods. The comparison reported in this paper shows that the SMO classifier outperforms the other machine learning classifiers.
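
For reference, the three measures named above can be computed for one class as in the sketch below. The function name, the label values, and the convention of returning 0.0 when a denominator is zero are assumptions for illustration; the study's own evaluation setup is not described in the abstract.

```python
def precision_recall_f1(true_labels, predicted_labels, positive):
    """Precision, recall and F-measure (F1) for a single class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Return 0.0 when a measure is undefined (assumed convention).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    truth = ["sport", "sport", "news", "news"]
    guess = ["sport", "news", "sport", "news"]
    print(precision_recall_f1(truth, guess, positive="sport"))
```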


2021 ◽  
Author(s):  
Joshua Lois Cruz Paulino ◽  
Lexter Carl Antoja Almirol ◽  
Jun Marco Cruz Favila ◽  
Kent Alvin Gerald Loria Aquino ◽  
Angelica Hernandez De La Cruz ◽  
...  

2021 ◽  
Vol 10 (2) ◽  
pp. 42-60
Author(s):  
Khadidja Chettah ◽  
Amer Draa

Automatic text summarization has recently become a key instrument for reducing the huge quantity of textual data. In this paper, the authors propose a quantum-inspired genetic algorithm (QGA) for extractive single-document summarization. The QGA is used inside a fully automated system as an optimizer that searches for the best combination of sentences to include in the final summary. The presented approach is compared with 11 reference methods, including supervised and unsupervised summarization techniques. The authors evaluated the performance of the proposed approach on the DUC 2001 and DUC 2002 datasets using the ROUGE-1 and ROUGE-2 evaluation metrics. The results show that the proposal can compete with other state-of-the-art methods: it is ranked first out of 12, outperforming all other algorithms.
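
ROUGE-1 and ROUGE-2 measure unigram and bigram overlap between a candidate summary and a reference summary. The sketch below computes the recall variant with clipped n-gram counts. It is a simplified illustration, assuming lowercased whitespace tokenization and a single reference, and it is not the official ROUGE toolkit; the function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n):
    """Fraction of reference n-grams that also occur in the candidate.

    Counts are clipped, so a repeated n-gram is only credited as many
    times as it appears in the candidate.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


if __name__ == "__main__":
    cand = "the cat sat on the mat"
    ref = "the cat lay on the mat"
    print(rouge_n_recall(cand, ref, 1), rouge_n_recall(cand, ref, 2))
```

In an optimizer-based summarizer such as the one described, a score of this kind is used for evaluation against gold summaries; the fitness function that the QGA itself optimizes is not given in the abstract.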

