Text Summarization Approaches Using Machine Learning & LSTM

Massive amounts of online textual data are generated across social media, the web, and other information-centric applications. Selecting the vital information from a large text requires studying the full article and producing a summary that does not lose the critical information of the document; this process is called summarization. Summarization can be done by a human, which requires expertise in the subject area and is tedious and time-consuming, or by a system, which is known as automatic text summarization (ATS) and generates the summary automatically. ATS falls into two main categories: extractive and abstractive. An extractive summary is produced by picking important, highly ranked sentences and words from the text document, whereas the sentences and words in a summary generated by an abstractive method may not be present in the original text. This article mainly focuses on the different ATS techniques that have been introduced to date. The paper begins with a concise introduction to automatic text summarization, then discusses recent developments in extractive and abstractive methods, then moves on to a literature survey, and finally concludes with a proposed technique that uses an LSTM encoder-decoder for abstractive text summarization, along with some directions for future work.

SGATS: Semantic Graph-based Automatic Text Summarization from Hindi Text Documents

ACM Transactions on Asian and Low-Resource Language Information Processing ◽

10.1145/3464381 ◽

2021 ◽

Vol 20 (6) ◽

pp. 1-32

Author(s):

Manju Lata Joshi ◽

Nisheeth Joshi ◽

Namita Mittal

Keyword(s):

Language Processing ◽

Semantic Analysis ◽

Text Summarization ◽

Original Text ◽

Theoretic Approach ◽

Extractive Summarization ◽

Semantic Graph ◽

Text Document ◽

Automatic Text Summarization ◽

Automatic Text

Creating a coherent summary of a text is a challenging task in the field of Natural Language Processing (NLP). Various Automatic Text Summarization techniques have been developed for abstractive as well as extractive summarization. This study focuses on extractive summarization, which is the process of selecting representative paragraphs or sentences from the original text and combining them into a form shorter than the document(s) to generate a summary. The methods that have been used for extractive summarization are based on graph-theoretic approaches, machine learning, Latent Semantic Analysis (LSA), neural networks, clustering, and fuzzy logic. In this paper, a semantic graph-based approach, SGATS (Semantic Graph-based approach for Automatic Text Summarization), is proposed to generate an extractive summary. The proposed approach constructs a semantic graph of the original Hindi text document by establishing semantic relationships between sentences of the document, using the Hindi WordNet ontology as a background knowledge source. Once the semantic graph is constructed, fourteen different graph-theoretic measures are applied to rank the document sentences according to their semantic scores. The proposed approach is applied to two data sets from different domains, Tourism and Health. Its performance is compared with the state-of-the-art TextRank algorithm and with human-annotated summaries, and is evaluated using the widely accepted ROUGE measures. The results show that the proposed system produces better results than TextRank on the health-domain corpus and comparable results on the tourism corpus. Further, correlation coefficient methods are applied to find the correlation between eight different graphical measures, and it is observed that most of the graphical measures are highly correlated.
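The general idea of ranking sentences by a graph measure can be illustrated with a minimal sketch. It is not the SGATS method: it assumes plain word overlap (Jaccard similarity) as the edge weight in place of Hindi WordNet relations, uses weighted degree as the only graph measure, and runs on hypothetical English sentences. The function names and the `threshold` parameter are illustrative, and the simple `\w+` tokenizer would need to be replaced by a proper tokenizer for Hindi text.

```python
import re
from itertools import combinations


def tokenize(sentence):
    """Lowercase a sentence and return its set of word tokens."""
    return set(re.findall(r"\w+", sentence.lower()))


def jaccard(a, b):
    """Jaccard overlap between two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_sentences(sentences, threshold=0.1):
    """Rank sentence indices by weighted degree in a similarity graph."""
    tokens = [tokenize(s) for s in sentences]
    degree = [0.0] * len(sentences)
    for i, j in combinations(range(len(sentences)), 2):
        weight = jaccard(tokens[i], tokens[j])
        if weight >= threshold:  # keep only sufficiently similar pairs as edges
            degree[i] += weight
            degree[j] += weight
    # Highest weighted degree first; ties broken by original position.
    return sorted(range(len(sentences)), key=lambda k: (-degree[k], k))


# Hypothetical input sentences (assumption, not from the paper's corpora).
sentences = [
    "Yoga improves health and reduces stress.",
    "Regular exercise improves health.",
    "The museum opens at nine.",
    "Stress harms health and sleep.",
]
ranking = rank_sentences(sentences)
```

An extractive summary would then take the first few indices of `ranking` and emit those sentences in their original order.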

A Quantum-Inspired Genetic Algorithm for Extractive Text Summarization

International Journal of Natural Computing Research ◽

10.4018/ijncr.2021040103 ◽

2021 ◽

Vol 10 (2) ◽

pp. 42-60

Author(s):

Khadidja Chettah ◽

Amer Draa

Keyword(s):

Genetic Algorithm ◽

State Of The Art ◽

Text Summarization ◽

Automated System ◽

Evaluation Metrics ◽

Document Summarization ◽

Automatic Text Summarization ◽

Reference Methods ◽

Textual Data ◽

Automatic Text

Automatic text summarization has recently become a key instrument for reducing the huge quantity of textual data. In this paper, the authors propose a quantum-inspired genetic algorithm (QGA) for extractive single-document summarization. The QGA is used inside a totally automated system as an optimizer to search for the best combination of sentences to be put in the final summary. The presented approach is compared with 11 reference methods including supervised and unsupervised summarization techniques. They have evaluated the performances of the proposed approach on the DUC 2001 and DUC 2002 datasets using the ROUGE-1 and ROUGE-2 evaluation metrics. The obtained results show that the proposal can compete with other state-of-the-art methods. It is ranked first out of 12, outperforming all other algorithms.

A Framework for Word Embedding Based Automatic Text Summarization and Evaluation

Information ◽

10.3390/info11020078 ◽

2020 ◽

Vol 11 (2) ◽

pp. 78 ◽

Cited By ~ 2

Author(s):

Tulu Tilahun Hailu ◽

Junqing Yu ◽

Tessfu Geteye Fantaye

Keyword(s):

Text Summarization ◽

Evaluation Framework ◽

Word Embedding ◽

Evaluation Metrics ◽

Original Text ◽

Automatic Evaluation ◽

Source Text ◽

Automatic Text Summarization ◽

Automatic Text

Text summarization is the process of producing a concise version of a text (a summary) from one or more information sources. If the generated summary preserves the meaning of the original text, it helps users make fast and effective decisions. However, how much of the source text's meaning is preserved is becoming harder to evaluate. The most commonly used automatic evaluation metrics, such as Recall-Oriented Understudy for Gisting Evaluation (ROUGE), rely strictly on overlapping n-gram units between reference and candidate summaries, which makes them unsuitable for measuring the quality of abstractive summaries. Another major challenge in evaluating text summarization systems is the lack of consistent ideal reference summaries. Studies show that human summarizers can produce variable reference summaries of the same source, which can significantly affect the automatic evaluation scores of summarization systems. Humans are biased by circumstances when producing a summary, and even the same person may produce substantially different summaries of the same source at different times. This paper proposes a word-embedding-based automatic text summarization and evaluation framework, which can determine the salient top-n sentences of a source text to serve as a reference summary and evaluate the quality of system summaries against it. Extensive experimental results demonstrate that the proposed framework is effective and able to outperform several baseline methods, with regard to both text summarization systems and automatic evaluation metrics, when tested on a publicly available dataset.
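The point about n-gram overlap can be made concrete with a small sketch of ROUGE-N recall. This is a simplified version written for illustration (whitespace tokenization, a single reference, no stemming), not the official ROUGE toolkit, and the example sentences are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(reference, candidate, n=1):
    """Clipped n-gram overlap divided by the number of reference n-grams."""
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


reference = "the cat sat on the mat"
extractive = "the cat sat on the mat today"   # copies the reference wording
paraphrase = "a feline rested upon the rug"   # same meaning, different words

copy_score = rouge_n_recall(reference, extractive, n=1)
paraphrase_score = rouge_n_recall(reference, paraphrase, n=1)
```

Here the copied wording scores far higher than the paraphrase even though both convey the same meaning, which is the weakness for abstractive summaries that the abstract describes.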

Developing a new approach to summarize Arabic text automatically using syntactic and semantic analysis

International Journal of Engineering & Technology ◽

10.14419/ijet.v9i2.30324 ◽

2020 ◽

Vol 9 (2) ◽

pp. 342

Author(s):

Amal Alkhudari

Keyword(s):

Language Processing ◽

Automatic System ◽

Semantic Analysis ◽

Text Summarization ◽

Original Text ◽

Arabic Text ◽

Wide Spread ◽

New Approach ◽

Automatic Text Summarization ◽

Automatic Text

Because information is widespread and its sources are diverse, there is a need to produce accurate text summaries with minimal time and effort. Such a summary must preserve the key information content and overall meaning of the original text. Text summarization is one of the most important applications of Natural Language Processing (NLP). The goal of automatic text summarization is to create summaries that are similar to human-created ones. However, in many cases the readability of the generated summaries is not satisfactory, because they do not consider the meaning of the words and do not cover all the semantically relevant aspects of the data. In this paper, we use syntactic and semantic analysis to propose an automatic system for Arabic text summarization. The system is capable of understanding the meaning of the information and retrieves only the relevant part. The proposed work is evaluated on the EASC corpus using the ROUGE measure. The generated summaries are compared against human-written summaries and those of previous research.

A New Biomimetic Method Based on the Power Saves of Social Bees for Automatic Summaries of Texts by Extraction

International Journal of Software Science and Computational Intelligence ◽

10.4018/ijssci.2015010102 ◽

2015 ◽

Vol 7 (1) ◽

pp. 18-38 ◽

Cited By ~ 5

Author(s):

Mohamed Amine Boudia ◽

Reda Mohamed Hamou ◽

Abdelmalek Amine ◽

Amine Rahmani

Keyword(s):

Text Summarization ◽

Second Step ◽

Original Text ◽

Simple Majority ◽

New Approach ◽

Social Bees ◽

Automatic Text Summarization ◽

Final Layer ◽

Biomimetic Method ◽

Automatic Text

In this paper, the authors propose a new approach to automatic text summarization by extraction, based on a Saving Energy Function. The first step uses two extraction techniques: scoring of phrases, and a similarity measure that aims to eliminate redundant phrases without losing the theme of the text. The second step aims to optimize the results of the previous layer with a metaheuristic based on the Bee Algorithm. The objective function of the optimization is to maximize the sum of similarities between phrases of the candidate summary, in order to keep the theme of the text, and to minimize the sum of scores, in order to increase the summarization rate. This optimization also yields candidate summaries in which the order of the phrases may differ from the original text. The third and final layer chooses the best summary from the candidate summaries generated by the bee optimization; for this, the authors opted for voting with a simple majority.

Automatic Text Summarization by Providing Coverage, Non-Redundancy, and Novelty Using Sentence Graph

Journal of Information Technology Research ◽

10.4018/jitr.2022010108 ◽

2022 ◽

Vol 15 (1) ◽

pp. 1-18

Author(s):

Krishnaveni P. ◽

Balasundaram S. R.

Keyword(s):

Graph Algorithms ◽

Maximal Clique ◽

Text Summarization ◽

Original Text ◽

Online Information ◽

Automatic Text Summarization ◽

Global Properties ◽

Input Text ◽

Local Properties ◽

Automatic Text

The day-to-day growth of online information necessitates intensive research in automatic text summarization (ATS). ATS software produces a summary by extracting important information from the original text. With the help of summaries, users can easily read and understand the documents of interest. Most approaches to ATS have used only local properties of the text. Moreover, the large number of properties makes sentence selection difficult and complicated. This article therefore uses graph-based summarization to exploit structural and global properties of the text. It introduces the maximal clique based sentence selection (MCBSS) algorithm to select important and non-redundant sentences that cover all concepts of the input text for the summary. The MCBSS algorithm finds novel information using maximal cliques (MCs). Experimental results using Recall-Oriented Understudy for Gisting Evaluation (ROUGE) on the Timeline dataset show that the proposed work outperforms the existing graph algorithms Bushy Path (BP), Aggregate Similarity (AS), and TextRank (TR).
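The abstract does not spell out the MCBSS algorithm itself, so the following is only a sketch of the underlying primitive: enumerating the maximal cliques of a sentence graph with the basic Bron-Kerbosch recursion. The graph is hypothetical, with nodes standing for sentences and edges for pairs assumed to be similar.

```python
def maximal_cliques(adjacency):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion.

    adjacency maps each node to the set of its neighbours (undirected graph).
    """
    cliques = []

    def expand(current, candidates, excluded):
        # A clique is maximal when no node can extend it (candidates empty)
        # and it was not already covered by an earlier branch (excluded empty).
        if not candidates and not excluded:
            cliques.append(sorted(current))
            return
        for node in list(candidates):
            expand(current | {node},
                   candidates & adjacency[node],
                   excluded & adjacency[node])
            candidates = candidates - {node}
            excluded = excluded | {node}

    expand(set(), set(adjacency), set())
    return sorted(cliques)


# Hypothetical sentence graph: an edge means two sentences are similar.
graph = {
    0: {1, 2},
    1: {0, 2},
    2: {0, 1, 3},
    3: {2},
    4: set(),
}
cliques = maximal_cliques(graph)
```

One plausible reading is that each maximal clique groups mutually similar sentences, so selecting a single sentence per clique would cover its concept without redundancy; how MCBSS actually selects sentences is not stated in the abstract.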

Single Document Automatic Text Summarization using Term Frequency-Inverse Document Frequency (TF-IDF)

ComTech Computer Mathematics and Engineering Applications ◽

10.21512/comtech.v7i4.3746 ◽

2016 ◽

Vol 7 (4) ◽

pp. 285 ◽

Cited By ~ 14

Author(s):

Hans Christian ◽

Mikhael Pramodana Agus ◽

Derwin Suhartono

Keyword(s):

Language Processing ◽

Text Summarization ◽

The Other ◽

Online Information ◽

Inverse Document Frequency ◽

Automatic Text Summarization ◽

Document Frequency ◽

Online Source ◽

Automatic Text ◽

F Measure

The increasing availability of online information has triggered intensive research in automatic text summarization within Natural Language Processing (NLP). Text summarization reduces a text by removing less useful information, which helps the reader find the required information quickly. Many algorithms can be used to summarize text; one of them is TF-IDF (Term Frequency-Inverse Document Frequency). This research aimed to produce an automatic text summarizer implemented with the TF-IDF algorithm and to compare it with various other online automatic text summarizers. To evaluate the summary produced by each summarizer, the F-measure was used as the standard comparison value. The summarizer achieved 67% accuracy on three data samples, which is higher than the other online summarizers.
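A minimal sketch of TF-IDF sentence scoring for a single document is given here, together with the F-measure used for evaluation. The paper's exact weighting and preprocessing are not described in the abstract, so this version makes its own assumptions: each sentence is treated as one "document" when computing document frequency, the IDF is the unsmoothed `log(N / df)`, and a sentence's score is the mean weight over its distinct words. The sample sentences are hypothetical.

```python
import math
import re
from collections import Counter


def tfidf_sentence_scores(sentences):
    """Score each sentence by the mean TF-IDF weight of its distinct words.

    Assumption: each sentence is treated as one 'document' when computing
    document frequency, since only a single input text is available.
    """
    tokenized = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    n = len(tokenized)
    doc_freq = Counter(word for words in tokenized for word in set(words))
    scores = []
    for words in tokenized:
        if not words:
            scores.append(0.0)
            continue
        tf = Counter(words)
        total = sum((count / len(words)) * math.log(n / doc_freq[word])
                    for word, count in tf.items())
        scores.append(total / len(tf))
    return scores


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentences: the third one uses the rarest vocabulary.
sentences = ["the cat sat", "the dog sat", "the quantum computer hums"]
scores = tfidf_sentence_scores(sentences)
```

Words that appear in every sentence (such as "the") receive zero weight, so sentences carrying rarer terms rank higher; a summary would keep the top-scoring sentences.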

A Systematic Survey on Multi-document Text Summarization

International Journal of Advanced Trends in Computer Science and Engineering ◽

10.30534/ijatcse/2021/111062021 ◽

2021 ◽

Vol 10 (6) ◽

pp. 3148-3153

Keyword(s):

Deep Learning ◽

Text Summarization ◽

Evaluation Metrics ◽

Automatic Process ◽

Document Summarization ◽

Text Document ◽

Automatic Text Summarization ◽

As Graph ◽

Abstractive Summarization ◽

Automatic Text

Automatic text summarization is a technique for generating a short and accurate summary of a longer text document. Text summarization can be classified by the number of input documents (single-document and multi-document summarization) and by the characteristics of the generated summary (extractive and abstractive summarization). Multi-document summarization is an automatic process of creating a relevant, informative, and concise summary from a cluster of related documents. This paper presents a detailed survey of the existing literature on the various approaches to text summarization. A few of the most popular approaches, such as graph-based, cluster-based, and deep learning-based summarization techniques, are discussed here along with the evaluation metrics, which can provide insight for future researchers.

Automatic Text Summarization Using Deep Reinforcement Learning and Beyond

Information Technology And Control ◽

10.5755/j01.itc.50.3.28047 ◽

2021 ◽

Vol 50 (3) ◽

pp. 458-469

Author(s):

Gang Sun ◽

Zhongxin Wang ◽

Jia Zhao

Keyword(s):

Information Overload ◽

Optimization Method ◽

Text Summarization ◽

Original Text ◽

Baseline Model ◽

Extractive Summarization ◽

Automatic Text Summarization ◽

Text Information ◽

Abstractive Summarization ◽

Automatic Text

In the era of big data, information overload problems are becoming increasingly prominent. It is challenging for machines to understand, compress and filter massive text information through the use of artificial intelligence technology. The emergence of automatic text summarization mainly aims at solving the problem of information overload, and it can be divided into two types: extractive and abstractive. The former finds some key sentences or phrases from the original text and combines them into a summarization; the latter needs a computer to understand the content of the original text and then uses the readable language for the human to summarize the key information of the original text. This paper presents a two-stage optimization method for automatic text summarization that combines abstractive summarization and extractive summarization. First, a sequence-to-sequence model with the attention mechanism is trained as a baseline model to generate initial summarization. Second, it is updated and optimized directly on the ROUGE metric by using deep reinforcement learning (DRL). Experimental results show that compared with the baseline model, Rouge-1, Rouge-2, and Rouge-L have been increased on the LCSTS dataset and CNN/DailyMail dataset.
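Optimizing directly on ROUGE requires a scalar score for each generated summary. As an illustration of what such a score might look like, here is a minimal sketch of a ROUGE-L style F1 based on the longest common subsequence (LCS). It assumes whitespace tokenization and equal weighting of precision and recall; the abstract does not state the paper's exact reward formulation, and the example sentences are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L style F1 computed from LCS-based precision and recall."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


reference = "police killed the gunman"
in_order = rouge_l_f1(reference, "police kill the gunman")      # LCS of 3 tokens
reordered = rouge_l_f1(reference, "the gunman killed police")   # LCS of 2 tokens
```

Because the LCS respects word order, the reordered candidate scores lower than the in-order one even though it shares more words with the reference. In a reinforcement learning setup, a value like this could serve as the reward for a sampled summary.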

A boundary-based tokenization technique for extractive text summarization

World Journal of Advanced Research and Reviews ◽

10.30574/wjarr.2021.11.2.0351 ◽

2021 ◽

Vol 11 (2) ◽

pp. 303-312

Author(s):

Nnaemeka M Oparauwah ◽

Juliet N Odii ◽

Ikechukwu I Ayogu ◽

Vitalis C Iwuchukwu

Keyword(s):

Academic Research ◽

Text Summarization ◽

Text Documents ◽

Health Records ◽

Extractive Summarization ◽

Content Creation ◽

Text Document ◽

Automatic Text Summarization ◽

Automatic Text ◽

Selection Of

The need to extract and manage vital information contained in copious volumes of text documents has given birth to several automatic text summarization (ATS) approaches. ATS has found application in academic research, medical health records analysis, content creation and search engine optimization, finance and media. This study presents a boundary-based tokenization method for extractive text summarization. The proposed method performs word tokenization by defining word boundaries in place of specific delimiters. An extractive summarization algorithm was further developed based on the proposed boundary-based tokenization method, as well as word length consideration to control redundancy in summary output. Experimental results showed that the proposed approach enhanced word tokenization by enhancing the selection of appropriate keywords from text document to be used for summarization.
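The difference between delimiter-based and boundary-based word tokenization can be sketched as follows. The abstract does not give the authors' boundary rules, so the regular expression here is an assumption chosen for illustration: it treats a word as a run of word characters, optionally joined by internal hyphens or apostrophes, found between regex word boundaries.

```python
import re


def delimiter_tokens(text):
    """Split on a single specific delimiter (the space character)."""
    return text.split(" ")


def boundary_tokens(text):
    """Find words by their boundaries instead of by a fixed delimiter.

    Assumed rule: word characters, with internal hyphens or apostrophes
    allowed, so 'full-text' and "don't" stay whole.
    """
    return re.findall(r"\b\w+(?:[-']\w+)*\b", text)


# Hypothetical example sentence.
text = "Summaries, unlike full-text articles, don't repeat themselves."
by_delimiter = delimiter_tokens(text)
by_boundary = boundary_tokens(text)
```

Splitting on spaces leaves punctuation attached ("Summaries," and "themselves."), so the same word can appear in several surface forms; the boundary-based version returns clean tokens, which makes keyword counting more reliable.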
