Statistical Approaches to Automatic Text Summarization

Abstract: Driven by modern technological innovation, the volume of data and text continues to grow every year. With this much text, automatic text summarization is needed more than ever. Automatic text summarization is the creation of a shortened version of a text by a computer program, such that the result still contains the most important information of the original: its basic ideas and the words or sentences that can represent the whole text. Statistical approaches are one family of automatic text summarization methods. Five statistical approaches are used here: the aggregation similarity method, the frequency method, the location method, the title method (if the text has a title), and the tf-based query method (if the text does not have a title). Cosine similarity is used to compute the title method, the aggregation similarity method, and the tf-based query method. Two types of validation are performed. System validation compares the similarity between human-written summaries and program-generated summaries, and yields an accuracy of 76.7647% for summaries that are 30% of the length of the original journal article. User validation yields an accuracy of 82% (the Indonesian-language abstract reports 81%). Because both validations give fairly good results, the conclusion is that statistical approaches are suitable for automatic text summarization.

Keywords: automatic text summarization, statistical approaches, Indonesian document, cosine similarity
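
The abstract names cosine similarity as the measure behind the title method but does not describe the implementation. The following is a minimal sketch of one plausible reading, assuming plain term-frequency vectors and a simple regex tokenizer. The function names `term_frequencies`, `cosine_similarity` and `title_scores`, and the sample sentences, are illustrative assumptions and do not come from the paper.

```python
import math
import re
from collections import Counter


def term_frequencies(text):
    """Lowercase the text, split it into word tokens, and count each token."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector shares nothing with any other vector
    return dot / (norm_a * norm_b)


def title_scores(title, sentences):
    """Score every sentence by its cosine similarity to the title."""
    title_tf = term_frequencies(title)
    return [cosine_similarity(term_frequencies(s), title_tf) for s in sentences]


# Hypothetical input, for illustration only.
title = "Statistical approaches to text summarization"
sentences = [
    "Statistical approaches score each sentence of the text.",
    "The weather was pleasant yesterday.",
]
scores = title_scores(title, sentences)
```

Here the first sentence shares three terms with the title and receives a positive score, while the second shares none and scores zero. Under the same assumptions, a tf-based query method could reuse `cosine_similarity` with a query vector in place of the title vector. How the paper builds that query is not stated in the abstract.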

Automatic Text Summarization on Social Media

Proceedings of the 2020 4th International Symposium on Computer Science and Intelligent Control, 2020. DOI: 10.1145/3440084.3441182

Author(s): Zhang Kerui, Hu Haichao, Liu Yuxia

Keyword(s): Social Media, Text Summarization, Automatic Text Summarization, Automatic Text

Using librarian techniques in automatic text summarization for information retrieval

Proceedings of the second ACM/IEEE-CS joint conference on Digital libraries - JCDL '02, 2002. DOI: 10.1145/544220.544227

Cited By ~ 7

Author(s): Min-Yen Kan, Judith L. Klavans

Keyword(s): Information Retrieval, Text Summarization, Automatic Text Summarization, Automatic Text

A Quantum-Inspired Genetic Algorithm for Extractive Text Summarization

International Journal of Natural Computing Research, 2021, Vol 10 (2), pp. 42-60. DOI: 10.4018/ijncr.2021040103

Author(s): Khadidja Chettah, Amer Draa

Keyword(s): Genetic Algorithm, State Of The Art, Text Summarization, Automated System, Evaluation Metrics, Document Summarization, Automatic Text Summarization, Reference Methods, Textual Data, Automatic Text

Automatic text summarization has recently become a key instrument for reducing the huge quantity of textual data. In this paper, the authors propose a quantum-inspired genetic algorithm (QGA) for extractive single-document summarization. The QGA is used inside a fully automated system as an optimizer that searches for the best combination of sentences to include in the final summary. The approach is compared with 11 reference methods, including supervised and unsupervised summarization techniques, and is evaluated on the DUC 2001 and DUC 2002 datasets using the ROUGE-1 and ROUGE-2 metrics. The results show that the proposal is competitive with state-of-the-art methods: it ranks first out of the 12 algorithms compared.
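
To make the evaluation metrics concrete, here is a simplified sketch of ROUGE-N recall: the fraction of reference n-grams that also appear in the candidate summary, with counts clipped. It assumes whitespace tokenization and a single reference, and it omits the stemming, stopword handling and multi-reference options of the official ROUGE toolkit. The function names and the sample sentences are illustrative and are not taken from the paper.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n):
    """Clipped n-gram overlap divided by the number of reference n-grams."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0  # the reference is shorter than n tokens
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


# Hypothetical summaries, for illustration only.
reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"
rouge_1 = rouge_n_recall(candidate, reference, 1)  # 5 of 6 unigrams match
rouge_2 = rouge_n_recall(candidate, reference, 2)  # 3 of 5 bigrams match
```

In an optimizer such as the one described, a score of this kind is what a reported ranking would be based on. Whether the QGA also uses ROUGE inside its fitness function is not stated in the abstract.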

Calculating the Upper Bounds for Portuguese Automatic Text Summarization Using Genetic Algorithm

Advances in Artificial Intelligence - IBERAMIA 2018 - Lecture Notes in Computer Science, 2018, pp. 442-454. DOI: 10.1007/978-3-030-03928-8_36

Cited By ~ 1

Author(s): Jonathan Rojas-Simón, Yulia Ledeneva, René Arnulfo García-Hernández

Keyword(s): Genetic Algorithm, Upper Bounds, Text Summarization, Automatic Text Summarization, Automatic Text

Automatic Text Summarization Techniques Used in Industry

Proceedings of ICETIT 2019 - Lecture Notes in Electrical Engineering, 2019, pp. 229-237. DOI: 10.1007/978-3-030-30577-2_19

Author(s): Mukesh Kumar Kharita, Pardeep Singh

Keyword(s): Text Summarization, Automatic Text Summarization, Automatic Text

Prediction and Analysis of Extracting Relations using Spacy Model

International Journal of Recent Technology and Engineering - 2, 2020, Vol 8 (6), pp. 3281-3287. DOI: 10.35940/ijrte.f8524.038620

Keyword(s): Natural Language, Information Extraction, Performance Measures, Text Summarization, Language Understanding, Language Generation, Automatic Text Summarization, Structured Information, Automatic Text, F Measure

Text is an extremely rich resource of information. Every second, people send or receive hundreds of millions of pieces of data. NLP involves various tasks, including machine learning, information extraction, information retrieval, automatic text summarization, question answering, parsing, sentiment analysis, natural language understanding, and natural language generation. Information extraction is an important task used to find structured information in unstructured or semi-structured text. The paper presents a methodology for extracting relations between biomedical entities using spaCy. The framework consists of the following phases: data creation, loading and converting the data into spaCy objects, preprocessing, defining the patterns, and extracting the relations. The dataset is downloaded from the NCBI database and contains only sentences. The resulting model is evaluated with performance measures such as precision, recall, and F-measure. The model achieved 87% accuracy in retrieving entity relations.
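
The performance measures named in the abstract can be illustrated with a small sketch. It assumes relations are represented as (entity, relation, entity) triples and that an extracted relation counts as correct only on an exact match with a gold triple. The function name `precision_recall_f1` and the example triples are invented for illustration and do not come from the paper's dataset.

```python
def precision_recall_f1(predicted, gold):
    """Compare a set of extracted relations against a gold-standard set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure (F1): harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical triples, for illustration only.
gold = {
    ("aspirin", "treats", "headache"),
    ("insulin", "regulates", "glucose"),
    ("BRCA1", "associated_with", "breast cancer"),
}
predicted = {
    ("aspirin", "treats", "headache"),
    ("insulin", "regulates", "glucose"),
    ("aspirin", "treats", "glucose"),  # a wrong extraction
}
precision, recall, f1 = precision_recall_f1(predicted, gold)
```

With two of three predictions correct and two of three gold relations found, precision, recall and F-measure all equal 2/3 in this toy case.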

AUTOMATIC TEXT SUMMARIZATION USING SUPERVISED MACHINE LEARNING TECHNIQUE FOR HINDI LANGAUGE

International Journal of Research in Engineering and Technology, 2016, Vol 05 (06), pp. 361-367. DOI: 10.15623/ijret.2016.0506065

Author(s): Nikita Desai

Keyword(s): Machine Learning, Text Summarization, Supervised Machine Learning, Machine Learning Technique, Automatic Text Summarization, Learning Technique, Automatic Text

Automatic text summarization in TIPSTER

1996. DOI: 10.3115/1119089.1119119

Author(s): Thérèse Firmin, Inderjeet Mani

Keyword(s): Text Summarization, Automatic Text Summarization, Automatic Text

A Pointer Generator Network Model to Automatic Text Summarization and Headline Generation

International Journal of Engineering and Advanced Technology - Regular Issue, 2019, Vol 8 (5S3), pp. 447-451. DOI: 10.35940/ijeat.e1094.0785s319

Keyword(s): Neural Network, Network Model, Recurrent Neural Network, Text Summarization, Daily Mail, Automatic Text Summarization, Generator Model, Abstractive Summarization, Automatic Text

In a world where information grows rapidly every day, we need tools that generate summaries and headlines from text that are accurate as well as short and precise. In this paper, we describe a method for generating headlines from articles. A hybrid pointer-generator network with an attention distribution and a coverage mechanism is applied to the article to produce an abstractive summary; an encoder-decoder recurrent neural network with LSTM units then generates a headline from that summary. The hybrid pointer-generator model helps reduce inaccuracies as well as repetition. We use the CNN / Daily Mail dataset.
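
The abstract does not spell out how the pointer-generator combines copying and generation. The sketch below follows the commonly cited formulation by See, Liu and Manning (2017), which is an assumption about what this paper uses. In that formulation the final probability of a word is p_gen times its vocabulary probability, plus (1 - p_gen) times the attention mass on the source positions holding that word, and the coverage loss is the sum over source positions of min(attention, coverage). All names and numbers below are illustrative.

```python
def final_distribution(p_gen, vocab_dist, attention, source_tokens):
    """Mix the generation and copy distributions into one word distribution."""
    # Generation part: scale the vocabulary distribution by p_gen.
    final = {word: p_gen * prob for word, prob in vocab_dist.items()}
    # Copy part: add (1 - p_gen) * attention to each attended source word,
    # including out-of-vocabulary words that the decoder could not generate.
    for token, weight in zip(source_tokens, attention):
        final[token] = final.get(token, 0.0) + (1.0 - p_gen) * weight
    return final


def coverage_loss(attention, coverage):
    """Penalize attending again to positions that are already covered."""
    return sum(min(a, c) for a, c in zip(attention, coverage))


# Hypothetical decoder state for one step, for illustration only.
vocab_dist = {"the": 0.5, "summary": 0.3, "news": 0.2}
source_tokens = ["zorblat", "reads", "the", "news"]  # "zorblat" is out of vocabulary
attention = [0.7, 0.1, 0.1, 0.1]
p_gen = 0.4

final = final_distribution(p_gen, vocab_dist, attention, source_tokens)
loss = coverage_loss(attention, coverage=[0.5, 0.2, 0.2, 0.1])
```

In this toy step the out-of-vocabulary source word receives the largest final probability because the attention is concentrated on it, which is how copying can reduce factual inaccuracy. The coverage term is what discourages repetition.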
