Automatic Text Summarization and Keyword Extraction using Natural Language Processing

Author(s):  
Avinash Payak ◽  
Saurabh Rai ◽  
Kanishka Shrivastava ◽  
Reshma Gulwani
2020 ◽  
Author(s):  
Wojciech Ozimek

Automatic text summarization is one of the most complex problems in the field of natural language processing. In this dissertation, we present an abstraction-based summarization approach that paraphrases the original text and generates new sentences. Creating new formulations that differ completely from the original text is similar to how humans summarize texts. To achieve this, we propose a deep learning method using a Sequence to Sequence architecture with an attention mechanism. The goal is to create a model for the Polish language, using a dataset of over 200,000 articles from Polish websites, each split into text and summary parts. The outcomes look promising, with decent results on the standard metrics for this type of task. Based on a review of prior research conducted during the experiments, this is the first attempt to apply abstractive text summarization techniques to the Polish language.
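As a rough illustration of the attention mechanism mentioned in the abstract, the sketch below computes one attention step over a set of encoder states. The abstract does not say which attention variant the dissertation uses, so plain dot-product scoring is an assumption here, and the names `attend`, `encoder_states`, and `decoder_state` are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1 (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attend(decoder_state, encoder_states):
    """One attention step (assumed dot-product scoring).

    Returns the attention weights over the encoder states and the
    context vector, i.e. the weighted sum of those states.
    """
    scores = [dot(decoder_state, h) for h in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [
        sum(w * h[i] for w, h in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Toy values: three encoder states of dimension 2, one decoder state.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [1.0, 1.0]
weights, context = attend(decoder_state, encoder_states)
```

In a full Sequence to Sequence model the context vector would be combined with the decoder state to predict the next summary token; that part is omitted here.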


Author(s):  
Nurul Khotimah ◽  
Adi Wibowo P ◽  
Bryan Andreas ◽  
Abba Suganda Girsang

Text summarization is a natural language processing problem that involves generating a brief version of an original document. The topic has attracted fast-growing attention from researchers over the last decade, including work on the Indonesian language. This paper surveys text summarization research, especially for Indonesian. It discusses the two usual summarization approaches, extractive and abstractive; research on extractive summarization is more numerous than research on abstractive summarization. The paper investigates methods such as the Statistical Based Approach, Graph Based Approach, Machine Learning Approach, Fuzzy Logic Approach, Algebraic Approach, and Hybrid Approach, gives details of some of these methods, and summarizes their results. Keywords: text summarization, extractive summary, abstractive summary, natural language processing
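To make the extractive, statistical-based family concrete, here is a minimal sketch of a word-frequency sentence scorer. It is not taken from the surveyed papers: the stopword list, the tokenizer, and the average-frequency scoring rule are all simplifying assumptions, and the example text is English rather than Indonesian.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a real resource).
STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "it", "on"}


def split_sentences(text):
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase word tokens with stopwords removed."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return [w for w in words if w not in STOPWORDS]


def summarize(text, n_sentences=1):
    """Return the n highest-scoring sentences in their original order.

    A sentence's score is the average document-level frequency of its words.
    """
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(
        range(len(sentences)),
        key=lambda i: score(sentences[i]),
        reverse=True,
    )
    chosen = sorted(ranked[:n_sentences])
    return [sentences[i] for i in chosen]


text = (
    "Summarization shortens documents. "
    "Extractive summarization selects sentences from documents. "
    "The weather was nice."
)
summary = summarize(text, n_sentences=2)
```

An extractive method only selects existing sentences, which is why it is simpler than the abstractive approach, which has to generate new ones.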


Webology ◽  
2021 ◽  
Vol 18 (05) ◽  
pp. 1184-1190
Author(s):  
Abinaya N ◽  
Anand R ◽  
Arunkumar T ◽  
Sameema Begam S

Automatic Text Summarization (ATS) is a key challenge in the area of Natural Language Processing (NLP). It deals with generating a summary from a given text without losing vital information. It is a contemporary area because of the exponential growth of content on the internet, and it is applied to summarizing the content of books, newsletters, internal document analysis, patent research, e-learning, etc. Various machine learning approaches are used in an effort to match the quality of human-generated summaries. These systems still fall short in a few areas, such as checking grammatical errors and paraphrasing sentences after the summary is created. This work provides a brief overview of the methods and approaches used in ATS.


2021 ◽  
Vol 1955 (1) ◽  
pp. 012072
Author(s):  
Ruiheng Li ◽  
Xuan Zhang ◽  
Chengdong Li ◽  
Zhongju Zheng ◽  
Zihang Zhou ◽  
...  

2021 ◽  
Author(s):  
Ye Seul Bae ◽  
Kyung Hwan Kim ◽  
Han Kyul Kim ◽  
Sae Won Choi ◽  
Taehoon Ko ◽  
...  

BACKGROUND Smoking is a major risk factor and an important variable for clinical research, but there are few studies on automatically obtaining smoking classification from unstructured bilingual electronic health records (EHRs). OBJECTIVE We aim to develop an algorithm to classify smoking status based on unstructured EHRs using natural language processing (NLP). METHODS With acronym replacement and the Python package Soynlp, we normalize 4,711 bilingual clinical notes. Each EHR note was classified into 4 categories: current smokers, past smokers, never smokers, and unknown. Subsequently, SPPMI (Shifted Positive Pointwise Mutual Information) is used to vectorize words in the notes. By calculating cosine similarity between these word vectors, keywords denoting the same smoking status are identified. RESULTS Compared to other keyword extraction methods (word co-occurrence-, PMI-, and NPMI-based methods), our proposed approach improves keyword extraction precision by as much as 20.0%. These extracted keywords are used in classifying 4 smoking statuses from our bilingual clinical notes. Given an identical SVM classifier, the extracted keywords improve the F1 score by as much as 1.8% compared with unigram and bigram Bag of Words features. CONCLUSIONS Our study shows the potential of SPPMI in classifying smoking status from bilingual, unstructured EHRs. Our current findings show how smoking information can be easily acquired and used for clinical practice and research.
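The SPPMI and cosine-similarity steps described in METHODS can be sketched as follows, using the standard definition SPPMI(w, c) = max(PMI(w, c) - log k, 0). This is not the authors' implementation: the window size, the shift `k`, the function names, and the toy English notes are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def sppmi_vectors(docs, window=2, k=1.0):
    """Build sparse SPPMI word vectors from tokenized documents.

    Each vector maps a context word c to max(PMI(w, c) - log(k), 0);
    zero entries are simply left out of the dictionary.
    """
    pair_counts = Counter()
    word_counts = Counter()
    context_counts = Counter()
    for tokens in docs:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pair_counts[(word, tokens[j])] += 1
                    word_counts[word] += 1
                    context_counts[tokens[j]] += 1
    total = sum(pair_counts.values())
    vectors = {}
    for (word, ctx), n in pair_counts.items():
        pmi = math.log(n * total / (word_counts[word] * context_counts[ctx]))
        value = pmi - math.log(k)
        if value > 0:
            vectors.setdefault(word, {})[ctx] = value
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    num = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return num / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy notes (hypothetical; the study used bilingual clinical notes).
docs = [
    ["patient", "quit", "smoking", "last", "year"],
    ["patient", "stopped", "smoking", "last", "year"],
    ["patient", "denies", "alcohol", "use"],
]
vectors = sppmi_vectors(docs)
sim_same = cosine(vectors.get("quit", {}), vectors.get("stopped", {}))
sim_diff = cosine(vectors.get("quit", {}), vectors.get("denies", {}))
```

Words that occur in the same contexts, such as "quit" and "stopped" here, end up with similar vectors, which is the property the keyword identification step relies on.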


2020 ◽  
Vol 8 (6) ◽  
pp. 3281-3287

Text is an extremely rich resource of information. Every second, people send or receive hundreds of millions of pieces of data. NLP involves various tasks, including machine learning, information extraction, information retrieval, automatic text summarization, question answering systems, parsing, sentiment analysis, natural language understanding, and natural language generation. Information extraction is an important task used to find structured information in unstructured or semi-structured text. The paper presents a methodology for extracting relations between biomedical entities using spaCy. The framework consists of the following phases: data creation, loading and converting the data into spaCy objects, preprocessing, defining the patterns, and extracting the relations. The dataset is downloaded from the NCBI database and contains only sentences. The resulting model was evaluated with performance measures such as precision, recall, and F-measure. The model achieved 87% accuracy in retrieving entity relations.
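The evaluation measures named in the abstract can be computed as in the sketch below, which compares a set of extracted relations against a gold set. The relation triples are invented for illustration and do not come from the paper's NCBI data, and the function name `precision_recall_f1` is hypothetical.

```python
def precision_recall_f1(predicted, gold):
    """Precision, recall, and F1 for extracted items against a gold set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical (entity, relation, entity) triples, for illustration only.
gold = {
    ("aspirin", "treats", "headache"),
    ("insulin", "regulates", "glucose"),
    ("BRCA1", "associated_with", "breast cancer"),
    ("metformin", "treats", "diabetes"),
}
predicted = {
    ("aspirin", "treats", "headache"),
    ("insulin", "regulates", "glucose"),
    ("aspirin", "treats", "diabetes"),
}
precision, recall, f1 = precision_recall_f1(predicted, gold)
```

Here two of the three predicted triples are correct and two of the four gold triples are found, so precision is 2/3, recall is 1/2, and F1, their harmonic mean, is 4/7.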


Author(s):  
Janjanam Prabhudas ◽  
C. H. Pradeep Reddy

The enormous increase in information, along with the computational abilities of machines, has created innovative applications in natural language processing through the use of machine learning models. This chapter presents trends in natural language processing that employ machine learning and its models in the context of text summarization. It is organized to help the researcher understand the technical perspectives on feature representation and the models to consider before applying them to language-oriented tasks. Further, the chapter reviews the details of the primary deep learning models, their applications, and their performance in the context of language processing. The primary focus of this chapter is to illustrate the technical research findings and gaps in text summarization (TS) based on deep learning, along with state-of-the-art deep learning models for TS.

