News Headline Building using Hybrid Headline Generation Technique for Quick Gist

Urmila Shrawankar; Kranti Wankhede

doi:10.4018/ijncr.2017010103

International Journal of Natural Computing Research ◽

10.4018/ijncr.2017010103 ◽

2017 ◽

Vol 6 (1) ◽

pp. 36-52

Author(s):

Urmila Shrawankar ◽

Kranti Wankhede

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Reading Time ◽

News Article ◽

Keyword Extraction ◽

Keyphrase Extraction ◽

Generation Technique ◽

Extraction Algorithm ◽

Key Terms

A considerable amount of time is required to read a whole news article and interpret its gist. Headlines are therefore needed to reduce reading and interpretation time. The available techniques for news headline construction mainly include extractive and abstractive headline generation. In this paper, a context-based news headline is formed from a long news article using core Natural Language Processing (NLP) techniques and the key terms of the article. Key terms are retrieved from the lengthy article using various keyword extraction approaches. Keyphrases are picked out with the Keyphrase Extraction Algorithm (KEA), which, together with NLP parsing, helps construct the headline syntax. A sentence compression algorithm generates compressed sentences from the parse trees of the leading sentences. A headline reduces the reader's cognitive burden by reflecting the important contents of the news. The objective is to frame headlines from key terms so as to reduce the reader's reading time and effort.
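The abstract relies on KEA to pick keyphrases. KEA is a supervised method that, as usually described, scores candidate phrases with two features, TF×IDF and the relative position of the phrase's first occurrence, and passes them to a Naive Bayes classifier. The Python sketch below computes only those two features and, as an assumption, replaces the trained classifier with a hand-written score. The stopword list, function names, and scoring rule are hypothetical and are not the authors' implementation.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list, kept tiny for illustration.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "and", "to", "is",
             "for", "by", "with", "at", "from"}


def tokenize(text):
    """Lower-case word tokens (letters and digits only)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def candidates(tokens, max_len=3):
    """Yield (phrase, start) for 1..max_len-grams not starting or ending with a stopword."""
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            yield " ".join(gram), i


def kea_features(doc, corpus):
    """Map each candidate phrase in doc to (tf_idf, relative first occurrence)."""
    tokens = tokenize(doc)
    counts, first = Counter(), {}
    for phrase, start in candidates(tokens):
        counts[phrase] += 1
        first.setdefault(phrase, start)
    size = sum(counts.values())
    other = [{p for p, _ in candidates(tokenize(d))} for d in corpus]
    n_docs = len(corpus) + 1  # background corpus plus this document
    features = {}
    for phrase, freq in counts.items():
        df = 1 + sum(phrase in s for s in other)  # +1: this document contains it
        tf_idf = (freq / size) * -math.log2(df / n_docs)
        features[phrase] = (tf_idf, first[phrase] / len(tokens))
    return features


def rank_keyphrases(doc, corpus, k=5):
    """Top-k phrases; the score is a hypothetical stand-in for KEA's Naive Bayes model."""
    features = kea_features(doc, corpus)
    # Favour phrases that are frequent here, rare elsewhere, and appear early.
    ranked = sorted(features.items(),
                    key=lambda item: item[1][0] * (1 - item[1][1]),
                    reverse=True)
    return [phrase for phrase, _ in ranked[:k]]
```

In a real KEA setup the two features would be discretized and combined by a model trained on articles with known keyphrases, rather than by the fixed product used here.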

Keyword extraction method for machine reading comprehension based on natural language processing

Journal of Physics Conference Series ◽

10.1088/1742-6596/1955/1/012072 ◽

2021 ◽

Vol 1955 (1) ◽

pp. 012072

Author(s):

Ruiheng Li ◽

Xuan Zhang ◽

Chengdong Li ◽

Zhongju Zheng ◽

Zihang Zhou ◽

...

Keyword(s):

Reading Comprehension ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Extraction Method ◽

Keyword Extraction ◽

Machine Reading

Development of algorithm for classification smoking status from unstructured bilingual electronic health records based on natural language processing (Preprint)

10.2196/preprints.26978 ◽

2021 ◽

Author(s):

Ye Seul Bae ◽

Kyung Hwan Kim ◽

Han Kyul Kim ◽

Sae Won Choi ◽

Taehoon Ko ◽

...

Keyword(s):

Natural Language Processing ◽

Electronic Health Records ◽

Natural Language ◽

Language Processing ◽

Smoking Status ◽

SVM Classifier ◽

Keyword Extraction ◽

Health Records ◽

Clinical Notes ◽

Electronic Health

BACKGROUND Smoking is a major risk factor and an important variable for clinical research, but few studies have addressed automatically classifying smoking status from unstructured bilingual electronic health records (EHRs). OBJECTIVE We aim to develop an algorithm that classifies smoking status from unstructured EHRs using natural language processing (NLP). METHODS Using acronym replacement and the Python package Soynlp, we normalize 4,711 bilingual clinical notes. Each EHR note was classified into 4 categories: current smokers, past smokers, never smokers, and unknown. Subsequently, SPPMI (Shifted Positive Pointwise Mutual Information) is used to vectorize words in the notes. By calculating the cosine similarity between these word vectors, keywords denoting the same smoking status are identified. RESULTS Compared to other keyword extraction methods (word co-occurrence-, PMI-, and NPMI-based methods), our proposed approach improves keyword extraction precision by as much as 20.0%. These extracted keywords are used to classify the 4 smoking statuses in our bilingual clinical notes. Given an identical SVM classifier, the extracted keywords improve the F1 score by as much as 1.8% compared to unigram and bigram Bag of Words features. CONCLUSIONS Our study shows the potential of SPPMI in classifying smoking status from bilingual, unstructured EHRs. Our current findings show how smoking information can be easily acquired and used for clinical practice and research.
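The METHODS step (vectorize words with SPPMI, then compare them by cosine similarity) can be sketched in Python using the standard definition SPPMI(w, c) = max(PMI(w, c) - log k, 0) over windowed word-context counts. This is a minimal sketch, assuming already tokenized notes; the window size, the shift k, the function names, and the toy tokens are hypothetical, and the Soynlp normalization and acronym replacement steps are not reproduced.

```python
import math
from collections import Counter


def cooccurrences(docs, window=2):
    """Count (word, context) pairs within a symmetric window over tokenized notes."""
    pairs = Counter()
    for tokens in docs:
        for i, word in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    pairs[(word, tokens[j])] += 1
    return pairs


def sppmi_vectors(pairs, k=1):
    """Sparse SPPMI vectors: SPPMI(w, c) = max(PMI(w, c) - log k, 0)."""
    total = sum(pairs.values())
    word_count, context_count = Counter(), Counter()
    for (w, c), n in pairs.items():
        word_count[w] += n
        context_count[c] += n
    vectors = {}
    for (w, c), n in pairs.items():
        pmi = math.log(n * total / (word_count[w] * context_count[c]))
        shifted = pmi - math.log(k)
        if shifted > 0:  # keep only the positive entries
            vectors.setdefault(w, {})[c] = shifted
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(x * v.get(key, 0.0) for key, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def most_similar(word, vectors, top=3):
    """Words whose SPPMI vectors are closest to `word` by cosine similarity."""
    target = vectors.get(word, {})
    scores = [(other, cosine(target, vec))
              for other, vec in vectors.items() if other != word]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:top]
```

Words that occur in the same contexts (for example, two status terms that both follow the same phrasing in notes) receive similar vectors. This is the property the abstract uses to group keywords that denote the same smoking status.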

Text Mining Business Policy Documents

International Journal of Business Intelligence Research ◽

10.4018/ijbir.20200701.oa1 ◽

2020 ◽

Vol 11 (2) ◽

pp. 28-46 ◽

Cited By ~ 1

Author(s):

Marco Spruit ◽

Drilon Ferati

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Qualitative Assessment ◽

Keyword Extraction ◽

Automatic Summarization ◽

Business Policy ◽

Policy Documents ◽

Modelling Framework ◽

Processing Techniques

At a time when the use of natural language processing techniques is flourishing in domains such as biomedicine, national security, finance, and law, this study takes a close look at their application to policy documents. Besides providing an overview of the current literature on these concepts, the authors apply a set of natural language processing techniques to internal bank policies. The implementation of these techniques, together with the results of the experiments and an expert evaluation, introduces a meta-algorithmic modelling framework for processing internal business policies. This framework relies on three natural language processing techniques, namely information extraction, automatic summarization, and automatic keyword extraction. For the reference extraction and keyword extraction tasks, the authors calculated precision, recall, and F-scores: 0.99, 0.84, and 0.89 for the former, and 0.79, 0.87, and 0.83 for the latter, respectively. Finally, the summary extraction approach was evaluated positively in a qualitative assessment.
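The precision, recall, and F-scores quoted above can be illustrated with a set-based sketch in Python, in which extracted keywords are compared against a gold-standard list and F1 is the harmonic mean of precision and recall. The function names and the toy keyword lists are hypothetical. The abstract does not say how scores were averaged across documents, so this is not necessarily how the reported numbers were computed.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(extracted, gold):
    """Set-based precision, recall and F1 of extracted keywords against a gold standard."""
    extracted, gold = set(extracted), set(gold)
    true_pos = len(extracted & gold)
    precision = true_pos / len(extracted) if extracted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    return precision, recall, f1(precision, recall)
```

For example, `f1(0.79, 0.87)` evaluates to about 0.83, which agrees with the keyword extraction F-score reported above.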

Validation of deep learning natural language processing algorithm for keyword extraction from pathology reports in electronic health records

Scientific Reports ◽

10.1038/s41598-020-77258-w ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

Yoojoong Kim ◽

Jeong Hyeon Lee ◽

Sunho Choi ◽

Jeong Moon Lee ◽

Jong-Ho Kim ◽

...

Keyword(s):

Deep Learning ◽

Natural Language ◽

Language Processing ◽

Extraction Methods ◽

Complex Nature ◽

Keyword Extraction ◽

Extraction Algorithm ◽

Pathology Reports ◽

Natural Language Process ◽

Present Algorithm

Pathology reports contain essential data for both clinical and research purposes. However, extracting meaningful, qualitative data from the original document is difficult because of the narrative and complex nature of such reports. Keyword extraction from pathology reports is needed to summarize the informative text and reduce the considerable time that reading it otherwise takes. In this study, we employed a deep learning model for natural language processing to extract keywords from pathology reports and present a supervised keyword extraction algorithm. We considered three types of pathological keywords, namely specimen, procedure, and pathology types. We compared the performance of the present algorithm with conventional keyword extraction methods on 3,115 pathology reports that were manually labeled by professional pathologists. Additionally, we applied the present algorithm to 36,014 unlabeled pathology reports and analysed the extracted keywords with biomedical vocabulary sets. The results demonstrated the suitability of our model for practical application in extracting important data from pathology reports.

Automatic Text Summarization and Keyword Extraction using Natural Language Processing

2020 International Conference on Electronics and Sustainable Communication Systems (ICESC) ◽

10.1109/icesc48915.2020.9155852 ◽

2020 ◽

Author(s):

Avinash Payak ◽

Saurabh Rai ◽

Kanishka Shrivastava ◽

Reshma Gulwani

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Text Summarization ◽

Keyword Extraction ◽

Automatic Text Summarization ◽

Automatic Text

Keyword Extraction Algorithm for Classifying Smoking Status from Unstructured Bilingual Electronic Health Records Based on Natural Language Processing

Applied Sciences ◽

10.3390/app11198812 ◽

2021 ◽

Vol 11 (19) ◽

pp. 8812

Author(s):

Ye Seul Bae ◽

Kyung Hwan Kim ◽

Han Kyul Kim ◽

Sae Won Choi ◽

Taehoon Ko ◽

...

Keyword(s):

Natural Language Processing ◽

Electronic Health Records ◽

Natural Language ◽

Language Processing ◽

Smoking Status ◽

Extraction Methods ◽

SVM Classifier ◽

Keyword Extraction ◽

Health Records ◽

Electronic Health

Smoking is an important variable for clinical research, but few studies have addressed automatically classifying smoking status from unstructured bilingual electronic health records (EHRs). We aim to develop an algorithm that classifies smoking status from unstructured EHRs using natural language processing (NLP). Using acronym replacement and the Python package Soynlp, we normalize 4711 bilingual clinical notes. Each EHR note was classified into 4 categories: current smokers, past smokers, never smokers, and unknown. Subsequently, SPPMI (Shifted Positive Pointwise Mutual Information) is used to vectorize words in the notes. By calculating the cosine similarity between these word vectors, keywords denoting the same smoking status are identified. Compared to other keyword extraction methods (word co-occurrence-, PMI-, and NPMI-based methods), our proposed approach improves keyword extraction precision by as much as 20.0%. These extracted keywords are used to classify the 4 smoking statuses in our bilingual EHRs. Given an identical SVM classifier, the F1 score improves by as much as 1.8% compared to unigram and bigram Bag of Words features. Our study shows the potential of SPPMI in classifying smoking status from bilingual, unstructured EHRs. Our current findings show how smoking information can be easily acquired for clinical practice and research.

Key2Vec: Automatic Ranked Keyphrase Extraction from Scientific Articles using Phrase Embeddings

10.31219/osf.io/j76y3 ◽

2018 ◽

Cited By ~ 1

Author(s):

Debanjan Mahata ◽

John Kuriakose ◽

Rajiv Ratn Shah ◽

Roger Zimmermann

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

State Of The Art ◽

Keyphrase Extraction ◽

Text Documents ◽

Benchmark Datasets

Keyphrase extraction is a fundamental task in natural language processing that facilitates mapping of documents to a set of representative phrases. In this paper, we present an unsupervised technique (Key2Vec) that leverages phrase embeddings for ranking keyphrases extracted from scientific articles. Specifically, we propose an effective way of processing text documents for training multi-word phrase embeddings that are used for thematic representation of scientific articles and ranking of keyphrases extracted from them using theme-weighted PageRank. Evaluations are performed on benchmark datasets producing state-of-the-art results.
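The abstract names theme-weighted PageRank as Key2Vec's ranking step without giving its formula. One plausible reading is a personalized PageRank whose random-jump distribution is proportional to each candidate phrase's theme weight. In Key2Vec those weights are derived from phrase embeddings; here they are given directly as numbers. That reading, the function name, the graph, and the weights below are assumptions made for illustration. This is a minimal Python sketch using power iteration.

```python
def theme_weighted_pagerank(edges, theme, damping=0.85, iters=100, tol=1e-10):
    """Personalized PageRank whose random jump follows normalized theme weights.

    edges: {phrase: {neighbour: weight}}; theme: {phrase: non-negative weight}.
    """
    nodes = sorted(set(edges) | set(theme)
                   | {v for nbrs in edges.values() for v in nbrs})
    z = sum(theme.get(n, 0.0) for n in nodes)
    if z <= 0:
        raise ValueError("at least one phrase needs a positive theme weight")
    jump = {n: theme.get(n, 0.0) / z for n in nodes}
    out_weight = {n: sum(edges.get(n, {}).values()) for n in nodes}
    score = dict(jump)
    for _ in range(iters):
        new = {n: (1 - damping) * jump[n] for n in nodes}
        for u in nodes:
            if out_weight[u] == 0:
                # Dangling phrase: redistribute its mass along the theme distribution.
                for n in nodes:
                    new[n] += damping * score[u] * jump[n]
            else:
                for v, w in edges[u].items():
                    new[v] += damping * score[u] * w / out_weight[u]
        delta = sum(abs(new[n] - score[n]) for n in nodes)
        score = new
        if delta < tol:
            break
    return score
```

Because the jump distribution is tied to the theme weights, a phrase that is well connected but off-theme ends up below equally connected on-theme phrases.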

Keyword extraction and ranking based on crawler and natural language processing

10.4108/eai.6-6-2021.2307707 ◽

2021 ◽

Author(s):

Enbo Zhang ◽

Changmao Li ◽

Li Liu

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Keyword Extraction

Topic Modeling for Keyword Extraction: using Natural Language Processing methods for keyword extraction in Portal Min@s

Revista de Estudos da Linguagem ◽

10.17851/2237-2083.23.3.695-726 ◽

2015 ◽

Vol 23 (3) ◽

pp. 695 ◽

Cited By ~ 1

Author(s):

Arnaldo Candido Junior ◽

Célia Magalhães ◽

Helena Caseli ◽

Régis Zangirolami

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Topic Modeling ◽

Latent Dirichlet Allocation ◽

Keyword Extraction ◽

Processing Methods ◽

Dirichlet Allocation

This article aims to evaluate the application of two efficient automatic keyword extraction methods, used by the Corpus Linguistics and Natural Language Processing communities to generate keywords from literary texts: WordSmith Tools and Latent Dirichlet Allocation (LDA). The two tools chosen for this work each have their own specificities and different extraction techniques, which led us to an analysis oriented toward their performance. We therefore aim to understand how each method works and to evaluate its application to literary texts. To this end, we used human analysis by people with knowledge of the field of the texts used. The LDA method was used to extract keywords through its integration with Portal Min@s: Corpora de Fala e Escrita, a general corpus processing system designed for a range of Corpus Linguistics research. The results of the experiment confirm the effectiveness of WordSmith Tools and LDA in extracting keywords from a literary corpus. They also show that human analysis of the lists is needed at a stage before the experiments, to complement the automatically generated list by cross-referencing the results of WordSmith Tools and LDA. Finally, the human analyst's linguistic intuition about the lists generated separately by the two methods favored the WordSmith Tools keyword list.
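The abstract does not say how either tool scores keywords. WordSmith Tools' KeyWords procedure compares word frequencies in a text with those in a reference corpus, and a widely used statistic for that comparison is Dunning's log-likelihood (G2). The Python sketch below illustrates that statistic, assuming whitespace-tokenized corpora. It is not WordSmith's code, and the function names and toy corpora are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood (G2) keyness of one word.

    a, b: frequency of the word in the study and reference corpora;
    c, d: total number of tokens in the study and reference corpora.
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(study_tokens, reference_tokens, top=10):
    """Words relatively more frequent in the study corpus, ranked by keyness."""
    study, ref = Counter(study_tokens), Counter(reference_tokens)
    c, d = len(study_tokens), len(reference_tokens)
    scored = [(w, log_likelihood(a, ref[w], c, d))
              for w, a in study.items() if a / c > ref[w] / d]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:top]
```

Only positively key words (relatively more frequent in the study corpus) are kept here. A list like this is the kind of output that the study's human analysts then cross-checked against the LDA topics.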

Keyword Extraction in Economics Literatures using Natural Language Processing

10.1109/icufn49451.2021.9528546 ◽

2021 ◽

Author(s):

Soojeong Kim ◽

Sunho Choi ◽

Junhee Seok

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Keyword Extraction
