LexRank: Graph-based Lexical Centrality as Salience in Text Summarization

Journal of Artificial Intelligence Research ◽

10.1613/jair.1523 ◽

2004 ◽

Vol 22 ◽

pp. 457-479 ◽

Cited By ~ 753

Author(s):

G. Erkan ◽

D. R. Radev

Keyword(s):

Language Processing ◽

Text Summarization ◽

Graph Representation ◽

Connectivity Matrix ◽

Relative Importance ◽

Eigenvector Centrality ◽

Threshold Method ◽

Data Set ◽

New Approach ◽

Stochastic Graph

We introduce a stochastic graph-based method for computing relative importance of textual units for Natural Language Processing. We test the technique on the problem of Text Summarization (TS). Extractive TS relies on the concept of sentence salience to identify the most important sentences in a document or set of documents. Salience is typically defined in terms of the presence of particular important words or in terms of similarity to a centroid pseudo-sentence. We consider a new approach, LexRank, for computing sentence importance based on the concept of eigenvector centrality in a graph representation of sentences. In this model, a connectivity matrix based on intra-sentence cosine similarity is used as the adjacency matrix of the graph representation of sentences. Our system, based on LexRank ranked in first place in more than one task in the recent DUC 2004 evaluation. In this paper we present a detailed analysis of our approach and apply it to a larger data set including data from earlier DUC evaluations. We discuss several methods to compute centrality using the similarity graph. The results show that degree-based methods (including LexRank) outperform both centroid-based methods and other systems participating in DUC in most of the cases. Furthermore, the LexRank with threshold method outperforms the other degree-based techniques including continuous LexRank. We also show that our approach is quite insensitive to the noise in the data that may result from an imperfect topical clustering of documents.

Download Full-text

Text analysis on health product reviews using r approach

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v18.i3.pp1303-1310 ◽

2020 ◽

Vol 18 (3) ◽

pp. 1303

Author(s):

Nasibah Husna Mohd Kadir ◽

Sharifah Aliman

Keyword(s):

Language Processing ◽

Text Summarization ◽

Unstructured Data ◽

Product Reviews ◽

Health Product ◽

Text Analytics ◽

Data Set ◽

Web Based ◽

The Social ◽

Key Techniques

In the social media, product reviews contain of text, emoticon, numbers and symbols that hard to identify the text summarization. Text analytics is one of the key techniques in exploring the unstructured data. The purpose of this study is solving the unstructured data by sort and summarizes the review data through a Web-Based Text Analytics using R approach. According to the comparative table between studies in Natural Language Processing (NLP) features, it was observed that Web-Based Text Analytics using R approach can analyze the unstructured data by using the data processing package in R. It combines all the NLP features in the menu part of the text analytics process in steps and it is labeled to make it easier for users to view all the text summarization. This study uses health product review from Shaklee as the data set. The proposed approach shows the acceptable performance in terms of system features execution compared with the baseline model system.

Download Full-text

Arabic text summarization using deep learning approach

Journal Of Big Data ◽

10.1186/s40537-020-00386-7 ◽

2020 ◽

Vol 7 (1) ◽

Author(s):

Molham Al-Maleh ◽

Said Desouki

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Language Processing ◽

Network Models ◽

Arabic Language ◽

Text Summarization ◽

Neural Network Models ◽

Data Set ◽

Learning Techniques ◽

Arabic Text Summarization

AbstractNatural language processing has witnessed remarkable progress with the advent of deep learning techniques. Text summarization, along other tasks like text translation and sentiment analysis, used deep neural network models to enhance results. The new methods of text summarization are subject to a sequence-to-sequence framework of encoder–decoder model, which is composed of neural networks trained jointly on both input and output. Deep neural networks take advantage of big datasets to improve their results. These networks are supported by the attention mechanism, which can deal with long texts more efficiently by identifying focus points in the text. They are also supported by the copy mechanism that allows the model to copy words from the source to the summary directly. In this research, we are re-implementing the basic summarization model that applies the sequence-to-sequence framework on the Arabic language, which has not witnessed the employment of this model in the text summarization before. Initially, we build an Arabic data set of summarized article headlines. This data set consists of approximately 300 thousand entries, each consisting of an article introduction and the headline corresponding to this introduction. We then apply baseline summarization models to the previous data set and compare the results using the ROUGE scale.

Download Full-text

Developing a new approach to summarize Arabic text automatically using syntactic and semantic analysis

International Journal of Engineering & Technology ◽

10.14419/ijet.v9i2.30324 ◽

2020 ◽

Vol 9 (2) ◽

pp. 342

Author(s):

Amal Alkhudari

Keyword(s):

Language Processing ◽

Automatic System ◽

Semantic Analysis ◽

Text Summarization ◽

Original Text ◽

Arabic Text ◽

Wide Spread ◽

New Approach ◽

Automatic Text Summarization ◽

Automatic Text

Due to the wide spread information and the diversity of its sources, there is a need to produce an accurate text summary with the least time and effort. This summary must preserve key information content and overall meaning of the original text. Text summarization is one of the most important applications of Natural Language Processing (NLP). The goal of automatic text summarization is to create summaries that are similar to human-created ones. However, in many cases, the readability of created summaries is not satisfactory, because the summaries do not consider the meaning of the words and do not cover all the semantically relevant aspects of data. In this paper we use syntactic and semantic analysis to propose an automatic system of Arabic texts summarization. This system is capable of understanding the meaning of information and retrieves only the relevant part. The effectiveness and evaluation of the proposed work are demonstrated under EASC corpus using Rouge measure. The generated summaries will be compared against those done by human and precedent researches.

Download Full-text

A Review and evaluation of Machine Translation methods for Lumasaaba

Journal of Digital Science ◽

10.33847/2686-8296.2.1_1 ◽

2020 ◽

pp. 3-17

Author(s):

Peter Nabende

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Machine Translation ◽

Language Processing ◽

Research Area ◽

Data Driven ◽

East African ◽

Data Set ◽

African Languages ◽

Translation Methods

Natural Language Processing for under-resourced languages is now a mainstream research area. However, there are limited studies on Natural Language Processing applications for many indigenous East African languages. As a contribution to covering the current gap of knowledge, this paper focuses on evaluating the application of well-established machine translation methods for one heavily under-resourced indigenous East African language called Lumasaaba. Specifically, we review the most common machine translation methods in the context of Lumasaaba including both rule-based and data-driven methods. Then we apply a state of the art data-driven machine translation method to learn models for automating translation between Lumasaaba and English using a very limited data set of parallel sentences. Automatic evaluation results show that a transformer-based Neural Machine Translation model architecture leads to consistently better BLEU scores than the recurrent neural network-based models. Moreover, the automatically generated translations can be comprehended to a reasonable extent and are usually associated with the source language input.

Download Full-text

A New Approach for the Analysis of Mixture Toxicity Data

Water Science & Technology ◽

10.2166/wst.1992.0733 ◽

1992 ◽

Vol 26 (9-11) ◽

pp. 2345-2348 ◽

Cited By ~ 2

Author(s):

C. N. Haas

Keyword(s):

Quantitative Analysis ◽

Mixture Toxicity ◽

New Method ◽

Metal Exposure ◽

Positive Interactions ◽

Toxicity Data ◽

Data Set ◽

New Approach ◽

Negative Interactions

A new method for the quantitative analysis of multiple toxicity data is described and illustrated using a data set on metal exposure to copepods. Positive interactions are observed for Ni-Pb and Pb-Cr, with weak negative interactions observed for Ni-Cr.

Download Full-text

Designing a Chat-Bot for College Information using Information Retrieval and Automatic Text Summarization Techniques

Current Chinese Computer Science ◽

10.2174/2665997201999201022191540 ◽

2020 ◽

Vol 01 ◽

Author(s):

Radha Guha

Keyword(s):

Information Retrieval ◽

Language Processing ◽

Latent Dirichlet Allocation ◽

Semantic Analysis ◽

Text Summarization ◽

The Internet ◽

Specific Domain ◽

User Query ◽

College Information ◽

Chat Bot

Background:: In the era of information overload it is very difficult for a human reader to make sense of the vast information available in the internet quickly. Even for a specific domain like college or university website it may be difficult for a user to browse through all the links to get the relevant answers quickly. Objective:: In this scenario, design of a chat-bot which can answer questions related to college information and compare between colleges will be very useful and novel. Methods:: In this paper a novel conversational interface chat-bot application with information retrieval and text summariza-tion skill is designed and implemented. Firstly this chat-bot has a simple dialog skill when it can understand the user query intent, it responds from the stored collection of answers. Secondly for unknown queries, this chat-bot can search the internet and then perform text summarization using advanced techniques of natural language processing (NLP) and text mining (TM). Results:: The advancement of NLP capability of information retrieval and text summarization using machine learning tech-niques of Latent Semantic Analysis(LSI), Latent Dirichlet Allocation (LDA), Word2Vec, Global Vector (GloVe) and Tex-tRank are reviewed and compared in this paper first before implementing them for the chat-bot design. This chat-bot im-proves user experience tremendously by getting answers to specific queries concisely which takes less time than to read the entire document. Students, parents and faculty can get the answers for variety of information like admission criteria, fees, course offerings, notice board, attendance, grades, placements, faculty profile, research papers and patents etc. more effi-ciently. Conclusion:: The purpose of this paper was to follow the advancement in NLP technologies and implement them in a novel application.

Download Full-text

Natural Language Processing (NLP) based Text Summarization - A Survey

2021 6th International Conference on Inventive Computation Technologies (ICICT) ◽

10.1109/icict50816.2021.9358703 ◽

2021 ◽

Author(s):

Ishitva Awasthi ◽

Kuntal Gupta ◽

Prabjot Singh Bhogal ◽

Sahejpreet Singh Anand ◽

Piyush Kumar Soni

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Text Summarization

Download Full-text

Offensive keyword extraction based on the attention mechanism of BERT and the eigenvector centrality using a graph representation

Personal and Ubiquitous Computing ◽

10.1007/s00779-021-01605-5 ◽

2021 ◽

Author(s):

Gretel Liz De la Peña Sarracén ◽

Paolo Rosso

Keyword(s):

Language Model ◽

Hybrid Approach ◽

Attention Mechanism ◽

Graph Representation ◽

Keyword Extraction ◽

Eigenvector Centrality ◽

Model Learning ◽

Harmful Content ◽

The Impact ◽

Offensive Language

AbstractThe proliferation of harmful content on social media affects a large part of the user community. Therefore, several approaches have emerged to control this phenomenon automatically. However, this is still a quite challenging task. In this paper, we explore the offensive language as a particular case of harmful content and focus our study in the analysis of keywords in available datasets composed of offensive tweets. Thus, we aim to identify relevant words in those datasets and analyze how they can affect model learning. For keyword extraction, we propose an unsupervised hybrid approach which combines the multi-head self-attention of BERT and a reasoning on a word graph. The attention mechanism allows to capture relationships among words in a context, while a language model is learned. Then, the relationships are used to generate a graph from what we identify the most relevant words by using the eigenvector centrality. Experiments were performed by means of two mechanisms. On the one hand, we used an information retrieval system to evaluate the impact of the keywords in recovering offensive tweets from a dataset. On the other hand, we evaluated a keyword-based model for offensive language detection. Results highlight some points to consider when training models with available datasets.

Download Full-text

Research of Personalized Recommendation Technology Based on Knowledge Graphs

Applied Sciences ◽

10.3390/app11157104 ◽

2021 ◽

Vol 11 (15) ◽

pp. 7104

Author(s):

Xu Yang ◽

Ziyi Huan ◽

Yisong Zhai ◽

Ting Lin

Keyword(s):

Neural Network ◽

Hot Spot ◽

Experimental Results ◽

Graph Representation ◽

Personalized Recommendation ◽

Knowledge Graph ◽

Learning Technology ◽

Data Set ◽

Model Based ◽

Knowledge Graphs

Nowadays, personalized recommendation based on knowledge graphs has become a hot spot for researchers due to its good recommendation effect. In this paper, we researched personalized recommendation based on knowledge graphs. First of all, we study the knowledge graphs’ construction method and complete the construction of the movie knowledge graphs. Furthermore, we use Neo4j graph database to store the movie data and vividly display it. Then, the classical translation model TransE algorithm in knowledge graph representation learning technology is studied in this paper, and we improved the algorithm through a cross-training method by using the information of the neighboring feature structures of the entities in the knowledge graph. Furthermore, the negative sampling process of TransE algorithm is improved. The experimental results show that the improved TransE model can more accurately vectorize entities and relations. Finally, this paper constructs a recommendation model by combining knowledge graphs with ranking learning and neural network. We propose the Bayesian personalized recommendation model based on knowledge graphs (KG-BPR) and the neural network recommendation model based on knowledge graphs(KG-NN). The semantic information of entities and relations in knowledge graphs is embedded into vector space by using improved TransE method, and we compare the results. The item entity vectors containing external knowledge information are integrated into the BPR model and neural network, respectively, which make up for the lack of knowledge information of the item itself. Finally, the experimental analysis is carried out on MovieLens-1M data set. The experimental results show that the two recommendation models proposed in this paper can effectively improve the accuracy, recall, F1 value and MAP value of recommendation.

Download Full-text

Identifying and intercepting health misinformation on Reddit dermatology forums with artificially intelligent bots using natural language processing (Preprint)

10.2196/preprints.20975 ◽

2021 ◽

Author(s):

Monique B. Sager ◽

Aditya M. Kashyap ◽

Mila Tamminga ◽

Sadhana Ravoori ◽

Christopher Callison-Burch ◽

...

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

The United States ◽

Test Accuracy ◽

Limited Data ◽

Test Environment ◽

Data Set ◽

Inappropriate Care ◽

Processing Techniques

BACKGROUND Reddit, the fifth most popular website in the United States, boasts a large and engaged user base on its dermatology forums where users crowdsource free medical opinions. Unfortunately, much of the advice provided is unvalidated and could lead to inappropriate care. Initial testing has shown that artificially intelligent bots can detect misinformation on Reddit forums and may be able to produce responses to posts containing misinformation. OBJECTIVE To analyze the ability of bots to find and respond to health misinformation on Reddit’s dermatology forums in a controlled test environment. METHODS Using natural language processing techniques, we trained bots to target misinformation using relevant keywords and to post pre-fabricated responses. By evaluating different model architectures across a held-out test set, we compared performances. RESULTS Our models yielded data test accuracies ranging from 95%-100%, with a BERT fine-tuned model resulting in the highest level of test accuracy. Bots were then able to post corrective pre-fabricated responses to misinformation. CONCLUSIONS Using a limited data set, bots had near-perfect ability to detect these examples of health misinformation within Reddit dermatology forums. Given that these bots can then post pre-fabricated responses, this technique may allow for interception of misinformation. Providing correct information, even instantly, however, does not mean users will be receptive or find such interventions persuasive. Further work should investigate this strategy’s effectiveness to inform future deployment of bots as a technique in combating health misinformation. CLINICALTRIAL N/A

Download Full-text