Word synonym relationships for text analysis: A graph-based approach

Keyword extraction is the process of detecting the most relevant terms and expressions in a given text in a timely manner. In the era of information explosion, keyword extraction has attracted increasing attention. Its importance in text summarization, text comparison, and document categorization has led to an emphasis on graph-based keyword extraction techniques, because they can capture more structural information than other classic text analysis methods. In this paper, we propose a simple unsupervised text mining approach that extracts a set of keywords from a given text and analyzes its topic diversity using graph analysis tools. First, the text is represented as a directed graph using synonym relationships. Then, community detection and other measures are used to identify keywords in the text. The set of extracted keywords is used to assess topic diversity within the text and to analyze its sentiment. The proposed approach relies on grouping semantically similar candidate words, which ensures that the set of extracted keywords is comprehensive. Unlike other graph-based keyword extraction approaches, the proposed method does not require user parameters during graph construction and word scoring. The proposed approach achieved significant results compared to other keyword extraction techniques.
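
The pipeline described in this abstract (synonym graph, grouping of similar words, one keyword per group) can be illustrated with a minimal Python sketch. It is not the authors' method: the `SYNONYMS` table is a hypothetical toy thesaurus, the graph is treated as undirected for simplicity although the abstract describes a directed graph, connected components stand in for a real community detection algorithm, and node degree stands in for the unspecified word-scoring measures.

```python
# Hypothetical toy synonym table (an assumption for illustration only).
SYNONYMS = {
    "car": {"automobile", "vehicle"},
    "automobile": {"car"},
    "vehicle": {"car"},
    "happy": {"glad"},
    "glad": {"happy"},
}


def build_graph(words, synonyms):
    """Link two candidate words when either lists the other as a synonym."""
    vocab = set(words)
    graph = {w: set() for w in vocab}
    for w in vocab:
        for s in synonyms.get(w, ()):
            if s in vocab and s != w:
                graph[w].add(s)
                graph[s].add(w)
    return graph


def components(graph):
    """Group words into connected components (a crude stand-in for communities)."""
    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, group = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            group.append(node)
            for nxt in graph[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append(sorted(group))
    return groups


def keywords(words, synonyms):
    """Pick one word per group: highest degree, ties broken alphabetically."""
    graph = build_graph(words, synonyms)
    return [min(g, key=lambda w: (-len(graph[w]), w)) for g in components(graph)]


text = "the car is a vehicle and an automobile makes me happy and glad"
candidates = [w for w in text.split() if w in SYNONYMS]
print(keywords(candidates, SYNONYMS))  # ['car', 'glad']
```

In this sketch the number of groups gives a rough indication of topic diversity: two groups here, one about vehicles and one about mood.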

A comparative evaluation of different keyword extraction techniques

International Journal of Information Retrieval Research ◽

10.4018/ijirr.289573 ◽

2022 ◽

Vol 12 (1) ◽

pp. 0-0

Keyword(s):

High Frequency ◽

Extraction Methods ◽

Text Summarization ◽

Keyword Extraction ◽

Extraction Techniques ◽

Scientific Texts ◽

Inverse Document Frequency ◽

Document Frequency ◽

Long Time ◽

Document Categorization

Retrieving keywords from a text has attracted researchers for a long time, as it forms the basis for many natural language applications such as information retrieval, text summarization, and document categorization. A text is a collection of words that naturally represents its theme, and bringing that naturalness under fixed rules is itself a challenging task. In the present paper, the authors evaluate different spatial-distribution-based keyword extraction methods available in the literature on three standard scientific texts. To reduce complexity, the authors evaluate only the first few high-frequency words, since all the methods are based on frequency in some way. The authors find that the methods do not provide good results, particularly for the first few retrieved words. Thus, the authors propose a new measure based on frequency, inverse document frequency, variance, and Tsallis entropy. The methods are evaluated on the basis of precision, recall, and F-measure. Results show that the proposed method provides improved results.
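
Two of the ingredients named in this abstract have standard definitions that can be sketched in Python: the Tsallis entropy, S_q = (1 - Σ p_i^q) / (q - 1), and precision, recall, and F-measure. The sketch below is not the authors' proposed measure, since the abstract does not say how frequency, inverse document frequency, variance, and entropy are combined. The helper `segment_distribution`, the choice q = 2, and the equal-sized segments are assumptions made for illustration.

```python
def tsallis_entropy(probs, q=2.0):
    """Tsallis entropy S_q = (1 - sum(p_i ** q)) / (q - 1), defined for q != 1."""
    if q == 1.0:
        raise ValueError("q = 1 is the Shannon limit; choose q != 1")
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)


def segment_distribution(tokens, word, n_segments):
    """Share of a word's occurrences falling in each equal-sized text segment.

    Hypothetical helper: one simple way to describe a word's spatial spread.
    """
    counts = [0] * n_segments
    for i, tok in enumerate(tokens):
        if tok == word:
            counts[i * n_segments // len(tokens)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def precision_recall_f(retrieved, relevant):
    """Precision, recall, and F-measure of retrieved keywords against a gold set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    p = hits / len(retrieved) if retrieved else 0.0
    r = hits / len(relevant) if relevant else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


tokens = "a b c a".split()
spread = segment_distribution(tokens, "a", n_segments=2)  # [0.5, 0.5]
print(tsallis_entropy(spread))                            # 0.5
print(precision_recall_f(["a", "b", "c", "d"], ["a", "b", "e"]))
```

A word concentrated in one segment yields an entropy of 0 under this definition, while an evenly spread word yields a higher value.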

Semantic Analysis and Text Summarization in Socio-Technical Systems

Socio-Technical Decision Support in Air Navigation Systems - Advances in Mechatronics and Mechanical Engineering ◽

10.4018/978-1-5225-3108-1.ch008 ◽

2017 ◽

pp. 243-281 ◽

Cited By ~ 2

Author(s):

Nina Rizun

Keyword(s):

Decision Making ◽

Text Mining ◽

Semantic Analysis ◽

Text Summarization ◽

Technical System ◽

Synergy Effect ◽

Decision Making Processes ◽

Multi Level ◽

Technical Systems

In this chapter, the authors present the results of developing a text-mining methodology for increasing the reliability of the functioning of a Socio-technical System (STS). Taking into account the revealed strengths and weaknesses of the Discriminant and Probabilistic approaches to Latent Semantic Relations analysis in the context of abstracting and summarization, the Methodology of Two-level Single Document Summarization was developed. The Methodology has two novel elements: it is based on obtaining a multi-level topical framework of the document (abstracting), and it uses the synergy effect of consistently combining two approaches to identify conceptually significant elements of the text (summarization). Examples demonstrating the basic workability of the proposed Methodology are presented. Such approaches should help people improve the quality of support for the decision-making processes of STS in real time.

SPCCTDM, a Catalogue for Analysis of Therapeutic Drug Monitoring Related Contents

Computational Knowledge Discovery for Bioinformatics Research ◽

10.4018/978-1-4666-1785-8.ch018 ◽

2013 ◽

pp. 319-328

Author(s):

Sven Ulrich ◽

Pierre Baumann ◽

Andreas Conca ◽

Hans-Joachim Kuss ◽

Viktoria Stieffenhofer ◽

...

Keyword(s):

Drug Therapy ◽

Therapeutic Drug Monitoring ◽

Text Mining ◽

Drug Monitoring ◽

Text Analysis ◽

Scientific Evidence ◽

Plasma Concentrations ◽

Therapeutic Drug ◽

Drug Reactions ◽

First Time

Therapeutic drug monitoring (TDM) has consistently been shown to be useful for optimization of drug therapy. For the first time, a method has been developed for the text analysis of TDM in summaries of product characteristics (SPCs): the catalogue SPC-ContentTDM (SPCCTDM) provides a codification of the content of TDM in SPCs. It consists of six structure-related items (dose, adverse drug reactions, drug interactions, overdose, pregnancy/breast feeding, and pharmacokinetics) according to implicit or explicit references to TDM in paragraphs of the SPC, and four theory-guided items according to the information about ranges of plasma concentrations and a recommendation of TDM in the SPC. The catalogue is regarded as valid for the text analysis of SPCs with respect to TDM. It can be used to compare SPCs with one another, to compare them with medico-scientific evidence, and to estimate how readers perceive TDM in SPCs. Regarded as a model of text mining, the approach may be extended to evaluate other aspects reported in SPCs.

Text Mining

Handbook of Research on Public Information Technology ◽

10.4018/978-1-59904-857-4.ch054 ◽

2008 ◽

pp. 592-603 ◽

Cited By ~ 2

Author(s):

Antonina Durfee

Keyword(s):

Text Mining ◽

Deception Detection ◽

Text Summarization ◽

Authorship Attribution ◽

Venture Capitalists ◽

Help Desk ◽

News Agencies ◽

Textual Databases ◽

Available Information ◽

Mining Tools

Massive quantities of information continue to accumulate at about 1.5 billion gigabytes per year in numerous repositories held at news agencies, at libraries, on corporate intranets, on personal computers, and on the Web. A large portion of all available information exists in the form of text. Researchers, analysts, editors, venture capitalists, lawyers, help desk specialists, and even students face text analysis challenges. Text mining tools aim to discover knowledge from textual databases by isolating key bits of information from large amounts of text and identifying relationships among documents. Text mining technology is used for plagiarism detection and authorship attribution, text summarization and retrieval, and deception detection.

Automatic Keyword Extraction From Text Documents

Digital Technology Advancements in Knowledge Management - Advances in Knowledge Acquisition, Transfer, and Management ◽

10.4018/978-1-7998-6792-0.ch004 ◽

2021 ◽

pp. 71-91

Author(s):

Furkan Goz ◽

Alev Mutlu

Keyword(s):

Information Retrieval ◽

State Of The Art ◽

Online News ◽

Evaluation Metrics ◽

Keyword Extraction ◽

Feature Engineering ◽

Extraction Techniques ◽

Text Documents ◽

Scientific Papers ◽

Benchmark Datasets

Keyword indexing is the problem of assigning keywords to text documents. It is an important task, as keywords play crucial roles in several information retrieval tasks. The problem is also challenging because the number of text documents is increasing and such documents come in different forms (e.g., scientific papers, online news articles, and microblog posts). This chapter provides an overview of keyword indexing and elaborates on keyword extraction techniques. The authors present the general motivations behind supervised and unsupervised keyword extraction and enumerate several pioneering and state-of-the-art techniques. Feature engineering, evaluation metrics, and benchmark datasets used to evaluate the performance of keyword extraction systems are also discussed.

A Survey on Sentiment Analysis Techniques for Twitter

10.4018/978-1-7998-8413-2.ch003 ◽

2022 ◽

pp. 57-90

Author(s):

Surabhi Verma ◽

Ankit Kumar Jain

Keyword(s):

Social Media ◽

Text Mining ◽

Sentiment Analysis ◽

Text Analysis ◽

Analysis Techniques ◽

Goods And Services ◽

Text Document ◽

The Subject ◽

Over Time

People regularly use social media to express their opinions about a wide variety of topics, goods, and services, which makes it a rich source for text mining and sentiment analysis. Sentiment analysis is a form of text analysis that determines the polarity (positive, negative, or neutral) of a text, document, paragraph, or clause. This chapter offers an overview of the subject by examining and briefly explaining the algorithms proposed for sentiment analysis on Twitter. In addition, the authors address related fields such as monitoring sentiments over time, regional differences in views, neutral tweet analysis, sarcasm detection, and various other tasks in this area that have drawn researchers' attention. Within this chapter, all the services used are briefly summarized. The key contribution of this survey is a taxonomy based on the methods suggested, together with a discussion of recent research developments in the subject and related fields.
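
The three-way polarity decision described in this abstract can be illustrated with a minimal lexicon-based sketch in Python. It is only one simple family of techniques and is not drawn from the chapter. The word lists `POSITIVE` and `NEGATIVE` are tiny hypothetical lexicons, and the sketch deliberately ignores negation and sarcasm, which the abstract names as separate tasks.

```python
# Hypothetical miniature lexicons (assumptions for illustration only).
POSITIVE = {"good", "great", "love", "happy"}
NEGATIVE = {"bad", "awful", "hate", "sad"}


def polarity(text):
    """Classify text as positive, negative, or neutral by counting lexicon hits."""
    tokens = [t.strip(".,!?;:").lower() for t in text.split()]
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(polarity("I love this phone, great battery!"))  # positive
print(polarity("Awful service, I hate it."))          # negative
print(polarity("The bus arrived at noon."))           # neutral
```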

Uncovering community structure in networks via hybrid clustering using cascading failure dynamics and topological metric functions

International Journal of Modern Physics B ◽

10.1142/s0217979219503521 ◽

2019 ◽

Vol 33 (29) ◽

pp. 1950352

Author(s):

Bo Yang ◽

Tao Huang ◽

Xu Li

Keyword(s):

Community Structure ◽

Community Detection ◽

Nearest Neighbor ◽

Structural Information ◽

Point Of View ◽

Cascading Failures ◽

Hybrid Clustering ◽

Mesoscale Structure ◽

Metric Functions ◽

Global And Local

Many networks have community structure: groups of nodes within which connections are dense but between which they are sparser. While there exists a range of algorithms for community detection in networks, most of them try to discover this important mesoscale structure solely from a topological point of view. Here we develop a hybrid clustering approach for uncovering the community structure in a network using a combination of information on the local topology of the network and on the dynamics of cascading failures. The originality of the proposed approach is that we introduce a novel fusion of the dynamic behaviors of the cascading failures and topological metric functions in the [Formula: see text]th-nearest neighbor density scheme, which integrates both the global and local structural information of a given network for community detection. The experimental results on both artificial random and real-world benchmark networks indicate the effectiveness and reliability of our approach.
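
The notion of "dense within, sparse between" is commonly quantified by Newman's modularity, Q = Σ_c [L_c / m - (d_c / 2m)²], where m is the number of edges, L_c the number of edges inside community c, and d_c the total degree of its nodes. The Python sketch below computes Q for a given partition. It is a purely topological quality score and not the hybrid cascading-failure method of this paper. It assumes an undirected, unweighted graph with at least one edge and no self-loops or duplicate edges, and the example graph is made up for illustration.

```python
def modularity(edges, community_of):
    """Newman modularity of a partition of an undirected, unweighted graph."""
    m = len(edges)
    degree, internal = {}, {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
        if community_of[u] == community_of[v]:
            c = community_of[u]
            internal[c] = internal.get(c, 0) + 1  # edge lies inside community c
    total_degree = {}
    for node, d in degree.items():
        c = community_of[node]
        total_degree[c] = total_degree.get(c, 0) + d
    return sum(
        internal.get(c, 0) / m - (total_degree[c] / (2 * m)) ** 2
        for c in total_degree
    )


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
merged = {n: "A" for n in range(6)}

print(modularity(edges, split))   # about 0.357 (5/14)
print(modularity(edges, merged))  # 0.0
```

The split into two triangles scores higher than the single merged community, which matches the intuition that the bridge separates two dense groups.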
