Application of Community Detection Technique in Text Mining

Author(s):  
Shashank Dubey ◽  
Abhishek Tiwari ◽  
Jitendra Agrawal

Detecting and identifying traffic signs is a complicated problem because of variability in cloud conditions. Hence, it is necessary to detect and identify traffic signs during a journey. Traffic sign text identification fails because of noise, blur, distortion, and occlusion. To identify the text, a technique should be adopted that recognizes it with improved accuracy. Existing algorithms such as Histogram of Oriented Gradients (HOG) and Support Vector Machine (SVM) did not detect the centroid position. In this paper, the centroid position of the sign text is detected using text color, font, and size. If the text is blurred during a journey, the Traffic Sign Detection Technique based on Centroid Position Identification (TSD-CPI) can be applied, using the K-means algorithm for clustering. As a result, it detects the text with improved accuracy and reduces the processing time. Experimental results obtained with WEKA-3.8 show that the proposed technique improves on the existing algorithms in terms of precision and recall, which enhances accuracy in text mining.
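The abstract names K-means as the clustering step but gives no implementation details. As a rough illustration only, the following is a minimal sketch of plain K-means on 2-D points. The points stand in for hypothetical text-region centroid positions, and the function name `kmeans`, the explicit initial centers, and the sample data are all assumptions made for this example, not part of TSD-CPI as published.

```python
def kmeans(points, centers, max_iter=100):
    """Plain K-means (Lloyd's algorithm) with caller-supplied initial centers."""
    centers = [tuple(map(float, c)) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to the nearest center
        # (squared Euclidean distance; ties go to the lower index).
        labels = [
            min(
                range(len(centers)),
                key=lambda j: sum((p - q) ** 2 for p, q in zip(pt, centers[j])),
            )
            for pt in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centers.append(old)  # keep an empty cluster's center in place
        if new_centers == centers:
            break  # converged: centers did not move
        centers = new_centers
    return labels, centers


# Hypothetical centroid positions of detected text regions (x, y).
points = [(0, 0), (0, 2), (10, 0), (10, 2)]
labels, centers = kmeans(points, centers=[(0, 0), (10, 0)])
```

The initial centers are passed in explicitly so the run is deterministic. K-means is sensitive to initialization, so a different starting choice can converge to a different grouping.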


PLoS ONE ◽  
2021 ◽  
Vol 16 (7) ◽  
pp. e0255127
Author(s):  
Hend Alrasheed

Keyword extraction refers to the process of detecting the most relevant terms and expressions in a given text in a timely manner. In the information explosion era, keyword extraction has attracted increasing attention. The importance of keyword extraction in text summarization, text comparisons, and document categorization has led to an emphasis on graph-based keyword extraction techniques because they can capture more structural information compared to other classic text analysis methods. In this paper, we propose a simple unsupervised text mining approach that aims to extract a set of keywords from a given text and analyze its topic diversity using graph analysis tools. Initially, the text is represented as a directed graph using synonym relationships. Then, community detection and other measures are used to identify keywords in the text. The set of extracted keywords is used to assess topic diversity within the text and analyze its sentiment. The proposed approach relies on grouping semantically similar candidate words. This approach ensures that the set of extracted keywords is comprehensive. Differing from other graph-based keyword extraction approaches, the proposed method does not require user parameters during graph construction and word scoring. The proposed approach achieved significant results compared to other keyword extraction techniques.
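The pipeline described above (word graph from synonym relationships, grouping of similar words, one keyword per group) can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the paper's method. The graph here is undirected rather than directed, connected components are used as a crude stand-in for a real community detection algorithm, and the scoring rule (highest degree within a group) is hypothetical. The function names and the synonym pairs are invented for the example.

```python
from collections import defaultdict


def build_graph(synonym_pairs):
    """Build an undirected adjacency map from (word, word) synonym pairs."""
    graph = defaultdict(set)
    for a, b in synonym_pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def connected_groups(graph):
    """Group words by connected component (a stand-in for community detection)."""
    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, group = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            group.append(node)
            for neighbor in sorted(graph[node]):
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        groups.append(sorted(group))
    return groups


def pick_keywords(graph, groups):
    """Pick the highest-degree word per group; ties go to the alphabetically first."""
    return [min(group, key=lambda w: (-len(graph[w]), w)) for group in groups]


# Hypothetical synonym relationships between candidate words.
pairs = [
    ("car", "automobile"),
    ("car", "vehicle"),
    ("happy", "glad"),
    ("happy", "joyful"),
    ("glad", "joyful"),
]
graph = build_graph(pairs)
groups = connected_groups(graph)
keywords = pick_keywords(graph, groups)
print(groups)
print(keywords)
```

Under these assumptions, the number of groups gives a rough proxy for topic diversity: a text whose candidate words fall into many separate groups covers more distinct topics than one whose words collapse into a single group.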


2017 ◽  
Vol 2017 ◽  
pp. 1-11 ◽  
Author(s):  
Huu Hiep Nguyen

With the advent of the k-modes algorithm, the toolbox for clustering categorical data gained an efficient tool that scales linearly in the number of data items. However, random initialization of cluster centers in k-modes makes it hard to reach a good clustering without resorting to many trials. Recently proposed methods for better initialization are deterministic and reduce the clustering cost considerably. These initialization methods differ in how their heuristics choose the set of initial centers. In this paper, we address the clustering problem for categorical data from the perspective of community detection. Instead of initializing k modes and running several iterations, our scheme, CD-Clustering, builds an unweighted graph and detects highly cohesive groups of nodes using a fast community detection technique. The top-k detected communities by size will define the k modes. Evaluation on ten real categorical datasets shows that our method outperforms the existing initialization methods for k-modes in terms of accuracy, precision, and recall in most of the cases.
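The final step, in which the k largest communities define the k modes, can be illustrated as follows. This is a minimal sketch, not the CD-Clustering implementation. It assumes the communities are already available as lists of record indices from some community detection step, and it omits graph construction entirely, since the abstract does not specify it. The function names, the tie-breaking rules, and the sample records are assumptions made for the example.

```python
from collections import Counter


def mode_of(records):
    """Per-attribute most frequent value; ties go to the smallest value."""
    modes = []
    for column in zip(*records):
        counts = Counter(column)
        modes.append(min(counts, key=lambda v: (-counts[v], v)))
    return tuple(modes)


def top_k_modes(data, communities, k):
    """Take the k largest communities and compute one mode from each."""
    largest = sorted(communities, key=len, reverse=True)[:k]
    return [mode_of([data[i] for i in community]) for community in largest]


def hamming(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))


def assign(data, modes):
    """Assign each record to the nearest mode (ties go to the lower index)."""
    return [
        min(range(len(modes)), key=lambda j: hamming(row, modes[j]))
        for row in data
    ]


# Hypothetical categorical records and pre-detected communities (record indices).
data = [
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("red", "large", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "round"),
    ("green", "small", "round"),
]
communities = [[0, 1, 2], [3, 4], [5]]

modes = top_k_modes(data, communities, k=2)
labels = assign(data, modes)
print(modes)
print(labels)
```

Records outside the top-k communities, such as the singleton community here, are assigned to whichever mode is closest in Hamming distance, which is the usual dissimilarity measure in k-modes.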


2013 ◽  
Vol 46 ◽  
pp. 165-201 ◽  
Author(s):  
V. Qazvinian ◽  
D. R. Radev ◽  
S. M. Mohammad ◽  
B. Dorr ◽  
D. Zajic ◽  
...  

Researchers and scientists increasingly find themselves in the position of having to quickly understand large amounts of technical material. Our goal is to effectively serve this need by using bibliometric text mining and summarization techniques to generate summaries of scientific literature. We show how we can use citations to produce automatically generated, readily consumable, technical extractive summaries. We first propose C-LexRank, a model for summarizing single scientific articles based on citations, which employs community detection and extracts salient information-rich sentences. Next, we further extend our experiments to summarize a set of papers, which cover the same scientific topic. We generate extractive summaries of a set of Question Answering (QA) and Dependency Parsing (DP) papers, their abstracts, and their citation sentences and show that citations have unique information amenable to creating a summary.

