A Novel Text Clustering Approach Using Deep-Learning Vocabulary Network

Text clustering is an effective way to collect and organize text documents into meaningful groups for mining valuable information on the Internet. However, open issues remain, such as feature extraction and data dimension reduction. To address them, we present a novel approach named the deep-learning vocabulary network. The vocabulary network is constructed from the related-word set, which contains the “co-occurrence” relations of words or terms. We replace term frequency in feature vectors with the “importance” of words, computed from the vocabulary network with PageRank, which yields more precise feature vectors for representing the meaning of texts during clustering. Furthermore, a sparse-group deep belief network is proposed to reduce the dimensionality of the feature vectors, and we introduce a coverage rate as the similarity measure in Single-Pass clustering. To verify the effectiveness of our work, we compare the approach with representative algorithms, and experimental results show that feature vectors built with the deep-learning vocabulary network give better clustering performance.
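
The idea of weighting words by their PageRank score in a co-occurrence network, rather than by term frequency, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the document-level co-occurrence rule, the function names, the damping factor of 0.85, and the toy documents are all assumptions, and the sparse-group deep belief network and coverage-rate steps are not shown.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(documents):
    """Link every pair of words that appear in the same document (assumed rule)."""
    graph = defaultdict(set)
    for doc in documents:
        words = set(doc.lower().split())
        for a, b in combinations(sorted(words), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over an undirected graph."""
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {}
        for n in nodes:
            # Each neighbour m shares its rank equally among its own neighbours.
            incoming = sum(rank[m] / len(graph[m]) for m in graph[n])
            new_rank[n] = (1 - damping) / len(nodes) + damping * incoming
        rank = new_rank
    return rank


def feature_vector(doc, rank, vocabulary):
    """Use a word's PageRank score in place of its term frequency."""
    present = set(doc.lower().split())
    return [rank[w] if w in present else 0.0 for w in vocabulary]


# Toy documents, made up for illustration only.
docs = [
    "deep learning text clustering",
    "text clustering similarity measure",
    "deep learning feature extraction",
]
graph = cooccurrence_graph(docs)
rank = pagerank(graph)
vocabulary = sorted(graph)
vectors = [feature_vector(d, rank, vocabulary) for d in docs]
```

In this toy corpus, words that co-occur with many others (such as "text") receive a higher score than words with few neighbours (such as "similarity"), so they contribute more to the feature vector.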

Spatial Domain-Based Nonlinear Residual Feature Extraction for Identification of Image Operations

Applied Sciences ◽

10.3390/app10165582 ◽

2020 ◽

Vol 10 (16) ◽

pp. 5582

Author(s):

Xiaochen Yuan ◽

Tian Huang

Keyword(s):

Feature Extraction ◽

Deep Learning ◽

Extraction Method ◽

Spatial Domain ◽

Feature Extraction Method ◽

Novel Approach ◽

Image Postprocessing ◽

Learning Technique ◽

Residual Values

In this paper, a novel approach that uses a deep learning technique is proposed to detect and identify a variety of image operations. First, we propose the spatial domain-based nonlinear residual (SDNR) feature extraction method by constructing residual values from locally supported filters in the spatial domain. By applying minimum and maximum operators, diversity and nonlinearity are introduced; moreover, this construction brings nonsymmetry to the distribution of SDNR samples. Then, we propose applying a deep learning technique to the extracted SDNR features to detect and classify a variety of image operations. Many experiments have been conducted to verify the performance of the proposed approach, and the results indicate that the proposed method performs well in detecting and identifying the various common image postprocessing operations. Furthermore, comparisons between the proposed approach and the existing methods show the superiority of the proposed approach.
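
To make the role of the minimum and maximum operators concrete, here is a small sketch of one way nonlinear residuals could be built from locally supported filters. The abstract does not specify the filters, so the four first-order neighbour differences used here, the function names, and the toy image are assumptions rather than the SDNR definition itself.

```python
def directional_residuals(image, i, j):
    """First-order differences between a pixel and its four neighbours."""
    center = image[i][j]
    return [
        image[i][j + 1] - center,  # right
        image[i][j - 1] - center,  # left
        image[i + 1][j] - center,  # down
        image[i - 1][j] - center,  # up
    ]


def minmax_residual_maps(image):
    """Return (min_map, max_map) over the interior pixels of a 2-D image."""
    rows, cols = len(image), len(image[0])
    min_map, max_map = [], []
    for i in range(1, rows - 1):
        min_row, max_row = [], []
        for j in range(1, cols - 1):
            residuals = directional_residuals(image, i, j)
            # Taking min and max across directions makes the residual
            # nonlinear in the pixel values.
            min_row.append(min(residuals))
            max_row.append(max(residuals))
        min_map.append(min_row)
        max_map.append(max_row)
    return min_map, max_map


# Toy 4x4 grayscale patch, made up for illustration only.
image = [
    [10, 10, 10, 10],
    [10, 12, 15, 10],
    [10, 11, 20, 10],
    [10, 10, 10, 10],
]
min_map, max_map = minmax_residual_maps(image)
```

The minimum map is never positive-biased and the maximum map never negative-biased relative to the individual linear residuals, which is one way to see why such a construction produces an asymmetric sample distribution.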

A frequent term based text clustering approach using novel similarity measure

2014 IEEE International Advance Computing Conference (IACC) ◽

10.1109/iadcc.2014.6779374 ◽

2014 ◽

Cited By ~ 7

Author(s):

G.Suresh Reddy ◽

T.V. Rajinikanth ◽

A.Ananda Rao

Keyword(s):

Similarity Measure ◽

Text Clustering ◽

Clustering Approach

Spectral–Spatial Feature Extraction for Hyperspectral Image Classification: A Dimension Reduction and Deep Learning Approach

IEEE Transactions on Geoscience and Remote Sensing ◽

10.1109/tgrs.2016.2543748 ◽

2016 ◽

Vol 54 (8) ◽

pp. 4544-4554 ◽

Cited By ~ 404

Author(s):

Wenzhi Zhao ◽

Shihong Du

Keyword(s):

Feature Extraction ◽

Deep Learning ◽

Dimension Reduction ◽

Image Classification ◽

Hyperspectral Image ◽

Learning Approach ◽

Hyperspectral Image Classification ◽

Spatial Feature

Extensibility of U-Net Neural Network Model for Hydrographic Feature Extraction and Implications for Hydrologic Modeling

Remote Sensing ◽

10.3390/rs13122368 ◽

2021 ◽

Vol 13 (12) ◽

pp. 2368

Author(s):

Lawrence V. Stanislawski ◽

Ethan J. Shavers ◽

Shaowen Wang ◽

Zhe Jiang ◽

E. Lynn Usery ◽

...

Keyword(s):

Feature Extraction ◽

Deep Learning ◽

Surface Water ◽

High Performance ◽

Land Development ◽

Training Data ◽

Topographic Map ◽

Feature Maps ◽

Flow Routing ◽

Novel Approach

Accurate maps of regional surface water features are integral to advancing ecologic, atmospheric and land development studies. The only comprehensive surface water feature map of Alaska is the National Hydrography Dataset (NHD). NHD features are often digitized representations of historic topographic map blue lines and may be outdated. Here we test deep learning methods to automatically extract surface water features from airborne interferometric synthetic aperture radar (IfSAR) data to update and validate Alaska hydrographic databases. U-net artificial neural networks (ANN) and high-performance computing (HPC) are used for supervised hydrographic feature extraction within a study area comprising 50 contiguous watersheds in Alaska. Surface water features derived from elevation through automated flow-routing and manual editing are used as training data. Model extensibility is tested with a series of 16 U-net models trained with increasing percentages of the study area, from about 3 to 35 percent. Hydrography is predicted by each of the models for all watersheds not used in training. Input raster layers are derived from digital terrain models, digital surface models, and intensity images from the IfSAR data. Results indicate about 15 percent of the study area is required to optimally train the ANN to extract hydrography when F1-scores for tested watersheds average between 66 and 68. Little benefit is gained by training beyond 15 percent of the study area. Fully connected hydrographic networks are generated for the U-net predictions using a novel approach that constrains a D-8 flow-routing approach to follow U-net predictions. This work demonstrates the ability of deep learning to derive surface water feature maps from complex terrain over a broad area.
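
For readers unfamiliar with D-8 flow routing, the sketch below shows the standard unconstrained rule: each cell drains to whichever of its eight neighbours gives the steepest downhill slope. The abstract does not describe how the routing is constrained to follow U-net predictions, so that step is not reproduced; the function name, the unit cell size, and the toy elevation grid are assumptions.

```python
import math

# Offsets (row, column) of the eight neighbouring cells.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def d8_flow_direction(elevation, i, j):
    """Return the offset of the steepest downslope neighbour, or None for a pit."""
    rows, cols = len(elevation), len(elevation[0])
    best_offset, best_slope = None, 0.0
    for di, dj in NEIGHBOURS:
        ni, nj = i + di, j + dj
        if 0 <= ni < rows and 0 <= nj < cols:
            # Diagonal neighbours are sqrt(2) cell widths away.
            distance = math.hypot(di, dj)
            slope = (elevation[i][j] - elevation[ni][nj]) / distance
            if slope > best_slope:
                best_offset, best_slope = (di, dj), slope
    return best_offset


# Toy 3x3 elevation grid, made up for illustration only.
elevation = [
    [9.0, 8.0, 7.0],
    [8.0, 6.0, 5.0],
    [7.0, 5.0, 1.0],
]
center_direction = d8_flow_direction(elevation, 1, 1)
pit_direction = d8_flow_direction(elevation, 2, 2)
```

A cell with no lower neighbour returns `None`; real workflows usually fill or breach such pits before routing, which this sketch leaves out.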

A Novel Approach for Authorship Verification using Similarity Measure

International Journal For Innovative Engineering and Management Research ◽

10.48047/ijiemr/v09/i12/83 ◽

2020 ◽

pp. 437-445

Keyword(s):

Similarity Measure ◽

Cyber Security ◽

Threshold Value ◽

Similarity Score ◽

Vector Representation ◽

Text Documents ◽

Authorship Verification ◽

Novel Approach ◽

Verification Problem ◽

Document Vector

Authorship verification is the task of determining whether two text documents were written by the same author by evaluating the veracity and authenticity of the writings. It is used in applications such as the analysis of anonymous emails for forensic investigations, verification of historical literature, continuous authentication in cyber-security, and detection of changes in writing style. The authorship verification problem primarily depends on the similarity among documents. In this work, a new approach is proposed based on the similarity between the known documents of an author and an anonymous document. The approach extracts the most frequent terms from the dataset for document vector representation, and these terms are used to represent the training and test documents. A term weight measure gives the value of each term in the vector representation. The cosine similarity measure is used to determine the similarity between the training and test documents. Based on a threshold on the similarity score, the approach decides whether the test document was written by the suspected author. The PAN competition 2014 Authorship Verification dataset is used in this experiment. The proposed approach achieved the best results for authorship verification when compared with various solutions proposed in this domain.
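
The pipeline described above (most frequent terms, document vectors, cosine similarity, threshold) might look roughly like this. It is a sketch under stated assumptions: raw term counts stand in for the unspecified term weight measure, the scores against the known documents are averaged, and the threshold of 0.5, the function names, and the toy texts are hypothetical.

```python
import math
from collections import Counter


def most_frequent_terms(documents, k):
    """Pick the k most frequent whitespace-separated terms in the corpus."""
    counts = Counter(word for doc in documents for word in doc.lower().split())
    return [term for term, _ in counts.most_common(k)]


def to_vector(doc, terms):
    """Represent a document by raw counts of the selected terms (assumed weighting)."""
    counts = Counter(doc.lower().split())
    return [counts[t] for t in terms]


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # no shared vocabulary to compare
    return dot / (norm_u * norm_v)


def same_author(known_docs, unknown_doc, terms, threshold):
    """Accept the suspected author if the mean similarity reaches the threshold."""
    unknown_vec = to_vector(unknown_doc, terms)
    scores = [cosine_similarity(to_vector(d, terms), unknown_vec) for d in known_docs]
    return sum(scores) / len(scores) >= threshold


# Toy texts, made up for illustration only.
known = ["the cat sat on the mat", "the cat ate the fish"]
terms = most_frequent_terms(known, 5)
verdict_similar = same_author(known, "the cat sat on the fish", terms, threshold=0.5)
verdict_different = same_author(known, "quarterly revenue growth exceeded forecasts", terms, threshold=0.5)
```

In practice the threshold would be tuned on held-out data, since the decision is sensitive to both the number of selected terms and the weighting scheme.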

Radar Emitter Signal Feature Extraction Based on Time-Frequency Atom Decomposition

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.694-697.1317 ◽

2013 ◽

Vol 694-697 ◽

pp. 1317-1320

Author(s):

Jun Gang Li ◽

Zhu Yu Chen ◽

Ji Yang

Keyword(s):

Feature Extraction ◽

Differential Evolution ◽

Directed Acyclic Graph ◽

Linear Expansion ◽

Matching Pursuit ◽

Time Frequency ◽

Acyclic Graph ◽

Feature Vectors ◽

Novel Approach ◽

Code Algorithm

In this paper, a novel approach based on time-frequency atom decomposition is presented to recognize radar emitter signals. To decompose the signals into a linear expansion of time-frequency atoms, a fast matching pursuit (MP) algorithm, which is optimized by the composite differential evolution (CoDE) algorithm, is introduced. The feature vectors of radar emitter signals are extracted based on the atoms generated in the process of decomposition. The Directed Acyclic Graph SVM (DAGSVM) is selected as the classifier to classify the feature vectors of different radar emitter signals.
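
The basic matching pursuit loop that underlies such a decomposition can be sketched as below. This is the plain greedy algorithm with an exhaustive atom search, not the CoDE-optimized variant from the paper; the tiny dictionary of unit-norm vectors stands in for a real time-frequency atom dictionary, and all names and values are assumptions.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matching_pursuit(signal, dictionary, n_atoms):
    """Greedy matching pursuit; atoms in `dictionary` must have unit norm."""
    residual = list(signal)
    expansion = []  # (atom index, coefficient) pairs, in selection order
    for _ in range(n_atoms):
        # Exhaustive search for the atom most correlated with the residual.
        scores = [dot(residual, atom) for atom in dictionary]
        best = max(range(len(dictionary)), key=lambda k: abs(scores[k]))
        coeff = scores[best]
        # Remove the selected atom's contribution from the residual.
        residual = [r - coeff * a for r, a in zip(residual, dictionary[best])]
        expansion.append((best, coeff))
    return expansion, residual


# Toy dictionary of unit-norm atoms, made up for illustration only.
dictionary = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.5, 0.5, 0.5, 0.5],
]
signal = [3.0, 3.0, 3.0, 3.0]
expansion, residual = matching_pursuit(signal, dictionary, n_atoms=1)
```

The selected atom indices and coefficients are the kind of quantities from which feature vectors can be built. The exhaustive search in the inner loop is the expensive step that an evolutionary optimizer would replace.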

A novel approach for water quality classification based on the integration of deep learning and feature extraction techniques

Chemometrics and Intelligent Laboratory Systems ◽

10.1016/j.chemolab.2021.104329 ◽

2021 ◽

pp. 104329

Author(s):

Smail Dilmi ◽

Mohamed Ladjal

Keyword(s):

Water Quality ◽

Feature Extraction ◽

Deep Learning ◽

Extraction Techniques ◽

Novel Approach ◽

Quality Classification ◽

Water Quality Classification

Dermatoscopy Using Multi-Layer Perceptron, Convolution Neural Network, and Capsule Network to Differentiate Malignant Melanoma From Benign Nevus

International Journal of Healthcare Information Systems and Informatics ◽

10.4018/ijhisi.20210701.oa4 ◽

2021 ◽

Vol 16 (3) ◽

pp. 58-73

Author(s):

Shamik Tiwari

Keyword(s):

Feature Extraction ◽

Deep Learning ◽

Malignant Melanoma ◽

Image Recognition ◽

Skin Lesions ◽

Extraction Methods ◽

Multi Layer Perceptron ◽

Novel Approach ◽

Artificial Learning ◽

Histological Images

Epiluminescence microscopy, or more simply dermatoscopy, is an imaging process for examining skin lesions. Various skin ailments, for example melanoma, may be differentiated via these skin images. Because malignant melanoma can cause death, an early diagnosis can affect the survival, length, and quality of life of the patient. Image recognition-based detection of different tissue classes is significant for implementing computer-aided diagnosis via histological images. Conventional image recognition requires handcrafted feature extraction before the application of machine learning. Today, deep learning offers significant alternatives that overcome the complications of handcrafted feature extraction methods. A deep learning-based approach for the recognition of melanoma via the Capsule network is proposed here. The novel approach is compared with a multi-layer perceptron and a convolution network, with the Capsule network model yielding a classification accuracy of 98.9%.

CLUSTER ANALYSIS OF MEDICAL TEXT DOCUMENTS BY USING SEMI-CLUSTERING APPROACH BASED ON GRAPH REPRESENTATION

Information System in Management ◽

10.22630/isim.2018.7.3.19 ◽

2018 ◽

Vol 7 (3) ◽

pp. 213-224

Author(s):

Rafał Woźniak ◽

Piotr Ożdżyński ◽

Danuta Zakrzewska

Keyword(s):

Cluster Analysis ◽

Dimension Reduction ◽

Classification Accuracy ◽

Space Dimension ◽

Graph Representation ◽

Clustering Method ◽

Medical Field ◽

Text Documents ◽

Multilabel Classification ◽

Clustering Approach

The development of the Internet has resulted in an increasing number of online text repositories. In many cases, documents are assigned to more than one class, and automatic multi-label classification needs to be used. When the number of labels exceeds the number of documents, effective label space dimension reduction may significantly improve classification accuracy, which is a major priority in the medical field. In this paper, we propose document clustering for label selection. We use a semi-clustering method based on a graph representation, where documents are represented by vertices and edge weights are calculated according to their mutual similarity. Assigning documents to semi-clusters helps reduce the number of labels subsequently used in the multi-label classification process. The performance of the method is examined in experiments conducted on real medical datasets.
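
The graph representation described above, with documents as vertices and similarity-weighted edges, could be built along these lines. The abstract does not name the similarity measure, so Jaccard similarity over word sets is an assumption, as are the function names and the toy documents; the semi-clustering step itself is not shown.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_graph(documents, min_weight=0.0):
    """Map each document pair (i, j), i < j, to its similarity edge weight."""
    token_sets = [set(doc.lower().split()) for doc in documents]
    edges = {}
    for i, j in combinations(range(len(documents)), 2):
        weight = jaccard(token_sets[i], token_sets[j])
        if weight > min_weight:  # drop edges at or below the cut-off
            edges[(i, j)] = weight
    return edges


# Toy documents, made up for illustration only.
docs = [
    "chest pain and shortness of breath",
    "shortness of breath after exercise",
    "fracture of the left wrist",
]
edges = similarity_graph(docs)
```

Raising `min_weight` sparsifies the graph, which matters for clustering cost because the number of candidate edges grows quadratically with the number of documents.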
