A Novel Text Clustering Approach Using Deep-Learning Vocabulary Network

2017 ◽  
Vol 2017 ◽  
pp. 1-13 ◽  
Author(s):  
Junkai Yi ◽  
Yacong Zhang ◽  
Xianghui Zhao ◽  
Jing Wan

Text clustering is an effective approach to collecting and organizing text documents into meaningful groups for mining valuable information on the Internet. However, issues such as feature extraction and data dimension reduction remain to be tackled. To overcome these problems, we present a novel approach named deep-learning vocabulary network. The vocabulary network is constructed from a related-word set, which contains the “co-occurrence” relations of words or terms. We replace term frequency in feature vectors with the “importance” of words as determined by the vocabulary network and PageRank, which generates more precise feature vectors to represent the meaning of a text for clustering. Furthermore, a sparse-group deep belief network is proposed to reduce the dimensionality of feature vectors, and we introduce coverage rate as the similarity measure in Single-Pass clustering. To verify the effectiveness of our work, we compare the approach with representative algorithms, and experimental results show that feature vectors built with the deep-learning vocabulary network yield better clustering performance.
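The idea of replacing term frequency with a PageRank-based word "importance" can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the document-level co-occurrence window, the damping factor of 0.85, and the fixed iteration count are all assumptions made for illustration, and the sparse-group deep belief network and coverage-rate steps are not covered.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(documents):
    """Build an undirected weighted graph (hypothetical helper).

    Words are nodes; an edge weight counts the documents in which
    both words appear. Document-level co-occurrence is an assumption.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for tokens in documents:
        for a, b in combinations(sorted(set(tokens)), 2):
            graph[a][b] += 1.0
            graph[b][a] += 1.0
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank on the weighted co-occurrence graph."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    # Total edge weight leaving each node, used to normalize its votes.
    out_weight = {node: sum(graph[node].values()) for node in nodes}
    for _ in range(iterations):
        new_rank = {}
        for node in nodes:
            incoming = sum(
                rank[nbr] * weight / out_weight[nbr]
                for nbr, weight in graph[node].items()
            )
            new_rank[node] = (1.0 - damping) / n + damping * incoming
        rank = new_rank
    return rank


def feature_vector(tokens, rank, vocabulary):
    """Use the word's PageRank score instead of its term frequency."""
    present = set(tokens)
    return [rank.get(w, 0.0) if w in present else 0.0 for w in vocabulary]
```

In this sketch a word that co-occurs with many other words receives a higher score than a rare, weakly connected one, regardless of how often it is repeated inside a single document.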

2020 ◽  
Vol 10 (16) ◽  
pp. 5582
Author(s):  
Xiaochen Yuan ◽  
Tian Huang

In this paper, a novel approach that uses a deep learning technique is proposed to detect and identify a variety of image operations. First, we propose the spatial domain-based nonlinear residual (SDNR) feature extraction method by constructing residual values from locally supported filters in the spatial domain. By applying minimum and maximum operators, diversity and nonlinearity are introduced; moreover, this construction brings nonsymmetry to the distribution of SDNR samples. Then, we propose applying a deep learning technique to the extracted SDNR features to detect and classify a variety of image operations. Many experiments have been conducted to verify the performance of the proposed approach, and the results indicate that the proposed method performs well in detecting and identifying the various common image postprocessing operations. Furthermore, comparisons between the proposed approach and the existing methods show the superiority of the proposed approach.


2021 ◽  
Vol 13 (12) ◽  
pp. 2368
Author(s):  
Lawrence V. Stanislawski ◽  
Ethan J. Shavers ◽  
Shaowen Wang ◽  
Zhe Jiang ◽  
E. Lynn Usery ◽  
...  

Accurate maps of regional surface water features are integral for advancing ecologic, atmospheric and land development studies. The only comprehensive surface water feature map of Alaska is the National Hydrography Dataset (NHD). NHD features are often digitized representations of historic topographic map blue lines and may be outdated. Here we test deep learning methods to automatically extract surface water features from airborne interferometric synthetic aperture radar (IfSAR) data to update and validate Alaska hydrographic databases. U-net artificial neural networks (ANN) and high-performance computing (HPC) are used for supervised hydrographic feature extraction within a study area comprised of 50 contiguous watersheds in Alaska. Surface water features derived from elevation through automated flow-routing and manual editing are used as training data. Model extensibility is tested with a series of 16 U-net models trained with increasing percentages of the study area, from about 3 to 35 percent. Hydrography is predicted by each of the models for all watersheds not used in training. Input raster layers are derived from digital terrain models, digital surface models, and intensity images from the IfSAR data. Results indicate about 15 percent of the study area is required to optimally train the ANN to extract hydrography when F1-scores for tested watersheds average between 66 and 68. Little benefit is gained by training beyond 15 percent of the study area. Fully connected hydrographic networks are generated for the U-net predictions using a novel approach that constrains a D-8 flow-routing approach to follow U-net predictions. This work demonstrates the ability of deep learning to derive surface water feature maps from complex terrain over a broad area.


Authorship verification is the task of determining whether two text documents were written by the same author by evaluating the veracity and authenticity of the writings. It is used in applications such as analysis of anonymous emails for forensic investigations, verification of historical literature, continuous authentication in cybersecurity, and detection of changes in writing style. The authorship verification problem depends primarily on the similarity among documents. In this work, a new approach is proposed based on the similarity between the known documents of an author and an anonymous document. In this approach, the most frequent terms are extracted from the dataset and used to represent the training and test documents as vectors. A term weight measure gives the value of each term in the vector representation. The cosine similarity measure is used to determine the similarity between the training and test documents. Based on a threshold on the similarity score, the approach decides whether the test document was written by the suspected author. The PAN 2014 competition Authorship Verification dataset is used in this experiment. The proposed approach achieved the best results for authorship verification when compared with various solutions proposed in this domain.
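The pipeline described above (most frequent terms, term weights, cosine similarity, threshold) can be sketched as follows. This is an illustrative sketch only: the abstract does not name the term weight measure, so relative term frequency is assumed here, and the function names, the vocabulary size `k`, and the threshold of 0.5 are hypothetical choices.

```python
import math
from collections import Counter


def top_terms(documents, k):
    """Return the k most frequent terms across all tokenized documents."""
    counts = Counter(tok for doc in documents for tok in doc)
    return [term for term, _ in counts.most_common(k)]


def vectorize(tokens, vocabulary):
    """Represent a document over the vocabulary.

    Relative term frequency is an assumed stand-in for the
    unspecified term weight measure.
    """
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[term] / total for term in vocabulary]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def same_author(known_docs, unknown_doc, k=100, threshold=0.5):
    """Return (decision, score) for one verification problem."""
    vocabulary = top_terms(known_docs + [unknown_doc], k)
    # Pool the known documents into a single author profile.
    known = vectorize([t for doc in known_docs for t in doc], vocabulary)
    unknown = vectorize(unknown_doc, vocabulary)
    score = cosine(known, unknown)
    return score >= threshold, score
```

In practice the threshold would be tuned on training problems rather than fixed in advance.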


2013 ◽  
Vol 694-697 ◽  
pp. 1317-1320
Author(s):  
Jun Gang Li ◽  
Zhu Yu Chen ◽  
Ji Yang

In this paper, a novel approach based on time-frequency atom decomposition is presented to recognize the radar emitter signals. To decompose the signals into a linear expansion of time-frequency atoms, a fast matching pursuit (MP) algorithm, which is optimized by composite differential evolution (CoDE) algorithm, is introduced. The feature vectors of radar emitter signals are extracted based on the atoms generated in the process of decomposition. The Directed Acyclic Graph SVM (DAGSVM) is selected as the classifier to classify the feature vectors of different radar emitter signals.
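For readers unfamiliar with matching pursuit, the basic greedy decomposition can be sketched as below. This is the plain textbook algorithm over an arbitrary dictionary of unit-norm atoms, not the CoDE-optimized fast variant from the paper; the function names and the list-based signal representation are assumptions for illustration.

```python
import math


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def normalize(atom):
    """Scale an atom to unit Euclidean norm."""
    norm = math.sqrt(dot(atom, atom))
    return [a / norm for a in atom]


def matching_pursuit(signal, atoms, n_iterations):
    """Greedy matching pursuit over unit-norm atoms.

    At each step, pick the atom with the largest absolute inner
    product with the residual, record its index and coefficient,
    and subtract its contribution from the residual.
    Returns (decomposition, residual).
    """
    residual = list(signal)
    decomposition = []
    for _ in range(n_iterations):
        coeffs = [dot(residual, atom) for atom in atoms]
        best = max(range(len(atoms)), key=lambda i: abs(coeffs[i]))
        c = coeffs[best]
        decomposition.append((best, c))
        residual = [r - c * a for r, a in zip(residual, atoms[best])]
    return decomposition, residual
```

The selected atom indices and coefficients are the kind of quantities from which feature vectors could then be built for a classifier.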


Author(s):  
Shamik Tiwari

Epiluminescence microscopy, or more simply dermatoscopy, is an imaging process used to examine skin lesions. Various skin ailments, such as melanoma, may be differentiated via these skin images. Because malignant melanoma can cause death, an early diagnosis can affect the survival, length, and quality of life of the patient. Image recognition-based detection of different tissue classes is significant to implementing computer-aided diagnosis via histological images. Conventional image recognition requires handcrafted feature extraction before the application of machine learning. Today, deep learning offers significant alternatives that overcome the complications of handcrafted feature extraction methods. A deep learning-based approach for the recognition of melanoma via the Capsule network is proposed here. The novel approach is compared with a multi-layer perceptron and a convolutional network, with the Capsule network model yielding a classification accuracy of 98.9%.


2018 ◽  
Vol 7 (3) ◽  
pp. 213-224
Author(s):  
Rafał Woźniak ◽  
Piotr Ożdżyński ◽  
Danuta Zakrzewska

The development of the Internet has resulted in an increasing number of online text repositories. In many cases, documents are assigned to more than one class and automatic multi-label classification needs to be used. When the number of labels exceeds the number of documents, effective label space dimension reduction may significantly improve classification accuracy, which is a major priority in the medical field. In this paper, we propose document clustering for label selection. We use a semi-clustering method based on a graph representation, where documents are represented by vertices and edge weights are calculated according to their mutual similarity. Assigning documents to semi-clusters helps reduce the number of labels subsequently used in the multi-label classification process. The performance of the method is examined by experiments conducted on real medical datasets.

