scholarly journals Topic2features: a novel framework to classify noisy and sparse textual data using LDA topic distributions

2021 ◽  
Vol 7 ◽  
pp. e677
Author(s):  
Junaid Abdul Wahid ◽  
Lei Shi ◽  
Yufei Gao ◽  
Bei Yang ◽  
Yongcai Tao ◽  
...  

In supervised machine learning, specifically in classification tasks, selecting and analyzing the feature vector to achieve better results is one of the most important tasks. Traditional methods such as comparing the features’ cosine similarity and exploring the datasets manually to check which feature vector is suitable is relatively time consuming. Many classification tasks failed to achieve better classification results because of poor feature vector selection and sparseness of data. In this paper, we proposed a novel framework, topic2features (T2F), to deal with short and sparse data using the topic distributions of hidden topics gathered from dataset and converting into feature vectors to build supervised classifier. For this we leveraged the unsupervised topic modelling LDA (latent dirichlet allocation) approach to retrieve the topic distributions employed in supervised learning algorithms. We made use of labelled data and topic distributions of hidden topics that were generated from that data. We explored how the representation based on topics affect the classification performance by applying supervised classification algorithms. Additionally, we did careful evaluation on two types of datasets and compared them with baseline approaches without topic distributions and other comparable methods. The results show that our framework performs significantly better in terms of classification performance compared to the baseline(without T2F) approaches and also yields improvement in terms of F1 score compared to other compared approaches.

Sensors ◽  
2021 ◽  
Vol 21 (23) ◽  
pp. 8070
Author(s):  
Navya Nagananda ◽  
Abu Md Niamul Taufique ◽  
Raaga Madappa ◽  
Chowdhury Sadman Jahan ◽  
Breton Minnehan ◽  
...  

Deep learning grew in importance in recent years due to its versatility and excellent performance on supervised classification tasks. A core assumption for such supervised approaches is that the training and testing data are drawn from the same underlying data distribution. This may not always be the case, and in such cases, the performance of the model is degraded. Domain adaptation aims to overcome the domain shift between the source domain used for training and the target domain data used for testing. Unsupervised domain adaptation deals with situations where the network is trained on labeled data from the source domain and unlabeled data from the target domain with the goal of performing well on the target domain data at the time of deployment. In this study, we overview seven state-of-the-art unsupervised domain adaptation models based on deep learning and benchmark their performance on three new domain adaptation datasets created from publicly available aerial datasets. We believe this is the first study on benchmarking domain adaptation methods for aerial data. In addition to reporting classification performance for the different domain adaptation models, we present t-SNE visualizations that illustrate the benefits of the adaptation process.


Author(s):  
Hannah Garcia Doherty ◽  
Roberto Arnaiz Burgueño ◽  
Roeland P. Trommel ◽  
Vasileios Papanastasiou ◽  
Ronny I. A. Harmanny

Abstract Identification of human individuals within a group of 39 persons using micro-Doppler (μ-D) features has been investigated. Deep convolutional neural networks with two different training procedures have been used to perform classification. Visualization of the inner network layers revealed the sections of the input image most relevant when determining the class label of the target. A convolutional block attention module is added to provide a weighted feature vector in the channel and feature dimension, highlighting the relevant μ-D feature-filled areas in the image and improving classification performance.


2013 ◽  
Vol 427-429 ◽  
pp. 2309-2312
Author(s):  
Hai Bin Mei ◽  
Ming Hua Zhang

Alert classifiers built with the supervised classification technique require large amounts of labeled training alerts. Preparing for such training data is very difficult and expensive. Thus accuracy and feasibility of current classifiers are greatly restricted. This paper employs semi-supervised learning to build alert classification model to reduce the number of needed labeled training alerts. Alert context properties are also introduced to improve the classification performance. Experiments have demonstrated the accuracy and feasibility of our approach.


2020 ◽  
Vol 34 (04) ◽  
pp. 5444-5453
Author(s):  
Edward Raff ◽  
Charles Nicholas ◽  
Mark McLean

Prior work inspired by compression algorithms has described how the Burrows Wheeler Transform can be used to create a distance measure for bioinformatics problems. We describe issues with this approach that were not widely known, and introduce our new Burrows Wheeler Markov Distance (BWMD) as an alternative. The BWMD avoids the shortcomings of earlier efforts, and allows us to tackle problems in variable length DNA sequence clustering. BWMD is also more adaptable to other domains, which we demonstrate on malware classification tasks. Unlike other compression-based distance metrics known to us, BWMD works by embedding sequences into a fixed-length feature vector. This allows us to provide significantly improved clustering performance on larger malware corpora, a weakness of prior methods.


2021 ◽  
Vol 9 ◽  
Author(s):  
Vivek Dixit ◽  
Raja Selvarajan ◽  
Muhammad A. Alam ◽  
Travis S. Humble ◽  
Sabre Kais

Restricted Boltzmann Machine (RBM) is an energy-based, undirected graphical model. It is commonly used for unsupervised and supervised machine learning. Typically, RBM is trained using contrastive divergence (CD). However, training with CD is slow and does not estimate the exact gradient of the log-likelihood cost function. In this work, the model expectation of gradient learning for RBM has been calculated using a quantum annealer (D-Wave 2000Q), where obtaining samples is faster than Markov chain Monte Carlo (MCMC) used in CD. Training and classification results of RBM trained using quantum annealing are compared with the CD-based method. The performance of the two approaches is compared with respect to the classification accuracies, image reconstruction, and log-likelihood results. The classification accuracy results indicate comparable performances of the two methods. Image reconstruction and log-likelihood results show improved performance of the CD-based method. It is shown that the samples obtained from quantum annealer can be used to train an RBM on a 64-bit “bars and stripes” dataset with classification performance similar to an RBM trained with CD. Though training based on CD showed improved learning performance, training using a quantum annealer could be useful as it eliminates computationally expensive MCMC steps of CD.


2021 ◽  
Vol 5 (1) ◽  
pp. 21
Author(s):  
Edgar G. Mendez-Lopez ◽  
Jersson X. Leon-Medina ◽  
Diego A. Tibaduiza

Electronic tongue type sensor arrays are made of different materials with the property of capturing signals independently by each sensor. The signals captured when conducting electrochemical tests often have high dimensionality, which increases when performing the data unfolding process. This unfolding process consists of arranging the data coming from different experiments, sensors, and sample times, thus the obtained information is arranged in a two-dimensional matrix. In this work, a description of a tool for the analysis of electronic tongue signals is developed. This tool is developed in Matlab® App Designer, to process and classify the data from different substances analyzed by an electronic tongue type sensor array. The data processing is carried out through the execution of the following stages: (1) data unfolding, (2) normalization, (3) dimensionality reduction, (4) classification through a supervised machine learning model, and finally (5) a cross-validation procedure to calculate a set of classification performance measures. Some important characteristics of this tool are the possibility to tune the parameters of the dimensionality reduction and classifier algorithms, and also plot the two and three-dimensional scatter plot of the features after reduced the dimensionality. This to see the data separability between classes and compatibility in each class. This interface is successfully tested with two electronic tongue sensor array datasets with multi-frequency large amplitude pulse voltammetry (MLAPV) signals. The developed graphical user interface allows comparing different methods in each of the mentioned stages to find the best combination of methods and thus obtain the highest values of classification performance measures.


2021 ◽  
Vol 2021 ◽  
pp. 1-6
Author(s):  
Qiang Cai ◽  
Fenghai Li ◽  
Yifan Chen ◽  
Haisheng Li ◽  
Jian Cao ◽  
...  

Along with the strong representation of the convolutional neural network (CNN), image classification tasks have achieved considerable progress. However, majority of works focus on designing complicated and redundant architectures for extracting informative features to improve classification performance. In this study, we concentrate on rectifying the incomplete outputs of CNN. To be concrete, we propose an innovative image classification method based on Label Rectification Learning (LRL) through kernel extreme learning machine (KELM). It mainly consists of two steps: (1) preclassification, extracting incomplete labels through a pretrained CNN, and (2) label rectification, rectifying the generated incomplete labels by the KELM to obtain the rectified labels. Experiments conducted on publicly available datasets demonstrate the effectiveness of our method. Notably, our method is extensible which can be easily integrated with off-the-shelf networks for improving performance.


Sign in / Sign up

Export Citation Format

Share Document