Topic2features: a novel framework to classify noisy and sparse textual data using LDA topic distributions

PeerJ Computer Science ◽

10.7717/peerj-cs.677 ◽

2021 ◽

Vol 7 ◽

pp. e677

Author(s):

Junaid Abdul Wahid ◽

Lei Shi ◽

Yufei Gao ◽

Bei Yang ◽

Yongcai Tao ◽

...

Keyword(s):

Supervised Classification ◽

Latent Dirichlet Allocation ◽

Feature Vector ◽

Classification Performance ◽

Supervised Machine Learning ◽

Careful Evaluation ◽

Textual Data ◽

Supervised Learning Algorithms ◽

Classification Tasks ◽

Allocation Approach

In supervised machine learning, and in classification tasks specifically, selecting and analyzing the feature vector is one of the most important steps toward better results. Traditional methods, such as comparing the features’ cosine similarity or exploring the datasets manually to check which feature vector is suitable, are relatively time-consuming. Many classification tasks have failed to achieve good results because of poor feature vector selection and data sparseness. In this paper, we proposed a novel framework, topic2features (T2F), that deals with short and sparse data by taking the topic distributions of hidden topics gathered from the dataset and converting them into feature vectors for building a supervised classifier. For this we leveraged the unsupervised topic modelling approach LDA (latent Dirichlet allocation) to retrieve the topic distributions that are then used in supervised learning algorithms. We made use of labelled data and the topic distributions of hidden topics generated from that data. We explored how the topic-based representation affects classification performance by applying supervised classification algorithms. Additionally, we carefully evaluated the framework on two types of datasets and compared it with baseline approaches without topic distributions and with other comparable methods. The results show that our framework performs significantly better in classification performance than the baseline (without T2F) approaches and also improves the F1 score relative to the other compared approaches.
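The idea of turning per-document topic distributions into feature vectors can be illustrated with a minimal sketch. This is not the authors' implementation: it is a small collapsed Gibbs sampler for LDA written in plain Python, and the function name `lda_topic_features`, the hyperparameters `alpha` and `beta`, the iteration count, and the toy documents are all assumptions made for illustration.

```python
import random

def lda_topic_features(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns one topic distribution per document."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]       # count of tokens in doc d with topic k
    topic_word = [[0] * V for _ in range(num_topics)]  # count of word w assigned to topic k
    topic_total = [0] * num_topics                     # total tokens assigned to topic k
    assignments = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid, k = word_id[w], assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                # Sample a new topic from the collapsed conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wid] + beta) / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1

    # Smoothed, normalized document-topic counts serve as the feature vectors.
    features = []
    for d, doc in enumerate(docs):
        denom = len(doc) + num_topics * alpha
        features.append([(doc_topic[d][k] + alpha) / denom for k in range(num_topics)])
    return features

# Hypothetical tokenized short texts.
docs = [
    "cheap flight ticket deal".split(),
    "flight delay airport ticket".split(),
    "goal match football team".split(),
    "team win football match".split(),
]
features = lda_topic_features(docs, num_topics=2)
```

Each row of `features` is a dense vector of length `num_topics` that sums to 1, so it can be paired with the document's label and passed to any supervised classifier in place of a sparse bag-of-words vector.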

Benchmarking Domain Adaptation Methods on Aerial Datasets

Sensors ◽

10.3390/s21238070 ◽

2021 ◽

Vol 21 (23) ◽

pp. 8070

Author(s):

Navya Nagananda ◽

Abu Md Niamul Taufique ◽

Raaga Madappa ◽

Chowdhury Sadman Jahan ◽

Breton Minnehan ◽

...

Keyword(s):

Deep Learning ◽

Supervised Classification ◽

Domain Adaptation ◽

State Of The Art ◽

Classification Performance ◽

Target Domain ◽

Source Domain ◽

Unsupervised Domain Adaptation ◽

Testing Data ◽

Classification Tasks

Deep learning grew in importance in recent years due to its versatility and excellent performance on supervised classification tasks. A core assumption for such supervised approaches is that the training and testing data are drawn from the same underlying data distribution. This may not always be the case, and in such cases, the performance of the model is degraded. Domain adaptation aims to overcome the domain shift between the source domain used for training and the target domain data used for testing. Unsupervised domain adaptation deals with situations where the network is trained on labeled data from the source domain and unlabeled data from the target domain with the goal of performing well on the target domain data at the time of deployment. In this study, we overview seven state-of-the-art unsupervised domain adaptation models based on deep learning and benchmark their performance on three new domain adaptation datasets created from publicly available aerial datasets. We believe this is the first study on benchmarking domain adaptation methods for aerial data. In addition to reporting classification performance for the different domain adaptation models, we present t-SNE visualizations that illustrate the benefits of the adaptation process.

Source allocation of per- and polyfluoroalkyl substances (PFAS) with supervised machine learning: Classification performance and the role of feature selection in an expanded dataset

Chemosphere ◽

10.1016/j.chemosphere.2021.130124 ◽

2021 ◽

Vol 275 ◽

pp. 130124

Author(s):

Tohren C.G. Kibbey ◽

Rafal Jabrzemski ◽

Denis M. O’Carroll

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Classification Performance ◽

Supervised Machine Learning ◽

Machine Learning Classification ◽

Polyfluoroalkyl Substances ◽

Source Allocation

Attention-based deep learning networks for identification of human gait using radar micro-Doppler spectrograms

International Journal of Microwave and Wireless Technologies ◽

10.1017/s1759078721000830 ◽

2021 ◽

pp. 1-6

Author(s):

Hannah Garcia Doherty ◽

Roberto Arnaiz Burgueño ◽

Roeland P. Trommel ◽

Vasileios Papanastasiou ◽

Ronny I. A. Harmanny

Keyword(s):

Neural Networks ◽

Feature Vector ◽

Classification Performance ◽

Input Image ◽

Human Gait ◽

Learning Networks ◽

Class Label ◽

Deep Convolutional Neural Networks ◽

Network Layers ◽

Feature Dimension

Identification of human individuals within a group of 39 persons using micro-Doppler (μ-D) features has been investigated. Deep convolutional neural networks with two different training procedures have been used to perform classification. Visualization of the inner network layers revealed the sections of the input image most relevant when determining the class label of the target. A convolutional block attention module is added to provide a weighted feature vector in the channel and feature dimension, highlighting the relevant μ-D feature-filled areas in the image and improving classification performance.

Semi-Supervised Classification and its Application to Filtering IDS False Positives

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.427-429.2309 ◽

2013 ◽

Vol 427-429 ◽

pp. 2309-2312

Author(s):

Hai Bin Mei ◽

Ming Hua Zhang

Keyword(s):

Supervised Learning ◽

Supervised Classification ◽

Classification Performance ◽

False Positives ◽

Training Data ◽

Classification Model ◽

Classification Technique

Alert classifiers built with the supervised classification technique require large amounts of labeled training alerts. Preparing such training data is very difficult and expensive, so the accuracy and feasibility of current classifiers are greatly restricted. This paper employs semi-supervised learning to build an alert classification model that reduces the number of labeled training alerts needed. Alert context properties are also introduced to improve the classification performance. Experiments have demonstrated the accuracy and feasibility of our approach.

A New Burrows Wheeler Transform Markov Distance

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5994 ◽

2020 ◽

Vol 34 (04) ◽

pp. 5444-5453

Author(s):

Edward Raff ◽

Charles Nicholas ◽

Mark McLean

Keyword(s):

Dna Sequence ◽

Distance Measure ◽

Feature Vector ◽

Distance Metrics ◽

Prior Work ◽

Compression Algorithms ◽

Fixed Length ◽

Malware Classification ◽

Classification Tasks ◽

Burrows Wheeler Transform

Prior work inspired by compression algorithms has described how the Burrows Wheeler Transform can be used to create a distance measure for bioinformatics problems. We describe issues with this approach that were not widely known, and introduce our new Burrows Wheeler Markov Distance (BWMD) as an alternative. The BWMD avoids the shortcomings of earlier efforts, and allows us to tackle problems in variable length DNA sequence clustering. BWMD is also more adaptable to other domains, which we demonstrate on malware classification tasks. Unlike other compression-based distance metrics known to us, BWMD works by embedding sequences into a fixed-length feature vector. This allows us to provide significantly improved clustering performance on larger malware corpora, a weakness of prior methods.
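For readers unfamiliar with the underlying transform, the following is a minimal sketch of the plain Burrows Wheeler Transform and its inverse. It does not reproduce BWMD or any distance measure from the paper; the function names and the `$` sentinel character are assumptions, and the naive sort-all-rotations approach is chosen for clarity rather than efficiency.

```python
def bwt(text, sentinel="$"):
    """Return the Burrows Wheeler Transform of text using sorted rotations."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input text")
    s = text + sentinel
    # Sort every cyclic rotation, then read off the last column.
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)

def inverse_bwt(transformed, sentinel="$"):
    """Recover the original text by repeatedly prepending and sorting columns."""
    table = [""] * len(transformed)
    for _ in range(len(transformed)):
        table = sorted(c + row for c, row in zip(transformed, table))
    # The original string is the row that ends with the sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]

encoded = bwt("banana")          # "annb$aa"
decoded = inverse_bwt(encoded)   # "banana"
```

The transform groups characters that share a following context, which is why compression-inspired methods build on it.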

Training Restricted Boltzmann Machines With a D-Wave Quantum Annealer

Frontiers in Physics ◽

10.3389/fphy.2021.589626 ◽

2021 ◽

Vol 9 ◽

Author(s):

Vivek Dixit ◽

Raja Selvarajan ◽

Muhammad A. Alam ◽

Travis S. Humble ◽

Sabre Kais

Keyword(s):

Image Reconstruction ◽

Graphical Model ◽

Classification Performance ◽

Supervised Machine Learning ◽

Learning Performance ◽

Restricted Boltzmann Machines ◽

Log Likelihood ◽

Improved Performance ◽

Gradient Learning ◽

D Wave

A restricted Boltzmann machine (RBM) is an energy-based, undirected graphical model. It is commonly used for unsupervised and supervised machine learning. Typically, an RBM is trained using contrastive divergence (CD). However, training with CD is slow and does not estimate the exact gradient of the log-likelihood cost function. In this work, the model expectation of gradient learning for an RBM has been calculated using a quantum annealer (D-Wave 2000Q), where obtaining samples is faster than with the Markov chain Monte Carlo (MCMC) used in CD. Training and classification results of an RBM trained using quantum annealing are compared with the CD-based method. The performance of the two approaches is compared with respect to classification accuracy, image reconstruction, and log-likelihood results. The classification accuracy results indicate comparable performance of the two methods. Image reconstruction and log-likelihood results show improved performance of the CD-based method. It is shown that the samples obtained from a quantum annealer can be used to train an RBM on a 64-bit “bars and stripes” dataset with classification performance similar to an RBM trained with CD. Though training based on CD showed improved learning performance, training using a quantum annealer could be useful as it eliminates the computationally expensive MCMC steps of CD.
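The "model expectation" mentioned above can be made concrete with the standard textbook form of the RBM energy and log-likelihood gradient. The notation here is an assumption, not taken from the paper: $v_i$ are visible units, $h_j$ hidden units, $a_i$ and $b_j$ biases, $W_{ij}$ weights, and $Z$ the partition function.

```latex
\begin{align}
E(v, h) &= -\sum_i a_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i W_{ij} h_j \\
p(v) &= \frac{1}{Z} \sum_h e^{-E(v, h)} \\
\frac{\partial \log p(v)}{\partial W_{ij}}
  &= \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}}
\end{align}
```

The second expectation is the intractable term: CD approximates it with short MCMC chains, while the approach described in the abstract estimates it from annealer samples.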

Development of a Pattern Recognition Tool for the Classification of Electronic Tongue Signals Using Machine Learning

Chemistry Proceedings ◽

10.3390/csac2021-10447 ◽

2021 ◽

Vol 5 (1) ◽

pp. 21

Author(s):

Edgar G. Mendez-Lopez ◽

Jersson X. Leon-Medina ◽

Diego A. Tibaduiza

Keyword(s):

Machine Learning ◽

Dimensionality Reduction ◽

Performance Measures ◽

Sensor Array ◽

Three Dimensional ◽

Electronic Tongue ◽

Sensor Arrays ◽

Classification Performance ◽

Supervised Machine Learning ◽

Electrochemical Tests

Electronic tongue type sensor arrays are made of different materials, and each sensor captures signals independently. The signals captured when conducting electrochemical tests often have high dimensionality, which increases when performing the data unfolding process. This unfolding process consists of arranging the data coming from different experiments, sensors, and sample times so that the obtained information is arranged in a two-dimensional matrix. This work describes a tool developed for the analysis of electronic tongue signals. The tool is developed in Matlab® App Designer to process and classify the data from different substances analyzed by an electronic tongue type sensor array. The data processing is carried out through the following stages: (1) data unfolding, (2) normalization, (3) dimensionality reduction, (4) classification through a supervised machine learning model, and finally (5) a cross-validation procedure to calculate a set of classification performance measures. Some important characteristics of this tool are the possibility of tuning the parameters of the dimensionality reduction and classifier algorithms, and of plotting two- and three-dimensional scatter plots of the features after dimensionality reduction, in order to see the data separability between classes and compatibility in each class. The interface is successfully tested with two electronic tongue sensor array datasets with multi-frequency large amplitude pulse voltammetry (MLAPV) signals. The developed graphical user interface allows comparing different methods in each of the mentioned stages to find the best combination of methods and thus obtain the highest values of classification performance measures.
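The first two stages of the pipeline above, data unfolding and normalization, can be sketched as follows. This is an illustration in Python rather than the Matlab tool itself; the function names, the `data[experiment][sensor][time]` layout, and the choice of column-wise z-score normalization are assumptions, since the abstract does not specify them.

```python
from statistics import mean, pstdev

def unfold(data):
    """Flatten data[experiment][sensor][time] into one row per experiment."""
    return [[x for sensor in experiment for x in sensor] for experiment in data]

def zscore_columns(matrix):
    """Scale each column to zero mean and unit (population) standard deviation."""
    columns = list(zip(*matrix))
    means = [mean(c) for c in columns]
    # Guard against constant columns, whose standard deviation is zero.
    stds = [pstdev(c) or 1.0 for c in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in matrix]

# Hypothetical readings: 3 experiments, 2 sensors, 2 time samples each.
data = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[2.0, 3.0], [4.0, 5.0]],
    [[3.0, 4.0], [5.0, 6.0]],
]
unfolded = unfold(data)              # 3 rows x 4 columns
normalized = zscore_columns(unfolded)
```

Each experiment becomes one row with `sensors * time` columns, which is why unfolding increases the dimensionality that the later reduction stage has to handle.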

Label Rectification Learning through Kernel Extreme Learning Machine

Wireless Communications and Mobile Computing ◽

10.1155/2021/6669081 ◽

2021 ◽

Vol 2021 ◽

pp. 1-6

Author(s):

Qiang Cai ◽

Fenghai Li ◽

Yifan Chen ◽

Haisheng Li ◽

Jian Cao ◽

...

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Image Classification ◽

Extreme Learning Machine ◽

Classification Performance ◽

Considerable Progress ◽

Strong Representation ◽

Kernel Extreme Learning Machine ◽

Classification Tasks ◽

Learning Machine

With the strong representation ability of the convolutional neural network (CNN), image classification tasks have achieved considerable progress. However, the majority of works focus on designing complicated and redundant architectures for extracting informative features to improve classification performance. In this study, we concentrate on rectifying the incomplete outputs of the CNN. Concretely, we propose an innovative image classification method based on Label Rectification Learning (LRL) through a kernel extreme learning machine (KELM). It consists mainly of two steps: (1) preclassification, extracting incomplete labels through a pretrained CNN, and (2) label rectification, rectifying the generated incomplete labels with the KELM to obtain the rectified labels. Experiments conducted on publicly available datasets demonstrate the effectiveness of our method. Notably, our method is extensible and can be easily integrated with off-the-shelf networks to improve performance.

Sentiment Analysis in Social Networks: A Methodology Based on the Latent Dirichlet Allocation Approach

Proceedings of the 2019 Conference of the International Fuzzy Systems Association and the European Society for Fuzzy Logic and Technology (EUSFLAT 2019) ◽

10.2991/eusflat-19.2019.36 ◽

2019 ◽

Author(s):

Domenico Santaniello ◽

Francesco Colace ◽

Marco Lombardi ◽

Francesco Pascale ◽

Fabio Clarizia

Keyword(s):

Social Networks ◽

Sentiment Analysis ◽

Latent Dirichlet Allocation ◽

Allocation Approach ◽

Dirichlet Allocation

A Framework for Supervised Classification Performance Analysis with Information-Theoretic Methods

IEEE Transactions on Knowledge and Data Engineering ◽

10.1109/tkde.2019.2915643 ◽

2020 ◽

Vol 32 (11) ◽

pp. 2075-2087

Author(s):

Francisco J. Valverde-Albacete ◽

Carmen Pelaez-Moreno

Keyword(s):

Performance Analysis ◽

Supervised Classification ◽

Classification Performance ◽

Information Theoretic ◽

Information Theoretic Methods