Authorship Identification of a Russian-Language Text Using Support Vector Machine and Deep Neural Networks

The article explores approaches to determining the author of a natural language text and the advantages and disadvantages of these approaches. The importance of the problem stems from the active digitalization of society and the shift of most everyday activities online. Authorship identification methods are particularly useful for information security and forensics. For example, such methods can be used to identify the authors of suicide notes and other texts subjected to forensic examination. Another area of application is plagiarism detection, which is relevant both to intellectual property protection in the digital space and to the educational process. The article describes identifying the author of a Russian-language text using a support vector machine (SVM) and deep neural network architectures (long short-term memory (LSTM), convolutional neural networks (CNN) with attention, and Transformer). The results show that all the considered algorithms are suitable for solving the authorship identification problem, but SVM shows the best accuracy, reaching an average of 96%. This is due to carefully chosen parameters and a feature space that includes statistical and semantic features (including those extracted through aspect analysis). Deep neural networks are inferior to SVM in accuracy and reach only 93%. The study also evaluates the impact of attacks on the models' accuracy. Experiments show that the SVM-based methods are not robust to deliberate text anonymization. In comparison, the loss in accuracy of deep neural networks does not exceed 20%. The Transformer architecture is the most effective for anonymized texts and achieves 81% accuracy.
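As a rough illustration of what "statistical features" for authorship identification can look like, the sketch below computes a character n-gram frequency profile and a few simple style statistics. The function names and the specific features are assumptions chosen for illustration; the abstract does not list the exact feature set used in the article, and no classifier is trained here.

```python
from collections import Counter

PUNCTUATION = ".,;:!?-"  # illustrative subset, not taken from the article


def char_ngram_profile(text, n=3):
    """Relative frequencies of overlapping character n-grams (hypothetical helper)."""
    text = text.lower()
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    if not grams:
        return {}
    counts = Counter(grams)
    return {gram: count / len(grams) for gram, count in counts.items()}


def stat_features(text):
    """A few simple style statistics (hypothetical feature set)."""
    words = text.split()
    n_words = len(words)
    n_chars = len(text)
    if n_words == 0:
        return {"avg_word_len": 0.0, "punct_ratio": 0.0, "type_token_ratio": 0.0}
    stripped = [w.strip(PUNCTUATION) for w in words]
    return {
        # mean word length after stripping edge punctuation
        "avg_word_len": sum(len(w) for w in stripped) / n_words,
        # share of characters that are punctuation marks
        "punct_ratio": sum(1 for ch in text if ch in PUNCTUATION) / n_chars,
        # vocabulary richness: distinct words / total words
        "type_token_ratio": len({w.lower() for w in stripped}) / n_words,
    }


if __name__ == "__main__":
    sample = "Привет, мир! Привет снова."
    print(stat_features(sample))
    print(len(char_ngram_profile(sample, n=3)))
```

Feature dictionaries of this kind would typically be converted to fixed-length vectors before being passed to a classifier such as an SVM.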

Determining the Age of the Author of the Text Based on Deep Neural Network Models

Information ◽

10.3390/info11120589 ◽

2020 ◽

Vol 11 (12) ◽

pp. 589 ◽

Cited By ~ 1

Author(s):

Aleksandr Sergeevich Romanov ◽

Anna Vladimirovna Kurtukova ◽

Artem Alexandrovich Sobolev ◽

Alexander Alexandrovich Shelupanov ◽

Anastasia Mikhailovna Fedotova

Keyword(s):

Neural Network ◽

Neural Networks ◽

Deep Neural Network ◽

Deep Neural Networks ◽

Network Models ◽

Russian Language ◽

Neural Network Models ◽

Inaccurate Data ◽

Language Text

This paper addresses the problem of determining the age of a text's author using deep neural network models. The article presents an analysis of methods for determining the age of the author of a text and approaches to determining the age of a user from a photo. The latter could help solve the problem of inaccurate training data by filtering out incorrect user-specified age data. A detailed description of the authors' technique based on deep neural network models and an interpretation of the results are also presented. The study found that the proposed technique achieved 82% accuracy in determining the age of the author from Russian-language text, which makes it competitive with approaches for other languages.

Sound Classification Based on Multihead Attention and Support Vector Machine

Mathematical Problems in Engineering ◽

10.1155/2021/9937383 ◽

2021 ◽

Vol 2021 ◽

pp. 1-11

Author(s):

Lei Yang ◽

Hongdong Zhao

Keyword(s):

Neural Networks ◽

Support Vector Machine ◽

Classification Systems ◽

Support Vector ◽

Small Scale ◽

Sound Classification ◽

Comparable Performance ◽

Public Datasets ◽

Recurrent Architecture ◽

The Impact

Sound classification is a broad area of research that has gained much attention in recent years. Sound classification systems based on recurrent neural networks (RNNs) and convolutional neural networks (CNNs) have significantly improved in recognition capability. However, their computational complexity and inadequate exploration of global dependencies in long sequences restrict further improvements in classification results. In this paper, we show that there are still opportunities to improve the performance of sound classification by substituting the recurrent architecture with a parallel processing structure in the feature extraction. In light of the small-scale and high-dimensional sound datasets, we propose the use of multihead attention and a support vector machine (SVM) for sound taxonomy. Multihead attention serves as the feature extractor to obtain salient features, and the SVM serves as the classifier to recognize all categories. Extensive experiments are conducted across three acoustically characterized public datasets, UrbanSound8K, GTZAN, and IEMOCAP, using two commonly used audio spectrograms as inputs, and we fully evaluate the impact of parameters and feature types on classification accuracy. Our results suggest that the proposed model reaches performance comparable with existing methods and demonstrates strong generalization ability in sound taxonomy.
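The multihead attention named in this abstract can be sketched in a few lines. The example below is a minimal, dependency-free version of scaled dot-product self-attention split across heads. It assumes identity projections (a trained model would use learned query, key, value and output matrices), and the function names are illustrative rather than taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        # weighted sum of value vectors, one output vector per query
        outputs.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return outputs


def multihead_self_attention(frames, num_heads):
    """Split the feature dimension into heads, attend per head, concatenate.

    Assumption: learned projections are omitted (identity), so this only
    shows the data flow, not a trained feature extractor.
    """
    d_model = len(frames[0])
    if d_model % num_heads != 0:
        raise ValueError("feature size must be divisible by num_heads")
    d_head = d_model // num_heads
    per_head = []
    for h in range(num_heads):
        part = [row[h * d_head:(h + 1) * d_head] for row in frames]
        per_head.append(attention(part, part, part))
    # concatenate the head outputs for every time step
    return [
        [x for h in range(num_heads) for x in per_head[h][t]]
        for t in range(len(frames))
    ]


if __name__ == "__main__":
    spectrogram_frames = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
    print(multihead_self_attention(spectrogram_frames, num_heads=2))
```

Because every time step attends to every other one independently, the per-frame computations can run in parallel, which is the contrast with recurrent architectures that the abstract draws.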

IoT-Based Bee Swarm Activity Acoustic Classification Using Deep Neural Networks

Sensors ◽

10.3390/s21030676 ◽

2021 ◽

Vol 21 (3) ◽

pp. 676

Author(s):

Andrej Zgank

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Markov Models ◽

Audio Signal ◽

Audio Signals ◽

Mel Frequency Cepstral Coefficients ◽

Animal Activity ◽

The Impact ◽

Acoustic Classification ◽

Swarm Activity

Animal activity acoustic monitoring is becoming one of the necessary tools in agriculture, including beekeeping. It can assist in the control of beehives in remote locations. It is possible to classify bee swarm activity from audio signals using such approaches. An IoT-based acoustic swarm classification using deep neural networks is proposed in this paper. Audio recordings were obtained from the Open Source Beehive project. Mel-frequency cepstral coefficient features were extracted from the audio signal. The lossless WAV and lossy MP3 audio formats were compared for IoT-based solutions. The impact of the deep neural network parameters on the classification results was analyzed. The best overall classification accuracy with uncompressed audio was 94.09%, but MP3 compression degraded the DNN accuracy by over 10%. The evaluation of the proposed IoT-based bee activity acoustic classification using deep neural networks showed improved results compared with the previous hidden Markov model system.
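Mel-frequency cepstral coefficients are built on a filterbank spaced evenly on the mel scale. The sketch below shows only that frequency-warping step, using the widely used formula mel = 2595 · log10(1 + f / 700). The abstract does not say which mel variant or filterbank settings the paper used, so the formula choice, function names and parameters here are assumptions.

```python
import math


def hz_to_mel(f_hz):
    """Convert frequency in Hz to mels (common 2595 * log10 variant; an assumption)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(f_min, f_max, n_filters):
    """Centre frequencies (Hz) of n_filters filters equally spaced in mel."""
    mel_min, mel_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (mel_max - mel_min) / (n_filters + 1)
    return [mel_to_hz(mel_min + step * (i + 1)) for i in range(n_filters)]


if __name__ == "__main__":
    # Hypothetical settings: 10 filters between 0 Hz and 8 kHz.
    for centre in mel_center_frequencies(0.0, 8000.0, 10):
        print(round(centre, 1))
```

The centres crowd together at low frequencies and spread out at high ones, which is the perceptual warping that makes the resulting features compact.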

414 Deep Neural Networks: A Survey Tool for Obstructive Sleep Apnea Prediction

SLEEP ◽

10.1093/sleep/zsab072.413 ◽

2021 ◽

Vol 44 (Supplement_2) ◽

pp. A164-A164

Author(s):

Pahnwat Taweesedt ◽

JungYoon Kim ◽

Jaehyun Park ◽

Jangwoon Park ◽

Munish Sharma ◽

...

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Obstructive Sleep Apnea ◽

Sleep Apnea ◽

Deep Neural Networks ◽

Support Vector ◽

Learning Models ◽

Obstructive Sleep ◽

Screening Questionnaires ◽

Machine Learning Models

Introduction: Obstructive sleep apnea (OSA) is a common sleep-related breathing disorder affecting an estimated one billion people. Full-night polysomnography is considered the gold standard for OSA diagnosis. However, it is time-consuming and expensive, and it is not readily available in many parts of the world. Many screening questionnaires and scores have been proposed for OSA prediction, with high sensitivity and low specificity. The present study aims to develop models with various machine learning techniques to predict the severity of OSA by incorporating features from multiple questionnaires. Methods: Subjects who underwent full-night polysomnography at the Torr sleep center, Texas, and completed 5 OSA screening questionnaires/scores were included. OSA was diagnosed using an Apnea-Hypopnea Index ≥ 5. We trained five different machine learning models: Deep Neural Networks with scaled principal component analysis (DNN-PCA), Random Forest (RF), Adaptive Boosting classifier (ABC), K-Nearest Neighbors classifier (KNC), and Support Vector Machine Classifier (SVMC). A training:testing subject ratio of 65:35 was used. All features, including demographic data, body measurements, and snoring and sleepiness history, were obtained from the 5 OSA screening questionnaires/scores (STOP-BANG questionnaire, Berlin questionnaire, NoSAS score, NAMES score, and No-Apnea score). Performance metrics were used to compare the machine learning models. Results: Of 180 subjects, 51.5% were male, with a mean (SD) age of 53.6 (15.1). One hundred and nineteen subjects were diagnosed with OSA. The Area Under the Receiver Operating Characteristic Curve (AUROC) values of DNN-PCA, RF, ABC, KNC, SVMC, STOP-BANG questionnaire, Berlin questionnaire, NoSAS score, NAMES score, and No-Apnea score were 0.85, 0.68, 0.52, 0.74, 0.75, 0.61, 0.63, 0.61, 0.58, and 0.58, respectively. DNN-PCA showed the highest AUROC, with a sensitivity of 0.79, specificity of 0.67, positive predictive value of 0.93, F1 score of 0.86, and accuracy of 0.77. Conclusion: Our results showed that DNN-PCA outperforms OSA screening questionnaires, scores, and other machine learning models.
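The metrics reported here (sensitivity, specificity, positive predictive value, F1 score, accuracy) all derive from the four cells of a binary confusion matrix. The sketch below shows the standard definitions. The counts in the usage example are hypothetical and are not the study's data, which the abstract does not report at that level of detail.

```python
def screening_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (at least one positive case,
    one negative case, and one positive prediction).
    """
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "f1": f1,
        "accuracy": accuracy,
    }


if __name__ == "__main__":
    # Hypothetical counts chosen only to exercise the formulas.
    for name, value in screening_metrics(tp=8, fp=4, tn=6, fn=2).items():
        print(f"{name}: {value:.3f}")
```

Because positive predictive value depends on how common the condition is in the tested group, it can be high even when specificity is moderate, as is plausible in a cohort where most subjects were diagnosed with OSA.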

Intuitionistic Fuzzy Laplacian Twin Support Vector Machine for Semi-supervised Classification

Journal of the Operations Research Society of China ◽

10.1007/s40305-021-00354-9 ◽

2021 ◽

Author(s):

Jia-Bin Zhou ◽

Yan-Qin Bai ◽

Yan-Ru Guo ◽

Hai-Xiang Lin

Keyword(s):

Support Vector Machine ◽

Negative Impact ◽

Twin Support Vector Machine ◽

Fuzzy Membership ◽

Support Vector ◽

Membership Functions ◽

Fuzzy Membership Functions ◽

Intuitionistic Fuzzy ◽

Benchmark Datasets ◽

The Impact

In general, data contain noise that comes from faulty instruments, flawed measurements, or faulty communication. Learning from data in classification or regression is inevitably affected by this noise. To remove or greatly reduce its impact, we introduce the ideas of fuzzy membership functions and the Laplacian twin support vector machine (Lap-TSVM). A formulation of the linear intuitionistic fuzzy Laplacian twin support vector machine (IFLap-TSVM) is presented. Moreover, we extend the linear IFLap-TSVM to the nonlinear case using kernel functions. The proposed IFLap-TSVM resolves the negative impact of noise and outliers by using fuzzy membership functions, and it is a more accurate and reasonable classifier because it uses the geometric distribution information of labeled and unlabeled data through manifold regularization. Experiments with constructed artificial datasets, several UCI benchmark datasets, and the MNIST dataset show that the IFLap-TSVM has better classification accuracy than the state-of-the-art twin support vector machine (TSVM), intuitionistic fuzzy twin support vector machine (IFTSVM), and Lap-TSVM.

Sensor Dynamic Modeling Based on LS-SVM and NGA

Key Engineering Materials ◽

10.4028/www.scientific.net/kem.381-382.439 ◽

2008 ◽

Vol 381-382 ◽

pp. 439-442

Author(s):

Qi Wang ◽

Zhi Gang Feng ◽

K. Shida

Keyword(s):

Genetic Algorithm ◽

Neural Networks ◽

Support Vector Machine ◽

Least Squares ◽

Dynamic Model ◽

Dynamic Modeling ◽

Support Vector ◽

Local Minima ◽

Highly Nonlinear ◽

Sharing Function

A least squares support vector machine (LS-SVM) combined with a niche genetic algorithm (NGA) is proposed for nonlinear sensor dynamic modeling. Compared with neural networks, the LS-SVM can overcome the shortcomings of local minima and overfitting, and has higher generalization performance. The sharing-function-based niche genetic algorithm is used to select the LS-SVM parameters automatically. The effectiveness and reliability of this method are demonstrated in two examples. The results show that this approach avoids the blind manual selection of LS-SVM parameters and remains effective even if the sensor dynamic model is highly nonlinear.
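For readers unfamiliar with LS-SVM, the sketch below shows the standard function-estimation form, in which training reduces to solving one linear system instead of a quadratic program. It assumes an RBF kernel and fixed hyperparameters `gamma` (regularization) and `sigma` (kernel width). The paper's niche genetic algorithm for choosing these values is not reproduced, and all names are illustrative.

```python
import math


def rbf(a, b, sigma):
    """Gaussian (RBF) kernel between two vectors."""
    return math.exp(-sum((x - y) ** 2 for x, y in zip(a, b)) / (2.0 * sigma ** 2))


def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def lssvm_fit(X, y, gamma, sigma):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = len(X)
    system = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0]
        for j in range(n):
            k = rbf(X[i], X[j], sigma)
            row.append(k + (1.0 / gamma if i == j else 0.0))
        system.append(row)
    solution = solve(system, [0.0] + list(y))
    return solution[0], solution[1:]  # bias b, dual weights alpha


def lssvm_predict(x, X, b, alpha, sigma):
    """Evaluate f(x) = sum_i alpha_i * K(x, x_i) + b."""
    return b + sum(a * rbf(x, xi, sigma) for a, xi in zip(alpha, X))


if __name__ == "__main__":
    # Hypothetical nonlinear input/output pairs, not data from the paper.
    X = [[0.0], [1.0], [2.0], [3.0]]
    y = [0.0, 1.0, 4.0, 9.0]
    b, alpha = lssvm_fit(X, y, gamma=1e6, sigma=1.0)
    print([round(lssvm_predict(x, X, b, alpha, 1.0), 3) for x in X])
```

Model quality depends strongly on `gamma` and `sigma`, which is why the abstract pairs the LS-SVM with an automatic search over those parameters.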

QSAR Study on the Toxicity of Phenols for Fathead Minnows by Using Support Vector Machine and Neural Networks

2008 Fourth International Conference on Natural Computation ◽

10.1109/icnc.2008.931 ◽

2008 ◽

Cited By ~ 1

Author(s):

Xiujun Cui ◽

Zhinxin Wang ◽

Zhuoyong Zhang ◽

Xing Yuan ◽

Peter de B. Harrington

Keyword(s):

Neural Networks ◽

Support Vector Machine ◽

Support Vector ◽

Fathead Minnows ◽

Qsar Study

Statistical neural networks and support vector machine for the classification of genetic mutations in ovarian cancer

IGARSS 2004 2004 IEEE International Geoscience and Remote Sensing (IEEE Cat No 04CH37612) CIBCB-04 ◽

10.1109/cibcb.2004.1393946 ◽

2005 ◽

Cited By ~ 3

Author(s):

M.S.B. Sehgal ◽

I. Gondal ◽

L. Dooley

Keyword(s):

Neural Networks ◽

Ovarian Cancer ◽

Support Vector Machine ◽

Support Vector ◽

Genetic Mutations ◽

Statistical Neural Networks

Intelligent Prediction of Differential Pipe Sticking by Support Vector Machine Compared With Conventional Artificial Neural Networks: An Example of Iranian Offshore Oil Fields

SPE Drilling & Completion ◽

10.2118/163062-pa ◽

2012 ◽

Vol 27 (04) ◽

pp. 586-595 ◽

Cited By ~ 18

Author(s):

Reza Jahanbakhshi ◽

Reza Keshavarzi ◽

Mahdi Aliyari Shoorehdeli ◽

Abolqasem Emamzadeh

Keyword(s):

Neural Networks ◽

Support Vector Machine ◽

Artificial Neural Networks ◽

Oil Fields ◽

Support Vector ◽

Artificial Neural ◽

Offshore Oil

IMPROVEMENT OF DOSE ESTIMATION PROCESS USING ARTIFICIAL NEURAL NETWORKS

Radiation Protection Dosimetry ◽

10.1093/rpd/ncy185 ◽

2018 ◽

Vol 184 (1) ◽

pp. 36-43 ◽

Cited By ~ 3

Author(s):

Gal Amit ◽

Hanan Datz

Keyword(s):

Neural Networks ◽

Support Vector Machine ◽

Artificial Neural Networks ◽

Support Vector ◽

Physical Parameters ◽

Machine Method ◽

Algorithm Performance ◽

Artificial Neural ◽

First Time ◽

Automatic Algorithm

We present here for the first time a fast and reliable automatic algorithm based on artificial neural networks for anomaly detection in thermoluminescence dosemeter (TLD) glow curves (GCs), and compare its performance with a formerly developed support vector machine method. The GC shape of a TLD depends on numerous physical parameters, which may significantly affect it. When integrated into a dosimetry laboratory, this automatic algorithm can classify ‘anomalous’ (having any kind of anomaly) GCs for manual review, and ‘regular’ (acceptable) GCs for automatic analysis. The performance of the new algorithm is then compared with two kinds of formerly developed support vector machine classifiers (regular and weighted) using three different metrics. Results show an impressive accuracy rate of 97% in classifying TLD GCs into the two classes.
