Authorship Identification of a Russian-Language Text Using Support Vector Machine and Deep Neural Networks

2020 ◽  
Vol 13 (1) ◽  
pp. 3
Author(s):  
Aleksandr Romanov ◽  
Anna Kurtukova ◽  
Alexander Shelupanov ◽  
Anastasia Fedotova ◽  
Valery Goncharov

The article explores approaches to determining the author of a natural language text and the advantages and disadvantages of these approaches. The problem is important because of the active digitalization of society and the shift of most everyday activities online. Authorship identification methods are particularly useful for information security and forensics. For example, such methods can be used to identify the authors of suicide notes and of other texts subjected to forensic examination. Another area of application is plagiarism detection, which is relevant both to intellectual property protection in the digital space and to the educational process. The article describes identifying the author of a Russian-language text using a support vector machine (SVM) and deep neural network architectures (long short-term memory (LSTM), convolutional neural networks (CNN) with attention, and Transformer). The results show that all the considered algorithms are suitable for solving the authorship identification problem, but SVM shows the best accuracy, reaching 96% on average. This is due to carefully chosen parameters and a feature space that includes statistical and semantic features (including those extracted through aspect analysis). Deep neural networks are inferior to SVM in accuracy and reach only 93%. The study also evaluates the impact of attacks on the method on the models’ accuracy. Experiments show that the SVM-based methods are not robust to deliberate text anonymization. In comparison, the loss in accuracy of deep neural networks does not exceed 20%. The Transformer architecture is the most effective for anonymized texts and achieves 81% accuracy.

Information ◽  
2020 ◽  
Vol 11 (12) ◽  
pp. 589 ◽  
Author(s):  
Aleksandr Sergeevich Romanov ◽  
Anna Vladimirovna Kurtukova ◽  
Artem Alexandrovich Sobolev ◽  
Alexander Alexandrovich Shelupanov ◽  
Anastasia Mikhailovna Fedotova

This paper addresses the problem of determining the age of the author of a text using deep neural network models. The article presents an analysis of methods for determining the age of the author of a text and of approaches to determining the age of a user from a photo. The latter could help solve the problem of inaccurate training data by filtering out incorrect user-specified ages. A detailed description of the authors’ technique based on deep neural network models and an interpretation of the results are also presented. The study found that the proposed technique achieved 82% accuracy in determining the age of the author from Russian-language text, which makes it competitive with approaches for other languages.


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Lei Yang ◽  
Hongdong Zhao

Sound classification is a broad area of research that has gained much attention in recent years. Sound classification systems based on recurrent neural networks (RNNs) and convolutional neural networks (CNNs) have significantly improved the recognition capability of models. However, their computational complexity and inadequate exploration of global dependencies in long sequences restrict improvements in their classification results. In this paper, we show that there are still opportunities to improve the performance of sound classification by replacing the recurrent architecture with a parallel processing structure in the feature extraction. In light of the small-scale, high-dimensional sound datasets, we propose the use of multihead attention and a support vector machine (SVM) for sound taxonomy. The multihead attention serves as the feature extractor to obtain salient features, and the SVM serves as the classifier to recognize all categories. Extensive experiments are conducted on three acoustically characterized public datasets, UrbanSound8K, GTZAN, and IEMOCAP, using two commonly used audio spectrograms as inputs, and we fully evaluate the impact of parameters and feature types on classification accuracy. Our results suggest that the proposed model can reach performance comparable to existing methods and reveal its strong generalization ability for sound taxonomy.


Sensors ◽  
2021 ◽  
Vol 21 (3) ◽  
pp. 676
Author(s):  
Andrej Zgank

Acoustic monitoring of animal activity is becoming one of the necessary tools in agriculture, including beekeeping. It can assist in the control of beehives in remote locations. Such approaches make it possible to classify bee swarm activity from audio signals. This paper proposes an IoT-based acoustic swarm classification using deep neural networks (DNNs). Audio recordings were obtained from the Open Source Beehive project. Mel-frequency cepstral coefficient features were extracted from the audio signal. The lossless WAV and lossy MP3 audio formats were compared for IoT-based solutions. The impact of the deep neural network parameters on the classification results was analyzed. The best overall classification accuracy with uncompressed audio was 94.09%, but MP3 compression degraded the DNN accuracy by over 10%. The evaluation of the proposed IoT-based bee activity acoustic classification with deep neural networks showed improved results compared with the previous hidden Markov model system.


SLEEP ◽  
2021 ◽  
Vol 44 (Supplement_2) ◽  
pp. A164-A164
Author(s):  
Pahnwat Taweesedt ◽  
JungYoon Kim ◽  
Jaehyun Park ◽  
Jangwoon Park ◽  
Munish Sharma ◽  
...  

Introduction: Obstructive sleep apnea (OSA) is a common sleep-related breathing disorder estimated to affect one billion people. Full-night polysomnography is considered the gold standard for OSA diagnosis. However, it is time-consuming, expensive and not readily available in many parts of the world. Many screening questionnaires and scores have been proposed for OSA prediction, with high sensitivity and low specificity. The present study is intended to develop models with various machine learning techniques to predict the severity of OSA by incorporating features from multiple questionnaires. Methods: Subjects who underwent full-night polysomnography in Torr sleep center, Texas and completed 5 OSA screening questionnaires/scores were included. OSA was diagnosed using an Apnea-Hypopnea Index ≥ 5. We trained five different machine learning models: Deep Neural Networks with scaled principal component analysis (DNN-PCA), Random Forest (RF), Adaptive Boosting classifier (ABC), K-Nearest Neighbors classifier (KNC) and Support Vector Machine Classifier (SVMC). A training:testing subject ratio of 65:35 was used. All features, including demographic data, body measurements, and snoring and sleepiness history, were obtained from the 5 OSA screening questionnaires/scores (STOP-BANG questionnaire, Berlin questionnaire, NoSAS score, NAMES score and No-Apnea score). Performance metrics were used to compare the machine learning models. Results: Of 180 subjects, 51.5% were male, with a mean (SD) age of 53.6 (15.1). One hundred and nineteen subjects were diagnosed with OSA. The Area Under the Receiver Operating Characteristic Curve (AUROC) of DNN-PCA, RF, ABC, KNC, SVMC, STOP-BANG questionnaire, Berlin questionnaire, NoSAS score, NAMES score, and No-Apnea score was 0.85, 0.68, 0.52, 0.74, 0.75, 0.61, 0.63, 0.61, 0.58 and 0.58, respectively. DNN-PCA showed the highest AUROC, with a sensitivity of 0.79, specificity of 0.67, positive predictive value of 0.93, F1 score of 0.86, and accuracy of 0.77. Conclusion: Our results showed that DNN-PCA outperforms OSA screening questionnaires, scores and other machine learning models.
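The sensitivity, specificity, positive predictive value, F1 score and accuracy reported in this abstract are all derived from the four cells of a binary confusion matrix. The following is a minimal sketch of those definitions, not the study's code: the function name `screening_metrics` and the counts passed to it are hypothetical and chosen only for illustration, since the abstract does not give the underlying confusion matrix.

```python
def screening_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp/fn: subjects with the condition who were / were not flagged.
    tn/fp: subjects without the condition who were / were not cleared.
    """
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)  # harmonic mean of PPV and sensitivity
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "f1": f1,
        "accuracy": accuracy,
    }


# Hypothetical counts for illustration only (NOT taken from the study).
m = screening_metrics(tp=40, fp=5, tn=15, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.2f}")
```

Note that when most test subjects have the condition, as in a cohort where the majority are diagnosed with OSA, the positive predictive value can be high even when specificity is modest.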


Author(s):  
Jia-Bin Zhou ◽  
Yan-Qin Bai ◽  
Yan-Ru Guo ◽  
Hai-Xiang Lin

In general, data contain noise that comes from faulty instruments, flawed measurements or faulty communication. Learning from data in the context of classification or regression is inevitably affected by this noise. To remove or greatly reduce its impact, we introduce the ideas of fuzzy membership functions and the Laplacian twin support vector machine (Lap-TSVM). A formulation of the linear intuitionistic fuzzy Laplacian twin support vector machine (IFLap-TSVM) is presented. Moreover, we extend the linear IFLap-TSVM to the nonlinear case by means of a kernel function. The proposed IFLap-TSVM addresses the negative impact of noise and outliers by using fuzzy membership functions, and it is a more accurate and reasonable classifier because it uses the geometric distribution information of labeled and unlabeled data based on manifold regularization. Experiments with constructed artificial datasets, several UCI benchmark datasets and the MNIST dataset show that the IFLap-TSVM has better classification accuracy than other state-of-the-art methods: the twin support vector machine (TSVM), the intuitionistic fuzzy twin support vector machine (IFTSVM) and Lap-TSVM.


2008 ◽  
Vol 381-382 ◽  
pp. 439-442
Author(s):  
Qi Wang ◽  
Zhi Gang Feng ◽  
K. Shida

A least squares support vector machine (LS-SVM) combined with a niche genetic algorithm (NGA) is proposed for nonlinear sensor dynamic modeling. Compared with neural networks, the LS-SVM can overcome the shortcomings of local minima and overfitting, and has higher generalization performance. The sharing-function-based niche genetic algorithm is used to select the LS-SVM parameters automatically. The effectiveness and reliability of this method are demonstrated in two examples. The results show that this approach avoids the blindness of manual selection of LS-SVM parameters and remains effective even if the sensor dynamic model is highly nonlinear.
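For readers unfamiliar with the LS-SVM, its regression form replaces the quadratic program of a standard SVM with a single linear system in the bias `b` and the coefficients `alpha`. The sketch below illustrates that standard formulation on toy one-dimensional data. It is not the paper's implementation: the RBF kernel, the helper names (`rbf`, `solve`, `lssvm_fit`, `lssvm_predict`), the sine-wave data, and the hand-picked values of `gamma` and `sigma` are all assumptions made for illustration. In the paper these parameters are selected by the niche genetic algorithm, which is not shown here.

```python
import math


def rbf(a, b, sigma):
    """Gaussian (RBF) kernel between two scalars; the kernel choice is an assumption."""
    return math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2))


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def lssvm_fit(xs, ys, gamma, sigma):
    """Fit LS-SVM regression by solving
        [ 0   1^T         ] [ b     ]   [ 0 ]
        [ 1   K + I/gamma ] [ alpha ] = [ y ]
    where K is the kernel matrix and gamma the regularization parameter.
    """
    n = len(xs)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0] + [rbf(xs[i], xs[j], sigma) for j in range(n)]
        row[i + 1] += 1.0 / gamma  # ridge term on the diagonal of K
        A.append(row)
    sol = solve(A, [0.0] + list(ys))
    return sol[0], sol[1:]  # bias b, coefficients alpha


def lssvm_predict(x, xs, alphas, b, sigma):
    """Evaluate f(x) = sum_i alpha_i * K(x, x_i) + b."""
    return b + sum(a * rbf(x, xi, sigma) for a, xi in zip(alphas, xs))


# Toy data for illustration only: samples of a sine wave.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [math.sin(x) for x in xs]
b, alphas = lssvm_fit(xs, ys, gamma=1e6, sigma=1.0)
print(lssvm_predict(2.5, xs, alphas, b, sigma=1.0))
```

Because training reduces to one linear solve, there are no local minima to escape. The quality of the fit still depends on `gamma` and the kernel width `sigma`, which is why an automatic search over these parameters is attractive.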


2018 ◽  
Vol 184 (1) ◽  
pp. 36-43 ◽  
Author(s):  
Gal Amit ◽  
Hanan Datz

We present here for the first time a fast and reliable automatic algorithm based on artificial neural networks for anomaly detection in thermoluminescence dosemeter (TLD) glow curves (GCs), and compare its performance with a previously developed support vector machine method. The GC shape of a TLD depends on numerous physical parameters, which may significantly affect it. When integrated into a dosimetry laboratory, this automatic algorithm can classify GCs as ‘anomalous’ (having any kind of anomaly) for manual review or as ‘regular’ (acceptable) for automatic analysis. The performance of the new algorithm is then compared with two kinds of previously developed support vector machine classifiers, regular and weighted, using three different metrics. Results show an impressive accuracy rate of 97% for TLD GCs correctly classified into either of the classes.

