Classification of Music Mood Using MPEG-7 Audio Features and SVM with Confidence Interval

2018 ◽  
Vol 27 (05) ◽  
pp. 1850016 ◽  
Author(s):  
Riyanarto Sarno ◽  
Johanes Andre Ridoean ◽  
Dwi Sunaryono ◽  
Dedy Rahman Wijaya

Psychologically, music can affect human mood and influence human behavior. In this paper, a novel method for music mood classification is introduced. In the experiment, music mood classification was performed using feature extraction based on the MPEG-7 (ISO/IEC 15938) standard for describing multimedia content. This feature extraction yields 17 low-level descriptors, of which the Audio Power, Audio Harmonicity, and Audio Spectrum Projection features were used here. Moreover, the discrete wavelet transform (DWT) was utilized for audio signal reconstruction. The reconstructed audio signals were classified by the new method, which uses a support vector machine with a confidence interval (SVM-CI). According to the experimental results, the success rate of the proposed method was satisfactory and SVM-CI outperformed the ordinary SVM.
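
A minimal sketch of the preprocessing and baseline classification steps described above, assuming PyWavelets and scikit-learn; the frame length, wavelet choice (db4), AudioPower-style descriptor, and the plain SVM baseline are illustrative assumptions, and the SVM-CI variant itself is not reproduced here.

```python
import numpy as np
import pywt
from sklearn.svm import SVC

def audio_power(signal, frame_len=1024, hop=512):
    """Mean squared amplitude per frame (an AudioPower-style low-level descriptor)."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    return np.array([np.mean(f ** 2) for f in frames])

def dwt_reconstruct(signal, wavelet="db4", level=4):
    """Multilevel DWT, zero the detail coefficients, reconstruct the approximation."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    coeffs = [coeffs[0]] + [np.zeros_like(c) for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)

rng = np.random.default_rng(0)
clip = rng.normal(size=44100)                 # placeholder 1 s audio clip
power = audio_power(dwt_reconstruct(clip))    # descriptor computed on the reconstructed signal
X = rng.normal(size=(40, 8))                  # placeholder per-clip feature vectors
y = rng.integers(0, 4, size=40)               # placeholder mood labels
clf = SVC(kernel="rbf").fit(X, y)             # plain SVM baseline, not the SVM-CI variant
print(power[:3], clf.predict(X[:3]))
```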

2019 ◽  
Vol 9 (8) ◽  
pp. 201 ◽  
Author(s):  
Ji ◽  
Ma ◽  
Dong ◽  
Zhang

The classification recognition rate of motor imagery is a key factor in improving the performance of a brain–computer interface (BCI). Thus, we propose a feature extraction method based on the discrete wavelet transform (DWT), empirical mode decomposition (EMD), and approximate entropy. Firstly, the electroencephalogram (EEG) signal is decomposed into a series of narrow-band signals with the DWT; each sub-band signal is then decomposed with EMD to obtain a set of stationary time series called intrinsic mode functions (IMFs). Secondly, the appropriate IMFs for signal reconstruction are selected, and the approximate entropy of the reconstructed signal is computed as the corresponding feature vector. Finally, a support vector machine (SVM) is used to perform the classification. The proposed method solves the problem of wide frequency band coverage during EMD and further improves the classification accuracy of EEG motor imagery signals.
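
A minimal sketch of two of the steps above (DWT sub-band decomposition and approximate entropy), assuming PyWavelets; the EMD/IMF-selection step is omitted (a package such as PyEMD could provide it), and the wavelet, level, and ApEn parameters (m, r) are illustrative assumptions.

```python
import numpy as np
import pywt

def dwt_subbands(eeg, wavelet="db4", level=4):
    """Multilevel DWT: returns [approximation, detail_level, ..., detail_1]."""
    return pywt.wavedec(eeg, wavelet, level=level)

def approximate_entropy(x, m=2, r_factor=0.2):
    """Approximate entropy (Pincus): regularity measure of a 1-D time series."""
    x = np.asarray(x, dtype=float)
    r = r_factor * np.std(x)
    def phi(m):
        emb = np.array([x[i:i + m] for i in range(len(x) - m + 1)])
        dist = np.max(np.abs(emb[:, None, :] - emb[None, :, :]), axis=2)
        return np.mean(np.log(np.mean(dist <= r, axis=1)))
    return phi(m) - phi(m + 1)

eeg = np.random.default_rng(0).normal(size=1024)   # placeholder EEG segment
bands = dwt_subbands(eeg)
print([round(approximate_entropy(b), 3) for b in bands])  # one ApEn value per sub-band
```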


Author(s):  
Nilava Mukherjee ◽  
Sumitra Mukhopadhyay ◽  
Rajarshi Gupta

Motivation: In recent times, mental stress detection using physiological signals has received widespread attention from the technology research community. Although many motivating research works have already been reported in this area, evidence of hardware implementation is occasional. The main challenges in stress detection research are using an optimum number of physiological signals and achieving real-time detection with a low-complexity algorithm. Objective: In this work, a real-time stress detection technique is presented which utilises only the photoplethysmogram (PPG) signal to achieve improved accuracy over multi-signal-based mental stress detection techniques. Methodology: A short 5 s segment of the PPG signal was used for feature extraction with an autoencoder (AE), and the features were minimized using recursive feature elimination (RFE) integrated with a multi-class support vector machine (SVM) classifier. Results: The proposed AE-RFE-SVM based mental stress detection technique was tested on the WESAD dataset to detect four levels of mental state, viz. baseline, amusement, meditation and stress, achieving an overall accuracy, F1 score and sensitivity of 99%, 0.99 and 98%, respectively, for 5 s PPG data. The technique provided improved performance over discrete wavelet transform (DWT) based feature extraction followed by classification with any of five classifiers, viz. SVM, random forest (RF), k-nearest neighbour (k-NN), linear regression (LR) and decision tree (DT). The technique was translated onto quad-core-based standalone hardware (1.2 GHz, 1 GB RAM). The resulting hardware prototype achieves low latency (~0.4 s) and a low memory requirement (~1.7 MB). Conclusion: The present technique can be extended to develop remote healthcare systems using wearable sensors.
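
A minimal sketch of the RFE-plus-SVM stage described above, assuming scikit-learn; the feature vectors are placeholders standing in for an autoencoder's bottleneck output, and the feature dimensionality, number of selected features, and kernels are illustrative assumptions.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 64))        # placeholder AE bottleneck features (64-d per 5 s PPG segment)
y = rng.integers(0, 4, size=200)      # 4 classes: baseline / amusement / meditation / stress

svm_ranker = SVC(kernel="linear")     # linear kernel exposes coef_ for RFE feature ranking
model = make_pipeline(RFE(svm_ranker, n_features_to_select=16),  # prune to 16 features
                      SVC(kernel="rbf"))                          # final multi-class SVM
model.fit(X, y)
print(model.score(X, y))              # resubstitution accuracy on toy data
```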


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Hima Bindu Valiveti ◽  
Anil Kumar B. ◽  
Lakshmi Chaitanya Duggineni ◽  
Swetha Namburu ◽  
Swaraja Kuraparthi

Purpose Road accidents, though inadvertent mishaps, can be detected automatically and alerts sent instantly by combining image processing techniques with on-road video surveillance systems. However, relying exclusively on visual information leads to uncertainty, especially under adverse conditions such as night time, dark areas and unfavourable weather (snowfall, rain and fog) that result in faint visibility. The main goal of the proposed work is certainty of accident occurrence. Design/methodology/approach The authors propose a method for detecting road accidents by analyzing audio signals to identify hazardous situations such as tire skidding and car crashes. The motive of this project is to build a simple and complete audio event detection system using signal feature extraction methods to improve its detection accuracy. The experimental analysis is carried out on a publicly available real-time dataset consisting of audio samples of car crashes and tire skidding. Temporal features of the recorded audio signal, such as Energy, Volume and Zero Crossing Rate (ZCR), the spectral features Spectral Centroid, Spectral Spread, Spectral Roll-off factor and Spectral Flux, and the psychoacoustic features Energy Sub-Band Ratio and Gammatonegram are computed. The extracted features are pre-processed, then trained and tested using Support Vector Machine (SVM) and K-Nearest Neighbour (KNN) classification algorithms for exact prediction of accident occurrence over various SNR ranges. The combination of Gammatonegram with temporal and spectral features proves superior to existing detection techniques. Findings Temporal, spectral and psychoacoustic features and the gammatonegram of the recorded audio signal are extracted. A high-level vector is generated based on the centroid, and the extracted features are classified with machine learning algorithms such as SVM, KNN and DT. The audio samples collected have varied SNR ranges, and the accuracy of the classification algorithms is thoroughly tested. Practical implications Denoising of the audio samples for perfect feature extraction was a tedious chore. Originality/value The existing literature cites extraction of temporal and spectral features followed by application of classification algorithms. For perfect classification, the authors chose to construct a high-level vector from all four extracted feature types: temporal, spectral, psychoacoustic and gammatonegram. The classification algorithms are employed on samples collected at varied SNR ranges.
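
A minimal sketch of two of the hand-crafted features named above (zero crossing rate and spectral centroid) and the SVM/KNN comparison, using NumPy and scikit-learn; the frame length, sampling rate, and placeholder clips are illustrative assumptions, and the gammatonegram and remaining psychoacoustic features are not reproduced.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

def zero_crossing_rate(frame):
    """Fraction of adjacent-sample sign changes within one frame."""
    return np.mean(np.abs(np.diff(np.sign(frame))) > 0)

def spectral_centroid(frame, sr):
    """Magnitude-weighted mean frequency of one frame."""
    mag = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return np.sum(freqs * mag) / (np.sum(mag) + 1e-12)

def clip_features(signal, sr=16000, frame_len=1024):
    """Per-clip summary statistics of the frame-level features."""
    frames = signal[: len(signal) // frame_len * frame_len].reshape(-1, frame_len)
    zcr = [zero_crossing_rate(f) for f in frames]
    cen = [spectral_centroid(f, sr) for f in frames]
    return np.array([np.mean(zcr), np.std(zcr), np.mean(cen), np.std(cen)])

rng = np.random.default_rng(0)
X = np.array([clip_features(rng.normal(size=16000)) for _ in range(40)])  # placeholder clips
y = rng.integers(0, 2, size=40)               # 0 = non-accident, 1 = accident (placeholder)
for clf in (SVC(kernel="rbf"), KNeighborsClassifier(n_neighbors=5)):
    print(type(clf).__name__, clf.fit(X, y).score(X, y))
```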


2017 ◽  
Vol 2017 ◽  
pp. 1-8 ◽  
Author(s):  
Chunying Fang ◽  
Haifeng Li ◽  
Lin Ma ◽  
Mancai Zhang

Pathological speech usually refers to speech distortion resulting from illness or other biological insults. The assessment of pathological speech plays an important role in assisting experts, but automatic evaluation of speech intelligibility is difficult because such speech is usually nonstationary and mutational. In this paper, we carry out an independent innovation in feature extraction and reduction, and describe a multigranularity combined feature scheme optimized by a hierarchical visual method. A novel method of generating the feature set based on the S-transform and chaotic analysis is proposed. The set comprises 430 basic acoustic features (BAFS), 84 Mel S-transform cepstrum coefficients (MSCC) as local spectral characteristics, and 12 chaotic features. Finally, a radar chart and the F-score are used to optimize the features through hierarchical visual fusion. The feature set could be reduced from 526 dimensions to 96 dimensions on the NKI-CCRT corpus and to 104 dimensions on the SVD corpus. The experimental results show that the new features classified with a support vector machine (SVM) achieve the best performance, with a recognition rate of 84.4% on the NKI-CCRT corpus and 78.7% on the SVD corpus. The proposed method is thus shown to be effective and reliable for pathological speech intelligibility evaluation.
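
A minimal sketch of an F-score style feature-ranking step like the one mentioned above, in NumPy; the multi-class generalization, the number of retained features, and the placeholder data are illustrative assumptions (the S-transform, MSCC, and chaotic features are not reproduced).

```python
import numpy as np

def f_score(X, y):
    """Per-feature F-score: spread of class means around the overall mean,
    divided by within-class variance (higher = more discriminative)."""
    classes = np.unique(y)
    overall = X.mean(axis=0)
    between = sum((X[y == c].mean(axis=0) - overall) ** 2 for c in classes)
    within = sum(X[y == c].var(axis=0, ddof=1) for c in classes) + 1e-12
    return between / within

rng = np.random.default_rng(0)
y = rng.integers(0, 3, size=120)              # placeholder intelligibility classes
X = rng.normal(size=(120, 526))               # placeholder 526-dimensional feature set
X[:, 0] += y                                  # make feature 0 informative for the demo
scores = f_score(X, y)
keep = np.argsort(scores)[::-1][:96]          # retain the 96 highest-ranked features
print(keep[:5], scores[keep[:5]].round(2))
```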


2003 ◽  
Vol 1840 (1) ◽  
pp. 186-192 ◽  
Author(s):  
Lori Mann Bruce ◽  
Navaneethakrishnan Balraj ◽  
Yunlong Zhang ◽  
Qingyong Yu

A system for automated traffic accident detection in intersections was designed. The input to the system is a 3-s segment of audio signal. The system can be operated in two modes: the two-class and multiclass modes. The output of the two-class mode is a label of “crash” or “noncrash.” In the multiclass mode of operation, the system identifies crashes as well as several types of noncrash incidents, including normal traffic and construction sounds. The system is composed of three main signal processing stages: feature extraction, feature reduction, and classification. Five methods of feature extraction were investigated and compared; these are based on the discrete wavelet transform, fast Fourier transform, discrete cosine transform, real cepstral transform, and mel frequency cepstral transform. Statistical methods are used for feature optimization and classification. Three types of classifiers are investigated and compared; these are the nearest-mean, maximum-likelihood, and nearest-neighbor methods. The results of the study show that the optimum design uses wavelet-based features in combination with the maximum-likelihood classifier. The system is computationally inexpensive relative to the other methods investigated, and the system consistently results in accident detection accuracies of 95% to 100% when the audio signal has a signal-to-noise ratio of at least 0 decibels.
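
A minimal sketch of the simplest of the three classifiers compared above, a nearest-mean classifier, in NumPy; the feature vectors here are placeholders rather than the paper's wavelet-based features for 3-s audio segments.

```python
import numpy as np

class NearestMeanClassifier:
    """Assign each sample to the class whose training-mean feature vector is closest."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self
    def predict(self, X):
        d = np.linalg.norm(X[:, None, :] - self.means_[None, :, :], axis=2)
        return self.classes_[np.argmin(d, axis=1)]

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 12))                 # placeholder features for 3-s audio segments
y = rng.integers(0, 2, size=60)               # 0 = noncrash, 1 = crash (placeholder labels)
clf = NearestMeanClassifier().fit(X, y)
print((clf.predict(X) == y).mean())           # resubstitution accuracy on toy data
```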


2017 ◽  
Vol 10 (2) ◽  
pp. 400-406 ◽  
Author(s):  
Aziz Makandar ◽  
Anita Patrot

Malware consists of malicious instructions that can cause harm through unauthorized private access over the Internet. The number of malware types is increasing daily, making it challenging for antivirus vendors to predict and catch them at access time. This paper aims to design an automated analysis system for malware classes based on features extracted by four-level decomposition of malware images with the Discrete Wavelet Transform (DWT). The proposed system works in three stages: pre-processing, feature extraction and classification. In pre-processing, the input image is normalized to 256x256 and denoised by wavelet filtering, which enhances the image. In feature extraction, the DWT is used to decompose the image to four levels. For classification, support vector machine (SVM) classifiers discriminate the malware classes using statistical features extracted from the level-4 DWT decomposition with Daubechies (db4), Coiflet (coif5) and biorthogonal (bior2.8) wavelets. Among these wavelets, the db4 features classify the malware classes most effectively, with accuracies of 91.05% and 92.53% on the two datasets, respectively. The analysis of the proposed method was conducted on two datasets and the results are promising.
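
A minimal sketch of level-4 2-D DWT feature extraction followed by an SVM, assuming PyWavelets and scikit-learn; the specific statistics, the random placeholder "malware images", and the label set are illustrative assumptions.

```python
import numpy as np
import pywt
from sklearn.svm import SVC

def dwt_stats(image, wavelet="db4", level=4):
    """Four-level 2-D DWT; summary statistics of the level-4 approximation sub-band."""
    cA = pywt.wavedec2(image, wavelet, level=level)[0]
    return np.array([cA.mean(), cA.std(), np.abs(cA).mean(), (cA ** 2).mean()])

rng = np.random.default_rng(0)
images = rng.random(size=(50, 256, 256))      # placeholder 256x256 malware images
y = rng.integers(0, 5, size=50)               # placeholder malware family labels
X = np.array([dwt_stats(img) for img in images])
clf = SVC(kernel="rbf").fit(X, y)
print(clf.score(X, y))                        # resubstitution accuracy on toy data
```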


Sensors ◽  
2020 ◽  
Vol 20 (8) ◽  
pp. 2403
Author(s):  
Jakub Browarczyk ◽  
Adam Kurowski ◽  
Bozena Kostek

The aim of the study is to compare electroencephalographic (EEG) signal feature extraction methods in terms of the effectiveness of the resulting classification of brain activities. For classification, electroencephalographic signals were obtained with an EEG device from 17 subjects in three mental states (relaxation, excitation, and solving a logical task). Blind source separation employing independent component analysis (ICA) was performed on the obtained signals. Welch's method, autoregressive modeling, and the discrete wavelet transform were used for feature extraction. Principal component analysis (PCA) was performed to reduce the dimensionality of the feature vectors. k-Nearest Neighbors (kNN), Support Vector Machines (SVM), and Neural Networks (NN) were employed for classification. Precision, recall and F1 score are reported, together with a discussion based on statistical analysis. The paper also contains the code used in preprocessing and in the main part of the experiments.
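
A minimal sketch of one of the pipelines described above (Welch's method for features, PCA for dimensionality reduction, then an SVM/kNN comparison), assuming SciPy and scikit-learn; the channel count, epoch length, sampling rate, and placeholder signals are illustrative assumptions, and the ICA, autoregressive, and wavelet variants are not shown.

```python
import numpy as np
from scipy.signal import welch
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def welch_features(epoch, fs=256):
    """Concatenate Welch power spectral densities of all EEG channels."""
    return np.concatenate([welch(ch, fs=fs, nperseg=fs)[1] for ch in epoch])

rng = np.random.default_rng(0)
epochs = rng.normal(size=(90, 8, 512))        # placeholder: 90 epochs, 8 channels, 2 s at 256 Hz
y = rng.integers(0, 3, size=90)               # relaxation / excitation / logical task (placeholder)
X = np.array([welch_features(e) for e in epochs])
for clf in (SVC(kernel="rbf"), KNeighborsClassifier(n_neighbors=5)):
    model = make_pipeline(StandardScaler(), PCA(n_components=20), clf)
    print(type(clf).__name__, model.fit(X, y).score(X, y))
```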


Sensors ◽  
2018 ◽  
Vol 18 (10) ◽  
pp. 3418 ◽  
Author(s):  
Juan Vera-Diaz ◽  
Daniel Pizarro ◽  
Javier Macias-Guarasa

This paper presents a novel approach for indoor acoustic source localization using microphone arrays, based on a Convolutional Neural Network (CNN). In the proposed solution, the CNN is designed to directly estimate the three-dimensional position of a single acoustic source, using the raw audio signal as the input information and avoiding the use of hand-crafted audio features. Given the limited amount of available localization data, we propose a two-step training strategy. We first train our network using semi-synthetic data generated from close-talk speech recordings, simulating the time delays and distortion suffered by the signal as it propagates from the source to the microphone array. We then fine-tune this network using a small amount of real data. Our experimental results, evaluated on a publicly available dataset recorded in a real room, show that this approach produces networks that significantly improve on existing localization methods based on SRP-PHAT strategies, as well as on very recent proposals based on Convolutional Recurrent Neural Networks (CRNN). In addition, our experiments show that the performance of our CNN method does not depend significantly on the speaker's gender or on the size of the signal window used.
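
A minimal sketch of a raw-audio CNN regressor of the kind described above, in PyTorch; the layer sizes, kernel widths, microphone count, and window length are illustrative assumptions rather than the authors' architecture.

```python
import torch
import torch.nn as nn

class RawAudioLocalizer(nn.Module):
    """1-D CNN mapping a raw multi-channel audio window to an (x, y, z) source position."""
    def __init__(self, n_mics=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_mics, 32, kernel_size=64, stride=4), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=32, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(64, 3),                  # regress the 3-D coordinates
        )
    def forward(self, x):                      # x: (batch, n_mics, samples)
        return self.net(x)

model = RawAudioLocalizer()
window = torch.randn(2, 4, 16000)              # 2 windows, 4 mics, 1 s at 16 kHz (placeholders)
pred = model(window)                           # predicted 3-D source positions
loss = nn.MSELoss()(pred, torch.zeros_like(pred))  # regression loss against (placeholder) ground truth
print(pred.shape, loss.item())
```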


2020 ◽  
Vol 9 (6) ◽  
pp. 2404-2410
Author(s):  
Khairul Anam ◽  
Cries Avian ◽  
Muhammad Nuh

Brain-computer interface (BCI) technology connects humans with machines via electroencephalography (EEG). The mechanism of BCI is pattern recognition, which proceeds by feature extraction and classification. Various feature extraction and classification methods can differentiate human motor movements, especially those of the hand, and combinations of these methods can greatly improve the accuracy of the results. This article explores the performance of nine feature-extraction types classified by a multilayer extreme learning machine (ML-ELM). The proposed method was tested on different numbers of EEG channels and different ML-ELM structures. Moreover, the performance of the ML-ELM was compared with those of the ELM, Support Vector Machine and Naive Bayes in classifying real and imaginary hand movements in offline mode. The ML-ELM with the discrete wavelet transform (DWT) as feature extraction outperformed the other classification methods, with a highest accuracy of 0.98. The authors also found that the ML-ELM structure influenced accuracy depending on the task, the feature extraction used, and the channels used.
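
A minimal sketch of a single-hidden-layer extreme learning machine in NumPy (the paper stacks several such layers into an ML-ELM); the hidden-layer size, tanh activation, and placeholder DWT-style features are illustrative assumptions.

```python
import numpy as np

class ELMClassifier:
    """Single-hidden-layer ELM: random input weights, closed-form output weights."""
    def __init__(self, n_hidden=128, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)
    def fit(self, X, y):
        self.classes_, y_idx = np.unique(y, return_inverse=True)
        T = np.eye(len(self.classes_))[y_idx]             # one-hot targets
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)                  # random (untrained) hidden layer
        self.beta = np.linalg.pinv(H) @ T                 # least-squares output weights
        return self
    def predict(self, X):
        H = np.tanh(X @ self.W + self.b)
        return self.classes_[np.argmax(H @ self.beta, axis=1)]

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 32))                # placeholder DWT features from EEG trials
y = rng.integers(0, 2, size=150)              # real vs. imaginary hand movement (placeholder)
clf = ELMClassifier().fit(X, y)
print((clf.predict(X) == y).mean())           # resubstitution accuracy on toy data
```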

