Automatic Spatial Audio Scene Classification in Binaural Recordings of Music

2019 ◽  
Vol 9 (9) ◽  
pp. 1724 ◽  
Author(s):  
Sławomir K. Zieliński ◽  
Hyunkook Lee

The aim of the study was to develop a method for automatic classification of three spatial audio scenes, differing in the horizontal distribution of foreground and background audio content around a listener in binaurally rendered recordings of music. For the purpose of the study, audio recordings were synthesized using thirteen sets of binaural room impulse responses (BRIRs), representing the room acoustics of both semi-anechoic and reverberant venues. Head movements were not considered in the study. The proposed method was assumption-free with regard to the number and characteristics of the audio sources. A least absolute shrinkage and selection operator (LASSO) was employed as a classifier. According to the results, it is possible to automatically identify the spatial scenes using a combination of binaural and spectro-temporal features. The method exhibits satisfactory classification accuracy when it is trained and then tested on different stimuli synthesized using the same BRIRs (accuracy ranging from 74% to 98%), even in highly reverberant conditions. However, the generalizability of the method needs to be further improved. This study demonstrates that, in addition to the binaural cues, the Mel-frequency cepstral coefficients constitute an important carrier of spatial information, imperative for the classification of spatial audio scenes.
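A LASSO-type classifier of the kind described above can be approximated with l1-penalised logistic regression, which applies the same coefficient shrinkage and implicit feature selection. The sketch below is not the authors' pipeline: the feature layout, class structure, and regularisation strength are all illustrative assumptions, with random vectors standing in for the binaural and spectro-temporal features.

```python
# Sketch: l1-penalised (LASSO-style) classification of three synthetic
# "scene" classes. Feature values are random stand-ins, not real
# binaural/MFCC features.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_per_class, n_features = 50, 40

# Three classes; only a few features per class are informative, which is
# the situation l1 shrinkage is designed to exploit.
X, y = [], []
for label in range(3):
    centre = np.zeros(n_features)
    centre[label * 3:label * 3 + 3] = 2.0   # informative features for this class
    X.append(centre + rng.normal(size=(n_per_class, n_features)))
    y += [label] * n_per_class
X, y = np.vstack(X), np.array(y)

# l1 penalty zeroes out uninformative coefficients (feature selection)
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
clf.fit(X, y)

accuracy = clf.score(X, y)
sparsity = np.mean(clf.coef_ == 0)          # fraction of zeroed weights
```

The `sparsity` value shows the selection effect: most of the 120 coefficients are driven exactly to zero, leaving a small informative subset.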

Author(s):  
Victor O. Adegboye ◽  
Jason H. Rife

Abstract Whilst extensive work has been done on fault detection in bearings using sound, very little has been accomplished with other machine components and machinery, partly due to the scarcity of datasets. The recent release of the Malfunctioning Industrial Machine Investigation and Inspection (MIMII) dataset opens the opportunity for research into malfunctioning machines such as pumps, fans, slide rails, and valves. In this paper, we compare common features from audio recordings to investigate which best support the classification of malfunctioning pumps. We evaluate our results using the Area Under the Curve (AUC) as a performance metric and determine that the log mel spectrum is a very useful feature, at least for this dataset, but that other features can enhance detection performance when ambient noise is present (improving AUC from 0.88 to 0.94 in one case). We also find that Mel-Frequency Cepstral Coefficients (MFCC) perform substantially worse as features than a sampled mel spectrogram.
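The AUC metric used above can be computed directly with scikit-learn. The anomaly scores below are synthetic stand-ins; in a real pipeline they would come from a detector operating on log-mel or MFCC features of the MIMII recordings. The point of the sketch is only how AUC rewards a feature that separates normal from malfunctioning scores more cleanly.

```python
# Sketch: ROC AUC as the detection metric. Scores are synthetic; a
# stronger "feature" produces better-separated score distributions.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
labels = np.array([0] * 100 + [1] * 100)    # 0 = normal, 1 = malfunctioning

# Weak feature: score distributions overlap heavily
scores_weak = np.concatenate([rng.normal(0.0, 1.0, 100),
                              rng.normal(1.0, 1.0, 100)])
# Strong feature: distributions are well separated
scores_strong = np.concatenate([rng.normal(0.0, 1.0, 100),
                                rng.normal(2.5, 1.0, 100)])

auc_weak = roc_auc_score(labels, scores_weak)
auc_strong = roc_auc_score(labels, scores_strong)
```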


2019 ◽  
Vol 62 (9) ◽  
pp. 3265-3275
Author(s):  
Heather L. Ramsdell-Hudock ◽  
Anne S. Warlaumont ◽  
Lindsey E. Foss ◽  
Candice Perry

Purpose To better enable communication among researchers, clinicians, and caregivers, we aimed to assess how untrained listeners classify early infant vocalization types in comparison to terms currently used by researchers and clinicians. Method Listeners were caregivers with no prior formal education in speech and language development. A 1st group of listeners reported on clinician/researcher-classified vowel, squeal, growl, raspberry, whisper, laugh, and cry vocalizations obtained from archived video/audio recordings of 10 infants from 4 through 12 months of age. A list of commonly used terms was generated based on listener responses and the standard research terminology. A 2nd group of listeners was presented with the same vocalizations and asked to select the terms from the list that they thought best described the sounds. Results Classifications of the vocalizations by listeners largely overlapped with published categorical descriptors and yielded additional insight into alternate terms in common use. The biggest discrepancies were found for the vowel category. Conclusion Prior research has shown that caregivers are accurate in identifying canonical babbling, a major prelinguistic vocalization milestone occurring at about 6–7 months of age. The present findings indicate that caregivers are also well attuned to even earlier emerging vocalization types. This supports the value of continuing basic and clinical research on the vocal types infants produce in the 1st months of life and on their potential diagnostic utility, and it may also help improve communication between speech-language pathologists and families.


2014 ◽  
Vol 2014 ◽  
pp. 1-12 ◽  
Author(s):  
Manab Kumar Das ◽  
Samit Ari

Classification of electrocardiogram (ECG) signals plays an important role in the clinical diagnosis of heart disease. This paper proposes the design of an efficient system for classification of the normal beat (N), ventricular ectopic beat (V), supraventricular ectopic beat (S), fusion beat (F), and unknown beat (Q) using a mixture of features. Two different feature extraction methods are proposed for classification of ECG beats: (i) S-transform (ST) based features along with temporal features and (ii) a mixture of ST and wavelet transform (WT) based features along with temporal features. Each extracted feature set is independently classified using a multilayer perceptron neural network (MLPNN). The performances are evaluated on several normal and abnormal ECG signals from 44 recordings of the MIT-BIH arrhythmia database. In this work, the performances of three feature extraction techniques with the MLPNN classifier are compared using the five ECG beat classes recommended by the AAMI (Association for the Advancement of Medical Instrumentation) standard. The average sensitivities of the proposed feature extraction technique for N, S, F, V, and Q are 95.70%, 78.05%, 49.60%, 89.68%, and 33.89%, respectively. The experimental results demonstrate that the proposed feature extraction techniques perform better than other existing feature extraction techniques.
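The per-class sensitivity (recall) reported above is the diagonal of the confusion matrix divided by the row sums. The counts below are synthetic, chosen only so that the diagonal recalls roughly match the reported figures; they are not the paper's actual confusion matrix.

```python
# Sketch: per-class sensitivity (recall) for the five AAMI beat classes.
# confusion[i, j] = beats of true class i predicted as class j (made-up counts).
import numpy as np

classes = ["N", "S", "F", "V", "Q"]
confusion = np.array([
    [9570,  200,  100,  100,   30],
    [ 150,  780,   30,   30,   10],
    [ 200,  100,  496,  150,   54],
    [ 600,  300,  100, 8968,   32],
    [ 300,  150,  100,  111,  339],
])

# Sensitivity of class i = correctly classified / total beats of class i
sensitivity = np.diag(confusion) / confusion.sum(axis=1)
for name, s in zip(classes, sensitivity):
    print(f"{name}: {s:.2%}")
```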


Author(s):  
D. Akbari ◽  
M. Moradizadeh ◽  
M. Akbari

Abstract. This paper describes a new framework for classification of hyperspectral images based on both spectral and spatial information. The spatial information is obtained by an enhanced Marker-based Hierarchical Segmentation (MHS) algorithm. The hyperspectral data is first fed into the Multi-Layer Perceptron (MLP) neural network classification algorithm. Then, the MHS algorithm is applied in order to increase the accuracy of less accurately classified land-cover types. In the proposed approach, the markers are extracted from the classification maps obtained by the MLP and Support Vector Machine (SVM) classifiers. Experimental results on the Washington DC Mall hyperspectral dataset demonstrate the superiority of the proposed approach compared to the MLP and the original MHS algorithms.
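One plausible reading of the marker-extraction step is to take as markers only the pixels on which the two classification maps agree. The sketch below assumes that agreement rule and uses tiny hand-made label maps; it is an illustration of the idea, not the paper's exact procedure.

```python
# Sketch: candidate markers = pixels where MLP and SVM label maps agree.
# Both maps are tiny made-up examples; -1 marks "no marker here".
import numpy as np

mlp_map = np.array([[0, 0, 1],
                    [1, 1, 2],
                    [2, 2, 2]])
svm_map = np.array([[0, 1, 1],
                    [1, 1, 2],
                    [0, 2, 2]])

markers = np.where(mlp_map == svm_map, mlp_map, -1)
```

Pixels where the classifiers disagree are left unmarked and their labels are decided later by the hierarchical segmentation.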


TecnoLógicas ◽  
2019 ◽  
Vol 22 (46) ◽  
pp. 1-14 ◽  
Author(s):  
Jorge Luis Bacca ◽  
Henry Arguello

Spectral image clustering is an unsupervised classification method that identifies distributions of pixels using spectral information, without requiring a previous training stage. Sparse subspace clustering (SSC) based methods assume that hyperspectral images lie in the union of multiple low-dimensional subspaces. Using this assumption, SSC groups spectral signatures in different subspaces, expressing each spectral signature as a sparse linear combination of all pixels and ensuring that the non-zero elements belong to the same class. Although these methods have shown good accuracy for unsupervised classification of hyperspectral images, their computational complexity becomes intractable as the number of pixels increases, i.e., when the spatial dimension of the image is large. For this reason, this paper proposes to reduce the number of pixels to be clustered in the hyperspectral image and then to recover the clustering results for the removed pixels by exploiting the spatial information. Specifically, this work proposes two methodologies to remove pixels: the first is based on a spatial blue-noise distribution, which reduces the probability of removing clusters of neighboring pixels, and the second is a sub-sampling procedure that eliminates every two contiguous pixels, preserving the spatial structure of the scene. The performance of the proposed spectral image clustering framework is evaluated on three datasets, showing that similar accuracy is obtained when up to 50% of the pixels are removed; in addition, it is up to 7.9 times faster than clustering the complete datasets.
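A minimal sketch of the subsample-then-fill idea, under the simplifying assumption that the structured sub-sampling behaves like a checkerboard decimation keeping 50% of the pixels so that every removed pixel has a kept horizontal neighbour. The label map and the neighbour fill rule are illustrative assumptions, not the paper's exact procedure.

```python
# Sketch: remove 50% of pixels in a checkerboard pattern, then fill each
# removed pixel's cluster label from a kept horizontal neighbour.
import numpy as np

rng = np.random.default_rng(2)
h, w = 8, 8
labels_full = rng.integers(0, 3, size=(h, w))    # stand-in cluster map

# Keep mask: checkerboard retains exactly half of the pixels
keep = (np.add.outer(np.arange(h), np.arange(w)) % 2) == 0

# "Cluster" only the kept pixels (here we simply copy their labels),
# leaving removed pixels marked as -1.
labels_est = np.where(keep, labels_full, -1)

# Spatial fill: every removed pixel has a kept neighbour one step left
# (or right, at the image border), so one pass suffices.
for i in range(h):
    for j in range(w):
        if labels_est[i, j] == -1:
            nj = j - 1 if j > 0 else j + 1       # that neighbour is kept
            labels_est[i, j] = labels_est[i, nj]
```

Since only half the pixels enter the clustering step, the expensive sparse-coding stage operates on a much smaller problem, which is the source of the reported speed-up.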


2018 ◽  
Vol 7 (3.3) ◽  
pp. 426
Author(s):  
Swagata Sarkar ◽  
Sanjana R ◽  
Rajalakshmi S ◽  
Harini T J

Automatic speech recognition is a topic of interest for many researchers. Since many online courses have come into the picture, recent researchers are concentrating on speech accent recognition, and much work has been done in this field. In this paper, speech accent recognition of Tamil speech from different zones of Tamil Nadu is addressed. The Hidden Markov Model (HMM) and the Viterbi algorithm are very popularly used for this task, and researchers have worked with Mel-Frequency Cepstral Coefficients (MFCC) to identify speech as well as speech accent. In this paper, speech accent features are identified by a modified MFCC algorithm, and the classification of the features is done by a backpropagation algorithm.
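MFCC features are built on the mel frequency scale. The paper's modification of MFCC is not specified in the abstract, so the sketch below shows only the standard HTK-style mel mapping that any MFCC variant starts from.

```python
# Sketch: the textbook mel-scale mapping underlying MFCC features,
# m = 2595 * log10(1 + f / 700), together with its inverse.
import math

def hz_to_mel(f_hz: float) -> float:
    """Convert frequency in Hz to mels (HTK formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m: float) -> float:
    """Inverse mapping: mels back to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
```

The mapping is roughly linear below 1 kHz and logarithmic above it, which is why mel-spaced filterbanks allocate more resolution to the perceptually important low frequencies.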

