Automatic Spatial Audio Scene Classification in Binaural Recordings of Music

2019 ◽  
Vol 9 (9) ◽  
pp. 1724 ◽  
Author(s):  
Sławomir K. Zieliński ◽  
Hyunkook Lee

The aim of the study was to develop a method for automatic classification of three spatial audio scenes, differing in the horizontal distribution of foreground and background audio content around a listener in binaurally rendered recordings of music. For the purpose of the study, audio recordings were synthesized using thirteen sets of binaural room impulse responses (BRIRs), representing the room acoustics of both semi-anechoic and reverberant venues. Head movements were not considered in the study. The proposed method was assumption-free with regard to the number and characteristics of the audio sources. A least absolute shrinkage and selection operator (LASSO) was employed as a classifier. According to the results, it is possible to automatically identify the spatial scenes using a combination of binaural and spectro-temporal features. The method exhibits satisfactory classification accuracy when it is trained and then tested on different stimuli synthesized using the same BRIRs (accuracy ranging from 74% to 98%), even in highly reverberant conditions. However, the generalizability of the method needs to be further improved. This study demonstrates that, in addition to the binaural cues, the Mel-frequency cepstral coefficients constitute an important carrier of spatial information, imperative for the classification of spatial audio scenes.
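A LASSO-type classifier of the kind described above can be approximated with l1-penalised logistic regression, which applies the same coefficient shrinkage and implicit feature selection. The sketch below is not the authors' pipeline: the feature layout, class structure, and regularisation strength are all illustrative assumptions, with random vectors standing in for the binaural and spectro-temporal features.

```python
# Sketch: l1-penalised (LASSO-style) classification of three synthetic
# "scene" classes. Feature values are random stand-ins, not real
# binaural/MFCC features.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_per_class, n_features = 50, 40

# Three classes; only a few features per class are informative, which is
# the situation l1 shrinkage is designed to exploit.
X, y = [], []
for label in range(3):
    centre = np.zeros(n_features)
    centre[label * 3:label * 3 + 3] = 2.0   # informative features for this class
    X.append(centre + rng.normal(size=(n_per_class, n_features)))
    y += [label] * n_per_class
X, y = np.vstack(X), np.array(y)

# l1 penalty zeroes out uninformative coefficients (feature selection)
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
clf.fit(X, y)

accuracy = clf.score(X, y)
sparsity = np.mean(clf.coef_ == 0)          # fraction of zeroed weights
```

The `sparsity` value shows the selection effect: most of the 120 coefficients are driven exactly to zero, leaving a small informative subset.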

Author(s):  
Victor O. Adegboye ◽  
Jason H. Rife

Abstract Whilst extensive work has been done on fault detection in bearings using sound, very little has been accomplished with other machine components and machinery, partly due to the scarcity of datasets. The recent release of the Malfunctioning Industrial Machine Investigation and Inspection (MIMII) dataset opens the opportunity for research into malfunctioning machines such as pumps, fans, slide rails, and valves. In this paper, we compare common features from audio recordings to investigate which best support the classification of malfunctioning pumps. We evaluate our results using the Area Under the Curve (AUC) as a performance metric and determine that the log mel spectrum is a very useful feature, at least for this dataset, but that other features can enhance detection performance when ambient noise is present (improving AUC from 0.88 to 0.94 in one case). We also find that Mel-Frequency Cepstral Coefficients (MFCC) perform substantially worse as features than a sampled mel spectrogram.
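The AUC metric used above can be computed directly with scikit-learn. The anomaly scores below are synthetic stand-ins; in a real pipeline they would come from a detector operating on log-mel or MFCC features of the MIMII recordings. The point of the sketch is only how AUC rewards a feature that separates normal from malfunctioning scores more cleanly.

```python
# Sketch: ROC AUC as the detection metric. Scores are synthetic; a
# stronger "feature" produces better-separated score distributions.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
labels = np.array([0] * 100 + [1] * 100)    # 0 = normal, 1 = malfunctioning

# Weak feature: score distributions overlap heavily
scores_weak = np.concatenate([rng.normal(0.0, 1.0, 100),
                              rng.normal(1.0, 1.0, 100)])
# Strong feature: distributions are well separated
scores_strong = np.concatenate([rng.normal(0.0, 1.0, 100),
                                rng.normal(2.5, 1.0, 100)])

auc_weak = roc_auc_score(labels, scores_weak)
auc_strong = roc_auc_score(labels, scores_strong)
```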


2019 ◽  
Vol 62 (9) ◽  
pp. 3265-3275
Author(s):  
Heather L. Ramsdell-Hudock ◽  
Anne S. Warlaumont ◽  
Lindsey E. Foss ◽  
Candice Perry

Purpose To better enable communication among researchers, clinicians, and caregivers, we aimed to assess how untrained listeners classify early infant vocalization types in comparison to terms currently used by researchers and clinicians. Method Listeners were caregivers with no prior formal education in speech and language development. A 1st group of listeners reported on clinician/researcher-classified vowel, squeal, growl, raspberry, whisper, laugh, and cry vocalizations obtained from archived video/audio recordings of 10 infants from 4 through 12 months of age. A list of commonly used terms was generated based on listener responses and the standard research terminology. A 2nd group of listeners was presented with the same vocalizations and asked to select the terms from the list that they thought best described the sounds. Results Classifications of the vocalizations by listeners largely overlapped with published categorical descriptors and yielded additional insight into alternate terms in common use. The biggest discrepancies were found for the vowel category. Conclusion Prior research has shown that caregivers are accurate in identifying canonical babbling, a major prelinguistic vocalization milestone occurring at about 6–7 months of age. The present findings indicate that caregivers are also well attuned to even earlier emerging vocalization types. This supports the value of continuing basic and clinical research on the vocal types infants produce in the 1st months of life and on their potential diagnostic utility, and it may also help improve communication between speech-language pathologists and families.


2014 ◽  
Vol 2014 ◽  
pp. 1-12 ◽  
Author(s):  
Manab Kumar Das ◽  
Samit Ari

Classification of electrocardiogram (ECG) signals plays an important role in the clinical diagnosis of heart disease. This paper proposes the design of an efficient system for classification of the normal beat (N), ventricular ectopic beat (V), supraventricular ectopic beat (S), fusion beat (F), and unknown beat (Q) using a mixture of features. Two different feature extraction methods are proposed for classification of ECG beats: (i) S-transform (ST) based features along with temporal features and (ii) a mixture of ST and wavelet transform (WT) based features along with temporal features. Each extracted feature set is independently classified using a multilayer perceptron neural network (MLPNN). The performances are evaluated on several normal and abnormal ECG signals from 44 recordings of the MIT-BIH arrhythmia database. In this work, the performances of three feature extraction techniques with the MLPNN classifier are compared using the five ECG beat classes recommended by the AAMI (Association for the Advancement of Medical Instrumentation) standard. The average sensitivities of the proposed feature extraction technique for N, S, F, V, and Q are 95.70%, 78.05%, 49.60%, 89.68%, and 33.89%, respectively. The experimental results demonstrate that the proposed feature extraction techniques perform better than other existing feature extraction techniques.
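The per-class sensitivity (recall) reported above is the diagonal of the confusion matrix divided by the row sums. The counts below are synthetic, chosen only so that the diagonal recalls roughly match the reported figures; they are not the paper's actual confusion matrix.

```python
# Sketch: per-class sensitivity (recall) for the five AAMI beat classes.
# confusion[i, j] = beats of true class i predicted as class j (made-up counts).
import numpy as np

classes = ["N", "S", "F", "V", "Q"]
confusion = np.array([
    [9570,  200,  100,  100,   30],
    [ 150,  780,   30,   30,   10],
    [ 200,  100,  496,  150,   54],
    [ 600,  300,  100, 8968,   32],
    [ 300,  150,  100,  111,  339],
])

# Sensitivity of class i = correctly classified / total beats of class i
sensitivity = np.diag(confusion) / confusion.sum(axis=1)
for name, s in zip(classes, sensitivity):
    print(f"{name}: {s:.2%}")
```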


Author(s):  
D. Akbari ◽  
M. Moradizadeh ◽  
M. Akbari

Abstract. This paper describes a new framework for classification of hyperspectral images based on both spectral and spatial information. The spatial information is obtained by an enhanced Marker-based Hierarchical Segmentation (MHS) algorithm. The hyperspectral data is first fed into the Multi-Layer Perceptron (MLP) neural network classification algorithm. Then, the MHS algorithm is applied in order to increase the accuracy of less accurately classified land-cover types. In the proposed approach, the markers are extracted from the classification maps obtained by the MLP and Support Vector Machine (SVM) classifiers. Experimental results on the Washington DC Mall hyperspectral dataset demonstrate the superiority of the proposed approach compared to the MLP and the original MHS algorithms.
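One plausible reading of the marker-extraction step is to take as markers only the pixels on which the two classification maps agree. The sketch below assumes that agreement rule and uses tiny hand-made label maps; it is an illustration of the idea, not the paper's exact procedure.

```python
# Sketch: candidate markers = pixels where MLP and SVM label maps agree.
# Both maps are tiny made-up examples; -1 marks "no marker here".
import numpy as np

mlp_map = np.array([[0, 0, 1],
                    [1, 1, 2],
                    [2, 2, 2]])
svm_map = np.array([[0, 1, 1],
                    [1, 1, 2],
                    [0, 2, 2]])

markers = np.where(mlp_map == svm_map, mlp_map, -1)
```

Pixels where the classifiers disagree are left unmarked and their labels are decided later by the hierarchical segmentation.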


TecnoLógicas ◽  
2019 ◽  
Vol 22 (46) ◽  
pp. 1-14 ◽  
Author(s):  
Jorge Luis Bacca ◽  
Henry Arguello

Spectral image clustering is an unsupervised classification method that identifies distributions of pixels using spectral information, without requiring a previous training stage. Sparse subspace clustering (SSC) based methods assume that hyperspectral images lie in the union of multiple low-dimensional subspaces. Using this assumption, SSC groups spectral signatures in different subspaces, expressing each spectral signature as a sparse linear combination of all pixels and ensuring that the non-zero elements belong to the same class. Although these methods have shown good accuracy for unsupervised classification of hyperspectral images, their computational complexity becomes intractable as the number of pixels increases, i.e., when the spatial dimension of the image is large. For this reason, this paper proposes to reduce the number of pixels to be clustered in the hyperspectral image and then to recover the clustering results for the removed pixels by exploiting the spatial information. Specifically, this work proposes two methodologies to remove pixels: the first is based on a spatial blue-noise distribution, which reduces the probability of removing clusters of neighboring pixels, and the second is a sub-sampling procedure that eliminates every two contiguous pixels, preserving the spatial structure of the scene. The performance of the proposed spectral image clustering framework is evaluated on three datasets, showing that similar accuracy is obtained when up to 50% of the pixels are removed; in addition, it is up to 7.9 times faster than clustering the complete datasets.
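A minimal sketch of the subsample-then-fill idea, under the simplifying assumption that the structured sub-sampling behaves like a checkerboard decimation keeping 50% of the pixels so that every removed pixel has a kept horizontal neighbour. The label map and the neighbour fill rule are illustrative assumptions, not the paper's exact procedure.

```python
# Sketch: remove 50% of pixels in a checkerboard pattern, then fill each
# removed pixel's cluster label from a kept horizontal neighbour.
import numpy as np

rng = np.random.default_rng(2)
h, w = 8, 8
labels_full = rng.integers(0, 3, size=(h, w))    # stand-in cluster map

# Keep mask: checkerboard retains exactly half of the pixels
keep = (np.add.outer(np.arange(h), np.arange(w)) % 2) == 0

# "Cluster" only the kept pixels (here we simply copy their labels),
# leaving removed pixels marked as -1.
labels_est = np.where(keep, labels_full, -1)

# Spatial fill: every removed pixel has a kept neighbour one step left
# (or right, at the image border), so one pass suffices.
for i in range(h):
    for j in range(w):
        if labels_est[i, j] == -1:
            nj = j - 1 if j > 0 else j + 1       # that neighbour is kept
            labels_est[i, j] = labels_est[i, nj]
```

Since only half the pixels enter the clustering step, the expensive sparse-coding stage operates on a much smaller problem, which is the source of the reported speed-up.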


2018 ◽  
Vol 7 (3.3) ◽  
pp. 426
Author(s):  
Swagata Sarkar ◽  
Sanjana R ◽  
Rajalakshmi S ◽  
Harini T J

Automatic speech recognition is a topic of interest for many researchers. Since many online courses have come into the picture, recent researchers are concentrating on speech accent recognition, and much work has been done in this field. In this paper, speech accent recognition of Tamil speech from different zones of Tamil Nadu is addressed. The Hidden Markov Model (HMM) and the Viterbi algorithm are very popularly used for this task, and researchers have worked with Mel-Frequency Cepstral Coefficients (MFCC) to identify speech as well as speech accent. In this paper, speech accent features are identified by a modified MFCC algorithm, and the classification of the features is done by a backpropagation algorithm.
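MFCC features are built on the mel frequency scale. The paper's modification of MFCC is not specified in the abstract, so the sketch below shows only the standard HTK-style mel mapping that any MFCC variant starts from.

```python
# Sketch: the textbook mel-scale mapping underlying MFCC features,
# m = 2595 * log10(1 + f / 700), together with its inverse.
import math

def hz_to_mel(f_hz: float) -> float:
    """Convert frequency in Hz to mels (HTK formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m: float) -> float:
    """Inverse mapping: mels back to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
```

The mapping is roughly linear below 1 kHz and logarithmic above it, which is why mel-spaced filterbanks allocate more resolution to the perceptually important low frequencies.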

