Application of Symbolic Machine Learning to Audio Signal Segmentation

Author(s):  
Arimantas Raškinis ◽  
Gailius Raškinis
2020 ◽  
Vol 10 (4) ◽  
pp. 220 ◽  
Author(s):  
Nicolina Sciaraffa ◽  
Manousos A. Klados ◽  
Gianluca Borghini ◽  
Gianluca Di Flumeri ◽  
Fabio Babiloni ◽  
...  

The need for automatic detection and classification of high-frequency oscillations (HFOs) as biomarkers of epileptogenic tissue is strongly felt in the clinical field. In this context, artificial intelligence methods could be the missing piece to achieve this goal. This work proposed a two-step procedure based on machine learning algorithms and tested it on an intracranial electroencephalogram (iEEG) dataset available online. The first step aimed to define the optimal segment length for discriminating segments containing HFOs from those without; here, binary classifiers were tested on a set of energy features. The second step aimed to classify these segments into ripples, fast ripples, and fast ripples occurring during ripples. Results suggest that linear discriminant analysis (LDA) applied to 10 ms segments provides the highest sensitivity (0.874), with a specificity of 0.776, for discriminating HFO from non-HFO segments. For the three-class classification, non-linear methods provided the highest specificity and sensitivity (around 90%), significantly different from the other three algorithms employed. This machine-learning-based procedure could therefore help clinicians automatically reduce the quantity of irrelevant data.
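As an illustration of the first step, the following Python sketch segments an iEEG trace into 10 ms windows and feeds per-segment energy features to an LDA classifier. The sampling rate, band edges, and feature set are assumptions for illustration, not the paper's exact configuration.

```python
# Sketch (assumed parameters): 10 ms segmentation of an iEEG trace, energy
# features per segment, and an LDA classifier for HFO vs. no-HFO discrimination.
import numpy as np
from scipy.signal import butter, sosfiltfilt
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

FS = 2000                  # assumed iEEG sampling rate (Hz)
SEG_LEN = int(FS * 0.010)  # 10 ms segments, the length the abstract reports as optimal

def bandpass(x, lo, hi, fs=FS):
    """Zero-phase band-pass filter of a 1-D trace."""
    sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)

def energy_features(trace):
    """Per-segment energies: ripple band, fast-ripple band, and broadband."""
    ripple = bandpass(trace, 80, 250)   # ripple band (assumed edges)
    fast = bandpass(trace, 250, 500)    # fast-ripple band (assumed edges)
    n_seg = len(trace) // SEG_LEN
    feats = np.empty((n_seg, 3))
    for i in range(n_seg):
        s = slice(i * SEG_LEN, (i + 1) * SEG_LEN)
        feats[i] = [np.sum(ripple[s] ** 2),
                    np.sum(fast[s] ** 2),
                    np.sum(trace[s] ** 2)]
    return feats

# With annotated training data (X = energy features, y = 1 if the segment
# contains an HFO, else 0):
#   clf = LinearDiscriminantAnalysis().fit(X_train, y_train)
#   hfo_mask = clf.predict(energy_features(new_trace))
```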


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Jingwen Zhang

With the rapid development of information and communication technology, digital music has grown explosively. To retrieve the music users want quickly and accurately from huge repositories, music feature extraction and classification are considered an important part of music information retrieval and have become a research hotspot in recent years. Traditional music classification approaches use a large number of hand-designed acoustic features whose design requires in-depth knowledge of the music domain, and features designed for one classification task are often neither universal nor comprehensive. Existing approaches thus have two shortcomings: the validity and accuracy of manually extracted features are hard to ensure, and traditional machine learning classifiers neither perform well on multiclass problems nor scale to training on large data. Therefore, this paper converts the music audio signal into a sound spectrum as a unified representation, avoiding manual feature selection. Based on the characteristics of the sound spectrum, the research combines 1D convolution, a gating mechanism, residual connections, and an attention mechanism into a convolutional-neural-network model for music feature extraction and classification that can extract sound-spectrum characteristics more relevant to the music category. Finally, this paper designs comparison and ablation experiments, whose results show that this approach outperforms traditional hand-crafted models and machine-learning-based approaches.
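As a rough illustration of the building blocks the abstract names, the following PyTorch sketch combines a 1D convolution, a GLU-style gating mechanism, and a residual connection in a single block; channel counts and kernel width are illustrative assumptions, not the paper's architecture.

```python
# Sketch (assumed sizes): one block combining a 1-D convolution, a GLU-style
# gate, and a residual connection over a spectrogram-like input.
import torch
import torch.nn as nn

class GatedResidualConv1d(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2
        self.conv = nn.Conv1d(channels, channels, kernel_size, padding=pad)  # content path
        self.gate = nn.Conv1d(channels, channels, kernel_size, padding=pad)  # gate path

    def forward(self, x):                    # x: (batch, channels, time)
        h = torch.tanh(self.conv(x)) * torch.sigmoid(self.gate(x))  # gating mechanism
        return x + h                         # residual connection

block = GatedResidualConv1d(channels=128)
spec = torch.randn(4, 128, 400)              # sound spectrum: (batch, freq bins, frames)
out = block(spec)                            # same shape as the input
```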


2019 ◽  
Vol 107 ◽  
pp. 10-17 ◽  
Author(s):  
Naghmeh Mahmoodian ◽  
Anna Schaufler ◽  
Ali Pashazadeh ◽  
Axel Boese ◽  
Michael Friebe ◽  
...  

2020 ◽  
pp. 147592172097698
Author(s):  
Furui Wang ◽  
Gangbing Song

Recently, percussion-based methods for bolt looseness detection have attracted increasing attention because they eliminate the need for contact sensors. Their core issue is processing audio signals to characterize different bolt preloads, yet current percussion-based methods all depend on machine-learning techniques that require hand-crafted features and overlook bolt looseness at the incipient stage. Thus, the main contribution of this article is a novel one-dimensional training-interference capsule neural network (1D-TICapsNet) that processes and classifies percussion-induced sound signals to detect early bolt looseness. First, compared to machine-learning techniques, 1D-TICapsNet fuses feature extraction and classification in one framework to achieve better performance. In addition, owing to two tricks, wider kernels in the first convolutional layer and the targeted dropout technique, the proposed 1D-TICapsNet outperforms several state-of-the-art deep learning techniques in classification accuracy, computational cost, and denoising capacity. We call these two tricks "training interference" since they act during the training procedure. Finally, experiments confirm the effectiveness and superiority of 1D-TICapsNet. Given its efficacy, we expect real-world applications in early bolt looseness detection and in other classification tasks over one-dimensional signals.
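The following PyTorch sketch illustrates, under assumed layer sizes, the two "training interference" tricks named above: a wide kernel in the first convolutional layer and a simple targeted-dropout step that randomly zeroes the smallest-magnitude weights during training. It is a loose stand-in, not the authors' 1D-TICapsNet.

```python
# Sketch (assumed shapes): a wide first 1-D convolution kernel plus a simple
# targeted-dropout step that, during training, randomly zeroes weights drawn
# from the smallest-magnitude half. Not the authors' capsule architecture.
import torch
import torch.nn as nn

first_conv = nn.Conv1d(in_channels=1, out_channels=16,
                       kernel_size=64, stride=8)    # wide kernel (assumed sizes)

def targeted_dropout_(weight, targ_frac=0.5, drop_prob=0.5):
    """Zero each of the targ_frac smallest-|w| weights with probability drop_prob."""
    flat = weight.detach().abs().flatten()
    k = int(targ_frac * flat.numel())
    if k == 0:
        return
    thresh = flat.kthvalue(k).values                # magnitude cutoff
    candidates = weight.detach().abs() <= thresh    # the low-magnitude weights
    drop = candidates & (torch.rand_like(weight) < drop_prob)
    with torch.no_grad():
        weight[drop] = 0.0

audio = torch.randn(8, 1, 4096)          # batch of percussion-induced sound clips
targeted_dropout_(first_conv.weight)     # applied at each training step
features = first_conv(audio)             # (8, 16, 505)
```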


2006 ◽  
Vol 37 (4) ◽  
pp. 23-34 ◽  
Author(s):  
Naoki Nitanda ◽  
Miki Haseyama ◽  
Hideo Kitajima

2021 ◽  
Author(s):  
Eric Guizzo ◽  
Riccardo F. Gramaccioni ◽  
Saeid Jamili ◽  
Christian Marinoni ◽  
Edoardo Massaro ◽  
...  

2019 ◽  
Author(s):  
Bruno Tavares Padovese ◽  
Linilson Rodrigues Padovese

Avian surveying is a time-consuming and challenging task, often conducted in remote and sometimes inhospitable locations. In this context, the development of automated acoustic landscape monitoring systems for bird surveys is essential. We conducted a comparative study of two machine learning methods for the detection and identification of two endangered Brazilian bird species of the family Psittacidae, Amazona brasiliensis and Amazona vinacea. Specifically, we focus on identifying these two species in an acoustic landscape where similar vocalizations from other Psittacidae species are present. A three-step approach is presented, composed of signal segmentation and filtering, feature extraction, and classification. In the feature extraction step, Mel-frequency cepstral coefficient (MFCC) features were extracted and fed to a Random Forest and a Multilayer Perceptron for training and classification of acoustic samples. The experiments showed promising results, particularly for the Random Forest, which achieved accuracies of up to 99%. Applying signal segmentation and filtering before feature extraction greatly improved the results. Additionally, the results show that the proposed approach is robust and flexible enough to be adopted in passive acoustic monitoring systems.
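A minimal Python sketch of the feature extraction and classification steps might look as follows, using librosa for MFCCs and scikit-learn's Random Forest; the MFCC settings, time-averaged features, and label names are illustrative assumptions, and the paper's segmentation and filtering steps are omitted.

```python
# Sketch (assumed settings): MFCC features summarized per clip and classified
# with a Random Forest; segmentation/filtering and label names are placeholders
# for the paper's full pipeline.
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier

def mfcc_features(path, sr=22050, n_mfcc=13):
    """Load a clip and summarize it as the time-averaged MFCC vector."""
    y, _ = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # (n_mfcc, frames)
    return mfcc.mean(axis=1)                                 # (n_mfcc,)

# With labeled clips (labels e.g. "A_brasiliensis", "A_vinacea", "other"):
#   X = np.stack([mfcc_features(p) for p in paths])
#   clf = RandomForestClassifier(n_estimators=200).fit(X_train, y_train)
#   print(clf.score(X_test, y_test))
```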


Author(s):  
Andrej Zgank ◽  
Damjan Vlaj

The chapter presents acoustic presence detection, which can supply a smart home system with information about the presence of humans in the environment. Acoustic presence detection is based on digital signal processing and machine learning methods, with the objective of classifying the captured audio signal into the corresponding class. Different audio capturing devices for a smart home environment are analyzed from the perspective of acoustic presence detection. The presence detection task consists of voice activity detection, feature extraction, and classification. An extension of acoustic presence detection with additional information about the user's characteristics is proposed; this information can be used to optimize the smart home human-computer interface with personalization and customization functionalities.
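For the voice activity detection front end, a minimal energy-threshold sketch in Python could look like the following; the frame length, threshold rule, and sampling rate are illustrative assumptions, and real systems use more robust detectors.

```python
# Sketch (assumed parameters): energy-threshold voice activity detection that
# frames the captured signal and flags frames above an estimated noise floor.
import numpy as np

def simple_vad(signal, fs=16000, frame_ms=25, factor=3.0):
    """Return a boolean mask of frames judged to contain acoustic activity."""
    frame_len = int(fs * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = np.sum(frames ** 2, axis=1)       # short-time energy per frame
    noise_floor = np.percentile(energy, 10)    # assume the quietest 10% is noise
    return energy > factor * noise_floor

# Frames flagged as active would then proceed to feature extraction and
# classification to decide whether the sound indicates human presence.
```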


Mathematics ◽  
2020 ◽  
Vol 8 (12) ◽  
pp. 2133
Author(s):  
Mustaqeem ◽  
Soonil Kwon

Artificial intelligence, deep learning, and machine learning are the dominant means of making systems smarter. Smart speech emotion recognition (SER) is nowadays a basic necessity and an emerging research area in digital audio signal processing, and it plays an important role in many applications related to human–computer interaction (HCI). Existing state-of-the-art SER systems have quite low prediction performance, which needs improvement to make them feasible for real-time commercial applications. The key reasons for the low accuracy and poor prediction rate are data scarcity and model configuration, the most challenging aspects of building a robust machine learning technique. In this paper, we addressed the limitations of existing SER systems and proposed a unique artificial intelligence (AI)-based system structure for SER that utilizes hierarchical blocks of convolutional long short-term memory (ConvLSTM) with sequence learning. We designed four ConvLSTM blocks, called local features learning blocks (LFLBs), to extract local emotional features in a hierarchical correlation. The ConvLSTM layers are adopted for the input-to-state and state-to-state transitions, extracting spatial cues via convolution operations. The four LFLBs extract spatiotemporal cues from speech signals in hierarchical correlational form using a residual learning strategy. Furthermore, we utilized a novel sequence learning strategy to extract global information and adaptively adjust the relevant global feature weights according to the correlation of the input features. Finally, we used the center loss function together with the softmax loss to produce class probabilities; the center loss improves the final classification results, ensures accurate prediction, and plays a conspicuous role in the whole proposed SER scheme. We tested the proposed system on two standard speech corpora, the Interactive Emotional Dyadic Motion Capture (IEMOCAP) database and the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), obtaining recognition rates of 75% and 80%, respectively.
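As a sketch of the loss described above, the following PyTorch snippet combines the softmax (cross-entropy) loss with a center loss that pulls same-class embeddings toward learned class centers; the feature dimension, number of classes, and weighting factor are illustrative assumptions.

```python
# Sketch (assumed dimensions and weighting): softmax (cross-entropy) loss
# combined with a center loss over learned class centers.
import torch
import torch.nn as nn

class CenterLoss(nn.Module):
    def __init__(self, num_classes: int, feat_dim: int):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, features, labels):     # features: (batch, feat_dim)
        # Mean squared distance from each feature to its class center.
        return ((features - self.centers[labels]) ** 2).sum(dim=1).mean()

num_classes, feat_dim, lam = 4, 256, 0.1     # lam weights the center loss (assumed)
ce = nn.CrossEntropyLoss()
center = CenterLoss(num_classes, feat_dim)

features = torch.randn(8, feat_dim)          # global features from the LFLBs (stand-in)
logits = torch.randn(8, num_classes)         # classifier outputs (stand-in)
labels = torch.randint(0, num_classes, (8,))
loss = ce(logits, labels) + lam * center(features, labels)
```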

