Application of Symbolic Machine Learning to Audio Signal Segmentation

Author(s):  
Arimantas Raškinis ◽  
Gailius Raškinis
2020 ◽  
Vol 10 (4) ◽  
pp. 220 ◽  
Author(s):  
Nicolina Sciaraffa ◽  
Manousos A. Klados ◽  
Gianluca Borghini ◽  
Gianluca Di Flumeri ◽  
Fabio Babiloni ◽  
...  

The need for automatic detection and classification of high-frequency oscillations (HFOs) as biomarkers of epileptogenic tissue is strongly felt in the clinical field. In this context, artificial intelligence methods could be the missing piece to achieve this goal. This work proposed a two-step procedure based on machine learning algorithms and tested it on an intracranial electroencephalogram (iEEG) dataset available online. The first step aimed to define the optimal segment length for discriminating segments containing HFOs from those without; here, binary classifiers were tested on a set of energy features. The second step aimed to classify these segments into ripples, fast ripples, and fast ripples occurring during ripples. Results suggest that linear discriminant analysis (LDA) applied to 10 ms segments provides the highest sensitivity (0.874), with a specificity of 0.776, for discriminating HFO from non-HFO segments. For the three-class classification, non-linear methods provided the highest specificity and sensitivity (around 90%), significantly different from the other three algorithms employed. This machine-learning-based procedure could therefore help clinicians automatically reduce the quantity of irrelevant data.
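As an illustration of the first step, the following Python sketch segments an iEEG trace into 10 ms windows and feeds per-segment energy features to an LDA classifier. The sampling rate, band edges, and feature set are assumptions for illustration, not the paper's exact configuration.

```python
# Sketch (assumed parameters): 10 ms segmentation of an iEEG trace, energy
# features per segment, and an LDA classifier for HFO vs. no-HFO discrimination.
import numpy as np
from scipy.signal import butter, sosfiltfilt
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

FS = 2000                  # assumed iEEG sampling rate (Hz)
SEG_LEN = int(FS * 0.010)  # 10 ms segments, the length the abstract reports as optimal

def bandpass(x, lo, hi, fs=FS):
    """Zero-phase band-pass filter of a 1-D trace."""
    sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)

def energy_features(trace):
    """Per-segment energies: ripple band, fast-ripple band, and broadband."""
    ripple = bandpass(trace, 80, 250)   # ripple band (assumed edges)
    fast = bandpass(trace, 250, 500)    # fast-ripple band (assumed edges)
    n_seg = len(trace) // SEG_LEN
    feats = np.empty((n_seg, 3))
    for i in range(n_seg):
        s = slice(i * SEG_LEN, (i + 1) * SEG_LEN)
        feats[i] = [np.sum(ripple[s] ** 2),
                    np.sum(fast[s] ** 2),
                    np.sum(trace[s] ** 2)]
    return feats

# With annotated training data (X = energy features, y = 1 if the segment
# contains an HFO, else 0):
#   clf = LinearDiscriminantAnalysis().fit(X_train, y_train)
#   hfo_mask = clf.predict(energy_features(new_trace))
```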


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Jingwen Zhang

With the rapid development of information and communication technology, digital music has grown explosively. To retrieve the music users want quickly and accurately from huge repositories, music feature extraction and classification are considered an important part of music information retrieval and have become a research hotspot in recent years. Traditional music classification approaches use a large number of hand-designed acoustic features whose design requires in-depth knowledge of the music domain, and features designed for one classification task are often neither universal nor comprehensive. Existing approaches thus have two shortcomings: the validity and accuracy of manually extracted features are hard to ensure, and traditional machine learning classifiers neither perform well on multiclass problems nor scale to training on large data. Therefore, this paper converts the music audio signal into a sound spectrum as a unified representation, avoiding manual feature selection. Based on the characteristics of the sound spectrum, the research combines 1D convolution, a gating mechanism, residual connections, and an attention mechanism into a convolutional-neural-network model for music feature extraction and classification that can extract sound-spectrum characteristics more relevant to the music category. Finally, this paper designs comparison and ablation experiments, whose results show that this approach outperforms traditional hand-crafted models and machine-learning-based approaches.
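As a rough illustration of the building blocks the abstract names, the following PyTorch sketch combines a 1D convolution, a GLU-style gating mechanism, and a residual connection in a single block; channel counts and kernel width are illustrative assumptions, not the paper's architecture.

```python
# Sketch (assumed sizes): one block combining a 1-D convolution, a GLU-style
# gate, and a residual connection over a spectrogram-like input.
import torch
import torch.nn as nn

class GatedResidualConv1d(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2
        self.conv = nn.Conv1d(channels, channels, kernel_size, padding=pad)  # content path
        self.gate = nn.Conv1d(channels, channels, kernel_size, padding=pad)  # gate path

    def forward(self, x):                    # x: (batch, channels, time)
        h = torch.tanh(self.conv(x)) * torch.sigmoid(self.gate(x))  # gating mechanism
        return x + h                         # residual connection

block = GatedResidualConv1d(channels=128)
spec = torch.randn(4, 128, 400)              # sound spectrum: (batch, freq bins, frames)
out = block(spec)                            # same shape as the input
```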


2019 ◽  
Vol 107 ◽  
pp. 10-17 ◽  
Author(s):  
Naghmeh Mahmoodian ◽  
Anna Schaufler ◽  
Ali Pashazadeh ◽  
Axel Boese ◽  
Michael Friebe ◽  
...  

2020 ◽  
pp. 147592172097698
Author(s):  
Furui Wang ◽  
Gangbing Song

Recently, percussion-based methods for bolt looseness detection have attracted increasing attention because they eliminate the need for contact sensors. Their core issue is processing audio signals to characterize different bolt preloads, yet current percussion-based methods all depend on machine-learning techniques that require hand-crafted features and overlook bolt looseness at the incipient stage. Thus, the main contribution of this article is a novel one-dimensional training-interference capsule neural network (1D-TICapsNet) that processes and classifies percussion-induced sound signals to detect early bolt looseness. First, compared to machine-learning techniques, 1D-TICapsNet fuses feature extraction and classification in one framework to achieve better performance. In addition, owing to two tricks, wider kernels in the first convolutional layer and the targeted dropout technique, the proposed 1D-TICapsNet outperforms several state-of-the-art deep learning techniques in classification accuracy, computational cost, and denoising capacity. We call these two tricks "training interference" since they act during the training procedure. Finally, experiments confirm the effectiveness and superiority of 1D-TICapsNet. Given its efficacy, we expect real-world applications in early bolt looseness detection and in other classification tasks over one-dimensional signals.
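The following PyTorch sketch illustrates, under assumed layer sizes, the two "training interference" tricks named above: a wide kernel in the first convolutional layer and a simple targeted-dropout step that randomly zeroes the smallest-magnitude weights during training. It is a loose stand-in, not the authors' 1D-TICapsNet.

```python
# Sketch (assumed shapes): a wide first 1-D convolution kernel plus a simple
# targeted-dropout step that, during training, randomly zeroes weights drawn
# from the smallest-magnitude half. Not the authors' capsule architecture.
import torch
import torch.nn as nn

first_conv = nn.Conv1d(in_channels=1, out_channels=16,
                       kernel_size=64, stride=8)    # wide kernel (assumed sizes)

def targeted_dropout_(weight, targ_frac=0.5, drop_prob=0.5):
    """Zero each of the targ_frac smallest-|w| weights with probability drop_prob."""
    flat = weight.detach().abs().flatten()
    k = int(targ_frac * flat.numel())
    if k == 0:
        return
    thresh = flat.kthvalue(k).values                # magnitude cutoff
    candidates = weight.detach().abs() <= thresh    # the low-magnitude weights
    drop = candidates & (torch.rand_like(weight) < drop_prob)
    with torch.no_grad():
        weight[drop] = 0.0

audio = torch.randn(8, 1, 4096)          # batch of percussion-induced sound clips
targeted_dropout_(first_conv.weight)     # applied at each training step
features = first_conv(audio)             # (8, 16, 505)
```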


2006 ◽  
Vol 37 (4) ◽  
pp. 23-34 ◽  
Author(s):  
Naoki Nitanda ◽  
Miki Haseyama ◽  
Hideo Kitajima

2021 ◽  
Author(s):  
Eric Guizzo ◽  
Riccardo F. Gramaccioni ◽  
Saeid Jamili ◽  
Christian Marinoni ◽  
Edoardo Massaro ◽  
...  

2019 ◽  
Author(s):  
Bruno Tavares Padovese ◽  
Linilson Rodrigues Padovese

Avian surveying is a time-consuming and challenging task, often conducted in remote and sometimes inhospitable locations. In this context, the development of automated acoustic landscape monitoring systems for bird surveys is essential. We conducted a comparative study of two machine learning methods for the detection and identification of two endangered Brazilian bird species of the family Psittacidae, Amazona brasiliensis and Amazona vinacea. Specifically, we focus on identifying these two species in an acoustic landscape where similar vocalizations from other Psittacidae species are present. A three-step approach is presented, composed of signal segmentation and filtering, feature extraction, and classification. In the feature extraction step, Mel-frequency cepstral coefficient (MFCC) features were extracted and fed to a Random Forest and a Multilayer Perceptron for training and classification of acoustic samples. The experiments showed promising results, particularly for the Random Forest, which achieved accuracies of up to 99%. Applying signal segmentation and filtering before feature extraction greatly improved the results. Additionally, the results show that the proposed approach is robust and flexible enough to be adopted in passive acoustic monitoring systems.
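A minimal Python sketch of the feature extraction and classification steps might look as follows, using librosa for MFCCs and scikit-learn's Random Forest; the MFCC settings, time-averaged features, and label names are illustrative assumptions, and the paper's segmentation and filtering steps are omitted.

```python
# Sketch (assumed settings): MFCC features summarized per clip and classified
# with a Random Forest; segmentation/filtering and label names are placeholders
# for the paper's full pipeline.
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier

def mfcc_features(path, sr=22050, n_mfcc=13):
    """Load a clip and summarize it as the time-averaged MFCC vector."""
    y, _ = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # (n_mfcc, frames)
    return mfcc.mean(axis=1)                                 # (n_mfcc,)

# With labeled clips (labels e.g. "A_brasiliensis", "A_vinacea", "other"):
#   X = np.stack([mfcc_features(p) for p in paths])
#   clf = RandomForestClassifier(n_estimators=200).fit(X_train, y_train)
#   print(clf.score(X_test, y_test))
```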


Author(s):  
Andrej Zgank ◽  
Damjan Vlaj

The chapter presents acoustic presence detection, which can supply a smart home system with information about the presence of humans in the environment. Acoustic presence detection is based on digital signal processing and machine learning methods, with the objective of classifying the captured audio signal into the corresponding class. Different audio capturing devices for a smart home environment are analyzed from the perspective of acoustic presence detection. The presence detection task consists of voice activity detection, feature extraction, and classification. An extension of acoustic presence detection with additional information about the user's characteristics is proposed; this information can be used to optimize the smart home human-computer interface with personalization and customization functionalities.
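For the voice activity detection front end, a minimal energy-threshold sketch in Python could look like the following; the frame length, threshold rule, and sampling rate are illustrative assumptions, and real systems use more robust detectors.

```python
# Sketch (assumed parameters): energy-threshold voice activity detection that
# frames the captured signal and flags frames above an estimated noise floor.
import numpy as np

def simple_vad(signal, fs=16000, frame_ms=25, factor=3.0):
    """Return a boolean mask of frames judged to contain acoustic activity."""
    frame_len = int(fs * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy = np.sum(frames ** 2, axis=1)       # short-time energy per frame
    noise_floor = np.percentile(energy, 10)    # assume the quietest 10% is noise
    return energy > factor * noise_floor

# Frames flagged as active would then proceed to feature extraction and
# classification to decide whether the sound indicates human presence.
```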


Mathematics ◽  
2020 ◽  
Vol 8 (12) ◽  
pp. 2133
Author(s):  
Mustaqeem ◽  
Soonil Kwon

Artificial intelligence, deep learning, and machine learning are the dominant means of making systems smarter. Smart speech emotion recognition (SER) is nowadays a basic necessity and an emerging research area in digital audio signal processing, and it plays an important role in many applications related to human–computer interaction (HCI). Existing state-of-the-art SER systems have quite low prediction performance, which needs improvement to make them feasible for real-time commercial applications. The key reasons for the low accuracy and poor prediction rate are data scarcity and model configuration, the most challenging aspects of building a robust machine learning technique. In this paper, we addressed the limitations of existing SER systems and proposed a unique artificial intelligence (AI)-based system structure for SER that utilizes hierarchical blocks of convolutional long short-term memory (ConvLSTM) with sequence learning. We designed four ConvLSTM blocks, called local features learning blocks (LFLBs), to extract local emotional features in a hierarchical correlation. The ConvLSTM layers are adopted for the input-to-state and state-to-state transitions, extracting spatial cues via convolution operations. The four LFLBs extract spatiotemporal cues from speech signals in hierarchical correlational form using a residual learning strategy. Furthermore, we utilized a novel sequence learning strategy to extract global information and adaptively adjust the relevant global feature weights according to the correlation of the input features. Finally, we used the center loss function together with the softmax loss to produce class probabilities; the center loss improves the final classification results, ensures accurate prediction, and plays a conspicuous role in the whole proposed SER scheme. We tested the proposed system on two standard speech corpora, the Interactive Emotional Dyadic Motion Capture (IEMOCAP) database and the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), obtaining recognition rates of 75% and 80%, respectively.
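As a sketch of the loss described above, the following PyTorch snippet combines the softmax (cross-entropy) loss with a center loss that pulls same-class embeddings toward learned class centers; the feature dimension, number of classes, and weighting factor are illustrative assumptions.

```python
# Sketch (assumed dimensions and weighting): softmax (cross-entropy) loss
# combined with a center loss over learned class centers.
import torch
import torch.nn as nn

class CenterLoss(nn.Module):
    def __init__(self, num_classes: int, feat_dim: int):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, features, labels):     # features: (batch, feat_dim)
        # Mean squared distance from each feature to its class center.
        return ((features - self.centers[labels]) ** 2).sum(dim=1).mean()

num_classes, feat_dim, lam = 4, 256, 0.1     # lam weights the center loss (assumed)
ce = nn.CrossEntropyLoss()
center = CenterLoss(num_classes, feat_dim)

features = torch.randn(8, feat_dim)          # global features from the LFLBs (stand-in)
logits = torch.randn(8, num_classes)         # classifier outputs (stand-in)
labels = torch.randint(0, num_classes, (8,))
loss = ce(logits, labels) + lam * center(features, labels)
```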

