1-Dimensional Polynomial Neural Networks for audio signal related problems

Acoustic monitoring of animal activity is becoming a necessary tool in agriculture, including beekeeping. It can assist in the control of beehives in remote locations, and it makes it possible to classify bee swarm activity from audio signals. This paper proposes an IoT-based acoustic swarm classification using deep neural networks (DNN). Audio recordings were obtained from the Open Source Beehive project. Mel-frequency cepstral coefficient (MFCC) features were extracted from the audio signal. The lossless WAV and lossy MP3 audio formats were compared for IoT-based solutions. The impact of the deep neural network parameters on the classification results was analyzed. The best overall classification accuracy with uncompressed audio was 94.09%, but MP3 compression degraded the DNN accuracy by over 10%. The evaluation of the proposed IoT-based bee activity acoustic classification showed improved results compared with the previous hidden Markov model system.
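
The mel-frequency spacing that underlies MFCC features can be sketched as follows. This is only an illustration, not the paper's feature extractor: it assumes the HTK-style mel formula (one common variant), and the filter count (13) and frequency range (0 to 4000 Hz) are hypothetical values chosen for the example.

```python
import math


def hz_to_mel(f_hz):
    """Convert a frequency in Hz to mels (HTK-style formula, an assumed variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_center_frequencies(n_filters, f_min_hz, f_max_hz):
    """Center frequencies (Hz) of n_filters bands spaced evenly on the mel scale."""
    m_min, m_max = hz_to_mel(f_min_hz), hz_to_mel(f_max_hz)
    step = (m_max - m_min) / (n_filters + 1)
    return [mel_to_hz(m_min + step * (i + 1)) for i in range(n_filters)]


# Illustrative values only: 13 bands between 0 Hz and 4000 Hz.
centers = mel_center_frequencies(n_filters=13, f_min_hz=0.0, f_max_hz=4000.0)
```

The bands are narrow at low frequencies and widen toward high frequencies, which is the perceptual warping that MFCC pipelines apply before the cepstral step.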

Integrating Audio Signal Processing and Deep Learning Algorithms for Gait Pattern Classification in Brazilian Gaited Horses

Frontiers in Animal Science ◽

10.3389/fanim.2021.681557 ◽

2021 ◽

Vol 2 ◽

Author(s):

Anderson Antonio Carvalho Alves ◽

Lucas Tassoni Andrietta ◽

Rafael Zinni Lopes ◽

Fernando Oliveira Bussiman ◽

Fabyano Fonseca e Silva ◽

...

Keyword(s):

Neural Networks ◽

Signal Processing ◽

Deep Learning ◽

Audio Signal ◽

Gait Pattern ◽

Classification Performance ◽

Audio Signal Processing ◽

Gait Patterns ◽

Audio Features ◽

Gaited Horses

This study focused on assessing the usefulness of audio signal processing in the gaited horse industry. A total of 196 short-time audio files (4 s) were collected from video recordings of Brazilian gaited horses. These files were converted into waveform signals (196 samples by 80,000 columns) and divided into training (N = 164) and validation (N = 32) datasets. Twelve single-valued audio features were initially extracted to summarize the training data according to the gait patterns (Marcha Batida, MB; Marcha Picada, MP). After preliminary analyses, high-dimensional arrays of the Mel Frequency Cepstral Coefficients (MFCC), Onset Strength (OS), and Tempogram (TEMP) were extracted and used as input information in the classification algorithms. A principal component analysis (PCA) was performed using the 12 single-valued features set and each audio-feature dataset (AFD: MFCC, OS, and TEMP) for prior data visualization. Machine learning (random forest, RF; support vector machine, SVM) and deep learning (multilayer perceptron neural networks, MLP; convolutional neural networks, CNN) algorithms were used to classify the gait types. A five-fold cross-validation scheme with 10 repetitions was employed for assessing the models' predictive performance. The classification performance across models and AFD was also validated with independent observations. The models and AFD were compared based on the classification accuracy (ACC), specificity (SPEC), sensitivity (SEN), and area under the curve (AUC). In the logistic regression analysis, five out of the 12 audio features extracted were significant (p < 0.05) between the gait types. ACC averages ranged from 0.806 to 0.932 for MFCC, from 0.758 to 0.948 for OS, and from 0.936 to 0.968 for TEMP. Overall, the TEMP dataset provided the best classification accuracies for all models. The most suitable method for audio-based horse gait pattern classification was CNN. Both cross and independent validation schemes confirmed that high values of ACC, SPEC, SEN, and AUC are expected for yet-to-be-observed labels, except for MFCC-based models, in which clear overfitting was observed. Using audio-generated data for describing gait phenotypes in Brazilian horses is a promising approach, as the two gait patterns were correctly distinguished. The highest classification performance was achieved by combining CNN and the rhythmic-descriptive AFD.
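
The evaluation scheme described in this abstract (five-fold cross-validation with 10 repetitions, scored by ACC, SEN, and SPEC) can be sketched as follows. This is a minimal standard-library sketch, not the authors' code: the function names, the random seed, and the unstratified shuffling are assumptions, and AUC is left out because it needs predicted scores rather than hard labels.

```python
import random


def repeated_kfold(n_samples, k=5, repeats=10, seed=0):
    """Yield (train_idx, test_idx) pairs for `repeats` independently shuffled k-fold splits."""
    rng = random.Random(seed)  # seed is an arbitrary choice for reproducibility
    for _ in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        folds = [idx[i::k] for i in range(k)]  # k folds of near-equal size
        for i in range(k):
            test = folds[i]
            train = [j for f in folds[:i] + folds[i + 1:] for j in f]
            yield train, test


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, and specificity from hard binary predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "ACC": (tp + tn) / len(pairs),
        "SEN": tp / (tp + fn) if (tp + fn) else float("nan"),
        "SPEC": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# 164 training samples, as in the abstract: 5 folds x 10 repetitions = 50 splits.
splits = list(repeated_kfold(164, k=5, repeats=10))
example = binary_metrics([1, 1, 0, 0], [1, 0, 0, 0])
```

In practice each split would train one model on `train` and score it on `test`, and the reported figure would be the average of the metric over all 50 splits.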

A Comparison of Audio Signal Preprocessing Methods for Deep Neural Networks on Music Tagging

2018 26th European Signal Processing Conference (EUSIPCO) ◽

10.23919/eusipco.2018.8553106 ◽

2018 ◽

Cited By ~ 7

Author(s):

Keunwoo Choi ◽

Gyorgy Fazekas ◽

Mark Sandler ◽

Kyunghyun Cho

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Audio Signal ◽

Signal Preprocessing ◽

Music Tagging

An examination of the application of multi-layer neural networks to audio signal processing

10.1109/ijcnn.1990.137586 ◽

1990 ◽

Cited By ~ 1

Author(s):

J.D. Hoyt ◽

H. Wechsler

Keyword(s):

Neural Networks ◽

Signal Processing ◽

Audio Signal ◽

Audio Signal Processing

Audio signal processing by neural networks

Neurocomputing ◽

10.1016/s0925-2312(03)00395-3 ◽

2003 ◽

Vol 55 (3-4) ◽

pp. 593-625 ◽

Cited By ~ 31

Author(s):

Aurelio Uncini

Keyword(s):

Neural Networks ◽

Signal Processing ◽

Audio Signal ◽

Audio Signal Processing

ViVoVAD: a Voice Activity Detection Tool based on Recurrent Neural Networks

Jornada de Jóvenes Investigadores del I3A ◽

10.26754/jji-i3a.003524 ◽

2019 ◽

Vol 7 ◽

Author(s):

Pablo Gimeno Jordán ◽

Ignacio Viñals Bailo ◽

Alfonso Ortega Giménez ◽

Antonio Miguel Artiaga ◽

Eduardo Lleida Solano

Keyword(s):

Neural Networks ◽

Recurrent Neural Networks ◽

Audio Signal ◽

Voice Activity Detection ◽

Activity Detection ◽

Detection Tool ◽

Voice Activity

Voice Activity Detection (VAD) aims to distinguish correctly those audio segments containing human speech. In this paper we present our latest approach to the VAD task that relies on the modelling capabilities of Bidirectional Long Short Term Memory (BLSTM) layers to classify every frame in an audio signal as speech or non-speech.
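
A frame-level classifier of this kind produces one speech/non-speech decision per frame, and those decisions are usually merged into time segments afterwards. The sketch below shows only that merging step and is not part of ViVoVAD: the function name and the 10 ms frame hop are assumptions, and the labels are made-up inputs standing in for classifier output.

```python
def frames_to_segments(labels, hop_s=0.01):
    """Merge per-frame speech flags into (start_s, end_s) speech segments.

    labels: sequence of truthy (speech) / falsy (non-speech) frame decisions.
    hop_s:  time between consecutive frames in seconds (assumed 10 ms).
    """
    segments = []
    start = None
    for i, is_speech in enumerate(labels):
        if is_speech and start is None:
            start = i  # a speech run begins at this frame
        elif not is_speech and start is not None:
            segments.append((start * hop_s, i * hop_s))
            start = None
    if start is not None:  # speech continues to the end of the signal
        segments.append((start * hop_s, len(labels) * hop_s))
    return segments


# Hypothetical frame decisions with a 1 s hop to keep the arithmetic obvious.
segments = frames_to_segments([0, 1, 1, 0, 0, 1], hop_s=1.0)
```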

AN OVERVIEW OF METHODS FOR GENERATING, AUGMENTING AND EVALUATING ROOM IMPULSE RESPONSE USING ARTIFICIAL NEURAL NETWORKS

Mokslas - Lietuvos ateitis ◽

10.3846/mla.2021.15152 ◽

2021 ◽

Vol 13 (0) ◽

pp. 1-5

Author(s):

Mantas Tamulionis

Keyword(s):

Neural Networks ◽

Signal Processing ◽

Artificial Neural Networks ◽

Speech Recognition ◽

Impulse Response ◽

Automatic Speech Recognition ◽

Audio Signal ◽

Training Data ◽

Audio Signal Processing ◽

Artificial Neural

Methods based on artificial neural networks (ANN) are widely used in various audio signal processing tasks. This provides opportunities to optimize processes and save the resources required for calculations. One of the main quantities needed to capture the acoustics of a room numerically is the room impulse response (RIR). Increasingly, researchers choose not to record these impulse responses in a real room but to generate them using ANN, as this gives them the freedom to prepare training datasets of unlimited size. Neural networks are also used to augment the generated impulse responses to make them similar to actually recorded ones. The widest use of ANN so far is observed in the evaluation of the generated results, for example, in automatic speech recognition (ASR) tasks. This review also describes datasets of recorded RIRs commonly found in various studies that are used as training data for neural networks.
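
An RIR is typically applied to a dry (anechoic) signal by discrete convolution, which is how reverberant training data is simulated. The sketch below illustrates that operation with a direct O(N·M) loop; it is not taken from the review, and the toy signal and two-tap impulse response are made-up values.

```python
def convolve(signal, rir):
    """Full discrete convolution of a dry signal with a room impulse response."""
    if not signal or not rir:
        return []
    out = [0.0] * (len(signal) + len(rir) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(rir):
            out[n + k] += x * h  # each input sample excites a scaled copy of the RIR
    return out


# Toy example: direct sound plus one echo at half amplitude, one sample later.
dry = [1.0, 2.0, 3.0]
toy_rir = [1.0, 0.5]
wet = convolve(dry, toy_rir)
```

For real recordings an FFT-based convolution is normally preferred, since measured RIRs can be tens of thousands of samples long.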

Using Convolutional Neural Networks to Classify Audio Signal in Noisy Sound Scenes

2018 Global Smart Industry Conference (GloSIC) ◽

10.1109/glosic.2018.8570117 ◽

2018 ◽

Cited By ~ 1

Author(s):

M.V. Gubin

Keyword(s):

Neural Networks ◽

Convolutional Neural Networks ◽

Audio Signal

Hybrid neural network based on novel audio feature for vehicle type identification

Scientific Reports ◽

10.1038/s41598-021-87399-1 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Haoze Chen ◽

Zhijie Zhang

Keyword(s):

Neural Networks ◽

Visual Information ◽

Recognition Rate ◽

Real Life ◽

Audio Signal ◽

Training Data ◽

Identification System ◽

The Novel ◽

Vehicle Information ◽

Audio Information

Because the audio signatures of different vehicle types are distinct, a vehicle's type can be identified accurately from its audio signal. In real life, determining the type of a vehicle therefore does not require visual information; audio information is sufficient. In this paper, we extract and stitch together features from different aspects: Mel frequency cepstrum coefficients for perceptual characteristics, pitch class profile for psychoacoustic characteristics, and short-term energy for acoustic characteristics. In addition, we improve the neural network classifier by fusing an LSTM unit into the convolutional neural network. Finally, we feed the novel feature to the hybrid neural network to recognize different vehicles. The results suggest that the novel feature proposed in this paper can increase the recognition rate by 7%; that randomly corrupting the training data by superimposing different kinds of noise can improve the noise robustness of the identification system; and that, because LSTM has great advantages in modeling time series, adding LSTM to the network can improve the recognition rate by 3.39%.
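
The feature stitching described here can be read as per-frame concatenation of the three feature types. The sketch below is an illustration under that assumption, not the authors' implementation: it computes short-term energy directly, while the MFCC and pitch class profile rows are hypothetical placeholder values, since computing them is outside the scope of the example.

```python
def short_term_energy(signal, frame_len, hop):
    """Sum of squared samples in each frame of length frame_len, advancing by hop."""
    return [
        sum(s * s for s in signal[start:start + frame_len])
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


def stitch_features(mfcc_frames, pcp_frames, energies):
    """Concatenate per-frame MFCC, pitch class profile, and energy into one vector per frame."""
    if not (len(mfcc_frames) == len(pcp_frames) == len(energies)):
        raise ValueError("all feature streams must have the same number of frames")
    return [m + p + [e] for m, p, e in zip(mfcc_frames, pcp_frames, energies)]


# Toy signal split into two non-overlapping frames of two samples each.
energies = short_term_energy([1.0, 1.0, 2.0, 2.0], frame_len=2, hop=2)

# Placeholder MFCC and PCP rows (hypothetical values, one row per frame).
mfcc = [[0.1, 0.2], [0.3, 0.4]]
pcp = [[1.0, 0.0], [0.0, 1.0]]
stitched = stitch_features(mfcc, pcp, energies)
```

Each stitched row is what a sequence model would receive for one time step, so all three feature streams must share the same frame length and hop.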

Deep Neural Networks for Shimmer Approximation in Synthesized Audio Signal

Communications in Computer and Information Science - Computer Science – CACIC 2017 ◽

10.1007/978-3-319-75214-3_1 ◽

2018 ◽

pp. 3-12

Author(s):

Mario Alejandro García ◽

Eduardo Atilio Destéfanis

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Audio Signal
