Integrating Audio Signal Processing and Deep Learning Algorithms for Gait Pattern Classification in Brazilian Gaited Horses

This study assessed the usefulness of audio signal processing for the gaited horse industry. A total of 196 short audio files (4 s each) were collected from video recordings of Brazilian gaited horses. These files were converted into waveform signals (196 samples by 80,000 columns) and divided into training (N = 164) and validation (N = 32) datasets. Twelve single-valued audio features were initially extracted to summarize the training data according to gait pattern: Marcha Batida (MB) and Marcha Picada (MP). After preliminary analyses, high-dimensional arrays of Mel Frequency Cepstral Coefficients (MFCC), Onset Strength (OS), and Tempogram (TEMP) were extracted and used as inputs to the classification algorithms. For preliminary data visualization, a principal component analysis (PCA) was performed on the set of 12 single-valued features and on each audio-feature dataset (AFD: MFCC, OS, and TEMP). Machine learning (random forest, RF; support vector machine, SVM) and deep learning (multilayer perceptron neural networks, MLP; convolutional neural networks, CNN) algorithms were used to classify the gait types. A five-fold cross-validation scheme with 10 repetitions was employed to assess the models' predictive performance. Classification performance across models and AFD was also validated with independent observations. The models and AFD were compared on classification accuracy (ACC), specificity (SPEC), sensitivity (SEN), and area under the curve (AUC). In the logistic regression analysis, five of the 12 extracted audio features differed significantly (p < 0.05) between the gait types. ACC averages ranged from 0.806 to 0.932 for MFCC, from 0.758 to 0.948 for OS, and from 0.936 to 0.968 for TEMP. Overall, the TEMP dataset provided the best classification accuracies for all models. The most suitable method for audio-based horse gait pattern classification was CNN. Both cross-validation and independent validation confirmed that high values of ACC, SPEC, SEN, and AUC are expected for yet-to-be-observed labels, except for MFCC-based models, in which clear overfitting was observed. Using audio-generated data to describe gait phenotypes in Brazilian horses is a promising approach, as the two gait patterns were correctly distinguished. The highest classification performance was achieved by combining CNN and the rhythmic-descriptive AFD.
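
The evaluation scheme described in the abstract (five-fold cross-validation repeated 10 times, scored by ACC, SEN, and SPEC) can be illustrated with a minimal sketch. This is not the authors' code: the function names are hypothetical, the choice of MP as the positive class is an assumption (the abstract does not say which gait is treated as positive), the folds are not stratified by gait, and AUC is left out because it needs predicted scores rather than labels.

```python
import random


def confusion_counts(y_true, y_pred, positive="MP"):
    """Count (tp, tn, fp, fn), treating `positive` as the positive class.

    Assumption: MP is the positive class; the abstract does not specify this.
    """
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth != positive and pred != positive:
            tn += 1
        elif truth != positive and pred == positive:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred, positive="MP"):
    """Return accuracy, sensitivity, and specificity as a dict."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + tn + fp + fn
    return {
        "ACC": (tp + tn) / total,
        "SEN": tp / (tp + fn) if (tp + fn) else float("nan"),
        "SPEC": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


def repeated_kfold_indices(n_samples, n_splits=5, n_repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV.

    Each repeat reshuffles the sample indices and deals them into
    `n_splits` folds; every fold serves once as the held-out set.
    """
    rng = random.Random(seed)
    for rep in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        folds = [idx[i::n_splits] for i in range(n_splits)]
        for k, test_idx in enumerate(folds):
            train_idx = [i for j, f in enumerate(folds) if j != k for i in f]
            yield rep, k, train_idx, test_idx


if __name__ == "__main__":
    # Toy labels only; these are not data from the study.
    y_true = ["MP", "MP", "MB", "MB", "MB"]
    y_pred = ["MP", "MB", "MB", "MB", "MP"]
    print(classification_metrics(y_true, y_pred))
    # 164 matches the training-set size reported in the abstract.
    print(sum(1 for _ in repeated_kfold_indices(164)), "train/test splits")
```

In practice, each split would train one of the classifiers (RF, SVM, MLP, or CNN) on `train_idx` and score it on `test_idx`, and the per-split metrics would then be averaged.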

Deep Learning for Audio Signal Processing

IEEE Journal of Selected Topics in Signal Processing, Vol 13 (2), pp. 206-219, 2019. DOI: 10.1109/jstsp.2019.2908700. Cited by ~46

Author(s): Hendrik Purwins, Bo Li, Tuomas Virtanen, Jan Schluter, Shuo-Yiin Chang, ...

Keyword(s): Signal Processing, Deep Learning, Audio Signal, Audio Signal Processing

An examination of the application of multi-layer neural networks to audio signal processing

1990. DOI: 10.1109/ijcnn.1990.137586. Cited by ~1

Author(s): J.D. Hoyt, H. Wechsler

Keyword(s): Neural Networks, Signal Processing, Audio Signal, Audio Signal Processing

Audio signal processing by neural networks

Neurocomputing, Vol 55 (3-4), pp. 593-625, 2003. DOI: 10.1016/s0925-2312(03)00395-3. Cited by ~31

Author(s): Aurelio Uncini

Keyword(s): Neural Networks, Signal Processing, Audio Signal, Audio Signal Processing

Intelligent Audio Signal Processing for Detecting Rainforest Species Using Deep Learning

Intelligent Automation & Soft Computing, Vol 31 (2), pp. 693-706, 2022. DOI: 10.32604/iasc.2022.019811

Author(s): Rakesh Kumar, Meenu Gupta, Shakeel Ahmed, Abdulaziz Alhumam, Tushar Aggarwal

Keyword(s): Signal Processing, Deep Learning, Audio Signal, Audio Signal Processing

An overview of methods for generating, augmenting and evaluating room impulse response using artificial neural networks

Mokslas - Lietuvos ateitis, Vol 13 (0), pp. 1-5, 2021. DOI: 10.3846/mla.2021.15152

Author(s): Mantas Tamulionis

Keyword(s): Neural Networks, Signal Processing, Artificial Neural Networks, Speech Recognition, Impulse Response, Automatic Speech Recognition, Audio Signal, Training Data, Audio Signal Processing, Artificial Neural

Methods based on artificial neural networks (ANN) are widely used in audio signal processing tasks, where they offer opportunities to optimize processes and save computational resources. A key quantity for numerically capturing the acoustics of a room is the room impulse response (RIR). Increasingly, researchers choose not to record RIRs in a real room but to generate them using ANN, as this lets them prepare training datasets of unlimited size. Neural networks are also used to augment the generated impulse responses so that they resemble recorded ones. So far, the widest use of ANN is in evaluating the generated results, for example, in automatic speech recognition (ASR) tasks. This review also describes the datasets of recorded RIRs that are commonly used in studies as training data for neural networks.
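
To make the role of the RIR concrete, the sketch below shows how an impulse response, once available, is applied to a dry signal by convolution. It is an illustration under stated assumptions and not a method from the review: the RIR here is a classical toy model (exponentially decaying noise), standing in for a recorded or ANN-generated one, and the names `synthetic_rir` and `convolve` and the decay parameter `rt60_samples` are hypothetical.

```python
import math
import random


def synthetic_rir(length, rt60_samples, seed=0):
    """Toy RIR: uniform noise under an exponential decay envelope.

    Assumption: the amplitude envelope drops by 60 dB (a factor of 1000)
    after `rt60_samples` samples. This is a non-neural stand-in for a
    recorded or generated RIR.
    """
    rng = random.Random(seed)
    decay = math.log(1000.0) / rt60_samples
    rir = [rng.uniform(-1.0, 1.0) * math.exp(-decay * n) for n in range(length)]
    rir[0] = 1.0  # direct-path impulse at time zero
    return rir


def convolve(signal, rir):
    """Direct (time-domain) linear convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(rir) - 1)
    for i, s in enumerate(signal):
        if s == 0.0:
            continue  # zero samples contribute nothing
        for j, h in enumerate(rir):
            out[i + j] += s * h
    return out


if __name__ == "__main__":
    rir = synthetic_rir(length=64, rt60_samples=32)
    dry = [1.0, 0.0, 0.0, 0.5]  # toy dry signal, not real audio
    wet = convolve(dry, rir)
    print(len(wet), "output samples")
```

Convolving a unit impulse with the RIR returns the RIR itself, which is a quick sanity check for any generated response. For real audio lengths an FFT-based convolution would be the usual choice; the direct form is used here only to keep the sketch dependency-free.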
