Audio Signal Processing: Latest Research Papers

Utterance clustering is one of the actively researched topics in audio signal processing and machine learning. This study aims to improve the performance of utterance clustering by processing multichannel (stereo) audio signals. Processed audio signals were generated by combining left- and right-channel audio signals in a few different ways and then by extracting the embedded features (also called d-vectors) from those processed audio signals. This study applied the Gaussian mixture model for supervised utterance clustering. In the training phase, a parameter-sharing Gaussian mixture model was obtained to train the model for each speaker. In the testing phase, the speaker with the maximum likelihood was selected as the detected speaker. Results of experiments with real audio recordings of multiperson discussion sessions showed that the proposed method that used multichannel audio signals achieved significantly better performance than a conventional method with mono-audio signals in more complicated conditions.
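The decision rule described above (score the d-vectors under each speaker's Gaussian mixture model, then pick the speaker with the maximum likelihood) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the left and right channels were combined, so the mixes in `combine_channels` are hypothetical, and the sketch assumes diagonal-covariance GMMs whose parameters are already trained. It does not attempt the parameter sharing or the training phase.

```python
import math

def combine_channels(left, right):
    """Return a few mono mixes of a stereo pair (hypothetical choices)."""
    return {
        "left": list(left),
        "right": list(right),
        "sum": [(l + r) / 2 for l, r in zip(left, right)],
        "diff": [(l - r) / 2 for l, r in zip(left, right)],
    }

def gmm_log_likelihood(x, gmm):
    """Log-likelihood of one feature vector under a diagonal-covariance GMM.

    gmm is a list of (weight, means, variances) tuples, one per component.
    """
    comp_logs = []
    for weight, means, variances in gmm:
        log_p = math.log(weight)
        for xi, mu, var in zip(x, means, variances):
            log_p += -0.5 * (math.log(2 * math.pi * var) + (xi - mu) ** 2 / var)
        comp_logs.append(log_p)
    # log-sum-exp over components for numerical stability
    m = max(comp_logs)
    return m + math.log(sum(math.exp(c - m) for c in comp_logs))

def detect_speaker(vectors, speaker_gmms):
    """Pick the speaker whose GMM gives the highest total log-likelihood."""
    scores = {
        name: sum(gmm_log_likelihood(v, gmm) for v in vectors)
        for name, gmm in speaker_gmms.items()
    }
    return max(scores, key=scores.get)
```

In practice the vectors passed to `detect_speaker` would be d-vectors extracted from one of the processed signals; here any list of equal-length numeric vectors works.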


Integrating Audio Signal Processing and Deep Learning Algorithms for Gait Pattern Classification in Brazilian Gaited Horses

Frontiers in Animal Science, DOI 10.3389/fanim.2021.681557, 2021, Vol 2

Author(s): Anderson Antonio Carvalho Alves, Lucas Tassoni Andrietta, Rafael Zinni Lopes, Fernando Oliveira Bussiman, Fabyano Fonseca e Silva, ...

Keyword(s): Neural Networks, Signal Processing, Deep Learning, Audio Signal, Gait Pattern, Classification Performance, Audio Signal Processing, Gait Patterns, Audio Features, Gaited Horses

This study assessed the usefulness of audio signal processing in the gaited horse industry. A total of 196 short audio files (4 s each) were collected from video recordings of Brazilian gaited horses. These files were converted into waveform signals (196 samples by 80,000 columns) and divided into training (N = 164) and validation (N = 32) datasets. Twelve single-valued audio features were initially extracted to summarize the training data according to the gait patterns (Marcha Batida, MB, and Marcha Picada, MP). After preliminary analyses, high-dimensional arrays of the Mel Frequency Cepstral Coefficients (MFCC), Onset Strength (OS), and Tempogram (TEMP) were extracted and used as input information in the classification algorithms. A principal component analysis (PCA) was performed on the set of 12 single-valued features and on each audio-feature dataset (AFD: MFCC, OS, and TEMP) for prior data visualization. Machine learning (random forest, RF; support vector machine, SVM) and deep learning (multilayer perceptron neural networks, MLP; convolutional neural networks, CNN) algorithms were used to classify the gait types. A five-fold cross-validation scheme with 10 repetitions was employed to assess the models' predictive performance. The classification performance across models and AFD was also validated with independent observations. The models and AFD were compared based on classification accuracy (ACC), specificity (SPEC), sensitivity (SEN), and area under the curve (AUC). In the logistic regression analysis, five of the 12 extracted audio features differed significantly (p < 0.05) between the gait types. ACC averages ranged from 0.806 to 0.932 for MFCC, from 0.758 to 0.948 for OS, and from 0.936 to 0.968 for TEMP. Overall, the TEMP dataset provided the best classification accuracies for all models. The most suitable method for audio-based horse gait pattern classification was CNN. Both cross and independent validation schemes confirmed that high values of ACC, SPEC, SEN, and AUC are expected for yet-to-be-observed labels, except for MFCC-based models, in which clear overfitting was observed. Using audio-generated data for describing gait phenotypes in Brazilian horses is a promising approach, as the two gait patterns were correctly distinguished. The highest classification performance was achieved by combining CNN and the rhythmic-descriptive AFD.
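As a rough illustration of the evaluation protocol in this abstract, the sketch below generates index splits for five-fold cross-validation repeated 10 times and computes ACC, SEN, and SPEC from predicted labels. It is an assumption-laden sketch, not the authors' code: the function names are made up for illustration, the abstract does not say which gait is treated as the positive class (MP is chosen arbitrarily here), and no stratification is applied. As a side note, if each of the 80,000 columns is one sample of a 4 s clip, the implied sampling rate is 80,000 / 4 = 20,000 Hz.

```python
import random

def repeated_kfold(n_samples, k=5, repeats=10, seed=0):
    """Yield (train_idx, val_idx) pairs for repeated k-fold cross-validation."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)  # fresh shuffle for every repetition
        for fold in range(k):
            val = idx[fold::k]  # every k-th shuffled index forms one fold
            val_set = set(val)
            train = [i for i in idx if i not in val_set]
            yield train, val

def binary_metrics(y_true, y_pred, positive="MP"):
    """Accuracy, sensitivity and specificity for a two-class problem.

    Assumes both classes occur in y_true; `positive` is an arbitrary choice.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "ACC": (tp + tn) / len(pairs),
        "SEN": tp / (tp + fn),   # true positive rate
        "SPEC": tn / (tn + fp),  # true negative rate
    }
```

With the 164 training clips, `repeated_kfold(164)` yields 50 splits; averaging `binary_metrics` over them gives the kind of cross-validated ACC, SEN, and SPEC figures the abstract reports.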


An Overview of Methods for Generating, Augmenting and Evaluating Room Impulse Response Using Artificial Neural Networks

Mokslas - Lietuvos ateitis, DOI 10.3846/mla.2021.15152, 2021, Vol 13 (0), pp. 1-5

Author(s): Mantas Tamulionis

Keyword(s): Neural Networks, Signal Processing, Artificial Neural Networks, Speech Recognition, Impulse Response, Automatic Speech Recognition, Audio Signal, Training Data, Audio Signal Processing, Artificial Neural

Methods based on artificial neural networks (ANN) are widely used in various audio signal processing tasks. This provides opportunities to optimize processes and save the resources required for calculations. One of the main quantities needed to capture the acoustics of a room numerically is the room impulse response (RIR). Increasingly, researchers choose not to record these responses in a real room but to generate them using ANN, as this gives them the freedom to prepare training datasets of arbitrary size. Neural networks are also used to augment the generated responses to make them similar to actually recorded ones. The widest use of ANN so far is observed in the evaluation of the generated results, for example, in automatic speech recognition (ASR) tasks. This review also describes datasets of recorded RIRs commonly found in various studies that are used as training data for neural networks.
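To make the role of an RIR in data augmentation concrete, the sketch below convolves a dry signal with an impulse response to simulate how it would sound in a room. The RIR here is not produced by a neural network: it uses a classical stand-in, exponentially decaying Gaussian noise with a chosen reverberation time, purely for illustration. The function names and parameters (`synthetic_rir`, `rt60`, `length_s`) are assumptions and do not come from the review.

```python
import math
import random

def synthetic_rir(rt60, sample_rate, length_s, seed=0):
    """Toy RIR: Gaussian noise under an exponential decay envelope.

    The envelope falls by 60 dB (a factor of 1000 in amplitude) after
    rt60 seconds. This is a classical approximation, not an ANN model.
    """
    rng = random.Random(seed)
    n = int(length_s * sample_rate)
    decay = math.log(1000.0) / rt60  # envelope reaches 1/1000 at t = rt60
    return [rng.gauss(0.0, 1.0) * math.exp(-decay * i / sample_rate)
            for i in range(n)]

def convolve(signal, rir):
    """Direct-form linear convolution; output length is len(signal) + len(rir) - 1."""
    out = [0.0] * (len(signal) + len(rir) - 1)
    for i, s in enumerate(signal):
        if s == 0.0:
            continue  # skip silent samples, they contribute nothing
        for j, h in enumerate(rir):
            out[i + j] += s * h
    return out
```

Convolving each clean training utterance with many different RIRs, whether recorded, generated, or synthetic as here, is how reverberant training data is typically prepared. For long signals an FFT-based convolution would be much faster than this direct loop.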


State of the Art in Acoustic Monitoring of Cicadidae in Coffee Plantations

Revista Macambira, DOI 10.35642/rm.v5i1.562, 2021, Vol 5 (1), pp. e051007

Author(s): João Paulo Lemos Escola, Rodrigo Guido, Ivan Nunes Da Silva, Douglas Henrique Bottura Maccagnan, Alexandre de Moraes Cardoso, ...

Keyword(s): Signal Processing, Audio Signal, Acoustic Monitoring, Emission Characteristic, Digital Audio, Audio Signal Processing, Sound Emission, Coffee Plantations, Monitoring Software

Cicadas are a key pest of coffee plantations and are notable for the sounds they emit. Producers have an interest in the development of tools for monitoring crops in order to save resources. This work presents a bibliographic survey of studies focused on the acoustic monitoring of Cicadidae, looking for available solutions to help producers. The results showed that the majority of the works found (52.3%) use software in the laboratory, and only 4.7% use some monitoring software deployed in the field. This may therefore be an important gap to be filled in future work, by applying digital audio signal processing in a possible device deployed in the plantation. Keywords: Bioacoustics, Digital Audio Signal Processing, Monitoring.
