MDCT Sinusoidal Analysis for Audio Signals Analysis and Processing

2013 ◽  
Vol 21 (7) ◽  
pp. 1403-1414 ◽  
Author(s):  
Shuhua Zhang ◽  
Weibei Dou ◽  
Huazhong Yang


Author(s):  
L. S. Chumbley ◽  
M. Meyer ◽  
K. Fredrickson ◽  
F.C. Laabs

The development of a scanning electron microscope (SEM) suitable for instructional purposes has created a large number of outreach opportunities for the Materials Science and Engineering (MSE) Department at Iowa State University. Several collaborative efforts are presently underway with local schools and the Department of Curriculum and Instruction (C&I) at ISU to bring SEM technology into the classroom in a near live-time, interactive manner. The SEM laboratory is shown in Figure 1. Interactions between the laboratory and the classroom use inexpensive digital cameras and shareware called CU-SeeMe (Figure 2). Developed by Cornell University and available over the internet, CU-SeeMe provides inexpensive video conferencing capabilities. The software allows video and audio signals from Quikcam™ cameras to be sent and received between computers. A reflector site has been established in the MSE department that allows eight different computers to be interconnected simultaneously. This arrangement allows us to demonstrate SEM principles in the classroom. An Apple Macintosh has been configured to allow the SEM image to be seen using CU-SeeMe.


Author(s):  
Bharat Mirchandani ◽  
Pascal Perrier ◽  
Brigitte Grosgogeat ◽  
Christophe Jeannin

Objectives: The mechanical interactions between tongue and palate are crucial for speech production and swallowing. In this study, we present examples of pressure signals that can be recorded with our PRESLA system (PRESLA stands for the French expression "PRESsion de la LAngue" [pressure from the tongue]) to assess these motor functions, and we illustrate which issues can be tackled with such a system. Materials and Methods: A single French-speaking edentulous subject, a long-time wearer of a complete denture, with no speech production or swallowing disorders, was recorded during the production of nonsense words including French alveolar fricatives, and during dry and water swallowing. The PRESLA system used strain-gauge transducers inserted into holes drilled in the palatal surface of a duplicate of the prosthesis at six locations relevant for speech production and swallowing. Pressure signals were post-synchronized with the motor tasks based on audio signals. Results: Patterns of temporal variation of the pressure exerted by the tongue on the palate are shown for the two studied motor tasks. For our single subject, the patterns for the fricative /s/ are essentially bell shaped, whereas the pressure signals observed for water swallowing begin with a maximum followed by a slow decrease during the rest of the positive pressure phase. Pressure magnitude is almost 20 times larger for water swallowing than for /s/ production. Conclusions: This study illustrates the usefulness of our PRESLA system for studying speech production and swallowing motor control under normal and pathological conditions.
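The post-synchronization step mentioned above, in which the pressure traces are aligned with the motor tasks via the audio track, can be illustrated with a minimal sketch. This is not the PRESLA software: the cross-correlation approach, the sampling rate, and the synthetic signals below are assumptions made purely for illustration.

```python
import numpy as np
from scipy.signal import correlate, correlation_lags

def estimate_lag(pressure: np.ndarray, audio_env: np.ndarray, fs: float) -> float:
    """Offset (in seconds) that best aligns a pressure trace with the audio envelope."""
    p = (pressure - pressure.mean()) / pressure.std()
    a = (audio_env - audio_env.mean()) / audio_env.std()
    xcorr = correlate(p, a, mode="full")
    lags = correlation_lags(len(p), len(a), mode="full")
    return lags[np.argmax(xcorr)] / fs

# Synthetic example: the pressure peak trails the audio burst by 0.25 s.
fs = 1000.0
t = np.arange(0, 2, 1 / fs)
audio_env = np.exp(-((t - 0.50) ** 2) / 0.01)   # envelope of the task audio
pressure = np.exp(-((t - 0.75) ** 2) / 0.01)    # pressure on one transducer
print(round(estimate_lag(pressure, audio_env, fs), 2))  # ~0.25
```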


2021 ◽  
Vol 7 (2) ◽  
pp. 356-362
Author(s):  
Harry Coppock ◽  
Alex Gaskell ◽  
Panagiotis Tzirakis ◽  
Alice Baird ◽  
Lyn Jones ◽  
...  

Background: Since the emergence of COVID-19 in December 2019, multidisciplinary research teams have wrestled with how best to control the pandemic in light of its considerable physical, psychological and economic damage. Mass testing has been advocated as a potential remedy; however, mass testing using physical tests is a costly and hard-to-scale solution. Methods: This study demonstrates the feasibility of an alternative form of COVID-19 detection, harnessing digital technology through the use of audio biomarkers and deep learning. Specifically, we show that a deep neural network-based model can be trained to detect symptomatic and asymptomatic COVID-19 cases using breath and cough audio recordings. Results: Our model, a custom convolutional neural network, demonstrates strong empirical performance on a data set of 355 crowdsourced participants, achieving an area under the receiver operating characteristic curve of 0.846 on the task of COVID-19 classification. Conclusion: This study offers a proof of concept for diagnosing COVID-19 using cough and breath audio signals and motivates a comprehensive follow-up research study on a wider data sample, given the evident advantages of a low-cost, highly scalable digital COVID-19 diagnostic tool.
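As a rough illustration of the kind of model the abstract describes, the sketch below defines a small convolutional network that classifies log-mel spectrograms of cough or breath recordings. It is not the authors' custom architecture: the layer sizes, the two-class output, and the 64-band by 256-frame input shape are assumptions.

```python
import torch
import torch.nn as nn

class CoughCNN(nn.Module):
    def __init__(self, n_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),             # collapse the time/frequency axes
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):                         # x: (batch, 1, n_mels, n_frames)
        h = self.features(x).flatten(1)
        return self.classifier(h)

# Example: a batch of 8 spectrograms, 64 mel bands x 256 frames
logits = CoughCNN()(torch.randn(8, 1, 64, 256))
print(logits.shape)  # torch.Size([8, 2])
```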


Sensors ◽  
2021 ◽  
Vol 21 (3) ◽  
pp. 676
Author(s):  
Andrej Zgank

Acoustic monitoring of animal activity is becoming one of the necessary tools in agriculture, including beekeeping, where it can assist in the control of beehives at remote locations. Such approaches make it possible to classify bee swarm activity from audio signals. This paper proposes an IoT-based acoustic swarm classification system built on deep neural networks. Audio recordings were obtained from the Open Source Beehive project, and mel-frequency cepstral coefficient (MFCC) features were extracted from the audio signal. The lossless WAV and lossy MP3 audio formats were compared for IoT-based solutions, and the impact of the deep neural network parameters on the classification results was analyzed. The best overall classification accuracy with uncompressed audio was 94.09%, whereas MP3 compression degraded the DNN accuracy by over 10%. Evaluation of the proposed IoT-based bee activity acoustic classification showed improved results compared with the previous hidden Markov model system.
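A minimal sketch of the feature pipeline named in the abstract (MFCC extraction followed by a neural classifier) is given below. The 13 coefficients, the 16 kHz sampling rate, the time-averaging of features, and the two class labels are illustrative assumptions, not details taken from the paper.

```python
import librosa
import numpy as np
from sklearn.neural_network import MLPClassifier

def mfcc_features(path: str, sr: int = 16000, n_mfcc: int = 13) -> np.ndarray:
    """Return one feature vector per recording: time-averaged MFCCs."""
    y, sr = librosa.load(path, sr=sr, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # (n_mfcc, n_frames)
    return mfcc.mean(axis=1)                                  # (n_mfcc,)

# X: feature vectors from labelled recordings; y: 0 = normal activity, 1 = swarming
# X = np.stack([mfcc_features(p) for p in wav_paths]); y = labels
clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500)
# clf.fit(X, y); clf.predict(X_new)
```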


Electronics ◽  
2021 ◽  
Vol 10 (11) ◽  
pp. 1349
Author(s):  
Stefan Lattner ◽  
Javier Nistal

Lossy audio codecs compress (and decompress) digital audio streams by removing information that tends to be inaudible in human perception. Under high compression rates, such codecs may introduce a variety of impairments in the audio signal. Many works have tackled the problem of audio enhancement and compression artifact removal using deep-learning techniques. However, only a few works tackle the restoration of heavily compressed audio signals in the musical domain. In such a scenario, there is no unique solution for the restoration of the original signal. Therefore, in this study, we test a stochastic generator of a Generative Adversarial Network (GAN) architecture for this task. Such a stochastic generator, conditioned on highly compressed musical audio signals, could one day generate outputs indistinguishable from high-quality releases. The present study may therefore yield insights into more efficient musical data storage and transmission. We train stochastic and deterministic generators on MP3-compressed audio signals at 16, 32, and 64 kbit/s. We perform an extensive evaluation of the different experiments using objective metrics and listening tests. We find that the models can improve the quality of the audio signals over the MP3 versions for 16 and 32 kbit/s, and that the stochastic generators are capable of generating outputs that are closer to the original signals than those of the deterministic generators.
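The notion of a stochastic generator conditioned on a compressed signal can be sketched as follows. This is only an illustrative 1-D convolutional generator that concatenates a random latent vector with the compressed waveform; the layer sizes and latent dimension are assumptions, and the actual GAN studied in the paper may differ substantially.

```python
import torch
import torch.nn as nn

class StochasticGenerator(nn.Module):
    def __init__(self, latent_dim: int = 64):
        super().__init__()
        self.latent_dim = latent_dim
        self.net = nn.Sequential(
            nn.Conv1d(1 + latent_dim, 64, kernel_size=15, padding=7), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=15, padding=7), nn.ReLU(),
            nn.Conv1d(64, 1, kernel_size=15, padding=7), nn.Tanh(),
        )

    def forward(self, compressed, z=None):        # compressed: (batch, 1, samples)
        if z is None:                              # draw a fresh latent per call
            z = torch.randn(compressed.size(0), self.latent_dim, 1)
        z = z.expand(-1, -1, compressed.size(-1))  # broadcast the latent over time
        return self.net(torch.cat([compressed, z], dim=1))

restored = StochasticGenerator()(torch.randn(2, 1, 16000))
print(restored.shape)  # torch.Size([2, 1, 16000])
```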


2021 ◽  
Vol 11 (13) ◽  
pp. 5913
Author(s):  
Zhuang He ◽  
Yin Feng

Automatic singing transcription and analysis from polyphonic music recordings are essential in a number of indexing techniques for computational auditory scenes. To obtain a note-level sequence, in this work we divide the singing transcription task into two subtasks: melody extraction and note transcription. We construct a salience function in terms of harmonic and rhythmic similarity and a measurement of spectral balance. Central to our proposed method is the measurement of melody contours, which are calculated using edge searching based on their continuity properties. We calculate the mean contour salience by separating melody analysis from the adjacent breakpoint connective strength matrix, and we select the final melody contour to determine MIDI notes. This unique method, combining audio signals with image edge analysis, provides a more interpretable analysis platform for continuous singing signals. Experimental analysis using Music Information Retrieval Evaluation Exchange (MIREX) datasets shows that our technique achieves promising results both for audio melody extraction and for polyphonic singing transcription.
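A hedged sketch of one ingredient mentioned above, a salience function built from harmonic structure, is shown below. The harmonic-summation form, the candidate pitch grid, and the decay weights are assumptions made for illustration; the paper's salience function additionally incorporates rhythmic similarity and spectral balance.

```python
import numpy as np
import librosa

def harmonic_salience(S, freqs, f0_candidates, n_harmonics=5, decay=0.8):
    """S: magnitude spectrogram (n_bins, n_frames); freqs: bin frequencies in Hz."""
    salience = np.zeros((len(f0_candidates), S.shape[1]))
    for i, f0 in enumerate(f0_candidates):
        for h in range(1, n_harmonics + 1):
            bin_idx = np.argmin(np.abs(freqs - h * f0))   # nearest bin to the h-th harmonic
            salience[i] += (decay ** (h - 1)) * S[bin_idx]
    return salience

# Toy "voice": a 220 Hz fundamental with one overtone, 2 seconds at 22.05 kHz
sr = 22050
t = np.arange(sr * 2) / sr
y = 0.6 * np.sin(2 * np.pi * 220 * t) + 0.3 * np.sin(2 * np.pi * 440 * t)
S = np.abs(librosa.stft(y, n_fft=2048))
freqs = librosa.fft_frequencies(sr=sr, n_fft=2048)
f0_candidates = np.arange(80, 1000, 10.0)
sal = harmonic_salience(S, freqs, f0_candidates)
melody_contour = f0_candidates[sal.argmax(axis=0)]        # crude per-frame pitch track
print(melody_contour[:5])                                  # ~220 Hz in every frame
```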


Author(s):  
Sören Schulze ◽  
Emily J. King

We propose an algorithm for the blind separation of single-channel audio signals. It is based on a parametric model that describes the spectral properties of the sounds of musical instruments independently of pitch. We develop a novel sparse pursuit algorithm that can match the discrete frequency spectra from the recorded signal with the continuous spectra delivered by the model. We first use this algorithm to convert an STFT spectrogram from the recording into a novel form of log-frequency spectrogram whose resolution exceeds that of the mel spectrogram. We then make use of the pitch-invariant properties of that representation in order to identify the sounds of the instruments via the same sparse pursuit method. As the model parameters which characterize the musical instruments are not known beforehand, we train a dictionary that contains them, using a modified version of Adam. Applying the algorithm to various audio samples, we find that it is capable of producing high-quality separation results when the model assumptions are satisfied and the instruments are clearly distinguishable, but combinations of instruments with similar spectral characteristics pose a conceptual difficulty. While a key feature of the model is that it explicitly accounts for inharmonicity, inharmonicity can still impede the performance of the sparse pursuit algorithm. In general, due to its pitch invariance, our method is especially suitable for dealing with spectra from acoustic instruments, requiring only a minimal number of hyperparameters to be preset. Additionally, we demonstrate that the dictionary constructed for one recording can be applied to a different recording with similar instruments without additional training.
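To make the idea of a pitch-invariant log-frequency representation concrete, the sketch below resamples an ordinary STFT magnitude spectrogram onto a logarithmic frequency axis by simple interpolation. This is not the paper's sparse-pursuit construction, and the bin counts and frequency range are assumptions; it only illustrates why a pitch shift becomes a translation along the frequency axis of such a spectrogram.

```python
import numpy as np
import librosa

def log_frequency_spectrogram(S, freqs, f_min=55.0, f_max=8000.0, bins_per_octave=48):
    """Resample each frame of a linear-frequency spectrogram onto a log-frequency grid."""
    n_octaves = np.log2(f_max / f_min)
    log_freqs = f_min * 2.0 ** (np.arange(int(n_octaves * bins_per_octave)) / bins_per_octave)
    return np.stack([np.interp(log_freqs, freqs, frame) for frame in S.T], axis=1)

sr = 22050
y = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)           # toy 1-second tone
S = np.abs(librosa.stft(y, n_fft=4096))
freqs = librosa.fft_frequencies(sr=sr, n_fft=4096)
L = log_frequency_spectrogram(S, freqs)
print(L.shape)   # (log-frequency bins, time frames)
```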

