MDCT Sinusoidal Analysis for Audio Signals Analysis and Processing

2013 ◽  
Vol 21 (7) ◽  
pp. 1403-1414 ◽  
Author(s):  
Shuhua Zhang ◽  
Weibei Dou ◽  
Huazhong Yang


Author(s):  
L. S. Chumbley ◽  
M. Meyer ◽  
K. Fredrickson ◽  
F.C. Laabs

The development of a scanning electron microscope (SEM) suitable for instructional purposes has created a large number of outreach opportunities for the Materials Science and Engineering (MSE) Department at Iowa State University. Several collaborative efforts are presently underway with local schools and the Department of Curriculum and Instruction (C&I) at ISU to bring SEM technology into the classroom in a near live-time, interactive manner. The SEM laboratory is shown in Figure 1. Interactions between the laboratory and the classroom use inexpensive digital cameras and shareware called CU-SeeMe (Figure 2). Developed by Cornell University and available over the internet, CU-SeeMe provides inexpensive video conferencing capabilities. The software allows video and audio signals from Quikcam™ cameras to be sent and received between computers. A reflector site has been established in the MSE department that allows eight different computers to be interconnected simultaneously. This arrangement allows us to demonstrate SEM principles in the classroom. An Apple Macintosh has been configured to allow the SEM image to be seen using CU-SeeMe.


Author(s):  
Bharat Mirchandani ◽  
Pascal Perrier ◽  
Brigitte Grosgogeat ◽  
Christophe Jeannin

Objectives: The mechanical interactions between tongue and palate are crucial for speech production and swallowing. In this study, we present examples of pressure signals that can be recorded with our PRESLA system (PRESLA stands for the French expression "PRESsion de la LAngue" [pressure from the tongue]) to assess these motor functions, and we illustrate which issues can be tackled with such a system. Materials and Methods: A single French-speaking edentulous subject, a long-time wearer of a complete denture, with no speech production or swallowing disorders, was recorded during the production of nonsense words including French alveolar fricatives, and during dry and water swallowing. The PRESLA system used strain-gauge transducers inserted into holes drilled in the palatal surface of a duplicate of the prosthesis at six locations relevant for speech production and swallowing. Pressure signals were post-synchronized with the motor tasks based on audio signals. Results: Patterns of temporal variation of the pressure exerted by the tongue on the palate are shown for the two studied motor tasks. For our single subject, the patterns for the fricative /s/ are essentially bell shaped, whereas the pressure signals observed for water swallowing begin with a maximum followed by a slow decrease during the rest of the positive pressure phase. Pressure magnitude is almost 20 times larger for water swallowing than for /s/ production. Conclusions: This study illustrates the usefulness of our PRESLA system for studying speech production and swallowing motor control under normal and pathological conditions.
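The post-synchronization step mentioned above, in which the pressure traces are aligned with the motor tasks via the audio track, can be illustrated with a minimal sketch. This is not the PRESLA software: the cross-correlation approach, the sampling rate, and the synthetic signals below are assumptions made purely for illustration.

```python
import numpy as np
from scipy.signal import correlate, correlation_lags

def estimate_lag(pressure: np.ndarray, audio_env: np.ndarray, fs: float) -> float:
    """Offset (in seconds) that best aligns a pressure trace with the audio envelope."""
    p = (pressure - pressure.mean()) / pressure.std()
    a = (audio_env - audio_env.mean()) / audio_env.std()
    xcorr = correlate(p, a, mode="full")
    lags = correlation_lags(len(p), len(a), mode="full")
    return lags[np.argmax(xcorr)] / fs

# Synthetic example: the pressure peak trails the audio burst by 0.25 s.
fs = 1000.0
t = np.arange(0, 2, 1 / fs)
audio_env = np.exp(-((t - 0.50) ** 2) / 0.01)   # envelope of the task audio
pressure = np.exp(-((t - 0.75) ** 2) / 0.01)    # pressure on one transducer
print(round(estimate_lag(pressure, audio_env, fs), 2))  # ~0.25
```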


2021 ◽  
Vol 7 (2) ◽  
pp. 356-362
Author(s):  
Harry Coppock ◽  
Alex Gaskell ◽  
Panagiotis Tzirakis ◽  
Alice Baird ◽  
Lyn Jones ◽  
...  

Background: Since the emergence of COVID-19 in December 2019, multidisciplinary research teams have wrestled with how best to control the pandemic in light of its considerable physical, psychological and economic damage. Mass testing has been advocated as a potential remedy; however, mass testing using physical tests is a costly and hard-to-scale solution. Methods: This study demonstrates the feasibility of an alternative form of COVID-19 detection, harnessing digital technology through the use of audio biomarkers and deep learning. Specifically, we show that a deep neural network-based model can be trained to detect symptomatic and asymptomatic COVID-19 cases using breath and cough audio recordings. Results: Our model, a custom convolutional neural network, demonstrates strong empirical performance on a data set of 355 crowdsourced participants, achieving an area under the receiver operating characteristic curve of 0.846 on the task of COVID-19 classification. Conclusion: This study offers a proof of concept for diagnosing COVID-19 using cough and breath audio signals and motivates a comprehensive follow-up research study on a wider data sample, given the evident advantages of a low-cost, highly scalable digital COVID-19 diagnostic tool.
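As a rough illustration of the kind of model the abstract describes, the sketch below defines a small convolutional network that classifies log-mel spectrograms of cough or breath recordings. It is not the authors' custom architecture: the layer sizes, the two-class output, and the 64-band by 256-frame input shape are assumptions.

```python
import torch
import torch.nn as nn

class CoughCNN(nn.Module):
    def __init__(self, n_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),             # collapse the time/frequency axes
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):                         # x: (batch, 1, n_mels, n_frames)
        h = self.features(x).flatten(1)
        return self.classifier(h)

# Example: a batch of 8 spectrograms, 64 mel bands x 256 frames
logits = CoughCNN()(torch.randn(8, 1, 64, 256))
print(logits.shape)  # torch.Size([8, 2])
```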


Sensors ◽  
2021 ◽  
Vol 21 (3) ◽  
pp. 676
Author(s):  
Andrej Zgank

Acoustic monitoring of animal activity is becoming one of the necessary tools in agriculture, including beekeeping, where it can assist in the control of beehives at remote locations. Such approaches make it possible to classify bee swarm activity from audio signals. This paper proposes an IoT-based acoustic swarm classification system built on deep neural networks. Audio recordings were obtained from the Open Source Beehive project, and mel-frequency cepstral coefficient (MFCC) features were extracted from the audio signal. The lossless WAV and lossy MP3 audio formats were compared for IoT-based solutions, and the impact of the deep neural network parameters on the classification results was analyzed. The best overall classification accuracy with uncompressed audio was 94.09%, whereas MP3 compression degraded the DNN accuracy by over 10%. Evaluation of the proposed IoT-based bee activity acoustic classification showed improved results compared with the previous hidden Markov model system.
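A minimal sketch of the feature pipeline named in the abstract (MFCC extraction followed by a neural classifier) is given below. The 13 coefficients, the 16 kHz sampling rate, the time-averaging of features, and the two class labels are illustrative assumptions, not details taken from the paper.

```python
import librosa
import numpy as np
from sklearn.neural_network import MLPClassifier

def mfcc_features(path: str, sr: int = 16000, n_mfcc: int = 13) -> np.ndarray:
    """Return one feature vector per recording: time-averaged MFCCs."""
    y, sr = librosa.load(path, sr=sr, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # (n_mfcc, n_frames)
    return mfcc.mean(axis=1)                                  # (n_mfcc,)

# X: feature vectors from labelled recordings; y: 0 = normal activity, 1 = swarming
# X = np.stack([mfcc_features(p) for p in wav_paths]); y = labels
clf = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500)
# clf.fit(X, y); clf.predict(X_new)
```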


Electronics ◽  
2021 ◽  
Vol 10 (11) ◽  
pp. 1349
Author(s):  
Stefan Lattner ◽  
Javier Nistal

Lossy audio codecs compress (and decompress) digital audio streams by removing information that tends to be inaudible in human perception. Under high compression rates, such codecs may introduce a variety of impairments in the audio signal. Many works have tackled the problem of audio enhancement and compression artifact removal using deep-learning techniques. However, only a few works tackle the restoration of heavily compressed audio signals in the musical domain. In such a scenario, there is no unique solution for the restoration of the original signal. Therefore, in this study, we test a stochastic generator of a Generative Adversarial Network (GAN) architecture for this task. Such a stochastic generator, conditioned on highly compressed musical audio signals, could one day generate outputs indistinguishable from high-quality releases. The present study may therefore yield insights into more efficient musical data storage and transmission. We train stochastic and deterministic generators on MP3-compressed audio signals at 16, 32, and 64 kbit/s. We perform an extensive evaluation of the different experiments using objective metrics and listening tests. We find that the models can improve the quality of the audio signals over the MP3 versions for 16 and 32 kbit/s, and that the stochastic generators are capable of generating outputs that are closer to the original signals than those of the deterministic generators.
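The notion of a stochastic generator conditioned on a compressed signal can be sketched as follows. This is only an illustrative 1-D convolutional generator that concatenates a random latent vector with the compressed waveform; the layer sizes and latent dimension are assumptions, and the actual GAN studied in the paper may differ substantially.

```python
import torch
import torch.nn as nn

class StochasticGenerator(nn.Module):
    def __init__(self, latent_dim: int = 64):
        super().__init__()
        self.latent_dim = latent_dim
        self.net = nn.Sequential(
            nn.Conv1d(1 + latent_dim, 64, kernel_size=15, padding=7), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=15, padding=7), nn.ReLU(),
            nn.Conv1d(64, 1, kernel_size=15, padding=7), nn.Tanh(),
        )

    def forward(self, compressed, z=None):        # compressed: (batch, 1, samples)
        if z is None:                              # draw a fresh latent per call
            z = torch.randn(compressed.size(0), self.latent_dim, 1)
        z = z.expand(-1, -1, compressed.size(-1))  # broadcast the latent over time
        return self.net(torch.cat([compressed, z], dim=1))

restored = StochasticGenerator()(torch.randn(2, 1, 16000))
print(restored.shape)  # torch.Size([2, 1, 16000])
```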


2021 ◽  
Vol 11 (13) ◽  
pp. 5913
Author(s):  
Zhuang He ◽  
Yin Feng

Automatic singing transcription and analysis from polyphonic music recordings are essential in a number of indexing techniques for computational auditory scenes. To obtain a note-level sequence, in this work we divide the singing transcription task into two subtasks: melody extraction and note transcription. We construct a salience function in terms of harmonic and rhythmic similarity and a measurement of spectral balance. Central to our proposed method is the measurement of melody contours, which are calculated using edge searching based on their continuity properties. We calculate the mean contour salience by separating melody analysis from the adjacent breakpoint connective strength matrix, and we select the final melody contour to determine MIDI notes. This unique method, combining audio signals with image edge analysis, provides a more interpretable analysis platform for continuous singing signals. Experimental analysis using Music Information Retrieval Evaluation Exchange (MIREX) datasets shows that our technique achieves promising results both for audio melody extraction and for polyphonic singing transcription.
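A hedged sketch of one ingredient mentioned above, a salience function built from harmonic structure, is shown below. The harmonic-summation form, the candidate pitch grid, and the decay weights are assumptions made for illustration; the paper's salience function additionally incorporates rhythmic similarity and spectral balance.

```python
import numpy as np
import librosa

def harmonic_salience(S, freqs, f0_candidates, n_harmonics=5, decay=0.8):
    """S: magnitude spectrogram (n_bins, n_frames); freqs: bin frequencies in Hz."""
    salience = np.zeros((len(f0_candidates), S.shape[1]))
    for i, f0 in enumerate(f0_candidates):
        for h in range(1, n_harmonics + 1):
            bin_idx = np.argmin(np.abs(freqs - h * f0))   # nearest bin to the h-th harmonic
            salience[i] += (decay ** (h - 1)) * S[bin_idx]
    return salience

# Toy "voice": a 220 Hz fundamental with one overtone, 2 seconds at 22.05 kHz
sr = 22050
t = np.arange(sr * 2) / sr
y = 0.6 * np.sin(2 * np.pi * 220 * t) + 0.3 * np.sin(2 * np.pi * 440 * t)
S = np.abs(librosa.stft(y, n_fft=2048))
freqs = librosa.fft_frequencies(sr=sr, n_fft=2048)
f0_candidates = np.arange(80, 1000, 10.0)
sal = harmonic_salience(S, freqs, f0_candidates)
melody_contour = f0_candidates[sal.argmax(axis=0)]        # crude per-frame pitch track
print(melody_contour[:5])                                  # ~220 Hz in every frame
```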


Author(s):  
Sören Schulze ◽  
Emily J. King

We propose an algorithm for the blind separation of single-channel audio signals. It is based on a parametric model that describes the spectral properties of the sounds of musical instruments independently of pitch. We develop a novel sparse pursuit algorithm that can match the discrete frequency spectra from the recorded signal with the continuous spectra delivered by the model. We first use this algorithm to convert an STFT spectrogram from the recording into a novel form of log-frequency spectrogram whose resolution exceeds that of the mel spectrogram. We then make use of the pitch-invariant properties of that representation in order to identify the sounds of the instruments via the same sparse pursuit method. As the model parameters which characterize the musical instruments are not known beforehand, we train a dictionary that contains them, using a modified version of Adam. Applying the algorithm to various audio samples, we find that it is capable of producing high-quality separation results when the model assumptions are satisfied and the instruments are clearly distinguishable, but combinations of instruments with similar spectral characteristics pose a conceptual difficulty. While a key feature of the model is that it explicitly accounts for inharmonicity, inharmonicity can still impede the performance of the sparse pursuit algorithm. In general, due to its pitch invariance, our method is especially suitable for dealing with spectra from acoustic instruments, requiring only a minimal number of hyperparameters to be preset. Additionally, we demonstrate that the dictionary constructed for one recording can be applied to a different recording with similar instruments without additional training.
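To make the idea of a pitch-invariant log-frequency representation concrete, the sketch below resamples an ordinary STFT magnitude spectrogram onto a logarithmic frequency axis by simple interpolation. This is not the paper's sparse-pursuit construction, and the bin counts and frequency range are assumptions; it only illustrates why a pitch shift becomes a translation along the frequency axis of such a spectrogram.

```python
import numpy as np
import librosa

def log_frequency_spectrogram(S, freqs, f_min=55.0, f_max=8000.0, bins_per_octave=48):
    """Resample each frame of a linear-frequency spectrogram onto a log-frequency grid."""
    n_octaves = np.log2(f_max / f_min)
    log_freqs = f_min * 2.0 ** (np.arange(int(n_octaves * bins_per_octave)) / bins_per_octave)
    return np.stack([np.interp(log_freqs, freqs, frame) for frame in S.T], axis=1)

sr = 22050
y = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)           # toy 1-second tone
S = np.abs(librosa.stft(y, n_fft=4096))
freqs = librosa.fft_frequencies(sr=sr, n_fft=4096)
L = log_frequency_spectrogram(S, freqs)
print(L.shape)   # (log-frequency bins, time frames)
```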

