Discrimination of speech, musical instruments and singing voices using the temporal patterns of sinusoidal segments in audio signals

Author(s):  
Toru Taniguchi ◽  
Akishige Adachi ◽  
Shigeki Okawa ◽  
Masaaki Honda ◽  
Katsuhiko Shirai

Author(s):  
Sören Schulze ◽  
Emily J. King

We propose an algorithm for the blind separation of single-channel audio signals. It is based on a parametric model that describes the spectral properties of the sounds of musical instruments independently of pitch. We develop a novel sparse pursuit algorithm that can match the discrete frequency spectra from the recorded signal with the continuous spectra delivered by the model. We first use this algorithm to convert an STFT spectrogram from the recording into a novel form of log-frequency spectrogram whose resolution exceeds that of the mel spectrogram. We then make use of the pitch-invariant properties of that representation in order to identify the sounds of the instruments via the same sparse pursuit method. As the model parameters that characterize the musical instruments are not known beforehand, we train a dictionary that contains them, using a modified version of Adam. Applying the algorithm to various audio samples, we find that it is capable of producing high-quality separation results when the model assumptions are satisfied and the instruments are clearly distinguishable, but combinations of instruments with similar spectral characteristics pose a conceptual difficulty. While a key feature of the model is that it explicitly accounts for inharmonicity, its presence can nonetheless impede the performance of the sparse pursuit algorithm. In general, due to its pitch invariance, our method is especially suitable for dealing with spectra from acoustic instruments, requiring only a minimal number of hyperparameters to be preset. Additionally, we demonstrate that a dictionary constructed for one recording can be applied to a different recording with similar instruments without additional training.
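As a rough, hypothetical sketch of the pitch-invariant matching idea (not the authors' actual implementation), the Python snippet below runs a greedy sparse pursuit over a log-frequency magnitude spectrum: each dictionary atom is a fixed harmonic envelope, a change in pitch corresponds to a shift along the log-frequency axis, and the best-scoring shifted atom is subtracted from the residual at each step. The atom shapes, the least-squares amplitude estimate, and the stopping rule are illustrative assumptions.

import numpy as np

def greedy_shift_invariant_pursuit(log_spec, atoms, n_iter=10):
    """Greedy sparse pursuit on a log-frequency magnitude spectrum (illustrative only).

    log_spec : 1-D magnitude spectrum on a log-frequency axis.
    atoms    : dict mapping instrument name -> 1-D template of relative harmonic
               amplitudes; a pitch change corresponds to a shift along the log axis.
    Returns a list of (atom_name, shift, amplitude) triples.
    """
    residual = log_spec.astype(float).copy()
    selection = []
    for _ in range(n_iter):
        best = None
        for name, atom in atoms.items():
            # Cross-correlate residual and atom: every shift is a candidate pitch.
            corr = np.correlate(residual, atom, mode="valid")
            shift = int(np.argmax(corr))
            score = corr[shift]
            if best is None or score > best[0]:
                best = (score, name, shift, atom)
        score, name, shift, atom = best
        amp = score / np.dot(atom, atom)  # least-squares amplitude for this placement
        if amp <= 0:
            break                         # nothing useful left to explain
        residual[shift:shift + len(atom)] -= amp * atom
        selection.append((name, shift, amp))
    return selection

In the actual method, the continuous-frequency model spectra are matched against the discrete STFT bins and the dictionary itself is learned with a modified Adam optimizer; the sketch above does not attempt to reproduce either step.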


1992 ◽  
Vol 36 (3) ◽  
pp. 263-267
Author(s):  
Jeffrey M. Gerth

Previous research suggests that the temporal pattern of dissimilar sounds may be a basis for confusion. To extend this research, the present study used complex sounds formed by simultaneously playing components drawn from four sound categories. Four temporal patterns, determined by sound duration and duty cycle, were also used, producing a total of 16 basic components. The density (i.e., the number of components played simultaneously) ranged from one to four. Subjects heard a sequence of two complex sounds and judged whether they were the same or different. For trials in which the sounds differed, there were three possible manipulations: the addition of a component, the deletion of a component, and the substitution of one component for another. Overall accuracy was 94 percent across the 144 dissimilar sound complexes. As density increased, a significantly greater number of errors occurred for all classes of manipulations. Changes in individual temporal patterns were accurately discriminated across a variety of manipulations involving adding, deleting, and substituting components. Subjects were least accurate in detecting substitutions of a pattern. In error-prone sequences, a single sound category was identified that was most often involved as the changing component from the first to the second sound presentation. Suggestions for the design of easily discriminated sounds are discussed.
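To make the stimulus combinatorics concrete, here is a loose, hypothetical Python sketch (not taken from the original study) of how the 16 basic components (4 sound categories x 4 temporal patterns) and the three trial manipulations could be enumerated; the category names, pattern values, and sampling scheme are illustrative assumptions.

import itertools
import random

categories = ["category_A", "category_B", "category_C", "category_D"]      # 4 sound categories (names assumed)
patterns = [(dur, duty) for dur in ("short", "long") for duty in (0.25, 0.75)]  # 4 duration/duty-cycle patterns (values assumed)
components = list(itertools.product(categories, patterns))                 # 4 x 4 = 16 basic components

def make_trial(density, manipulation):
    """Build one same/different trial at a given density (1-4 simultaneous components)."""
    first = random.sample(components, density)
    second = list(first)
    if manipulation == "add":
        second.append(random.choice([c for c in components if c not in first]))
    elif manipulation == "delete":      # presumably only meaningful for density >= 2
        second.pop(random.randrange(len(second)))
    elif manipulation == "substitute":
        idx = random.randrange(len(second))
        second[idx] = random.choice([c for c in components if c not in first])
    return first, second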


Signals ◽  
2021 ◽  
Vol 2 (4) ◽  
pp. 637-661
Author(s):  
Sören Schulze ◽  
Johannes Leuschner ◽  
Emily J. King

We propose a method for the blind separation of sounds of musical instruments in audio signals. We describe the individual tones via a parametric model, training a dictionary to capture the relative amplitudes of the harmonics. The model parameters are predicted via a U-Net, which is a type of deep neural network. The network is trained without ground truth information, based on the difference between the model prediction and the individual time frames of the short-time Fourier transform. Since some of the model parameters do not yield a useful backpropagation gradient, we model them stochastically and employ the policy gradient instead. To provide phase information and account for inaccuracies in the dictionary-based representation, we also let the network output a direct prediction, which we then use to resynthesize the audio signals for the individual instruments. Due to the flexibility of the neural network, inharmonicity can be incorporated seamlessly and no preprocessing of the input spectra is required. Our algorithm yields high-quality separation results with particularly low interference on a variety of different audio samples, both acoustic and synthetic, provided that the sample contains enough data for the training and that the spectral characteristics of the musical instruments are sufficiently stable to be approximated by the dictionary.
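As a hedged illustration of the policy-gradient idea mentioned above (assuming PyTorch; the function names, the categorical parameterization, and the loss are hypothetical, not the paper's training code), a single training step might combine an ordinary reconstruction gradient for the continuous parameters with a REINFORCE-style surrogate for a sampled, non-differentiable parameter:

import torch

def policy_gradient_step(net, spectrum_frame, synthesize, optimizer):
    """One hypothetical training step mixing direct and policy gradients.

    net        : network returning (continuous_params, logits) for one STFT frame
    synthesize : renders a model spectrum from the predicted/sampled parameters
    """
    continuous, logits = net(spectrum_frame)
    dist = torch.distributions.Categorical(logits=logits)
    discrete = dist.sample()                       # non-differentiable choice
    model_spec = synthesize(continuous, discrete)
    recon_loss = torch.mean((model_spec - spectrum_frame) ** 2)
    # Ordinary gradient through the continuous parameters, plus a
    # score-function (REINFORCE) surrogate for the sampled parameter.
    loss = recon_loss + (dist.log_prob(discrete) * recon_loss.detach()).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return recon_loss.item()

Scaling the log-probability term by the detached reconstruction error is the standard score-function estimator; the paper's actual parameterization and any baseline or variance-reduction details may differ.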


Author(s):  
L. S. Chumbley ◽  
M. Meyer ◽  
K. Fredrickson ◽  
F.C. Laabs

The development of a scanning electron microscope (SEM) suitable for instructional purposes has created a large number of outreach opportunities for the Materials Science and Engineering (MSE) Department at Iowa State University. Several collaborative efforts are presently underway with local schools and the Department of Curriculum and Instruction (C&I) at ISU to bring SEM technology into the classroom in a near real-time, interactive manner. The SEM laboratory is shown in Figure 1. Interactions between the laboratory and the classroom use inexpensive digital cameras and shareware called CU-SeeMe (Figure 2). Developed by Cornell University and available over the internet, CU-SeeMe provides inexpensive video conferencing capabilities. The software allows video and audio signals from Quikcam™ cameras to be sent and received between computers. A reflector site has been established in the MSE department that allows eight different computers to be interconnected simultaneously. This arrangement allows us to demonstrate SEM principles in the classroom. An Apple Macintosh has been configured to allow the SEM image to be seen using CU-SeeMe.


1978 ◽  
Vol 23 (11) ◽  
pp. 856-857
Author(s):  
W. LAWRENCE GULICK

2013 ◽  
Author(s):  
J. Navarro ◽  
L. Ceja ◽  
J. Poppelbaum ◽  
D. Gomes

2019 ◽  
Vol 38 (2) ◽  
pp. 239-254
Author(s):  
M.B. SINGH ◽  
NITIN KUMAR MISHRA

2019 ◽  
Vol 620 ◽  
pp. 201-214
Author(s):  
K Thomisch ◽  
O Boebel ◽  
J Bachmann ◽  
D Filun ◽  
S Neumann ◽  
...  
