Influence of input data representations for time-dependent instrument recognition

2021 ◽  
Vol 88 (5) ◽  
pp. 274-281
Author(s):  
Markus Schwabe ◽  
Michael Heizmann

Abstract An important preprocessing step for several music signal processing algorithms is the estimation of playing instruments in music recordings. To this end, this approach realizes time-dependent instrument recognition with a neural network containing residual blocks. Since music signal processing tasks use diverse time-frequency representations as input matrices, this work analyzes the influence of different input representations on instrument recognition. Three-dimensional inputs, consisting of short-time Fourier transform (STFT) magnitudes and an additional time-frequency representation based on phase information, are investigated alongside two-dimensional STFT or constant-Q transform (CQT) magnitudes. As additional phase representations, the product spectrum (PS), based on the modified group delay, and the frequency error (FE) matrix, related to the instantaneous frequency, are used. Training and evaluation are carried out on the MusicNet dataset, which enables the estimation of seven instruments. Increasing the number of frequency bins in the input representations improves instrument recognition by about 2 % in F1-score. Compared to the literature, frame-level instrument recognition can be improved for different input representations.
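As a rough illustration of how the number of frequency bins in an STFT magnitude input depends on the transform length, the following minimal sketch computes a magnitude spectrogram with NumPy. The function name `stft_magnitude`, the Hann window, the hop size, and the test tone are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def stft_magnitude(signal, n_fft=1024, hop=256):
    """Return |STFT| with shape (n_fft // 2 + 1, n_frames)."""
    window = np.hanning(n_fft)  # assumed window; the abstract does not specify one
    n_frames = 1 + (len(signal) - n_fft) // hop
    # Slice the signal into overlapping, windowed frames (one frame per column).
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)],
        axis=1,
    )
    # One-sided FFT along the sample axis gives n_fft // 2 + 1 frequency bins.
    return np.abs(np.fft.rfft(frames, axis=0))

fs = 8000                                  # assumed sampling rate in Hz
t = np.arange(fs) / fs                     # one second of audio
tone = np.sin(2 * np.pi * 440.0 * t)       # hypothetical test signal

mag_small = stft_magnitude(tone, n_fft=512, hop=256)
mag_large = stft_magnitude(tone, n_fft=2048, hop=256)
print(mag_small.shape, mag_large.shape)
```

A longer transform yields more frequency bins per frame at the cost of fewer frames and coarser time resolution, which is the trade-off behind choosing an input representation.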

2020 ◽  
Vol 87 (s1) ◽  
pp. s62-s67
Author(s):  
Markus Schwabe ◽  
Omar Elaiashy ◽  
Fernando Puente León

Abstract Time-dependent estimation of playing instruments in music recordings is an important preprocessing step for several music signal processing algorithms. In this approach, instrument recognition is realized by neural networks with a two-dimensional input of short-time Fourier transform (STFT) magnitudes and a time-frequency representation based on phase information. The modified group delay (MODGD) function and the product spectrum (PS), which is based on MODGD, are analysed as phase representations. Training and evaluation are carried out on the MusicNet dataset. Incorporating PS in the input improves instrument recognition by about 2% in F1-score.
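The abstract does not give a formula for the product spectrum. One formulation found in the speech processing literature defines it as the power spectrum multiplied by the group delay, which reduces to X_R·Y_R + X_I·Y_I, where X is the spectrum of x(n) and Y is the spectrum of n·x(n). The sketch below follows that formulation as an assumption; the function name `product_spectrum` is hypothetical, and the paper's MODGD-based variant may differ.

```python
import numpy as np

def product_spectrum(frame, n_fft=None):
    """Power spectrum times group delay for one frame (assumed formulation)."""
    n = np.arange(len(frame))
    X = np.fft.rfft(frame, n=n_fft)        # spectrum of x(n)
    Y = np.fft.rfft(n * frame, n=n_fft)    # spectrum of n * x(n)
    # |X|^2 * tau(w) with tau(w) = (X_R*Y_R + X_I*Y_I) / |X|^2
    return X.real * Y.real + X.imag * Y.imag

# Sanity check: a unit impulse delayed by 5 samples has flat power
# spectrum 1 and constant group delay 5, so the product is 5 everywhere.
frame = np.zeros(64)
frame[5] = 1.0
ps = product_spectrum(frame)
```

Because the division by |X|² cancels, this form avoids the spikes that the plain group delay shows near spectral zeros.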


Energies ◽  
2021 ◽  
Vol 14 (13) ◽  
pp. 3725
Author(s):  
Paweł Zimroz ◽  
Paweł Trybała ◽  
Adam Wróblewski ◽  
Mateusz Góralczyk ◽  
Jarosław Szrek ◽  
...  

The possibility of applying an unmanned aerial vehicle (UAV) in search and rescue activities in a deep underground mine has been investigated. In the presented case study, a UAV is searching for a lost or injured human who is able to call for help but is not able to move or use any communication device. The UAV captures acoustic data while flying through underground corridors. The acoustic signal is very noisy, since the UAV itself emits high-energy noise during flight. The main goal of the paper is to present an automatic signal processing procedure for detecting a specific sound (assumed to contain voice activity) in the presence of heavy, time-varying noise from the UAV. The proposed acoustic signal processing technique is based on a time-frequency representation and a Euclidean distance measurement between a reference spectrum (UAV noise only) and the captured data. As both the UAV and the “injured” person were equipped with synchronized microphones during the experiment, validation has been performed. Two experiments carried out in lab conditions, as well as one in an underground mine, provided very satisfactory results.
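The core idea of comparing each spectrogram frame against a noise-only reference spectrum by Euclidean distance can be sketched as follows. The synthetic data, the function name `spectral_distance`, and the threshold value are all assumptions chosen for illustration; the paper's actual preprocessing and decision rule are not described in the abstract.

```python
import numpy as np

def spectral_distance(spectrogram, reference):
    """Euclidean distance between every frame (column) and the reference."""
    return np.linalg.norm(spectrogram - reference[:, None], axis=0)

rng = np.random.default_rng(0)
# Hypothetical reference spectrum standing in for "UAV noise only".
reference = np.abs(rng.normal(1.0, 0.1, size=64))

# Ten frames of pure reference noise, with extra energy injected into
# frame 6 to mimic a voice-like event in a few frequency bins.
spectrogram = np.tile(reference[:, None], (1, 10))
spectrogram[20:24, 6] += 5.0

distances = spectral_distance(spectrogram, reference)
threshold = 1.0                                   # assumed detection threshold
detected = np.flatnonzero(distances > threshold)  # frame indices flagged
print(detected)
```

In practice the reference would have to be re-estimated as the UAV noise changes over time, which the abstract highlights as the main difficulty.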


Sensors ◽  
2020 ◽  
Vol 20 (19) ◽  
pp. 5466 ◽  
Author(s):  
Xinrui Jiang ◽  
Ye Zhang ◽  
Qi Yang ◽  
Bin Deng ◽  
Hongqiang Wang

At present, there are two obvious problems in radar-based gait recognition. First, the traditional radar frequency band struggles to meet the requirements of fine identification because of its low carrier frequency and limited micro-Doppler resolution. Second, radar signal processing is relatively complex, and existing signal processing algorithms perform poorly in terms of real-time usability, robustness and universality. This paper focuses on these two basic problems of human gait detection with radar and proposes a human gait classification and recognition method based on millimeter-wave array radar. Based on deep-learning technology, a multi-channel three-dimensional convolutional neural network is proposed as an improvement of the residual network; it completes the classification and recognition of human gait through the hierarchical extraction and fusion of multi-dimensional features. Taking the three-dimensional coordinates, motion speed and intensity of strong scattering points during target motion as network inputs, multi-channel convolution is used to extract motion features, and the classification and recognition of typical daily actions are completed. The experimental results show a recognition accuracy above 92.5% for common gait categories such as jogging and normal walking.


2006 ◽  
Vol 321-323 ◽  
pp. 1237-1240
Author(s):  
Sang Kwon Lee ◽  
Jung Soo Lee

Impulsive vibration signals in a gearbox are often associated with faults that give rise to irregular impacting. These impulsive vibration signals can therefore be used as indicators of machinery faults. However, it is often difficult to measure impulsive signals objectively because of background noise. To ease the measurement of impulsive signals embedded in background noise, we enhance the impulsive signals using adaptive signal processing and then analyze them in the time and frequency domains using a time-frequency representation. This technique is applied to the diagnosis of faults within a laboratory gearbox.
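The abstract does not say which adaptive algorithm is used. As one possible illustration, the sketch below uses a normalized LMS (NLMS) linear predictor: the predictable, periodic part of the signal is tracked by the filter, so the prediction error retains the unpredictable impulsive component. The function name `nlms_line_enhancer`, all parameter values, and the synthetic signal are assumptions for this example.

```python
import numpy as np

def nlms_line_enhancer(x, order=16, delay=1, mu=0.5, eps=1e-8):
    """Predict x[n] from delayed past samples; return (prediction, error)."""
    w = np.zeros(order)
    predicted = np.zeros(len(x))
    error = np.zeros(len(x))
    for n in range(order + delay - 1, len(x)):
        # Most recent `order` samples ending at x[n - delay], newest first.
        u = x[n - delay - order + 1 : n - delay + 1][::-1]
        predicted[n] = w @ u
        error[n] = x[n] - predicted[n]
        # Normalized LMS weight update.
        w += mu * error[n] * u / (u @ u + eps)
    return predicted, error

n = np.arange(4000)
x = np.sin(2 * np.pi * 0.05 * n)   # hypothetical periodic background component
x[3000] += 5.0                     # hypothetical impact at sample 3000

predicted, error = nlms_line_enhancer(x)
```

After convergence the error signal is close to zero for the periodic component and spikes at the impact, which makes the impulse easier to localize in a subsequent time-frequency analysis.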


Author(s):  
Sang-Kwon Lee ◽  
Paul R. White

Abstract Impulsive sound and vibration signals in rotating machinery are often associated with faults that give rise to irregular impacting. These impulsive sound and vibration signals can therefore be used as indicators of machinery faults. However, it is often difficult to measure impulsive signals objectively because of background noise. To ease the measurement of impulsive sounds embedded in background noise, we enhance the impulsive signals using adaptive signal processing and then analyze them in the time and frequency domains using a time-frequency representation. This technique is applied to the diagnosis of faults within an internal combustion engine and an industrial gear.


2020 ◽  
Author(s):  
Chen Ming ◽  
Stephanie Haro ◽  
Andrea Megela Simmons ◽  
James A. Simmons

Abstract Computational models of animal biosonar seek to identify critical aspects of echo processing responsible for the superior, real-time performance of echolocating bats and dolphins in target tracking and clutter rejection. The Spectrogram Correlation and Transformation (SCAT) model replicates aspects of biosonar imaging in both species by processing wideband biosonar sounds and echoes with auditory mechanisms identified from experiments with bats. The model acquires broadband biosonar broadcasts and echoes, represents them as time-frequency spectrograms using parallel bandpass filters, translates the filtered signals into ten parallel amplitude threshold levels, and then operates on the resulting time-of-occurrence values at each frequency to estimate overall echo range delay. It uses the structure of the echo spectrum by depicting it as a series of local frequency nulls arranged regularly along the frequency axis of the spectrograms after dechirping them relative to the broadcast. Computations take place entirely on the timing of threshold-crossing events for each echo relative to threshold-events for the broadcast. Threshold-crossing times take into account amplitude-latency trading, a physiological feature absent from conventional digital signal processing. Amplitude-latency trading transposes the profile of amplitudes across frequencies into a profile of time-registrations across frequencies. Target shape is extracted from the spacing of the object’s individual acoustic reflecting points, or glints, using the mutual interference pattern of peaks and nulls in the echo spectrum. These are merged with the overall range-delay estimate to produce a delay-based reconstruction of the object’s distance as well as its glints.
Clutter echoes indiscriminately activate multiple parts in the null-detecting system, which then produces the equivalent glint-delay spacings in images, thus blurring the overall echo-delay estimates by adding spurious glint delays to the image. Blurring acts as an anticorrelation process that rejects clutter intrusion into perceptions.

Author summary
Bats and dolphins use their biological sonar as a versatile, high-resolution perceptual system that performs at levels desirable in man-made sonar or radar systems. To capture the superior real-time capabilities of biosonar so they can be imported into the design of new man-made systems, we developed a computer model of the sonar receiver used by echolocating bats and dolphins. Our intention was to discover the processing methods responsible for the animals’ ability to find and identify targets, guide locomotion, and prevent classic types of sonar or radar interference that hamper performance of man-made systems in complex, rapidly-changing surroundings. We have identified several features of the ears, hearing, time-frequency representation, and auditory processing that are critical for organizing echo-processing methods and display manifested in the animals’ perceptions.
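The link between glint spacing and spectral nulls rests on a basic interference relation: two reflections separated by a delay Δt produce nulls in the echo spectrum spaced 1/Δt apart in frequency. The sketch below demonstrates only this relation with a plain FFT. It is not an implementation of the SCAT model, which works on threshold-crossing times from a bandpass filter bank; the sampling rate, glint delay, and variable names are assumptions for the example.

```python
import numpy as np

fs = 500_000            # assumed sampling rate in Hz
n_fft = 5000            # 100 Hz frequency resolution
glint_delay = 100e-6    # assumed delay between two glints in seconds

# Idealized two-glint reflector: two unit reflections glint_delay apart.
h = np.zeros(n_fft)
h[0] = 1.0
h[int(round(glint_delay * fs))] = 1.0

mag = np.abs(np.fft.rfft(h))
freqs = np.fft.rfftfreq(n_fft, d=1 / fs)

# Nulls are strict local minima of the magnitude spectrum.
interior = mag[1:-1]
null_idx = np.flatnonzero((interior < mag[:-2]) & (interior < mag[2:])) + 1

null_spacing = np.mean(np.diff(freqs[null_idx]))   # Hz between adjacent nulls
estimated_delay = 1.0 / null_spacing               # recovered glint delay in s
print(null_spacing, estimated_delay)
```

Reading the null spacing off the spectrum and inverting it recovers the glint delay, which is the quantity the abstract describes as being merged with the overall range-delay estimate.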

