Incorporation of phase information for improved time-dependent instrument recognition

2020 ◽  
Vol 87 (s1) ◽  
pp. s62-s67
Author(s):  
Markus Schwabe ◽  
Omar Elaiashy ◽  
Fernando Puente León

Abstract Time-dependent estimation of playing instruments in music recordings is an important preprocessing step for several music signal processing algorithms. In this approach, instrument recognition is realized by neural networks with a two-dimensional input of short-time Fourier transform (STFT) magnitudes and a time-frequency representation based on phase information. The modified group delay (MODGD) function and the product spectrum (PS), which is based on MODGD, are analysed as phase representations. Training and evaluation are carried out on the MusicNet dataset. By incorporating the PS in the input, instrument recognition can be improved by about 2% in F1-score.
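To make the pairing of STFT magnitudes with a phase-based representation concrete, here is a minimal sketch. It assumes one common textbook definition of the product spectrum (power spectrum times group delay, computed as Re(X)Re(Y) + Im(X)Im(Y), where Y is the transform of n·x[n]); the abstract does not state the exact formula the authors use. The function name, frame length, hop size and Hann window are illustrative choices, not taken from the paper.

```python
import numpy as np

def product_spectrum(x, frame_len=256, hop=128):
    """Return STFT magnitudes and a product spectrum, both (n_bins, n_frames).

    Assumption: PS = Re(X)Re(Y) + Im(X)Im(Y), with Y the DFT of n * x[n],
    i.e. the power spectrum multiplied by the group delay.
    """
    x = np.asarray(x, dtype=float)
    window = np.hanning(frame_len)           # hypothetical window choice
    n = np.arange(frame_len)                 # sample index inside a frame
    n_frames = 1 + (len(x) - frame_len) // hop
    mag = np.empty((frame_len // 2 + 1, n_frames))
    ps = np.empty_like(mag)
    for t in range(n_frames):
        frame = x[t * hop : t * hop + frame_len] * window
        X = np.fft.rfft(frame)               # spectrum of the windowed frame
        Y = np.fft.rfft(n * frame)           # spectrum of the time-weighted frame
        mag[:, t] = np.abs(X)
        ps[:, t] = X.real * Y.real + X.imag * Y.imag
    return mag, ps

# A single impulse at sample 100 has a constant group delay of 100 samples.
x = np.zeros(256)
x[100] = 1.0
mag, ps = product_spectrum(x)
two_channel_input = np.stack([mag, ps])      # magnitude plus phase-based channel
```

Stacking the two matrices along a new axis is only one possible way to feed both representations to a network; the input layout in the paper may differ.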

2021 ◽  
Vol 88 (5) ◽  
pp. 274-281
Author(s):  
Markus Schwabe ◽  
Michael Heizmann

Abstract An important preprocessing step for several music signal processing algorithms is the estimation of playing instruments in music recordings. To this end, this approach realizes time-dependent instrument recognition with a neural network with residual blocks. Since music signal processing tasks use diverse time-frequency representations as input matrices, the influence of different input representations on instrument recognition is analyzed in this work. Three-dimensional inputs of short-time Fourier transform (STFT) magnitudes and an additional time-frequency representation based on phase information are investigated, as well as two-dimensional STFT or constant-Q transform (CQT) magnitudes. As additional phase representations, the product spectrum (PS), based on the modified group delay, and the frequency error (FE) matrix, related to the instantaneous frequency, are used. Training and evaluation are carried out on the MusicNet dataset, which enables the estimation of seven instruments. With a higher number of frequency bins in the input representations, instrument recognition can be improved by about 2 % in F1-score. Compared to the literature, frame-level instrument recognition can be improved for different input representations.


2021 ◽  
Vol 11 (6) ◽  
pp. 2582
Author(s):  
Lucas M. Martinho ◽  
Alan C. Kubrusly ◽  
Nicolás Pérez ◽  
Jean Pierre von der Weid

The focused signal obtained by the time-reversal or the cross-correlation techniques of ultrasonic guided waves in plates changes when the medium is subject to strain, which can be used to monitor the medium strain level. In this paper, the sensitivity to strain of cross-correlated signals is enhanced by a post-processing filtering procedure that aims to preserve only strain-sensitive spectrum components. Two different strategies were adopted, based on the phase of either the Fourier transform or the short-time Fourier transform. Both use prior knowledge of the system impulse response at some strain level. The technique was evaluated in an aluminum plate, effectively providing up to twice the sensitivity to strain. The sensitivity increase depends on a phase threshold parameter used in the filtering process. Its performance was assessed based on the sensitivity gain, the loss of energy concentration capability, and the value of the foreknown strain. Signals synthesized with the time–frequency representation, through the short-time Fourier transform, provided a better tradeoff between sensitivity gain and loss of energy concentration.


2015 ◽  
Vol 12 (03) ◽  
pp. 1550021 ◽  
Author(s):  
M. A. Al-Manie ◽  
W. J. Wang

Due to the advantages offered by the S-transform (ST) distribution, it has recently been implemented successfully for various applications such as seismic and image processing. The desirable properties of the ST include a globally referenced phase, as is the case with the short time Fourier transform (STFT), while offering a higher spectral resolution, like the wavelet transform (WT). However, this estimator suffers from some inherent disadvantages, such as poor energy concentration at higher frequencies. In order to improve the performance of the distribution, a modification to the existing technique is proposed. Additional parameters are introduced to control the window's width, which can greatly enhance the signal representation in the time–frequency plane. The new estimator's performance is evaluated using synthetic signals as well as biomedical data. The required features of the ST, which include invertibility and phase information, are still preserved.


2020 ◽  
Vol 65 (4) ◽  
pp. 379-391 ◽  
Author(s):  
Hasan Polat ◽  
Mehmet Ufuk Aluçlu ◽  
Mehmet Siraç Özerdem

Abstract The general uncertainty of epilepsy and its unpredictable seizures often badly affect the quality of life of people with this disease. Some patients can be considered fortunate in terms of seizure prediction: these are patients with epileptic auras. In this study, the aim was to evaluate pre-seizure warning symptoms in electroencephalography (EEG) signals with a convolutional neural network (CNN), inspired by the epileptic auras defined in the medical field. In this context, one-dimensional EEG signals were transformed into a spectrogram display form in the frequency-time domain by applying a short-time Fourier transform (STFT). Systemic changes before an epileptic seizure were described by applying the CNN approach to the EEG signals represented in image form, and an attempt was made to determine the subjective EEG-Aura process for each patient. Considering all patients included in the evaluation, it was determined that the 1-min interval covering the time from the second minute to the third minute before the seizure had the highest mean and the lowest variance for determining the systematic changes before the seizure. Thus, the highest performing process is described as EEG-Aura. The average success for the EEG-Aura process was 90.38 ± 6.28%, 89.78 ± 8.34% and 90.47 ± 5.95% for accuracy, specificity and sensitivity, respectively. Through the proposed model, epilepsy patients who do not respond to medical treatment methods are expected to maintain their lives in a more comfortable and integrated way.


1997 ◽  
Vol 2 (3) ◽  
pp. 193-205 ◽  
Author(s):  
PAUL MASRI ◽  
ANDREW BATEMAN ◽  
NISHAN CANAGARAJAH

Analysis–resynthesis (A–R) systems gain their flexibility for creative transformation of sound by representing sound as a set of musically useful features. The analysis process extracts these features from the time domain signal by means of a time–frequency representation (TFR). The TFR provides an intermediate representation of sound that must make the features accessible and measurable to the rest of the analysis. Until very recently, the short-time Fourier transform (STFT) has been the obvious choice for time–frequency representation, despite its limitations in terms of resolution. Recent and ongoing developments are providing several alternative schemes that allow for a more considered choice of TFR. This paper reviews these contemporary approaches in comparison with the more classical ones and with reference to their applicability, merits and shortcomings for application to sound analysis. (Where they have been successfully applied, details are provided.) The techniques reviewed include linear, bilinear and higher-order spectra, nonparametric and parametric methods and some sound-model-specific TFRs.


2018 ◽  
Vol 173 ◽  
pp. 03054
Author(s):  
Xueqin Zhang ◽  
Ruolun Liu

The Chirplet Transform (CT) is effective in characterizing the instantaneous frequency (IF) of mono-component linear-frequency-modulated signals. However, during the initialization process, fitting the line to the peaks of the time-frequency map of the short-time Fourier transform is greatly affected by noise. For multi-component signals, it is even more difficult to distinguish and fit the different IF lines. Since the Hough transform is a common algorithm for line detection, the ridge edge fitting is replaced by the Hough transform in this paper. The experimental results show significant improvement in the obtained time-frequency representation.


2017 ◽  
Vol 2017 ◽  
pp. 1-14 ◽  
Author(s):  
Junbo Long ◽  
Haibin Wang ◽  
Daifeng Zha ◽  
Hongshe Fan ◽  
Zefeng Lao ◽  
...  

The short time Fourier transform time-frequency representation (STFT-TFR) method degenerates, and the corresponding short time Fourier transform time-frequency filtering (STFT-TFF) method fails, in an α-stable distribution noise environment. A fractional low order short time Fourier transform (FLOSTFT), which takes advantage of the fractional p-order moment, is proposed for the α-stable distribution noise environment, and the corresponding FLOSTFT time-frequency representation (FLOSTFT-TFR) algorithm is presented in this paper. We study the vector formulation of the FLOSTFT and inverse FLOSTFT (IFLOSTFT) methods and propose a FLOSTFT time-frequency filtering (FLOSTFT-TFF) method, which takes advantage of the time-frequency localized spectra of the signal in the time-frequency domain. The simulation results show that, by employing the FLOSTFT-TFR method and the FLOSTFT-TFF method with an adaptive weight function, the time-frequency distribution of the signals can be obtained more accurately and the time-frequency localized region of the signal can be effectively extracted from α-stable distribution noise; the original signal can also be restored with the IFLOSTFT method. Their performance is better than that of the STFT-TFR and STFT-TFF methods, and the MSEs are smaller for different α and GSNR cases. Finally, we apply the FLOSTFT-TFR and FLOSTFT-TFF methods to extract fault features of the bearing outer race fault signal and restore the original fault signal from α-stable distribution noise; the experimental results illustrate their performance.
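The idea of a fractional low-order transform can be illustrated with a short sketch. It assumes a real-valued signal and the signed-power operation sign(x)·|x|^p applied before an ordinary STFT, which is one common form of fractional low-order processing; the exact FLOSTFT definition, the admissible range of p and the adaptive weight function are not given in the abstract, so the function names and the value p = 0.3 below are hypothetical.

```python
import numpy as np

def fractional_low_order(x, p=0.3):
    """Signed power sign(x) * |x|**p (assumed form; compresses large outliers)."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.abs(x) ** p

def flo_stft_magnitude(x, p=0.3, frame_len=128, hop=64):
    """STFT magnitude of the fractional low-order signal, shape (n_bins, n_frames)."""
    y = fractional_low_order(x, p)
    window = np.hanning(frame_len)           # hypothetical window choice
    n_frames = 1 + (len(y) - frame_len) // hop
    frames = np.stack([y[t * hop : t * hop + frame_len] * window
                       for t in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T

# A sinusoid hit by one large impulse, a crude stand-in for heavy-tailed noise.
n = np.arange(512)
signal = np.sin(2 * np.pi * 16 * n / 128)
signal[200] += 1000.0
compressed = fractional_low_order(signal, p=0.3)
tfr = flo_stft_magnitude(signal, p=0.3)
```

The impulse of amplitude about 1000 is reduced to below 10 by the signed power, so it no longer dominates every frame it touches, while the sign pattern of the sinusoid is preserved.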


Author(s):  
Prof. M. Senthil Vadivu ◽  
Saranya H ◽  
Vijay Kumar K S

The objective of the project is to improve maternal abdomen recording for better prediction of the foetal Electrocardiogram (FECG). One of the most difficult tasks in observing foetal well-being is obtaining a clean FECG using non-invasive abdominal recordings. The low quality of the foetal signal makes morphological examination of its wave structure in clinical follow-up difficult. The signal contains precise information that can help doctors to monitor foetal health during pregnancy and labor. The abdominal signal is normalized and separated in the pre-processing stage for wave shape analysis in clinical follow-up. The Kaiser window is used for spectral analysis and for segmenting the signal. The two-dimensional (2D) time-frequency representation is obtained by short-time Fourier transform (STFT). The STFT enhances the abdominal recordings of the maternal Electrocardiogram (MECG) for efficient separation of the FECG to monitor the foetus's well-being.
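The Kaiser-windowed STFT step described above can be sketched as follows. The frame length, hop size and the Kaiser shape parameter beta are assumptions chosen for illustration; the abstract does not report the values used, and the test signal is a synthetic sinusoid rather than an abdominal recording.

```python
import numpy as np

def kaiser_stft(x, frame_len=256, hop=64, beta=8.0):
    """Magnitude STFT with a Kaiser window, shape (n_bins, n_frames)."""
    x = np.asarray(x, dtype=float)
    window = np.kaiser(frame_len, beta)      # beta trades main-lobe width for side-lobe level
    n_frames = 1 + (len(x) - frame_len) // hop
    # Segment the signal into overlapping frames and apply the window to each.
    frames = np.stack([x[t * hop : t * hop + frame_len] * window
                       for t in range(n_frames)])
    # One FFT per frame; transpose so that rows are frequency bins.
    return np.abs(np.fft.rfft(frames, axis=1)).T

# Synthetic check: a sinusoid centred exactly on frequency bin 32.
n = np.arange(1024)
x = np.sin(2 * np.pi * 32 * n / 256)
spec = kaiser_stft(x)
peak_bins = spec.argmax(axis=0)              # strongest bin in each frame
```

A larger beta suppresses side lobes more strongly at the cost of a wider main lobe, which is the main design choice when picking a Kaiser window for spectral analysis.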


2020 ◽  
Vol 9 (1) ◽  
pp. 41-48
Author(s):  
Jans Hendry ◽  
Isnan Nur Rifai ◽  
Yoga Mileniandi

The short-time Fourier transform (STFT) is a popular time-frequency representation in many source separation problems. In this work, the sampled and discretized version of the Discrete Gabor Transform (DGT) is proposed to replace the STFT within the single-channel source separation problem of the Non-negative Matrix Factorization (NMF) framework. The results show that NMF-DGT is better than NMF-STFT according to Signal-to-Interference Ratio (SIR), Signal-to-Artifact Ratio (SAR), and Signal-to-Distortion Ratio (SDR). In the supervised scheme, NMF-DGT has a SIR of 18.60 dB compared to 16.24 dB in NMF-STFT, SAR of 13.77 dB to 13.69 dB, and SDR of 12.45 dB to 11.16 dB. In the unsupervised scheme, NMF-DGT has a SIR of 0.40 dB compared to 0.27 dB in NMF-STFT, SAR of -10.21 dB to -10.36 dB, and SDR of -15.01 dB to -15.23 dB.
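The NMF step that operates on a non-negative time-frequency matrix can be sketched as below. This uses the standard multiplicative updates for the Euclidean cost as a stand-in; the abstract does not say which cost function or update rule the authors use, and the function name, rank, iteration count and random test matrix are illustrative assumptions.

```python
import numpy as np

def nmf(V, rank, n_iter=200, seed=0, eps=1e-12):
    """Factor a non-negative matrix V (n_freq, n_time) as W @ H.

    Assumption: multiplicative updates for the Euclidean (Frobenius) cost.
    Returns W (n_freq, rank), H (rank, n_time) and the error after each iteration.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_time = V.shape
    W = rng.random((n_freq, rank)) + eps     # spectral templates
    H = rng.random((rank, n_time)) + eps     # time activations
    errors = []
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(V - W @ H))
    return W, H, errors

# Toy non-negative "spectrogram" built from two components.
rng = np.random.default_rng(1)
V = rng.random((20, 2)) @ rng.random((2, 30))
W, H, errors = nmf(V, rank=2)
component_0 = np.outer(W[:, 0], H[0])        # magnitude estimate of one source
```

In a separation pipeline each rank-one term would be turned into a mask or magnitude estimate and inverted with the phase of the mixture; whether the input matrix comes from an STFT or a DGT does not change this factorization step.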

