Monaural Singing Voice and Accompaniment Separation Based on Gated Nested U-Net Architecture

This paper proposes a separation model based on the gated nested U-Net (GNU-Net) architecture, which is essentially a deeply supervised, symmetric encoder–decoder network that can generate full-resolution feature maps. Through a series of nested skip pathways, it reduces the semantic gap between the feature maps of the encoder and decoder subnetworks. In the GNU-Net architecture, gated linear units (GLUs) are used in place of conventional convolutional layers only in the backbone, not in the nested part. The outputs of GNU-Net are fed into a time-frequency (T-F) mask layer that generates two masks, one for the singing voice and one for the accompaniment. These two estimated masks, together with the magnitude and phase spectra of the mixture, are then transformed into time-domain signals. We explored two types of T-F mask layer, a discriminative training network and a difference mask layer; the experimental results show the latter to be better. We evaluated the proposed model against three other models and against ideal T-F masks. The results demonstrate that the proposed model outperforms the compared models and that its performance comes close to that of the ideal ratio mask (IRM). More importantly, the proposed model outputs the separated singing voice and accompaniment simultaneously, whereas each of the three compared models can separate only one source per trained model.
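As a rough illustration of the masking step described above, the following Python sketch turns two non-negative network outputs into soft masks that sum to one and combines each masked magnitude with the mixture phase. The function names, the ratio-style mask, and the toy arrays are assumptions made for illustration; the abstract does not say how the difference mask layer is defined, and the inverse STFT back to the time domain is omitted.

```python
import numpy as np

def ratio_masks(est_voice, est_accomp, eps=1e-8):
    """Turn two non-negative magnitude estimates into soft masks that sum to 1."""
    total = est_voice + est_accomp + eps
    voice_mask = est_voice / total
    return voice_mask, 1.0 - voice_mask

def apply_mask(mask, mix_mag, mix_phase):
    """Combine a masked mixture magnitude with the mixture phase."""
    return mask * mix_mag * np.exp(1j * mix_phase)

# Toy stand-ins for a mixture spectrogram and two network outputs (hypothetical data).
rng = np.random.default_rng(0)
mix_mag = rng.random((5, 4))
mix_phase = rng.uniform(-np.pi, np.pi, size=(5, 4))
est_voice = rng.random((5, 4))
est_accomp = rng.random((5, 4))

voice_mask, accomp_mask = ratio_masks(est_voice, est_accomp)
voice_spec = apply_mask(voice_mask, mix_mag, mix_phase)
accomp_spec = apply_mask(accomp_mask, mix_mag, mix_phase)
```

Because the two masks sum to one, the two complex spectrograms add back up to the mixture, which is one reason a single model can emit both sources at once.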

Multi-channel spectrograms for speech processing applications using deep learning methods

Pattern Analysis and Applications ◽

10.1007/s10044-020-00921-5 ◽

2020 ◽

Author(s):

T. Arias-Vergara ◽

P. Klumpp ◽

J. C. Vasquez-Correa ◽

E. Nöth ◽

J. R. Orozco-Arroyave ◽

...

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Speech Processing ◽

Single Channel ◽

Audio Signal ◽

Frequency Component ◽

Feature Maps ◽

Time Frequency ◽

Convolutional Networks ◽

Channel Input

Time–frequency representations of speech signals provide dynamic information about how the frequency components change with time. To process this information, deep learning models with convolution layers can be used to obtain feature maps. In many speech processing applications, the time–frequency representations are obtained by applying the short-time Fourier transform, and single-channel input tensors are used to feed the models. However, this may limit the potential of convolutional networks to learn different representations of the audio signal. In this paper, we propose a methodology that combines three different time–frequency representations of the signals by computing the continuous wavelet transform, Mel-spectrograms, and Gammatone spectrograms and combining them into 3D-channel spectrograms to analyze speech in two different applications: (1) automatic detection of speech deficits in cochlear implant users and (2) phoneme class recognition to extract phone-attribute features. For this, two different deep learning-based models are considered: convolutional neural networks and recurrent neural networks with convolution layers.
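The channel-stacking idea can be sketched as follows. This is a minimal illustration, assuming the three representations have already been computed as 2D arrays; the random arrays stand in for them, and the nearest-neighbour resizing, min-max scaling, and 64 x 64 target size are hypothetical choices rather than the authors' stated procedure.

```python
import numpy as np

def resize_nearest(tf, shape):
    """Resample a 2D array to `shape` by nearest-neighbour index selection."""
    rows = np.linspace(0, tf.shape[0] - 1, shape[0]).round().astype(int)
    cols = np.linspace(0, tf.shape[1] - 1, shape[1]).round().astype(int)
    return tf[np.ix_(rows, cols)]

def minmax(tf):
    """Scale a 2D array to the range [0, 1]; a constant array maps to zeros."""
    lo, hi = tf.min(), tf.max()
    if hi <= lo:
        return np.zeros_like(tf, dtype=float)
    return (tf - lo) / (hi - lo)

def stack_channels(representations, shape=(64, 64)):
    """Stack several time-frequency maps into one (freq, time, channel) tensor."""
    return np.stack([minmax(resize_nearest(r, shape)) for r in representations], axis=-1)

# Random stand-ins for a scalogram, a Mel-spectrogram, and a Gammatone spectrogram.
rng = np.random.default_rng(0)
cwt_like = rng.random((40, 100))
mel_like = rng.random((64, 80))
gammatone_like = rng.random((128, 120))

stacked = stack_channels([cwt_like, mel_like, gammatone_like])
```

The result has the same layout as an RGB image, so it can be fed to a standard 2D convolution layer with three input channels.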

Convolutional Neural Networks based OSA Event Prediction from ECG Scalograms and Spectrograms

10.21203/rs.3.rs-381294/v1 ◽

2021 ◽

Author(s):

Huseyin Nasifoglu ◽

Osman Erogul

Keyword(s):

Time Domain ◽

Continuous Wavelet ◽

Short Time Fourier Transform ◽

Time Frequency ◽

Ecg Signals ◽

Event Prediction ◽

Proposed Model ◽

Obstructive Sleep ◽

Short Time ◽

Frequency Features

In this study, we conducted a comparative analysis of deep convolutional neural network (CNN) models for predicting Obstructive Sleep Apnea (OSA) from electrocardiograms. Unlike other studies in the literature, this study extracts time-frequency features automatically with CNNs instead of relying on manual feature extraction from ECG recordings. For this purpose, the proposed model generates scalogram and spectrogram representations by transforming preprocessed 30-second ECG segments from the time domain to the frequency domain using the Continuous Wavelet Transform (CWT) and the Short-Time Fourier Transform (STFT), respectively. We examined the AlexNet, GoogleNet and ResNet18 models for predicting OSA events. The effect of transfer learning on success is also investigated. Based on the observed results, we proposed a new model that was found to be more effective in estimation. In total, 152 ECG recordings were included in the study for training and evaluation of the models. Prediction using scalograms of the 30 seconds immediately before potential OSA onsets gave the best performance, with 82.30% accuracy, 83.22% sensitivity, 82.27% specificity and 82.95% positive predictive value. Prediction using spectrograms provided up to 80.13% accuracy and 81.99% sensitivity. The results show that the proposed CNN model can be used as a good indicator to accurately predict OSA events from ECG signals.
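To make the spectrogram step concrete, here is a minimal NumPy sketch of an STFT magnitude computed over one 30-second segment. The sampling rate, frame length, hop size, Hann window, and the 5 Hz test tone are all hypothetical choices for illustration; the abstract does not give the authors' settings, and a real ECG segment would replace the synthetic tone.

```python
import numpy as np

def stft_magnitude(x, frame_len=256, hop=128):
    """Magnitude spectrogram of a 1D signal, shaped (freq_bins, frames)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)).T

fs = 100                                   # Hz, hypothetical sampling rate
t = np.arange(30 * fs) / fs                # one 30-second segment
segment = np.sin(2 * np.pi * 5.0 * t)      # 5 Hz tone as a stand-in for ECG

spec = stft_magnitude(segment)
freqs = np.fft.rfftfreq(256, d=1 / fs)     # bin centre frequencies in Hz
peak_hz = freqs[spec.mean(axis=1).argmax()]
```

The resulting 2D array is what would be rendered as an image and passed to a CNN; `peak_hz` simply confirms that the energy sits near the tone frequency.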

Separating Multiple Moving Sources by Microphone Array Signals for Wayside Acoustic Fault Diagnosis

Journal of Vibration and Acoustics ◽

10.1115/1.4043508 ◽

2019 ◽

Vol 141 (5) ◽

Author(s):

Wei Xiong ◽

Qingbo He ◽

Zhike Peng

Keyword(s):

Fault Diagnosis ◽

Time Domain ◽

Feature Matching ◽

Matching Pursuit ◽

Microphone Array ◽

Time Frequency ◽

Moving Sources ◽

The Time Domain ◽

Envelope Spectrum ◽

Threshold Setting

The wayside acoustic defective bearing detector (ADBD) system is a promising technique for ensuring the safety of traveling vehicles. However, Doppler distortion and the aliasing of multiple moving sources in the acquired acoustic signals decrease the accuracy of bearing fault diagnosis. The existing method of constructing time-frequency (TF) masks for source separation is limited by an empirical threshold setting. To overcome this limitation, this study proposed a dynamic Doppler multisource separation model and constructed a time domain-separating matrix (TDSM) to separate multiple moving sources in the time domain. The TDSM was designed in two steps: (1) constructing separating curves and a time domain remapping matrix (TDRM) and (2) remapping each element of the separating curves to its corresponding time according to the TDRM. Both the TDSM and the TDRM were driven by geometrical and motion parameters, which were estimated by the Doppler feature matching pursuit (DFMP) algorithm. After the source components were obtained from the observed signals, a correlation operation was carried out to estimate the source signals. Fault diagnosis could then be carried out by envelope spectrum analysis. Compared with the method of constructing TF masks, the proposed strategy avoids setting thresholds empirically. Finally, the effectiveness of the proposed technique was validated by simulation and experimental cases. The results indicated the potential of this method for improving the performance of the ADBD system.

Methods to isolate retrograde and prograde Rayleigh-wave signals

Geophysical Journal International ◽

10.1093/gji/ggz341 ◽

2019 ◽

Vol 219 (2) ◽

pp. 975-994 ◽

Cited By ~ 1

Author(s):

Gabriel Gribler ◽

T Dylan Mikesell

Keyword(s):

Rayleigh Wave ◽

Time Domain ◽

Fundamental Mode ◽

Poisson Ratio ◽

Low Frequency ◽

Wave Dispersion ◽

Low Frequencies ◽

Retrograde Motion ◽

Mode Identification ◽

Time Frequency

SUMMARY Estimating shear wave velocity with depth from Rayleigh-wave dispersion data is limited by the accuracy of fundamental and higher mode identification and characterization. In many cases, the fundamental mode signal propagates exclusively in retrograde motion, while higher modes propagate in prograde motion. It has previously been shown that differences in particle motion can be identified with multicomponent recordings and used to separate prograde from retrograde signals. Here we explore the domain of existence of prograde motion of the fundamental mode, arising from a combination of two conditions: (1) a shallow, high-impedance contrast and (2) a high Poisson ratio material. We present solutions to isolate fundamental and higher mode signals using multicomponent recordings. Previously, a time-domain polarity mute was used with limited success due to the overlap in the time domain of fundamental and higher mode signals at low frequencies. We present several new approaches to overcome this low-frequency obstacle, all of which utilize the different particle motions of retrograde and prograde signals. First, the Hilbert transform is used to phase shift one component by 90° prior to summation or subtraction of the other component. This enhances either retrograde or prograde motion and can increase the mode amplitude. Secondly, we present a new time–frequency domain polarity mute to separate retrograde and prograde signals. We demonstrate these methods with synthetic and field data to highlight the improvements to dispersion images and the resulting dispersion curve extraction.
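The Hilbert-transform step can be illustrated with a small NumPy sketch. It builds two idealized elliptical motions of opposite rotation sense, phase-shifts the radial component by 90°, and shows that summation cancels one sense while subtraction cancels the other. The FFT-based `hilbert_transform` helper, the pure-tone signals, and the labels "sense A" and "sense B" are assumptions for illustration; which sense corresponds to retrograde or prograde motion depends on the sign conventions of the recording.

```python
import numpy as np

def hilbert_transform(x):
    """Hilbert transform via the FFT-based analytic signal (90-degree phase shift)."""
    n = len(x)
    spectrum = np.fft.fft(x)
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        gain[1 : n // 2] = 2.0
    else:
        gain[1 : (n + 1) // 2] = 2.0
    return np.imag(np.fft.ifft(spectrum * gain))

n = 1000
t = np.arange(n) / n
vertical = np.cos(2 * np.pi * 5 * t)       # vertical component, 5 full cycles
radial_a = np.sin(2 * np.pi * 5 * t)       # radial component, rotation sense A
radial_b = -np.sin(2 * np.pi * 5 * t)      # radial component, rotation sense B

# Shift the radial component by 90 degrees, then add or subtract the vertical one.
sum_a = vertical + hilbert_transform(radial_a)    # sense A cancels under summation
diff_a = vertical - hilbert_transform(radial_a)   # sense A doubles under subtraction
sum_b = vertical + hilbert_transform(radial_b)    # sense B doubles under summation
diff_b = vertical - hilbert_transform(radial_b)   # sense B cancels under subtraction
```

Real records contain both senses at once, so the sum and the difference each act as a filter that favors one of them.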

Auditory scene analysis based on time-frequency integration of shared FM and AM (II): Optimum time-domain integration and stream sound reconstruction

Systems and Computers in Japan ◽

10.1002/scj.1160 ◽

2002 ◽

Vol 33 (10) ◽

pp. 83-94 ◽

Cited By ~ 2

Author(s):

Mototsugu Abe ◽

Shigeru Ando

Keyword(s):

Time Domain ◽

Auditory Scene Analysis ◽

Scene Analysis ◽

Optimum Time ◽

Time Frequency ◽

Auditory Scene ◽

Domain Integration

A research on fiber-optic vibration pattern recognition based on time-frequency characteristics

Advances in Mechanical Engineering ◽

10.1177/1687814018813468 ◽

2018 ◽

Vol 10 (12) ◽

pp. 168781401881346 ◽

Cited By ~ 3

Author(s):

Tabi Fouda Bernard Marie ◽

Dezhi Han ◽

Bowen An ◽

Jingyun Li

Keyword(s):

Pattern Recognition ◽

Frequency Domain ◽

Time Domain ◽

Fiber Optic ◽

Pattern Recognition Method ◽

Recognition Method ◽

Time Frequency ◽

Mode Decomposition ◽

Vibration Pattern ◽

The Time Domain

To detect and recognize events along a perimeter security system, this article proposes a fiber-optic vibration pattern recognition method that combines time-domain features with time-frequency domain features. The performance parameters (event recognition, event location, and event classification) are used to assess the validity of the method. The pattern recognition method is based on empirical mode decomposition, time-frequency entropy, and center-of-gravity frequency. It identifies and classifies events (intrusion or non-intrusion) along the secured perimeter. A first-level prejudgment is made from the time-domain features of the vibration signal, and a second-level prediction is carried out through time-frequency analysis. The time-frequency distribution of the signal is obtained by empirical mode decomposition and the Hilbert transform; the time-frequency entropy and center-of-gravity frequency then form the time-frequency domain features, which are combined with the time-domain features to form feature vectors. Multiple types of probabilistic neural networks are then used to determine whether an intrusion has occurred and, if so, its type. The experimental results demonstrate that the proposed method is effective and reliable in identifying and classifying the type of event.
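The two time-frequency features named above can be sketched with commonly used definitions: Shannon entropy of the normalized energy distribution over the time-frequency plane, and the power-weighted mean frequency. These definitions, the function names, and the toy inputs are assumptions; the abstract does not give the article's exact formulas, and the empirical mode decomposition and Hilbert transform that would produce the energy map are not reproduced here.

```python
import numpy as np

def tf_entropy(energy):
    """Shannon entropy (natural log) of a normalized time-frequency energy map."""
    p = energy / energy.sum()
    p = p[p > 0]                      # skip empty cells, since 0 * log(0) is taken as 0
    return float(-(p * np.log(p)).sum())

def centroid_frequency(freqs, power):
    """Center-of-gravity frequency: the power-weighted mean of the frequency axis."""
    return float((freqs * power).sum() / power.sum())

# Toy inputs (hypothetical): evenly spread energy versus energy in a single cell.
spread = np.ones((4, 4))
concentrated = np.zeros((4, 4))
concentrated[1, 2] = 3.0

h_spread = tf_entropy(spread)
h_concentrated = tf_entropy(concentrated)
cog = centroid_frequency(np.array([10.0, 20.0, 30.0]), np.array([1.0, 2.0, 1.0]))
```

Evenly spread energy gives the maximum entropy for the grid size, while energy confined to one cell gives zero, which is what makes the feature useful for telling broadband disturbances from narrow ones.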

Thruster Fault Identification for Autonomous Underwater Vehicle Based on Time-Domain Energy and Time-Frequency Entropy of Fusion Signal

Intelligent Robotics and Applications - Lecture Notes in Computer Science ◽

10.1007/978-3-030-27535-8_25 ◽

2019 ◽

pp. 264-275

Author(s):

Baoji Yin ◽

Xi Lin ◽

Wenxian Tang ◽

Zhikun Jin

Keyword(s):

Time Domain ◽

Autonomous Underwater Vehicle ◽

Underwater Vehicle ◽

Fault Identification ◽

Time Frequency ◽

Entropy Of Fusion ◽

Fusion Signal

Permutation Entropy-Based Interpretability of Convolutional Neural Network Models for Interictal EEG Discrimination of Subjects with Epileptic Seizures vs. Psychogenic Non-Epileptic Seizures

Entropy ◽

10.3390/e24010102 ◽

2022 ◽

Vol 24 (1) ◽

pp. 102

Author(s):

Michele Lo Giudice ◽

Giuseppe Varone ◽

Cosimo Ieracitano ◽

Nadia Mammone ◽

Giovanbattista Gaspare Tripodi ◽

...

Keyword(s):

Neural Network ◽

Discriminant Analysis ◽

Convolutional Neural Network ◽

Epileptic Seizures ◽

Permutation Entropy ◽

Diagnostic Tools ◽

Support Vector ◽

Feature Maps ◽

Time Frequency ◽

Interictal Eeg

The differential diagnosis of epileptic seizures (ES) and psychogenic non-epileptic seizures (PNES) may be difficult, due to the lack of distinctive clinical features. The interictal electroencephalographic (EEG) signal may also be normal in patients with ES. Innovative diagnostic tools that exploit non-linear EEG analysis and deep learning (DL) could provide important support to physicians for clinical diagnosis. In this work, 18 patients with new-onset ES (12 males, 6 females) and 18 patients with video-recorded PNES (2 males, 16 females) with normal interictal EEG at visual inspection were enrolled. None of them was taking psychotropic drugs. A convolutional neural network (CNN) scheme using DL classification was designed to classify the two categories of subjects (ES vs. PNES). The proposed architecture performs an EEG time-frequency transformation and a classification step with a CNN. The CNN was able to classify the EEG recordings of subjects with ES vs. subjects with PNES with 94.4% accuracy. CNN provided high performance in the assigned binary classification when compared to standard learning algorithms (multi-layer perceptron, support vector machine, linear discriminant analysis and quadratic discriminant analysis). In order to interpret how the CNN achieved this performance, information theoretical analysis was carried out. Specifically, the permutation entropy (PE) of the feature maps was evaluated and compared in the two classes. The achieved results, although preliminary, encourage the use of these innovative techniques to support neurologists in early diagnoses.
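For readers unfamiliar with permutation entropy, the following sketch implements the standard ordinal-pattern definition for a 1D sequence. The `order` and `delay` defaults and the test signals are hypothetical, and the abstract does not say how the feature maps were turned into sequences (flattening a map would be one assumption), so this shows the measure itself rather than the authors' pipeline.

```python
import math
from collections import Counter

import numpy as np

def permutation_entropy(x, order=3, delay=1, normalize=True):
    """Permutation entropy of a 1D sequence from the frequencies of ordinal patterns."""
    x = np.asarray(x, dtype=float)
    n_patterns = len(x) - (order - 1) * delay
    # Each window of `order` samples is reduced to the ranking of its values.
    patterns = Counter(
        tuple(np.argsort(x[i : i + order * delay : delay], kind="stable"))
        for i in range(n_patterns)
    )
    probs = np.array(list(patterns.values()), dtype=float) / n_patterns
    entropy = float(-np.sum(probs * np.log(probs)))
    if normalize:
        entropy /= math.log(math.factorial(order))
    return entropy

rng = np.random.default_rng(0)
pe_ramp = permutation_entropy(np.arange(50))               # one pattern only
pe_noise = permutation_entropy(rng.standard_normal(5000))  # all patterns about equally likely
```

A monotonic ramp produces a single ordinal pattern and therefore zero entropy, whereas white noise uses all patterns nearly equally and approaches the normalized maximum of 1.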

Recurrence Plot-Based Approach for Cardiac Arrhythmia Classification Using Inception-ResNet-v2

Frontiers in Physiology ◽

10.3389/fphys.2021.648950 ◽

2021 ◽

Vol 12 ◽

Author(s):

Hua Zhang ◽

Chengyu Liu ◽

Zhimin Zhang ◽

Yujie Xing ◽

Xinwen Liu ◽

...

Keyword(s):

Cardiac Arrhythmia ◽

Time Domain ◽

Original Data ◽

Classification Problem ◽

Recurrence Plot ◽

Two Dimensions ◽

Time Frequency ◽

Ecg Signals ◽

Physiological Signal ◽

The Time Domain

The present study addresses the cardiac arrhythmia (CA) classification problem using a deep learning (DL)-based method for electrocardiography (ECG) data analysis. Recently, various DL techniques have been used to classify arrhythmias, a typical approach being to develop a one-dimensional (1D) convolutional neural network (CNN) model that handles the ECG signals in the time domain. Although CA classification in the time domain is very prevalent, the performance of current methods is still not robust or satisfactory. This study aims to develop a solution for CA classification in two dimensions by introducing the recurrence plot (RP) combined with an Inception-ResNet-v2 network. The proposed method for classifying nine types of CA was tested on the 1st China Physiological Signal Challenge 2018 dataset. During implementation, the optimal leads (lead II and lead aVR) were selected, and 1D ECG segments were then transformed into 2D texture images by the RP approach. These RP-based images were passed as input into the Inception-ResNet-v2 for CA classification. In the CPSC, Georgia, and PTB_XL ECG databases of the PhysioNet/Computing in Cardiology Challenge 2020, the RP-based method achieved average F1-scores of 0.8521, 0.8529, and 0.8862, respectively. The results suggested the excellent generalization ability of the proposed method. To further assess its performance, we compared the 2D RP-image-based solution with published 1D ECG-based works on the same dataset. It was also compared with two traditional methods for transforming ECG into 2D images: the time waveform of the ECG recordings and time-frequency images based on the continuous wavelet transform (CWT). The proposed method achieved the highest average F1-score of 0.844 using only two leads of the original 12-lead ECG data, which outperformed other works. Therefore, the promising results indicate that the 2D RP-based method has high clinical potential for CA classification using fewer ECG leads.
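The transformation of a 1D segment into a 2D image can be sketched with a basic thresholded recurrence plot. The threshold `eps`, the embedding parameters, and the sine-wave test signal are hypothetical; the abstract does not state which RP variant (thresholded or unthresholded) or which parameters the authors used.

```python
import numpy as np

def recurrence_plot(x, eps, dim=1, delay=1):
    """Binary recurrence matrix: entry (i, j) is 1 when states i and j are within eps."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (dim - 1) * delay
    # Time-delay embedding: each row is one state vector of length `dim`.
    states = np.stack([x[k * delay : k * delay + n] for k in range(dim)], axis=1)
    dist = np.linalg.norm(states[:, None, :] - states[None, :, :], axis=-1)
    return (dist <= eps).astype(np.uint8)

# A periodic stand-in for one ECG lead segment (hypothetical data).
signal = np.sin(np.linspace(0, 4 * np.pi, 200))
rp = recurrence_plot(signal, eps=0.1)
```

The matrix is symmetric with ones on the main diagonal, and periodic signals show up as diagonal line textures, which is the kind of 2D structure an image classifier can pick up.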

Orthogonal Time Sequency Multiplexing Modulation

10.36227/techrxiv.13580201.v1 ◽

2021 ◽

Author(s):

Tharaj Thaj ◽

Emanuele Viterbo

Keyword(s):

Time Domain ◽

Orthogonal Frequency Division Multiplexing ◽

High Performance ◽

High Mobility ◽

Low Complexity ◽

Modulation Scheme ◽

Frequency Space ◽

Time Frequency ◽

Single Carrier ◽

The Time Domain

This paper proposes orthogonal time sequency multiplexing (OTSM), a novel single-carrier modulation scheme based on the well-known Walsh-Hadamard transform (WHT) combined with row-column interleaving and zero padding (ZP) between blocks in the time domain. The information symbols in OTSM are multiplexed in the delay and sequency domain using a cascade of time-division and Walsh-Hadamard (sequency) multiplexing. Because the WHT is used for transmission and reception, the modulation and demodulation steps do not require any complex multiplications. We then propose two low-complexity detectors: (i) a simpler non-iterative detector based on a single-tap minimum mean square time-frequency domain equalizer and (ii) an iterative time-domain detector. We demonstrate, via numerical simulations, that the proposed modulation scheme offers high performance gains over orthogonal frequency division multiplexing (OFDM) and exhibits the same performance as orthogonal time frequency space (OTFS) modulation, but with lower complexity. In proposing OTSM, along with simple detection schemes, we offer the lowest-complexity solution for achieving reliable communication in high-mobility wireless channels, compared with the schemes published so far in the literature.
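The claim that the WHT needs no multiplications can be seen in a short sketch of the fast Walsh-Hadamard transform, whose butterflies use only additions and subtractions. This version produces coefficients in natural (Hadamard) order; the sequency ordering that OTSM relies on would need an extra row permutation, which is omitted here, and the function name and test vector are assumptions for illustration.

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform (natural order, unnormalized)."""
    a = np.array(x, dtype=float)
    n = len(a)
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for k in range(start, start + h):
                u, v = a[k], a[k + h]
                a[k], a[k + h] = u + v, u - v   # butterfly: one add, one subtract
        h *= 2
    return a

symbols = np.array([1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0, -1.0])
transformed = fwht(symbols)
recovered = fwht(transformed) / len(symbols)   # the WHT is its own inverse up to 1/n
```

Applying the same routine twice and dividing by the length recovers the input, so a transmitter and receiver can share one add-and-subtract kernel.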
