X-DC: Explainable Deep Clustering Based on Learnable Spectrogram Templates

Abstract: Deep neural networks (DNNs) have achieved substantial predictive performance in various speech processing tasks. In particular, it has been shown that a monaural speech separation task can be successfully solved with a DNN-based method called deep clustering (DC), which uses a DNN to describe the process of assigning a continuous vector to each time-frequency (TF) bin and to measure how likely each pair of TF bins is to be dominated by the same speaker. In DC, the DNN is trained so that the embedding vectors for the TF bins dominated by the same speaker are forced to get close to each other. One concern regarding DC is that the embedding process described by a DNN has a black-box structure, which is usually very hard to interpret. A potential weakness of this noninterpretable black-box structure is that it lacks the flexibility to address mismatch between training and test conditions (caused by reverberation, for instance). To overcome this limitation, in this letter, we propose the concept of explainable deep clustering (X-DC), whose network architecture can be interpreted as a process of fitting learnable spectrogram templates to an input spectrogram followed by Wiener filtering. During training, the elements of the spectrogram templates and their activations are constrained to be nonnegative, which encourages sparsity in their values and thus improves interpretability. The main advantage of this framework is that it naturally allows us to incorporate a model adaptation mechanism into the network thanks to its physically interpretable structure. We experimentally show that the proposed X-DC enables us to visualize and understand the clues the model uses to determine the embedding vectors while achieving speech separation performance comparable to that of the original DC models.
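
As a hedged illustration of the training objective described above (embeddings of TF bins dominated by the same speaker are pulled together), the sketch below computes the affinity loss used in the original DC formulation, ||V V^T - Y Y^T||_F^2. It does not reproduce the X-DC template-fitting architecture; the function name `dc_affinity_loss` and the array shapes are assumptions made for illustration.

```python
import numpy as np

def dc_affinity_loss(V, Y):
    """Deep clustering affinity loss ||V V^T - Y Y^T||_F^2.

    V: (N, D) embedding vectors, one per TF bin (hypothetical shape).
    Y: (N, C) one-hot indicators of the dominant speaker per TF bin.
    """
    # Expand the Frobenius norm so that no N x N matrix is ever formed:
    # ||VV^T - YY^T||^2 = ||V^T V||^2 - 2 ||V^T Y||^2 + ||Y^T Y||^2
    vtv = V.T @ V
    vty = V.T @ Y
    yty = Y.T @ Y
    return np.sum(vtv ** 2) - 2.0 * np.sum(vty ** 2) + np.sum(yty ** 2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    Y = np.eye(2)[rng.integers(0, 2, size=50)]  # 50 TF bins, 2 speakers
    V = rng.normal(size=(50, 4))                # random 4-dimensional embeddings
    print("loss for random embeddings:", dc_affinity_loss(V, Y))
    print("loss when embeddings equal the labels:", dc_affinity_loss(Y, Y))
```

The loss is zero when the embedding affinities V V^T match the ideal speaker affinities Y Y^T exactly, which is the sense in which same-speaker bins are "forced to get close to each other".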


Segmented Time-Frequency Masking Algorithm for Speech Separation Based on Deep Neural Networks

2020 13th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI) ◽

10.1109/cisp-bmei51763.2020.9263673 ◽

2020 ◽

Author(s):

Xinyu Guo ◽

Shifeng Ou ◽

Meng Gao ◽

Ying Gao

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Speech Separation ◽

Time Frequency


End-to-End Monaural Speech Separation with a Deep Complex U-Shaped Network

Journal of Circuits System and Computers ◽

10.1142/s0218126622500281 ◽

2021 ◽

pp. 2250028

Author(s):

Wen Zhang ◽

Xiaoyong Li ◽

Aolong Zhou ◽

Kefeng Deng ◽

Kaijun Ren ◽

...

Keyword(s):

Neural Network ◽

Network Architecture ◽

Source Separation ◽

Complex Signal ◽

Speech Separation ◽

Time Frequency ◽

Perceptual Evaluation ◽

End To End ◽

Complex Valued ◽

Signal Approximation

Conventional time–frequency (TF) domain source separation methods mainly focus on predicting TF masks or speech spectra, where the complex ideal ratio mask (cIRM) is an effective target for speech enhancement and separation. However, some recent studies employ a real-valued network, such as a general convolutional neural network (CNN) or a recurrent neural network (RNN), to predict a complex-valued mask or spectrogram target, leading to unbalanced training of the real and imaginary parts. In this paper, to estimate the complex-valued target more accurately, a novel U-shaped complex network for the complex signal approximation (uCSA) method is proposed. The uCSA is an adaptive front-end time-domain separation method, which tackles the monaural source separation problem in three ways. First, we design and implement a complex U-shaped network architecture comprising well-defined complex-valued encoder and decoder blocks, as well as complex-valued bidirectional Long Short-Term Memory (BLSTM) layers, to process complex-valued operations. Second, the cIRM is the training target of our uCSA method, optimized by signal approximation (SA), which takes advantage of both the real and imaginary components of the complex-valued spectrum. Third, we reformulate the STFT and inverse STFT in differentiable form, and the model is trained with the scale-invariant source-to-noise ratio (SI-SNR) loss, achieving end-to-end training of the speech source separation task. The proposed uCSA models are evaluated on the WSJ0-2mix dataset, a corpus commonly used by many supervised speech separation methods. Extensive experimental results indicate that our proposed method obtains state-of-the-art performance in terms of the perceptual evaluation of speech quality (PESQ) and the short-time objective intelligibility (STOI) metrics.
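
To make the SI-SNR training criterion mentioned above concrete, here is a minimal sketch of the commonly used scale-invariant definition: the estimate is projected onto the target, and the ratio of projected energy to residual energy is reported in dB. The function name `si_snr` and the `eps` stabiliser are assumptions for illustration; the abstract does not give the paper's exact implementation.

```python
import numpy as np

def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between two 1-D time-domain signals."""
    # Remove the mean so that a DC offset does not affect the score.
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target to get the "signal" component.
    scale = (estimate @ target) / (target @ target + eps)
    s_target = scale * target
    # Whatever is left after the projection is treated as noise.
    e_noise = estimate - s_target
    return 10.0 * np.log10((s_target @ s_target + eps) / (e_noise @ e_noise + eps))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.normal(size=1000)
    noise = rng.normal(size=1000)
    print("mild noise :", si_snr(clean + 0.1 * noise, clean))
    print("heavy noise:", si_snr(clean + 1.0 * noise, clean))
```

Because of the projection step, rescaling the estimate leaves the score essentially unchanged, which is what makes the measure suitable as a loss for networks whose output gain is not constrained. When used as a training loss, the negative of this value would be minimised.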


Binaural Reverberant Speech Separation Based on Deep Neural Networks

10.21437/interspeech.2017-297 ◽

2017 ◽

Cited By ~ 1

Author(s):

Xueliang Zhang ◽

DeLiang Wang

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Speech Separation ◽

Reverberant Speech


Time–frequency time–space LSTM for robust classification of physiological signals

Scientific Reports ◽

10.1038/s41598-021-86432-7 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Tuan D. Pham

Keyword(s):

Time Series ◽

Network Architecture ◽

Time Series Data ◽

Short Term Memory ◽

Physiological Signals ◽

Series Data ◽

Time Frequency ◽

Time Space ◽

Deep Recurrent Neural Network

Abstract: Automated analysis of physiological time series is utilized for many clinical applications in medicine and life sciences. Long short-term memory (LSTM) is a deep recurrent neural network architecture used for classification of time-series data. Here time–frequency and time–space properties of time series are introduced as a robust tool for LSTM processing of long sequential data in physiology. Based on classification results obtained from two databases of sensor-induced physiological signals, the proposed approach has the potential for (1) achieving very high classification accuracy, (2) saving tremendous time for data learning, and (3) being cost-effective and user-comfortable for clinical trials by reducing multiple wearable sensors for data recording.


Hybrid deep neural networks to infer state models of black-box systems

Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering ◽

10.1145/3324884.3416559 ◽

2020 ◽

Author(s):

Mohammad Jafar Mashhadi ◽

Hadi Hemmati

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Black Box ◽

State Models


A Speech Separation Method Combining Time-Frequency Masking and Independent Component Analysis

2008 3rd International Conference on Innovative Computing Information and Control ◽

10.1109/icicic.2008.92 ◽

2008 ◽

Author(s):

Xiaohong Ma ◽

Wenlong Liu ◽

Fuliang Yin ◽

Xiaohua Liu

Keyword(s):

Independent Component Analysis ◽

Component Analysis ◽

Independent Component ◽

Separation Method ◽

Speech Separation ◽

Time Frequency


Reynolds averaged turbulence modelling using deep neural networks with embedded invariance

Journal of Fluid Mechanics ◽

10.1017/jfm.2016.615 ◽

2016 ◽

Vol 807 ◽

pp. 155-166 ◽

Cited By ~ 274

Author(s):

Julia Ling ◽

Andrew Kurzawski ◽

Jeremy Templeton

Keyword(s):

Neural Network ◽

Neural Networks ◽

Reynolds Stress ◽

Network Architecture ◽

Eddy Viscosity ◽

Deep Neural Networks ◽

Test Cases ◽

Neural Network Architecture ◽

Stress Anisotropy ◽

Anisotropy Tensor

There exists significant demand for improved Reynolds-averaged Navier–Stokes (RANS) turbulence models that are informed by and can represent a richer set of turbulence physics. This paper presents a method of using deep neural networks to learn a model for the Reynolds stress anisotropy tensor from high-fidelity simulation data. A novel neural network architecture is proposed which uses a multiplicative layer with an invariant tensor basis to embed Galilean invariance into the predicted anisotropy tensor. It is demonstrated that this neural network architecture provides improved prediction accuracy compared with a generic neural network architecture that does not embed this invariance property. The Reynolds stress anisotropy predictions of this invariant neural network are propagated through to the velocity field for two test cases. For both test cases, significant improvement versus baseline RANS linear eddy viscosity and nonlinear eddy viscosity models is demonstrated.
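
The idea of a multiplicative layer over an invariant tensor basis can be sketched as follows: scalar coefficients computed from invariants multiply basis tensors built from the mean strain-rate tensor S and rotation-rate tensor R, and the weighted sum is the predicted anisotropy. Everything specific here is an assumption for illustration: only three basis tensors and two invariants are used, and `coefficients` is a hypothetical fixed function standing in for the learned network, which the abstract does not specify.

```python
import numpy as np

def tensor_basis(S, R):
    """Three symmetric, traceless basis tensors built from S and R (assumed subset)."""
    I = np.eye(3)
    T1 = S
    T2 = S @ R - R @ S
    T3 = S @ S - I * np.trace(S @ S) / 3.0
    return np.stack([T1, T2, T3])

def invariants(S, R):
    """Scalar invariants that are unchanged by rotating the coordinate frame."""
    return np.array([np.trace(S @ S), np.trace(R @ R)])

def coefficients(lam):
    """Hypothetical stand-in for the learned network mapping invariants to weights."""
    return np.array([np.tanh(lam[0]), 0.1 * lam[1], 1.0 / (1.0 + lam[0])])

def anisotropy(S, R):
    """Multiplicative layer: weighted sum of basis tensors."""
    g = coefficients(invariants(S, R))
    return np.einsum("n,nij->ij", g, tensor_basis(S, R))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.normal(size=(3, 3))            # made-up velocity gradient
    A -= np.eye(3) * np.trace(A) / 3.0     # make it trace-free
    S, R = 0.5 * (A + A.T), 0.5 * (A - A.T)
    Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random orthogonal matrix
    b = anisotropy(S, R)
    b_rot = anisotropy(Q @ S @ Q.T, Q @ R @ Q.T)
    print("frame-consistent:", np.allclose(b_rot, Q @ b @ Q.T))
```

Because the coefficients depend only on invariants and the basis tensors transform with the frame, the output transforms consistently under an orthogonal change of coordinates regardless of what the coefficient function is. This is the structural property that a generic network acting on raw tensor components would have to learn from data.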


Artificial Cognition: How Experimental Psychology Can Help Generate Explainable Artificial Intelligence

10.31234/osf.io/ygr4c ◽

2021 ◽

Author(s):

J. Eric T. Taylor ◽

Graham Taylor

Keyword(s):

Artificial Intelligence ◽

Machine Learning ◽

Deep Neural Networks ◽

Experimental Approach ◽

Well Being ◽

Human Mind ◽

Black Box ◽

Psychological Science ◽

Black Boxes ◽

Explainable Artificial Intelligence

Artificial intelligence powered by deep neural networks has reached a level of complexity where it can be difficult or impossible to express how a model makes its decisions. This black-box problem is especially concerning when the model makes decisions with consequences for human well-being. In response, an emerging field called explainable artificial intelligence (XAI) aims to increase the interpretability, fairness, and transparency of machine learning. In this paper, we describe how cognitive psychologists can make contributions to XAI. The human mind is also a black box, and cognitive psychologists have over one hundred and fifty years of experience modeling it through experimentation. We ought to translate the methods and rigour of cognitive psychology to the study of artificial black boxes in the service of explainability. We provide a review of XAI for psychologists, arguing that current methods possess a blind spot that can be complemented by the experimental cognitive tradition. We also provide a framework for research in XAI, highlight exemplary cases of experimentation within XAI inspired by psychological science, and provide a tutorial on experimenting with machines. We end by noting the advantages of an experimental approach and invite other psychologists to conduct research in this exciting new field.


Low Latency Convolutive Blind Source Separation

10.26686/wgtn.17136158 ◽

2021 ◽

Author(s):


Jiawen Chua

Keyword(s):

Frequency Domain ◽

Real Time ◽

Impulse Response ◽

Source Separation ◽

Frequency Resolution ◽

Separation Performance ◽

Window Length ◽

Time Frequency ◽

Time Systems ◽

Separation Parameters

In most real-time systems, particularly for applications involving system identification, latency is a critical issue. These applications include, but are not limited to, blind source separation (BSS), beamforming, speech dereverberation, acoustic echo cancellation and channel equalization. The system latency consists of an algorithmic delay and an estimation computational time. The latter can be avoided by using a multi-threaded system, which runs the estimation process and the processing procedure simultaneously. The former, which amounts to a delay of one window length, is usually unavoidable for frequency-domain approaches. In these approaches, a block of data is acquired by using a window, transformed and processed in the frequency domain, and recovered back to the time domain by using an overlap-add technique. In the frequency domain, the convolutive model, which is usually used to describe a linear time-invariant (LTI) system, can be represented by a series of multiplicative models to facilitate estimation. To implement frequency-domain approaches in real-time applications, the short-time Fourier transform (STFT) is commonly used. For the multiplicative model to be sufficiently accurate, the STFT window must be at least twice as long as the room impulse response, which is itself long. The delay imposed by this blockwise processing window length makes most frequency-domain approaches inapplicable to real-time systems. This thesis aims to design a BSS system that can be used in a real-time scenario with minimal latency. Existing BSS approaches can be integrated into our system to perform source separation with low delay without affecting the separation performance. The second goal is to design a BSS system that can perform source separation in a non-stationary environment.
We first introduce a subspace approach to directly estimate the separation parameters in the low-frequency-resolution time-frequency (LFRTF) domain. In the LFRTF domain, a shorter window is used to reduce the algorithmic delay of the system during signal acquisition, e.g., the window length is shorter than the room impulse response. The subspace method facilitates the deconvolution of a convolutive mixture into a new instantaneous mixture and simplifies the estimation process. Second, we propose an alternative approach to address the algorithmic latency problem. The alternative method enables us to obtain the separation parameters in the LFRTF domain based on parameters estimated in the high-frequency-resolution time-frequency (HFRTF) domain, where the window length is longer than the room impulse response, without affecting the separation performance. The thesis also provides a solution to the BSS problem in a non-stationary environment. We utilize the "meta-information" obtained from previous BSS operations to facilitate future separation without performing the entire BSS process again. Repeating a BSS process can be computationally expensive. Most conventional BSS algorithms require sufficient signal samples to perform analysis, and this prolongs the estimation delay. By utilizing information from the entire spectrum, our method enables us to update the separation parameters with only a single snapshot of observation data. Hence, our method minimizes the estimation period, reduces redundancy and improves the efficacy of the system. The final contribution of the thesis is a non-iterative method for impulse response shortening. This method allows us to use a shorter representation to approximate the long impulse response. It further improves the computational efficiency of the algorithm while still achieving satisfactory performance.
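
The trade-off between window length and the accuracy of the multiplicative model can be illustrated with a small numerical experiment. The sketch below filters a random signal with a short decaying impulse response, then compares the STFT of the filtered signal with the per-bin product of the input STFT and the filter's frequency response. The signal, the impulse response, the Hann window and the hop size are all made-up assumptions, not the thesis's setup.

```python
import numpy as np

def stft(x, win_len, hop):
    """Plain STFT with a Hann window; returns an array of shape (frames, bins)."""
    w = np.hanning(win_len)
    frames = [x[i:i + win_len] * w for i in range(0, len(x) - win_len + 1, hop)]
    return np.fft.rfft(np.array(frames), axis=1)

def multiplicative_model_error(x, h, win_len):
    """Relative error of approximating convolution by per-bin multiplication."""
    y = np.convolve(x, h)[:len(x)]       # true convolutive output
    hop = win_len // 4
    X = stft(x, win_len, hop)
    Y = stft(y, win_len, hop)
    H = np.fft.rfft(h, n=win_len)        # filter response on the STFT bin grid
    return np.linalg.norm(Y - X * H) / np.linalg.norm(Y)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=16000)
    # Hypothetical 32-tap exponentially decaying impulse response.
    h = rng.normal(size=32) * np.exp(-np.arange(32) / 8.0)
    for win_len in (64, 256, 1024):
        err = multiplicative_model_error(x, h, win_len)
        print(f"window {win_len:5d} samples: relative error {err:.3f}")
```

In this toy setting the approximation error shrinks as the window grows relative to the impulse response, while the algorithmic delay (one window length) grows with it. That tension is what motivates estimating separation parameters in a shorter-window domain.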


USE OF GENETIC ALGORITHM IN DEEP NEURAL NETWORKS CONFIGURATION FOR THE PURPOSES OF COMPUTER ATTACKS CLASSIFICATION

10.22250/isu.2020.66.104-117 ◽

2020 ◽

pp. 104-117

Author(s):

O.S. Amosov ◽


S.G. Amosova ◽

D.S. Magola ◽


...

Keyword(s):

Neural Network ◽

Genetic Algorithm ◽

Network Architecture ◽

Deep Neural Network ◽

Deep Neural Networks ◽

Classification Problem ◽

Neural Network Architecture ◽

Computer Attacks ◽

Neural Network Technology

The task of multiclass network classification of computer attacks is formulated. The applicability of deep neural network technology to this problem is considered. A deep neural network architecture was chosen based on the strategy of combining a set of convolutional and recurrent LSTM layers. Optimization of the neural network parameters based on a genetic algorithm is proposed. The presented modeling results show that the network classification problem can be solved in real time.
