Evaluation of Mixed Deep Neural Networks for Reverberant Speech Enhancement

Biomimetics ◽  
2019 ◽  
Vol 5 (1) ◽  
pp. 1 ◽  
Author(s):  
Michelle Gutiérrez-Muñoz ◽  
Astryd González-Salazar ◽  
Marvin Coto-Jiménez

Speech signals are degraded in real-life environments as a product of background noise and other factors. The processing of such signals for voice recognition and voice analysis systems presents important challenges. One of the conditions that makes adverse quality difficult to handle in those systems is reverberation, produced by sound wave reflections that travel from the source to the microphone in multiple directions. To enhance signals in such adverse conditions, several deep learning-based methods have been proposed and proven effective. Recently, recurrent neural networks, especially those with long short-term memory (LSTM), have presented surprising results in tasks related to time-dependent processing of signals, such as speech. One of the most challenging aspects of LSTM networks is the high computational cost of the training procedure, which has limited extended experimentation in several cases. In this work, we present a proposal to evaluate hybrid neural network models that learn different reverberation conditions without any prior information. The results show that some combinations of LSTM and perceptron layers produce good results in comparison to those from pure LSTM networks, given a fixed number of layers. The evaluation was based on quality measurements of the signal's spectrum, the training time of the networks, and statistical validation of the results. In total, 120 artificial neural networks of eight different types were trained and compared. The results support the claim that hybrid networks represent an important solution for speech signal enhancement, given that the reduction in training time is on the order of 30% in processes that can normally take several days or weeks, depending on the amount of data. The hybrid networks thus offer gains in efficiency without a significant drop in quality.
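
As an illustration of the kind of hybrid architecture evaluated here, the following is a minimal Keras sketch that stacks an LSTM layer with cheaper time-distributed perceptron (dense) layers to map reverberant spectral frames to clean ones. The layer sizes and feature dimensions are our own assumptions, not those reported in the paper.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical dimensions: sequences of 100 frames, 40 spectral features each.
N_FRAMES, N_FEATS = 100, 40

model = keras.Sequential([
    keras.Input(shape=(N_FRAMES, N_FEATS)),
    layers.LSTM(256, return_sequences=True),                       # recurrent layer captures temporal context
    layers.TimeDistributed(layers.Dense(256, activation="relu")),  # cheaper perceptron layer applied per frame
    layers.TimeDistributed(layers.Dense(N_FEATS)),                 # frame-wise estimate of the clean spectrum
])
model.compile(optimizer="adam", loss="mse")  # trained on (reverberant, clean) frame pairs
```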


2020 ◽  
Vol 20 (11) ◽  
pp. 6603-6608 ◽  
Author(s):  
Sung-Tae Lee ◽  
Suhwan Lim ◽  
Jong-Ho Bae ◽  
Dongseok Kwon ◽  
Hyeong-Su Kim ◽  
...  

Deep learning represents state-of-the-art results in various machine learning tasks, but for applications that require real-time inference, the high computational cost of deep neural networks becomes an efficiency bottleneck. To overcome this cost, spiking neural networks (SNNs) have been proposed. Herein, we propose a hardware implementation of an SNN with gated Schottky diodes as synaptic devices. In addition, we apply L1 regularization for connection pruning of deep spiking neural networks that use gated Schottky diodes as synaptic devices. Applying L1 regularization eliminates the need for a re-training procedure because it prunes the weights based on the cost function. The compressed hardware-based SNN is energy efficient while achieving a classification accuracy of 97.85%, comparable to the 98.13% of the software deep neural network (DNN).
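
As a rough sketch of how L1 regularization enables pruning without re-training: an L1 penalty added to the task loss drives unimportant weights toward zero during training, after which they can simply be thresholded away. The penalty strength and threshold below are illustrative assumptions, not values from the paper.

```python
import torch

def loss_with_l1(task_loss, model, lam=1e-5):
    # The L1 penalty pushes unimportant weights toward zero during training.
    l1 = sum(p.abs().sum() for p in model.parameters())
    return task_loss + lam * l1

def prune(model, threshold=1e-3):
    # After training, weights the penalty has driven near zero can be cut
    # directly, with no re-training pass.
    with torch.no_grad():
        for p in model.parameters():
            p[p.abs() < threshold] = 0.0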


1977 ◽  
Vol 196 (1123) ◽  
pp. 171-195 ◽  

Cycloheximide injected into the brains of chickens 10 min before training does not affect their learning of a visual discrimination task, or memory of that task for at least 1 h after training. When tested 24 h later, no memory of the training procedure is detectable. In contrast, ouabain injected 10 min before training prevents the expression of learning during training. The block lasts for up to 1 h, but from that time on memory begins to appear. Ouabain does not affect performance when injected just before testing for memory retention 24 h after training. It therefore affects neither the readout of long-term memory nor the motivation or perceptual abilities necessary for performance of the learning task. In birds treated with ouabain, after training on an operant task for heat reward by a procedure requiring a fixed number of reinforcements, memory is absent 20 min later but is well established at 24 h. Cycloheximide blocks long-term memory of this task. Like ouabain, ethacrynic acid injected into the brains of chickens 10 min before training prevents the expression of learning of the visual discrimination. Ethacrynic acid hastens the decline of memory after one-trial passive avoidance learning. It also blocks observational learning. We conclude that ouabain and ethacrynic acid block access to short-term memory, whereas cycloheximide interferes with the registration of long-term memory. Comparing the pharmacology of ethacrynic acid and ouabain, their common known actions are on the Na/K fluxes across cell membranes. We suggest that long-lasting changes in the distribution of these ions in recently active nerve cells may be at the basis of access to memory during and shortly after learning.


2020 ◽  
Vol 12 (8) ◽  
pp. 3177 ◽  
Author(s):  
Dimitrios Kontogiannis ◽  
Dimitrios Bargiotas ◽  
Aspassia Daskalopulu

Power forecasting is an integral part of the Demand Response design philosophy for power systems, enabling utility companies to understand the electricity consumption patterns of their customers and adjust price signals accordingly, in order to handle load demand more effectively. Since there is increasing interest in real-time automation and in more flexible Demand Response programs that monitor changes in residential load profiles and reflect them in energy pricing schemes, high-granularity time series forecasting is at the forefront of energy and artificial intelligence research, aimed at developing machine learning models that produce accurate time series predictions. In this study, we compared the baseline performance and structure of different types of neural networks on residential energy data by formulating a suitable supervised learning problem based on real-world data. After training and testing long short-term memory (LSTM) network variants, a convolutional neural network (CNN), and a multi-layer perceptron (MLP), we observed that the MLP performed best on the given problem, yielding the lowest mean absolute error and achieving the fastest training time.
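
A common way to frame such a supervised learning problem over a high-granularity load series is a sliding window of past readings predicting the next one. The sketch below shows this framing; the window length is an assumption for illustration, not the study's exact setup.

```python
import numpy as np

def make_supervised(series: np.ndarray, window: int = 24):
    # Each sample: `window` consecutive past readings -> the next reading.
    X = np.stack([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X, y

# X, y can then be fed to an MLP, CNN, or LSTM for a baseline comparison.
```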


Author(s):  
Chunyuan Li ◽  
Changyou Chen ◽  
Yunchen Pu ◽  
Ricardo Henao ◽  
Lawrence Carin

Learning probability distributions on the weights of neural networks has recently proven beneficial in many applications. Bayesian methods such as Stochastic Gradient Markov Chain Monte Carlo (SG-MCMC) offer an elegant framework to reason about model uncertainty in neural networks. However, these advantages usually come with a high computational cost. We propose accelerating SG-MCMC under the master-worker framework: workers asynchronously and in parallel share responsibility for gradient computations, while the master collects the final samples. To reduce communication overhead, two protocols (downpour and elastic) are developed to allow periodic interaction between the master and the workers. We provide a theoretical analysis of the finite-time estimation consistency of posterior expectations and establish connections to sample thinning. Our experiments on various neural networks demonstrate that the proposed algorithms can greatly reduce training time while achieving comparable (or better) test accuracy/log-likelihood levels, relative to traditional SG-MCMC. When applied to reinforcement learning, the approach naturally provides exploration for asynchronous policy optimization, with encouraging performance improvements.
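
For context, a standard SG-MCMC sampler that such a master-worker scheme can parallelize is stochastic gradient Langevin dynamics (SGLD). The single-worker update below is a textbook sketch of SGLD, not the paper's distributed protocol; `grad_log_post` stands in for a stochastic gradient of the log posterior.

```python
import numpy as np

def sgld_step(theta, grad_log_post, step_size, rng):
    # One SGLD update: a stochastic gradient step plus Gaussian noise,
    # so the iterates approximately sample from the posterior.
    noise = rng.normal(scale=np.sqrt(step_size), size=theta.shape)
    return theta + 0.5 * step_size * grad_log_post(theta) + noise
```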


2020 ◽  
Vol 12 (12) ◽  
pp. 219 ◽  
Author(s):  
Pin Yang ◽  
Huiyu Zhou ◽  
Yue Zhu ◽  
Liang Liu ◽  
Lei Zhang

The emergence of large amounts of new malicious code poses a serious threat to network security, and most of it consists of derivative versions of existing malicious code. The classification of malicious code helps to analyze the evolutionary trends of malware families and trace the sources of cybercrime. Existing methods of malware classification emphasize the depth of the neural network, which leads to long training times and a large computational cost. In this work, we propose the shallow neural network-based malware classifier (SNNMAC), a malware classification model based on shallow neural networks and static analysis. Our approach bridges the gap between precise but slow methods and fast but less precise methods in existing works. For each sample, we first generate n-grams from the opcode sequence of the binary file, obtained with a decompiler. An improved n-gram algorithm based on control transfer instructions is designed to reduce the n-gram dataset. Then, the SNNMAC exploits a shallow neural network, replacing the fully connected layer and softmax with an average pooling layer and hierarchical softmax, to learn from the dataset and perform classification. We perform experiments on the Microsoft malware dataset. The evaluation results show that the SNNMAC outperforms most related works, with 99.21% classification precision, and reduces the training time by more than half compared with methods using deep neural networks (DNNs).
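
To illustrate the n-gram stage, a minimal sketch of counting opcode n-grams over a decompiled sequence follows. The instruction names and n=3 default are placeholders, and the paper's control-transfer-based reduction is not reproduced here.

```python
from collections import Counter

def opcode_ngrams(opcodes, n=3):
    # Count sliding n-grams over a decompiled opcode sequence.
    return Counter(tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))

print(opcode_ngrams(["push", "mov", "call", "mov", "ret"], n=2))
```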


Author(s):  
Yang Yi ◽  
Feng Ni ◽  
Yuexin Ma ◽  
Xinge Zhu ◽  
Yuankai Qi ◽  
...  

State-of-the-art hand gesture recognition methods have investigated spatiotemporal features based on 3D convolutional neural networks (3DCNNs) or convolutional long short-term memory (ConvLSTM). However, they often suffer from inefficiency due to the high computational complexity of their network structures. In this paper, we focus instead on 1D convolutional neural networks and propose a simple and efficient architectural unit, the Multi-Kernel Temporal Block (MKTB), that models multi-scale temporal responses by explicitly applying different temporal kernels. We then present a Global Refinement Block (GRB), an attention module that shapes the global temporal features based on cross-channel similarity. By incorporating the MKTB and GRB, our architecture can effectively explore spatiotemporal features within a tolerable computational cost. Extensive experiments conducted on public datasets demonstrate that our proposed model achieves the state of the art with higher efficiency. Moreover, the proposed MKTB and GRB are plug-and-play modules, and experiments on other tasks, such as video understanding and video-based person re-identification, also display their efficiency and capability of generalization.
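
A minimal PyTorch sketch of the multi-kernel idea: parallel 1D convolutions with different kernel sizes over the temporal axis, concatenated channel-wise. The kernel sizes and the concatenation are our assumptions, not necessarily the exact MKTB design.

```python
import torch
import torch.nn as nn

class MultiKernelTemporalBlock(nn.Module):
    def __init__(self, channels, kernel_sizes=(3, 5, 7)):
        super().__init__()
        # One 1-D convolution per temporal scale; odd kernels keep the length.
        self.branches = nn.ModuleList(
            nn.Conv1d(channels, channels, k, padding=k // 2) for k in kernel_sizes
        )

    def forward(self, x):  # x: (batch, channels, time)
        # Concatenate the multi-scale temporal responses channel-wise.
        return torch.cat([branch(x) for branch in self.branches], dim=1)
```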


2021 ◽  
Vol 29 (3) ◽  
Author(s):  
Bennilo Fernandes ◽  
Kasiprasad Mannepalli

Deep Neural Networks (DNNs) are more than neural networks with several hidden units: they give better results with classification algorithms in automated voice recognition tasks. Traditional feedforward neural networks consider only spatial correlation and do not handle speech signals properly, so recurrent neural networks (RNNs) were introduced. Long Short-Term Memory (LSTM) networks are a special case of RNNs for speech processing that capture long-term dependencies. Accordingly, deep hierarchical LSTM and BiLSTM networks are designed with dropout layers to reduce the gradient and long-term learning error in emotional speech analysis. Four combinations of the deep hierarchical learning architecture are designed, each with dropout layers to improve the networks: Deep Hierarchical LSTM and LSTM (DHLL), Deep Hierarchical LSTM and BiLSTM (DHLB), Deep Hierarchical BiLSTM and LSTM (DHBL), and Deep Hierarchical dual BiLSTM (DHBB). The performance of all four models is compared in this paper, and good classification efficiency is attained with a minimal Tamil-language dataset. The experimental results show that DHLB reaches the best precision, about 84%, in emotion recognition on the Tamil database, while DHBL gives 83% efficiency. The other designs perform almost as well: DHLL and DHBB show 81% efficiency with the smaller dataset and minimal execution and training time.
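
As an illustration, a DHLB-style stack (an LSTM layer followed by a BiLSTM layer, with dropout between levels) can be sketched in Keras as follows. The input shape, layer widths, dropout rate, and number of emotion classes are assumptions for illustration, not the paper's configuration.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical shapes: 300 frames of 39 acoustic features, 5 emotion classes.
model = keras.Sequential([
    keras.Input(shape=(300, 39)),
    layers.LSTM(128, return_sequences=True),  # first hierarchical level
    layers.Dropout(0.3),                      # dropout to curb overfitting
    layers.Bidirectional(layers.LSTM(128)),   # BiLSTM second level
    layers.Dropout(0.3),
    layers.Dense(5, activation="softmax"),    # emotion posterior
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```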


Author(s):  
Hang Wu ◽  
Jinwei Chen ◽  
Huisheng Zhang

Monitoring and diagnosis of gas turbines is a critical issue in the equipment maintenance field. Traditional diagnosis methods are established on the basis of physical models. However, the complexity and degradation of gas turbines limit both the comprehensiveness and the accuracy of these physical models, making diagnosis less effective. Therefore, data-driven models are introduced to supplement and revise the previous models. Benefitting from the prosperous development of machine learning, neural networks have been greatly improved and widely used in various fields of data mining. Three neural networks, the Multilayer Perceptron, the Convolutional Neural Network, and the Long Short-Term Memory Network, are applied to data-driven model establishment. Their training time and prediction accuracy are the two most important factors in judging effectiveness. An active real-time training scheme, meaning training and predicting simultaneously, is applied as the main modelling method for an on-line diagnosis system. Three periods are defined along the timeline: a data preparation period, a model-establishing period, and a stable prediction period. From the three neural networks above, the most effective data-driven models corresponding to the last two periods are tested and selected to ensure a high level of accuracy. When high accuracy is demanded, neural networks always need substantial computing time and memory space in the data-learning process. To avoid prediction delay and keep a rapid response to incoming faults, distributed training on a 1-master, 2-worker computer cluster is designed and applied in this system. Two types of data parallelism are realized on the cluster through Apache Spark and shell scripts for Linux. Compared with each other and with the local training mode, the results show that dispensing data first and averaging parameters last reaches a better outcome in both high accuracy and low training time.
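
The "dispense data first, average parameters last" scheme amounts to classic parameter-averaging data parallelism. The sketch below shows the idea in plain Python under our own simplifications, leaving out the Spark plumbing; `train_on_shard` is a hypothetical worker routine returning a flat parameter vector.

```python
import numpy as np

def train_round(shards, train_on_shard):
    # Each worker trains a model copy on its own data shard;
    # the master then averages the resulting parameter vectors.
    worker_params = [train_on_shard(shard) for shard in shards]
    return np.mean(np.stack(worker_params), axis=0)
```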


Author(s):  
Yu Pan ◽  
Jing Xu ◽  
Maolin Wang ◽  
Jinmian Ye ◽  
Fei Wang ◽  
...  

Recurrent Neural Networks (RNNs) and their variants, such as Long Short-Term Memory (LSTM) networks and Gated Recurrent Unit (GRU) networks, have achieved promising performance in sequential data modeling. The hidden layers in RNNs can be regarded as memory units, which help store information across sequential contexts. However, when dealing with high-dimensional input data, such as video and text, the input-to-hidden linear transformation in RNNs brings high memory usage and a huge computational cost, which makes training RNNs very difficult. To address this challenge, we propose a novel compact LSTM model, named TR-LSTM, which utilizes the low-rank tensor ring decomposition (TRD) to reformulate the input-to-hidden transformation. Compared with other tensor decomposition methods, TR-LSTM is more stable. In addition, TR-LSTM can be trained end-to-end and provides a fundamental building block for RNNs handling large input data. Experiments on real-world action recognition datasets demonstrate the promising performance of the proposed TR-LSTM compared with the tensor-train LSTM and other state-of-the-art competitors.
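
For intuition, tensor ring decomposition represents a large weight tensor by a cycle of small 3-way cores; a full element is the trace of a product of core slices. The NumPy reconstruction sketch below follows the usual TRD shape convention, which is our assumption rather than the paper's exact formulation.

```python
import numpy as np

def tr_reconstruct(cores):
    # cores[k] has shape (r_k, n_k, r_{k+1}), with r_{d+1} == r_1.
    out = cores[0]                                       # (r_1, n_1, r_2)
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=([-1], [0]))  # chain along the ring
    # out: (r_1, n_1, ..., n_d, r_1); tracing the rank axes closes the ring.
    return np.einsum('i...i->...', out)
```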

