A comparison between recurrent neural architectures for real-time nonlinear prediction of speech signals

Author(s):  
J.A. Perez-Ortiz ◽  
J. Calera-Rubio ◽  
M.L. Forcada
2021 ◽  
pp. 1-15
Author(s):  
Poovarasan Selvaraj ◽  
E. Chandra

The most challenging task in recent Speech Enhancement (SE) systems is removing non-stationary noise and additive white Gaussian noise in real-time applications. Many proposed SE techniques fail in real-time scenarios because of their high resource consumption. To reduce this difficulty, a Sliding Window Empirical Mode Decomposition combined with a Variant of Variational Mode Decomposition and the Hurst exponent (SWEMD-VVMDH) technique was developed. However, it is a statistical framework whose computations are time-consuming. Hence, this article extends the SWEMD-VVMDH technique with a Deep Neural Network (DNN) that efficiently learns from the speech signals decomposed via SWEMD-VVMDH to achieve SE. First, the noisy speech signals are decomposed into Intrinsic Mode Functions (IMFs) by the SWEMD Hurst (SWEMDH) technique. Then, Time-Delay Estimation (TDE)-based VVMD is performed on the IMFs to select the most relevant IMFs according to the Hurst exponent and to attenuate both low- and high-frequency noise components in the speech signal. For each signal frame, target features are extracted and fed to the DNN, which learns to estimate the Ideal Ratio Mask (IRM) in a supervised manner. The DNN is trained across different categories of background noise and Signal-to-Noise Ratios (SNRs); the noise-category and SNR dimensions are used for training and testing multiple DNNs, since these are the dimensions most often considered in SE systems. Further, the IRMs in each frequency channel for all noisy signal samples are concatenated to reconstruct the noiseless speech signal. Finally, the experimental results show considerable improvement in SE under different categories of noise.
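As a minimal illustration of the IRM stage described above, the sketch below computes an ideal ratio mask in the STFT domain and applies it to a noisy signal. It does not reproduce the paper's SWEMD-VVMDH front-end or its DNN; `ideal_ratio_mask` stands in for the mask a trained DNN would predict from the noisy input alone, and the sample rate and frame size are assumptions.

```python
# Sketch of IRM computation and mask application in the STFT domain.
# The SWEMD-VVMDH decomposition and the DNN mask estimator from the
# paper are not reproduced; aligned clean and noise signals are assumed
# to be available, as when building supervised training targets.
import numpy as np
from scipy.signal import stft, istft

def ideal_ratio_mask(clean, noise, fs=16000, nperseg=512):
    """IRM per time-frequency bin: sqrt(speech energy / total energy)."""
    _, _, S = stft(clean, fs=fs, nperseg=nperseg)
    _, _, N = stft(noise, fs=fs, nperseg=nperseg)
    return np.sqrt(np.abs(S) ** 2 / (np.abs(S) ** 2 + np.abs(N) ** 2 + 1e-10))

def apply_mask(noisy, mask, fs=16000, nperseg=512):
    """Apply an (estimated) ratio mask and reconstruct the waveform."""
    _, _, Y = stft(noisy, fs=fs, nperseg=nperseg)
    _, enhanced = istft(Y * mask, fs=fs, nperseg=nperseg)
    return enhanced

# Toy example: at test time a DNN would predict `mask` instead.
fs = 16000
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 440 * t)
noise = 0.3 * np.random.randn(fs)
mask = ideal_ratio_mask(clean, noise, fs)
enhanced = apply_mask(clean + noise, mask, fs)
```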


2018 ◽  
Vol 103 (3) ◽  
pp. 1941-1963 ◽  
Author(s):  
Deepak Kumar Gupta ◽  
Vijay Kumar Gupta ◽  
Mahesh Chandra ◽  
Gaurav Verma

2019 ◽  
Vol 29 (06) ◽  
pp. 1950075
Author(s):  
Yumei Zhang ◽  
Xiangying Guo ◽  
Xia Wu ◽  
Suzhen Shi ◽  
Xiaojun Wu

In this paper, we propose a nonlinear prediction model for speech signal series with an explicit structure. The low performance of least mean squares (LMS) in updating the kernel coefficients of the Volterra model typically yields improper parameters, causing intrinsic shortcomings such as trapping in local minima, improper parameter selection, and slow convergence. To overcome these shortcomings, we propose a uniform searching particle swarm optimization (UPSO) algorithm to optimize the kernel coefficients of the Volterra model. A second-order Volterra filter (SOVF) speech prediction model based on UPSO is established using English phonemes, words, and phrases. To reduce model complexity, given a user-specified error tolerance, we extract a reduced-parameter SOVF (RPSOVF) for acceleration. The experimental results show that, on both single-frame and multiframe speech signals, UPSO-SOVF and UPSO-RPSOVF outperform LMS-SOVF and PSO-SOVF in terms of root mean square error (RMSE) and mean absolute deviation (MAD). UPSO-SOVF and UPSO-RPSOVF better capture the trends and regularity of speech signals, fully meeting the requirements of speech signal prediction. The proposed model provides a nonlinear analysis and a valuable model structure for speech signal series, and can further be employed in speech signal reconstruction or compression coding.
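To make the SOVF prediction step concrete, the sketch below implements one-step prediction with a linear kernel h1 and a quadratic kernel h2. It is a minimal numpy illustration under assumed memory length and placeholder kernels; the UPSO optimization of the coefficients is not reproduced, and any optimizer could supply h1 and h2.

```python
# Sketch of a second-order Volterra filter (SOVF) one-step predictor.
# h1 (linear) and h2 (quadratic) stand for kernels found by an optimizer
# such as UPSO; here they are random placeholders.
import numpy as np

def sovf_predict(x, h1, h2):
    """Predict x[n] from the previous M samples:
    y[n] = sum_i h1[i] x[n-1-i] + sum_{i,j} h2[i,j] x[n-1-i] x[n-1-j]."""
    M = len(h1)
    preds = np.zeros(len(x))
    for n in range(M, len(x)):
        window = x[n - M:n][::-1]         # x[n-1], ..., x[n-M]
        linear = h1 @ window              # first-order term
        quadratic = window @ h2 @ window  # second-order term
        preds[n] = linear + quadratic
    return preds

# Toy "speech-like" signal and placeholder kernels (assumed values).
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 0.05 * np.arange(400)) + 0.05 * rng.standard_normal(400)
M = 8
h1 = 0.1 * rng.standard_normal(M)
h2 = 0.01 * rng.standard_normal((M, M))
rmse = np.sqrt(np.mean((x[M:] - sovf_predict(x, h1, h2)[M:]) ** 2))
```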


2014 ◽  
pp. 57-65
Author(s):  
A. Melnyk ◽  
T. Korkishko ◽  
R. Shevchuk

In this work we consider the basic principles of mixing and propose a multistage mixing method that mixes speech samples as data blocks arrive at the mixer, including speech samples obtained by decompressing compressed speech signals of different formats.
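The abstract gives no implementation detail, so the sketch below is only one plausible reading of "multistage mixing": blocks are first decoded and combined within each format group, and the partial mixes are then combined. The block size and the `decode` helper are hypothetical placeholders.

```python
# Sketch of two-stage block mixing: stage 1 mixes decoded blocks within
# each codec/format group as they arrive; stage 2 combines the partial
# mixes. Not the paper's actual algorithm; decode() is a placeholder.
import numpy as np

BLOCK = 160  # samples per block, e.g. 10 ms at 16 kHz (assumed)

def decode(block, fmt):
    """Placeholder for format-specific decompression to float samples."""
    return np.asarray(block, dtype=np.float64)

def multistage_mix(blocks_by_format):
    """blocks_by_format maps a format name to the blocks received so far;
    averaging keeps the output level bounded when sources overlap."""
    partials = [
        np.mean([decode(b, fmt) for b in blocks], axis=0)
        for fmt, blocks in blocks_by_format.items()
    ]
    return np.mean(partials, axis=0)
```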


2016 ◽  
Author(s):  
Laura Rachman ◽  
Marco Liuni ◽  
Pablo Arias ◽  
Andreas Lind ◽  
Petter Johansson ◽  
...  

We present an open-source software platform that transforms the emotions expressed by speech signals using audio effects like pitch shifting, inflection, vibrato, and filtering. The emotional transformations can be applied to any audio file, but can also run in real-time (with less than 20-millisecond latency), using live input from a microphone. We anticipate that this tool will be useful for the study of emotions in psychology and neuroscience, because it enables a high level of control over the acoustical and emotional content of experimental stimuli in a variety of laboratory situations, including real-time social situations. We present here results of a series of validation experiments showing that transformed emotions are recognized at above-chance levels in the French, English, Swedish and Japanese languages, with a naturalness comparable to natural speech. Then, we provide a list of twenty-five experimental ideas applying this new tool to important topics in the behavioral sciences.
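As an illustration of one of the effects named above, the sketch below applies vibrato by reading a signal through a sinusoidally modulated delay line. This is not the platform's actual implementation, and the rate and depth values are assumptions chosen for audibility.

```python
# Sketch of a vibrato effect via a modulated delay line with linear
# interpolation. Illustrative only; rate_hz and depth_ms are assumed.
import numpy as np

def vibrato(x, fs, rate_hz=6.0, depth_ms=0.5):
    """Read x through a delay that oscillates at rate_hz, producing a
    periodic pitch modulation."""
    n = np.arange(len(x))
    depth = depth_ms * 1e-3 * fs                       # depth in samples
    delay = depth * (1 + np.sin(2 * np.pi * rate_hz * n / fs))
    pos = np.clip(n - delay, 0, len(x) - 1)            # fractional read index
    i = pos.astype(int)
    frac = pos - i
    j = np.minimum(i + 1, len(x) - 1)
    return (1 - frac) * x[i] + frac * x[j]             # linear interpolation

fs = 44100
t = np.arange(fs) / fs
out = vibrato(np.sin(2 * np.pi * 220 * t), fs)
```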

