A Supervised Speech Enhancement Approach with Residual Noise Control for Voice Communication

2020
Vol 10 (8)
pp. 2894
Author(s):  
Andong Li ◽  
Renhua Peng ◽  
Chengshi Zheng ◽  
Xiaodong Li

For voice communication, it is important to extract the speech from its noisy version without introducing unnatural-sounding artificial noise. By studying the subband mean-squared error (MSE) of the speech for unsupervised speech enhancement approaches and revealing its relationship with the existing loss function for supervised approaches, this paper derives a generalized loss function that takes residual noise control into account in a supervised framework. The generalized loss function contains the well-known MSE loss function and many other commonly used loss functions as special cases. Compared with traditional loss functions, it is more flexible in trading off speech distortion against noise reduction, because a group of well-studied noise shaping schemes can be introduced to control the residual noise in practical applications. Objective and subjective test results verify the importance of residual noise control for the supervised speech enhancement approach.
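The paper's generalized loss is not reproduced in this listing; as a minimal sketch of the residual-noise-control idea, a supervised target can keep a small, shaped fraction of the noise instead of forcing it to zero (the function name and `eta` are illustrative assumptions, not the paper's formulation):

```python
import numpy as np

def residual_noise_controlled_mse(est, clean, noise, eta=0.1):
    # Instead of driving the residual noise to zero, the target keeps a
    # small, shaped amount of noise (eta), which avoids unnatural,
    # over-suppressed speech.
    target = clean + eta * noise
    return float(np.mean((est - target) ** 2))
```

Setting `eta = 0` recovers the ordinary MSE against the clean speech; a frequency-dependent `eta` would implement a noise shaping scheme.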

2001
Vol 03 (02n03)
pp. 203-211
Author(s):  
K. HELMES ◽  
C. SRINIVASAN

Let Y(t), t ∈ [0,1], be a stochastic process modelled as dY(t) = θ(t)dt + dW(t), where W(t) denotes a standard Wiener process and θ(t) is an unknown function assumed to belong to a given set Θ ⊂ L²[0,1]. We consider the problem of estimating the value ℒ(θ), where ℒ is a continuous linear functional defined on Θ, using linear estimators of the form ⟨m, Y⟩ = ∫ m(t) dY(t), m ∈ L²[0,1]. The distance between the quantity ℒ(θ) and the estimated value is measured by a loss function. In this paper, we take the loss function to be an arbitrary even power function. We provide a characterisation of the best linear minimax estimator for a general power function, which implies the characterisations for two special cases previously considered in the literature, viz. the quadratic loss function and the quartic loss function.
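A quick numerical sketch of such a linear estimator (all names and the drift `theta` are hypothetical choices for illustration; the weight m(t) ≡ 1 estimates ℒ(θ) = ∫ θ(t) dt):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
dt = 1.0 / n
t = np.linspace(0.0, 1.0, n, endpoint=False)

theta = np.sin(np.pi * t)              # hypothetical drift in Theta
L_theta = float(np.sum(theta) * dt)    # target functional: L(theta) = ∫ theta dt

# One sample path on the grid: dY = theta dt + dW
dW = rng.normal(scale=np.sqrt(dt), size=n)
dY = theta * dt + dW

m = np.ones(n)                         # weight m(t) ≡ 1
estimate = float(np.sum(m * dY))       # <m, Y> = ∫ m(t) dY(t)
```

Averaged over many sample paths the estimator is unbiased for ℒ(θ) = 2/π here; the choice of m(t) then controls the minimax risk under the chosen power loss.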


Author(s):  
Seifeldeen Eteifa ◽  
Hesham A. Rakha ◽  
Hoda Eldardiry

Vehicle acceleration and deceleration maneuvers at traffic signals result in significant fuel and energy consumption. Green light optimal speed advisory systems require reliable estimates of signal switching times to improve vehicle energy/fuel efficiency. Obtaining these estimates is difficult for actuated signals, where the length of each green indication changes to accommodate varying traffic conditions and pedestrian requests. This study details a four-step long short-term memory (LSTM) deep-learning-based methodology that provides reasonable switching time estimates from green to red and vice versa while being robust to missing data. The four steps are data gathering, data preparation, machine learning model tuning, and model testing and evaluation. The input to the models includes controller logic, signal timing parameters, time of day, traffic state from detectors, vehicle actuation data, and pedestrian actuation data. The methodology is applied and evaluated on data from an intersection in Northern Virginia. A comparative analysis is conducted between loss functions used in the LSTM, namely the mean squared error, mean absolute error, and mean relative error, and a new loss function proposed in this paper. The results show that while the proposed loss function outperforms the conventional loss functions in overall absolute error, the best choice of loss function depends on the prediction horizon. Specifically, the proposed loss function is slightly outperformed by the mean relative error for very short prediction horizons (less than 20 s) and by the mean squared error for very long prediction horizons (greater than 120 s).
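For reference, the three conventional losses compared in the study can be sketched as follows (the paper's proposed loss is not reproduced here; `eps` is an assumed guard against division by zero):

```python
import numpy as np

def mse(y, yhat):
    return float(np.mean((y - yhat) ** 2))

def mae(y, yhat):
    return float(np.mean(np.abs(y - yhat)))

def mre(y, yhat, eps=1e-6):
    # Relative error penalizes the same absolute miss more when the
    # remaining time y is short, i.e. on short prediction horizons.
    return float(np.mean(np.abs(y - yhat) / (y + eps)))
```

The relative weighting explains the horizon dependence reported above: MRE emphasizes short horizons, while MSE's squared term dominates for large misses at long horizons.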


Energies
2020
Vol 13 (12)
pp. 3123
Author(s):  
Jun-Lin Lin ◽  
Yiqing Zhang ◽  
Kunhuang Zhu ◽  
Binbin Chen ◽  
Feng Zhang

For high-voltage and extra-high-voltage consumers, the electricity cost depends not only on the power consumed but also on the contract capacity. For the same amount of power consumed, the smaller the difference between the contract capacity and the power consumed, the smaller the electricity cost. Thus, predicting future power demand for setting the contract capacity is of great economic interest. In the literature, most works predict future power demand using a symmetric loss function, such as the mean squared error. However, the electricity pricing structure is asymmetric with respect to under- and overestimation of the actual power demand. In this work, we propose several loss functions derived from this asymmetric pricing structure. We experiment with a long short-term memory (LSTM) neural network trained with these loss functions on a real dataset from a large manufacturing company in the electronics industry in Taiwan. The results show that the proposed asymmetric loss functions outperform the commonly used symmetric loss function, with savings on the electricity cost ranging from 0.88% to 2.42%.
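A hedged sketch of such a pricing-derived asymmetric loss (the tariff rates `c_over` and `c_under` are invented for illustration, not taken from the paper or any real tariff):

```python
import numpy as np

def asymmetric_loss(contract_kw, demand_kw, c_over=1.0, c_under=3.0):
    # Hypothetical tariff: each kW of contract capacity costs c_over,
    # while demand exceeding the contract is billed at a higher penalty
    # rate c_under, so under-contracting hurts more than over-contracting.
    over = np.maximum(contract_kw - demand_kw, 0.0)   # paid-for but unused capacity
    under = np.maximum(demand_kw - contract_kw, 0.0)  # penalized excess demand
    return float(np.mean(c_over * over + c_under * under))
```

Minimizing this instead of MSE biases the forecast upward just enough to avoid the costlier underestimation penalty.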


Author(s):  
Jorge Llombart ◽  
Dayana Ribas ◽  
Antonio Miguel ◽  
Luis Vicente ◽  
Alfonso Ortega ◽  
...  

The progressive paradigm is a promising strategy to optimize network performance for speech enhancement. Recent works have shown different strategies to improve the accuracy of speech enhancement solutions based on this mechanism. This paper studies progressive speech enhancement using convolutional and residual neural network architectures and explores two criteria for loss function optimization: weighted and uniform progressive. The evaluation is carried out on simulated and real speech samples with reverberation and added noise, using the REVERB and VoiceHome datasets. Experimental results show varied performance across the loss function optimization criteria and network architectures, and indicate that the progressive design strengthens the model and increases its robustness to distortions due to reverberation and noise.
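The two optimization criteria can be sketched as a single loss with different stage weights (a minimal illustration under assumed names, not the paper's implementation):

```python
import numpy as np

def progressive_loss(stage_outputs, target, weights=None):
    # Sum of per-stage MSEs over the progressive enhancement stages.
    # weights=None gives the uniform criterion; increasing weights
    # emphasize later (cleaner) stages, i.e. the weighted criterion.
    if weights is None:
        weights = np.ones(len(stage_outputs))
    losses = [np.mean((out - target) ** 2) for out in stage_outputs]
    return float(np.dot(weights, losses) / np.sum(weights))
```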


Author(s):  
N. J. Hassan ◽  
J. Mahdi Hadad ◽  
A. Hawad Nasar

In this paper, we derive the generalized Bayesian shrinkage estimator of the parameter of the Burr XII distribution under three loss functions: the squared error, LINEX, and weighted balanced loss functions. We thereby obtain three generalized Bayesian shrinkage estimators (GBSEs). In this approach, we find the posterior risk function (PRF) of the generalized Bayesian shrinkage estimator (GBSE) with respect to each loss function; the constant in the GBSE is computed by minimizing the PRF. As special cases, we derive two new GBSEs under the weighted loss function. Finally, we give our conclusions.
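For orientation, the squared error and LINEX losses mentioned above can be sketched as follows (the weighted balanced loss is omitted; `a` controls the LINEX asymmetry):

```python
import numpy as np

def squared_error(est, theta):
    # Symmetric: over- and underestimation are penalized equally.
    return (est - theta) ** 2

def linex(est, theta, a=1.0):
    # Asymmetric: for a > 0, overestimation is penalized (roughly)
    # exponentially while underestimation grows only linearly.
    d = a * (est - theta)
    return float(np.exp(d) - d - 1.0)
```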


Author(s):  
Yuxuan Ke ◽  
Andong Li ◽  
Chengshi Zheng ◽  
Renhua Peng ◽  
Xiaodong Li

Deep learning-based speech enhancement algorithms have shown a powerful ability to remove both stationary and non-stationary noise components from noisy speech observations. However, they often introduce artificial residual noise, especially when the training target does not contain phase information, e.g., the ideal ratio mask or the clean speech magnitude and its variations. It is well known that once the power of the residual noise components exceeds the noise masking threshold of the human auditory system, the perceptual speech quality may degrade. One intuitive remedy is to further suppress the residual noise components with a postprocessing scheme. However, the highly non-stationary nature of this kind of residual noise makes estimating its power spectral density (PSD) a challenging problem. To solve this problem, this paper proposes three strategies to estimate the noise PSD frame by frame; the residual noise can then be removed effectively by applying a gain function based on the decision-directed approach. Objective measurement results show that the proposed postfiltering strategies outperform the conventional postfilter in terms of segmental signal-to-noise ratio (SNR) as well as speech quality improvement. Moreover, an AB subjective listening test shows that the preference percentages of the proposed strategies are over 60%.
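A minimal sketch of the decision-directed gain computation referenced above (one frame, per frequency bin; the smoothing factor and floor are common defaults, not the paper's values, and the three PSD estimation strategies themselves are not reproduced):

```python
import numpy as np

def decision_directed_gain(noisy_psd, noise_psd, prev_clean_psd,
                           alpha=0.98, xi_min=1e-3):
    # A posteriori SNR from the current frame.
    gamma = noisy_psd / np.maximum(noise_psd, 1e-12)
    # Decision-directed a priori SNR: smooth between the previous clean
    # estimate and the current (gamma - 1), then floor it.
    xi = alpha * prev_clean_psd / np.maximum(noise_psd, 1e-12) \
         + (1.0 - alpha) * np.maximum(gamma - 1.0, 0.0)
    xi = np.maximum(xi, xi_min)
    # Wiener gain: near 0 in noise-only bins, near 1 at high SNR.
    return xi / (1.0 + xi)
```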


2017
Vol 9 (1)
pp. 67-78
Author(s):  
M. R. Hasan ◽  
A. R. Baizid

The Bayesian estimation approach is a non-classical estimation technique in statistical inference and is very useful in real-world situations. The aim of this paper is to study the Bayes estimators of the parameter of the exponential distribution under different loss functions and to compare them with one another as well as with the classical maximum likelihood estimator (MLE). Since the exponential distribution is a lifetime distribution, we study it with a gamma prior, which is used as the prior distribution for finding the Bayes estimator. We consider several symmetric and asymmetric loss functions: the squared error loss function, quadratic loss function, modified linear exponential (MLINEX) loss function, and non-linear exponential (NLINEX) loss function. We use simulated data, generated with R code, to find the mean squared error (MSE) under the different loss functions, and find that the non-classical estimators outperform the classical estimator. Finally, the MSEs of the estimators under the different loss functions are presented graphically.
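A small simulation in the spirit of the study (in Python rather than R), assuming a Gamma(a, b) prior on the exponential rate with invented hyperparameters: the posterior is Gamma(a + n, b + Σx), so the Bayes estimator under squared error loss is the posterior mean:

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0          # true rate of the exponential distribution
a, b = 2.0, 1.0    # hypothetical gamma prior hyperparameters

def estimates(x):
    n, s = len(x), float(np.sum(x))
    mle = n / s                      # classical maximum likelihood estimator
    bayes_se = (a + n) / (b + s)     # posterior mean = Bayes estimator (squared error)
    return mle, bayes_se

# Monte Carlo MSE comparison for small samples (n = 10)
mse = np.zeros(2)
for _ in range(5000):
    x = rng.exponential(1.0 / lam, size=10)
    mse += (np.array(estimates(x)) - lam) ** 2
mse /= 5000
```

With a prior centered near the truth, the shrinkage of the Bayes estimator yields a smaller MSE than the MLE at this sample size, mirroring the paper's conclusion.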


Author(s):  
A. Howie ◽  
D.W. McComb

The bulk loss function Im(−1/ε(ω)), a well established tool for the interpretation of valence loss spectra, is being progressively adapted to the wide variety of inhomogeneous samples of interest to the electron microscopist. Proportionality between n, the local valence electron density, and (ε − 1) (Sellmeyer's equation) has sometimes been assumed but may not be valid even in homogeneous samples. Figs. 1 and 2 show the experimentally measured bulk loss functions for three pure silicates of different specific gravity ρ: quartz (ρ = 2.66), coesite (ρ = 2.93) and a zeolite (ρ = 1.79). Clearly, despite the substantial differences in density, the shift of the prominent loss peak is very small and far less than that predicted by scaling ε for quartz with Sellmeyer's equation, or even the somewhat smaller shift given by the Clausius-Mossotti (CM) relation, which assumes proportionality between n (or ρ in this case) and (ε − 1)/(ε + 2). Both theories overestimate the rise in the peak height for coesite and underestimate the increase at high energies.
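The scaling argument can be made concrete: a Sellmeyer-type model predicts the loss peak to shift roughly with the square root of the valence electron density, taken here as proportional to ρ (the quartz peak position is a hypothetical value for illustration, not read from the figures):

```python
import numpy as np

E_quartz = 23.0                       # hypothetical quartz loss peak, in eV
rho_quartz, rho_coesite = 2.66, 2.93  # specific gravities from the text

# Plasmon-like scaling: E ∝ sqrt(n) ∝ sqrt(rho)
E_coesite_pred = E_quartz * np.sqrt(rho_coesite / rho_quartz)
```

This predicts a shift of roughly 5% for coesite, whereas the measured shift described above is far smaller, illustrating why the naive density scaling fails.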


2021
Vol 13 (9)
pp. 1779
Author(s):  
Xiaoyan Yin ◽  
Zhiqun Hu ◽  
Jiafeng Zheng ◽  
Boyong Li ◽  
Yuanyuan Zuo

Radar beam blockage is an important error source that affects the quality of weather radar data. An echo-filling network (EFnet) based on a deep learning algorithm is proposed to correct the echo intensity in the occluded area of the Nanjing S-band new-generation weather radar (CINRAD/SA). The training dataset is constructed from labels, which are the echo intensities at the 0.5° elevation in the unblocked area, and from input features, which are the intensities in a cube spanning multiple elevations and gates corresponding to the locations of the bottom labels. Two loss functions are used to train the network: one is the common mean squared error (MSE), and the other is a self-defined loss function that increases the weight of strong echoes. Considering that the radar beam broadens with distance and height, the 0.5° elevation scan is divided into six range bands of 25 km each to train different models. The models are evaluated with three indicators: explained variance (EVar), mean absolute error (MAE), and correlation coefficient (CC). Two cases are presented to compare the effect of the echo-filling model under the different loss functions. The results suggest that EFnet can effectively correct the echo reflectivity and improve the data quality in the occluded area, with better results for strong echoes when the self-defined loss function is used.
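A hedged sketch of a strong-echo-weighted MSE in the spirit of the self-defined loss (the threshold and weight are invented values, not those used in the paper):

```python
import numpy as np

def weighted_echo_mse(pred_dbz, true_dbz, threshold=35.0, w_strong=4.0):
    # Up-weight errors on strong echoes (reflectivity at or above
    # `threshold` dBZ) so the network does not smooth them away.
    w = np.where(true_dbz >= threshold, w_strong, 1.0)
    return float(np.mean(w * (pred_dbz - true_dbz) ** 2))
```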

