Speech Enhancement Using Heterogeneous Information

2018 ◽  
Vol 10 (3) ◽  
pp. 46-59
Author(s):  
Yan Xiong ◽  
Fang Xu ◽  
Qiang Chen ◽  
Jun Zhang

This article describes how to use heterogeneous information in speech enhancement. In most current speech enhancement systems, clean speech is recovered only from signals collected by acoustic microphones, which are strongly affected by acoustic noise. Heterogeneous information from different kinds of sensors, usually called “multi-stream” information, is seldom used in speech enhancement because the speech waveform cannot be recovered from the signals provided by many kinds of sensors. The authors propose a new model-based multi-stream speech enhancement framework that can use the heterogeneous information provided by different kinds of sensors, even when some of those signals are not directly related to the speech waveform. Based on this framework, they also propose a new speech enhancement scheme that uses acoustic and throat microphone recordings. Experimental results show that the proposed scheme outperforms several single-stream speech enhancement methods in different noisy environments.

2020 ◽  
pp. 1060-1074
Author(s):  
Yan Xiong ◽  
Fang Xu ◽  
Qiang Chen ◽  
Jun Zhang


2020 ◽  
Vol 10 (3) ◽  
pp. 1167 ◽  
Author(s):  
Lu Zhang ◽  
Mingjiang Wang ◽  
Qiquan Zhang ◽  
Ming Liu

The performance of speech enhancement algorithms can be further improved by considering the application scenarios of speech products. In this paper, we propose an attention-based branchy neural network framework that incorporates prior environmental information for noise reduction. First, an environment classification network is trained to identify the noise type of each noisy speech frame. Guided by this classification network, the denoising network gradually learns a separate noise reduction ability in each branch. Unlike most deep neural network (DNN)-based methods, which learn speech reconstruction with a common neural structure from all training noises, the proposed branchy model gains more from branches trained specifically for previously known noise interference types. Experimental results show that the proposed branchy DNN model not only achieves better enhanced speech quality and intelligibility in seen noisy environments, but also generalizes well to unseen noisy environments.


2013 ◽  
Vol 24 (05) ◽  
pp. 1350032 ◽  
Author(s):  
QIANG GUO ◽  
YANG LI ◽  
JIAN-GUO LIU

The process of heat conduction (HC) has recently found application in information filtering [Zhang et al., Phys. Rev. Lett. 99, 154301 (2007)], where it yields high diversity but low accuracy. The classical HC model predicts the objects a user may be interested in from the objects the user has already shown interest in, regardless of negative opinions. Using users' rating scores, we present an improved user-based HC (UHC) information model that takes into account both positive and negative opinions. First, the objects rated by users are divided into positive and negative categories; the UHC model then generates predicted lists of interesting and disliked objects. Finally, the recommendation lists are constructed by filtering the disliked objects out of the interesting lists. With the new model implemented on nine similarity measures, experimental results for the MovieLens and Netflix datasets show that considering negative opinions greatly improves accuracy: the average ranking score falls from 0.049 to 0.036 for Netflix and from 0.1025 to 0.0570 for MovieLens, reductions of 26.53% and 44.39%, respectively. Since users prefer to give positive ratings rather than negative ones, negative opinions contain much more information than positive ones. Negative opinions are therefore very important for understanding users' online collective behaviors and for improving the performance of the HC model.
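
The last two steps of the abstract above (filtering disliked objects out of the interesting list, and the reported percentage reductions) can be illustrated with a minimal sketch. The function names `filter_disliked` and `relative_reduction` and the toy object labels are hypothetical and do not come from the paper; the UHC scoring itself is not reproduced here.

```python
def relative_reduction(before, after):
    """Percentage by which `after` is lower than `before`."""
    return 100.0 * (before - after) / before


def filter_disliked(interesting, disliked):
    """Drop predicted-dislike objects from a ranked list, keeping its order."""
    blocked = set(disliked)
    return [obj for obj in interesting if obj not in blocked]


# Average ranking scores quoted in the abstract (lower is better).
netflix_drop = round(relative_reduction(0.049, 0.036), 2)
movielens_drop = round(relative_reduction(0.1025, 0.0570), 2)
print(netflix_drop)    # 26.53
print(movielens_drop)  # 44.39

# Toy ranked list; the labels are made up for illustration only.
print(filter_disliked(["a", "b", "c", "d"], ["b", "d"]))  # ['a', 'c']
```

The arithmetic reproduces the 26.53% and 44.39% figures stated in the abstract.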


2009 ◽  
Vol 44 (5) ◽  
pp. 1047-1052 ◽  
Author(s):  
José-Luis Vivancos ◽  
Juan Soto ◽  
Israel Perez ◽  
Jose V. Ros-Lis ◽  
Ramón Martínez-Máñez

2015 ◽  
Vol 23 (21) ◽  
pp. 27376 ◽  
Author(s):  
Mitradeep Sarkar ◽  
Jean-François Bryche ◽  
Julien Moreau ◽  
Mondher Besbes ◽  
Grégory Barbillon ◽  
...  

2021 ◽  
Vol 11 (2) ◽  
pp. 721
Author(s):  
Hyung Yong Kim ◽  
Ji Won Yoon ◽  
Sung Jun Cheon ◽  
Woo Hyun Kang ◽  
Nam Soo Kim

Recently, generative adversarial networks (GANs) have been successfully applied to speech enhancement. However, two issues remain: (1) GAN-based training is typically unstable due to its non-convex property, and (2) most conventional methods do not take full advantage of speech characteristics, which can result in a sub-optimal solution. To deal with these problems, we propose a progressive generator that handles speech in a multi-resolution fashion. Additionally, we propose a multi-scale discriminator that distinguishes real from generated speech at various sampling rates to stabilize GAN training. The proposed structure was compared with conventional GAN-based speech enhancement algorithms on the VoiceBank-DEMAND dataset. Experimental results showed that the proposed approach makes training faster and more stable, which improves performance on various speech enhancement metrics.
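
As a rough illustration of what "various sampling rates" can mean in the abstract above, the sketch below builds several lower-rate views of one waveform. It assumes simple pair averaging as a crude low-pass filter and halves the rate at each level; the abstract does not state the resampling method or the rates the authors use, and `downsample_by_two` and `multi_rate_views` are hypothetical names.

```python
def downsample_by_two(signal):
    """Average adjacent sample pairs: a crude low-pass filter plus decimation."""
    return [(signal[i] + signal[i + 1]) / 2.0
            for i in range(0, len(signal) - 1, 2)]


def multi_rate_views(signal, sample_rate, levels=3):
    """Return (rate, samples) pairs, halving the sampling rate at each level."""
    views = [(sample_rate, [float(x) for x in signal])]
    for _ in range(levels - 1):
        rate, samples = views[-1]
        views.append((rate // 2, downsample_by_two(samples)))
    return views


# Toy ramp signal at an assumed 16 kHz rate.
views = multi_rate_views([0, 2, 4, 6, 8, 10, 12, 14], 16000, levels=3)
for rate, samples in views:
    print(rate, samples)
```

In a multi-scale setup, each view would be passed to its own discriminator; a production system would normally use a proper anti-aliasing resampler instead of pair averaging.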


Langmuir ◽  
2004 ◽  
Vol 20 (23) ◽  
pp. 10055-10061 ◽  
Author(s):  
Kurosch Rezwan ◽  
Lorenz P. Meier ◽  
Mandana Rezwan ◽  
Janos Vörös ◽  
Marcus Textor ◽  
...  

2018 ◽  
Vol 2018 ◽  
pp. 1-12 ◽  
Author(s):  
Yan Jin ◽  
Wenyu Jiang ◽  
Jianlong Shao ◽  
Jin Lu

The nonlocal means filter plays an important role in image denoising. In this paper, we propose an image denoising model that improves on the nonlocal means filter. We compare this model with the nonlocal means filter both theoretically and experimentally. Experimental results show that the new model provides good results for image denoising. In particular, it outperforms the nonlocal means filter on highly textured natural images.
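
For readers unfamiliar with the baseline, here is a minimal pure-Python sketch of the classical nonlocal means filter, not the improved model the abstract proposes. Each pixel is replaced by a weighted average of pixels in a search window, with weights that decay with the mean squared difference between surrounding patches. The parameter values, the mirrored border handling, and the toy step image are assumptions chosen for illustration.

```python
import math
import random


def reflect_index(k, n):
    """Mirror an out-of-range index back into [0, n)."""
    if n == 1:
        return 0
    period = 2 * (n - 1)
    k = abs(k) % period
    return k if k < n else period - k


def patch_distance(img, i, j, k, l, radius):
    """Mean squared difference between the patches centred at (i, j) and (k, l)."""
    rows, cols = len(img), len(img[0])
    total, count = 0.0, 0
    for di in range(-radius, radius + 1):
        for dj in range(-radius, radius + 1):
            a = img[reflect_index(i + di, rows)][reflect_index(j + dj, cols)]
            b = img[reflect_index(k + di, rows)][reflect_index(l + dj, cols)]
            total += (a - b) ** 2
            count += 1
    return total / count


def nonlocal_means(img, patch_radius=1, search_radius=2, h=0.2):
    """Classical nonlocal means on a 2D list of floats."""
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            weight_sum = value_sum = 0.0
            for k in range(max(0, i - search_radius), min(rows, i + search_radius + 1)):
                for l in range(max(0, j - search_radius), min(cols, j + search_radius + 1)):
                    w = math.exp(-patch_distance(img, i, j, k, l, patch_radius) / (h * h))
                    weight_sum += w
                    value_sum += w * img[k][l]
            out[i][j] = value_sum / weight_sum  # weight_sum >= 1 (self weight)
    return out


def mse(a, b):
    """Mean squared error between two equally sized 2D lists."""
    n = len(a) * len(a[0])
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)) / n


# Toy step image with additive Gaussian noise (sigma = 0.1, fixed seed).
rng = random.Random(0)
clean = [[0.0 if c < 5 else 1.0 for c in range(10)] for _ in range(10)]
noisy = [[v + rng.gauss(0.0, 0.1) for v in row] for row in clean]
denoised = nonlocal_means(noisy)
mse_before, mse_after = mse(noisy, clean), mse(denoised, clean)
print(mse_before, mse_after)
```

The smoothing parameter `h` controls how quickly dissimilar patches lose influence; patches that straddle the step edge receive very small weights, which is why the edge is largely preserved.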


2021 ◽  
Vol 11 (15) ◽  
pp. 7104
Author(s):  
Xu Yang ◽  
Ziyi Huan ◽  
Yisong Zhai ◽  
Ting Lin

Personalized recommendation based on knowledge graphs has become a popular research topic because of its good recommendation performance. In this paper, we study personalized recommendation based on knowledge graphs. First, we study knowledge graph construction methods and build a movie knowledge graph, using the Neo4j graph database to store and visualize the movie data. Next, we study TransE, the classical translation model in knowledge graph representation learning, and improve it with a cross-training method that uses the neighboring feature structures of the entities in the knowledge graph. We also improve the negative sampling process of TransE. Experimental results show that the improved TransE model vectorizes entities and relations more accurately. Finally, we construct recommendation models by combining knowledge graphs with ranking learning and neural networks: the Bayesian personalized recommendation model based on knowledge graphs (KG-BPR) and the neural network recommendation model based on knowledge graphs (KG-NN). The semantic information of entities and relations in the knowledge graph is embedded into vector space using the improved TransE method, and we compare the results. The item entity vectors, which carry external knowledge, are integrated into the BPR model and the neural network, respectively, compensating for the lack of knowledge information in the items themselves. Experiments on the MovieLens-1M dataset show that the two proposed recommendation models effectively improve the accuracy, recall, F1 value, and MAP value of recommendation.
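
For context, the classical TransE model that the abstract above builds on scores a triple (head, relation, tail) by the distance between head + relation and tail, and trains with a margin loss against corrupted triples. The sketch below shows only that baseline, with uniform tail corruption; the cross-training method and the improved negative sampling from the paper are not reproduced, and the function names, the L1 distance choice, and the margin value are assumptions.

```python
import random


def transe_distance(head, relation, tail):
    """L1 distance ||h + r - t||; smaller means a more plausible triple."""
    return sum(abs(h + r - t) for h, r, t in zip(head, relation, tail))


def margin_ranking_loss(pos_dist, neg_dist, margin=1.0):
    """Hinge loss that pushes true triples at least `margin` closer than corrupted ones."""
    return max(0.0, margin + pos_dist - neg_dist)


def corrupt_tail(triple, num_entities, rng):
    """Uniform negative sampling: swap the tail for a different entity id."""
    head, relation, tail = triple
    candidate = rng.randrange(num_entities - 1)
    if candidate >= tail:
        candidate += 1  # skip over the true tail id
    return (head, relation, candidate)


# Toy 2-dimensional embeddings where head + relation equals tail exactly.
print(transe_distance([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]))  # 0.0
print(margin_ranking_loss(1.0, 0.5))                         # 1.5
print(corrupt_tail((0, 0, 3), 5, random.Random(1)))
```

Uniform corruption is the simplest negative sampler and can produce uninformative negatives, which is one motivation for refining the sampling step.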

