An improved voice activity detection method based on spectral features and neural network

2021 ◽  
Vol 263 (2) ◽  
pp. 4570-4580
Author(s):  
Liu Ting ◽  
Luo Xinwei

The accuracy with which speech is distinguished from noise degrades sharply at low signal-to-noise ratios. A neural network whose parameters are learned from a training set can perform well on data resembling that set, but performs poorly on samples with different environmental noise. The proposed method first extracts features based on the physical characteristics of the speech signal, which are robust. It takes 3-second segments of data as samples, judges whether a speech component is present at low signal-to-noise ratios, and assigns a decision tag to each segment. If a plausible trajectory resembling that of speech is found, the 3-second segment is judged to contain speech. Dynamic double-threshold processing is then used for preliminary detection, after which global double-threshold values are obtained by K-means clustering. Finally, the detection results are obtained by sequential decision. The method has low complexity, strong robustness, and adapts to multiple languages. Experimental results show that it outperforms traditional methods under various signal-to-noise ratios and adapts well to multiple languages.
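As a rough illustration of the double-threshold and K-means steps named in this abstract, the sketch below clusters a per-frame feature into two groups and then applies a simplified, forward-only double-threshold decision. The function names, the `low_frac`/`high_frac` placement of the thresholds between the two cluster centers, and the toy feature values are all assumptions made for illustration; the abstract does not specify how the paper derives its thresholds or which features it uses.

```python
def kmeans_two_clusters(values, iters=50):
    """1-D K-means with K=2; returns (low_center, high_center)."""
    low, high = min(values), max(values)
    for _ in range(iters):
        low_group = [v for v in values if abs(v - low) <= abs(v - high)]
        high_group = [v for v in values if abs(v - low) > abs(v - high)]
        new_low = sum(low_group) / len(low_group)
        new_high = sum(high_group) / len(high_group) if high_group else high
        if (new_low, new_high) == (low, high):
            break  # centers stopped moving
        low, high = new_low, new_high
    return low, high


def global_thresholds(values, low_frac=0.25, high_frac=0.5):
    """Place two thresholds between the cluster centers (fractions are assumed)."""
    low, high = kmeans_two_clusters(values)
    span = high - low
    return low + low_frac * span, low + high_frac * span


def double_threshold(values, t_low, t_high):
    """A segment starts when a value exceeds t_high and lasts while values exceed t_low."""
    flags, active = [], False
    for v in values:
        active = v > t_low if active else v > t_high
        flags.append(active)
    return flags


# Toy per-frame feature (hypothetical numbers): noise, a speech burst, noise.
feature = [0.1, 0.2, 0.1, 0.9, 1.0, 0.5, 0.45, 0.1, 0.1]
t_low, t_high = global_thresholds(feature)
decisions = double_threshold(feature, t_low, t_high)
```

The lower threshold keeps the decaying tail of the burst (0.5 and 0.45) inside the detected segment even though those frames would not have triggered a detection on their own.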

2017 ◽  
Vol 14 (1) ◽  
pp. 149-160
Author(s):  
Lazar Cokic ◽  
Aleksandra Marjanovic ◽  
Sanja Vujnovic ◽  
Zeljko Djurovic

This paper gives a short theoretical overview of the differential quantizer and its implementations. It then analyzes the effect of the prediction order in the differential quantizer, and the effect of a mismatch between the predictor orders at the input and output of the differential quantizer. Next, the robustness of the differential quantizer is examined for the case in which a noisy signal, rather than the clean speech signal, is brought to its input. The analysis was conducted with uniformly distributed noise as well as with Gaussian noise, and the obtained results are discussed. A limit on the noise intensity was also established experimentally, below which the results are still better than those of a regular uniform quantizer. The whole analysis uses a fixed number of quantization bits, i.e., a 12-bit quantizer is used in all implementations of the differential quantizer. The paper concludes with a discussion of the possibility of implementing a differential quantizer that can recognize which noise is affecting the system and adapt its coefficients accordingly, so that it attains the optimal signal-to-noise ratio at any moment.
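To make the structure of a differential quantizer concrete, here is a minimal sketch of a first-order one built around a 12-bit uniform quantizer. The predictor coefficient `a = 0.9`, the quantizer range `x_max`, the mid-rise quantizer design, and the sine test signal are assumptions chosen for illustration; the paper's own predictor orders and coefficients are not given in the abstract.

```python
import math


def uniform_quantize(x, bits=12, x_max=1.0):
    """Mid-rise uniform quantizer on [-x_max, x_max) with 2**bits levels."""
    step = 2 * x_max / 2 ** bits
    x = max(-x_max, min(x_max - 1e-12, x))  # clip to the quantizer range
    return (math.floor(x / step) + 0.5) * step


def dpcm_roundtrip(signal, a=0.9, bits=12, x_max=1.0):
    """First-order differential quantizer: quantize the prediction error,
    then rebuild each sample exactly as the decoder would."""
    recon, prev = [], 0.0
    for s in signal:
        pred = a * prev                       # prediction from last reconstruction
        err_q = uniform_quantize(s - pred, bits, x_max)
        prev = pred + err_q                   # decoder-side reconstruction
        recon.append(prev)
    return recon


def snr_db(clean, approx):
    """Signal-to-quantization-noise ratio in dB."""
    sig = sum(c * c for c in clean)
    noise = sum((c - r) ** 2 for c, r in zip(clean, approx))
    return 10 * math.log10(sig / noise)


signal = [0.5 * math.sin(2 * math.pi * n / 50) for n in range(200)]
recon = dpcm_roundtrip(signal)
```

Because the predictor runs on the reconstructed samples, the per-sample error equals the quantization error of the prediction residual. The benefit over a plain uniform quantizer appears when the residual occupies a smaller range than the signal, so `x_max` can be reduced for the same number of bits.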


Author(s):  
Satvir Singh

Steganography is the art of hiding important and confidential information in a suitable multimedia carrier while preventing detection of the hidden message. In this paper we propose a steganographic method based on the DCT and an entropy thresholding technique. The algorithm uses a random function to select the blocks of the image into which the elements of the binary sequence of a secret message will be inserted. Insertion takes place at the lower-frequency AC coefficients of the block. Before the secret message is inserted, the image undergoes the DCT; after insertion, the inverse DCT is applied. The secret message is inserted into a particular block only if the entropy of that block is greater than the entropy threshold and the block is selected by the random function. In the experimental work we calculated the peak signal-to-noise ratio (PSNR), absolute difference, and relative entropy. The proposed algorithm gives a high PSNR and a low absolute difference, which indicates that the distortion introduced into the image by inserting the secret message is reduced. The relative entropy is also close to zero, which indicates that the proposed algorithm is sufficiently secure.
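The block-selection rule and the PSNR measure mentioned in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the entropy is taken to be the Shannon entropy of the block's pixel histogram, and `select_blocks`, its `prob` parameter, and the seeded generator are hypothetical stand-ins for the paper's unspecified random function. The DCT embedding itself is not shown.

```python
import math
import random
from collections import Counter


def block_entropy(block):
    """Shannon entropy (bits) of the pixel values in a block given as rows."""
    pixels = [p for row in block for p in row]
    n = len(pixels)
    return -sum(c / n * math.log2(c / n) for c in Counter(pixels).values())


def select_blocks(blocks, threshold, seed=0, prob=0.5):
    """Indices of blocks whose entropy exceeds the threshold AND that the
    (assumed) random function picks."""
    rng = random.Random(seed)
    chosen = []
    for i, block in enumerate(blocks):
        picked = rng.random() < prob
        if picked and block_entropy(block) > threshold:
            chosen.append(i)
    return chosen


def psnr(original, modified, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    diffs = [(a - b) ** 2
             for row_a, row_b in zip(original, modified)
             for a, b in zip(row_a, row_b)]
    mse = sum(diffs) / len(diffs)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)
```

A flat block has zero entropy and is never selected, which matches the intuition that changes in smooth regions are easier to notice than changes in textured ones.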


Author(s):  
Navaamsini Boopalan ◽  
Agileswari K. Ramasamy ◽  
Farrukh Hafiz Nagi

Array sensors are widely used in fields such as radar, wireless communications, autonomous vehicle applications, medical imaging, astronomical observations, and fault diagnosis. Array signal processing is accomplished with a beam pattern, which is produced by the signal's amplitude and phase at each element of the array. The beam pattern can become severely distorted when an array element fails, which badly affects the signal-to-noise ratio (SNR). This paper proposes a Hybrid Neural Network layer weight Goal Attain Optimization (HNNGAO) method to generate a recovery beam pattern that closely resembles the original beam pattern using the remaining elements in the array. The proposed HNNGAO method is compared with the classic goal attain beam pattern synthesis method and with the failed beam pattern, all generated in the MATLAB environment. The results show that the proposed HNNGAO method gives a better SNR with the remaining working elements in a linear array than the classic goal attain method alone. Keywords: Backpropagation; Feed-forward neural network; Goal attain; Neural networks; Radiation pattern; Sensor arrays; Sensor failure; Signal-to-Noise Ratio (SNR)
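The distortion caused by a failed element can be seen with a small array-factor calculation. The sketch below is not the HNNGAO method; it only evaluates the beam pattern of a uniform linear array from its element weights, with a failed element modeled as a zero weight. The 8-element size, half-wavelength spacing, and the function name are assumptions for illustration.

```python
import cmath
import math


def array_factor_db(weights, theta_deg, spacing=0.5):
    """Magnitude of the array factor in dB for a uniform linear array.
    spacing is in wavelengths; theta is measured from broadside."""
    psi = 2 * math.pi * spacing * math.sin(math.radians(theta_deg))
    af = sum(w * cmath.exp(1j * n * psi) for n, w in enumerate(weights))
    return 20 * math.log10(max(abs(af), 1e-12))  # floor avoids log10(0)


healthy = [1.0] * 8
failed = healthy.copy()
failed[3] = 0.0  # model a dead element as zero amplitude

# First null of the healthy 8-element, half-wavelength array.
null_deg = math.degrees(math.asin(1 / (8 * 0.5)))
depth_healthy = array_factor_db(healthy, null_deg)
depth_failed = array_factor_db(failed, null_deg)
```

At the angle where the healthy array has a deep null, the array with one failed element radiates at roughly the level of a single element, which is the kind of pattern degradation a recovery method has to compensate for by re-weighting the remaining elements.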


Author(s):  
Mourad Talbi ◽  
Med Salim Bouhlel

Background: In this paper, we propose a secure image watermarking technique that is applied to grayscale and color images. It consists of applying the SVD (Singular Value Decomposition) in the Lifting Wavelet Transform domain to embed a speech image (the watermark) into the host image. Methods: It also uses a signature in the embedding and extraction steps. Its performance is assessed by computing the PSNR (Peak Signal to Noise Ratio), SSIM (Structural Similarity), SNR (Signal to Noise Ratio), SegSNR (Segmental SNR) and PESQ (Perceptual Evaluation of Speech Quality). Results: The PSNR and SSIM are used for evaluating the perceptual quality of the watermarked image compared to the original image. The SNR, SegSNR and PESQ are used for evaluating the perceptual quality of the reconstructed or extracted speech signal compared to the original speech signal. Conclusion: The results obtained from the computation of PSNR, SSIM, SNR, SegSNR and PESQ show the performance of the proposed technique.


2021 ◽  
Author(s):  
S.V. Zimina

Setting up artificial neural networks using iterative algorithms is accompanied by fluctuations in the weight coefficients. When an artificial neural network solves the problem of extracting a useful signal against a background of interference, fluctuations in the weight vector degrade the extracted signal and, in particular, cause losses in the output signal-to-noise ratio. The goal of the research is a statistical analysis of an artificial neural network that includes the losses in the output signal-to-noise ratio associated with fluctuations in its weight coefficients. We considered artificial neural networks that are configured using discrete gradient, fast recurrent algorithms with restrictions, and the Hebb algorithm. It is shown that fluctuations lead to losses in the output signal-to-noise ratio, the level of which depends on the type of algorithm under consideration and the speed of setting up the network. Taking the fluctuations of the weight vector into account in the analysis of the output signal-to-noise ratio makes it possible to relate the permissible level of loss in the output signal-to-noise ratio to the corresponding speed of network configuration.


2020 ◽  
Author(s):  
Chaofeng Lan ◽  
Yuanyuan Zhang ◽  
Hongyun Zhao

Abstract: This paper draws on the training method of the Recurrent Neural Network (RNN). By increasing the number of hidden layers of the RNN, changing the activation function of the input layer from the traditional Sigmoid to Leaky ReLU, and zero-padding the first and last groups of data to improve data utilization, an improved noise reduction model, the Denoise Recurrent Neural Network (DRNN), with high calculation speed and good convergence is constructed to address the low speaker recognition rate in noisy environments. Using this model, random semantic speech signals from the speech library, with a sampling rate of 16 kHz and a duration of 5 seconds, are studied. The experimental signal-to-noise ratios are −10 dB, −5 dB, 0 dB, 5 dB, 10 dB, 15 dB, 20 dB, and 25 dB. In the noisy environment, the improved model is used to denoise the Mel Frequency Cepstral Coefficients (MFCC) and the Gammatone Frequency Cepstral Coefficients (GFCC), and the impact of the traditional model and the improved model on the speech recognition rate is analyzed. The research shows that the improved model can effectively remove noise from the feature parameters and improve the speech recognition rate. The improvement in the speaker recognition rate is more pronounced when the signal-to-noise ratio is low. Furthermore, when the signal-to-noise ratio is 0 dB, the speaker recognition rate is increased by 40%, which can be 85% improved compared with the traditional speech model. On the other hand, the recognition rate gradually increases as the signal-to-noise ratio increases. When the signal-to-noise ratio is 15 dB, the speaker recognition rate is 93%.
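Experiments at fixed signal-to-noise ratios such as the −10 dB to 25 dB grid above require scaling the noise before it is added to the clean speech. The sketch below shows one common way to do this; the function names and the synthetic signals are assumptions for illustration, and the abstract does not describe how the authors prepared their noisy data.

```python
import math


def mean_power(samples):
    """Average power (mean square) of a sequence of samples."""
    return sum(x * x for x in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db):
    """Scale the noise so that speech power / scaled noise power
    equals the requested SNR in dB, then add it to the speech."""
    gain = math.sqrt(mean_power(speech) / (mean_power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(clean, noisy):
    """SNR in dB of a noisy signal relative to its clean reference."""
    residual = [y - s for s, y in zip(clean, noisy)]
    return 10 * math.log10(mean_power(clean) / mean_power(residual))


# Synthetic stand-ins for a speech clip and a noise recording.
speech = [math.sin(0.1 * n) for n in range(1000)]
noise = [((n * 37) % 17 - 8) / 8 for n in range(1000)]
noisy_by_snr = {snr: mix_at_snr(speech, noise, snr) for snr in (-10, -5, 0, 5, 10, 15, 20, 25)}
```

Measuring the SNR of each mixture against the clean reference recovers the requested value, which is a quick sanity check before feature extraction.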


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Fayadh Alenezi ◽  
K. C. Santosh

One of the major shortcomings of Hopfield neural network (HNN) is that the network may not always converge to a fixed point. HNN, predominantly, is limited to local optimization during training to achieve network stability. In this paper, the convergence problem is addressed using two approaches: (a) by sequencing the activation of a continuous modified HNN (MHNN) based on the geometric correlation of features within various image hyperplanes via pixel gradient vectors and (b) by regulating geometric pixel gradient vectors. These are achieved by regularizing proposed MHNNs under cohomology, which enables them to act as an unconventional filter for pixel spectral sequences. It shifts the focus to both local and global optimizations in order to strengthen feature correlations within each image subspace. As a result, it enhances edges, information content, contrast, and resolution. The proposed algorithm was tested on fifteen different medical images, where evaluations were made based on entropy, visual information fidelity (VIF), weighted peak signal-to-noise ratio (WPSNR), contrast, and homogeneity. Our results confirmed superiority as compared to four existing benchmark enhancement methods.


2013 ◽  
Vol 427-429 ◽  
pp. 1718-1722
Author(s):  
Lin Sun ◽  
Ran Wei ◽  
Fu Ting Bao ◽  
Xian Zhang Tian

To reduce the computing resources required, a fast algorithm for the average power spectrum and signal-to-noise ratio is presented, based on a rigorous derivation of the formula. The derivation also proves the rule previously obtained from computational experiments. In addition, a method called fitting-optimization is proposed to determine the classification threshold value. It improves the accuracy by about 7% for human genes.


2019 ◽  
Vol 9 (21) ◽  
pp. 4624
Author(s):  
Uzokboy Ummatov ◽  
Kyungchun Lee

This paper proposes an adaptive threshold-aided K-best sphere decoding (AKSD) algorithm for large multiple-input multiple-output systems. In the proposed scheme, to reduce the average number of visited nodes compared to the conventional K-best sphere decoding (KSD), the threshold for retaining the nodes is adaptively determined at each layer of the tree. Specifically, we calculate the adaptive threshold based on the signal-to-noise ratio and index of the layer. The ratio between the first and second smallest accumulated path metrics at each layer is also exploited to determine the threshold value. In each layer, in addition to the K paths associated with the smallest path metrics, we also retain the paths whose path metrics are within the threshold from the Kth smallest path metric. The simulation results show that the proposed AKSD provides nearly the same bit error rate performance as the conventional KSD scheme while achieving a significant reduction in the average number of visited nodes, especially at high signal-to-noise ratios.
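The path-retention rule described in this abstract, keeping the K best paths plus any path whose metric lies within a threshold of the Kth smallest, can be sketched for a single tree layer as follows. The function name is hypothetical, and the threshold is passed in as a plain number because the abstract does not give the formula that derives it from the signal-to-noise ratio, the layer index, and the ratio of the two smallest metrics.

```python
def retain_paths(metrics, k, threshold):
    """Indices of the paths kept at one layer, ordered by increasing metric:
    the K smallest metrics plus any within `threshold` of the Kth smallest."""
    order = sorted(range(len(metrics)), key=lambda i: metrics[i])
    if len(order) <= k:
        return order  # fewer candidates than K: keep them all
    kth_metric = metrics[order[k - 1]]
    return [i for i in order if metrics[i] <= kth_metric + threshold]


# Accumulated path metrics of five candidate paths (illustrative numbers).
metrics = [5.0, 1.0, 3.0, 3.4, 9.0]
kept_plain = retain_paths(metrics, k=2, threshold=0.0)
kept_wide = retain_paths(metrics, k=2, threshold=0.5)
```

With a zero threshold the rule reduces to conventional K-best selection; a positive threshold additionally keeps the path with metric 3.4, which is close to the Kth best metric of 3.0.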


2018 ◽  
Vol 7 (2.29) ◽  
pp. 700 ◽  
Author(s):  
O Hayat ◽  
R Ngah ◽  
Yasser Zahedi

Device-to-Device (D2D) communication is a new paradigm for next-generation wireless systems to offload data traffic. To initiate D2D communication within a minimum period, a device needs to discover neighbor devices on a given channel. A device discovery technique based on the Global Positioning System (GPS) and neighbor awareness is proposed for in-band cellular networks. This method is called a network-centric approach, and it improves device discovery efficiency, accuracy, and channel capacity. A differential code is applied to measure the signal-to-noise ratio (SNR) of each discovered device. If the SNR of two devices is above a specified threshold value, the two devices qualify for D2D communication. Two procedures are explored for device discovery: discovery by CN (core network) and eNB (evolved node B) cooperation with the help of GPS and neighbor awareness. The SNR-based distance is calculated using the Haversine formula. Results show an increase in channel capacity relative to the SNR obtained for each device.
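The Haversine formula mentioned in this abstract gives the great-circle distance between two GPS positions. A minimal sketch follows, together with the SNR qualification check; the function names, the mean Earth radius of 6371 km, and the example threshold are assumptions for illustration rather than values from the paper.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def qualifies_for_d2d(snr_a_db, snr_b_db, threshold_db):
    """Both devices must exceed the SNR threshold to be paired."""
    return snr_a_db > threshold_db and snr_b_db > threshold_db


# One degree of longitude along the equator.
distance = haversine_km(0.0, 0.0, 0.0, 1.0)
paired = qualifies_for_d2d(12.0, 9.5, threshold_db=8.0)
```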

