Orthoepical potential of speaking in the Yakut language

2018 ◽  
Vol 55 ◽  
pp. 04019
Author(s):  
Ivan Alexeyev ◽  
Irina Sorova

The article studies the general articulatory base of Yakut speech as a model that forms the orthoepical potential of the Yakut language. Patterns of interaction between the language and the intonational structures of an individual word, a word combination, and the communicative types of phrases are defined. Criteria of speaking behaviour are also established, on the basis of which indicators of the correctness of the perceived acoustic parameters of articulated speech signals are revealed. This complex of articulatory-acoustic indicators makes it possible to treat the speech signal not only as a physiological formation but also as an essential shaper of the cognitive side of the speech act. In the standard features of Yakut speech, specific parameters are thus formed that typologically shape the evaluative potential of the speech model. The logical-grammatical character of the speech act as a semantically significant object forms a standard feature of the expressed idea. Consequently, the acoustic parameters of actual Yakut speech discussed here determine the communicative formula of the types of utterances.

2021 ◽  
pp. 1-15
Author(s):  
Poovarasan Selvaraj ◽  
E. Chandra

The most challenging task for recent Speech Enhancement (SE) systems is to remove non-stationary noise and additive white Gaussian noise in real-time applications. Several proposed SE techniques have failed to eliminate noise from speech signals in real-time scenarios because of their high resource utilization. A Sliding Window Empirical Mode Decomposition combined with a Variant of Variational Mode Decomposition and Hurst (SWEMD-VVMDH) technique was therefore developed to reduce this difficulty in real-time applications, but it is a statistical framework that requires long computation times. Hence, in this article the SWEMD-VVMDH technique is extended with a Deep Neural Network (DNN) that efficiently learns the speech signals decomposed via SWEMD-VVMDH to achieve SE. First, the noisy speech signals are decomposed into Intrinsic Mode Functions (IMFs) by the SWEMD Hurst (SWEMDH) technique. Then, Time-Delay Estimation (TDE)-based VVMD is performed on the IMFs to select the most relevant IMFs according to the Hurst exponent and to attenuate the low- and high-frequency noise components in the speech signal. For each signal frame, target features are extracted and fed to the DNN, which learns them to estimate the Ideal Ratio Mask (IRM) in a supervised manner. The DNN is trained across categories of background noise and Signal-to-Noise Ratios (SNRs) of the speech signals; the noise-category dimension and the SNR dimension are used for training and testing multiple DNNs, since these are the dimensions most often considered in SE systems. Further, the IRM in each frequency channel of all noisy signal samples is concatenated to reconstruct the noiseless speech signal. Finally, the experimental results show considerable improvement in SE under different categories of noise.
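
As a rough illustration of the masking idea described above (not the authors' implementation), the sketch below computes an Ideal Ratio Mask from clean and noise magnitude spectrograms and applies it to a noisy spectrogram; the function names and the exponent beta = 0.5 are illustrative assumptions, and in the paper the mask is estimated by the DNN rather than computed from known clean and noise components.

```python
import numpy as np

def ideal_ratio_mask(clean_mag, noise_mag, beta=0.5):
    """IRM per time-frequency bin: (S^2 / (S^2 + N^2))^beta.
    Computed here directly from known clean/noise magnitudes; a DNN would
    be trained to predict this mask from frame features."""
    eps = 1e-12
    return (clean_mag ** 2 / (clean_mag ** 2 + noise_mag ** 2 + eps)) ** beta

def apply_mask(noisy_mag, mask):
    """Masked magnitude; the enhanced waveform would then be reconstructed
    by inverse STFT using the noisy phase."""
    return noisy_mag * mask
```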


2019 ◽  
Vol 29 (06) ◽  
pp. 1950075
Author(s):  
Yumei Zhang ◽  
Xiangying Guo ◽  
Xia Wu ◽  
Suzhen Shi ◽  
Xiaojun Wu

In this paper, we propose a nonlinear prediction model for speech signal series with an explicit structure. The kernel coefficients of a Volterra model are typically updated by least mean squares (LMS), whose poor performance yields improper parameters and, in turn, intrinsic shortcomings such as trapping in local minima, improper parameter selection, and a slow convergence rate. To overcome these shortcomings, a uniform searching particle swarm optimization (UPSO) algorithm is proposed to optimize the kernel coefficients of the Volterra model. A second-order Volterra filter (SOVF) speech prediction model based on UPSO is established using English phonemes, words, and phrases. To reduce the complexity of the model, given a user-defined error tolerance, we extract a reduced-parameter SOVF (RPSOVF) for acceleration. The experimental results show that, for both single-frame and multiframe speech signals, UPSO-SOVF and UPSO-RPSOVF outperform LMS-SOVF and PSO-SOVF in terms of root mean square error (RMSE) and mean absolute deviation (MAD). UPSO-SOVF and UPSO-RPSOVF better capture the trends and regularity of speech signals and fully meet the requirements of speech signal prediction. The proposed model provides a nonlinear analysis and a valuable model structure for speech signal series, and can further be employed in speech signal reconstruction or compression coding.
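
For readers unfamiliar with the model, the following minimal sketch shows a one-step prediction with a second-order Volterra filter (SOVF); the kernels h1 and h2 are placeholder values, whereas the paper obtains them via UPSO rather than LMS.

```python
import numpy as np

def sovf_predict(x, h1, h2):
    """One-step second-order Volterra prediction from the last M samples.
    x  : array of the M most recent samples
    h1 : linear kernel, shape (M,)
    h2 : quadratic kernel, shape (M, M)
    """
    return h1 @ x + x @ (h2 @ x)

# Toy example with arbitrary (placeholder) kernel coefficients.
M = 4
x = np.array([0.10, -0.20, 0.30, 0.05])
h1 = np.full(M, 0.25)        # placeholder linear coefficients
h2 = np.eye(M) * 0.01        # placeholder quadratic coefficients
x_next = sovf_predict(x, h1, h2)
```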


2011 ◽  
Vol 121-126 ◽  
pp. 815-819 ◽  
Author(s):  
Yu Qiang Qin ◽  
Xue Ying Zhang

Ensemble empirical mode decomposition (EEMD) is a newly developed method aimed at eliminating the mode mixing present in the original empirical mode decomposition (EMD). To evaluate the performance of this new method, this paper investigates the effect of two parameters pertinent to EEMD: the emotional envelope and the number of emotional ensemble trials. The technique is applied to four kinds of emotional speech signals (angry, happy, sad and neutral), and the number of ensemble trials is computed for each emotion. An emotional envelope is obtained by transforming the IMFs of the emotional speech signals, and a new emotion recognition method is derived from the different emotional envelopes and numbers of ensemble trials.
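
A minimal sketch of the EEMD step (noise-assisted ensemble averaging of IMFs) is given below; it assumes an EMD implementation such as the PyEMD package is available, and the trial count and noise level are illustrative assumptions rather than the values used in the paper.

```python
import numpy as np
from PyEMD import EMD  # assumption: the PyEMD package (pip install EMD-signal)

def eemd(signal, n_trials=100, noise_std=0.2, max_imfs=8):
    """Ensemble EMD: decompose noise-perturbed copies of the signal with EMD
    and average the resulting IMFs to suppress mode mixing."""
    emd = EMD()
    acc = np.zeros((max_imfs, signal.size))
    for _ in range(n_trials):
        noisy = signal + noise_std * np.std(signal) * np.random.randn(signal.size)
        imfs = emd(noisy, max_imf=max_imfs)
        k = min(max_imfs, imfs.shape[0])
        acc[:k] += imfs[:k]
    return acc / n_trials
```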


Author(s):  
Johan J. Hanekom

The masking property of the auditory system is well known in the context of two-tone masking. For complex (speech) signals, the effects of masking are less well known. This paper explores the masking of speech signals by calculating which parts of the speech signal are inaudible because of masking. The theory for the masking of one tone by another is extended to establish an equation for the masking threshold, which accounts for the masking effect of each frequency component on all other frequency components. Speech is then synthesized in which the supposedly inaudible parts of the signal are discarded, and the effect is evaluated in a very simple psychoacoustic experiment. It is shown that the information below the masking threshold is indeed redundant.
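
The paper's actual threshold equation is not reproduced here; the sketch below only illustrates the general idea of a global masking threshold, with each spectral component spreading a triangular masking skirt in the Bark domain. The slope and offset values are illustrative assumptions.

```python
import numpy as np

def bark(f_hz):
    """Approximate Bark scale (Zwicker & Terhardt formula)."""
    return 13.0 * np.arctan(0.00076 * f_hz) + 3.5 * np.arctan((f_hz / 7500.0) ** 2)

def masking_threshold(power_spectrum, freqs_hz,
                      slope_db_per_bark=10.0, offset_db=14.0):
    """Crude global masking threshold: every component spreads a triangular
    masking skirt over its neighbours; the threshold at each bin is the
    maximum of all skirts. Components below this level are treated as masked."""
    z = bark(freqs_hz)
    levels_db = 10.0 * np.log10(power_spectrum + 1e-12)
    thr = np.full_like(levels_db, -np.inf)
    for zi, li in zip(z, levels_db):
        skirt = li - offset_db - slope_db_per_bark * np.abs(z - zi)
        thr = np.maximum(thr, skirt)
    return thr  # in dB
```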


2019 ◽  
Vol 33 (35) ◽  
pp. 1950438 ◽  
Author(s):  
Manish Gupta ◽  
Shambhu Shankar Bharti ◽  
Suneeta Agarwal

Speech is a convenient medium of communication among human beings. Speaker recognition is the process of automatically recognizing a speaker from the information contained in the speech signal. In this paper, a new two-level approach to speaker recognition from the speech signal is proposed. At the first level, the gender of the speaker is recognized; at the second level, the speaker is recognized conditioned on the gender identified at the first level. Once the gender is known, the search space for the second level is halved, since the system searches only the set of speech signals belonging to the identified gender. Gender is identified using gender-specific features: Mel Frequency Cepstral Coefficients (MFCC) and pitch. The speaker is then recognized using speaker-specific features: MFCC, pitch and RASTA-PLP. A Support Vector Machine (SVM) and a Gaussian Mixture Model (GMM) are used for identifying the gender and recognizing the speaker, respectively. Experiments are performed on speech signals from two databases: “IIT-Madras speech synthesis and recognition” (speech samples in English spoken by eight male and eight female speakers from eight different regions) and “ELSDSR” (speech samples in English spoken by five male and five female speakers). Experimentally, it is observed that the two-level approach reduces the time taken for speaker recognition by 30–32% compared with recognizing the speaker without identifying the gender (single-level approach). The accuracy of speaker recognition also improves from 99.7% to 99.9% compared with the single-level approach. The experiments further show that a speech signal with a minimum duration of 1.12 s (after discarding silent parts) is sufficient for recognizing the speaker.
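
As a hedged sketch of the two-level idea (not the authors' code), the snippet below uses an SVM for gender at the first level and per-speaker GMMs, restricted to the identified gender, at the second level. It assumes librosa and scikit-learn are available, uses only MFCC features, and omits the pitch and RASTA-PLP features used in the paper.

```python
import numpy as np
import librosa                              # assumed available for MFCC extraction
from sklearn.svm import SVC
from sklearn.mixture import GaussianMixture

def mean_mfcc(wav, sr, n_mfcc=13):
    """Mean MFCC vector over the utterance (silence trimming omitted)."""
    return librosa.feature.mfcc(y=wav, sr=sr, n_mfcc=n_mfcc).mean(axis=1)

def recognize(wav, sr, gender_svm, gmms_by_gender):
    """Level 1: SVM (trained on mean-MFCC vectors) predicts the gender.
    Level 2: among GMMs trained only on speakers of that gender, pick the
    one with the highest frame-level log-likelihood."""
    gender = gender_svm.predict(mean_mfcc(wav, sr).reshape(1, -1))[0]
    frames = librosa.feature.mfcc(y=wav, sr=sr, n_mfcc=13).T
    scores = {spk: gmm.score(frames) for spk, gmm in gmms_by_gender[gender].items()}
    return gender, max(scores, key=scores.get)
```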


1996 ◽  
Vol 8 (2) ◽  
pp. 144-148
Author(s):  
Manabu Ishihara ◽  
Jun Shirataki

In this study, a signal was synthesized by removing the speech signal at certain uniform intervals and inserting noise into the signal-absent parts. An auditory experiment was conducted to clarify how humans perceive such synthesized signals, in other words, the relationship between the level of the noise and the perceived intensity of the signal, and the relationship between the noise level and the degree of clarity. The results show that, when the level of the inserted white noise is below 0 dB, a sentence comprehension rate of over 90 percent is obtained as long as the removed intervals amount to around 50 to 60 percent. In this case, sentence comprehension improves by over 30 percent, given that single-syllable comprehension is around 50 to 60 percent. In the region where the removed intervals exceed 50 percent, sentence comprehension drops sharply, which is considered to be an effect of the inserted white noise. On the basis of these results, one of the auditory characteristics to be realized by a digital circuit is made clear.
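
A simple sketch of how such stimuli could be constructed is shown below; it is an assumption about the procedure rather than the authors' code, and the parameter names are illustrative. It periodically removes a fraction of the speech and fills each gap with white noise at a chosen level relative to the speech RMS.

```python
import numpy as np

def interrupt_with_noise(speech, period, removed_fraction, noise_db=0.0):
    """Remove the speech every `period` samples for `removed_fraction` of the
    period and fill the gap with white noise at `noise_db` relative to the
    speech RMS."""
    out = speech.copy()
    rms = np.sqrt(np.mean(speech ** 2))
    noise_rms = rms * 10.0 ** (noise_db / 20.0)
    gap = int(period * removed_fraction)
    for start in range(0, len(speech), period):
        stop = min(start + gap, len(speech))
        out[start:stop] = noise_rms * np.random.randn(stop - start)
    return out
```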


2017 ◽  
Vol 2017 ◽  
pp. 1-9 ◽  
Author(s):  
Guihua Wen ◽  
Huihui Li ◽  
Jubing Huang ◽  
Danyang Li ◽  
Eryang Xun

Human emotions can now be recognized from speech signals using machine learning methods; however, these methods are challenged by low recognition accuracies in real applications because they lack sufficiently rich representation ability. Deep belief networks (DBN) can automatically discover multiple levels of representation in speech signals. To make full use of this advantage, this paper presents an ensemble of random deep belief networks (RDBN) for speech emotion recognition. It first extracts the low-level features of the input speech signal and then uses them to construct many random subspaces. Each random subspace is fed to a DBN to yield higher-level features, which serve as the input of a classifier that outputs an emotion label. All output emotion labels are then fused through majority voting to decide the final emotion label for the input speech signal. Experimental results on benchmark speech emotion databases show that RDBN achieves better accuracy than the compared methods for speech emotion recognition.
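
A hedged sketch of the random-subspace ensemble with majority voting is shown below; a scikit-learn MLPClassifier stands in for the DBN, and the member count, subspace ratio and network sizes are illustrative assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier  # stand-in for a DBN

def train_random_subspace_ensemble(X, y, n_members=20, subspace_ratio=0.5, seed=0):
    """Train one member per random feature subspace of the low-level features."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    members = []
    for _ in range(n_members):
        idx = rng.choice(d, size=int(d * subspace_ratio), replace=False)
        clf = MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=300)
        clf.fit(X[:, idx], y)
        members.append((idx, clf))
    return members

def predict_majority(members, x):
    """Fuse the members' emotion labels by majority vote."""
    votes = [clf.predict(x[idx].reshape(1, -1))[0] for idx, clf in members]
    labels, counts = np.unique(votes, return_counts=True)
    return labels[np.argmax(counts)]
```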


2017 ◽  
Vol 42 (2) ◽  
pp. 287-295 ◽  
Author(s):  
Yash Vardhan Varshney ◽  
Zia Ahmad Abbasi ◽  
Musiur Raza Abidi ◽  
Omar Farooq

The application of wavelet decomposition to speed up the separation of mixed speech signals with non-negative matrix factorisation (NMF) is described. It is assumed that basis vectors for the training data of the individual speakers have already been learned. In this paper, the magnitude spectrogram of a mixed signal is factorised by NMF, taking the sparseness of speech signals into account. The high-frequency components of the signal carry very little of the signal energy; by rejecting them, the size of the input signal is reduced, which reduces the computational time of the matrix factorisation. The low-energy part of the signal is separated using wavelet decomposition. The present work considers wideband microphone speech and standard audio signals from digital video equipment. The proposed model shows an improvement in separation capability over an existing model in terms of the correlation between the separated and original signals. The obtained signal-to-distortion ratio (SDR) and signal-to-interference ratio (SIR) are also larger than those of the existing model. The proposed model additionally reduces computational time, resulting in faster operation.
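
The sketch below illustrates supervised NMF separation on a mixture magnitude spectrogram using pre-trained per-speaker basis matrices; the wavelet-based frequency reduction is omitted, and the multiplicative KL update and Wiener-style mask are standard choices rather than necessarily the exact ones used in the paper.

```python
import numpy as np

def separate_with_nmf(V, W1, W2, n_iter=200, eps=1e-9):
    """Separate a mixture magnitude spectrogram V (freq x time) given fixed
    per-speaker basis matrices W1, W2 learned from training data.
    Only the activations H are updated (multiplicative KL-NMF updates)."""
    W = np.hstack([W1, W2])
    H = np.random.rand(W.shape[1], V.shape[1])
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.T @ np.ones_like(V) + eps)
    k = W1.shape[1]
    V1 = W1 @ H[:k]
    V2 = W2 @ H[k:]
    mask1 = V1 / (V1 + V2 + eps)          # Wiener-style soft mask
    return mask1 * V, (1.0 - mask1) * V   # estimated magnitudes per speaker
```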

