speech signal
Recently Published Documents


TOTAL DOCUMENTS

1630
(FIVE YEARS 355)

H-INDEX

34
(FIVE YEARS 6)

Author(s):  
Nsiri Benayad ◽  
Zayrit Soumaya ◽  
Belhoussine Drissi Taoufiq ◽  
Ammoumou Abdelkrim

Among the several approaches to detecting Parkinson's disease, one relies on the speech signal, since speech impairment is a symptom of the disease. This paper focuses on signal analysis, using a dataset of voice recordings in which patients were asked to utter the vowels "a", "o", and "u". The discrete wavelet transform (DWT) is applied to the speech signal to obtain a variable-resolution representation that may reveal the most important information about the patients. From the approximation a3 obtained with the Daubechies wavelet at scale 2, level 3, 21 features are extracted: linear predictive coding (LPC) coefficients, energy, zero-crossing rate (ZCR), mel-frequency cepstral coefficients (MFCC), and wavelet Shannon entropy. For classification, the K-nearest neighbours (KNN) algorithm is used. KNN is a type of instance-based learning that makes decisions from locally approximated functions, complemented here by ensemble learning. However, the choice of training features can have a significant impact on the overall learning process; this is where the genetic algorithm (GA) comes in, selecting the training features that yield the most accurate classification.
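
As a rough illustration of this pipeline (our own sketch, not the authors' code; the db2 wavelet, k = 5, and the feature subset shown are assumptions), the snippet below extracts a few of the listed features from the level-3 approximation with PyWavelets and scores a GA candidate feature mask by KNN cross-validation accuracy:

```python
import numpy as np
import pywt
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def extract_features(signal, wavelet="db2", level=3):
    """Energy, ZCR and Shannon entropy of the level-3 approximation a3."""
    a3 = pywt.wavedec(signal, wavelet, level=level)[0]  # approximation coefficients
    energy = np.sum(a3 ** 2)
    zcr = np.mean(np.abs(np.diff(np.sign(a3)))) / 2.0   # zero-crossing rate
    p = a3 ** 2 / (np.sum(a3 ** 2) + 1e-12)             # normalized subband power
    entropy = -np.sum(p * np.log2(p + 1e-12))           # wavelet Shannon entropy
    return np.array([energy, zcr, entropy])

def ga_fitness(mask, X, y):
    """Fitness of one GA individual: CV accuracy of KNN on the selected features."""
    if not mask.any():
        return 0.0
    knn = KNeighborsClassifier(n_neighbors=5)
    return cross_val_score(knn, X[:, mask], y, cv=5).mean()
```

A GA would evolve a population of binary masks over the 21 features, keeping the masks with the highest ga_fitness.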


2022 ◽  
Vol 3 (4) ◽  
pp. 295-307
Author(s):  
Subarna Shakya

Personal computer-based data collection and analysis systems have become more resilient thanks to recent advances in digital signal processing technology. Speaker recognition is a signal processing approach that uses the speaker-specific information contained in voice waves to identify the speaker automatically. This study examines systems that can recognize a wide range of emotional states in speech from a single source. Because it offers insight into human brain states, emotion recognition is an active topic in the development of human-computer interfaces for speech processing, where recognizing the user's emotional state is often necessary. This research attempts to discern emotional states such as anger, joy, neutrality, fear, and sadness using classification methods. An acoustic feature measuring unpredictability is used in conjunction with a non-linear signal quantification approach to identify emotions: the unpredictability of each emotional signal is captured in a feature vector constructed from the calculated entropy measurements. The acoustic features extracted from the speech signal are then used to train the proposed neural network, whose outputs feed a linear discriminant analysis (LDA) stage for further classification. The article also compares the proposed work with modern classifiers such as K-nearest neighbour, the support vector machine, and the plain linear discriminant approach. The great advantage of the proposed algorithm is that it separates negative and positive emotional features, which yields good classification results. According to efficient cross-validation on an available Emotional Speech dataset, a single-source LDA classifier can recognize emotions in speech signals with above 90 percent accuracy across the various emotional states.
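
To make the comparison concrete, here is a minimal sketch (ours, not the authors' implementation; the spectral-entropy feature and all classifier settings are assumptions) of entropy-based features fed to LDA and the comparison classifiers named above, using scikit-learn:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

def spectral_entropy(frame):
    """Shannon entropy of the normalized power spectrum: an 'unpredictability' feature."""
    psd = np.abs(np.fft.rfft(frame)) ** 2
    p = psd / (psd.sum() + 1e-12)
    return -np.sum(p * np.log2(p + 1e-12))

# X: (n_utterances, n_entropy_features); y: labels (anger, joy, neutral, fear, sadness)
classifiers = {
    "LDA": LinearDiscriminantAnalysis(),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(kernel="rbf"),
}

def compare(X, y):
    """Mean 10-fold CV accuracy for each classifier."""
    return {name: cross_val_score(clf, X, y, cv=10).mean()
            for name, clf in classifiers.items()}
```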


2022 ◽  
Author(s):  
Isabelle Franz ◽  
Christine A. Knoop ◽  
Gerrit Kentner ◽  
Sascha Rothbart ◽  
Vanessa Kegel ◽  
...  

Current systems for predicting prosodic prominence and boundaries in texts focus on syntax/semantics-based automatic decoding of sentences that must be annotated syntactically (Atterer & Klein 2002; Windmann et al. 2011). However, to date there is no phonetically validated, replicable system for manually coding prosodic boundaries and syllable prominence in longer sentences or texts. Based on work in metrical phonology (Liberman & Prince 1977), phrase formation (Hayes 1989), and existing pause coding systems (Gee & Grosjean 1983), we developed a manual for coding prosodic boundaries (with 6 degrees of juncture) and syllable prominence (8 degrees). Three independent annotators applied the coding system to the opening pages of four German novels and to four short stories (20,058 syllables; Fleiss' kappa = .82). For the phonetic validation, eight professional speakers read the excerpts of the novels aloud. We annotated the speech signal automatically with MAUS (Schiel 1999). Using PRAAT (Boersma & Weenink 2019), we extracted pitch, duration, and intensity for each syllable, as well as several phonetic parameters for pauses, and compared all measures to the theoretically predicted levels of syllable prominence and prosodic boundary strength. The validation against the speech signal shows that our annotation system reliably predicts syllable prominence and prosodic boundaries. Since the annotation works with plain text, the coding system has many potential applications, including research on prose rhythm, synthetic speech, and (psycho)linguistic research on prosody.
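
For the phonetic measurements, a minimal sketch of the per-syllable extraction using the praat-parselmouth Python interface to PRAAT follows; the audio file name is hypothetical, and the syllable interval times (t0, t1) are assumed to come from the MAUS alignment:

```python
import parselmouth
from parselmouth.praat import call

snd = parselmouth.Sound("novel_excerpt.wav")   # hypothetical file name
pitch = snd.to_pitch()
intensity = snd.to_intensity()

def syllable_measures(t0, t1):
    """Duration, mean F0 and mean intensity for one MAUS syllable interval."""
    return {
        "duration": t1 - t0,
        "f0_mean": call(pitch, "Get mean", t0, t1, "Hertz"),
        "intensity_mean": call(intensity, "Get mean", t0, t1, "energy"),
    }
```

These per-syllable values can then be compared against the annotated prominence (8 degrees) and boundary (6 degrees) levels.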


2022 ◽  
Vol 70 (2) ◽  
pp. 2953-2969
Author(s):  
Omar M. El-Habbak ◽  
Abdelrahman M. Abdelalim ◽  
Nour H. Mohamed ◽  
Habiba M. Abd-Elaty ◽  
Mostafa A. Hammouda ◽  
...  

2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Mourad Talbi ◽  
Med Salim Bouhlel

Speech enhancement has gained considerable attention for speech transmission over communication channels, speaker identification, speech-based biometric systems, video conferencing, hearing aids, mobile phones, voice conversion, microphones, and so on. Handling background noise is essential to designing a successful speech enhancement system. In this work, a new speech enhancement technique based on the Stationary Bionic Wavelet Transform (SBWT) and the Minimum Mean Square Error (MMSE) estimate of spectral amplitude is proposed. The technique first applies the SBWT to the noisy speech signal to obtain eight noisy wavelet coefficient subbands. Each subband is denoised by applying the MMSE spectral amplitude estimator. The inverse transform, SBWT⁻¹, is then applied to the denoised stationary wavelet coefficients to obtain the enhanced speech signal. The technique's performance is demonstrated by computing the Signal-to-Noise Ratio (SNR), the Segmental SNR (SSNR), and the Perceptual Evaluation of Speech Quality (PESQ).
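
The structure of the method can be sketched as follows: a minimal stand-in, assuming an ordinary stationary wavelet transform from PyWavelets and universal-threshold soft thresholding in place of the paper's SBWT and MMSE spectral amplitude estimator, which are not standard library calls:

```python
import numpy as np
import pywt

def enhance(noisy, wavelet="db4", level=3):
    """Transform -> denoise each subband -> inverse transform."""
    n = len(noisy) - len(noisy) % (2 ** level)       # swt needs a multiple of 2**level
    coeffs = pywt.swt(noisy[:n], wavelet, level=level)
    denoised = []
    for cA, cD in coeffs:
        sigma = np.median(np.abs(cD)) / 0.6745       # robust noise estimate
        thr = sigma * np.sqrt(2.0 * np.log(n))       # universal threshold
        denoised.append((cA, pywt.threshold(cD, thr, mode="soft")))
    return pywt.iswt(denoised, wavelet)              # enhanced speech (length n)
```

The enhanced output would then be scored against the clean reference with SNR, SSNR, and PESQ.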


Webology ◽  
2021 ◽  
Vol 19 (1) ◽  
pp. 70-82
Author(s):  
Zeina Hassan Razaq

Securing any communication system in which sensitive data is transmitted over a channel is a crucial issue. One good solution for protecting speech is to use speech scrambling techniques. Chaotic systems have properties that make them a good choice for scrambling a speech signal, and an optimisation algorithm can markedly improve performance when used to tune a hybrid of more than one method. In this paper, we propose a system that uses an optimisation method, namely particle swarm optimisation. The evaluation measures show that the optimised system outperforms the other methods in the comparison, including plain chaotic maps and hybrid chaotic maps.
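
As an illustration of the chaotic building block (our own sketch with arbitrary key parameters; the paper's particle swarm optimisation stage would tune such parameters for the best scrambling scores), a logistic-map orbit can be sorted to obtain a key-dependent sample permutation:

```python
import numpy as np

def chaotic_permutation(n, x0=0.7, r=3.99):
    """Iterate the logistic map x <- r*x*(1-x) and argsort its orbit."""
    x = np.empty(n)
    x[0] = x0
    for i in range(1, n):
        x[i] = r * x[i - 1] * (1.0 - x[i - 1])
    return np.argsort(x)

def scramble(speech, key=(0.7, 3.99)):
    perm = chaotic_permutation(len(speech), *key)
    return speech[perm], perm

def descramble(scrambled, perm):
    out = np.empty_like(scrambled)
    out[perm] = scrambled                        # invert the permutation
    return out
```

Only a receiver holding the key (x0, r) can regenerate the permutation and descramble the signal.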


2021 ◽  
Vol 15 ◽  
Author(s):  
Florine L. Bachmann ◽  
Ewen N. MacDonald ◽  
Jens Hjortkjær

Linearized encoding models are increasingly employed to model cortical responses to running speech. Recent extensions to subcortical responses suggest clinical perspectives, potentially complementing the auditory brainstem responses (ABRs) or frequency-following responses (FFRs) that are current clinical standards. However, while it is well known that the auditory brainstem responds both to transient amplitude variations and to the stimulus periodicity that gives rise to pitch, these features co-vary in running speech. Here, we discuss challenges in disentangling the features that drive the subcortical response to running speech. Cortical and subcortical electroencephalographic (EEG) responses to running speech from 19 normal-hearing listeners (12 female) were analyzed. Using forward regression models, we confirm that responses to the rectified broadband speech signal yield temporal response functions consistent with wave V of the ABR, as shown in previous work. Peak latency and amplitude of the speech-evoked brainstem response were correlated with standard click-evoked ABRs recorded at the vertex electrode (Cz). Similar responses could be obtained using the fundamental frequency (F0) of the speech signal as the model predictor. However, simulations indicated that dissociating responses to temporal fine structure at the F0 from broadband amplitude variations is not possible, given the high covariance of the features and the poor signal-to-noise ratio (SNR) of subcortical EEG responses. In cortex, both simulations and data replicated previous findings indicating that envelope tracking on frontal electrodes can be dissociated from responses to slow variations in F0 (relative pitch). Yet, no association between subcortical F0-tracking and cortical responses to relative pitch could be detected. These results indicate that while subcortical speech responses are comparable to click-evoked ABRs, dissociating pitch-related processing in the auditory brainstem may be challenging with natural speech stimuli.
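
Schematically, the forward (encoding) model used in such analyses is a ridge-regularized temporal response function (TRF) mapping a stimulus feature (the rectified broadband speech or the F0 waveform) to the EEG. A minimal sketch of the core computation (ours; the lag range and regularization strength are assumptions):

```python
import numpy as np

def lagged_design(stimulus, lags):
    """Stack time-shifted copies of the stimulus feature as regressors."""
    X = np.zeros((len(stimulus), len(lags)))
    for j, lag in enumerate(lags):
        if lag >= 0:
            X[lag:, j] = stimulus[:len(stimulus) - lag]
        else:
            X[:lag, j] = stimulus[-lag:]
    return X

def fit_trf(stimulus, eeg, lags, ridge=1e3):
    """TRF weights via ridge regression: w = (X'X + aI)^{-1} X'y."""
    X = lagged_design(stimulus, lags)
    return np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ eeg)

# e.g., a brainstem-latency TRF over 0-30 ms of lags at fs = 4096 Hz:
# weights = fit_trf(rectified_speech, eeg_channel, lags=np.arange(0, 123))
```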

