EMD-Based Voiced Speech Processing Method for Intelligent Recognition Systems of Stressed States in Humans

Author(s):  
Alan K. Alimuradov ◽  
Alexander Yu. Tychkov ◽  
Pyotr P. Churakov
Author(s):  
Mr. Ashish Uplenchwar

Modern technology demands sophisticated means of giving input commands to computational devices. Effective techniques have to be introduced to make the human-machine interface smooth and compatible, especially for speech signals. Establishing efficient communication between computer and machine plays a vital role in speech processing. This article uses one of the current technologies in continuous speech recognition systems: a Reservoir Computing based neural network followed by likelihood conversion. Our aim is to build a stand-alone system that understands the terminology of languages. Throughout development, measures will be taken to keep the memory requirement and the processing time of the software as small as possible.


2007 ◽  
Vol 2007 ◽  
pp. 1-5 ◽  
Author(s):  
Aïcha Bouzid ◽  
Noureddine Ellouze

This paper describes a multiscale product method (MPM) for measuring the open quotient in voiced speech. The method is based on determining the glottal closing and opening instants. The proposed approach takes the product of wavelet transforms of the speech signal at different scales in order to enhance edge detection and parameter estimation. We show that the proposed method is effective and robust for detecting singularities in speech. Accurate estimation of glottal closing instants (GCIs) and glottal opening instants (GOIs) is important in a wide range of speech processing tasks. In this paper, accurate estimation of GCIs and GOIs is used to measure the local open quotient (Oq), which is the ratio of the open time to the pitch period. The multiscale product operates automatically on the speech signal; the reference electroglottogram (EGG) signal is used for performance evaluation. The rate of correct GCI detection is 95.5% and that of GOI detection is 76%. The pitch period relative error is 2.6% and the open phase relative error is 5.6%. The relative error measured on the open quotient reaches 3% for the whole Keele database.
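
The two ideas in this abstract, multiplying wavelet responses across scales and taking Oq as open time over pitch period, can be illustrated with a minimal Python sketch. Several choices here are assumptions and are not taken from the paper: a first-derivative-of-Gaussian filter stands in for the wavelet, the scales (1, 2, 4 samples) are arbitrary, the open phase is assumed to run from a GOI to the next GCI, and the function names are hypothetical.

```python
import numpy as np

def gaussian_derivative_kernel(scale):
    """First derivative of a Gaussian with standard deviation `scale` samples.
    Assumption: used here as a stand-in for the paper's wavelet."""
    half = int(np.ceil(4 * scale))
    t = np.arange(-half, half + 1, dtype=float)
    g = np.exp(-t**2 / (2 * scale**2))
    return -t * g / np.sum(g) / scale**2

def multiscale_product(signal, scales=(1.0, 2.0, 4.0)):
    """Pointwise product of the filter responses at several scales.
    A sharp edge responds at every scale, so it survives the product,
    while noise that is uncorrelated across scales is attenuated."""
    signal = np.asarray(signal, dtype=float)
    product = np.ones_like(signal)
    for s in scales:
        # mode="same" zero-pads, so responses near the ends are boundary effects.
        product *= np.convolve(signal, gaussian_derivative_kernel(s), mode="same")
    return product

def open_quotient(gci, goi):
    """Local open quotient per glottal cycle.
    Assumption: each cycle spans two consecutive GCIs and contains one GOI;
    the open phase runs from that GOI to the closing GCI."""
    gci = np.asarray(gci, dtype=float)
    goi = np.asarray(goi, dtype=float)
    oq = []
    for start, end in zip(gci[:-1], gci[1:]):
        inside = goi[(goi > start) & (goi < end)]
        if inside.size != 1:
            oq.append(np.nan)  # missed or spurious GOI: leave the cycle undefined
            continue
        oq.append((end - inside[0]) / (end - start))
    return np.array(oq)

# Usage: a unit step at sample 100 gives a product peak at the step.
step = np.zeros(200)
step[100:] = 1.0
peak_index = int(np.argmax(multiscale_product(step)))

# Usage: instants in seconds for two 10 ms cycles.
oq_values = open_quotient([0.000, 0.010, 0.020], [0.004, 0.015])
```

Using an odd number of scales keeps the sign of each edge response in the product, which is why rising and falling edges remain distinguishable in this sketch.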


1985 ◽  
Vol 78 (5) ◽  
pp. 1928-1928
Author(s):  
James F. Patrick ◽  
Peter Seligman ◽  
Yit C. Tong ◽  
Graeme M. Clark

Author(s):  
ZHENXUE CHEN ◽  
CHENGYUN LIU ◽  
FALIANG CHANG ◽  
XUZHEN HAN ◽  
KAIFANG WANG

Changes in light intensity and angle present a major challenge to the creation of reliable face recognition systems. The existence of bright regions and dark regions has been shown to have a serious negative impact on the performance of face recognition systems. This paper proposes a solution to this problem based on the self-quotient image (SQI) processing method. In this method, bright and dark areas are processed separately by SQI without changing the essential characteristics of the face image. Experimental results indicate that this Single-Light-Region and Single-Dark-Region SQI method removes the adverse effect of multi-bright and multi-dark areas better than competing methods.
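
For readers unfamiliar with the basic self-quotient image, the following Python sketch shows the core operation: dividing an image by a smoothed copy of itself. It is only an illustration under stated assumptions. It uses a plain isotropic Gaussian smoother (an assumption; the abstract does not specify the filter), the parameters `sigma` and `eps` are hypothetical, and it does not reproduce the paper's separate handling of bright and dark regions.

```python
import numpy as np

def gaussian_kernel_1d(sigma):
    """Normalised 1-D Gaussian kernel truncated at three standard deviations."""
    half = int(np.ceil(3 * sigma))
    t = np.arange(-half, half + 1, dtype=float)
    k = np.exp(-t**2 / (2 * sigma**2))
    return k / k.sum()

def smooth(image, sigma):
    """Separable Gaussian smoothing with edge replication at the borders."""
    k = gaussian_kernel_1d(sigma)
    half = len(k) // 2
    padded = np.pad(image, half, mode="edge")
    # Filter along rows, then along columns; "valid" restores the original size.
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def self_quotient_image(image, sigma=2.0, eps=1e-6):
    """Basic SQI: the image divided by its own smoothed version.
    `eps` (an assumed safeguard) avoids division by zero in black areas."""
    image = np.asarray(image, dtype=float)
    return image / (smooth(image, sigma) + eps)

# Usage: scaling the illumination of the whole image leaves the SQI almost unchanged.
rng = np.random.default_rng(0)
face = rng.uniform(0.1, 1.0, size=(16, 16))  # stand-in for a grayscale face image
q_normal = self_quotient_image(face)
q_bright = self_quotient_image(3.0 * face)
```

Because numerator and denominator scale together, a global change in brightness cancels out, which is the property that makes the quotient attractive for illumination normalisation.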


Author(s):  
Vincent Wan

This chapter describes the adaptation and application of kernel methods for speech processing. It is divided into two sections dealing with speaker verification and isolated-word speech recognition applications. Significant advances in kernel methods have been realised in the field of speaker verification, particularly relating to the direct scoring of variable-length speech utterances by sequence kernel SVMs. The improvements are so substantial that most state-of-the-art speaker recognition systems now incorporate SVMs. We describe the architecture of some of these sequence kernels. Speech recognition presents additional challenges to kernel methods and their application in this area is not as straightforward as for speaker verification. We describe a sequence kernel that uses dynamic time warping to capture temporal information within the kernel directly. The formulation also extends the standard dynamic time-warping algorithm by enabling the dynamic alignment to be computed in a high-dimensional space induced by a kernel function. This kernel is shown to work well in an application for recognising low-intelligibility speech of severely dysarthric individuals.
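
The idea of computing a dynamic time-warping alignment in a kernel-induced space can be sketched in Python as follows. This is not the chapter's formulation: it is a minimal illustration that runs the standard DTW recursion with the local distance taken in feature space via the identity ||φ(x) − φ(y)||² = k(x, x) − 2k(x, y) + k(y, y). The RBF kernel, its `gamma` value, and all function names are assumptions chosen for the example.

```python
import numpy as np

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel between two frame vectors (assumed choice)."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-gamma * np.dot(d, d)))

def kernel_distance(x, y, kernel):
    """Distance between phi(x) and phi(y) computed from kernel values only."""
    sq = kernel(x, x) - 2.0 * kernel(x, y) + kernel(y, y)
    return float(np.sqrt(max(sq, 0.0)))  # clamp tiny negative rounding errors

def kernel_dtw(a, b, kernel=rbf_kernel):
    """Standard DTW recursion with local distances taken in feature space.
    Returns the accumulated cost of the best monotonic alignment."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = kernel_distance(a[i - 1], b[j - 1], kernel)
            # Allowed moves: advance in a, advance in b, or advance in both.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

# Usage: a time-stretched copy aligns at zero cost; a different sequence does not.
template = np.array([[0.0], [1.0], [2.0]])
stretched = np.array([[0.0], [0.0], [1.0], [2.0], [2.0]])
other = np.array([[2.0], [1.0], [0.0]])
same_cost = kernel_dtw(template, stretched)
diff_cost = kernel_dtw(template, other)
```

Turning such an alignment cost into a valid (positive semi-definite) kernel for an SVM takes further steps that this sketch does not attempt.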


Author(s):  
Setareh Safavi

This study investigated computer-assisted pronunciation training (CAPT) software that used automatic speech recognition (ASR) and accent conversion (AC) technology to improve the pronunciation of second language (L2) learners. Such a speech processing method can address the typical shortcoming of ASR technology for L2 pronunciation training, namely providing meaningful corrective feedback. Thirty-six student participants were involved in the treatment group. For the treatment, they worked with a CAPT tool that used ASR and AC to provide corrective feedback. A comparison group of 36 students worked with a different type of CAPT tool. Two trained raters rated each monologue completed for the pretest, posttest, and comparison data. Findings showed preliminary statistical significance with regard to improved pronunciation for the treatment group. Additional results showed no statistical differences in the rater scores between the control group and the experimental posttest scores.

