timit database
Recently Published Documents


2021 ◽  
Author(s):  
Danoush Hosseinzadeh

This work presents two hardware-independent, ubiquitous biometric solutions that can significantly improve security for computer- and telephone-related applications. First, for computer security, a GMM-based keystroke verification method is proposed, along with the up-up keystroke latency (UUKL) feature, which is used here for the first time. The method verifies the identity of users from their typing patterns and achieved an FAR of 5.1%, an FRR of 6.5%, and an EER of 5.8% on a database of 41 users. Because of many inconsistencies in previous work, a new keystroke protocol is also proposed. This protocol makes a number of recommendations for improving the performance, reliability, and accuracy of any keystroke recognition system. Second, a GMM-based text-independent speaker identification scheme is proposed that uses novel spectral features for better speaker discrimination. On 100 users from the TIMIT database, these features achieved an identification error of 1.22% by incorporating information about the source of the speech signal. This represents a 6% improvement over the MFCC-based features.
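
The relationship between the FAR, FRR, and EER figures quoted above can be illustrated with a minimal sketch. It assumes that a higher score means a better match and that the EER is estimated as the mean of FAR and FRR at the threshold where the two are closest; the scores and function names are made up for illustration and are not taken from the work itself.

```python
def far_frr(genuine, impostor, threshold):
    """Return (FAR, FRR) when every score >= threshold is accepted."""
    far = sum(s >= threshold for s in impostor) / len(impostor)  # impostors accepted
    frr = sum(s < threshold for s in genuine) / len(genuine)     # genuine users rejected
    return far, frr


def equal_error_rate(genuine, impostor):
    """Sweep the observed scores as thresholds and return (threshold, EER estimate).

    The EER estimate is the mean of FAR and FRR at the threshold where
    |FAR - FRR| is smallest (an assumed convention, not the author's).
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, t, (far + frr) / 2)
    return best[1], best[2]


# Hypothetical verification scores, for illustration only.
genuine_scores = [0.9, 0.8, 0.75, 0.6, 0.4]
impostor_scores = [0.7, 0.5, 0.3, 0.2, 0.1]
threshold, eer = equal_error_rate(genuine_scores, impostor_scores)
```

Under this convention the reported EER of 5.8% is the mean of the reported FAR (5.1%) and FRR (6.5%), although the abstract does not say how the figure was derived.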


Author(s):  
C. Carmona-Duarte ◽  
M. A. Ferrer ◽  
R. Plamondon ◽  
A. Gómez-Rodellar ◽  
P. Gómez-Vilda

Human movement studies and analyses have been fundamental in many scientific domains, ranging from neuroscience to education, pattern recognition to robotics, health care to sports, and beyond. Previous speech motor models were proposed to understand how speech movement is produced and how the resulting speech varies when some parameters are changed. However, the inverse approach, in which the muscular response parameters and the subject's age are derived from real continuous speech, is not possible with such models. In the handwriting field, by contrast, the kinematic theory of rapid human movements and its associated Sigma-lognormal model have been applied successfully to obtain the muscular response parameters. This work presents a speech kinematics-based model that can be used to study, analyze, and reconstruct complex speech kinematics in a simplified manner. A method based on the kinematic theory of rapid human movements and its associated Sigma-lognormal model is applied to describe and parameterize the asymptotic impulse response of the neuromuscular networks involved in speech as a response to a neuromotor command. The method used to transform formants into a movement observation is also presented. Experiments carried out with the (English) VTR-TIMIT database and the (German) Saarbrucken Voice Database, including people of different ages, with and without laryngeal pathologies, corroborate the link between the extracted parameters and aging, on the one hand, and the proportion between the first and second formants required in applying the kinematic theory of rapid human movements, on the other. The results should drive innovative developments in the modeling and understanding of speech kinematics.
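
For readers unfamiliar with the kinematic theory, the lognormal impulse response it builds on can be sketched as follows. This is a simplified scalar version: the full Sigma-lognormal model sums lognormal components vectorially, and the formant-to-movement transformation of the paper is not reproduced here. The parameter names (`D`, `t0`, `mu`, `sigma`) follow the usual notation of the theory, and the numeric values are arbitrary.

```python
import math


def lognormal_speed(t, D, t0, mu, sigma):
    """Speed profile of one lognormal stroke.

    D is the stroke amplitude (area under the curve), t0 the onset time of
    the command, and mu / sigma the log-time delay and log-response time.
    """
    if t <= t0:
        return 0.0
    x = t - t0
    coeff = D / (sigma * x * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((math.log(x) - mu) ** 2) / (2.0 * sigma ** 2))


def summed_speed(t, components):
    """Scalar sum of several strokes (simplification of the vector sum)."""
    return sum(lognormal_speed(t, *c) for c in components)


# Two hypothetical overlapping strokes: (D, t0, mu, sigma).
strokes = [(2.0, 0.10, -1.5, 0.3), (1.0, 0.25, -1.4, 0.25)]
profile = [summed_speed(i * 0.01, strokes) for i in range(150)]
```

Fitting such components to an observed trajectory is the inverse problem the abstract refers to; the sketch only covers the forward direction.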


2019 ◽  
Vol 22 (3) ◽  
pp. 851-863 ◽  
Author(s):  
Musab T. S. Al-Kaltakchi ◽  
Raid Rafi Omar Al-Nima ◽  
Mohammed A. M. Abdullah ◽  
Hikmat N. Abdullah

2019 ◽  
Vol 4 (4) ◽  
pp. 719-732 ◽  
Author(s):  
Steven Sandoval ◽  
Rene L. Utianski ◽  
Heike Lehnert-LeHouillier

Purpose: The use and study of formant frequencies for the description of vowels is commonplace in acoustic phonetics and in attempts to understand results of speech perception studies. Numerous studies have shown that listeners are better able to distinguish vowels when the acoustic parameters are based on spectral information extracted at multiple time points during the vowel, rather than at a single point in time. The purpose of this study was to validate an automated method for extracting formant trajectories, using information across the time course of production, and subsequently to characterize the formant trajectories of vowels using a large, diverse corpus of speech samples. Method: Using software tools, we automatically extract the first two formant frequencies (F1/F2) at 10 equally spaced points over a vowel's duration. Then, we compute the average trajectory for each vowel. The 1,600 vowel observations in the Hillenbrand database and the more than 50,000 vowel observations in the TIMIT database are analyzed. Results: First, we validate the automated method by comparing against the manually obtained values in the Hillenbrand database. Analyses reveal a strong correlation between the automated and manual formant estimates. Then, we use the automated method on the 630 speakers in the TIMIT database to compute average formant trajectories. We note that phonemes that have close F1 and F2 values at the temporal midpoint often exhibit formant trajectories progressing in different directions, which highlights the importance of formant trajectory progression. Conclusions: The results of this study support the importance of formant trajectories over single-point measurements for the successful discrimination of vowels. Furthermore, this study provides a baseline for the formant trajectories of men and women across a broad range of dialects of Standard American English.
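
The sampling and averaging steps described in the Method can be sketched roughly as below. The sketch assumes that a formant track is already available as time/frequency pairs, that the 10 points include both end points of the vowel, and that linear interpolation is acceptable; none of these details are stated in the abstract, and the function names are illustrative.

```python
def sample_trajectory(times, values, n_points=10):
    """Linearly interpolate a formant track at n_points equally spaced times.

    `times` must be strictly increasing; both end points are included
    (an assumption about what "equally spaced" means here).
    """
    start, end = times[0], times[-1]
    sampled = []
    j = 0
    for k in range(n_points):
        t = start + (end - start) * k / (n_points - 1)
        # Advance to the track segment that contains t.
        while j < len(times) - 2 and times[j + 1] < t:
            j += 1
        weight = (t - times[j]) / (times[j + 1] - times[j])
        sampled.append(values[j] + weight * (values[j + 1] - values[j]))
    return sampled


def average_trajectory(trajectories):
    """Point-wise mean over several equally long sampled trajectories."""
    return [sum(column) / len(column) for column in zip(*trajectories)]


# Two hypothetical F1 tracks (seconds, Hz) for tokens of the same vowel.
token_a = sample_trajectory([0.00, 0.05, 0.10], [500.0, 560.0, 590.0])
token_b = sample_trajectory([0.00, 0.12], [520.0, 610.0])
mean_f1 = average_trajectory([token_a, token_b])
```

Resampling every token to the same number of points is what makes the point-wise average meaningful for vowels of different durations.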


2014 ◽  
Vol 1044-1045 ◽  
pp. 1370-1374
Author(s):  
Qiang Li ◽  
Yan Hong Liu

Although speaker identification (SI) has achieved great success in laboratory conditions, where training data are sufficient and the surroundings are quiet, it remains a challenge in practical use because of complicated environments. To tackle this challenge, this paper proposes a hybrid Gaussian mixture model-support vector machine (GMM-SVM) system. An SVM performs well with less data but is computationally expensive, whereas a GMM is computationally inexpensive but needs more data to perform adequately. Here, the SVM and GMM run in parallel in both the training and testing phases, and their judgments are fused to make the final decision: the person with the largest score is identified as the true speaker. A universal background model (UBM) is used with the GMM to improve recognition accuracy. The system is evaluated on part of the TIMIT database and on a Chinese database recorded by the authors. Experiments show that the proposed method is effective, with better performance and robustness than the baseline systems.
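
One way to picture the parallel decision stage is the score-fusion sketch below. The abstract does not give the fusion rule, so the min-max normalization and the weighted sum are assumptions made for illustration; the per-speaker scores are invented and would in practice come from trained GMM-UBM and SVM models.

```python
def min_max(scores):
    """Rescale a {speaker: score} mapping to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
    return {speaker: (s - lo) / span for speaker, s in scores.items()}


def fuse_and_identify(gmm_scores, svm_scores, weight=0.5):
    """Fuse two score sets and return (best speaker, fused scores).

    `weight` is the share given to the GMM scores (hypothetical parameter).
    """
    g, s = min_max(gmm_scores), min_max(svm_scores)
    fused = {spk: weight * g[spk] + (1.0 - weight) * s[spk] for spk in g}
    return max(fused, key=fused.get), fused


# Hypothetical scores for one test utterance.
gmm_scores = {"spk_a": -42.0, "spk_b": -40.0, "spk_c": -45.0}  # log-likelihoods
svm_scores = {"spk_a": 0.9, "spk_b": 0.1, "spk_c": -0.3}       # decision values
winner, fused = fuse_and_identify(gmm_scores, svm_scores)
```

Normalizing before fusing matters because GMM log-likelihoods and SVM decision values live on different scales; without it one system would dominate the sum.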


2014 ◽  
Vol 2014 ◽  
pp. 1-8 ◽  
Author(s):  
Yan Zhang ◽  
Zhen-min Tang ◽  
Yan-ping Li ◽  
Yang Luo

Accurate and effective voice activity detection (VAD) is a fundamental step for robust speech or speaker recognition. In this study, we propose a hierarchical framework for VAD and speech enhancement. A modified Wiener filter (MWF) is used for noise reduction in the speech enhancement block. In the feature selection and voting block, several discriminating features are combined in a voting paradigm chosen for reliability and discriminative power. The proposed approach is evaluated against other VAD techniques on two well-known databases, TIMIT and NOISEX-92. Experimental results show that the proposed method performs well under a variety of noisy conditions.
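
The voting idea can be illustrated with a small majority-vote detector. The abstract does not list the features used, so the three below (short-time energy, zero-crossing rate, peak amplitude) and all thresholds are stand-ins chosen for illustration, and the Wiener-filter enhancement stage is omitted.

```python
import math


def frame_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def vote_vad(frame, energy_thr=0.01, zcr_thr=0.3, peak_thr=0.2):
    """Label a frame as speech when at least two of three features agree."""
    votes = [
        frame_energy(frame) > energy_thr,
        zero_crossing_rate(frame) < zcr_thr,
        max(abs(x) for x in frame) > peak_thr,
    ]
    return sum(votes) >= 2


# A loud low-frequency tone (speech-like) and a faint alternating signal (noise-like).
tone = [0.5 * math.sin(2.0 * math.pi * 5.0 * n / 100.0) for n in range(100)]
hiss = [0.01 if n % 2 == 0 else -0.01 for n in range(100)]
decisions = (vote_vad(tone), vote_vad(hiss))
```

A vote tolerates one unreliable feature per frame, which is the kind of robustness a voting paradigm aims for under noise.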


2013 ◽  
Vol 26 (3) ◽  
pp. 215-225 ◽  
Author(s):  
Mitar Milacic ◽  
Sima Dimitrijev

Current research into classification methods is almost exclusively software-based, resulting in systems that perform well but are invariably slow when faced with large databases. The goal is therefore to create a hardware classification system that is much faster. In this paper, we introduce the concept of template matching logic and propose the use of a standard flash memory cell array to perform bit-by-bit template matching. The proposed system is based on a novel architecture that is distinct from existing architectures that make use of flash memory cell arrays. Verification is achieved by speech recognition simulations on the TIMIT database. Simulations of the system show 94.5% recognition accuracy on clean words and 88.0% recognition accuracy on test words with a signal-to-noise ratio of 5 dB. The results compare favorably with similar isolated word recognition tasks performed with software-based methods.
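
A software analogue of bit-by-bit template matching is a nearest-template search under Hamming distance, sketched below. This only illustrates the matching principle; it says nothing about the flash-cell architecture, and the 8-bit templates and labels are invented.

```python
def hamming_distance(a, b):
    """Number of bit positions in which two equal-width bit patterns differ."""
    return bin(a ^ b).count("1")


def best_template(query, templates):
    """Return the label of the stored template closest to the query."""
    return min(templates, key=lambda label: hamming_distance(query, templates[label]))


# Hypothetical binarized feature patterns for two words.
templates = {"yes": 0b10110010, "no": 0b01001101}
query = 0b10110110  # differs from the "yes" template in one bit
label = best_template(query, templates)
```

In software this search grows with the number of stored templates, which is the cost a hardware array that compares against many templates at once is meant to avoid.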

