Study on Characteristic Parameters of Speech Anti-Deliberate Imitation System

2013 ◽  
Vol 475-476 ◽  
pp. 388-393
Author(s):  
Ping Zhou ◽  
Jing Jing Ke ◽  
Xin Xing Jing ◽  
Zhao Guo Cui

Deliberate imitation, the reproduction of another speaker's voice and speech behavior, can pose a threat to the security of voice authentication systems. Effective characteristic parameters are therefore the key to resisting deliberate imitation. The study used an anti-deliberate-imitation speech database and investigated the separating capacity and descriptive power of several common feature parameters against deliberate voice imitation. It compared the imitators' ranking from subjective evaluation with their ranking by the Euclidean distance of the feature parameters. The comparison indicates that Mel-frequency cepstral coefficients (MFCC) combined with their differential cepstrum parameters (WMFCC) perform best against deliberate imitation. These results were validated by an experiment with a VQ-based speaker verification system.
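
As an illustration of the kind of comparison described above, the sketch below appends first-order differential (delta) coefficients to precomputed cepstral frames using the standard regression formula, then ranks imitators by the Euclidean distance between their average feature vector and the target speaker's. It is a minimal sketch under stated assumptions: MFCC frames are assumed to be already extracted, the regression window N = 2 is assumed, and the function names (`add_deltas`, `rank_imitators`) and toy data are hypothetical. The paper's exact WMFCC definition and distance procedure may differ.

```python
import math
from typing import Dict, List

Frame = List[float]


def add_deltas(frames: List[Frame], window: int = 2) -> List[Frame]:
    """Append first-order delta (differential) coefficients to each frame.

    d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2),
    with frame indices clamped at the start and end of the sequence.
    """
    denom = 2 * sum(n * n for n in range(1, window + 1))
    last = len(frames) - 1
    out: List[Frame] = []
    for t, frame in enumerate(frames):
        delta = []
        for k in range(len(frame)):
            acc = 0.0
            for n in range(1, window + 1):
                acc += n * (frames[min(t + n, last)][k] - frames[max(t - n, 0)][k])
            delta.append(acc / denom)
        out.append(frame + delta)  # static coefficients followed by deltas
    return out


def mean_vector(frames: List[Frame]) -> Frame:
    """Average the feature vectors over all frames of one utterance."""
    dim = len(frames[0])
    return [sum(f[k] for f in frames) / len(frames) for k in range(dim)]


def euclidean(a: Frame, b: Frame) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_imitators(target: List[Frame], imitators: Dict[str, List[Frame]]) -> List[str]:
    """Hypothetical helper: rank imitators from closest to farthest from the target."""
    ref = mean_vector(add_deltas(target))
    dists = {name: euclidean(ref, mean_vector(add_deltas(frames)))
             for name, frames in imitators.items()}
    return sorted(dists, key=dists.get)
```

For example, with a target of `[[1.0, 0.5], [1.2, 0.4], [1.1, 0.6]]`, an imitator `"A"` whose frames stay near those values is ranked ahead of an imitator `"B"` whose frames sit around `[3.5, 2.5]`. Comparing such a distance-based ranking against listeners' subjective ranking is the kind of check the abstract describes.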

2006 ◽  
Vol 65 (5) ◽  
pp. 427-439
Author(s):  
O. N. Katkov ◽  
V. A. Pimenov ◽  
A. P. Ryzhkov

Author(s):  
Sonal Anilkumar Tiwari

Abstract: It is quite interesting to think that we can give commands to inanimate objects, and this is possible with the help of automatic speech recognition (ASR) systems. A speech recognition system enables humans to talk to machines. Speech recognition has become so widespread that many people can hardly carry out their daily work without it, and people have become dependent on it. For example, when we want to type something on a mobile phone, we can simply issue a voice command instead, which reduces both our effort and the time we spend.
Keywords: Speech, Speech Recognition, ASR, Corpus, PRAAT


THE BULLETIN ◽  
2020 ◽  
Vol 5 (387) ◽  
pp. 6-15
Author(s):  
O. Mamyrbayev ◽  
A. Akhmediyarova ◽  
A. Kydyrbekova ◽  
N. O. Mekebayev ◽  
...  

Biometrics offers more security and convenience than traditional methods of identification. Recently, DNNs have become a means of building more reliable and efficient authentication schemes. In this work, we compare two modern approaches: one based on the Gaussian mixture model (GMM), denoted GMM i-vector, and one based on deep neural networks (DNN), denoted DNN i-vector. The results show that the DNN i-vector system outperforms the GMM i-vector system across utterance durations from full length down to 5 s. In recent studies, DNN-derived features have proven to be the most effective for text-independent speaker verification. In this paper, a new scheme is proposed that allows DNNs to be used, in a simple and effective way, for speaker verification with text prompts. Experiments show that the proposed scheme reduces EER by 24.32% compared with a state-of-the-art method, and its robustness is evaluated on noisy data as well as on data collected in real conditions. In addition, it is shown that using a DNN instead of a GMM for universal background modeling decreases EER by 15.7%.
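
The equal error rate (EER) is the operating point at which the false-acceptance rate (FAR) and false-rejection rate (FRR) are equal. A minimal sketch of estimating it from verification scores by sweeping candidate thresholds is given below, together with the relative-reduction arithmetic behind figures such as "reduces EER by 24.32%". The function names and scores are hypothetical, and reading the reported reductions as relative rather than absolute is an assumption.

```python
from typing import List, Tuple


def compute_eer(genuine: List[float], impostor: List[float]) -> Tuple[float, float]:
    """Estimate the EER by trying every observed score as a threshold.

    A trial is accepted when its score >= threshold. Returns (eer, threshold)
    at the threshold where |FAR - FRR| is smallest, taking the EER as the
    mean of FAR and FRR at that point.
    """
    best = (float("inf"), 0.0, 0.0)  # (|FAR - FRR|, eer, threshold)
    for thr in sorted(set(genuine) | set(impostor)):
        far = sum(s >= thr for s in impostor) / len(impostor)  # impostors accepted
        frr = sum(s < thr for s in genuine) / len(genuine)     # genuine rejected
        gap = abs(far - frr)
        if gap < best[0]:
            best = (gap, (far + frr) / 2, thr)
    return best[1], best[2]


def relative_reduction(baseline_eer: float, new_eer: float) -> float:
    """Relative EER reduction in percent (e.g. 10.0 -> 7.568 gives 24.32)."""
    return 100.0 * (baseline_eer - new_eer) / baseline_eer
```

A threshold sweep over observed scores is the simplest estimator; with few trials it is coarse, and DET-curve interpolation is a common refinement.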


2020 ◽  
Vol 8 (5) ◽  
pp. 3676-3680

This paper aims to extract robust dynamic features for spoofing detection and countermeasures in automatic speaker verification (ASV) systems. ASV is a biometric person authentication system, and researchers are developing spoofing detection and countermeasure techniques to protect it against different spoofing attacks. Here the replay attack is considered, because recording devices are so widely accessible. In replay spoofing, speech utterances of target (genuine) speakers are recorded and played back to the ASV system to gain unauthorized access. As a first step, different dynamic features are extracted from each speech sample using the MFCC, LFCC, and MGDCC feature extraction techniques. As a second step, a classifier decides whether the given speech sample is genuine or not; a GMM with a universal background model is used as the classifier. In this work, a GMM-based ASV system and countermeasure systems using different feature extraction techniques are developed, and their performance is evaluated using the equal error rate (EER) and the tandem detection cost function (t-DCF). Based on these performance values, the best feature extraction technique is selected.
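
In GMM-based countermeasures of this kind, a common decision rule scores each utterance by the average per-frame log-likelihood ratio between a model of genuine speech and a model of spoofed (replayed) speech. The sketch below shows only that scoring step for diagonal-covariance GMMs whose parameters are assumed to be already estimated. Training (EM, or MAP adaptation from a UBM) is omitted, the class and function names are hypothetical, and the paper's exact back-end may differ.

```python
import math
from dataclasses import dataclass
from typing import List

Vector = List[float]


@dataclass
class DiagGMM:
    weights: List[float]     # mixture weights, assumed to sum to 1
    means: List[Vector]      # one mean vector per component
    variances: List[Vector]  # one diagonal variance vector per component

    def log_likelihood(self, x: Vector) -> float:
        """log p(x) under the mixture, combined with log-sum-exp for stability."""
        comp_logs = []
        for w, mu, var in zip(self.weights, self.means, self.variances):
            lp = math.log(w)
            for xi, m, v in zip(x, mu, var):
                lp -= 0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
            comp_logs.append(lp)
        peak = max(comp_logs)
        return peak + math.log(sum(math.exp(lp - peak) for lp in comp_logs))


def llr_score(frames: List[Vector], genuine: DiagGMM, spoof: DiagGMM) -> float:
    """Average per-frame log-likelihood ratio; positive values favour 'genuine'."""
    total = sum(genuine.log_likelihood(f) - spoof.log_likelihood(f) for f in frames)
    return total / len(frames)


def is_genuine(frames: List[Vector], genuine: DiagGMM, spoof: DiagGMM,
               threshold: float = 0.0) -> bool:
    """Accept the utterance as genuine when the LLR reaches the threshold."""
    return llr_score(frames, genuine, spoof) >= threshold
```

In practice the frames would be the MFCC, LFCC, or MGDCC vectors of one utterance. The resulting scores feed directly into EER and t-DCF computation.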


2020 ◽  
Author(s):  
Susanto Susanto ◽  
Wang Zhenhua ◽  
Wang Yingli ◽  
Deri Sis Nanda

In the provision of linguistic evidence, one of the foci of Forensic Linguistics, Forensic Speaker Verification (FSV) includes the analysis of speech recordings to verify the voice of a criminal. As an inquiry into the validity of the available FSV, we present an analysis of the Indonesian FSV system. The system consists of pairing, tagging, acoustic feature extraction, and statistical analysis, and it has been claimed that it meets the requirements for presenting legal evidence in Indonesian courts. One of the acoustic features the system extracts from the speech data is fundamental frequency (F0). This paper therefore reviews the method used in the Indonesian FSV system with respect to the discriminatory potential of F0. The results show that F0 does not provide an adequate interpretation of the linguistic evidence in our experimental data. This leads us to suggest that more experimental studies are required to scrutinize F0 in the system.


2022 ◽  
pp. 61-77
Author(s):  
Jie Lien ◽  
Md Abdullah Al Momin ◽  
Xu Yuan

Voice assistant systems (e.g., Siri, Alexa) have attracted wide research attention. However, such systems may receive voice input from malicious sources. Recent work has demonstrated that voice authentication systems are vulnerable to different types of attacks, which fall into two main categories: spoofing attacks and hidden voice commands. This chapter explores how such attacks are launched and how to defend against them. There are four main types of spoofing attack: replay attacks, impersonation attacks, speech synthesis attacks, and voice conversion attacks. Although these attacks can succeed against speech recognition systems, they can easily be identified by humans. Hidden voice commands have therefore attracted considerable research interest in recent years.


2014 ◽  
Vol 543-547 ◽  
pp. 2192-2195 ◽  
Author(s):  
Chen Chen Huang ◽  
Wei Gong ◽  
Wen Long Fu ◽  
Dong Yu Feng

As the most important medium of human communication, speech carries abundant emotional information. In recent years, automatically recognizing a speaker's emotional state from speech has attracted extensive attention from researchers in various fields. In this paper, we studied a method of speech emotion recognition. We collected a total of 360 sentences from four speakers expressing four emotions (happiness, anger, surprise, and sadness) and extracted eight emotional characteristics from these voice data. A contribution analysis method is proposed to determine the value of each emotional characteristic parameter. We then used weighted Euclidean distance template matching to recognize speech emotion and obtained an average recognition rate of more than 80%.
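
Weighted Euclidean distance template matching can be sketched as follows. Each emotion is represented by a template (here, the mean feature vector of its training utterances), and a test utterance is assigned to the emotion whose template is nearest under d(x, t) = sqrt(sum_i w_i (x_i - t_i)^2). The function names, the three-dimensional toy features, and the weights are hypothetical: in the paper the weights would come from the contribution analysis, and eight characteristics are used.

```python
import math
from typing import Dict, List


def weighted_euclidean(x: List[float], t: List[float], w: List[float]) -> float:
    """d(x, t) = sqrt(sum_i w_i * (x_i - t_i)^2)."""
    return math.sqrt(sum(wi * (xi - ti) ** 2 for xi, ti, wi in zip(x, t, w)))


def build_templates(training: Dict[str, List[List[float]]]) -> Dict[str, List[float]]:
    """Use each emotion's mean feature vector as its template."""
    templates = {}
    for emotion, vectors in training.items():
        dim = len(vectors[0])
        templates[emotion] = [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]
    return templates


def classify(x: List[float], templates: Dict[str, List[float]], weights: List[float]) -> str:
    """Return the emotion whose template is closest to x under the weighted distance."""
    return min(templates, key=lambda e: weighted_euclidean(x, templates[e], weights))
```

The weights also compensate for features measured on very different scales (for example, a pitch value in hertz next to a normalized energy value), which an unweighted Euclidean distance would not.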


2020 ◽  
Vol 90 (17-18) ◽  
pp. 2085-2096 ◽  
Author(s):  
Xiaorui Hu ◽  
Fengxin Sun ◽  
Qicai Wang ◽  
Weidong Gao

Wrinkling is one of the most common flaws of woven fabrics in domestic use and industrial applications, so an objective evaluation method is needed to quantify the smoothness appearance of fabrics effectively. Herein, a fabric multi-deformation tester (FMDT) was designed to evaluate the smoothness appearance of garment fabrics in a single sequential mechanical test, overcoming the main difficulties that existing visual measurement methods face in evaluating the wrinkling of fabrics with complex colors and patterns. The k-means clustering algorithm was used to cluster the fabric samples objectively based on characteristic parameters taken from the testing curve, namely the wrinkle-induced residual force (F_wr), hysteresis distance (H_fr), position deflection (D_fr), and stretching recovery slope (S_tr), together with the thickness and weight of the fabrics; the results were also compared with subjective evaluation. The results reveal that k-means clustering can classify the smoothness appearance of fabrics using the selected characteristic parameters, showing good consistency with the subjective clustering results. This demonstrates the feasibility of using the mechanical and deformation properties of textiles to characterize fabric smoothness appearance, and the FMDT offers a potential method for conveniently analyzing the wrinkling of fibrous materials.
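
A minimal k-means sketch over per-sample feature vectors is given below. Because parameters such as F_wr, H_fr, D_fr, S_tr, thickness, and weight are on very different scales, the sketch standardizes each feature (z-score) before clustering. That preprocessing step, the deterministic farthest-point initialization, the function names, and the toy data are assumptions, not details reported in the abstract.

```python
import math
from typing import List, Tuple

Vector = List[float]


def standardize(data: List[Vector]) -> List[Vector]:
    """Z-score each feature column so no single parameter dominates the distance."""
    dim, n = len(data[0]), len(data)
    means = [sum(row[k] for row in data) / n for k in range(dim)]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((row[k] - means[k]) ** 2 for row in data) / n) or 1.0
            for k in range(dim)]
    return [[(row[k] - means[k]) / stds[k] for k in range(dim)] for row in data]


def kmeans(data: List[Vector], k: int, iters: int = 100) -> Tuple[List[int], List[Vector]]:
    """Lloyd's k-means with deterministic farthest-point initialization."""
    def dist2(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Start from the first sample, then repeatedly add the sample farthest
    # from all centroids chosen so far.
    centroids = [data[0]]
    while len(centroids) < k:
        centroids.append(max(data, key=lambda p: min(dist2(p, c) for c in centroids)))

    labels = [-1] * len(data)
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: dist2(p, centroids[j])) for p in data]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Cluster indices are arbitrary, so comparing them with subjective grades requires matching clusters to grades, for example by each cluster's majority grade.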


2014 ◽  
Vol 644-650 ◽  
pp. 4346-4350
Author(s):  
Hong En Xie ◽  
Qiang Li ◽  
Qin Jun Shu

To improve the utilization of transmission bandwidth in voice communication, this paper proposes a discontinuous transmission method for an LPC speech codec. First, a voice activity detection (VAD) algorithm divides the received signal into voice frames and mute frames, and a transitional frame is introduced where voice frames change to mute frames. Voice frames and transitional frames are then encoded at the normal rate, whereas mute frames are encoded at a lower rate as silence description (SID) frames, which are sent in discontinuous transmission mode. The transmission frequency of SID frames is adjusted automatically according to the fluctuation of the background-noise characteristic parameters in the mute frames. Finally, the method is applied in a simulation of the MELP vocoder. The results show that the method adapts better to the transmission of mute signals and that the synthesized background noise sounds comfortable and continuous.
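
The frame classification and SID scheduling described above can be sketched as a small state machine. The abstract specifies neither the VAD algorithm nor the exact rule for adapting SID frequency, so the energy-threshold VAD, the one-frame transitional period, the 3 dB noise-change criterion, the labels, and the function names below are all hypothetical simplifications. The MELP encoding itself is omitted.

```python
import math
from typing import List, Optional


def frame_energy_db(frame: List[float]) -> float:
    """Mean-square frame energy in dB (a small floor avoids log10(0))."""
    return 10 * math.log10(sum(s * s for s in frame) / len(frame) + 1e-12)


def dtx_schedule(frames: List[List[float]], vad_db: float = -30.0,
                 hangover: int = 1, sid_change_db: float = 3.0) -> List[str]:
    """Label each frame 'VOICE', 'TRANS', 'SID', or 'NONE' (nothing transmitted).

    - Frames above vad_db are voice frames, coded at the normal rate.
    - The first `hangover` mute frames after speech are transitional frames,
      also coded at the normal rate.
    - Other mute frames send a SID frame only when the background-noise energy
      has drifted by more than sid_change_db since the last SID frame.
    """
    labels: List[str] = []
    since_voice = hangover  # behave as if already in a long silence
    last_sid_db: Optional[float] = None
    for frame in frames:
        e = frame_energy_db(frame)
        if e > vad_db:
            labels.append("VOICE")
            since_voice = 0
            last_sid_db = None  # force a fresh SID at the next silence
        elif since_voice < hangover:
            labels.append("TRANS")
            since_voice += 1
        else:
            since_voice += 1
            if last_sid_db is None or abs(e - last_sid_db) > sid_change_db:
                labels.append("SID")
                last_sid_db = e
            else:
                labels.append("NONE")
    return labels
```

Tying SID updates to noise drift, rather than to a fixed period, mirrors the abstract's idea of adapting SID frequency to background-noise fluctuation: stationary noise costs almost no bandwidth, while changing noise is refreshed promptly.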

