Development of Spectral Speech Features for Deception Detection Using Neural Networks

Author(s):  
Sinead V. Fernandes ◽  
Muhammad S. Ullah
Symmetry ◽  
2019 ◽  
Vol 11 (4) ◽  
pp. 525 ◽  
Author(s):  
SON ◽  
KWON ◽  
PARK

Automatic gender classification in speech is a challenging research field with a wide range of applications in human-computer interaction (HCI). A couple of decades of research have shown promising results, but there is still room for improvement. Until now, gender classification has relied on differences in the spectral characteristics of male and female speech. We assumed that a neutral margin exists between the male and female spectral ranges and that this margin causes gender misclassification. To address this limitation, we studied three non-lexical speech features (fillers, overlapping, and lengthening). Statistical analysis showed that overlapping and lengthening are effective for gender classification. Next, we performed gender classification using overlapping, lengthening, and the baseline acoustic feature, the Mel Frequency Cepstral Coefficient (MFCC). We sought the best results by applying various combinations of features either simultaneously or sequentially. We used two types of machine-learning methods, support vector machine (SVM) and recurrent neural network (RNN), to classify gender. We achieved an accuracy of 89.61% with the RNN using a feature set that included MFCC, overlapping, and lengthening simultaneously. We also reclassified, using non-lexical features, only the data belonging to the neutral margin, which was selected empirically based on the results of gender classification with MFCC alone. As a result, the classification accuracy of the RNN using lengthening was 1.83% better than when MFCC alone was used. We concluded that new speech features could be effective in improving gender classification through a behavioral approach, notably in settings such as emergency calls.
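The sequential use of features described in this abstract can be illustrated with a minimal sketch. It assumes the neutral margin is modeled as a band of first-stage probabilities around 0.5. The abstract says only that the margin was selected empirically, so the function name `two_stage_predict`, the `margin` parameter, and the 0.5 threshold are all hypothetical and not the authors' procedure.

```python
def two_stage_predict(stage1_probs, stage2_probs, margin=0.1):
    """Return 0/1 labels, deferring to a second classifier inside a neutral band.

    stage1_probs: P(class 1) from a first classifier (e.g. MFCC-based).
    stage2_probs: P(class 1) from a second classifier (e.g. lengthening-based).
    margin: hypothetical half-width of the neutral band around 0.5.
    """
    labels = []
    for p1, p2 in zip(stage1_probs, stage2_probs):
        # Samples the first stage is unsure about fall inside the band.
        in_margin = abs(p1 - 0.5) <= margin
        p = p2 if in_margin else p1
        labels.append(1 if p >= 0.5 else 0)
    return labels


if __name__ == "__main__":
    # The two middle samples fall in the band and take the second-stage decision.
    print(two_stage_predict([0.9, 0.55, 0.45, 0.1], [0.2, 0.3, 0.8, 0.9]))
```

In this sketch, confident first-stage decisions are left untouched, so the second feature set only affects the samples most likely to be misclassified.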


2021 ◽  
Vol 11 (14) ◽  
pp. 6393
Author(s):  
Ascensión Gallardo-Antolín ◽  
Juan M. Montero

The automatic detection of deceptive behaviors has recently attracted the attention of the research community due to the variety of areas where it can play a crucial role, such as security or criminology. This work is focused on the development of an automatic deception detection system based on gaze and speech features. The first contribution of our research on this topic is the use of attention Long Short-Term Memory (LSTM) networks for single-modal systems with frame-level features as input. In the second contribution, we propose a multimodal system that combines the gaze and speech modalities into the LSTM architecture using two different combination strategies: Late Fusion and Attention-Pooling Fusion. The proposed models are evaluated over the Bag-of-Lies dataset, a multimodal database recorded in real conditions. On the one hand, results show that attentional LSTM networks are able to adequately model the gaze and speech feature sequences, outperforming a reference Support Vector Machine (SVM)-based system with compact features. On the other hand, both combination strategies produce better results than the single-modal systems and the multimodal reference system, suggesting that gaze and speech modalities carry complementary information for the task of deception detection that can be effectively exploited by using LSTMs.
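The two ideas named in this abstract, attention pooling over frame-level features and late fusion of modalities, can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' architecture: in the paper the attention scores come from a trained LSTM-based model, whereas here they are plain inputs, and the function names and the `w_gaze` weight are hypothetical.

```python
import math


def attention_pool(frames, scores):
    """Collapse a sequence of frame vectors into one vector.

    The weights are softmax(scores); in a trained model the scores would be
    learned, but here they are supplied by the caller (assumption).
    """
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(frames[0])
    return [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]


def late_fusion(p_gaze, p_speech, w_gaze=0.5):
    """Combine per-modality deception probabilities after each model has scored."""
    return w_gaze * p_gaze + (1.0 - w_gaze) * p_speech


if __name__ == "__main__":
    print(attention_pool([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))
    print(late_fusion(0.8, 0.4))
```

Late fusion keeps the two single-modal models independent and merges only their outputs, while attention pooling operates on the frame-level sequences themselves.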


2004 ◽  
Vol 18 (1) ◽  
pp. 13-26 ◽  
Author(s):  
Antoinette R. Miller ◽  
J. Peter Rosenfeld

University students were screened using items from the Psychopathic Personality Inventory and divided into high (n = 13) and low (n = 11) Psychopathic Personality Trait (PPT) groups. The P300 component of the event-related potential (ERP) was recorded as each group completed a two-block autobiographical oddball task, responding honestly during the first (Phone) block, in which oddball items were participants' home phone numbers, and then feigning amnesia in response to approximately 50% of items in the second (Birthday) block, in which oddball items were participants' birthdates. Bootstrapping of peak-to-peak amplitudes correctly identified 100% of low PPT and 92% of high PPT participants as having intact recognition. Both groups demonstrated malingering-related P300 amplitude reduction. For the first time, P300 amplitude and topography differences were observed between honest and deceptive responses to Birthday items. No main between-group P300 effects resulted. Post-hoc analysis revealed between-group differences in a frontally located post-P300 component. Honest responses were associated with late frontal amplitudes larger than deceptive responses at frontal sites in the low PPT group only.

