Long-term variable Q transform: A novel time-frequency transform algorithm for synthetic speech detection

2021 ◽  
pp. 103256
Author(s):  
Jialong Li ◽  
Hongxia Wang ◽  
Peisong He ◽  
Sani M. Abdullahi ◽  
Bin Li
2021 ◽  
Vol 2021 (1) ◽  
Author(s):  
Clara Borrelli ◽  
Paolo Bestagini ◽  
Fabio Antonacci ◽  
Augusto Sarti ◽  
Stefano Tubaro

Several methods for synthetic speech generation have been developed over the years. With the great technological advances brought by deep learning, many novel synthetic speech techniques achieving incredibly realistic results have recently been proposed. As these methods generate convincing fake human voices, they can be used maliciously to harm today’s society (e.g., impersonation, fake news spreading, opinion formation). For this reason, the ability to detect whether a speech recording is synthetic or pristine is becoming an urgent necessity. In this work, we develop a synthetic speech detector. It takes an audio recording as input, extracts a series of hand-crafted features motivated by the speech-processing literature, and classifies them in either a closed-set or an open-set scenario. The proposed detector is validated on a publicly available dataset consisting of 17 synthetic speech generation algorithms, ranging from old-fashioned vocoders to modern deep learning solutions. Results show that the proposed method outperforms recently proposed detectors in the forensics literature.
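The difference between closed-set and open-set classification mentioned in the abstract can be illustrated with a minimal sketch. It uses a nearest-centroid classifier with a distance threshold for rejection, which is a stand-in chosen for brevity and not the classifier used by the authors. The feature vectors, the class names, and the `reject_above` threshold are all hypothetical.

```python
import math


def centroid(rows):
    """Mean feature vector of a list of equally sized feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(features, centroids, reject_above=None):
    """Return the nearest class; in open-set mode, reject distant samples."""
    label, dist = min(
        ((name, distance(features, c)) for name, c in centroids.items()),
        key=lambda pair: pair[1],
    )
    if reject_above is not None and dist > reject_above:
        return "unknown"  # open-set: sample matches no known class well
    return label


# Hypothetical two-dimensional features for two known classes
training = {
    "pristine": [[0.0, 0.0], [0.2, 0.0]],
    "vocoder": [[4.0, 4.0], [4.0, 4.2]],
}
centroids = {name: centroid(rows) for name, rows in training.items()}

far_sample = [10.0, 10.0]
closed_set_label = classify(far_sample, centroids)
open_set_label = classify(far_sample, centroids, reject_above=2.0)
```

In closed-set mode the distant sample is forced into the nearest known class, while in open-set mode it is flagged as coming from an unseen generator.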


2021 ◽  
Author(s):  
Denchai Worasawate ◽  
Warisara Asawaponwiput ◽  
Natsue Yoshimura ◽  
Apichart Intarapanich ◽  
Decho Surangsrirat

BACKGROUND Parkinson’s disease (PD) is a long-term neurodegenerative disease of the central nervous system. Diagnosis currently depends on clinical observation and on the abilities and experience of a trained specialist. One of the symptoms that affects most patients over the course of their illness is voice impairment. OBJECTIVE Voice is a non-invasive data source that can be collected remotely for diagnosis and disease progression monitoring. In this study, we analyzed voice recording data from a smartphone as a possible disease biomarker. The dataset is from one of the largest mobile PD studies, the mPower study. METHODS A total of 29,798 audio clips from 4,051 participants were used for the analysis. The voice recordings were of sustained phonation, with the participant saying /aa/ for ten seconds into the iPhone microphone. The audio samples were converted to spectrograms using a short-time Fourier transform. CNN models were then applied to classify the samples. CONCLUSIONS Classification accuracies of the proposed method with LeNet-5, ResNet-50, and VGGNet-16 are 97.7 ± 0.1%, 98.6 ± 0.2%, and 99.3 ± 0.1%, respectively. CLINICALTRIAL ClinicalTrials.gov NCT02696603; https://www.clinicaltrials.gov/ct2/show/NCT02696603
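The spectrogram step described in the methods can be sketched as follows. This is a minimal, standard-library-only illustration of a short-time Fourier transform, not the authors' pipeline: the frame length, hop size, Hann window, and the synthetic 1000 Hz tone standing in for a voice recording are all assumptions.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def stft_magnitude(signal, frame_len=64, hop=32):
    """Magnitude spectrogram: one list of bins 0..frame_len//2 per frame."""
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + i] * window[i] for i in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            # Naive DFT of the windowed frame at frequency bin k
            acc = sum(
                chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


# Synthetic stand-in for a recording: a 1000 Hz tone sampled at 8 kHz
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(512)]

spec = stft_magnitude(tone)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64
```

Each row of `spec` is one time frame, so the result can be treated as a two-dimensional image, which is the form a CNN would consume. A practical implementation would use an FFT routine in place of the naive DFT loop.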


2014 ◽  
Author(s):  
Jon Sanchez ◽  
Ibon Saratxaga ◽  
Inma Hernaez ◽  
Eva Navas ◽  
Daniel Erro

2019 ◽  
Vol 7 (8) ◽  
pp. 275 ◽  
Author(s):  
Picco ◽  
Schiano ◽  
Incardone ◽  
Repetti ◽  
Demarte ◽  
...  

A long-term time series of high-frequency sampled sea-level data collected in the port of Genoa was analyzed to detect the occurrence of meteotsunami events and to characterize them. Time-frequency analysis showed well-developed energy peaks in the 26–30 minute band, which are an almost permanent feature of the analyzed signal. The amplitude of these waves is generally a few centimeters but, in some cases, it can reach values comparable to or even greater than the local tidal elevation. In view of sea-level rise, their assessment can be relevant for sound coastal work planning and port management. The events with the highest energy were selected for detailed analysis, and their main features were identified and characterized by means of the wavelet transform. The most important one occurred on 14 October 2016, when the oscillations, generated by an abrupt jump in atmospheric pressure, reached a maximum wave height of 50 cm and lasted for about three hours.


Lubricants ◽  
2020 ◽  
Vol 8 (3) ◽  
pp. 29 ◽  
Author(s):  
Noushin Mokhtari ◽  
Jonathan Gerald Pelham ◽  
Sebastian Nowoisky ◽  
José-Luis Bote-Garcia ◽  
Clemens Gühmann

In this work, effective methods for monitoring friction and wear of journal bearings integrated in future UltraFan® jet engines containing a gearbox are presented. These methods are based on machine learning algorithms applied to Acoustic Emission (AE) signals. The three friction states of journal bearings, namely dry (boundary), mixed, and fluid friction, are classified by pre-processing the AE signals with windowing and high-pass filtering, extracting separation-effective features from the time, frequency, and time-frequency domains using the continuous wavelet transform (CWT), and using a Support Vector Machine (SVM) as the classifier. Furthermore, it is shown that journal bearing friction classification is possible not only under variable rotational speed and load, but also under different oil viscosities generated by varying oil inlet temperatures. A method for identifying the location of mixed friction events over the journal bearing circumference is also shown. The time-based AE signal is fused with the phase shift information of an incremental encoder to obtain an AE signal in the angle domain. The possibility of monitoring the run-in wear of journal bearings is investigated using the extracted separation-effective AE features. Validation was done by tactile roughness measurements of the surface. There is an obvious change in the AE features with increasing run-in wear. Furthermore, these investigations also show the opportunity to determine the friction intensity. Long-term wear investigations were carried out as long-term wear tests under constant rotational speeds, loads, and oil inlet temperatures. Roughness and roundness measurements were done in order to calculate the wear volume for validation. The integrated AE Root Mean Square (RMS) shows a good correlation with the journal bearing wear volume.
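The pre-processing and RMS steps named in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the one-pole high-pass filter, its `alpha` coefficient, the non-overlapping windows, and the reading of "integrated RMS" as a cumulative sum of per-window RMS values are all choices made here for the example.

```python
import math
from itertools import accumulate


def high_pass(samples, alpha=0.9):
    """One-pole high-pass filter: y[n] = alpha * (y[n-1] + x[n] - x[n-1])."""
    if not samples:
        return []
    out = [0.0]
    for n in range(1, len(samples)):
        out.append(alpha * (out[-1] + samples[n] - samples[n - 1]))
    return out


def windowed_rms(samples, window):
    """RMS of each consecutive, non-overlapping window.

    A trailing partial window is dropped.
    """
    values = []
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        values.append(math.sqrt(sum(x * x for x in chunk) / window))
    return values


# A constant offset carries no high-frequency content and is removed entirely
dc_only = high_pass([5.0] * 10)

# Hypothetical AE samples: a strong burst followed by a weaker one
rms_values = windowed_rms([3, -3, 3, -3, 1, -1, 1, -1], window=4)

# Assumed reading of "integrated RMS": running total of the per-window RMS
integrated = list(accumulate(rms_values))
```

The running total grows monotonically with accumulated AE activity, which makes it a natural quantity to compare against a cumulative measure such as wear volume.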


2018 ◽  
Vol 120 (3) ◽  
pp. 1451-1460 ◽  
Author(s):  
Sigge Weisdorf ◽  
Sirin W. Gangstad ◽  
Jonas Duun-Henriksen ◽  
Karina S. S. Mosholt ◽  
Troels W. Kjær

Subcutaneous recording using electroencephalography (EEG) has the potential to enable ultra-long-term epilepsy monitoring in real-life conditions because it allows the patient increased mobility and discretion. This study is the first to compare physiological and epileptiform EEG signals from subcutaneous and scalp EEG recordings in epilepsy patients. Four patients with probable or definite temporal lobe epilepsy were monitored with simultaneous scalp and subcutaneous EEG recordings. EEG recordings were compared by correlation and time-frequency analysis across an array of clinically relevant waveforms and patterns. We found high similarity between the subcutaneous EEG channels and nearby temporal scalp channels for most investigated electroencephalographic events. In particular, the temporal dynamics of one typical temporal lobe seizure in one patient were similar in scalp and subcutaneous recordings with regard to frequency distribution and morphology. Signal similarity is strongly related to the distance between the subcutaneous and scalp electrodes. On the basis of these limited data, we conclude that subcutaneous EEG recordings are very similar to scalp recordings in both the time and time-frequency domains if the distance between them is small. As many electroencephalographic events are local/regional, the positioning of the subcutaneous electrodes should be considered carefully to reflect the relevant clinical question. The impact of implantation depth of the subcutaneous electrode on recording quality should be investigated further. NEW & NOTEWORTHY This study is the first publication comparing the detection of clinically relevant, pathological EEG features from a subcutaneous recording system designed for out-patient ultra-long-term use to gold standard scalp EEG recordings. Our study shows that subcutaneous channels are very similar to comparable scalp channels, but also points out some issues yet to be resolved.
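The correlation-based comparison of two channels can be illustrated with a minimal sketch. It computes the Pearson correlation coefficient between two equally long sample sequences. The abstract does not state which correlation measure was used, so Pearson correlation is an assumption here, and the sample values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical samples: the second channel is a scaled copy of the first
scalp = [1.0, 2.0, 3.0, 4.0]
subcutaneous = [2.0, 4.0, 6.0, 8.0]
same_shape = pearson(scalp, subcutaneous)

# An inverted copy gives the opposite extreme
inverted = pearson(scalp, [8.0, 6.0, 4.0, 2.0])
```

Because the coefficient ignores amplitude scaling, it measures similarity of waveform shape, which suits a comparison between electrodes that may record the same event at different amplitudes.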


2020 ◽  
Vol 15 ◽  
pp. 2160-2170 ◽  
Author(s):  
Jichen Yang ◽  
Rohan Kumar Das ◽  
Haizhou Li
