Fast and Effective Copy-Move Detection of Digital Audio Based on Auto Segment

Detecting digital audio forgeries is a significant research focus in audio forensics. In this article, the authors focus on a special form of digital audio forgery, copy-move, and propose a fast and effective method to detect doctored audio. First, the input audio is segmented into syllables by voice activity detection and syllable detection. Second, the discrete Fourier transform (DFT) is applied to each audio segment, and points in the frequency domain are selected as its features. The segments are then sorted according to these features to obtain a sorted list of audio segments. Finally, each segment is compared only with a few adjacent segments in the sorted list, which reduces the time complexity. Comparisons with other state-of-the-art methods show that the proposed method can verify the authenticity of the input audio and locate the forged position quickly and effectively.
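
The sort-then-compare-neighbours idea described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes segmentation has already been done, and the function names, the choice of the first few DFT magnitudes as the feature, the Euclidean distance, and the parameters n_bins, window and tol are all assumptions, since the abstract does not specify them.

```python
import cmath
import math


def dft_magnitudes(segment, n_bins=8):
    """Magnitudes of the first n_bins DFT coefficients (naive DFT, assumed feature)."""
    n = len(segment)
    mags = []
    for k in range(n_bins):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(segment))
        mags.append(abs(coeff))
    return mags


def find_copy_move(segments, n_bins=8, window=2, tol=1e-6):
    """Return index pairs of segments whose features nearly coincide.

    Segments are sorted by feature vector, and each one is compared only
    with the next `window` entries of the sorted list rather than with
    every other segment.
    """
    feats = [dft_magnitudes(s, n_bins) for s in segments]
    # Lexicographic sort of feature vectors; near-duplicates end up close together.
    order = sorted(range(len(segments)), key=lambda i: feats[i])
    pairs = []
    for pos, i in enumerate(order):
        for j in order[pos + 1: pos + 1 + window]:
            if math.dist(feats[i], feats[j]) <= tol:
                pairs.append((min(i, j), max(i, j)))
    return sorted(pairs)
```

With n segments, the sort costs O(n log n) comparisons and the neighbour scan costs O(n * window), compared with O(n^2) for an all-pairs comparison. A lexicographic sort only places exact or near-exact duplicates next to each other, so a real detector would likely need a more robust feature and similarity measure.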

Digital Forensics and Forensic Investigations, 2020, pp. 127-142. DOI: 10.4018/978-1-7998-3025-2.ch011

Author(s): Xinchao Huang, Zihan Liu, Wei Lu, Hongmei Liu, Shijun Xiang

Keyword(s): Fourier Transform, Special Form, Time Complexity, Voice Activity Detection, Digital Audio, Audio Forensics, Significant Research, Audio Data, Adjacent Segments, Voice Activity

EVALUATION AND COMPARISON USING ACTIVITY SIGNALS OF SPEECH METHODS IN RIVER PLATE SPANISH USING BEPPA CORPUS

Revista de Investigaciones Universidad del Quindío, 2016, Vol 28 (1), pp. 138-144. DOI: 10.33975/riuq.vol28n1.43

Author(s): Horderlin Vrangel Robles, Valentin Molina, Luis Martinez, Hermann Davila

Keyword(s): State Of The Art, Detection Algorithm, Spectral Energy, Activity Detection, Endpoint Detection, Zero Crossing, Order Difference, Rate Method, Short Time, Voice Activity

Several algorithms that use basic signal processing methods for voice activity detection (VAD) were compared in order to assess their effectiveness. The algorithms presented in this article are the short-time or spectral energy based endpoint detection algorithm, the zero crossing rate method, and the High Order Difference (HOD) method. First, the concept of VAD is introduced, along with the need to apply such algorithms to River Plate Spanish. Then the state-of-the-art techniques and algorithms for detecting voice activity are summarized, together with the evidence and experiments used to implement the algorithms with the BEPPA corpus (Evaluation Battery for Patients with Auditive Prostheses; BEPPA is the Spanish acronym).
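
Two of the basic measures named in this abstract, short-time energy and zero crossing rate, can be sketched as below. This is a generic textbook-style illustration rather than the algorithms evaluated in the article: the function names, frame length, hop size and energy threshold are assumptions, and the HOD method is not shown.

```python
def frame_signal(samples, frame_len, hop):
    """Split a sample list into frames of frame_len samples, advancing by hop."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of consecutive sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def energy_vad(samples, frame_len=160, hop=80, energy_thresh=0.01):
    """Mark a frame as active when its energy exceeds a fixed threshold.

    The threshold is an assumed value; practical detectors usually
    estimate it from the background noise level.
    """
    return [short_time_energy(f) > energy_thresh
            for f in frame_signal(samples, frame_len, hop)]
```

Energy tends to separate voiced speech from silence, while the zero crossing rate is typically higher for unvoiced sounds and noise, which is why endpoint detectors often combine the two.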

Voice activity detection method based on inter-frame correlation

Journal of Computer Applications, 2011, Vol 31 (5), pp. 1447-1449. DOI: 10.3724/sp.j.1087.2011.01447

Author(s): Yu LI, Lei-yong GUO, Hong-zhou TAN

Keyword(s): Detection Method, Voice Activity Detection, Activity Detection, Voice Activity, Inter Frame

Robust speaker recognition based on level-building voice activity detection

Journal of Shenzhen University Science and Engineering, 2012, Vol 29 (4), pp. 328-334. DOI: 10.3724/sp.j.1249.2012.04328

Author(s): Yan-lu XIE, Jing-song ZHANG, Ming-hui LIU, Zhong-wei HUANG

Keyword(s): Speaker Recognition, Voice Activity Detection, Activity Detection, Robust Speaker Recognition, Level Building, Voice Activity

Joint Learning Using Denoising Variational Autoencoders for Voice Activity Detection

2018. DOI: 10.21437/interspeech.2018-1151. Cited By ~ 9

Author(s): Youngmoon Jung, Younggwan Kim, Yeunju Choi, Hoirin Kim

Keyword(s): Voice Activity Detection, Activity Detection, Joint Learning, Voice Activity

Optimizing Voice Activity Detection for Noisy Conditions

2019. DOI: 10.21437/interspeech.2019-1776

Author(s): Ruixi Lin, Charles Costello, Charles Jankowski, Vishwas Mruthyunjaya

Keyword(s): Voice Activity Detection, Activity Detection, Noisy Conditions, Voice Activity

Linear detector and neural networks in cascade for voice activity detection in hearing aids

Applied Acoustics, 2021, Vol 175, pp. 107832. DOI: 10.1016/j.apacoust.2020.107832

Author(s): Joaquín García-Gómez, Roberto Gil-Pita, Miguel Aguilar-Ortega, Manuel Utrilla-Manso, Manuel Rosa-Zurera, ...

Keyword(s): Neural Networks, Hearing Aids, Voice Activity Detection, Activity Detection, Linear Detector, Voice Activity

Improving Semi-Supervised Learning for Audio Classification with FixMatch

Electronics, 2021, Vol 10 (15), pp. 1807. DOI: 10.3390/electronics10151807

Author(s): Sascha Grollmisch, Estefanía Cano

Keyword(s): Neural Networks, Supervised Learning, Transfer Learning, Data Transfer, State Of The Art, Training Data, Audio Classification, Image Domain, Full Dataset, Audio Data

Including unlabeled data in the training process of neural networks using Semi-Supervised Learning (SSL) has shown impressive results in the image domain, where state-of-the-art results were obtained with only a fraction of the labeled data. What recent SSL methods have in common is that they rely strongly on the augmentation of unannotated data. This is largely unexplored for audio data. In this work, SSL using the state-of-the-art FixMatch approach is evaluated on three audio classification tasks, covering music, industrial sounds, and acoustic scenes. The performance of FixMatch is compared to Convolutional Neural Networks (CNN) trained from scratch, Transfer Learning, and SSL using the Mean Teacher approach. Additionally, a simple yet effective approach for selecting suitable augmentation methods for FixMatch is introduced. FixMatch with the proposed modifications always outperformed Mean Teacher and the CNNs trained from scratch. For the industrial sounds and music datasets, the CNN baseline performance using the full dataset was reached with less than 5% of the initial training data, demonstrating the potential of recent SSL methods for audio data. Transfer Learning outperformed FixMatch only for the most challenging dataset from acoustic scene classification, showing that there is still room for improvement.
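
The way FixMatch uses augmented unlabeled data can be illustrated with its unlabeled loss term. The sketch below follows the general FixMatch recipe (a pseudo-label from a weakly augmented view, applied to a strongly augmented view only when the prediction is confident). It is not the modified version proposed in this paper, and the function name, the plain-list inputs and the threshold of 0.95 are assumptions made for illustration.

```python
import math


def fixmatch_unlabeled_loss(weak_probs, strong_probs, threshold=0.95):
    """Confidence-masked cross-entropy on unlabeled examples.

    weak_probs[i] and strong_probs[i] are class-probability lists predicted
    for the weakly and strongly augmented views of the same example.
    """
    total = 0.0
    for weak, strong in zip(weak_probs, strong_probs):
        confidence = max(weak)
        if confidence >= threshold:
            # Use the confident weak-view prediction as a hard pseudo-label.
            pseudo_label = weak.index(confidence)
            total += -math.log(strong[pseudo_label])
    # Average over the whole batch, so masked-out examples contribute zero.
    return total / len(weak_probs)
```

For example, with weak predictions [[0.97, 0.03], [0.6, 0.4]] only the first example passes the threshold, so the second contributes nothing to the loss regardless of its strong-view prediction.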
