Machine Learning–Based Splicing Detection in Digital Audio Recordings for Audio Forensics

2021 ◽  
Vol 69 (11) ◽  
pp. 793-804
Author(s):  
Rashmika Patole ◽  
Priti P. Rege

The field of audio forensics has advanced considerably in recent years, with a growing number of techniques used to analyze audio recordings submitted as evidence in legal investigations. Audio forensics involves authenticating evidentiary audio recordings, an important procedure for verifying their integrity. This chapter focuses on two audio authentication procedures: acoustic environment identification and tampering detection. The authors provide a framework for these procedures, discussing in detail the methodology and feature sets used in the two tasks. The main objective of the chapter is to introduce readers to different machine learning algorithms that can be used for environment identification and forgery detection. The authors also present promising results that demonstrate the utility of machine learning algorithms in this field.


2014 ◽  
Vol 283 ◽  
pp. 54-67
Author(s):  
Rafał Korycki ◽  

This work outlines the problem of detecting discontinuities in lossily compressed audio recordings and presents new methods for examining the authenticity of digital audio recordings. The described solutions are based on statistical analysis of data calculated from the values of MDCT coefficients. The resulting vectors, consisting of 228 features, were used as training sequences for two supervised machine learning algorithms: linear discriminant analysis (LDA) and the support vector machine (SVM). Detection of multiple compression was used both to detect modification of a recording and to reveal traces of montage in digital audio recordings. The effectiveness of the discontinuity detection algorithms was tested on a database of recorded music consisting of nearly one million MP3 files, specially prepared for this purpose. The results are discussed in the context of the influence of compression parameters on the ability to detect interference in digital audio recordings.


2019 ◽  
Vol 9 (15) ◽  
pp. 3097 ◽  
Author(s):  
Diego Renza ◽  
Jaime Andres Arango ◽  
Dora Maria Ballesteros

This paper addresses a problem in the field of audio forensics. With the aim of providing a solution that supports Chain of Custody (CoC) processes, we propose an integrity verification system that includes capture (mobile based), hash code calculation and cloud storage. When the audio is recorded, a hash code is generated in situ by the capture module (an application) and sent immediately to the cloud. Later, the integrity of the audio recording given as evidence can be verified against the information stored in the cloud. To validate the properties of the proposed scheme, we conducted several tests to evaluate whether two different inputs could generate the same hash code (collision resistance) and how much the hash code changes when small changes occur in the input (sensitivity analysis). According to the results, all selected audio signals produce different hash codes, and these values are very sensitive to small changes in the recorded audio. In terms of computational cost, less than 2 s per minute of recording is required to calculate the hash code. Given these results, our system is useful for verifying the integrity of audio recordings that may be relied on as digital evidence.
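The verification flow described above can be illustrated with a minimal sketch. The abstract does not name the hash function, so SHA-256 from Python's standard library is used here purely as an assumption; the function names `audio_hash`, `verify` and `bit_difference` and the synthetic "recording" are hypothetical and are not taken from the paper.

```python
import hashlib


def audio_hash(samples: bytes) -> str:
    """Hex digest of the raw audio bytes (SHA-256 is an assumed choice)."""
    return hashlib.sha256(samples).hexdigest()


def verify(samples: bytes, expected_digest: str) -> bool:
    """Check a recording against the digest stored at capture time."""
    return audio_hash(samples) == expected_digest


def bit_difference(hex_a: str, hex_b: str) -> int:
    """Number of differing bits between two equal-length hex digests."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


# Hypothetical recording: 16000 zero bytes standing in for captured audio.
original = bytes(16000)
stored_digest = audio_hash(original)  # what the capture module would upload

# Flip a single bit in the middle of the recording to simulate tampering.
tampered = bytearray(original)
tampered[8000] ^= 0x01
tampered = bytes(tampered)

print(verify(original, stored_digest))   # untouched evidence
print(verify(tampered, stored_digest))   # modified evidence
print(bit_difference(stored_digest, audio_hash(tampered)))
```

The last line reports how many of the 256 digest bits changed after a one-bit edit, which is the kind of quantity a sensitivity analysis would examine.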


Author(s):  
E. Yu. Shchetinin

The recognition of human emotions is one of the most relevant and rapidly developing areas of modern speech technology, and the recognition of emotions in speech (RER) is the most in-demand part of it. In this paper, we propose a computer model of emotion recognition based on an ensemble of a bidirectional recurrent neural network with LSTM memory cells and the deep convolutional neural network ResNet18. Experiments are carried out on the RAVDESS database of emotional speech, a data set containing 7356 files. Entries are labeled with the following emotions: 0 – neutral, 1 – calm, 2 – happiness, 3 – sadness, 4 – anger, 5 – fear, 6 – disgust, 7 – surprise. In total, the database contains 16 classes (8 emotions, each divided into male and female) with 1440 samples (speech only). To train machine learning algorithms and deep neural networks to recognize emotions, the audio recordings must be pre-processed to extract the main characteristic features of each emotion. This was done using Mel-frequency cepstral coefficients, chroma coefficients, and characteristics of the frequency spectrum of the recordings. Various neural network models for emotion recognition are studied on this data, and classical machine learning algorithms are used for comparative analysis. The following models were trained during the experiments: logistic regression (LR), a support vector machine (SVM) classifier, decision tree (DT), random forest (RF), gradient boosting over trees (XGBoost), a convolutional neural network (CNN, ResNet18), a recurrent neural network (RNN), and an ensemble of convolutional and recurrent networks (Stacked CNN-RNN). The results show that the neural networks achieved much higher accuracy in recognizing and classifying emotions than the classical machine learning algorithms. Of the three neural network models presented, the CNN + BLSTM ensemble showed the highest accuracy.
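To make the class structure concrete, the following sketch shows one possible way to encode the 16 classes (8 emotions, each split by speaker gender). The abstract does not specify the encoding, so the ordering of genders, the formula in `class_id` and the naming in `class_name` are hypothetical.

```python
# Emotion indices 0-7 as listed in the abstract.
EMOTIONS = ["neutral", "calm", "happiness", "sadness",
            "anger", "fear", "disgust", "surprise"]
GENDERS = ["female", "male"]  # ordering is an arbitrary assumption


def class_id(emotion_id: int, gender: str) -> int:
    """Map (emotion, gender) to a single label in the range 0..15."""
    return GENDERS.index(gender) * len(EMOTIONS) + emotion_id


def class_name(cid: int) -> str:
    """Inverse mapping, returning a readable label such as 'male_anger'."""
    gender_idx, emotion_idx = divmod(cid, len(EMOTIONS))
    return f"{GENDERS[gender_idx]}_{EMOTIONS[emotion_idx]}"


all_ids = [class_id(e, g) for g in GENDERS for e in range(len(EMOTIONS))]
print(len(set(all_ids)))        # number of distinct classes
print(class_name(class_id(4, "male")))
```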


Author(s):  
Christian Kraetzer ◽  
Andrea Oermann ◽  
Jana Dittmann ◽  
Andreas Lang

2019 ◽  
Vol 11 (2) ◽  
pp. 47-62 ◽  
Author(s):  
Xinchao Huang ◽  
Zihan Liu ◽  
Wei Lu ◽  
Hongmei Liu ◽  
Shijun Xiang

Detecting digital audio forgeries is a significant research focus in audio forensics. In this article, the authors focus on a special form of digital audio forgery, copy-move, and propose a fast and effective method to detect doctored audio. First, the input audio is segmented into syllables by voice activity detection and syllable detection. Second, the discrete Fourier transform (DFT) is applied to each audio segment, and points in the frequency domain are selected as features. Third, the segments are sorted according to these features. Finally, each segment is compared only with a few adjacent segments in the sorted list, which reduces the time complexity. Comparisons with other state-of-the-art methods show that the proposed method can verify the authenticity of the input audio and locate the forged position quickly and effectively.
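The sort-then-compare-neighbours idea can be sketched as follows. This is not the authors' implementation: segmentation by voice activity and syllable detection is skipped (segments are assumed to be given), the features are simply the first few DFT magnitudes, the sort is a plain lexicographic sort, and the names `dft_magnitudes`, `find_copy_move`, `window` and `tol` are hypothetical. A lexicographic sort only guarantees adjacency for exact or near-exact copies, so a robust detector would need a more careful ordering or similarity measure.

```python
import cmath
import math


def dft_magnitudes(segment, n_bins=8):
    """Magnitudes of the first n_bins DFT coefficients of a segment."""
    n = len(segment)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(segment)))
        for k in range(n_bins)
    ]


def find_copy_move(segments, window=1, tol=1e-6):
    """Return index pairs of segments whose features nearly coincide.

    Segments are sorted by feature vector, and each one is compared only
    with the next `window` entries in the sorted list, not with all others.
    """
    feats = sorted((dft_magnitudes(s), i) for i, s in enumerate(segments))
    pairs = []
    for pos, (fa, ia) in enumerate(feats):
        for fb, ib in feats[pos + 1: pos + 1 + window]:
            if max(abs(a - b) for a, b in zip(fa, fb)) <= tol:
                pairs.append((min(ia, ib), max(ia, ib)))
    return pairs


def tone(cycles, n=32):
    """Synthetic segment: a sine with the given number of cycles."""
    return [math.sin(2 * math.pi * cycles * t / n) for t in range(n)]


# Segment 3 is a copy of segment 1, simulating a copy-move forgery.
segments = [tone(1), tone(2), tone(3), tone(2)]
print(find_copy_move(segments))
```

With `window` fixed, the comparison step is linear in the number of segments after the sort, which is where the reduction in time complexity comes from.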


Education ◽  
2020 ◽  
Author(s):  
Anne Ladyem McDivitt

Since its introduction in the early 2000s, podcasting has become a popular alternative to traditional radio, with a do-it-yourself emphasis and a democratization of audio production that requires neither advertisers nor a broadcaster’s backing. Podcasting has also been a promising learning tool for educators and students. With the popularity of the platform, many have jumped on board to create and use podcasts for pedagogical purposes, both in the classroom and for the public. Podcasting for pedagogical purposes has coincided with developments in educational theory such as flipped classrooms, active learning, and digital humanities. While there have been debates about the effectiveness of using podcasts for educational purposes, the majority of the literature on podcasting demonstrates that there are benefits for students learning through podcasts and digital audio recordings. Whether it covers the positives and negatives of the format or simply how to create a podcast, the literature on podcasting has grown rapidly as more people and scholars think about how to use the medium for learning purposes. One significant hurdle in creating a podcasting bibliography is that the technology involved has changed over the years since its introduction to academia. While some of the methodology in the older texts may not be up to date, these texts still contain critical information relevant to incorporating podcasting into a classroom setting.


IEEE Access ◽  
2017 ◽  
Vol 5 ◽  
pp. 12843-12855 ◽  
Author(s):  
Muhammad Imran ◽  
Zulfiqar Ali ◽  
Sheikh Tahir Bakhsh ◽  
Sheeraz Akram
