Environment Recognition for Digital Audio Forensics Using MPEG-7 and MEL Cepstral Features

2011 ◽  
Vol 62 (4) ◽  
pp. 199-205 ◽  
Author(s):  
Ghulam Muhammad ◽  
Khalid Alghathbar

Environment recognition from digital audio for forensics applications is a growing area of interest. However, compared to other branches of audio forensics, it is less researched. In particular, little attention has been given to detecting the environment from files in which foreground speech is present, which is a common forensics scenario. In this paper, we perform several experiments focusing on the problems of environment recognition from audio, particularly for forensics applications. Experimental results show that the task is easier when audio files contain only environmental sound than when they contain both foreground speech and background environment. We propose a full set of MPEG-7 audio features combined with mel-frequency cepstral coefficients (MFCCs) to improve the accuracy. In the experiments, the proposed approach significantly increases the recognition accuracy of environmental sound even in the presence of a large amount of foreground human speech.
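To illustrate the mel-cepstral side of such a feature set, the standard mapping between linear frequency and the mel scale (a textbook formula, not taken from this paper; the filterbank parameters below are illustrative assumptions) can be sketched as:

```python
import math

def hz_to_mel(f_hz: float) -> float:
    # Standard HTK-style mel-scale formula: a perceptually spaced frequency axis.
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m: float) -> float:
    # Inverse mapping, used when placing triangular mel filterbank edges.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_centers(f_min: float, f_max: float, n_filters: int) -> list[float]:
    # Center frequencies (in Hz) of a mel filterbank: equally spaced on the mel axis.
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]

# 26 filters over 0-8 kHz is a common (assumed) configuration for MFCC pipelines.
centers = mel_filter_centers(0.0, 8000.0, 26)
```

MFCCs are then obtained by log-compressing the filterbank energies and applying a discrete cosine transform; the point here is only the perceptual frequency warping that distinguishes MFCCs from plain spectral features.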

2020 ◽  
Vol 98 (Supplement_4) ◽  
pp. 27-27
Author(s):  
Ricardo V Ventura ◽  
Rafael Z Lopes ◽  
Lucas T Andrietta ◽  
Fernando Bussiman ◽  
Julio Balieiro ◽  
...  

Abstract The Brazilian gaited horse industry is growing steadily, even after a recession period that affected different economic sectors across the country. Recent numbers suggest an increase in exports, which reveals the relevance of this horse market segment. Horses are classified according to gait criteria, which divide them into two groups associated with the animal's movements: lateral (Marcha Picada) or diagonal (Marcha Batida). These two gait groups usually show remarkable differences in speed and in the number of steps per fixed unit of time, among other factors. Audio retrieval refers to the process of extracting information from audio signals. Compared with traditional methods of evaluating and classifying gait types (for example, subjective human evaluation and video monitoring), this emerging data analysis area offers a potentially lower-cost way to collect phenotypes. Audio files (n = 80) were obtained by extracting audio features from freely available YouTube videos. Videos were manually labeled according to the two gait groups (Marcha Picada or Marcha Batida), and thirty animals were used after a quality control filtering step. This study aimed to investigate different metrics associated with audio signal processing, in order to first cluster animals according to gait type and subsequently include additional traits that could be useful for improving accuracy in the identification of genetically superior animals. Twenty-eight metrics, based on frequency or physical audio aspects, were computed individually or in groups of relative importance to perform Principal Component Analysis (PCA), as well as to describe the two gait types. The PCA results indicated that over 87% of the animals were correctly clustered. Challenges regarding environmental interference and noise must be further investigated.
These first findings suggest that audio information retrieval could potentially be implemented in animal breeding programs, aiming to improve horse gait.
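Since the abstract notes that the two gaits differ mainly in steps per unit of time, a minimal sketch of one such audio metric is a crude step-rate estimator that counts threshold crossings of the signal envelope. The threshold, the gait cutoff, and the function names are all hypothetical illustrations, not the study's actual metrics:

```python
def step_rate(samples: list[float], sample_rate: int, threshold: float = 0.5) -> float:
    """Estimate steps per second by counting upward threshold crossings
    of the rectified waveform (a crude hoofbeat-onset detector)."""
    crossings = 0
    above = False
    for v in (abs(s) for s in samples):
        if not above and v >= threshold:
            crossings += 1
            above = True
        elif above and v < threshold:
            above = False
    return crossings * sample_rate / len(samples)

def classify_gait(rate_hz: float, cutoff_hz: float = 3.0) -> str:
    # Hypothetical cutoff: faster stepping -> Marcha Picada, slower -> Marcha Batida.
    return "Marcha Picada" if rate_hz > cutoff_hz else "Marcha Batida"
```

In the study itself, twenty-eight such frequency- and amplitude-based metrics feed a PCA rather than a single threshold rule.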


2021 ◽  
Vol 13 (4) ◽  
pp. 628
Author(s):  
Liang Ye ◽  
Tong Liu ◽  
Tian Han ◽  
Hany Ferdinando ◽  
Tapio Seppänen ◽  
...  

Campus violence is a common social phenomenon all over the world and is among the most harmful types of school bullying. As artificial intelligence and remote sensing techniques develop, several methods have become possible for detecting campus violence, e.g., movement-sensor-based methods and video-sequence-based methods. Sensors and surveillance cameras are used to detect campus violence. In this paper, the authors use image features and acoustic features for campus violence detection. Campus violence data are gathered by role-playing, and 4096-dimension feature vectors are extracted from every 16 frames of video images. The C3D (Convolutional 3D) neural network is used for feature extraction and classification, and an average recognition accuracy of 92.00% is achieved. Mel-frequency cepstral coefficients (MFCCs) are extracted as acoustic features, and three speech emotion databases are involved. The C3D neural network is used for classification, and the average recognition accuracies are 88.33%, 95.00%, and 91.67%, respectively. To solve the problem of evidence conflict, the authors propose an improved Dempster–Shafer (D–S) algorithm. Compared with existing D–S theory, the improved algorithm increases the recognition accuracy by 10.79%, and the recognition accuracy ultimately reaches 97.00%.
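The classical Dempster–Shafer combination rule that the improved algorithm builds on can be sketched as follows. The mass values assigned to the video and audio classifiers are invented for illustration; the paper's actual mass assignments and its improvement to the rule are not reproduced here:

```python
from itertools import product

def combine(m1: dict[frozenset, float], m2: dict[frozenset, float]) -> dict[frozenset, float]:
    """Dempster's rule of combination: fuse two basic mass assignments
    over the same frame of discernment, renormalizing by 1 - conflict."""
    fused: dict[frozenset, float] = {}
    conflict = 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            fused[inter] = fused.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb  # mass assigned to incompatible hypotheses
    if conflict >= 1.0:
        raise ValueError("total conflict: sources are incompatible")
    return {k: v / (1.0 - conflict) for k, v in fused.items()}

V, N = frozenset({"violent"}), frozenset({"normal"})
video = {V: 0.8, N: 0.2}   # hypothetical mass from the image classifier
audio = {V: 0.7, N: 0.3}   # hypothetical mass from the acoustic classifier
fused = combine(video, audio)
```

When both sources lean the same way, fusion sharpens the verdict; the evidence-conflict problem the authors address arises when the sources disagree strongly and the conflict term dominates the normalization.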


Author(s):  
Jinfang Zeng ◽  
Youming Li ◽  
Yu Zhang ◽  
Da Chen

Environmental sound classification (ESC) is a challenging problem due to the complexity of sounds. To date, a variety of signal processing and machine learning techniques have been applied to the ESC task, including matrix factorization, dictionary learning, wavelet filterbanks and deep neural networks. It is observed that features extracted from deeper networks tend to achieve higher performance than those extracted from shallow networks. However, in the ESC task, only deep convolutional neural networks (CNNs) containing a few layers have been used, and residual networks have been ignored, which leads to degraded performance. Meanwhile, a possible explanation for the limited exploration of CNNs and the difficulty of improving on simpler models is the relative scarcity of labeled data for ESC. In this paper, a residual network called EnvResNet is proposed for the ESC task. In addition, we propose using audio data augmentation to overcome the problem of data scarcity. Experiments are performed on the ESC-50 database. Combined with data augmentation, the proposed model outperforms baseline implementations relying on mel-frequency cepstral coefficients and achieves results comparable to other state-of-the-art approaches in terms of classification accuracy.
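Waveform-level audio augmentation of the kind proposed here can be sketched minimally as a circular time shift plus additive noise. This is an assumed, generic example; the paper's specific augmentation operations are not detailed in the abstract:

```python
import random

def augment(samples: list[float], shift: int, noise_amp: float, seed: int = 0) -> list[float]:
    """Simple waveform-level augmentation: circular time shift plus
    additive uniform noise, producing a new labeled training example
    from an existing one."""
    rng = random.Random(seed)  # fixed seed keeps the augmentation reproducible
    shifted = samples[shift:] + samples[:shift]
    return [s + rng.uniform(-noise_amp, noise_amp) for s in shifted]
```

Each augmented copy keeps the original class label, multiplying the effective size of a small dataset like ESC-50 (2000 clips) several-fold.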


Author(s):  
Christian Kraetzer ◽  
Andrea Oermann ◽  
Jana Dittmann ◽  
Andreas Lang

10.2196/17906 ◽  
2020 ◽  
Vol 22 (9) ◽  
pp. e17906
Author(s):  
Catharina Zehetmair ◽  
Ede Nagy ◽  
Carla Leetz ◽  
Anna Cranz ◽  
David Kindermann ◽  
...  

Background Refugees have an increased risk of developing mental health problems. There are insufficient psychosocial care structures to meet the resulting need for support. Stabilizing and guided imagery techniques have shown promising results in increasing traumatized refugees’ emotional stabilization. If delivered via audio files, the techniques can be practiced autonomously, independent of time, space, human resources, or stable treatment settings. Objective This study aimed to evaluate the self-practice of stabilizing and guided imagery techniques via digital audio files for traumatized refugees living in a reception and registration center in Germany. Methods From May 2018 to February 2019, 42 traumatized refugees participated in our study. At T1, patients received digital audio files in English, French, Arabic, Farsi, Turkish, or Serbian for self-practice. Nine days later, at T2, a face-to-face interview was conducted. Two months after T2, a follow-up interview took place via telephone. Results At T2, about half of the patients reported daily practice of the stabilizing and guided imagery techniques. At follow-up, the average frequency of practice was once weekly or more for those experiencing worse symptoms. No technical difficulties were reported. According to T2 and follow-up statements, the techniques helped the patients deal with arousal, concentration, sleep, mood, thoughts, empowerment, and tension. The guided imagery technique “The Inner Safe Place” was the most popular. Self-practice was impeded by postmigratory distress factors, such as overcrowded accommodations. Conclusions The results show that self-practice of stabilizing and guided imagery techniques via digital audio files was helpful to and well accepted by the assessed refugees. Even though postmigratory distress factors hampered self-practice, “The Inner Safe Place” technique was particularly well received.
Overall, the self-practiced audio-based stabilizing and guided imagery techniques showed promising results among the highly vulnerable group of newly arrived traumatized refugees.


Author(s):  
Ricky Mohanty ◽  
Subhendu Kumar Pani ◽  
Ahmad Taher Azar

The livestock health management system is based on the concept of investigating bird health status by collecting biological traits such as sound utterances. This approach is applied to four different species of livestock to treat bronchitis. This paper covers the audio features of both healthy and unhealthy livestock. In particular, audio well-being features are incorporated into the platform to automatically analyze livestock vocalizations and identify diseased birds. One month of long-term recognition experiments was conducted, in which the recognition accuracy for the set of diseased birds was about 99% using an adaptive neuro-fuzzy inference system (ANFIS). In this regard, the recognition accuracy of ANFIS exceeds that of an artificial neural network. This provides a reliable way for researchers to investigate and gather evidence on the curability of diseases or the eradication of incurable ones.


2019 ◽  
Vol 11 (2) ◽  
pp. 47-62 ◽  
Author(s):  
Xinchao Huang ◽  
Zihan Liu ◽  
Wei Lu ◽  
Hongmei Liu ◽  
Shijun Xiang

Detecting digital audio forgeries is a significant research focus in the field of audio forensics. In this article, the authors focus on a special form of digital audio forgery—copy-move—and propose a fast and effective method to detect doctored audio. First, the method segments the input audio into syllables by voice activity detection and syllable detection. Second, the authors select points in the frequency domain as features by applying the discrete Fourier transform (DFT) to each audio segment. The segments are then sorted by these features, yielding a sorted list of audio segments. Finally, each segment is compared only with a few adjacent segments in the sorted list, which decreases the time complexity. Comparisons with other state-of-the-art methods show that the proposed method can verify the authenticity of the input audio and locate forged positions quickly and effectively.
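The sort-then-compare-neighbors idea can be sketched as follows. The segmentation, feature dimensionality, and tolerance are illustrative assumptions; the paper's syllable detection and exact DFT feature points are not reproduced:

```python
import cmath

def dft_mag(segment: list[float], n_bins: int = 4) -> tuple[float, ...]:
    # Magnitudes of the first few DFT bins as a compact per-segment feature.
    n = len(segment)
    feats = []
    for k in range(n_bins):
        acc = sum(x * cmath.exp(-2j * cmath.pi * k * i / n) for i, x in enumerate(segment))
        feats.append(round(abs(acc), 6))  # rounding makes identical copies compare equal
    return tuple(feats)

def find_copy_move(segments: list[list[float]], tol: float = 1e-6) -> list[tuple[int, int]]:
    """Sort segments by their DFT features, then compare only adjacent
    entries in the sorted list (near-linear instead of all-pairs)."""
    indexed = sorted((dft_mag(s), i) for i, s in enumerate(segments))
    matches = []
    for (fa, ia), (fb, ib) in zip(indexed, indexed[1:]):
        if all(abs(x - y) <= tol for x, y in zip(fa, fb)):
            matches.append(tuple(sorted((ia, ib))))
    return matches
```

Sorting places duplicated (copy-moved) segments next to each other, so the quadratic all-pairs comparison collapses to a linear pass over the sorted list—the source of the speedup the abstract claims.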


Author(s):  
Stephen R. Chastain ◽  
Jason Caudill

Podcasting has quickly emerged as a leading technology in the new field of mobile learning. Tracing this technology’s history over the past two years reveals just how widespread the use of digital audio files may become in the fields of education and training. The ease of use, low cost of creation and hosting, and, most importantly, the pervasiveness of user access to compatible hardware combine to make podcasting a major force in both traditional and distance education. This chapter explores the history, technology, and application of podcasting as an instructional tool.

