Emotion-Based Extraction, Classification and Prediction of the Audio Data

Author(s):  
Anusha Potluri ◽  
Ravi Guguloth ◽  
Chaitanya Muppala
2020 ◽  
Vol 6 ◽  
pp. 17-40
Author(s):  
Muralitheran Munusamy

Sound or audio engineering is a branch of engineering concerned with recording sound, reproducing it by various means, and storing it for later reproduction. Known as sound or audio engineers, these trained professionals work in a variety of sound production fields and are experts in recording methods. They can be instrumental in implementing affordable technologies and technical processes to distribute audio data, making it accessible to future generations. The role of these engineers is no longer limited to the recording session: they also create metadata for archiving and preservation. Currently, the product sleeves of ethnographic recordings document none of the technical elements of how traditional music recordings are produced; product details focus, to some extent, only on historical elements and musical notation. To an audio archivist, a record of which devices were used in a recording is not linked with preservation data. Beyond the format and sleeve design, technical specifications are essential to other practitioners, such as the audio engineers and field recordists of the future. The aim of the present research is to capture the optimum dynamic range of the sound while applying signal processing that does not alter its tonality, timbre and harmonics, and then to apply suitable information storage so that the metadata can be preserved or archived for future access and reproduction.
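The notion of capturing an optimum dynamic range can be illustrated with a simple crest-factor measurement, the gap between a signal's peak level and its RMS level. This is a minimal sketch, not the author's method; the function name and parameters are chosen for illustration only:

```python
import numpy as np

def dynamic_range_db(samples, eps=1e-12):
    """Estimate dynamic range as the gap (in dB) between peak level
    and RMS level of the waveform (a simple crest-factor measure)."""
    samples = np.asarray(samples, dtype=np.float64)
    peak = np.max(np.abs(samples))
    rms = np.sqrt(np.mean(samples ** 2))
    peak_db = 20.0 * np.log10(peak + eps)
    rms_db = 20.0 * np.log10(rms + eps)
    return peak_db - rms_db

# A full-scale sine wave has a crest factor of sqrt(2), i.e. about 3.01 dB.
t = np.linspace(0.0, 1.0, 48000, endpoint=False)
sine = np.sin(2.0 * np.pi * 440.0 * t)
print(round(dynamic_range_db(sine), 2))
```

An archivist could log such a figure alongside the recording's other technical metadata before any level processing is applied.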


Electronics ◽  
2021 ◽  
Vol 10 (15) ◽  
pp. 1807
Author(s):  
Sascha Grollmisch ◽  
Estefanía Cano

Including unlabeled data in the training process of neural networks using Semi-Supervised Learning (SSL) has shown impressive results in the image domain, where state-of-the-art results were obtained with only a fraction of the labeled data. What recent SSL methods have in common is that they rely strongly on the augmentation of unannotated data, which remains largely unexplored for audio. In this work, SSL using the state-of-the-art FixMatch approach is evaluated on three audio classification tasks covering music, industrial sounds, and acoustic scenes. The performance of FixMatch is compared to Convolutional Neural Networks (CNNs) trained from scratch, Transfer Learning, and SSL using the Mean Teacher approach. Additionally, a simple yet effective approach for selecting suitable augmentation methods for FixMatch is introduced. FixMatch with the proposed modifications consistently outperformed Mean Teacher and the CNNs trained from scratch. For the industrial sounds and music datasets, the baseline CNN performance on the full dataset was reached with less than 5% of the initial training data, demonstrating the potential of recent SSL methods for audio. Transfer Learning outperformed FixMatch only on the most challenging dataset, acoustic scene classification, showing that there is still room for improvement.
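As a rough illustration of the FixMatch idea the abstract builds on (a confidence-thresholded pseudo-label taken from a weakly augmented view, trained against the prediction on a strongly augmented view), here is a minimal NumPy sketch. The function name, threshold value, and toy probabilities are illustrative, not the paper's implementation:

```python
import numpy as np

def fixmatch_unlabeled_loss(weak_probs, strong_probs, threshold=0.95):
    """Core FixMatch step for a batch of unlabeled examples:
    use the prediction on the weakly augmented view as a pseudo-label,
    keep it only if its confidence exceeds the threshold, and compute
    cross-entropy against the strongly augmented view's prediction."""
    pseudo = np.argmax(weak_probs, axis=1)          # hard pseudo-labels
    confidence = np.max(weak_probs, axis=1)
    mask = (confidence >= threshold).astype(float)  # confident examples only
    idx = np.arange(len(strong_probs))
    ce = -np.log(strong_probs[idx, pseudo] + 1e-12)
    return float(np.mean(ce * mask))                # averaged over the batch

# Two unlabeled examples: one confident (kept), one uncertain (masked out).
weak = np.array([[0.97, 0.02, 0.01],
                 [0.40, 0.35, 0.25]])
strong = np.array([[0.90, 0.05, 0.05],
                   [0.30, 0.40, 0.30]])
loss = fixmatch_unlabeled_loss(weak, strong)
```

The confidence mask is what makes augmentation choice so important: if the weak augmentation distorts the signal too much, few pseudo-labels pass the threshold and the unlabeled data contributes nothing.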



2020 ◽  
pp. 002076402098419
Author(s):  
Kwamina Abekah-Carter ◽  
George Ofosu Oti

Background: Homelessness among people with mental illness has become a common phenomenon in many developed and developing countries. As in many other countries, the living conditions of homeless people with mental illness in Ghana are unwholesome. Despite the growing population of these vulnerable individuals on the streets, little is known about the perspectives of the general public on this phenomenon in Ghana. Aim: This research was conducted to explore the perspectives of community members on homeless people with mental illness. The main study objectives were (a) to find out the impacts of the presence of persons with mental illness on the streets and (b) to ascertain the reasons accounting for homelessness among persons with mental illness. Method: Using a qualitative research design, twenty community members were sampled from selected suburbs of Nsawam and interviewed with a semi-structured interview guide. The audio data gathered from the interviews were transcribed verbatim and analysed thematically. Results: The majority of participants asserted that homeless people with mental illness had no access to good food, shelter, or health care. They further stated that some homeless people with mental illness perpetrated physical and sexual violence against residents. Moreover, participants believed that persons with mental illness remained on the streets because of neglect by their family members and limited access to psychiatric services. Conclusion: This paper concludes by recommending that the government make mental health services accessible and affordable to homeless persons with mental illness nationwide.


2021 ◽  
Vol 14 (1) ◽  
pp. 205979912098776
Author(s):  
Joseph Da Silva

Interviews are an established research method across multiple disciplines. Such interviews are typically transcribed orthographically in order to facilitate analysis. Many novice qualitative researchers find manual transcription tedious and time-consuming, although it is generally accepted in much of the literature that quality of analysis improves when researchers perform this task themselves. This is despite the possibility that the exhausting nature of bulk transcription could, conversely, harm quality. Other researchers have explored the use of automated methods to ease the task of transcription, more recently using cloud-computing services, but such services present challenges to ensuring the confidentiality and privacy of data. In the field of cyber-security these concerns are particularly acute; however, any researcher dealing with confidential participant speech should also be uneasy about third-party access to such data. As a result, researchers, particularly early-career researchers and students, may find themselves with no option other than manual transcription. This article presents a secure and effective alternative, building on prior work published in this journal: a method that more than halved interview transcription time for the researcher while maintaining the security of the audio data. It presents a comparison between this method and a fully manual method, drawing on data from 10 interviews conducted as part of my doctoral research. The method presented requires an investment in specific equipment, which currently only supports the English language.


Sensors ◽  
2020 ◽  
Vol 20 (23) ◽  
pp. 6722
Author(s):  
Bernhard Hollaus ◽  
Sebastian Stabinger ◽  
Andreas Mehrle ◽  
Christian Raschner

Highly efficient training is a must in professional sports. In practice, this means performing a high number of high-quality exercise repetitions with some form of data logging. In American football many things are logged, but no wearable sensor logs a catch or a drop. The goal of this paper was therefore to develop and verify a sensor that can do exactly that. In a first step, a sensor platform was used to gather nine-degrees-of-freedom motion data and audio data from both hands during 759 attempts to catch a pass. After preprocessing, the gathered data were used to train a neural network to classify all attempts, resulting in a classification accuracy of 93%. Additionally, the significance of each sensor signal was analysed. It turned out that the network relies most on the acceleration and magnetometer data, largely neglecting the audio and gyroscope data. Beyond these results, the paper introduces a new type of dataset and the possibility of autonomous training in American football to the research community.
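The abstract does not say how the significance of each sensor signal was analysed. One common technique for this kind of question is permutation importance: shuffle one channel's features and measure how much accuracy drops. The sketch below uses toy data; the function name, the two-channel setup, and the trivial classifier are all illustrative assumptions, not the paper's method:

```python
import numpy as np

def permutation_importance(predict, X, y, n_channels, rng):
    """Estimate each sensor channel's importance by shuffling that
    channel's feature columns across examples and measuring the
    resulting drop in classification accuracy."""
    base_acc = np.mean(predict(X) == y)
    cols_per_channel = X.shape[1] // n_channels
    importances = []
    for c in range(n_channels):
        X_perm = X.copy()
        cols = slice(c * cols_per_channel, (c + 1) * cols_per_channel)
        X_perm[:, cols] = rng.permutation(X_perm[:, cols], axis=0)
        importances.append(base_acc - np.mean(predict(X_perm) == y))
    return np.array(importances)

# Toy example: labels depend only on "channel" 0, so shuffling it hurts
# accuracy a lot, while shuffling the unused channel 1 changes nothing.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))                 # 2 channels, 1 feature each
y = (X[:, 0] > 0).astype(int)
predict = lambda X: (X[:, 0] > 0).astype(int)
imp = permutation_importance(predict, X, y, n_channels=2, rng=rng)
```

Applied to the catch/drop classifier, the same procedure would rank acceleration, magnetometer, gyroscope, and audio channels by how much shuffling each one degrades the 93% accuracy.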


Author(s):  
Jinfang Zeng ◽  
Youming Li ◽  
Yu Zhang ◽  
Da Chen

Environmental sound classification (ESC) is a challenging problem due to the complexity of sounds. To date, a variety of signal processing and machine learning techniques have been applied to the ESC task, including matrix factorization, dictionary learning, wavelet filterbanks and deep neural networks. Features extracted from deeper networks tend to achieve higher performance than those extracted from shallow networks. In the ESC task, however, only deep convolutional neural networks (CNNs) with several layers have been used, while residual networks have been ignored, which degrades performance. A possible explanation for this limited exploration of CNNs, and for the difficulty of improving on simpler models, is the relative scarcity of labeled data for ESC. In this paper, a residual network called EnvResNet is proposed for the ESC task. In addition, we propose audio data augmentation to overcome the problem of data scarcity. Experiments are performed on the ESC-50 database. Combined with data augmentation, the proposed model outperforms baseline implementations relying on mel-frequency cepstral coefficients and achieves results comparable to other state-of-the-art approaches in terms of classification accuracy.
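The specific augmentations used are not listed here; as an illustration of the general idea, two common audio augmentations (circular time shifting and additive white noise at a target SNR) can be sketched as follows, with all function names and parameter values chosen arbitrarily:

```python
import numpy as np

def time_shift(samples, shift):
    """Circularly shift the waveform by `shift` samples."""
    return np.roll(samples, shift)

def add_noise(samples, snr_db, rng):
    """Add white Gaussian noise at the requested signal-to-noise ratio (dB)."""
    signal_power = np.mean(samples ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=samples.shape)
    return samples + noise

# Augment a 1-second 440 Hz test tone sampled at 16 kHz.
rng = np.random.default_rng(0)
clip = np.sin(2 * np.pi * 440 * np.linspace(0, 1, 16000, endpoint=False))
augmented = add_noise(time_shift(clip, 800), snr_db=20, rng=rng)
```

Each labeled clip can be expanded into many such variants, which is how augmentation compensates for the scarcity of labeled ESC data.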

