Emotion-Based Extraction, Classification and Prediction of the Audio Data

Author(s):  
Anusha Potluri ◽  
Ravi Guguloth ◽  
Chaitanya Muppala
2020 ◽  
Vol 6 ◽  
pp. 17-40
Author(s):  
Muralitheran Munusamy

Sound or audio engineering is a branch of engineering concerned with recording sound, reproducing it by various means, and storing it for later reproduction. Known as sound or audio engineers, these trained professionals work in a variety of sound production fields and are experts in recording methods. They can be instrumental in implementing affordable technologies and technical processes to distribute audio data, making it accessible to future generations. The role of these engineers is no longer limited to the recording session: they also create metadata for archiving and preservation. Currently, the product sleeves of ethnographic recordings document none of the technical elements of how traditional music recordings are produced; product details focus, to some extent, only on historical elements and musical notation. To an audio archivist, a record of which devices were used in a recording is not linked with preservation data. Beyond the format and sleeve design, technical specifications are essential to other practitioners, such as the audio engineers and field recordists of the future. The aim of the present research is to capture the optimum dynamic range of the sound while applying signal processing that does not alter its tonality, timbre and harmonics, and then to apply suitable information storage so that the metadata can be preserved or archived for future access and reproduction.
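The notion of capturing an optimum dynamic range can be illustrated with a simple crest-factor measurement, the gap between a signal's peak level and its RMS level. This is a minimal sketch, not the author's method; the function name and parameters are chosen for illustration only:

```python
import numpy as np

def dynamic_range_db(samples, eps=1e-12):
    """Estimate dynamic range as the gap (in dB) between peak level
    and RMS level of the waveform (a simple crest-factor measure)."""
    samples = np.asarray(samples, dtype=np.float64)
    peak = np.max(np.abs(samples))
    rms = np.sqrt(np.mean(samples ** 2))
    peak_db = 20.0 * np.log10(peak + eps)
    rms_db = 20.0 * np.log10(rms + eps)
    return peak_db - rms_db

# A full-scale sine wave has a crest factor of sqrt(2), i.e. about 3.01 dB.
t = np.linspace(0.0, 1.0, 48000, endpoint=False)
sine = np.sin(2.0 * np.pi * 440.0 * t)
print(round(dynamic_range_db(sine), 2))
```

An archivist could log such a figure alongside the recording's other technical metadata before any level processing is applied.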


Electronics ◽  
2021 ◽  
Vol 10 (15) ◽  
pp. 1807
Author(s):  
Sascha Grollmisch ◽  
Estefanía Cano

Including unlabeled data in the training process of neural networks using Semi-Supervised Learning (SSL) has shown impressive results in the image domain, where state-of-the-art results were obtained with only a fraction of the labeled data. What recent SSL methods have in common is that they rely strongly on the augmentation of unannotated data, which remains largely unexplored for audio. In this work, SSL using the state-of-the-art FixMatch approach is evaluated on three audio classification tasks covering music, industrial sounds, and acoustic scenes. The performance of FixMatch is compared to Convolutional Neural Networks (CNNs) trained from scratch, Transfer Learning, and SSL using the Mean Teacher approach. Additionally, a simple yet effective approach for selecting suitable augmentation methods for FixMatch is introduced. FixMatch with the proposed modifications consistently outperformed Mean Teacher and the CNNs trained from scratch. For the industrial sounds and music datasets, the baseline CNN performance on the full dataset was reached with less than 5% of the initial training data, demonstrating the potential of recent SSL methods for audio. Transfer Learning outperformed FixMatch only on the most challenging dataset, acoustic scene classification, showing that there is still room for improvement.
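As a rough illustration of the FixMatch idea the abstract builds on (a confidence-thresholded pseudo-label taken from a weakly augmented view, trained against the prediction on a strongly augmented view), here is a minimal NumPy sketch. The function name, threshold value, and toy probabilities are illustrative, not the paper's implementation:

```python
import numpy as np

def fixmatch_unlabeled_loss(weak_probs, strong_probs, threshold=0.95):
    """Core FixMatch step for a batch of unlabeled examples:
    use the prediction on the weakly augmented view as a pseudo-label,
    keep it only if its confidence exceeds the threshold, and compute
    cross-entropy against the strongly augmented view's prediction."""
    pseudo = np.argmax(weak_probs, axis=1)          # hard pseudo-labels
    confidence = np.max(weak_probs, axis=1)
    mask = (confidence >= threshold).astype(float)  # confident examples only
    idx = np.arange(len(strong_probs))
    ce = -np.log(strong_probs[idx, pseudo] + 1e-12)
    return float(np.mean(ce * mask))                # averaged over the batch

# Two unlabeled examples: one confident (kept), one uncertain (masked out).
weak = np.array([[0.97, 0.02, 0.01],
                 [0.40, 0.35, 0.25]])
strong = np.array([[0.90, 0.05, 0.05],
                   [0.30, 0.40, 0.30]])
loss = fixmatch_unlabeled_loss(weak, strong)
```

The confidence mask is what makes augmentation choice so important: if the weak augmentation distorts the signal too much, few pseudo-labels pass the threshold and the unlabeled data contributes nothing.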



2020 ◽  
pp. 002076402098419
Author(s):  
Kwamina Abekah-Carter ◽  
George Ofosu Oti

Background: Homelessness among people with mental illness has become a common phenomenon in many developed and developing countries. As in many other countries, the living conditions of homeless people with mental illness in Ghana are unwholesome. Despite the growing population of these vulnerable individuals on the streets, little is known about the perspectives of the general public on this phenomenon in Ghana. Aim: This research was conducted to explore the perspectives of community members on homeless people with mental illness. The main study objectives were (a) to find out the impacts of the presence of persons with mental illness on the streets and (b) to ascertain the reasons accounting for homelessness among persons with mental illness. Method: Using a qualitative research design, twenty community members were sampled from selected suburbs of Nsawam and interviewed with a semi-structured interview guide. The audio data gathered from the interviews were transcribed verbatim and analysed thematically. Results: The majority of participants asserted that homeless people with mental illness had no access to good food, shelter, or health care. They further stated that some homeless people with mental illness perpetrated physical and sexual violence against residents. Moreover, participants believed that persons with mental illness remained on the streets because of neglect by their family members and limited access to psychiatric services. Conclusion: This paper concludes by recommending that the government make mental health services accessible and affordable to homeless persons with mental illness nationwide.


2021 ◽  
Vol 14 (1) ◽  
pp. 205979912098776
Author(s):  
Joseph Da Silva

Interviews are an established research method across multiple disciplines. Such interviews are typically transcribed orthographically in order to facilitate analysis. Many novice qualitative researchers find manual transcription tedious and time-consuming, although it is generally accepted in much of the literature that quality of analysis improves when researchers perform this task themselves. This is despite the possibility that the exhausting nature of bulk transcription could, conversely, harm quality. Other researchers have explored the use of automated methods to ease the task of transcription, more recently using cloud-computing services, but such services present challenges to ensuring the confidentiality and privacy of data. In the field of cyber-security these concerns are particularly acute; however, any researcher dealing with confidential participant speech should also be uneasy about third-party access to such data. As a result, researchers, particularly early-career researchers and students, may find themselves with no option other than manual transcription. This article presents a secure and effective alternative, building on prior work published in this journal: a method that more than halved interview transcription time for the researcher while maintaining the security of the audio data. It presents a comparison between this method and a fully manual method, drawing on data from 10 interviews conducted as part of my doctoral research. The method presented requires an investment in specific equipment, which currently only supports the English language.


Sensors ◽  
2020 ◽  
Vol 20 (23) ◽  
pp. 6722
Author(s):  
Bernhard Hollaus ◽  
Sebastian Stabinger ◽  
Andreas Mehrle ◽  
Christian Raschner

Highly efficient training is a must in professional sports. In practice, this means performing a high number of high-quality exercise repetitions with some form of data logging. In American football many things are logged, but no wearable sensor logs a catch or a drop. The goal of this paper was therefore to develop and verify a sensor that can do exactly that. In a first step, a sensor platform was used to gather nine-degrees-of-freedom motion data and audio data from both hands during 759 attempts to catch a pass. After preprocessing, the gathered data were used to train a neural network to classify all attempts, resulting in a classification accuracy of 93%. Additionally, the significance of each sensor signal was analysed. It turned out that the network relies most on the acceleration and magnetometer data, largely neglecting the audio and gyroscope data. Beyond these results, the paper introduces a new type of dataset and the possibility of autonomous training in American football to the research community.
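The abstract does not say how the significance of each sensor signal was analysed. One common technique for this kind of question is permutation importance: shuffle one channel's features and measure how much accuracy drops. The sketch below uses toy data; the function name, the two-channel setup, and the trivial classifier are all illustrative assumptions, not the paper's method:

```python
import numpy as np

def permutation_importance(predict, X, y, n_channels, rng):
    """Estimate each sensor channel's importance by shuffling that
    channel's feature columns across examples and measuring the
    resulting drop in classification accuracy."""
    base_acc = np.mean(predict(X) == y)
    cols_per_channel = X.shape[1] // n_channels
    importances = []
    for c in range(n_channels):
        X_perm = X.copy()
        cols = slice(c * cols_per_channel, (c + 1) * cols_per_channel)
        X_perm[:, cols] = rng.permutation(X_perm[:, cols], axis=0)
        importances.append(base_acc - np.mean(predict(X_perm) == y))
    return np.array(importances)

# Toy example: labels depend only on "channel" 0, so shuffling it hurts
# accuracy a lot, while shuffling the unused channel 1 changes nothing.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))                 # 2 channels, 1 feature each
y = (X[:, 0] > 0).astype(int)
predict = lambda X: (X[:, 0] > 0).astype(int)
imp = permutation_importance(predict, X, y, n_channels=2, rng=rng)
```

Applied to the catch/drop classifier, the same procedure would rank acceleration, magnetometer, gyroscope, and audio channels by how much shuffling each one degrades the 93% accuracy.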


Author(s):  
Jinfang Zeng ◽  
Youming Li ◽  
Yu Zhang ◽  
Da Chen

Environmental sound classification (ESC) is a challenging problem due to the complexity of sounds. To date, a variety of signal processing and machine learning techniques have been applied to the ESC task, including matrix factorization, dictionary learning, wavelet filterbanks and deep neural networks. Features extracted from deeper networks tend to achieve higher performance than those extracted from shallow networks. In the ESC task, however, only deep convolutional neural networks (CNNs) with several layers have been used, while residual networks have been ignored, which degrades performance. A possible explanation for this limited exploration of CNNs, and for the difficulty of improving on simpler models, is the relative scarcity of labeled data for ESC. In this paper, a residual network called EnvResNet is proposed for the ESC task. In addition, we propose audio data augmentation to overcome the problem of data scarcity. Experiments are performed on the ESC-50 database. Combined with data augmentation, the proposed model outperforms baseline implementations relying on mel-frequency cepstral coefficients and achieves results comparable to other state-of-the-art approaches in terms of classification accuracy.
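The specific augmentations used are not listed here; as an illustration of the general idea, two common audio augmentations (circular time shifting and additive white noise at a target SNR) can be sketched as follows, with all function names and parameter values chosen arbitrarily:

```python
import numpy as np

def time_shift(samples, shift):
    """Circularly shift the waveform by `shift` samples."""
    return np.roll(samples, shift)

def add_noise(samples, snr_db, rng):
    """Add white Gaussian noise at the requested signal-to-noise ratio (dB)."""
    signal_power = np.mean(samples ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=samples.shape)
    return samples + noise

# Augment a 1-second 440 Hz test tone sampled at 16 kHz.
rng = np.random.default_rng(0)
clip = np.sin(2 * np.pi * 440 * np.linspace(0, 1, 16000, endpoint=False))
augmented = add_noise(time_shift(clip, 800), snr_db=20, rng=rng)
```

Each labeled clip can be expanded into many such variants, which is how augmentation compensates for the scarcity of labeled ESC data.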

