Optimizing class priors to improve the detection of social signals in audio data

2022 ◽  
Vol 107 ◽  
pp. 104541
Author(s):  
Gábor Gosztolya
2020 ◽  
Vol 6 ◽  
pp. 17-40
Author(s):  
Muralitheran Munusamy

Sound or audio engineering is a branch of engineering concerned with recording sound, reproducing it by various means, and storing it so that it can be reproduced later. Sound or audio engineers are trained professionals who work in a variety of sound production fields and are experts in recording methods. They can be instrumental in implementing affordable technologies and technical processes for distributing audio data, making it accessible to future generations. Their current role is not limited to recording sessions; they also create metadata for archiving and preservation. At present, the product sleeves of ethnographic recordings give no technical details of how traditional music recordings are produced, and the product details focus only to some extent on historical elements and musical notation. For an audio archivist, declaring which devices were used in a recording is not linked to preservation data. Beyond the format and the sleeve design, technical specifications are essential to other specialists, such as the audio engineers and field recordists of the future. The aim of the present research is to capture the optimum dynamic range of the sound and to apply signal processing that does not alter its tonality, timbre, or harmonics. A further aim is to apply suitable information storage so that the metadata can be preserved or archived for future access and reproduction.
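As a rough illustration of the dynamic-range goal, the sketch below computes two quantities. The first is the ideal dynamic range of linear PCM at a given bit depth, 20·log10(2^N). The second is a measured dynamic range, taken here as the peak sample level relative to the RMS level of a noise-only segment. Both are common engineering conventions assumed for illustration, not the procedure used in this research, and the function names are hypothetical.

```python
import math


def theoretical_dynamic_range_db(bit_depth: int) -> float:
    """Ideal dynamic range of linear PCM: full scale relative to one quantization step."""
    return 20 * math.log10(2 ** bit_depth)


def measured_dynamic_range_db(signal, noise) -> float:
    """Peak level of a recording relative to the RMS level of a noise-only segment.

    Assumes both sequences share the same linear amplitude scale and that the
    noise segment is not digital silence (its RMS must be greater than zero).
    """
    peak = max(abs(x) for x in signal)
    noise_rms = math.sqrt(sum(x * x for x in noise) / len(noise))
    return 20 * math.log10(peak / noise_rms)


if __name__ == "__main__":
    print(f"16-bit: {theoretical_dynamic_range_db(16):.2f} dB")  # 16-bit: 96.33 dB
    print(f"24-bit: {theoretical_dynamic_range_db(24):.2f} dB")  # 24-bit: 144.49 dB
    # Hypothetical values: a full-scale peak over a noise floor at 1/1000 of full scale.
    print(f"measured: {measured_dynamic_range_db([0.5, -1.0, 0.25], [0.001, -0.001] * 50):.1f} dB")  # measured: 60.0 dB
```

The theoretical figure reproduces the familiar rule of roughly 6 dB per bit. A real recording chain falls short of it because of analogue noise, so the measured figure is the one worth documenting in the metadata.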


2020 ◽  
Author(s):  
Abdulaziz Abubshait ◽  
Patrick P. Weis ◽  
Eva Wiese

Social signals, such as changes in gaze direction, are essential cues for predicting others’ mental states and behaviors (i.e., mentalizing). Studies show that humans can mentalize with non-human agents when they perceive a mind in them (i.e., mind perception). Robots that physically and/or behaviorally resemble humans likely trigger mind perception, which enhances the relevance of social cues and improves social-cognitive performance. The current experiments examine whether the effect of physical and behavioral influencers of mind perception on social-cognitive processing is modulated by the lifelikeness of a social interaction. Participants interacted with robots of varying degrees of physical (humanlike vs. robot-like) and behavioral (reliable vs. random) human-likeness while the lifelikeness of a social attention task was manipulated across five experiments. The first four experiments manipulated lifelikeness via the physical realism of the robot images (Studies 1 and 2), the biological plausibility of the social signals (Study 3), and the plausibility of the social context (Study 4). They showed that humanlike behavior affected social attention, whereas appearance affected mind perception ratings. However, when the lifelikeness of the interaction was increased by using videos of a human and a robot sending the social cues in a realistic environment (Study 5), social attention mechanisms were affected by both physical appearance and behavioral features, while mind perception ratings were mainly affected by physical appearance. This indicates that, to understand the effect of physical and behavioral features on social cognition, researchers should use paradigms that adequately simulate the lifelikeness of social interactions.


Electronics ◽  
2021 ◽  
Vol 10 (15) ◽  
pp. 1807
Author(s):  
Sascha Grollmisch ◽  
Estefanía Cano

Including unlabeled data in the training process of neural networks using Semi-Supervised Learning (SSL) has shown impressive results in the image domain, where state-of-the-art results were obtained with only a fraction of the labeled data. Recent SSL methods have in common that they rely strongly on augmentation of the unannotated data, an approach that remains largely unexplored for audio data. In this work, SSL using the state-of-the-art FixMatch approach is evaluated on three audio classification tasks: music, industrial sounds, and acoustic scenes. The performance of FixMatch is compared with Convolutional Neural Networks (CNN) trained from scratch, Transfer Learning, and SSL using the Mean Teacher approach. Additionally, a simple yet effective approach for selecting suitable augmentation methods for FixMatch is introduced. FixMatch with the proposed modifications always outperformed Mean Teacher and the CNNs trained from scratch. For the industrial sounds and music datasets, the CNN baseline performance obtained with the full dataset was reached with less than 5% of the initial training data, demonstrating the potential of recent SSL methods for audio data. Transfer Learning outperformed FixMatch only on the most challenging dataset, from acoustic scene classification, showing that there is still room for improvement.
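FixMatch, as introduced in the original FixMatch paper, learns from unlabeled data in three steps. It takes the model's prediction on a weakly augmented input as a hard pseudo-label. It then applies cross-entropy to the prediction on a strongly augmented version of the same input. Only examples whose weak-view confidence reaches a threshold τ are counted (τ = 0.95 in the original paper). The NumPy sketch below computes only that unlabeled loss term from precomputed logits. The augmentations, the model, and the supervised loss are omitted. It makes no claim about the modifications or the augmentation-selection approach introduced in this article.

```python
import numpy as np


def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable log-softmax over the class axis."""
    z = logits - logits.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))


def fixmatch_unlabeled_loss(logits_weak, logits_strong, threshold=0.95):
    """Unlabeled FixMatch loss for one batch.

    logits_weak / logits_strong: arrays of shape (batch, n_classes) from the same
    model applied to weakly and strongly augmented versions of the same clips.
    In a real training loop the weak-view predictions are treated as constants,
    so no gradient flows through the pseudo-labels.
    """
    probs_weak = np.exp(log_softmax(logits_weak))
    pseudo_labels = probs_weak.argmax(axis=1)    # hard pseudo-label per clip
    mask = probs_weak.max(axis=1) >= threshold   # keep only confident clips
    log_probs_strong = log_softmax(logits_strong)
    # Cross-entropy of the strong-view prediction against the pseudo-label.
    ce = -log_probs_strong[np.arange(len(pseudo_labels)), pseudo_labels]
    # Average over the whole unlabeled batch, so masked clips contribute zero.
    return float((ce * mask).mean()), mask


if __name__ == "__main__":
    # Hypothetical logits for two clips and three classes.
    weak = np.array([[10.0, 0.0, 0.0], [0.1, 0.0, 0.0]])
    strong = np.array([[0.0, 10.0, 0.0], [0.0, 0.0, 0.0]])
    loss, mask = fixmatch_unlabeled_loss(weak, strong)
    print(mask.tolist())   # [True, False]
    print(round(loss, 2))  # 5.0
```

In the example, the first clip is confident on its weak view, but its strong view disagrees, so it contributes a large loss. The second clip falls below τ and is ignored.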


1989 ◽  
Vol 32 (7) ◽  
pp. 862-871 ◽  
Author(s):  
Clement Yu ◽  
Wei Sun ◽  
Dina Bitton ◽  
Qi Yang ◽  
Richard Bruno ◽  
...  

2020 ◽  
pp. 002076402098419
Author(s):  
Kwamina Abekah-Carter ◽  
George Ofosu Oti

Background: Homelessness among people with mental illness has become a common phenomenon in many developed and developing countries. As in other countries, the living conditions of homeless people with mental illness in Ghana are poor. Despite the growing number of these vulnerable individuals on the streets, little is known about the general public's perspectives on this phenomenon in Ghana. Aim: This research explored the perspectives of community members on homeless people with mental illness. The main study objectives were (a) to find out the impacts of the presence of persons with mental illness on the streets and (b) to ascertain the reasons for homelessness among persons with mental illness. Method: Using a qualitative research design, the study sampled twenty community members from selected suburbs in Nsawam and interviewed them with a semi-structured interview guide. The audio data gathered from the interviews were transcribed verbatim and analysed thematically. Results: The majority of the participants asserted that homeless people with mental illness had no access to good food, shelter, or health care. They further stated that some homeless people with mental illness perpetrated physical and sexual violence against residents. Moreover, the participants believed that persons with mental illness remained on the streets because of neglect by their family members and limited access to psychiatric services. Conclusion: The paper recommends that the government make mental health services accessible and affordable to homeless persons with mental illness nationwide.


Mathematics ◽  
2021 ◽  
Vol 9 (2) ◽  
pp. 195
Author(s):  
Adrian Sergiu Darabant ◽  
Diana Borza ◽  
Radu Danescu

The human face holds a privileged position in multi-disciplinary research, as it conveys a great deal of information: demographic attributes (age, race, gender, ethnicity), social signals, emotion expression, and so forth. Studies have shown that, owing to the distribution of ethnicity/race in training datasets, biometric algorithms suffer from a “cross race effect”: their performance is better on subjects closer to the “country of origin” of the algorithm. The contributions of this paper are twofold. First, we gathered, annotated, and made public a large-scale database of over 175,000 facial images by automatically crawling the Internet for images of celebrities belonging to various ethnicities/races. Second, we trained and compared four state-of-the-art convolutional neural networks on the problem of race and ethnicity classification. To the best of our knowledge, this is the largest data-balanced, publicly available face database annotated with race and ethnicity information. We also studied the impact of various face traits and image characteristics on the race/ethnicity deep learning classification methods and compared the results with those reported in psychological and anthropomorphic studies. Extensive tests were performed to determine the facial features to which the networks are sensitive. These tests, together with a recognition rate of 96.64% on the problem of human race classification, demonstrate the effectiveness of the proposed solution.
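One simple way to quantify a performance gap such as the “cross race effect” is to report accuracy separately for each group alongside overall accuracy, as the sketch below does. It is an assumed illustration, not the evaluation protocol of this paper. The group names and labels are hypothetical placeholders.

```python
from collections import defaultdict


def per_group_accuracy(y_true, y_pred, groups):
    """Return overall accuracy, accuracy per group, and the largest gap between groups."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    by_group = {g: correct[g] / total[g] for g in total}
    overall = sum(correct.values()) / sum(total.values())
    gap = max(by_group.values()) - min(by_group.values())
    return overall, by_group, gap


if __name__ == "__main__":
    # Hypothetical data: group "g1" is classified perfectly, group "g2" only half the time.
    y_true = ["a", "a", "b", "b"]
    y_pred = ["a", "a", "b", "a"]
    groups = ["g1", "g1", "g2", "g2"]
    print(per_group_accuracy(y_true, y_pred, groups))  # (0.75, {'g1': 1.0, 'g2': 0.5}, 0.5)
```

A single overall recognition rate can hide a large gap of this kind, which is why a data-balanced evaluation set is useful.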


2021 ◽  
Vol 14 (1) ◽  
pp. 205979912098776
Author(s):  
Joseph Da Silva

Interviews are an established research method across multiple disciplines. Such interviews are typically transcribed orthographically to facilitate analysis. Many novice qualitative researchers find manual transcription tedious and time-consuming, although much of the literature accepts that the quality of analysis improves when researchers perform this task themselves. This is despite the potential for the exhausting nature of bulk transcription to have the opposite, negative effect on quality. Other researchers have explored automated methods to ease the task of transcription, more recently using cloud-computing services, but such services make it difficult to ensure the confidentiality and privacy of data. In the field of cyber-security these concerns are particularly acute; however, any researcher dealing with confidential participant speech should be uneasy about third-party access to such data. As a result, researchers, particularly early-career researchers and students, may find that manual transcription is their only option. Building on prior work published in this journal, this article presents a secure and effective alternative: a method that reduced the researcher's interview transcription time by more than half while maintaining the security of the audio data. It compares this method with a fully manual method, drawing on data from 10 interviews conducted as part of my doctoral research. The method requires investment in specific equipment, which currently supports only the English language.


Sensors ◽  
2020 ◽  
Vol 20 (23) ◽  
pp. 6722
Author(s):  
Bernhard Hollaus ◽  
Sebastian Stabinger ◽  
Andreas Mehrle ◽  
Christian Raschner

Highly efficient training is a must in professional sports. Presently, this means performing exercises in high number and quality with some form of data logging. In American football many things are logged, but there is no wearable sensor that logs a catch or a drop. The goal of this paper was therefore to develop and verify a sensor that can do exactly that. In a first step, a sensor platform was used to gather nine-degrees-of-freedom motion data and audio data from both hands during 759 attempts to catch a pass. After preprocessing, the gathered data were used to train a neural network to classify all attempts, resulting in a classification accuracy of 93%. Additionally, the significance of each sensor signal was analysed. It turned out that the network relies most on acceleration and magnetometer data, neglecting most of the audio and gyroscope data. Beyond these results, the paper introduces a new type of dataset and the possibility of autonomous training in American football to the research community.
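The abstract does not say how the significance of each sensor signal was measured. Permutation importance is one common, model-agnostic way to ask the same question. It shuffles one input channel across examples, which breaks that channel's relationship with the label, and records how much accuracy drops. The NumPy sketch below is a swapped-in technique for illustration, not the authors' method. It assumes inputs shaped (attempts, channels, time steps) and any `predict` function that returns class labels. The toy data and model are hypothetical.

```python
import numpy as np


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean accuracy drop when each input channel is shuffled across examples.

    X: array of shape (n_examples, n_channels, n_timesteps); y: class labels.
    predict: any function mapping an array shaped like X to predicted labels.
    """
    rng = np.random.default_rng(seed)
    baseline = np.mean(predict(X) == y)
    drops = np.zeros(X.shape[1])
    for c in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Give every example another example's full time series for channel c.
            X_perm[:, c] = X_perm[rng.permutation(len(X)), c]
            drops[c] += baseline - np.mean(predict(X_perm) == y)
    return drops / n_repeats


def toy_predict(data):
    """Hypothetical classifier that looks only at channel 0."""
    return (data[:, 0].mean(axis=1) > 0).astype(int)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 2, 50))            # 200 attempts, 2 channels, 50 samples each
    y = (X[:, 0].mean(axis=1) > 0).astype(int)   # toy labels depend only on channel 0
    # Expect a drop of roughly 0.5 for channel 0 and exactly 0.0 for channel 1.
    print(permutation_importance(toy_predict, X, y))
```

A channel the network ignores, such as channel 1 in the toy example, shows no accuracy drop. This mirrors the kind of finding reported for the audio and gyroscope data.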

