Music video emotion classification using slow–fast audio–video network and unsupervised feature representation

2021 · Vol 11 (1)
Author(s): Yagya Raj Pandeya, Bhuwan Bhattarai, Joonwhoan Lee

Abstract: Affective computing has long been hampered by imprecise annotation, because emotions are highly subjective and vague. Music video emotion is especially complex due to the diverse textual, acoustic, and visual information, which can take the form of lyrics, the singer's voice, sounds from different instruments, and visual representations. This may be one reason why there has been limited study in this domain and no standard dataset had been produced before now. In this study, we propose an unsupervised method for music video emotion analysis using music video content from the Internet. We also produced a labelled dataset and compared supervised and unsupervised methods for emotion classification. The music and video information are processed through a multimodal architecture with audio–video information exchange and a boosting method. General 2D and 3D convolution networks are compared with a slow–fast network using filter- and channel-separable convolutions in the multimodal architecture. Several supervised and unsupervised networks were trained end-to-end, and the results were evaluated using various evaluation metrics. The proposed method uses a large dataset for unsupervised emotion classification and interprets the results quantitatively and qualitatively for music videos, which had never been done before. The results show a large improvement in classification score when using unsupervised features and information-sharing techniques across the audio and video networks. Our best classifier attained 77% accuracy, an F1-score of 0.77, and an area-under-the-curve score of 0.94 with minimal computational cost.
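The slow–fast design processes the same clip at two temporal rates and exchanges information between the pathways. A minimal NumPy sketch of that idea, using frame sampling plus a lateral fusion by time-strided pooling (the function names and the toy 16-dimensional frame features are illustrative, not the authors' actual network):

```python
import numpy as np

def slowfast_sample(frames, alpha=4):
    """Split a clip into a fast pathway (all frames) and a slow
    pathway (every alpha-th frame), as in slow-fast video networks."""
    fast = frames               # high temporal resolution
    slow = frames[::alpha]      # low temporal resolution
    return slow, fast

def lateral_fuse(slow_feat, fast_feat, alpha=4):
    """Fuse fast-pathway features into the slow pathway by
    time-strided pooling (a stand-in for a strided convolution)."""
    t_slow = slow_feat.shape[0]
    pooled = fast_feat[: t_slow * alpha].reshape(t_slow, alpha, -1).mean(axis=1)
    return np.concatenate([slow_feat, pooled], axis=-1)

# Toy clip: 32 frames, each reduced to a 16-dim feature vector.
clip = np.random.rand(32, 16)
slow, fast = slowfast_sample(clip, alpha=4)
fused = lateral_fuse(slow, fast, alpha=4)
print(slow.shape, fast.shape, fused.shape)   # (8, 16) (32, 16) (8, 32)
```

The fast pathway keeps fine temporal detail while the slow pathway keeps per-frame capacity; the lateral connection is what lets the two exchange information before classification.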

2020 · Vol 176 (2) · pp. 183-203
Author(s): Santosh Chapaneri, Deepak Jayaswal

Modeling music mood has wide applications in music categorization, retrieval, and recommendation systems; however, it is challenging to computationally model the affective content of music due to its subjective nature. In this work, a structured regression framework is proposed to model the valence and arousal mood dimensions of music using a single regression model at linear computational cost. To tackle the subjectivity phenomenon, a confidence-interval-based estimated consensus is computed by modeling the behavior of various annotators (e.g. biased, adversarial), and is shown to perform better than using the average annotation values. For a compact feature representation of music clips, variational Bayesian inference is used to learn a Gaussian mixture model representation of acoustic features, and chord-related features are used to improve valence estimation by probing the chord progressions between chroma frames. The dimensionality of the features is further reduced using an adaptive version of kernel PCA. Using an efficient implementation of the twin Gaussian process for structured regression, the proposed work achieves a significant improvement in R² for the arousal and valence dimensions relative to state-of-the-art techniques on two benchmark datasets for music mood estimation.
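The point of an estimated consensus is to model annotator behavior rather than average the raw ratings. A minimal sketch of one such scheme, weighting each annotator by their deviation from a robust per-clip center (an illustrative reliability proxy, not the paper's exact confidence-interval estimator):

```python
import numpy as np

def estimated_consensus(ratings, eps=1e-6):
    """Down-weight unreliable annotators instead of taking a plain mean.

    ratings: (n_annotators, n_clips) array of valence or arousal ratings.
    Each annotator's weight is the inverse of their mean absolute
    deviation from the per-clip median.
    """
    median = np.median(ratings, axis=0)          # robust per-clip center
    dev = np.abs(ratings - median).mean(axis=1)  # per-annotator deviation
    w = 1.0 / (dev + eps)
    w /= w.sum()
    return w @ ratings                           # weighted consensus

# Four broadly consistent annotators and one adversarial one.
ratings = np.array([
    [0.6, 0.7, 0.2],
    [0.5, 0.8, 0.3],
    [0.6, 0.7, 0.2],
    [0.7, 0.6, 0.1],
    [0.0, 0.1, 0.9],   # adversarial annotator
])
consensus = estimated_consensus(ratings)
naive = ratings.mean(axis=0)
```

On this toy data the weighted consensus stays near the majority ratings, while the naive mean is pulled toward the adversarial annotator.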


Author(s): Dr. Manish L Jivtode

Web services are applications that allow communication between devices over the Internet independently of the underlying technology. The devices use standardized eXtensible Markup Language (XML) for information exchange: a client or user invokes a web service by sending an XML request message and receives an XML response message in return. There are a number of communication protocols for web services that use the XML format, such as Web Services Flow Language (WSFL) and Blocks Extensible Exchange Protocol (BEEP). Simple Object Access Protocol (SOAP) and Representational State Transfer (REST) are commonly used options for accessing web services. The two are not directly comparable, since SOAP is a communications protocol while REST is a set of architectural principles for data transmission. In this paper, data sizes of 1 KB, 2 KB, 4 KB, 8 KB, and 16 KB were tested for both audio and video, and results were obtained for the CRUD methods. The encryption and decryption timings, in milliseconds/seconds, were recorded by programming the extensibility points of a WCF REST web service in the Azure cloud.
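A harness of the kind described here times CRUD round trips for each payload size. A minimal sketch, with a stub standing in for the actual WCF REST transport (the `send` callable and the stub are hypothetical; a real test would issue HTTP requests to the Azure-hosted service):

```python
import json
import time

def time_crud(send, payload_sizes_kb=(1, 2, 4, 8, 16)):
    """Time CRUD-style calls for each payload size.

    `send(method, body)` is a stand-in for the real transport
    (e.g. an HTTP call to a REST service); any callable works here.
    Returns elapsed milliseconds keyed by (size_kb, method).
    """
    results = {}
    for kb in payload_sizes_kb:
        body = "x" * (kb * 1024)                            # synthetic payload
        for method in ("POST", "GET", "PUT", "DELETE"):     # Create/Read/Update/Delete
            start = time.perf_counter()
            send(method, body)
            results[(kb, method)] = (time.perf_counter() - start) * 1000
    return results

# Stub transport: a JSON round trip in place of a network call.
stub = lambda method, body: json.loads(json.dumps({"m": method, "b": body}))
timings = time_crud(stub)
```

`time.perf_counter()` is used rather than `time.time()` because it is monotonic and has the highest available resolution for interval timing.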


Author(s): Elizabeth Schechter

The largest fibre tract in the human brain connects the two cerebral hemispheres. A ‘split-brain’ surgery severs this structure, sometimes together with other white matter tracts connecting the right hemisphere and the left. Split-brain surgeries have long been performed on non-human animals for experimental purposes, but a number of these surgeries were also performed on adult human beings in the second half of the twentieth century, as a medical treatment for severe cases of epilepsy. A number of these people afterwards agreed to participate in ongoing research into the psychobehavioural consequences of the procedure. These experiments have helped to show that the corpus callosum is a significant source of interhemispheric interaction and information exchange in the ‘neurotypical’ brain. After split-brain surgery, the two hemispheres operate unusually independently of each other in the realm of perception, cognition, and the control of action. For instance, each hemisphere receives visual information directly from the opposite (‘contralateral’) side of space, the right hemisphere from the left visual field and the left hemisphere from the right visual field. This is true of the normal (‘neurotypical’) brain too, but in the neurotypical case interhemispheric tracts allow either hemisphere to gain access to the information that the other has received. In a split-brain subject, however, the information more or less stays put in whichever hemisphere initially received it. And it isn’t just visual information that is confined to one hemisphere or the other after the surgery. Rather, after split-brain surgery, each hemisphere is the source of proprietary perceptual information of various kinds, and is also the source of proprietary memories, intentions, and aptitudes. Various notions of psychological unity or integration have always been central to notions of mind, personhood, and the self.
Although split-brain surgery does not prevent interhemispheric interaction or exchange, it naturally alters and impedes it. So does the split-brain subject as a whole nonetheless remain a unitary psychological being? Or could there now be two such psychological beings within one human animal – sharing one body, one face, one voice? Prominent neuropsychologists working with the subjects have often appeared to argue or assume that a split-brain subject has a divided or disunified consciousness and even two minds. Although a number of philosophers agree, the majority seem to have resisted these conscious and mental ‘duality claims’, defending alternative interpretations of the split-brain experimental results. The sources of resistance are diverse, including everything from a commitment to the necessary unity of consciousness, to recognition of those psychological processes that remain interhemispherically integrated, to concerns about what the moral and legal consequences would be of recognizing multiple psychological beings in one body. On the other hand, underlying most of these arguments against the various ‘duality’ claims is the simple fact that the split-brain subject does not appear to be two persons, but one – and there are powerful conceptual, social, and moral connections between being a unitary person on the one hand and having a unified consciousness and mind on the other.


Author(s): Padmalaya Nayak, Shelendra Kumar Sharma

With the rapid growth of cloud computing, diverse applications are growing exponentially, served from large data centers over the Internet. Cloud gaming is one of the most novel service applications: video games are stored in the cloud, and clients access them as audio/video streams. In practice, cloud gaming substantially reduces the computational cost on the client side and enables the use of thin clients. However, Quality of Service (QoS) may suffer in cloud gaming because of the access latency it introduces. The objective of this chapter is to present the impact and effectiveness of cloud gaming applications on users in healthcare, entertainment, and education.


Sensors · 2020 · Vol 20 (7) · pp. 2030
Author(s): Byeongkeun Kang, Yeejin Lee

Driving is a task that puts heavy demands on visual information, and the human visual system therefore plays a critical role in making proper decisions for safe driving. Understanding a driver’s visual attention and the relevant behavioral information is a challenging but essential task for advanced driver-assistance systems (ADAS) and efficient autonomous vehicles (AVs). Specifically, robust prediction of a driver’s attention from images could be a crucial key to assisting intelligent vehicle systems in which a self-driving car must move safely while interacting with its surrounding environment. Thus, in this paper, we investigate a human driver’s visual behavior from a computer vision perspective to estimate the driver’s attention locations in images. First, we show that feature representations at high resolution improve visual attention prediction accuracy and localization performance when fused with features at low resolution. To demonstrate this, we employ a deep convolutional neural network framework that learns and extracts feature representations at multiple resolutions. In particular, the network maintains a feature representation at the original image resolution. Second, attention prediction tends to be biased toward the centers of images when neural networks are trained using typical visual attention datasets. To avoid overfitting to this center-biased solution, the network is trained using diverse regions of the images. Finally, the experimental results verify that our proposed framework improves the prediction accuracy of a driver’s attention locations.
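The multi-resolution idea can be sketched as pooling an attention map down to low resolution and fusing the upsampled result back with the full-resolution map, so the output stays at the original image resolution. A toy NumPy illustration, in which average pooling and additive fusion stand in for the network's learned convolutions:

```python
import numpy as np

def downsample(img, k):
    """Average-pool an HxW map by a factor of k."""
    h, w = img.shape
    return img[: h - h % k, : w - w % k].reshape(h // k, k, w // k, k).mean(axis=(1, 3))

def upsample(img, k):
    """Nearest-neighbour upsampling by a factor of k."""
    return img.repeat(k, axis=0).repeat(k, axis=1)

def fuse_multires(img, k=4):
    """Fuse a coarse (low-resolution) map with the full-resolution map,
    keeping the output at the original resolution, which is the key idea
    behind maintaining a highest-resolution branch."""
    coarse = upsample(downsample(img, k), k)
    return 0.5 * (img + coarse)          # simple additive fusion

saliency = np.random.rand(64, 64)        # toy attention map
fused = fuse_multires(saliency, k=4)
```

The coarse branch captures context while the full-resolution branch preserves localization; in the actual network both branches are learned rather than fixed pooling.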


Symmetry · 2020 · Vol 12 (6) · pp. 891
Author(s): Jinsu Kim, Namje Park

Closed-circuit television (CCTV) and video surveillance systems (VSSs) are becoming more common each year, helping to prevent incidents/accidents and ensure the security of public places and facilities. The increased presence of VSSs also increases the number of per capita exposures to CCTV cameras. To help protect the privacy of the exposed subjects, attention is being drawn to technologies that utilize intelligent video surveillance systems (IVSSs). IVSSs execute a wide range of surveillance duties, from simple identification of objects in the recorded video data, to understanding and identifying the behavioral patterns of objects and the situations at incident/accident scenes, as well as processing video information to protect the privacy of the recorded subjects against leakage. In addition, the recorded privacy information is encrypted and stored using blockchain technology to prevent forgery of the images. The technology proposed herein (the “proposed mechanism”) is applied to a VSS, where the mechanism converts the original visual information recorded by the VSS into similarly constructed image information, so that the original information is protected against leakage. The face area extracted from the image information is recorded in a separate database, allowing the creation of a restored image that is in perfect symmetry with the original image for images with virtualized face areas. Specifically, the main section of this study proposes an image modification mechanism that inserts a virtual face image closely matching a predetermined similarity and uses a blockchain as the storage area.
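Hash-chaining the stored face records is the core of the blockchain storage idea: any later modification of a record breaks the chain and is detectable. A minimal standard-library sketch (the record fields are illustrative; a real deployment would use an actual blockchain platform rather than an in-memory list):

```python
import hashlib
import json

def add_block(chain, face_record):
    """Append a face-region record to a hash chain; each block's hash
    covers the previous block's hash, so tampering is detectable."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    payload = json.dumps(face_record, sort_keys=True)
    block = {"prev": prev_hash, "data": payload,
             "hash": hashlib.sha256((prev_hash + payload).encode()).hexdigest()}
    chain.append(block)
    return chain

def verify(chain):
    """Recompute every hash; returns False if any block was altered."""
    prev = "0" * 64
    for b in chain:
        if b["prev"] != prev:
            return False
        if hashlib.sha256((prev + b["data"]).encode()).hexdigest() != b["hash"]:
            return False
        prev = b["hash"]
    return True

chain = []
add_block(chain, {"frame": 17, "face_id": "a1", "region": [40, 52, 88, 101]})
add_block(chain, {"frame": 18, "face_id": "a1", "region": [41, 53, 89, 102]})
print(verify(chain))   # True
chain[0]["data"] = chain[0]["data"].replace("17", "99")
print(verify(chain))   # False
```

Because restoring the original face requires the separately stored records, integrity of that store is exactly what the chain protects.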

