International Journal of Multimedia Data Engineering and Management
Latest Publications


TOTAL DOCUMENTS

173
(FIVE YEARS 34)

H-INDEX

8
(FIVE YEARS 1)

Published by IGI Global

1947-8542, 1947-8534

Author(s):  
Jing Chen ◽  
Haifeng Li ◽  
Lin Ma ◽  
Hongjian Bo

Emotion detection using EEG signals has the advantage of eliminating social masking, which gives a better understanding of underlying emotions. This paper presents the cognitive response to emotional speech and emotion recognition from EEG signals. A framework is proposed to recognize mental states from EEG signals induced by emotional speech. First, a speech-evoked emotion cognition experiment is designed and an EEG dataset is collected. Second, power-related features are extracted using EEMD-HHT, which reflects the instantaneous frequency of the signal more accurately than STFT and WT. An extensive analysis of the relationships between frequency bands and the emotional annotation of the stimuli is presented using MIC and statistical analysis. The strongest correlations with EEG signals are found in the lateral and medial orbitofrontal cortex (OFC). Finally, the performance of different feature set and classifier combinations is evaluated, and the experiments show that the proposed framework can effectively recognize emotion from EEG signals, with an accuracy of 75.7% for valence and 71.4% for arousal.


Author(s):  
Wei Wang ◽  
Hui Liu ◽  
Wangqun Lin

In the rapidly changing air combat environment, it is difficult for pilots to make fast, reasonable decisions in a very short time because of limited experience and uncertainty in situation perception. Hence, the authors propose an intelligent cognitive tactical strategy framework for air combat, based on multi-source information, to support decisions in uncertain air combat situations. A fuzzy inferring tree method is proposed to simulate human intellection. Then, to further improve the accuracy of the reasoning results, a genetic algorithm is introduced to optimize the structure and parameters of the fuzzy rules. The simulation results show that the proposed model is reasonable, fast, accurate, repeatable, and fatigue-free, which lays a good foundation for future high-end unmanned combat explorations.


Author(s):  
Amine Rahmani

Cryptography has been one of the most widely used techniques for securing data since antiquity, and it has been greatly improved by the introduction of several mathematical concepts. This paper proposes a new asymmetric cryptography approach that combines Arnold's cat map, extended with hyperbolic functions, and the Chebyshev chaotic map for audio and image encryption. The proposed scheme uses the Chebyshev map to generate the public and secret keys, and the same equation together with Arnold's cat map for encryption and decryption. Hyperbolic functions are introduced to replace the regular integer values in Arnold's map. Both the results and the theoretical discussion indicate good and promising efficiency. Several possible future improvements are presented in the conclusion.
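As a rough illustration of the two building blocks named in this abstract, the sketch below shows the classical integer Arnold's cat map (with its inverse) and the semigroup property of Chebyshev polynomials that key-agreement schemes usually rely on. It is not the authors' scheme: the hyperbolic-function variant is not reproduced, and the function names `arnold_cat`, `arnold_cat_inverse`, and `chebyshev` are hypothetical.

```python
import math


def arnold_cat(grid, iterations=1):
    """Scramble a square N x N grid with the classical Arnold's cat map.

    Assumption: the textbook integer form (x, y) -> (x + y, x + 2y) mod N,
    not the hyperbolic variant described in the abstract.
    """
    n = len(grid)
    for _ in range(iterations):
        nxt = [[None] * n for _ in range(n)]
        for x in range(n):
            for y in range(n):
                nxt[(x + y) % n][(x + 2 * y) % n] = grid[x][y]
        grid = nxt
    return grid


def arnold_cat_inverse(grid, iterations=1):
    """Undo arnold_cat by reading each cell back from its mapped position."""
    n = len(grid)
    for _ in range(iterations):
        nxt = [[None] * n for _ in range(n)]
        for x in range(n):
            for y in range(n):
                nxt[x][y] = grid[(x + y) % n][(x + 2 * y) % n]
        grid = nxt
    return grid


def chebyshev(n, x):
    """Chebyshev polynomial T_n(x) = cos(n * arccos(x)) for x in [-1, 1]."""
    return math.cos(n * math.acos(x))
```

Because T_r(T_s(x)) = T_s(T_r(x)) = T_rs(x), two parties holding private integers r and s can each apply their own polynomial to the other's public value and arrive at the same number, which is the usual basis for Chebyshev-based key generation. The cat map is a bijection on the grid, so the inverse recovers the original data exactly when given the same iteration count.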


Author(s):  
Hui Liu ◽  
Wei Wang ◽  
Chuang Wen Wang

This paper introduces an improved HMM (hidden Markov model) for low-altitude acoustic target recognition. To overcome the limitations of the classical CDHMM (continuous density hidden Markov model) training algorithm and the poor generalization ability of existing discriminative learning methods, a new discriminative training method for estimating the CDHMM in acoustic target recognition is proposed, based on the principle of maximizing the minimum relative separation margin. According to the definition of the relative margin, the new training criterion can be formulated as a standard constrained minimax optimization problem, which can then be solved by a GPD (generalized probabilistic descent) algorithm. The experimental results show that the algorithm performs significantly better than the former training method and can effectively improve the recognition ability of the acoustic target recognition system.


Author(s):  
Gavindya Jayawardena ◽  
Sampath Jayarathna

Eye-tracking experiments involve areas of interest (AOIs) for the analysis of eye gaze data. While there are tools to delineate AOIs and extract eye movement data, they may require users to manually draw AOI boundaries on eye-tracking stimuli or to use markers to define AOIs. This paper introduces two novel techniques to dynamically filter eye movement data from AOIs for the analysis of eye metrics at multiple levels of granularity. The authors incorporate pre-trained object detectors and object instance segmentation models for offline detection of dynamic AOIs in video streams. This research presents the implementation and evaluation of object detectors and object instance segmentation models to find the best model to integrate into a real-time eye movement analysis pipeline. The authors filter gaze data that falls within the polygonal boundaries of detected dynamic AOIs and apply an object detector to find bounding boxes in a public dataset. The results indicate that the dynamic AOIs generated by object detectors capture 60% of eye movements, while those generated by object instance segmentation models capture 30%.
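The filtering step mentioned here, keeping gaze samples that fall inside the polygonal boundary of a detected AOI, can be sketched with a standard ray-casting point-in-polygon test. This is a minimal sketch under assumptions: the abstract does not say which test the authors use, and the names `point_in_polygon`, `filter_gaze`, and `capture_ratio` are hypothetical.

```python
def point_in_polygon(px, py, polygon):
    """Ray-casting test: True if (px, py) lies inside the polygon.

    `polygon` is a list of (x, y) vertices in order. Points exactly on an
    edge may be classified either way, which is typical for this test.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line y = py can be crossed.
        if (y1 > py) != (y2 > py):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def filter_gaze(gaze_points, polygon):
    """Keep only the gaze samples that fall inside one AOI polygon."""
    return [(x, y) for x, y in gaze_points if point_in_polygon(x, y, polygon)]


def capture_ratio(gaze_points, polygon):
    """Fraction of gaze samples captured by the AOI (0.0 for no samples)."""
    if not gaze_points:
        return 0.0
    return len(filter_gaze(gaze_points, polygon)) / len(gaze_points)
```

A bounding box from an object detector is just the four-vertex special case, so the same functions cover both kinds of AOI. The capture percentages reported in the abstract would correspond to a ratio of this kind computed over a dataset.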


Author(s):  
Riju Bhattacharya ◽  
Naresh Kumar Nagwani ◽  
Sarsij Tripathi

Graph kernels have evolved into a promising and popular method for graph clustering over the last decade. In this work, a comparative study of five standard graph kernel techniques for graph clustering is performed. The graph kernels compared are the vertex histogram kernel, shortest path kernel, graphlet kernel, k-step random walk kernel, and Weisfeiler-Lehman kernel. The clustering methods considered for the kernel comparison are hierarchical, k-means, model-based, fuzzy-based, and self-organizing map clustering. The comparative study of kernel methods across the clustering techniques is performed on the MUTAG benchmark dataset. Clustering performance is assessed with internal validation measures such as connectivity, the Dunn index, and the silhouette index. Finally, the comparative analysis is intended to help researchers select the appropriate kernel method for effective graph clustering. The results indicate that the k-step random walk and shortest path kernels performed best among all graph clustering approaches.
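Of the five kernels compared, the vertex histogram kernel is the simplest to write down: each graph is reduced to a count of its node labels, and the kernel value is the inner product of two such count vectors. The sketch below assumes graphs are given only as lists of node labels (edges are ignored by this kernel); the function names and the atom labels in the usage note are illustrative, not taken from the study.

```python
from collections import Counter


def vertex_histogram_kernel(labels_a, labels_b):
    """Inner product of the node-label histograms of two graphs."""
    hist_a, hist_b = Counter(labels_a), Counter(labels_b)
    shared = hist_a.keys() & hist_b.keys()
    return sum(hist_a[label] * hist_b[label] for label in shared)


def gram_matrix(graphs):
    """Pairwise kernel values for a list of graphs (each a list of labels)."""
    return [[vertex_histogram_kernel(a, b) for b in graphs] for a in graphs]
```

For example, `vertex_histogram_kernel(["C", "C", "N", "O"], ["C", "O", "O"])` multiplies the matching counts for "C" and "O" and sums them. A Gram matrix built this way could then be handed to a kernel-aware clustering method, which is the role the kernels play in the comparison described above.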


Author(s):  
Yi Qin ◽  
Huayu Zhang ◽  
Yuni Wang ◽  
Mei Mao ◽  
Fuguo Chen

This paper observes the impact of 3D (three-dimensional) and 2D (two-dimensional) music on the autonomic nervous system and explores the mechanism behind it. The study changes some musical elements of four music tracks while retaining others, and 73 healthy participants listened to the four tracks through headphones: 3D slow music, 2D slow music, 3D fast music, and 2D fast music. The results show that galvanic skin response (GSR) data decreased in all participants after listening to 3D music. Among them, the first and third 3D music pieces, which bear obvious characteristics of sound spatial movement, high melody definition, stable rhythm structure, and high timbre identification of the main melody, significantly changed participants' GSR compared to the baseline obtained before the experiment (P<0.05). It can reasonably be argued that 3D music may improve the regulation of autonomic nervous system responses, which contributes to the health of mind and body.


Author(s):  
Lijuan Duan ◽  
Xiao Xu ◽  
Qing En

3D human dance generation from music is an interesting and challenging task in which the aim is to estimate 3D pose from visual and audio information. Existing methods use only skeleton information to complete this task, which may cause jittery results. In addition, because appropriate evaluation metrics for this task are lacking, it is difficult to evaluate the quality of the generated results. In this paper, the authors explore multi-modality dance generation networks by constructing the correspondence between visual and audio cues. Specifically, they propose a 2D prediction module that predicts future frames by fusing visual and audio features. Moreover, they propose a 3D conversion module that generates the 3D skeleton from the 2D skeleton. In addition, some new evaluation metrics for human dance generation are proposed to evaluate the quality of the generated results. Experimental results indicate that the proposed modules can meet the requirements of authenticity and diversity.


Author(s):  
Yu Zhang ◽  
Ju Liu ◽  
Xiaoxi Liu ◽  
Xuesong Gao

In this manuscript, the authors present a keyshot-based supervised video summarization method in which feature fusion and LSTM networks are used for summarization. The framework can be divided into three parts: 1) The authors formulate video summarization as a sequence-to-sequence problem that predicts the importance score of video content from the video feature sequence. 2) By simultaneously considering visual and textual features, the authors present deeply fused multimodal features and summarize videos with a recurrent encoder-decoder architecture using a bi-directional LSTM. 3) Most importantly, in order to train the supervised video summarization framework, the authors adopt the number of users who selected the current video clip for their final video summary as the importance score and ground truth. Comparisons are performed with state-of-the-art methods and with different variants of FLSum and T-FLSum. The F-score and rank correlation coefficient results on TVSum and SumMe show the outstanding performance of the method proposed in this manuscript.
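The ground-truth construction in part 3 and the F-score used for evaluation can be sketched as follows. This is a simplified sketch under assumptions: each user summary is taken to be a 0/1 vector over clips of equal length, and the F-score is the plain harmonic mean of clip-level precision and recall; the exact protocol used on TVSum and SumMe is not given in the abstract. The names `importance_scores` and `f_score` are hypothetical.

```python
def importance_scores(user_summaries):
    """Per-clip importance: how many users selected each clip.

    `user_summaries` is a list of equal-length 0/1 lists, one per user.
    """
    return [sum(column) for column in zip(*user_summaries)]


def f_score(predicted, reference):
    """Harmonic mean of precision and recall between two 0/1 selections."""
    overlap = sum(p and r for p, r in zip(predicted, reference))
    if overlap == 0:
        return 0.0
    precision = overlap / sum(predicted)
    recall = overlap / sum(reference)
    return 2 * precision * recall / (precision + recall)
```

With three users and four clips, `importance_scores` returns one count per clip, which a model could be trained to regress. A predicted summary would then be scored against each user summary with `f_score`.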


Author(s):  
Zhigang Zhu ◽  
Jin Chen ◽  
Lei Zhang ◽  
Yaohua Chang ◽  
Tyler Franklin ◽  
...  

iASSIST is an iPhone-based assistive sensor solution for independent and safe travel for people who are blind or visually impaired, or those who simply face challenges in navigating an unfamiliar indoor environment. The solution integrates information from Bluetooth beacons, data connectivity, visual models, and user preferences. Hybrid models of interiors are created in a modeling stage from these multimodal data, which are collected and mapped to the floor plan as the modeler walks through the building. A client-server architecture allows scaling to large areas by lazy-loading models according to beacon signals and/or adjacent region proximity. During the navigation stage, a user with the navigation app is localized within the floor plan using visual, connectivity, and user preference data, and is guided along an optimal route to their destination. User interfaces for both modeling and navigation use multimedia channels, including visual, audio, and haptic feedback for targeted users. The design of human subject test experiments is also described, along with some preliminary experimental results.

