Multimodal Features
Recently Published Documents

TOTAL DOCUMENTS: 172 (FIVE YEARS: 101)
H-INDEX: 15 (FIVE YEARS: 3)

2022 ◽  
Vol 2022 ◽  
pp. 1-18
Author(s):  
Chao Tang ◽  
Anyang Tong ◽  
Aihua Zheng ◽  
Hua Peng ◽  
Wei Li

The traditional human action recognition (HAR) method is based on RGB video. Recently, with the introduction of Microsoft Kinect and other consumer-grade depth cameras, HAR based on RGB-D (RGB-Depth) data has drawn increasing attention from scholars and industry. Compared with the traditional method, RGB-D-based HAR offers higher accuracy and stronger robustness. This paper proposes fusing multimodal features with a selective ensemble support vector machine for human action recognition. The algorithm combines improved HOG features from the RGB modality, depth motion map-based local binary pattern features (DMM-LBP), and hybrid joint features (HJF) from the skeletal joint modality. In addition, a frame-based selective ensemble support vector machine classification model (SESVM) is proposed, which integrates the selective ensemble strategy with the selection of SVM base classifiers, thereby increasing the diversity among the base classifiers. Experimental results on public datasets demonstrate that the proposed method is simple, fast, and efficient in comparison with other action recognition algorithms.
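As a rough illustration of the selective-ensemble idea sketched in this abstract, the Python snippet below trains one SVM per modality on placeholder features, keeps only the base classifiers that clear the mean validation accuracy, and fuses them by majority vote. The feature matrices, the selection threshold, and the per-modality split are assumptions for illustration, not the authors' implementation.

```python
# Sketch: selective ensemble of SVM base classifiers over fused multimodal features.
# The random features stand in for HOG (RGB), DMM-LBP (depth), and HJF (skeleton) descriptors.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_samples, n_classes = 300, 5
labels = rng.integers(0, n_classes, n_samples)

# Placeholder per-modality feature matrices (in practice: HOG, DMM-LBP, HJF descriptors).
modalities = {
    "rgb_hog": rng.normal(size=(n_samples, 64)) + labels[:, None] * 0.1,
    "depth_dmm_lbp": rng.normal(size=(n_samples, 48)) + labels[:, None] * 0.1,
    "skeleton_hjf": rng.normal(size=(n_samples, 32)) + labels[:, None] * 0.1,
}

idx_train, idx_val = train_test_split(np.arange(n_samples), test_size=0.3, random_state=0)

# Train one SVM base classifier per modality and score it on the validation split.
base_models, val_scores = {}, {}
for name, X in modalities.items():
    clf = SVC(kernel="rbf", gamma="scale").fit(X[idx_train], labels[idx_train])
    base_models[name] = clf
    val_scores[name] = clf.score(X[idx_val], labels[idx_val])

# Selective ensemble: keep only the base classifiers at or above the mean validation accuracy.
threshold = np.mean(list(val_scores.values()))
selected = [name for name, s in val_scores.items() if s >= threshold]

# Majority vote among the selected base classifiers.
votes = np.stack([base_models[m].predict(modalities[m][idx_val]) for m in selected])
fused_pred = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
print("selected:", selected, "fused accuracy:", (fused_pred == labels[idx_val]).mean())
```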


2022 ◽  
Vol 0 (0) ◽  
Author(s):  
Yumin Chen

Given the evolving multimodal features in educational settings, modes other than language further enable diverse realizations of meanings and pedagogic goals. This paper explores modality in multimodal pedagogic materials for teaching English as a foreign language in China. Drawing upon the social semiotic approach to modality in visual media, this study provides a comparative analysis of modality markers in the different elemental genres that constitute the macrogenre of a teaching unit, with a focus on explaining the underlying reasons for the different choices in terms of coding orientation. It is shown that different degrees of deviation from the accepted coding orientation are employed in the different constituent genres of the macrogenre of a given text.


2021 ◽  
Author(s):  
Ishaan Batta ◽  
Anees Abrol ◽  
Zening Fu ◽  
Vince Calhoun

Here we introduce a multimodal framework to identify subspaces in the human brain that are defined by collective changes in structural and functional measures and are actively linked to demographic, biological and cognitive indicators in a population. We determine the multimodal subspaces using principles of active subspace learning (ASL) and demonstrate the approach on a sample learning task (biological ageing) using a schizophrenia dataset. The proposed multimodal ASL method successfully identifies latent brain representations as subsets of brain regions and connections forming co-varying subspaces associated with biological age. We show that schizophrenia is characterized by different subspace patterns compared to those in a cognitively normal brain. The multimodal features generated by projecting structural and functional MRI components onto these active subspaces perform better than several PCA-based transformations and equally well compared to non-transformed features on the studied learning task. In essence, the proposed method successfully learns active brain subspaces that are associated with a specific brain condition and inferred from brain imaging data along with the biological/cognitive traits of interest.
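The abstract does not give implementation details, but a minimal sketch of the generic active-subspace recipe it builds on could look as follows: fit a smooth response surface for the target (here, age), estimate per-sample gradients, and take the leading eigenvectors of the average gradient outer product as the active subspace. The regressor, the synthetic features, and the subspace dimension are placeholders, not the authors' pipeline.

```python
# Sketch of active subspace learning (ASL): estimate per-sample gradients of a fitted
# response surface y ~ f(x) (here y = age, x = stacked structural/functional features),
# eigendecompose the average outer product of gradients, and keep the leading directions.
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(1)
n, d, k = 400, 20, 3               # samples, imaging features, subspace dimension
X = rng.normal(size=(n, d))        # placeholder sMRI/fMRI-derived features
age = X[:, :2] @ np.array([2.0, -1.5]) + 0.1 * rng.normal(size=n)

# Smooth response surface f(x) for age; any differentiable regressor would do.
model = KernelRidge(kernel="rbf", alpha=1.0, gamma=0.05).fit(X, age)

# Per-sample gradient estimates by central finite differences.
eps = 1e-3
grads = np.zeros((n, d))
for j in range(d):
    e = np.zeros(d); e[j] = eps
    grads[:, j] = (model.predict(X + e) - model.predict(X - e)) / (2 * eps)

# Active subspace: leading eigenvectors of C = E[grad grad^T].
C = grads.T @ grads / n
eigvals, eigvecs = np.linalg.eigh(C)
active_dirs = eigvecs[:, ::-1][:, :k]          # top-k directions
X_active = X @ active_dirs                     # projected "multimodal features"
print("top eigenvalues:", eigvals[::-1][:k].round(3))
```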


2021 ◽  
Vol 12 ◽  
Author(s):  
Chenyang Yao ◽  
Na Hu ◽  
Hengyi Cao ◽  
Biqiu Tang ◽  
Wenjing Zhang ◽  
...  

Background: Antipsychotic medications provide limited long-term benefit to ~30% of schizophrenia patients. Multimodal magnetic resonance imaging (MRI) data have been used to investigate brain features between responders and non-responders to antipsychotic treatment; however, these analytical techniques are unable to weigh the interrelationships between modalities. Here, we used multiset canonical correlation and joint independent component analysis (mCCA + jICA) to fuse MRI data to examine the shared and specific multimodal features between the patients and healthy controls (HCs) and between the responders and non-responders.
Method: Resting-state functional and structural MRI data were collected from 55 patients with drug-naïve first-episode schizophrenia (FES) and demographically matched HCs. Based on the decrease in Positive and Negative Syndrome Scale scores from baseline to the 1-year follow-up, FES patients were divided into a responder group (RG) and a non-responder group (NRG). Gray matter volume (GMV), fractional amplitude of low-frequency fluctuation (fALFF), and regional homogeneity (ReHo) maps were used as features in mCCA + jICA.
Results: Between FES patients and HCs, there were three modality-specific discriminative independent components (ICs) showing differences in mixing coefficients (GMV-IC7, GMV-IC8, and fALFF-IC5). The fusion analysis indicated one modality-shared IC (GMV-IC2 and ReHo-IC2) and three modality-specific ICs (GMV-IC1, GMV-IC3, and GMV-IC6) between the RG and NRG. The right postcentral gyrus showed a significant difference in GMV features between FES patients and HCs and in modality-shared features (GMV and ReHo) between responders and non-responders. The modality-shared component findings were highlighted by GMV, mainly in the bilateral temporal gyrus and the right cerebellum, associated with ReHo in the right postcentral gyrus.
Conclusions: This study suggests that joint anatomical and functional features of the cortices may reflect an early pathophysiological mechanism related to the 1-year treatment response.
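mCCA + jICA is normally run with dedicated neuroimaging toolboxes; purely as a rough illustration of the fusion idea, the sketch below aligns two synthetic modalities with CCA and then applies ICA to the concatenated canonical variates to obtain subject-wise mixing coefficients. The dimensions, the synthetic data, and the two-step simplification are assumptions, not the pipeline used in the study.

```python
# Rough two-modality illustration of CCA-based alignment followed by joint ICA.
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.decomposition import FastICA

rng = np.random.default_rng(2)
n_subjects = 80
gmv = rng.normal(size=(n_subjects, 50))    # placeholder gray matter volume features
falff = rng.normal(size=(n_subjects, 50))  # placeholder fALFF features

# Step 1 (mCCA, two-modality special case): find correlated canonical variates.
cca = CCA(n_components=10).fit(gmv, falff)
gmv_c, falff_c = cca.transform(gmv, falff)

# Step 2 (jICA-style step): ICA on the concatenated canonical variates, yielding
# subject-wise loadings ("mixing coefficients") on joint components.
joint = np.hstack([gmv_c, falff_c])
ica = FastICA(n_components=10, random_state=0)
mixing_coeffs = ica.fit_transform(joint)

# Group comparisons (e.g., responders vs. non-responders) would then test
# differences in these loadings component by component.
print(mixing_coeffs.shape)   # (80, 10)
```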


Author(s):  
Xing Xu ◽  
Yifan Wang ◽  
Yixuan He ◽  
Yang Yang ◽  
Alan Hanjalic ◽  
...  

Image-sentence matching is a challenging task in the field of language and vision, which aims at measuring the similarities between images and sentence descriptions. Most existing methods independently map the global features of images and sentences into a common space to calculate the image-sentence similarity. However, the similarity obtained by these methods may be coarse because (1) an intermediate common space is introduced to implicitly match the heterogeneous features of images and sentences at a global level, and (2) only the inter-modality relations between images and sentences are captured while the intra-modality relations are ignored. To overcome these limitations, we propose a novel Cross-Modal Hybrid Feature Fusion (CMHF) framework that directly learns the image-sentence similarity by fusing multimodal features with both inter- and intra-modality relations incorporated. It robustly captures the high-level interactions between visual regions in images and words in sentences, using flexible attention mechanisms to generate effective attention flows within and across the modalities of images and sentences. A structured objective with a ranking loss constraint is formulated in CMHF to learn the image-sentence similarity from the fused fine-grained features of the different modalities, bypassing the use of an intermediate common space. Extensive experiments and comprehensive analysis on two widely used datasets, Microsoft COCO and Flickr30K, show the effectiveness of the hybrid feature fusion framework, with the proposed CMHF method achieving state-of-the-art matching performance.
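The abstract does not spell out its "structured objective with a ranking loss constraint"; as a sketch of the general kind of objective used for image-sentence matching, the snippet below computes a bidirectional hinge ranking loss over a similarity matrix. The random similarity matrix and the margin value are placeholders rather than CMHF's actual loss.

```python
# Sketch of a bidirectional hinge ranking loss over an image-sentence similarity
# matrix; in CMHF the similarities would come from the fused fine-grained features.
import numpy as np

def ranking_loss(sim, margin=0.2):
    """sim[i, j] = similarity of image i and sentence j; pairs (i, i) are matched."""
    n = sim.shape[0]
    pos = np.diag(sim)                                  # scores of matched pairs
    # Image -> sentence direction: every non-matching sentence is a negative.
    cost_i2s = np.maximum(0.0, margin + sim - pos[:, None])
    # Sentence -> image direction: every non-matching image is a negative.
    cost_s2i = np.maximum(0.0, margin + sim - pos[None, :])
    mask = 1.0 - np.eye(n)                              # drop the positive pairs
    return ((cost_i2s + cost_s2i) * mask).sum() / n

rng = np.random.default_rng(3)
sim = rng.uniform(-1, 1, size=(8, 8))
print("loss:", ranking_loss(sim))
```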


Patterns ◽  
2021 ◽  
pp. 100364
Author(s):  
Saaket Agrawal ◽  
Marcus D.R. Klarqvist ◽  
Connor Emdin ◽  
Aniruddh P. Patel ◽  
Manish D. Paranjpe ◽  
...  

2021 ◽  
Vol 37 (3) ◽  
pp. 208-230
Author(s):  
Wan Fatimah Solihah Wan Abdul Halim ◽  
Intan Safinaz Zainudin ◽  
Nor Fariza Mohd Nor ◽  
...  

The medical tourism industry, which was seriously affected by the coronavirus disease of 2019 (COVID-19), needs to give attention to its online promotional message strategy to boost the industry. Cultural variability is also crucial, since the market for medical tourism is global. However, studies involving cultural variability have focused only on a single discourse mode, mainly the linguistic mode, and have overlooked the multimodal perspective. This study therefore examined the way in which the Prince Court Medical Centre (PCMC), a private hospital in Malaysia, is presented and how the various modes on the hospital's website are combined to deliver promotional messages to international medical tourists. A total of three web pages from the PCMC website were analysed using the Systemic Functional Theory framework: Halliday's metafunction theory for language analysis and Kress and van Leeuwen's model for image analysis. The ways in which the multimodal features of the website reflect communicative style from a cultural perspective were also explored, using Hall's (2000) cultural dimension of context dependency, which classifies cultures into high-context and low-context cultures. The findings revealed that the PCMC website has elements mainly encountered in low-context cultures, such as elaborated code systems as well as direct, explicit, and highly structured messages. The findings help create awareness, especially among copywriters, website designers, and medical tourism stakeholders, of communicative strategies for designing medical tourism websites that involve meaning making through texts and images, and of their possible cultural interpretations.
Keywords: multimodal analysis, systemic functional theory (SFT), cultural context dimension, online promotional discourse, medical tourism.


2021 ◽  
Vol 12 ◽  
Author(s):  
Monica Pereira ◽  
Hongying Meng ◽  
Kate Hone

It is well recognised that social signals play an important role in communication effectiveness. Observation of videos to understand non-verbal behaviour is time-consuming and limits the potential to incorporate detailed and accurate feedback of this behaviour in practical applications such as communication skills training or performance evaluation. The aim of the current research is twofold: (1) to investigate whether off-the-shelf emotion recognition technology can detect social signals in media interviews and (2) to identify which combinations of social signals are most promising for evaluating trainees' performance in a media interview. To investigate this, non-verbal signals were automatically recognised from practice on-camera media interviews conducted within a media training setting, with a sample size of 34. Automated non-verbal signal detection covers multimodal features including facial expression, hand gestures, vocal behaviour and 'honest' signals. The on-camera interviews were categorised into effective and poor communication exemplars based on communication skills ratings provided by trainers and neutral observers, which served as a ground truth. A correlation-based feature selection method was used to select signals associated with performance. To assess the accuracy of the selected features, a number of machine learning classification techniques were used. Naive Bayes analysis produced the best results, with an F-measure of 0.76 and a prediction accuracy of 78%. Results revealed that a combination of body movements, hand movements and facial expression is relevant for establishing communication effectiveness in the context of media interviews. The results have implications for the automatic evaluation of media interviews, with a number of potential application areas including communication training such as media skills training.
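A minimal sketch of the evaluation pipeline described above, with synthetic data standing in for the extracted non-verbal signals: univariate feature selection (used here in place of the paper's correlation-based selector) followed by cross-validation of a Gaussian Naive Bayes classifier. Feature counts, labels, and the selector are illustrative assumptions.

```python
# Sketch: select the non-verbal-signal features most associated with the
# effective/poor label, then cross-validate a Gaussian Naive Bayes classifier.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
n, d = 34, 40                                   # 34 interviews, 40 multimodal signals
X = rng.normal(size=(n, d))                     # facial, gesture, vocal, "honest" signals
y = rng.integers(0, 2, n)                       # 1 = effective, 0 = poor communication

# SelectKBest with an ANOVA F-score stands in for correlation-based feature selection.
pipeline = make_pipeline(SelectKBest(f_classif, k=10), GaussianNB())
scores = cross_val_score(pipeline, X, y, cv=5, scoring="f1")
print("mean F-measure:", scores.mean().round(2))
```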


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Xiaodong Liu ◽  
Songyang Li ◽  
Miao Wang

Context, such as scenes and objects, plays an important role in video emotion recognition, and recognition accuracy can be further improved when context information is incorporated. Although previous research has considered context information, the emotional cues contained in different images may differ, which is often ignored. To address the problem of emotional differences across modalities and across images, this paper proposes a hierarchical attention-based multimodal fusion network for video emotion recognition, which consists of a multimodal feature extraction module and a multimodal feature fusion module. The feature extraction module has three subnetworks that extract features from facial, scene, and global images. Each subnetwork consists of two branches: the first extracts the features of the modality, and the other generates an emotion score for each image. The features and emotion scores of all images in a modality are aggregated to produce that modality's emotion feature. The fusion module then takes the multimodal features as input and generates an emotion score for each modality. Finally, the features and emotion scores of the modalities are aggregated to produce the final emotion representation of the video. Experimental results show that the proposed method is effective on the emotion recognition dataset used for evaluation.
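As a sketch of the two-level, score-based aggregation described above (not the paper's network), the snippet below weights per-image features by softmaxed image-level scores to form each modality feature, then weights the modality features by modality-level scores to form the video representation; all features, scores, and dimensions are random placeholders.

```python
# Sketch of hierarchical (image-level then modality-level) attention aggregation.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(5)
dim, n_frames = 128, 16
modalities = ["face", "scene", "global"]

# First level: aggregate image features within each modality using image-level scores.
modal_feats, modal_scores = [], []
for _ in modalities:
    img_feats = rng.normal(size=(n_frames, dim))        # one branch: per-image features
    img_scores = rng.normal(size=n_frames)              # other branch: per-image emotion scores
    w = softmax(img_scores)
    modal_feats.append(w @ img_feats)                   # attention-weighted modality feature
    modal_scores.append(rng.normal())                   # per-modality emotion score

# Second level: aggregate modality features with modality-level scores.
w_modal = softmax(np.array(modal_scores))
video_repr = w_modal @ np.stack(modal_feats)            # final video emotion representation
print(video_repr.shape)                                 # (128,)
```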


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Sidra Minhas ◽  
Aasia Khanum ◽  
Atif Alvi ◽  
Farhan Riaz ◽  
Shoab A. Khan ◽  
...  

In Alzheimer's disease (AD) progression, it is imperative to identify subjects with mild cognitive impairment (MCI) before clinical symptoms of AD appear. This work proposes a technique for decision support in identifying subjects who will transition from MCI to AD in the future. We used robust predictors from multivariate MRI-derived biomarkers and neuropsychological measures (NM) and tracked their longitudinal trajectories to predict signs of AD in the MCI population. Assuming piecewise linear progression of the disease, we designed a novel weighted gradient offset-based technique to forecast the future marker value using readings from at least two previous follow-up visits. The completed predictor trajectories are then used as features for a standard support vector machine classifier to identify MCI-to-AD progressors among the MCI patients enrolled in the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort. We explored the performance of both unimodal and multimodal models in a 5-fold cross-validation setup. The proposed technique achieved a high classification AUC of 91.2% and 95.7% for 6-month- and 1-year-ahead AD prediction, respectively, using multimodal markers. Finally, we discuss the efficacy of MRI markers as compared to NM for MCI-to-AD conversion prediction.
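A hedged sketch of the forecast-then-classify idea: extrapolate the next marker value from a recency-weighted average of the gradients between previous visits, append it to the observed trajectory, and cross-validate an SVM. The weighting scheme, visit schedule, and synthetic markers are illustrative assumptions, not the authors' exact method.

```python
# Sketch: weighted-gradient extrapolation of a biomarker trajectory, then SVM
# classification of completed trajectories with 5-fold cross-validation.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def forecast_next(values, times, t_next, recency=0.7):
    """Weighted-gradient linear extrapolation from >= 2 previous visits."""
    grads = np.diff(values) / np.diff(times)            # per-interval gradients
    weights = recency ** np.arange(len(grads))[::-1]    # favour the most recent interval
    g = np.average(grads, weights=weights)
    return values[-1] + g * (t_next - times[-1])

rng = np.random.default_rng(6)
n_subjects, n_visits = 120, 3
times = np.array([0.0, 6.0, 12.0])                      # follow-up visits in months
markers = rng.normal(size=(n_subjects, n_visits)).cumsum(axis=1)   # placeholder marker values
labels = rng.integers(0, 2, n_subjects)                 # 1 = MCI-to-AD progressor

# Complete each trajectory with the forecast 18-month value, then classify.
forecasts = np.array([forecast_next(m, times, 18.0) for m in markers])
X = np.hstack([markers, forecasts[:, None]])
auc = cross_val_score(SVC(), X, labels, cv=5, scoring="roc_auc").mean()
print("5-fold AUC:", auc.round(2))
```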

