A Survey on Deep Learning for Multimodal Data Fusion

2020 ◽  
Vol 32 (5) ◽  
pp. 829-864 ◽  
Author(s):  
Jing Gao ◽  
Peng Li ◽  
Zhikui Chen ◽  
Jianing Zhang

With the wide deployment of heterogeneous networks, huge amounts of data characterized by high volume, high variety, high velocity, and high veracity are generated. These data, referred to as multimodal big data, contain abundant intermodality and cross-modality information and pose great challenges to traditional data fusion methods. In this review, we present pioneering deep learning models for fusing multimodal big data. Despite increasing exploration of multimodal big data, several challenges remain to be addressed. Thus, this review surveys deep learning for multimodal data fusion to provide readers, regardless of their original community, with the fundamentals of multimodal deep learning fusion methods and to motivate new deep learning techniques for multimodal data fusion. Specifically, widely used representative architectures are summarized as a foundation for understanding multimodal deep learning. The current pioneering deep learning models for multimodal data fusion are then summarized. Finally, some challenges and future topics for deep learning-based multimodal data fusion are described.
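
To make the notion of a deep learning fusion method concrete, the following is a minimal sketch of decision-level (late) fusion, one of the basic schemes covered by such surveys: each modality gets its own classifier and the class probabilities are averaged. PyTorch, the two-modality setup, and the dimensions are illustrative assumptions, not models from the review.

# Minimal sketch of decision-level (late) multimodal fusion in PyTorch.
# The two modalities and feature sizes are illustrative assumptions.
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    def __init__(self, dim_a=128, dim_b=32, num_classes=3):
        super().__init__()
        self.head_a = nn.Linear(dim_a, num_classes)   # unimodal classifier, modality A
        self.head_b = nn.Linear(dim_b, num_classes)   # unimodal classifier, modality B

    def forward(self, x_a, x_b):
        # Fuse at the decision level by averaging per-modality class probabilities.
        p_a = torch.softmax(self.head_a(x_a), dim=-1)
        p_b = torch.softmax(self.head_b(x_b), dim=-1)
        return (p_a + p_b) / 2

model = LateFusionClassifier()
probs = model(torch.randn(4, 128), torch.randn(4, 32))  # batch of 4 samples
print(probs.shape)  # torch.Size([4, 3])

Feature-level (early or intermediate) fusion instead concatenates or combines the modality representations before a joint classifier; the survey discusses both families.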


2021 ◽  
Author(s):  
Chems Eddine Louahem M'Sabah ◽  
Ahmed Bouziane ◽  
Youcef Ferdi


2020 ◽  
Vol 2020 ◽  
pp. 1-8
Author(s):  
Hu Zhu ◽  
Ze Wang ◽  
Yu Shi ◽  
Yingying Hua ◽  
Guoxia Xu ◽  
...  

Multimodal fusion is one of the popular directions of multimodal research and an emerging field of artificial intelligence. It aims to exploit the complementarity of heterogeneous data and provide reliable classification. Multimodal data fusion transforms data from multiple single-mode representations into a compact multimodal representation. Most previous studies in this field have used tensor-based multimodal representations; however, as the input is converted into a tensor, the dimensionality and computational complexity grow exponentially. In this paper, we propose a low-rank tensor multimodal fusion method with an attention mechanism, which improves efficiency and reduces computational complexity. We evaluate our model on three multimodal fusion tasks based on the public datasets CMU-MOSI, IEMOCAP, and POM. Our model achieves good performance while flexibly capturing global and local connections. Experiments show that, compared with other tensor-based multimodal fusion methods, our model consistently achieves better results under a series of attention mechanisms.
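
As a rough illustration of the low-rank idea, the sketch below projects each modality with rank-r factor matrices and combines the projections by elementwise product, instead of forming the full outer-product tensor, and then weights the rank components with a simple learned attention. The dimensions, rank, and attention form are illustrative assumptions, not the paper's exact architecture.

# Rough PyTorch sketch of low-rank tensor fusion with attention over rank components.
# All sizes are illustrative; this is not the paper's model.
import torch
import torch.nn as nn

class LowRankFusion(nn.Module):
    def __init__(self, dims=(64, 32, 16), rank=4, out_dim=8):
        super().__init__()
        # One set of rank-r factor matrices per modality (input dim + 1 for a bias feature).
        self.factors = nn.ParameterList(
            [nn.Parameter(torch.randn(rank, d + 1, out_dim) * 0.1) for d in dims]
        )
        # Simple learned attention over the rank components (illustrative).
        self.rank_attn = nn.Parameter(torch.ones(rank))

    def forward(self, inputs):
        batch = inputs[0].size(0)
        fused = None
        for x, w in zip(inputs, self.factors):
            x1 = torch.cat([x, torch.ones(batch, 1)], dim=-1)    # append bias feature
            proj = torch.einsum('bd,rdo->rbo', x1, w)            # rank-wise projections
            fused = proj if fused is None else fused * proj      # elementwise fusion across modalities
        attn = torch.softmax(self.rank_attn, dim=0)
        return torch.einsum('r,rbo->bo', attn, fused)            # attention-weighted sum over ranks

fusion = LowRankFusion()
z = fusion([torch.randn(4, 64), torch.randn(4, 32), torch.randn(4, 16)])
print(z.shape)  # torch.Size([4, 8])

Because the full fusion tensor is never materialized, the cost grows with the rank and the sum of the modality dimensions rather than with their product.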



Author(s):  
Asako Kanezaki ◽  
Ryohei Kuga ◽  
Yusuke Sugano ◽  
Yasuyuki Matsushita


Sensors ◽  
2020 ◽  
Vol 20 (23) ◽  
pp. 6856
Author(s):  
Su Mu ◽  
Meng Cui ◽  
Xiaodi Huang

Multimodal learning analytics (MMLA), which has become increasingly popular, can help provide an accurate understanding of learning processes. However, it is still unclear how multimodal data are integrated into MMLA. Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, this paper systematically surveys 346 articles on MMLA published during the past three years. For this purpose, we first present a conceptual model for reviewing these articles along three dimensions: data types, learning indicators, and data fusion. Based on this model, we then answer the following questions: (1) what types of data and learning indicators are used in MMLA, and how are they related? and (2) how can the data fusion methods in MMLA be classified? Finally, we point out the key stages in data fusion and future research directions in MMLA. Our main findings from this review are: (a) the data in MMLA are classified into digital data, physical data, physiological data, psychometric data, and environment data; (b) the learning indicators are behavior, cognition, emotion, collaboration, and engagement; (c) the relationships between multimodal data and learning indicators are one-to-one, one-to-any, and many-to-one, and these complex relationships are the key to data fusion; (d) the main data fusion methods in MMLA are many-to-one, many-to-many, and multiple validations among multimodal data; and (e) multimodal data fusion can be characterized by the multimodality of data, the multi-dimensionality of indicators, and the diversity of methods.
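
As a toy illustration of the many-to-one pattern described above, the snippet below combines normalized scores from several data types into a single learning indicator (here, "engagement"). The modality names, weights, and scoring rule are hypothetical and only illustrate the mapping, not a method from the surveyed papers.

# Toy many-to-one fusion: several data types -> one learning indicator.
# Modality names, weights, and threshold are hypothetical.
def fuse_engagement(sample, weights=None):
    """Combine normalized per-modality scores (0..1) into one engagement indicator."""
    weights = weights or {"digital": 0.4, "physiological": 0.4, "environment": 0.2}
    score = sum(weights[m] * sample[m] for m in weights)
    return {"engagement": round(score, 3), "level": "high" if score >= 0.6 else "low"}

# Example: click-stream activity (digital), a heart-rate variability feature
# (physiological), and ambient-noise level (environment), already normalized.
print(fuse_engagement({"digital": 0.8, "physiological": 0.7, "environment": 0.3}))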



2020 ◽  
Vol 237 ◽  
pp. 111599 ◽  
Author(s):  
Maitiniyazi Maimaitijiang ◽  
Vasit Sagan ◽  
Paheding Sidike ◽  
Sean Hartling ◽  
Flavio Esposito ◽  
...  


Electronics ◽  
2020 ◽  
Vol 9 (7) ◽  
pp. 1152
Author(s):  
Michal Bednarek ◽  
Piotr Kicki ◽  
Krzysztof Walas

The efficient multi-modal fusion of data streams from different sensors is a crucial ability that a robotic perception system should exhibit to ensure robustness against disturbances. However, as the volume and dimensionality of sensory feedback increase, it can be difficult to manually design a multimodal data fusion system that handles heterogeneous data. Multi-modal machine learning is an emerging field, with research focused mainly on analyzing vision and audio information. From the robotics perspective, however, haptic sensations experienced through interaction with an environment are essential to successfully executing useful tasks. In our work, we compared four learning-based multi-modal fusion methods on three publicly available datasets containing haptic signals, images, and robot poses. We considered three tasks involving such data: grasp outcome classification, texture recognition, and, most challenging, multi-label classification of haptic adjectives based on haptic and visual data. The experiments focused not only on verifying the performance of each method but mainly on their robustness against data degradation. We focused on this aspect of multi-modal fusion because it has rarely been considered in the literature, and such degradation of sensory feedback can occur during a robot's interaction with its environment. Additionally, we verified the usefulness of data augmentation for increasing the robustness of the aforementioned data fusion methods.
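
The following is a minimal sketch of the kind of robustness check described above: a simple haptic-plus-vision fusion classifier evaluated under simulated sensor degradation (additive noise and channel dropout), the same corruption that could also be used as training-time augmentation. The feature shapes, noise levels, and model are illustrative assumptions, not the architectures compared in the paper.

# Minimal robustness-check sketch: fuse haptic and visual features, then compare
# predictions on clean vs. artificially degraded inputs. All sizes are illustrative.
import torch
import torch.nn as nn

class HapticVisionFusion(nn.Module):
    def __init__(self, haptic_dim=48, vision_dim=256, hidden=64, num_classes=2):
        super().__init__()
        self.haptic_enc = nn.Sequential(nn.Linear(haptic_dim, hidden), nn.ReLU())
        self.vision_enc = nn.Sequential(nn.Linear(vision_dim, hidden), nn.ReLU())
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, haptic, vision):
        fused = torch.cat([self.haptic_enc(haptic), self.vision_enc(vision)], dim=-1)
        return self.head(fused)

def degrade(x, noise_std=0.2, drop_prob=0.1):
    """Simulate degraded sensory feedback: Gaussian noise plus random channel dropout."""
    mask = (torch.rand_like(x) > drop_prob).float()
    return (x + noise_std * torch.randn_like(x)) * mask

model = HapticVisionFusion()
haptic, vision = torch.randn(8, 48), torch.randn(8, 256)
clean_logits = model(haptic, vision)
degraded_logits = model(degrade(haptic), degrade(vision))
# Agreement between clean and degraded predictions is the core of the robustness test.
print((clean_logits.argmax(-1) == degraded_logits.argmax(-1)).float().mean().item())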



2019 ◽  
Vol 36 (2) ◽  
pp. 36-48 ◽  
Author(s):  
Nikolaos Bakalos ◽  
Athanasios Voulodimos ◽  
Nikolaos Doulamis ◽  
Anastasios Doulamis ◽  
Avi Ostfeld ◽  
...  


Author(s):  
Nida Sae Jong ◽  
Alba Garcia Seco de Herrera ◽  
Pornchai Phukpattaranont


Author(s):  
Wen Qi ◽  
Hang Su ◽  
Ke Fan ◽  
Ziyang Chen ◽  
Jiehao Li ◽  
...  

The widespread application of robot-assisted minimally invasive surgery (RAMIS) promotes human-machine interaction (HMI). Identifying the various behaviors of surgeons can enhance the RAMIS procedure for a redundant robot, bridging intelligent robot control and activity recognition strategies in the operating room, including hand gestures and human activities. In this paper, to enhance recognition in dynamic situations, we propose a multimodal data fusion framework that combines multiple sources of information to improve accuracy. First, a multi-sensor hardware setup is designed to capture varied data from multiple devices, including a depth camera and a smartphone. Furthermore, the robot control mechanism can switch automatically across different surgical tasks. The experimental results demonstrate the efficiency of the multimodal framework for RAMIS compared with a single-sensor system. Implementation on the KUKA LWR4+ in a surgical robot environment indicates that such surgical robot systems can work alongside medical staff in the future.
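
A hedged sketch of the fusion-then-switch idea described above: features from a depth camera and a smartphone IMU are fused to recognize the current gesture or activity, and the recognized class selects a robot control mode. The feature sizes, class names, and mode mapping are hypothetical examples, not the paper's actual framework or the KUKA LWR4+ interface.

# Toy fusion-then-switch pipeline: recognize a gesture from fused depth-camera
# and smartphone IMU features, then pick a control mode. All names are hypothetical.
import torch
import torch.nn as nn

GESTURES = ["idle", "grasp", "cut"]
CONTROL_MODES = {"idle": "gravity_compensation", "grasp": "impedance_control", "cut": "trajectory_tracking"}

class GestureFusionNet(nn.Module):
    def __init__(self, depth_dim=128, imu_dim=24, hidden=64):
        super().__init__()
        self.depth_enc = nn.Sequential(nn.Linear(depth_dim, hidden), nn.ReLU())
        self.imu_enc = nn.Sequential(nn.Linear(imu_dim, hidden), nn.ReLU())
        self.head = nn.Linear(2 * hidden, len(GESTURES))

    def forward(self, depth_feat, imu_feat):
        fused = torch.cat([self.depth_enc(depth_feat), self.imu_enc(imu_feat)], dim=-1)
        return self.head(fused)

net = GestureFusionNet()
logits = net(torch.randn(1, 128), torch.randn(1, 24))
gesture = GESTURES[logits.argmax(-1).item()]
print(gesture, "->", CONTROL_MODES[gesture])  # recognized gesture selects the control mode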


