Robust Multimodal Representation Learning with Evolutionary Adversarial Attention Networks

Author(s): Feiran Huang, Alireza Jolfaei, Ali Kashif Bashir
2021
Author(s): Masaya Sato, Tamaki Kobayashi, Yoko Soroida, Takashi Tanaka, Takuma Nakatsuka, ...

Abstract: Recently, multimodal representation learning for images and other information such as numbers or language has gained much attention, owing to the possibility of combining latent features in a single distribution. The aim of the current study was to analyze the diagnostic performance of deep multimodal representation model-based integration of tumor images, patient background, and blood biomarkers for the differentiation of liver tumors observed using B-mode ultrasonography (US). First, we applied supervised learning with a convolutional neural network (CNN) to 972 liver nodules in the training and development sets (479 benign and 493 malignant nodules) to develop a predictive model using segmented B-mode tumor images. We then applied a deep multimodal representation model to integrate information about patient background or blood biomarkers with the B-mode images. We investigated the performance of the models in an independent test set of 108 liver nodules, including 53 benign and 55 malignant tumors. Using only the segmented B-mode images, the diagnostic accuracy and area under the curve (AUC) were 68.52% and 0.721, respectively. As information about patient background (such as age or sex) and blood biomarkers was integrated, the diagnostic performance increased in a stepwise manner. The diagnostic accuracy and AUC of the multimodal DL model (which integrated the B-mode tumor image with patient age, sex, AST, ALT, platelet count, and albumin data) reached 96.30% and 0.994, respectively. Integration of patient background and blood biomarkers in addition to the US image using multimodal representation learning outperformed the CNN model using US images alone. We expect that the deep multimodal representation model could be a feasible and acceptable tool to effectively support the definitive diagnosis of liver tumors using B-mode US in daily clinical practice.
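The abstract describes fusing CNN-derived image features with patient background and blood-biomarker values before a final classification head. The following is a minimal, illustrative sketch of that kind of late fusion, not the authors' model: the encoder is a random untrained projection, and all feature sizes, variable names, and input values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def image_features(patch, dim=64):
    """Stand-in for a trained CNN encoder: projects a flattened
    B-mode tumor patch to a fixed-length feature vector.
    (A real model would use learned convolutional layers.)"""
    w = rng.standard_normal((patch.size, dim)) / np.sqrt(patch.size)
    return np.tanh(patch.ravel() @ w)

def fuse(img_feat, tabular):
    """Late fusion: concatenate image features with standardized
    patient background / blood-biomarker values."""
    tab = (tabular - tabular.mean()) / (tabular.std() + 1e-8)
    return np.concatenate([img_feat, tab])

def malignancy_probability(fused):
    """Logistic head over the fused representation (random weights
    here; a real model would learn them via supervised training)."""
    w = rng.standard_normal(fused.size) / np.sqrt(fused.size)
    return 1.0 / (1.0 + np.exp(-(fused @ w)))

# Hypothetical input: one 32x32 tumor patch plus
# [age, sex, AST, ALT, platelet count, albumin]
patch = rng.standard_normal((32, 32))
tabular = np.array([63.0, 1.0, 45.0, 38.0, 180.0, 4.1])
prob = malignancy_probability(fuse(image_features(patch), tabular))
```

The stepwise gains reported in the abstract correspond to widening the `tabular` vector: each added biomarker extends the fused representation the classifier sees.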


Author(s): Nicholas Westing, Kevin C. Gross, Brett J. Borghetti, Christine M. Schubert Kabban, Jacob Martin, ...

2019, Vol 6 (6), pp. 10675-10685
Author(s): Zhenhua Huang, Xin Xu, Juan Ni, Honghao Zhu, Cheng Wang

Author(s): Yi Tay, Anh Tuan Luu, Siu Cheung Hui

Co-Attention is a highly effective attention mechanism for text matching applications. It enables the learning of pairwise attentions, i.e., learning to attend based on word-level affinity scores computed between two documents. However, text matching problems can be either symmetrical or asymmetrical. For example, paraphrase identification is a symmetrical task, while question-answer matching and entailment classification are asymmetrical. In this paper, we argue that co-attention models in asymmetrical domains require different treatment from those in symmetrical domains, i.e., a notion of word-level directionality should be incorporated while learning word-level similarity scores. Hence, the standard inner product in real space commonly adopted in co-attention is not suitable. This paper leverages attractive properties of the complex vector space and proposes a co-attention mechanism based on the complex-valued inner product (the Hermitian product). Unlike the real dot product, the dot product in complex space is asymmetric because the first argument is conjugated. Aside from modeling and encoding directionality, our proposed approach also enhances the representation learning process. Extensive experiments on five text matching benchmark datasets demonstrate the effectiveness of our approach.
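The asymmetry of the Hermitian product is easy to verify numerically. The sketch below is only an illustration of that property, not the paper's full co-attention network; the toy embeddings are assumptions. NumPy's `np.vdot` conjugates its first argument, so the affinity score depends on argument order.

```python
import numpy as np

def hermitian_affinity(q, d):
    """Hermitian inner product <q, d> = sum(conj(q_i) * d_i).
    Because the first argument is conjugated, in general
    hermitian_affinity(q, d) != hermitian_affinity(d, q):
    the two directions are complex conjugates of each other."""
    return np.vdot(q, d)  # np.vdot conjugates its first argument

# Two toy complex-valued word embeddings
q = np.array([1 + 2j, 0 - 1j])
d = np.array([2 - 1j, 1 + 1j])

forward = hermitian_affinity(q, d)   # conj(q) . d
backward = hermitian_affinity(d, q)  # conj(d) . q
# forward == conj(backward): the real part is shared, but the
# imaginary part flips sign, which is how word-level
# directionality enters the affinity score.
```

A real-valued dot product would give identical scores in both directions, which is why the paper argues it cannot distinguish the two sides of an asymmetrical task such as question-answer matching.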

