MMPC-RF: A Deep Multimodal Feature-Level Fusion Architecture for Hybrid Spam E-mail Detection
Hybrid spam is an undesirable e-mail (electronic mail) that contains both image and text parts. It is more harmful and complex as compared to image-based and text-based spam e-mail. Thus, an efficient and intelligent approach is required to distinguish between spam and ham. To our knowledge, a small number of studies have been aimed at detecting hybrid spam e-mails. Most of these multimodal architectures adopted the decision-level fusion method, whereby the classification scores of each modality were concatenated and fed to another classification model to make a final decision. Unfortunately, this method not only demands many learning steps, but it also loses correlation in mixed feature space. In this paper, we propose a deep multimodal feature-level fusion architecture that concatenates two embedding vectors to have a strong representation of e-mails and increase the performance of the classification. The paragraph vector distributed bag of words (PV-DBOW) and the convolutional neural network (CNN) were used as feature extraction techniques for text and image parts, respectively, of the same e-mail. The extracted feature vectors were concatenated and fed to the random forest (RF) model to classify a hybrid e-mail as either spam or ham. The experiments were conducted on three hybrid datasets made using three publicly available corpora: Enron, Dredze, and TREC 2007. According to the obtained results, the proposed model provides a higher accuracy of 99.16% compared to recent state-of-the-art methods.