MMPC-RF: A Deep Multimodal Feature-Level Fusion Architecture for Hybrid Spam E-mail Detection

2021 ◽  
Vol 11 (24) ◽  
pp. 11968
Author(s):  
Ghizlane Hnini ◽  
Jamal Riffi ◽  
Mohamed Adnane Mahraz ◽  
Ali Yahyaouy ◽  
Hamid Tairi

Hybrid spam is undesirable e-mail (electronic mail) that contains both image and text parts. It is more harmful and more complex than image-based or text-based spam e-mail. Thus, an efficient and intelligent approach is required to distinguish between spam and ham. To our knowledge, only a small number of studies have aimed at detecting hybrid spam e-mails. Most of these multimodal architectures adopted the decision-level fusion method, whereby the classification scores of each modality were concatenated and fed to another classification model to make a final decision. Unfortunately, this method not only demands many learning steps but also loses the correlations in the mixed feature space. In this paper, we propose a deep multimodal feature-level fusion architecture that concatenates two embedding vectors to obtain a strong representation of e-mails and improve classification performance. The paragraph vector distributed bag of words (PV-DBOW) model and the convolutional neural network (CNN) were used as feature extraction techniques for the text and image parts, respectively, of the same e-mail. The extracted feature vectors were concatenated and fed to the random forest (RF) model to classify a hybrid e-mail as either spam or ham. The experiments were conducted on three hybrid datasets built from three publicly available corpora: Enron, Dredze, and TREC 2007. According to the results, the proposed model achieves a higher accuracy (99.16%) than recent state-of-the-art methods.
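The contrast between the two fusion strategies can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the per-modality L2 normalization is an assumption (the abstract only says the vectors are concatenated), and the toy vectors stand in for real PV-DBOW and CNN embeddings.

```python
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length; a zero vector is returned unchanged."""
    norm = sum(x * x for x in vec) ** 0.5
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_features(text_vec: Sequence[float], image_vec: Sequence[float]) -> List[float]:
    """Feature-level fusion: one joint vector holding both modalities' features.

    Normalizing each part first (an assumption in this sketch) keeps one
    modality from dominating simply because its values are larger.
    """
    return l2_normalize(text_vec) + l2_normalize(image_vec)


def fuse_decisions(text_score: float, image_score: float) -> List[float]:
    """Decision-level fusion, for contrast: only one score per modality survives,
    so a second-stage model never sees the individual text/image features."""
    return [text_score, image_score]


if __name__ == "__main__":
    text_embedding = [3.0, 4.0]          # stand-in for a PV-DBOW document vector
    image_embedding = [0.0, 0.0, 2.0]    # stand-in for CNN image features
    print(fuse_features(text_embedding, image_embedding))  # prints [0.6, 0.8, 0.0, 0.0, 1.0]
```

The fused vector would then be passed to a classifier such as a random forest (for instance scikit-learn's `RandomForestClassifier`); in the decision-level alternative, the second-stage classifier sees only two numbers per e-mail.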

2016 ◽  
Vol 2016 ◽  
pp. 1-13 ◽  
Author(s):  
Tao Zhou ◽  
Huiling Lu ◽  
Junjie Zhang ◽  
Hongbin Shi

To improve the detection accuracy of pulmonary nodules in CT images, and to address two problems of pulmonary nodule detection models, namely an unreasonable feature structure and a lack of tightness in the feature representation, a pulmonary nodule detection algorithm is proposed based on SVM and CT image feature-level fusion with rough sets. First, CT images of pulmonary nodules are analyzed, and 42-dimensional feature components are extracted, including six new 3-dimensional features proposed in this paper as well as other 2-dimensional and 3-dimensional features. Second, these features are reduced five times with rough sets based on feature-level fusion. Third, a grid optimization model is used to optimize the kernel function of the support vector machine (SVM), which serves as the classifier for identifying pulmonary nodules. Finally, lung CT images of 70 patients with pulmonary nodules are collected as the original samples and used to verify the effectiveness and stability of the proposed model through four groups of comparative experiments. The experimental results show that the effectiveness and stability of the proposed model based on rough set feature-level fusion are improved to some degree.
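The abstract does not spell out its rough-set reduction, so the following is only a generic sketch of the standard rough-set idea it builds on: the dependency degree gamma_B(D) = |POS_B(D)| / |U| measures how well a feature subset B determines the decision D, and a feature can be dropped if removing it leaves gamma unchanged. It assumes the CT features have already been discretized, and the attribute names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence

Row = Dict[str, Hashable]


def indiscernibility_classes(rows: Sequence[Row], attrs: Sequence[str]) -> List[List[int]]:
    """Group row indices whose values agree on every attribute in attrs."""
    classes: Dict[tuple, List[int]] = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].append(i)
    return list(classes.values())


def dependency_degree(rows: Sequence[Row], attrs: Sequence[str], decision: str) -> float:
    """gamma_B(D) = |POS_B(D)| / |U|.

    A row is in the positive region when every row indiscernible from it
    (same values on attrs) carries the same decision label.
    """
    if not rows:
        return 0.0
    positive = 0
    for cls in indiscernibility_classes(rows, attrs):
        if len({rows[i][decision] for i in cls}) == 1:
            positive += len(cls)
    return positive / len(rows)


def greedy_reduct(rows: Sequence[Row], attrs: Sequence[str], decision: str) -> List[str]:
    """Backward elimination: drop each attribute whose removal keeps gamma unchanged."""
    kept = list(attrs)
    target = dependency_degree(rows, kept, decision)
    for a in attrs:
        trial = [x for x in kept if x != a]
        if trial and dependency_degree(rows, trial, decision) == target:
            kept = trial
    return kept


if __name__ == "__main__":
    # Toy, already-discretized nodule features (hypothetical attribute names).
    table = [
        {"size": "big", "density": "high", "nodule": 1},
        {"size": "big", "density": "low", "nodule": 1},
        {"size": "small", "density": "high", "nodule": 0},
        {"size": "small", "density": "low", "nodule": 0},
    ]
    print(greedy_reduct(table, ["size", "density"], "nodule"))  # prints ['size']
```

Whether the authors' five reduction rounds follow this exact elimination rule is not stated in the abstract; the sketch only shows the dependency criterion that rough-set reduction is usually built on.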


2019 ◽  
Vol 8 (3) ◽  
pp. 2761-2767

Iris recognition systems have gained prominence because of the iris's uniqueness and stability over time. However, the recognition performance of single-biometric systems is greatly affected by environmental conditions and physiological deficiencies. Multi-biometric systems mitigate this problem by fusing features collected from multiple traits, from multiple samples or instances of the same trait, or from a single trait processed by multiple algorithms. To gain the advantages of multi-biometric systems in iris recognition, a multi-algorithmic iris recognition system has been proposed in which texture features are extracted from the iris using a 2D Log-Gabor filter and phase features are extracted using the Haar wavelet. These features can be integrated at various levels: decision, rank, score, feature, and pixel. Although feature-level fusion retains richer information about biometric samples than the other fusion levels, it involves mapping complexity and a high-dimensional feature space. To exploit feature-level fusion in iris recognition while overcoming the resulting high-dimensional feature space, a Genetic Algorithm (GA)-based reduction scheme, a Principal Component Analysis (PCA) reduction strategy, and a hybrid reduction scheme combining PCA and GA have been applied to reduce the feature space. The performance of these reduction strategies has been evaluated on the CASIA and IIT Delhi iris databases using machine learning approaches. The results show that the feature space is dramatically reduced while recognition accuracy is maintained, and that space and time requirements decrease significantly after the feature reduction schemes are applied.
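As an illustration of the wavelet half of this pipeline, the sketch below implements a one-dimensional orthonormal Haar decomposition and a sign-based binarization of the coefficients. It is a simplification under stated assumptions: the abstract does not say how the Haar phase features are encoded, real iris systems work on a 2-D normalized iris strip, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def haar_step(signal: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal 1-D Haar transform (pairwise sums and differences)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_decompose(signal: Sequence[float], levels: int) -> Tuple[List[float], List[List[float]]]:
    """Repeat haar_step on the approximation; return it plus the detail band of each level."""
    approx: List[float] = list(signal)
    details: List[List[float]] = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


def sign_code(coeffs: Sequence[float]) -> List[int]:
    """Binarize coefficients by sign (an assumed encoding, common in iris codes)."""
    return [1 if c >= 0 else 0 for c in coeffs]


if __name__ == "__main__":
    row = [4.0, 2.0, 6.0, 6.0, 1.0, 3.0, 5.0, 7.0]   # toy intensities from one iris-strip row
    approx, details = haar_decompose(row, levels=2)
    print(sign_code(details[0] + details[1]))        # prints [1, 1, 0, 0, 0, 0]
```

Feature-level fusion would then concatenate such a code with the Log-Gabor texture features before the GA, PCA, or hybrid reduction step.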


Symmetry ◽  
2020 ◽  
Vol 12 (5) ◽  
pp. 867
Author(s):  
Fen Liu ◽  
Yuxuan Liu ◽  
Hongqiang Sang

Various defects form on the workpiece surface during the production process. Workpiece surface defects are classified according to various characteristics, including bumped, scratched, and pitted surfaces. Suppliers analyze the cause of workpiece surface defects through the defect types and thus determine the subsequent processing. Therefore, correct classification of workpiece surface defects is essential. In this paper, a multi-classifier decision-level fusion classification model for workpiece surface defects based on a convolutional neural network (CNN) is proposed. In the proposed model, the histogram of oriented gradients (HOG) was used to extract features from the second fully connected layer of the CNN, and the local binary patterns (LBP) operator was then applied to the HOG features, a process called HOG–LBP feature extraction. Finally, a symmetry ensemble classifier was designed to classify the features of the last fully connected layer of the CNN and the HOG–LBP features. The final decision was made by fusing the classification results of the symmetry structure channels. The experimental results showed that the proposed model could improve the accuracy of workpiece surface defect classification.
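The LBP stage can be sketched as follows: each interior cell is compared with its eight neighbours, and the resulting 8-bit codes are histogrammed. This is a minimal version of the basic LBP operator only; applying it to a 2-D grid of HOG responses (rather than to raw pixels) is an assumption about how the HOG–LBP chaining works, and the neighbour ordering is one common convention.

```python
from typing import List, Sequence

# Clockwise neighbour offsets starting at the top-left cell (one common bit ordering).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(grid: Sequence[Sequence[float]], r: int, c: int) -> int:
    """8-bit code: bit k is set when neighbour k is >= the centre value."""
    centre = grid[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if grid[r + dr][c + dc] >= centre:
            code |= 1 << bit
    return code


def lbp_histogram(grid: Sequence[Sequence[float]]) -> List[int]:
    """256-bin histogram of LBP codes over all interior cells (borders are skipped)."""
    hist = [0] * 256
    for r in range(1, len(grid) - 1):
        for c in range(1, len(grid[0]) - 1):
            hist[lbp_code(grid, r, c)] += 1
    return hist


if __name__ == "__main__":
    patch = [
        [9, 1, 1, 1],
        [1, 5, 1, 1],
        [1, 1, 1, 1],
    ]
    print(lbp_code(patch, 1, 1), sum(lbp_histogram(patch)))  # prints 1 2
```

The histogram, not the individual codes, is what would typically be passed on as the feature vector to the ensemble classifier.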


2021 ◽  
Vol 13 (7) ◽  
pp. 1323
Author(s):  
Yingying Kong ◽  
Biyuan Yan ◽  
Yanjuan Liu ◽  
Henry Leung ◽  
Xiangyang Peng

In land cover classification, optical images have been shown to provide good classification performance. Synthetic Aperture Radar (SAR) can operate at all times of day and in all weather conditions, and it has significant advantages over optical images for recognizing some scenes, such as water bodies. One of the current challenges is how to fuse the benefits of both to obtain more powerful classification capabilities. This study proposes a classification model based on random forest with conditional random fields (CRF) for feature-level fusion classification using features extracted from polarized SAR and optical images. In this paper, feature importance is introduced as a weight in the pairwise potential function of the CRF to improve the correction rate of misclassified points. The results show that the dataset combining the two provides significant improvements in feature identification compared with datasets using optical or polarized SAR image features alone. Among the four classification models used, the random forest-importance conditional random fields (RF-Im_CRF) model developed in this paper obtained the best overall accuracy (OA) and Kappa coefficient, validating the effectiveness of the method.
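The role of a CRF on top of random-forest outputs can be illustrated with a small grid model: a unary term from the per-pixel class probabilities plus a Potts pairwise term that penalizes neighbouring pixels with different labels, minimized here with iterated conditional modes (ICM). This is a generic sketch under stated assumptions, not the RF-Im_CRF model: the single scalar `weight` merely stands in for the paper's importance-based weighting, whose exact form the abstract does not give, and ICM is a swapped-in approximate inference method.

```python
import math
from typing import List, Sequence

Grid = Sequence[Sequence[Sequence[float]]]   # probs[row][col][class]


def icm_refine(probs: Grid, weight: float, iterations: int = 5) -> List[List[int]]:
    """Approximately minimize
        E(y) = sum_i -log p_i(y_i) + weight * sum_{i~j} [y_i != y_j]
    over a 4-connected grid with iterated conditional modes.
    """
    rows, cols, n_classes = len(probs), len(probs[0]), len(probs[0][0])
    # Start from the per-pixel most probable class (the plain RF prediction).
    labels = [[max(range(n_classes), key=lambda k: probs[r][c][k]) for c in range(cols)]
              for r in range(rows)]
    for _ in range(iterations):
        changed = False
        for r in range(rows):
            for c in range(cols):
                neighbours = [labels[rr][cc]
                              for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                              if 0 <= rr < rows and 0 <= cc < cols]

                def local_energy(k: int) -> float:
                    unary = -math.log(max(probs[r][c][k], 1e-12))
                    pairwise = weight * sum(1 for n in neighbours if n != k)
                    return unary + pairwise

                best = min(range(n_classes), key=local_energy)
                if best != labels[r][c]:
                    labels[r][c] = best
                    changed = True
        if not changed:
            break
    return labels


if __name__ == "__main__":
    # 3x3 scene: confident class 0 everywhere except a noisy centre pixel.
    background, noisy = [0.9, 0.1], [0.4, 0.6]
    probs = [[background] * 3, [background, noisy, background], [background] * 3]
    print(icm_refine(probs, weight=0.0)[1][1], icm_refine(probs, weight=0.5)[1][1])  # prints 1 0
```

With `weight=0` the model reduces to the raw per-pixel prediction; a positive weight lets confident neighbours overturn an isolated, weakly supported label, which is the kind of correction of misclassified points the abstract describes.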


Computers ◽  
2021 ◽  
Vol 10 (2) ◽  
pp. 21
Author(s):  
Mehwish Leghari ◽  
Shahzad Memon ◽  
Lachhman Das Dhomeja ◽  
Akhtar Hussain Jalbani ◽  
Asghar Ali Chandio

Extensive research on multimodal biometrics by the research community and the advent of modern technology have driven the use of multimodal biometrics in real-life applications. Biometric systems based on a single modality have many constraints, such as noise, limited universality, intra-class variations, and spoof attacks. Multimodal biometric systems, on the other hand, are gaining greater attention because of their high accuracy, increased reliability, and enhanced security. This paper proposes and develops a Convolutional Neural Network (CNN)-based model for the feature-level fusion of fingerprint and online signature. Two feature-level fusion schemes for fingerprint and online signature have been implemented. The first scheme, named early fusion, combines the features of fingerprints and online signatures before the fully connected layers, while the second scheme, named late fusion, combines the features after the fully connected layers. To train and test the proposed model, a new multimodal dataset consisting of 1400 fingerprint samples and 1400 online signature samples from 280 subjects was collected. To train the model more effectively, the training data was further enlarged using augmentation techniques. The experimental results show an accuracy of 99.10% with the early fusion scheme and 98.35% with the late fusion scheme.
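The difference between the two schemes is where the concatenation sits relative to the fully connected layer. The sketch below uses plain Python lists and hand-set weights purely for illustration; in the actual model the inputs would be CNN features and the weights would be learned, and all names here are hypothetical.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def dense(x: Sequence[float], w: Matrix, b: Sequence[float]) -> List[float]:
    """Fully connected layer with ReLU: relu(W x + b); W has one row per output unit."""
    return [max(0.0, sum(wi * xi for wi, xi in zip(row, x)) + bi) for row, bi in zip(w, b)]


def early_fusion(fp: Sequence[float], sig: Sequence[float],
                 w: Matrix, b: Sequence[float]) -> List[float]:
    """Concatenate modality features before the FC layer, so W mixes both modalities."""
    return dense(list(fp) + list(sig), w, b)


def late_fusion(fp: Sequence[float], sig: Sequence[float],
                w_fp: Matrix, b_fp: Sequence[float],
                w_sig: Matrix, b_sig: Sequence[float]) -> List[float]:
    """Give each modality its own FC layer, then concatenate the two outputs."""
    return dense(fp, w_fp, b_fp) + dense(sig, w_sig, b_sig)


if __name__ == "__main__":
    fingerprint, signature = [1.0, 2.0], [3.0]
    print(early_fusion(fingerprint, signature, [[1.0, 1.0, 1.0]], [0.0]))            # prints [6.0]
    print(late_fusion(fingerprint, signature, [[1.0, 1.0]], [0.0], [[1.0]], [0.0]))  # prints [3.0, 3.0]
```

In early fusion a single weight matrix can learn cross-modal interactions; in late fusion each modality is summarized separately and only the summaries are combined.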


2021 ◽  
Vol 2021 ◽  
pp. 1-14
Author(s):  
Jianxiang Wei ◽  
Guanzhong Feng ◽  
Zhiqiang Lu ◽  
Pu Han ◽  
Yunxia Zhu ◽  
...  

Adverse drug reactions (ADRs) pose health threats to humans. Therefore, the risk re-evaluation of post-marketing drugs has become an important part of pharmacovigilance work in various countries. In China, drugs are mainly divided into three categories, from high risk to low risk: prescription drugs (Rx), over-the-counter drugs A (OTC-A), and over-the-counter drugs B (OTC-B). Until now, there has been a lack of automated evaluation methods for switching drugs among these three statuses. Based on the China Food and Drug Administration's (CFDA) spontaneous reporting database (CSRD), we propose a classification model that predicts the risk level of drugs by using feature enhancement based on Generative Adversarial Networks (GAN) and the Synthetic Minority Over-Sampling Technique (SMOTE). A total of 985,960 spontaneous reports from 2011 to 2018 in Jiangsu Province were selected from the CSRD as experimental data. After data preprocessing, a class-imbalanced data set was obtained, containing 887 Rx drugs (84.72%), 113 OTC-A drugs (10.79%), and 47 OTC-B drugs (4.49%). Taking drugs as the samples, ADRs as the features, and the signal detection results obtained by the proportional reporting ratio (PRR) method as the feature values, we constructed the original data matrix, in which the last column represents the category label of each drug. The proposed model expands the ADR data in both the sample space and the feature space. In the feature space, we use feature selection (FS) to screen ADR symptoms with higher importance scores, and then use a GAN to generate artificial data, which are added to the feature space to achieve feature enhancement. In the sample space, we use SMOTE to expand the minority samples, balancing the three categories of drugs and minimizing the classification bias caused by the gap in sample sizes. Finally, we use the random forest (RF) algorithm to classify the feature-enhanced and balanced data set.
The experimental results show that the accuracy of the proposed classification model reaches 98%. The proposed model can evaluate drug risk levels well and provides an automated method for the status switching of post-marketing drugs.
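The SMOTE step can be sketched as follows: each synthetic sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority neighbours. This is a minimal version of the standard algorithm, not the authors' code; `k`, the seed, and the toy data are illustrative choices, and the GAN-based feature enhancement is not shown.

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]


def k_nearest(index: int, points: Sequence[Point], k: int) -> List[Point]:
    """The k points closest (Euclidean) to points[index], excluding itself."""
    base = points[index]
    others = [p for j, p in enumerate(points) if j != index]
    others.sort(key=lambda p: math.dist(base, p))
    return others[:k]


def smote(minority: Sequence[Point], n_new: int, k: int = 5, seed: int = 0) -> List[List[float]]:
    """Generate n_new synthetic minority samples by interpolating towards near neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic: List[List[float]] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        neighbour = rng.choice(k_nearest(i, minority, k))
        gap = rng.random()                      # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(minority[i], neighbour)])
    return synthetic


if __name__ == "__main__":
    # Toy minority class: three drugs described by two PRR-style feature values.
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    for sample in smote(minority, n_new=3, k=2):
        print(sample)
```

As an illustration only, bringing the 47 OTC-B drugs up to the size of the 887-drug Rx class would mean requesting 887 - 47 = 840 synthetic samples; the abstract does not state the exact target ratio the authors used.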


2020 ◽  
Vol 23 (4) ◽  
pp. 274-284 ◽  
Author(s):  
Jingang Che ◽  
Lei Chen ◽  
Zi-Han Guo ◽  
Shuaiqun Wang ◽  
Aorigele

Background: Identification of drug-target interactions is essential in drug discovery, as it helps predict unexpected therapeutic effects or adverse side effects of drugs. To date, several computational methods have been proposed to predict drug-target interactions because they are fast and low-cost compared with traditional wet-lab experiments. Methods: In this study, we investigated this problem in a different way. According to KEGG, drugs were classified into several groups based on their target proteins, and a multi-label classification model was presented to assign drugs to the correct target groups. To make full use of the known drug properties, five networks were constructed, each representing drug associations in one property. A powerful network embedding method, Mashup, was adopted to extract drug features from these networks; based on these features, several machine learning algorithms, including the RAndom k-labELsets (RAKEL) algorithm, the Label Powerset (LP) algorithm, and Support Vector Machine (SVM), were used to build the classification model. Results and Conclusion: Tenfold cross-validation yielded an accuracy of 0.839, an exact match of 0.816, and a Hamming loss of 0.037, indicating good performance of the model. The contribution of each network was also analyzed. Furthermore, the model using multiple networks was found to be superior to models using a single network and to a classic model, indicating the superiority of the proposed model.
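The three reported metrics and the Label Powerset transformation are easy to state in code. The sketch below treats each drug's target groups as a set of label indices; the example-based (Jaccard) definition of accuracy is an assumption, since the abstract does not define it, and the toy label sets are invented for illustration.

```python
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple


def exact_match(true_sets: Sequence[Set[int]], pred_sets: Sequence[Set[int]]) -> float:
    """Fraction of samples whose predicted label set equals the true set exactly."""
    return sum(t == p for t, p in zip(true_sets, pred_sets)) / len(true_sets)


def hamming_loss(true_sets: Sequence[Set[int]], pred_sets: Sequence[Set[int]], n_labels: int) -> float:
    """Fraction of (sample, label) pairs that are wrong: symmetric-difference size over all slots."""
    wrong = sum(len(t ^ p) for t, p in zip(true_sets, pred_sets))
    return wrong / (len(true_sets) * n_labels)


def example_accuracy(true_sets: Sequence[Set[int]], pred_sets: Sequence[Set[int]]) -> float:
    """Assumed definition: |Y & Z| / |Y | Z| per sample, averaged (empty vs empty counts as 1)."""
    scores = []
    for t, p in zip(true_sets, pred_sets):
        union = t | p
        scores.append(len(t & p) / len(union) if union else 1.0)
    return sum(scores) / len(scores)


def label_powerset(label_sets: Sequence[Set[int]]) -> Tuple[List[int], Dict[FrozenSet[int], int]]:
    """Label Powerset transform: each distinct label combination becomes one class id."""
    mapping: Dict[FrozenSet[int], int] = {}
    classes = []
    for s in label_sets:
        key = frozenset(s)
        if key not in mapping:
            mapping[key] = len(mapping)
        classes.append(mapping[key])
    return classes, mapping


if __name__ == "__main__":
    truth = [{0, 1}, {2}, {1}]
    predicted = [{0, 1}, {1}, {1, 2}]
    print(example_accuracy(truth, predicted), exact_match(truth, predicted),
          hamming_loss(truth, predicted, n_labels=3))
```

Label Powerset turns the multi-label task into an ordinary multi-class one, which is why a single-label learner can be used after the transform.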


2010 ◽  
Vol 2 (1) ◽  
pp. 28-38 ◽  
Author(s):  
K. Kannan ◽  
S. Arumuga Perumal ◽  
K. Arulmozhi

Agriculture ◽  
2021 ◽  
Vol 11 (4) ◽  
pp. 371
Author(s):  
Yu Jin ◽  
Jiawei Guo ◽  
Huichun Ye ◽  
Jinling Zhao ◽  
Wenjiang Huang ◽  
...  

Remote sensing extraction of large arecanut (Areca catechu L.) planting areas plays an important role in investigating the distribution of arecanut planting and in the subsequent adjustment and optimization of regional planting structures. Satellite imagery has previously been used to investigate and monitor agricultural and forestry vegetation in Hainan. However, monitoring accuracy is affected by the region's cloudy and rainy climate, as well as its high level of land fragmentation. In this paper, we used PlanetScope imagery at 3 m spatial resolution over the Hainan arecanut planting area to investigate high-precision extraction of the arecanut planting distribution based on feature space optimization. First, spectral and textural feature variables were selected to form the initial feature space, and the random forest algorithm was then used to optimize the feature space. Arecanut planting area extraction models based on the support vector machine (SVM), BP neural network (BPNN), and random forest (RF) classification algorithms were then constructed. The overall classification accuracies of the SVM, BPNN, and RF models using the RF-optimized features were 74.82%, 83.67%, and 88.30%, with Kappa coefficients of 0.680, 0.795, and 0.853, respectively. The RF model with optimized features exhibited the highest overall classification accuracy and Kappa coefficient. After feature optimization, the overall accuracy of the SVM, BPNN, and RF models improved by 3.90%, 7.77%, and 7.45%, respectively, compared with the corresponding unoptimized models, and the Kappa coefficients also improved. The results demonstrate the ability of PlanetScope satellite imagery to extract the planting distribution of arecanut.
Furthermore, RF is shown to effectively optimize the initial feature space composed of spectral and textural feature variables, further improving the extraction accuracy of the arecanut planting distribution. This work can serve as a theoretical and technical reference for the agricultural and forestry industries.
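Overall accuracy (OA) and the Kappa coefficient, the two metrics reported here, are both computed from a confusion matrix. The sketch below uses the standard Cohen's kappa formula (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance; the matrix in the example is made up and is not data from the paper.

```python
from typing import Sequence

Matrix = Sequence[Sequence[int]]   # cm[i][j]: reference class i predicted as class j


def overall_accuracy(cm: Matrix) -> float:
    """Fraction of samples on the diagonal (correctly classified)."""
    total = sum(map(sum, cm))
    return sum(cm[i][i] for i in range(len(cm))) / total


def cohen_kappa(cm: Matrix) -> float:
    """Agreement beyond chance: (p_o - p_e) / (1 - p_e)."""
    n = len(cm)
    total = sum(map(sum, cm))
    p_o = overall_accuracy(cm)
    row_totals = [sum(row) for row in cm]
    col_totals = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    # Expected chance agreement from the row and column marginals.
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    cm = [[45, 5],
          [10, 40]]
    print(round(overall_accuracy(cm), 4), round(cohen_kappa(cm), 4))   # prints 0.85 0.7
```

Because Kappa discounts chance agreement, it is always lower than OA for an imperfect classifier, which is consistent with the pairs of values reported above (for example, 88.30% OA alongside a Kappa of 0.853).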

