Spam-Detection with Comparative Analysis and Spamming Words Extractions

2021 ◽  
Author(s):  
Md Khairul Islam ◽  
Md Al Amin ◽  
Md Rakibul Islam ◽  
Md Nosin Ibna Mahbub ◽  
Md Imran Hossain Showrov ◽  
...  

Email communication plays an essential part in nearly every sector of day-to-day life. Given its significance, it is important to filter spam out of incoming email. Spam email, also known as junk email, consists of unwanted messages sent electronically in large quantities. Most spam is commercial in nature and is not only irritating but also harmful, because it can carry malicious scams, links to malware-hosting sites, or viruses attached to the message. In this paper, we identify spam emails and show how they can be distinguished from legitimate (normal) emails. We deployed four machine learning models and two deep learning models over the datasets, including the combined dataset. We also try to find the important keywords that occur repeatedly in the spam email repository. This knowledge will help us detect spam emails for personal and community security purposes.
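
The keyword-finding step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not describe the authors' extraction method, so everything here is an assumption: the toy messages, the small stopword list, and the scoring rule (spam count minus legitimate count) are hypothetical stand-ins, not the paper's procedure.

```python
import re
from collections import Counter

# Hypothetical toy corpus; the paper's datasets are not reproduced here.
SPAM = [
    "Win a free prize now, click here to claim your free gift",
    "Free offer: click now to win cash",
]
HAM = [
    "Are we still meeting for lunch tomorrow?",
    "Please find the report attached, click the link if the file is missing",
]

# Hypothetical stopword list, kept tiny for illustration.
STOPWORDS = {"a", "to", "the", "your", "for", "is", "if", "we", "are", "here"}


def tokenize(text):
    """Lowercase the text, split it into alphabetic tokens, drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def spam_keywords(spam, ham, top_n=3):
    """Rank words by how much more often they occur in spam than in ham."""
    spam_counts = Counter(w for msg in spam for w in tokenize(msg))
    ham_counts = Counter(w for msg in ham for w in tokenize(msg))
    # Score = spam count minus ham count; keep only words with a positive score.
    scores = {w: c - ham_counts[w] for w, c in spam_counts.items() if c > ham_counts[w]}
    # Sort by descending score, then alphabetically so ties are deterministic.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]


print(spam_keywords(SPAM, HAM))  # ['free', 'now', 'win']
```

A raw count difference is the simplest possible score; on real data, a normalized measure such as a frequency ratio or TF-IDF weighting would likely be more robust to differences in corpus size.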


2021 ◽  
Vol 11 (5) ◽  
pp. 2164
Author(s):  
Jiaxin Li ◽  
Zhaoxin Zhang ◽  
Changyong Guo

X.509 certificates play an important role in encrypting data transmitted between both sides under HTTPS. With the popularization of X.509 certificates, more and more criminals leverage certificates to prevent their communications from being exposed by malicious-traffic analysis tools; phishing sites and malware are good examples. X.509 certificates found in phishing sites or malware are called malicious X.509 certificates. This paper applies different machine learning models, including classical machine learning models, ensemble learning models, and deep learning models, to distinguish malicious certificates from benign ones with Verification for Extraction (VFE). VFE is a system we designed and implemented to obtain a rich set of certificate characteristics. The results show that ensemble learning models are the most stable and efficient, with an average accuracy of 95.9%, which outperforms many previous works. In addition, we obtain an SVM-based detection model with an accuracy of 98.2%, the highest among the models. The outcome indicates that VFE is capable of capturing the essential characteristics of malicious X.509 certificates.


2019 ◽  
Author(s):  
Mojtaba Haghighatlari ◽  
Gaurav Vishwakarma ◽  
Mohammad Atif Faiz Afzal ◽  
Johannes Hachmann

We present a multitask, physics-infused deep learning model to accurately and efficiently predict refractive indices (RIs) of organic molecules, and we apply it to a library of 1.5 million compounds. We show that it outperforms earlier machine learning models by a significant margin, and that incorporating known physics into data-derived models provides valuable guardrails. Using a transfer learning approach, we augment the model to reproduce results consistent with higher-level computational chemistry training data, but with a considerably reduced number of corresponding calculations. Prediction errors of machine learning models are typically smallest for commonly observed target property values, consistent with the distribution of the training data. However, since our goal is to identify candidates with unusually large RI values, we propose a strategy to boost the performance of our model in the remoter areas of the RI distribution: We bias the model with respect to the under-represented classes of molecules that have values in the high-RI regime. By adopting a metric popular in web search engines, we evaluate our effectiveness in ranking top candidates. We confirm that the models developed in this study can reliably predict the RIs of the top 1,000 compounds, and are thus able to capture their ranking. We believe that this is the first study to develop a data-derived model that ensures the reliability of RI predictions by model augmentation in the extrapolation region on such a large scale. These results underscore the tremendous potential of machine learning in facilitating molecular (hyper)screening approaches on a massive scale and in accelerating the discovery of new compounds and materials, such as organic molecules with high-RI for applications in opto-electronics.
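
The abstract does not name the search-engine metric it adopts. As an illustration only, the sketch below assumes normalized discounted cumulative gain (NDCG), a ranking metric widely used in web search, and applies it to hypothetical RI values; the function names and the numbers are invented for this example.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: items lower in the ranking count for less."""
    return sum(gain / math.log2(rank + 2) for rank, gain in enumerate(gains))


def ndcg_at_k(true_values, predicted_values, k):
    """NDCG@k: rank candidates by predicted value, use the true value as gain."""
    # Indices of the candidates, ordered from highest to lowest prediction.
    order = sorted(range(len(true_values)),
                   key=lambda i: predicted_values[i], reverse=True)
    ranked_gains = [true_values[i] for i in order[:k]]
    # The ideal ranking orders candidates by their true values.
    ideal_gains = sorted(true_values, reverse=True)[:k]
    ideal = dcg(ideal_gains)
    return dcg(ranked_gains) / ideal if ideal > 0 else 0.0


# Hypothetical reference and predicted RI values for four compounds.
true_ri = [1.9, 1.5, 1.7, 1.6]
good_prediction = [1.85, 1.52, 1.71, 1.6]   # preserves the true ordering
poor_prediction = [1.5, 1.9, 1.7, 1.6]      # ranks a low-RI compound first

print(ndcg_at_k(true_ri, good_prediction, k=3))
print(ndcg_at_k(true_ri, poor_prediction, k=3))
```

A score of 1 means the predicted top-k ranking matches the ideal one; misplacing a high-RI compound near the top of the list costs more than misplacing one further down, which suits a screening task focused on the best candidates.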


Author(s):  
Noé Sturm ◽  
Jiangming Sun ◽  
Yves Vandriessche ◽  
Andreas Mayr ◽  
Günter Klambauer ◽  
...  

This article describes an application of high-throughput fingerprints (HTSFP) built upon industrial data accumulated over the years. The fingerprint was used to build machine learning models (multi-task deep learning + SVM) for compound activity predictions towards a panel of 131 targets. Quality of the predictions and the scaffold hopping potential of the HTSFP were systematically compared to traditional structural descriptors ECFP.


2021 ◽  
Vol 23 (2) ◽  
pp. 359-370
Author(s):  
Michał Matuszczak ◽  
Mateusz Żbikowski ◽  
Andrzej Teodorczyk

The article proposes an approach based on deep learning and machine learning models to predict component failure as an enhancement of the condition-based maintenance scheme of a turbofan engine, and reviews the prognostics approaches currently used in the aviation industry. A component degradation scale representing life consumption is proposed, and the condition data collected in this way are combined with engine sensor and environmental data. Using data manipulation techniques, a framework for model training is created, and the models' hyperparameters are obtained through Bayesian optimization. The models predict a continuous variable representing component condition based on the input. The best-performing model is identified by determining its score on the holdout set. Deep learning models achieved an MSE of 0.71 (ensemble meta-model of neural networks) and significantly outperformed the machine learning models, whose best score was 1.75. The deep learning models demonstrated their feasibility for predicting the component condition to within less than 1 unit of error on the rank scale.
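
Selecting a model by its holdout-set MSE can be sketched as follows. This is a toy illustration under stated assumptions: the holdout values and model outputs are made up, and plain averaging is used as a stand-in for the ensemble meta-model, whose actual construction the abstract does not describe.

```python
def mse(y_true, y_pred):
    """Mean squared error between reference values and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def average_ensemble(predictions):
    """Average the per-sample predictions of several models.

    Plain averaging is an assumption here, not the paper's meta-model.
    """
    return [sum(values) / len(values) for values in zip(*predictions)]


# Hypothetical holdout targets (condition on a rank scale) and model outputs.
holdout_true = [1.0, 2.0, 3.0, 4.0]
model_outputs = {
    "model_a": [1.5, 2.5, 2.5, 4.5],
    "model_b": [0.5, 2.5, 3.5, 3.0],
}
model_outputs["ensemble"] = average_ensemble(
    [model_outputs["model_a"], model_outputs["model_b"]]
)

# Score every candidate on the holdout set and keep the lowest MSE.
scores = {name: mse(holdout_true, preds) for name, preds in model_outputs.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

In this toy data the averaged ensemble wins because the two models' errors partly cancel; that is a property of the invented numbers and says nothing about the results reported in the article.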

