Spam-Detection with Comparative Analysis and Spamming Words Extractions

10.36227/techrxiv.16832320.v2 ◽

2021 ◽

Author(s):

Md Khairul Islam ◽

Md Al Amin ◽

Md Rakibul Islam ◽

Md Nosin Ibna Mahbub ◽

Md Imran Hossain Showrov ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Comparative Analysis ◽

Spam Detection ◽

Learning Models ◽

Electronic Medium ◽

Machine Learning Models

Email communication plays an essential part in nearly every sector of day-to-day life. Given its significance, it is important to filter spam out of incoming email. Spam email, also known as junk email, consists of unwanted messages sent in large quantities through the electronic medium. Most spam emails are commercial in nature and are not only irritating but also harmful, because they may carry malicious scams, link to malware-hosting sites, or have viruses attached. In this paper, we identify spam emails and show how they can be distinguished from legitimate (normal) emails. We deployed four machine learning models and two deep learning models over the datasets, including the combined dataset. We also try to find the important keywords that appear repeatedly in the spam email repository. This knowledge will enable us to detect spam emails for personal and community security purposes.
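The keyword-finding step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' method: the scoring rule (how many more spam messages than legitimate messages contain a word), the function names, and the toy corpus are all assumptions made for illustration.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase a message and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def top_spam_words(messages, k=3):
    """Return the k words that occur in the most spam messages relative to ham.

    messages: iterable of (label, text) pairs, where label is "spam" or "ham".
    The score of a word is (spam messages containing it) minus
    (ham messages containing it); this rule is an illustrative assumption.
    """
    spam_counts, ham_counts = Counter(), Counter()
    for label, text in messages:
        words = set(tokenize(text))  # count each word once per message
        (spam_counts if label == "spam" else ham_counts).update(words)
    scores = {w: c - ham_counts[w] for w, c in spam_counts.items()}
    # Sort by score (descending), then alphabetically so ties are deterministic.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


# Hypothetical toy corpus, not taken from the paper's datasets.
toy = [
    ("spam", "Win free prize now"),
    ("spam", "Free prize, claim now"),
    ("ham", "Are we meeting now"),
    ("ham", "Lunch tomorrow"),
]

print(top_spam_words(toy, k=2))
```

On a real corpus one would typically also remove stop words and normalize by class size before ranking.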

Spam-Detection with Comparative Analysis and Spamming Words Extractions

10.36227/techrxiv.16832320 ◽

2021 ◽

Author(s):

Md Khairul Islam ◽

Md Al Amin ◽

Md Rakibul Islam ◽

Md Nosin Ibna Mahbub ◽

Md Imran Hossain Showrov ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Comparative Analysis ◽

Spam Detection ◽

Learning Models ◽

Electronic Medium ◽

Machine Learning Models

Email communication plays an essential part in nearly every sector of day-to-day life. Given its significance, it is important to filter spam out of incoming email. Spam email, also known as junk email, consists of unwanted messages sent in large quantities through the electronic medium. Most spam emails are commercial in nature and are not only irritating but also harmful, because they may carry malicious scams, link to malware-hosting sites, or have viruses attached. In this paper, we identify spam emails and show how they can be distinguished from legitimate (normal) emails. We deployed four machine learning models and two deep learning models over the datasets, including the combined dataset. We also try to find the important keywords that appear repeatedly in the spam email repository. This knowledge will enable us to detect spam emails for personal and community security purposes.

A Comparative Analysis of Machine Learning Models for Prediction of Passing Bachelor Admission Test in Life-Science Faculty of a Public University in Bangladesh

2020 IEEE Electric Power and Energy Conference (EPEC) ◽

10.1109/epec48502.2020.9320119 ◽

2020 ◽

Author(s):

Md. Abul Ala Walid ◽

S.M. Masum Ahmed ◽

S M Shibly Sadique

Keyword(s):

Machine Learning ◽

Comparative Analysis ◽

Life Science ◽

Public University ◽

Learning Models ◽

Science Faculty ◽

Admission Test ◽

Machine Learning Models

A Comparative Analysis of Machine Learning Models developed from Homomorphic Encryption based RSA and Paillier algorithm

2021 5th International Conference on Intelligent Computing and Control Systems (ICICCS) ◽

10.1109/iciccs51141.2021.9432348 ◽

2021 ◽

Author(s):

Harsh J. Kiratsata ◽

Mahesh Panchal

Keyword(s):

Machine Learning ◽

Comparative Analysis ◽

Homomorphic Encryption ◽

Learning Models ◽

Machine Learning Models

Machine Learning-Based Malicious X.509 Certificates’ Detection

Applied Sciences ◽

10.3390/app11052164 ◽

2021 ◽

Vol 11 (5) ◽

pp. 2164

Author(s):

Jiaxin Li ◽

Zhaoxin Zhang ◽

Changyong Guo

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Ensemble Learning ◽

Traffic Analysis ◽

Learning Models ◽

Detection Model ◽

Analysis Tools ◽

Average Accuracy ◽

Machine Learning Models

X.509 certificates play an important role in encrypting the transmission of data on both sides under HTTPS. With the popularization of X.509 certificates, more and more criminals leverage certificates to prevent their communications from being exposed by malicious traffic analysis tools. Phishing sites and malware are good examples. Those X.509 certificates found in phishing sites or malware are called malicious X.509 certificates. This paper applies different machine learning models, including classical machine learning models, ensemble learning models, and deep learning models, to distinguish between malicious certificates and benign certificates with Verification for Extraction (VFE). The VFE is a system we design and implement for obtaining plentiful characteristics of certificates. The result shows that ensemble learning models are the most stable and efficient models with an average accuracy of 95.9%, which outperforms many previous works. In addition, we obtain an SVM-based detection model with an accuracy of 98.2%, which is the highest accuracy. The outcome indicates the VFE is capable of capturing essential and crucial characteristics of malicious X.509 certificates.

Comparative analysis of machine learning models when solving problem of detecting abnormal errors in parameters measurements of pipeline systems’ operational modes

Automation Telemechanization and Communication in Oil Industry ◽

10.33285/0132-2222-2021-1(570)-55-60 ◽

2021 ◽

pp. 55-60

Author(s):

E.A. Golubyatnikov ◽

D.G. Leonov

Keyword(s):

Machine Learning ◽

Comparative Analysis ◽

Learning Models ◽

Pipeline Systems ◽

Operational Modes ◽

Machine Learning Models

A Physics-Infused Deep Learning Model for the Prediction of Refractive Indices and Its Use for the Large-Scale Screening of Organic Compound Space

10.26434/chemrxiv.8796950 ◽

2019 ◽

Author(s):

Mojtaba Haghighatlari ◽

Gaurav Vishwakarma ◽

Mohammad Atif Faiz Afzal ◽

Johannes Hachmann

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Large Scale ◽

Organic Molecules ◽

Learning Model ◽

Training Data ◽

Refractive Indices ◽

Learning Models ◽

Deep Learning Model ◽

Machine Learning Models

We present a multitask, physics-infused deep learning model to accurately and efficiently predict refractive indices (RIs) of organic molecules, and we apply it to a library of 1.5 million compounds. We show that it outperforms earlier machine learning models by a significant margin, and that incorporating known physics into data-derived models provides valuable guardrails. Using a transfer learning approach, we augment the model to reproduce results consistent with higher-level computational chemistry training data, but with a considerably reduced number of corresponding calculations. Prediction errors of machine learning models are typically smallest for commonly observed target property values, consistent with the distribution of the training data. However, since our goal is to identify candidates with unusually large RI values, we propose a strategy to boost the performance of our model in the remoter areas of the RI distribution: We bias the model with respect to the under-represented classes of molecules that have values in the high-RI regime. By adopting a metric popular in web search engines, we evaluate our effectiveness in ranking top candidates. We confirm that the models developed in this study can reliably predict the RIs of the top 1,000 compounds, and are thus able to capture their ranking. We believe that this is the first study to develop a data-derived model that ensures the reliability of RI predictions by model augmentation in the extrapolation region on such a large scale. These results underscore the tremendous potential of machine learning in facilitating molecular (hyper)screening approaches on a massive scale and in accelerating the discovery of new compounds and materials, such as organic molecules with high-RI for applications in opto-electronics.

Application of Bioactivity Profile Based Fingerprints for Building Machine Learning Models

10.26434/chemrxiv.6969584 ◽

2018 ◽

Cited By ~ 1

Author(s):

Noé Sturm ◽

Jiangming Sun ◽

Yves Vandriessche ◽

Andreas Mayr ◽

Günter Klambauer ◽

...

Keyword(s):

Machine Learning ◽

Deep Learning ◽

High Throughput ◽

Scaffold Hopping ◽

Learning Models ◽

Industrial Data ◽

Structural Descriptors ◽

Bioactivity Profile ◽

Machine Learning Models

This article describes an application of high-throughput fingerprints (HTSFP) built upon industrial data accumulated over the years. The fingerprint was used to build machine learning models (multi-task deep learning + SVM) for compound activity predictions towards a panel of 131 targets. Quality of the predictions and the scaffold hopping potential of the HTSFP were systematically compared to traditional structural descriptors ECFP.

Predictive modelling of turbofan engine components condition using machine and deep learning methods

Eksploatacja i Niezawodnosc - Maintenance and Reliability ◽

10.17531/ein.2021.2.16 ◽

2021 ◽

Vol 23 (2) ◽

pp. 359-370

Author(s):

Michał Matuszczak ◽

Mateusz Żbikowski ◽

Andrzej Teodorczyk

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Predictive Modelling ◽

Continuous Variable ◽

Environmental Data ◽

Bayesian Optimization ◽

Aviation Industry ◽

Learning Models ◽

Turbofan Engine ◽

Machine Learning Models

The article proposes an approach based on deep and machine learning models to predict component failure as an enhancement of the condition-based maintenance scheme of a turbofan engine, and reviews prognostics approaches currently used in the aviation industry. A component degradation scale representing its life consumption is proposed, and the condition data collected in this way are combined with engine sensor and environmental data. Using data manipulation techniques, a framework for model training is created, and the models' hyperparameters are obtained through Bayesian optimization. The models predict a continuous variable representing condition based on the input. The best-performing model is identified by determining its score on the holdout set. Deep learning models achieved an MSE score of 0.71 (ensemble meta-model of neural networks) and significantly outperformed machine learning models, whose best score was 1.75. The deep learning models showed their feasibility to predict the component condition to within less than 1 unit of error on the rank scale.
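The model-selection step described in this abstract (scoring each model by MSE on a holdout set and keeping the lowest) can be sketched as follows. This is an illustration only: the function names, model names, and numbers are hypothetical and do not reproduce the paper's data or results.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def best_model(holdout_targets, predictions_by_model):
    """Score every model on the holdout set and return (best_name, all_scores).

    Lower MSE is better, so the best model is the one with the minimum score.
    """
    scores = {
        name: mean_squared_error(holdout_targets, preds)
        for name, preds in predictions_by_model.items()
    }
    return min(scores, key=scores.get), scores


# Hypothetical holdout targets and predictions, for illustration only.
targets = [1.0, 2.0, 3.0, 4.0]
preds = {
    "model_a": [1.0, 2.0, 3.0, 6.0],  # one large miss
    "model_b": [1.5, 2.5, 2.5, 4.5],  # several small misses
}

name, scores = best_model(targets, preds)
print(name, scores)
```

Because MSE squares each error, a single large miss (as in the hypothetical `model_a`) is penalized more heavily than several small ones.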

Comparative Analysis of Machine Learning Models for the Prediction of Pedestrian Crash Severity: Focused on Balancing Pedestrian Crash Dataset

Journal of Korean Society for Geospatial Information System ◽

10.7319/kogsis.2021.29.2.003 ◽

2021 ◽

Vol 29 (2) ◽

pp. 3-15

Author(s):

Hojun Lee ◽

Sugie Lee

Keyword(s):

Machine Learning ◽

Comparative Analysis ◽

Crash Severity ◽

Learning Models ◽

Machine Learning Models
