Supervised Machine Learning Classifier for Email Spam Filtering

Author(s):  
Deepika Mallampati ◽  
K. Chandra Shekar ◽  
K. Ravikanth
2020 ◽  
Author(s):  
John T. Halloran ◽  
Gregor Urban ◽  
David Rocke ◽  
Pierre Baldi

Abstract: Semi-supervised machine learning post-processors critically improve peptide identification of shotgun proteomics data. Such post-processors accept the peptide-spectrum matches (PSMs) and feature vectors resulting from a database search, train a machine learning classifier, and recalibrate PSMs using the trained parameters, often yielding significantly more identified peptides across q-value thresholds. However, current state-of-the-art post-processors rely on shallow machine learning methods, such as support vector machines. In contrast, the powerful training capabilities of deep learning models have displayed superior performance to shallow models in an ever-growing number of other fields. In this work, we show that deep models significantly improve the recalibration of PSMs compared to the most accurate and widely used post-processors, such as Percolator and PeptideProphet. Furthermore, we show that deep learning is able to adaptively analyze complex datasets and features for more accurate universal post-processing, leading to both improved Prosit analysis and markedly better recalibration of recently developed database-search functions.
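The q-value thresholds mentioned in this abstract can be illustrated with a minimal sketch. It assumes the common target-decoy convention (the abstract does not say how q-values are estimated): the false discovery rate at a score cutoff is approximated by the ratio of decoy to target PSMs at or above that cutoff, and a PSM's q-value is the smallest such rate over all cutoffs that still accept it. The function name, the "higher score is better" convention, and the toy scores are assumptions for illustration, and ties are not handled specially.

```python
def q_values(scores, is_decoy):
    """Return one q-value per PSM, in input order.

    scores:   recalibrated PSM scores; higher is assumed to mean more confident.
    is_decoy: True for decoy PSMs, False for target PSMs.
    """
    # Rank PSMs from best to worst score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    # Estimated FDR when accepting every PSM down to each rank.
    fdrs = []
    targets = decoys = 0
    for i in order:
        if is_decoy[i]:
            decoys += 1
        else:
            targets += 1
        fdrs.append(min(1.0, decoys / max(targets, 1)))

    # q-value: minimum FDR over this rank and all more permissive ranks.
    qvals = [0.0] * len(scores)
    running_min = float("inf")
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[rank])
        qvals[order[rank]] = running_min
    return qvals


# Toy data, invented for illustration only.
toy_scores = [9.0, 8.0, 7.0, 6.0, 5.0]
toy_decoys = [False, False, True, False, True]
toy_q = q_values(toy_scores, toy_decoys)
```

A post-processor that separates targets from decoys more cleanly pushes decoys toward the bottom of the ranking, so more target PSMs fall under a given q-value threshold.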


Author(s):  
RajKishore Sahni

The upsurge in the volume of unwanted emails, called spam, has created an intense need for more dependable and robust antispam filters. Machine learning methods have recently been used to detect and filter spam emails successfully. We present a systematic review of some of the popular machine learning based email spam filtering approaches. Our review covers the important concepts, attempts, efficiency, and research trends in spam filtering. The preliminary discussion in the study background examines how machine learning techniques are applied in the email spam filters of leading internet service providers (ISPs) such as Gmail, Yahoo and Outlook. We discuss the general email spam filtering process and the efforts of different researchers to combat spam through the use of machine learning techniques. Our review compares the strengths and drawbacks of existing machine learning approaches and outlines the open research problems in spam filtering. We recommend deep learning and deep adversarial learning as the future techniques that can effectively handle the menace of spam emails.


Author(s):  
Wasan Shaker Awad ◽  
Wafa M. Rafiq

Email is the most popular choice of communication due to its low cost and easy accessibility, which makes email spam a major issue. A spam filter can mark emails incorrectly, so legitimate emails can get lost in the spam folder or spam emails can deluge users' inboxes. Therefore, various methods based on statistics and machine learning have been developed to classify emails accurately. In this chapter, the existing spam filtering methods were studied comprehensively, and a spam email classifier based on the genetic algorithm was proposed. The proposed algorithm achieved high accuracy by reducing the rate of false positives while maintaining an acceptable rate of false negatives. It was tested on 2000 emails from two popular spam datasets, Enron and LingSpam, and the accuracy was found to be nearly 90%. The results showed that the genetic algorithm is an effective method for spam classification and that further enhancements could provide a more robust spam filter.
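The general idea of a genetic-algorithm spam classifier can be sketched as follows. This is not the chapter's algorithm, whose encoding and operators are not given in the abstract; it is a minimal illustration that assumes each individual is a vector of per-word weights, that fitness is plain accuracy on a tiny invented corpus, and that selection keeps the better half with one elite copied unchanged. All names, parameters, and emails below are hypothetical.

```python
import random

# Toy corpus, invented for illustration: (set of tokens, is_spam).
EMAILS = [
    ({"free", "win", "prize"}, True),
    ({"free", "offer", "click"}, True),
    ({"meeting", "agenda", "monday"}, False),
    ({"project", "report", "free"}, False),
    ({"win", "click", "now"}, True),
    ({"lunch", "monday", "offer"}, False),
]
VOCAB = sorted({tok for tokens, _ in EMAILS for tok in tokens})


def predict(weights, tokens):
    """Flag an email as spam when the summed weights of its words are positive."""
    return sum(w for tok, w in zip(VOCAB, weights) if tok in tokens) > 0


def fitness(weights):
    """Fraction of toy emails classified correctly."""
    return sum(predict(weights, t) == y for t, y in EMAILS) / len(EMAILS)


def evolve(generations=30, pop_size=20, mutation_rate=0.2, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in VOCAB] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]       # truncation selection
        children = [parents[0][:]]           # elitism: keep the best unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(VOCAB))  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [
                w + rng.gauss(0, 0.3) if rng.random() < mutation_rate else w
                for w in child
            ]
            children.append(child)
        pop = children
    return max(pop, key=fitness)


best = evolve()
```

To favour a low false-positive rate, as the abstract emphasises, the fitness function could penalise legitimate emails flagged as spam more heavily than missed spam; that weighting is a design choice left out of this sketch.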


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Marcus Alvarez ◽  
Elior Rahmani ◽  
Brandon Jew ◽  
Kristina M. Garske ◽  
Zong Miao ◽  
...  

Heliyon ◽  
2019 ◽  
Vol 5 (6) ◽  
pp. e01802 ◽  
Author(s):  
Emmanuel Gbenga Dada ◽  
Joseph Stephen Bassi ◽  
Haruna Chiroma ◽  
Shafi'i Muhammad Abdulhamid ◽  
Adebayo Olusola Adetunmbi ◽  
...  

2021 ◽  
Vol 13 (2) ◽  
pp. 971
Author(s):  
Papiya Debnath ◽  
Pankaj Chittora ◽  
Tulika Chakrabarti ◽  
Prasun Chakrabarti ◽  
Zbigniew Leonowicz ◽  
...  

Earthquakes are one of the most overwhelming types of natural hazards. As a result, successfully handling the situation they create is crucial. Earthquakes can cost many lives and have a devastating impact on the economy. The ability to forecast earthquakes is one of the biggest issues in geoscience. Machine learning technology can play a vital role in the field of geoscience for forecasting earthquakes. We aim to develop a method for forecasting the magnitude range of earthquakes using machine learning classifier algorithms. Three different ranges have been categorized: fatal earthquake; moderate earthquake; and mild earthquake. In order to distinguish between these classifications, seven different machine learning classifier algorithms have been used for building the model. To train the model, six different datasets of India and regions near India have been used. The Bayes Net, Random Tree, Simple Logistic, Random Forest, Logistic Model Tree (LMT), ZeroR and Logistic Regression algorithms have been applied to each dataset. All of the models have been developed using the Weka tool and the results have been noted. It was observed that Simple Logistic and LMT classifiers performed well in each case.
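Of the seven classifiers listed, ZeroR is the simplest: it ignores every feature and always predicts the most frequent class in the training data, which makes it a useful baseline for judging the others. The sketch below is a plain Python illustration of that idea rather than the Weka implementation the authors used; the function names and the toy labels are invented.

```python
from collections import Counter


def zero_r_fit(train_labels):
    """Return the majority class; ZeroR predicts it for every input."""
    return Counter(train_labels).most_common(1)[0][0]


def accuracy(predicted_label, test_labels):
    """Accuracy of predicting one constant label for every test example."""
    hits = sum(label == predicted_label for label in test_labels)
    return hits / len(test_labels)


# Toy labels, invented for illustration only.
train = ["mild", "mild", "moderate", "mild", "fatal"]
test = ["mild", "moderate", "mild", "fatal"]

majority = zero_r_fit(train)
baseline_accuracy = accuracy(majority, test)
```

Any classifier that uses the features should beat this baseline accuracy on the same split; if it does not, the features carry little usable signal for the magnitude range.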


2020 ◽  
Author(s):  
Ahmed M. Moustafa ◽  
Paul J. Planet

Abstract. Background: Discrete classification of SARS-CoV-2 viral genotypes can identify emerging strains and detect geographic spread, viral diversity, and transmission events. Methods: We developed a tool (GNUVID) that integrates whole genome multilocus sequence typing and a supervised machine learning random forest-based classifier. We used GNUVID to assign sequence type (ST) profiles to each of 69,686 SARS-CoV-2 complete, high-quality genomes available from GISAID as of October 20th, 2020. STs were then clustered into clonal complexes (CCs), and then used to train a machine learning classifier. We used this tool to detect potential introduction and exportation events, and to estimate effective viral diversity across locations and over time in 16 US states. Results: GNUVID is a scalable tool for viral genotype classification (available at https://github.com/ahmedmagds/GNUVID) that can be used to quickly process tens of thousands of genomes. Our genotyping ST/CC analysis uncovered dynamic local changes in ST/CC prevalence and diversity with multiple replacement events in different states. We detected an average of 20.6 putative introductions and 7.5 exportations for each state. Effective viral diversity dropped in all states as shelter-in-place travel restrictions went into effect and increased as restrictions were lifted. Interestingly, our analysis showed a correlation between effective diversity and the date that state-wide mask mandates were imposed. Conclusions: Our classification tool uncovered multiple introduction and exportation events, as well as waves of expansion and replacement of SARS-CoV-2 genotypes in different states. Combined with future genomic sampling, the GNUVID system could be used to track circulating viral diversity and identify emerging clones and hotspots.
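The sequence typing step in the Methods can be pictured with a small sketch of generic multilocus sequence typing: each distinct sequence at a locus receives an allele number, and each distinct combination of allele numbers receives a sequence type (ST). This is an illustration of the general scheme only, not GNUVID's code; the function name, the locus names, and the short sequences are hypothetical, and every genome is assumed to carry the same set of loci.

```python
def assign_sequence_types(genomes):
    """Assign an ST number to each genome.

    genomes: list of dicts mapping locus name -> allele sequence.
    Returns a list of ST numbers in input order.
    """
    allele_ids = {}    # (locus, sequence) -> allele number for that locus
    next_allele = {}   # locus -> last allele number handed out
    st_ids = {}        # allele profile (tuple) -> ST number
    result = []
    for genome in genomes:
        profile = []
        for locus in sorted(genome):  # fixed locus order across genomes
            key = (locus, genome[locus])
            if key not in allele_ids:
                next_allele[locus] = next_allele.get(locus, 0) + 1
                allele_ids[key] = next_allele[locus]
            profile.append(allele_ids[key])
        profile = tuple(profile)
        if profile not in st_ids:
            st_ids[profile] = len(st_ids) + 1  # new profile, new ST
        result.append(st_ids[profile])
    return result


# Toy genomes with two hypothetical loci, invented for illustration only.
toy_genomes = [
    {"orf1": "ATG", "S": "CCA"},
    {"orf1": "ATG", "S": "CCT"},
    {"orf1": "ATG", "S": "CCA"},
]
toy_sts = assign_sequence_types(toy_genomes)
```

The first and third toy genomes share every allele and so receive the same ST, while the second differs at one locus and receives a new one. Profiles of this kind are what a downstream classifier, such as the random forest described in the abstract, could be trained on.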


Author(s):  
Wataru Takabatake ◽  
Kohei Yamamoto ◽  
Kentaroh Toyoda ◽  
Tomoaki Ohtsuki ◽  
Yohei Shibata ◽  
...  
