Text Segmentation and Recognition for Enhanced Image Spam Detection

2021. Author(s): Mallikka Rajalingam

2017, Vol 77 (11), pp. 13249-13278. Author(s): Amiza Amir, Bala Srinivasan, Asad I. Khan

Spam features represent the unique characteristics associated with spam, and they are used to distinguish spam from genuine messages. A feature extraction module processes each message m and represents it as an n-dimensional feature vector x = (x1, x2, …, xn) containing n features extracted from the message. In text-based spam filters, for example, a feature can be a word, and a feature vector may be composed of the various words extracted from a spam message. Each spam message is associated with exactly one feature vector. Based on the characteristics discussed in the previous chapter, we attempt to extract features that capture those unique characteristics of image spam, in order to build robust spam detection algorithms. These features are broadly classified into three groups: high-level metadata features; low-level image features such as color, grayscale, and texture features; and embedded-text features.
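The mapping from a message m to a vector x = (x1, …, xn) can be sketched for an image message as follows. This is a minimal illustration only and makes several assumptions. The image arrives as a 2-D list of (r, g, b) tuples, its file size is passed in separately, and each function name is hypothetical. The particular features (dimensions, channel means, intensity variance, a crude gradient-based texture measure) are examples of each family, not this chapter's exact feature set. Embedded-text features are omitted because they would require an OCR engine.

```python
from statistics import mean, pvariance


def to_gray(pixel):
    # ITU-R BT.601 luma weights, one common RGB-to-grayscale conversion.
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def extract_features(pixels, file_size_bytes):
    """Map an image (2-D list of (r, g, b) tuples) to a flat feature list.

    Hypothetical feature set chosen for illustration; one value per feature.
    """
    height = len(pixels)
    width = len(pixels[0])
    flat = [p for row in pixels for p in row]

    # High-level metadata features: dimensions, aspect ratio, file size.
    metadata = [float(width), float(height), width / height,
                float(file_size_bytes)]

    # Low-level color features: per-channel means and number of distinct colors.
    color = [float(mean(p[c] for p in flat)) for c in range(3)]
    color.append(float(len(set(flat))))

    # Grayscale features: mean and population variance of intensity.
    gray_rows = [[to_gray(p) for p in row] for row in pixels]
    gray_flat = [v for row in gray_rows for v in row]
    grayscale = [mean(gray_flat), pvariance(gray_flat)]

    # Crude texture feature: mean absolute difference between horizontal neighbours.
    diffs = [abs(row[i + 1] - row[i])
             for row in gray_rows for i in range(width - 1)]
    texture = [mean(diffs) if diffs else 0.0]

    return metadata + color + grayscale + texture


if __name__ == "__main__":
    # Toy 2x2 image: a white column next to a black column.
    img = [[(255, 255, 255), (0, 0, 0)],
           [(255, 255, 255), (0, 0, 0)]]
    x = extract_features(img, file_size_bytes=1024)
    print(len(x), x[:4])
```

Here n = 11: four metadata values, four color values, two grayscale values and one texture value. A real system would replace these with whichever features prove discriminative on spam data.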


To understand the never-ending fight between developers of anti-spam techniques and spammers, it helps to know some of the history of spam mail. On May 3, 1978, Gary Thuerk, a marketing manager at Digital Equipment Corporation, sent his first mass email to more than 400 customers over the ARPANET to promote and sell Digital's new T-Series of VAX systems (Streitfeld, 2003). As he put it, “It's too much work to send everyone an e-mail. So we'll send one e-mail to everyone.” He said with pride, “I was the pioneer. I saw a new way of doing things.” Like any technology, e-mail can be used with good or bad intentions. At the time, Thuerk could never have imagined that this way of sending mail would one day become an area of research. He ended up being crowned the father of spam rather than the father of e-marketing. Today, the internet receives 2.5 billion pieces of spam a day from Thuerk's spiritual followers.


2019, Vol 8 (3), pp. 5892-5896

In their ongoing battle against researchers in image spam detection, spammers have recently developed the image-based spam trick to make analysis of the text content of e-mails ineffective. The trick embeds the spam text or message into an attached image, which is often randomly modified to evade signature-based detection. Identifying image-based spam is proving to be an interesting instance of the problem of recognizing text embedded in images. The embedded text is deliberately subjected to noise, such as background patterns, color and font variations, and irregular font sizes, to reduce the chance that classification techniques identify the e-mail as spam. In this research paper we present an exhaustive review and categorization of the machine learning and classification systems proposed so far against image-based spam e-mail. We also carry out an empirical investigation and comparison of a few of them on real, widely accessible data sets.
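The reason random modification defeats signature-based recognition can be illustrated with a small sketch. Assume, hypothetically, that an image is a 2-D list of grayscale values from 0 to 255. Also assume a "signature" is the SHA-256 digest of the raw pixel bytes; real filters use various signature schemes, and the helper names here are made up for illustration. Changing just a few pixels by one intensity level leaves the embedded text readable but produces an entirely different signature.

```python
import hashlib
import random


def signature(pixels):
    # Exact signature: SHA-256 digest of the raw grayscale pixel bytes.
    data = bytes(v for row in pixels for v in row)
    return hashlib.sha256(data).hexdigest()


def randomize(pixels, n_pixels, rng):
    # Perturb n_pixels distinct pixels by one intensity level each,
    # a change far too small to affect how the embedded text reads.
    height, width = len(pixels), len(pixels[0])
    out = [row[:] for row in pixels]
    for idx in rng.sample(range(height * width), n_pixels):
        y, x = divmod(idx, width)
        out[y][x] = out[y][x] + 1 if out[y][x] < 255 else 254
    return out


def changed_fraction(a, b):
    # Fraction of pixel positions that differ between two same-sized images.
    total = sum(len(row) for row in a)
    diff = sum(u != v for ra, rb in zip(a, b) for u, v in zip(ra, rb))
    return diff / total


if __name__ == "__main__":
    rng = random.Random(0)
    original = [[200] * 16 for _ in range(16)]
    variant = randomize(original, 3, rng)
    print(signature(original) == signature(variant))  # False
    print(changed_fraction(original, variant))       # 0.01171875 (3 of 256 pixels)
```

Because each spam copy can carry its own random perturbation, a blacklist of exact signatures never matches the next copy. This is why the approaches reviewed here turn to learned classifiers instead.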


2021, pp. 1036-1045. Author(s): Ahmad M. Salih, Ban N. Nadim

E-mail is an efficient and reliable data exchange service. Spam consists of unwanted e-mail messages sent indiscriminately in bulk, usually for commercial purposes. Obfuscated image spamming is one of the newer tricks used to bypass text-based and Optical Character Recognition (OCR)-based spam filters. Image spam detection based on visual image features has an efficiency advantage: it reduces computational cost while improving detection performance. In this paper, an image spam detection scheme is presented. Suitable image processing techniques were used to capture the image features that can differentiate spam images from non-spam ones. Weighted k-nearest neighbor, a simple yet powerful machine learning algorithm, was used as the classifier. The scheme was evaluated on two datasets. The first is a real, benchmark dataset. The second is a real-like, modern, and more challenging dataset collected from social media and several publicly available image spam datasets. The results confirm the effectiveness of the proposed scheme, with an accuracy of 99.36% on the benchmark dataset and 91% on the proposed dataset.
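A weighted k-nearest neighbor classifier of the kind named above can be sketched in a few lines. The sketch rests on assumptions: image features are already numeric vectors, distance is Euclidean, and each of the k nearest neighbours votes with weight 1 / distance. Inverse-distance weighting is one common variant, and the paper's exact distance and weighting scheme may differ. The function names are hypothetical. Accuracy is computed as the fraction of correctly labelled test samples.

```python
import math
from collections import defaultdict


def weighted_knn_predict(train_X, train_y, query, k=3, eps=1e-9):
    # Euclidean distance from the query to every training vector, nearest first.
    neighbours = sorted(
        ((math.dist(x, query), label) for x, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    # Each of the k nearest neighbours votes with weight 1 / distance,
    # so closer neighbours count for more (eps avoids division by zero).
    votes = defaultdict(float)
    for d, label in neighbours[:k]:
        votes[label] += 1.0 / (d + eps)
    return max(votes, key=votes.get)


def accuracy(train_X, train_y, test_X, test_y, k=3):
    # Fraction of test samples whose predicted label equals the true label.
    correct = sum(
        weighted_knn_predict(train_X, train_y, x, k) == y
        for x, y in zip(test_X, test_y)
    )
    return correct / len(test_y)


if __name__ == "__main__":
    # Toy 2-D feature vectors: one spam sample very close to the query,
    # two ham samples much farther away.
    train_X = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0)]
    train_y = ["spam", "ham", "ham"]
    print(weighted_knn_predict(train_X, train_y, (0.1, 0.0), k=3))  # spam
```

In the toy example a plain majority vote over the three neighbours would answer "ham". The weighted vote answers "spam" because the single spam neighbour is about thirty times closer than either ham neighbour. This sensitivity to proximity is the usual motivation for weighting.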

