Unsolicited visual data is undesirable in any form. Hiding malicious content in images and attaching them to emails has become a common nuisance. In recent years, attackers have developed various new techniques to evade traditional spam classification systems. Text-based spam classification has been in focus for a long time, and researchers have successfully created an effective system for identifying spam text in emails using Optical Character Recognition technology. In the last decade, extensive work has been performed to tackle image spam, but with unsatisfactory results. Various algorithms and data augmentation techniques are used today to develop an optimal model for image spam recognition. Many of these proposed systems come close to the ideal system but do not provide 100 percent accuracy. This paper highlights the role of three popular techniques in image spam filtering. We discuss the importance and application of Optical Character Recognition, Support Vector Machines, and Artificial Neural Networks in unsolicited visual data filtering, and we describe the algorithms behind these techniques. We provide a comparison of their accuracy, which helps us draw useful insights for developing a robust unsolicited visual data classification system. The paper aims to clarify the feasibility of using these techniques to develop an unsolicited visual data filtering system. It finds that the most favourable results are obtained using Artificial Neural Networks.


2018, Vol 22 (5-6), pp. 1029-1037
Author(s): Fan Aiwan, Yang Zhaofeng

Author(s): Вера Аркадьевна Частикова, Константин Валерьевич Козачёк

This paper analyzes the main problems of email spam filtering, modern methods for filtering unwanted messages, and techniques for bypassing protection systems. It introduces the concept of "legitimate spam", a new problem faced by email users. Text representation methods (bag-of-words and embedding space) are considered, along with classification methods: artificial neural networks, support vector machines, and the naive Bayes classifier. The work identifies effective text-analysis-based methods for detecting various types of spam: typical spam (known to the system), spam composed using techniques for evading spam detection systems, and legitimate spam.
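
As a rough illustration of two of the methods named in this abstract, the sketch below pairs a bag-of-words representation with a naive Bayes classifier. It is not the authors' implementation: the whitespace tokenizer, the add-one (Laplace) smoothing, the function names `tokenize`, `train`, and `predict`, and the four-message toy corpus are all assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    # Bag-of-words: lowercase and split on whitespace; word order is discarded.
    return text.lower().split()


def train(samples):
    """Count words per label from a list of (text, label) pairs."""
    word_counts = {}        # label -> Counter of word frequencies
    doc_counts = Counter()  # label -> number of training messages
    vocab = set()
    for text, label in samples:
        tokens = tokenize(text)
        word_counts.setdefault(label, Counter()).update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return the label with the highest log-posterior under naive Bayes."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        # Log prior of the class.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing keeps unseen word/label pairs from zeroing the score.
            count = word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy corpus, invented for this example only.
TOY_DATA = [
    ("win a free prize now", "spam"),
    ("cheap pills free offer", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the project report", "ham"),
]

model = train(TOY_DATA)
print(predict(model, "free prize offer"))
print(predict(model, "project meeting monday"))
```

A classifier of this kind only sees word frequencies, which is one reason messages written to evade detection, or "legitimate spam" that uses ordinary vocabulary, can be harder to separate with bag-of-words features alone.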

