Combination of ROSVM and LR for Spam Filter

Author(s):  
Yadong Wang ◽  
Haoliang Qi ◽  
Hong Deng ◽  
Yong Han
Author(s):  
Denis Aleksandrovich Kiryanov

The subject of this research is the development of an expert system architecture for a distributed content aggregation system, the main purpose of which is the categorization of aggregated data. The author examines the advantages and disadvantages of expert systems, the toolsets for developing them, their classification, and their application to the categorization of data. Special attention is given to the architecture of the proposed expert system, which consists of a spam filter, a component that determines the main category for each type of processed content, and two components that determine subcategories: one is based on domain rules, and the other uses machine learning methods and complements the first. The author concludes that an expert system can be applied effectively to data categorization problems in content aggregation systems, and establishes that hybrid solutions, which combine a knowledge base and rules with neural networks, reduce the cost of the expert system. The novelty of this research lies in the proposed architecture, which is easily extensible and adapts to workloads by scaling existing modules or adding new ones. The proposed spam detection module adapts a behavioral algorithm for detecting spam in emails. The proposed module for determining the key categories of content uses two types of algorithms: fuzzy fingerprints and Twitter topic fuzzy fingerprints, the latter originally applied to the categorization of messages on the social network Twitter. The module that determines the subcategory from keywords works in interaction with a thesaurus database. The machine learning classifier uses the support vector machine algorithm for the final determination of subcategories.
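The staged architecture described in this abstract can be sketched as follows. This is a minimal illustration, not the author's implementation: the `Categorizer` class, its field names, and the toy lambdas are all hypothetical, and treating the machine learning component as a fallback when no domain rule matches is only one assumed reading of "complements the first".

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Categorizer:
    """Hypothetical pipeline: spam filter -> main category -> subcategory."""
    is_spam: Callable[[str], bool]
    main_category: Callable[[str], str]
    # Rule-based component: returns None when no domain rule matches.
    rule_subcategory: Callable[[str, str], Optional[str]]
    # Machine learning component: assumed here to act as a fallback.
    ml_subcategory: Callable[[str, str], str]

    def categorize(self, text: str) -> Optional[Tuple[str, str]]:
        if self.is_spam(text):
            return None  # spam is dropped before categorization
        category = self.main_category(text)
        sub = self.rule_subcategory(text, category)
        if sub is None:
            sub = self.ml_subcategory(text, category)
        return category, sub


# Toy stand-ins for the real components (all assumptions).
pipeline = Categorizer(
    is_spam=lambda t: "buy now" in t.lower(),
    main_category=lambda t: "sports" if "match" in t.lower() else "other",
    rule_subcategory=lambda t, c: (
        "football" if c == "sports" and "goal" in t.lower() else None
    ),
    ml_subcategory=lambda t, c: "general",
)

print(pipeline.categorize("Late goal decides the match"))
print(pipeline.categorize("Match postponed due to rain"))
print(pipeline.categorize("BUY NOW cheap pills"))
```

Passing each stage as a callable keeps the modules independently replaceable, which is in the spirit of the extensibility the abstract claims.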


Author(s):  
Ajay Kumar Gupta

This chapter presents an overview of spam email as a serious problem on the internet and builds a spam filter that reduces the weaknesses of earlier filters and provides better identification accuracy with less complexity. Because the J48 decision tree is a widely used classification technique, owing to its simple structure, higher classification accuracy, and lower time complexity, it is used here as the spam mail classifier. With lower complexity, however, it becomes difficult to achieve higher accuracy on a large number of records. To overcome this problem, particle swarm optimization is used to optimize the spam base dataset, thus optimizing the decision tree model as well as reducing the time complexity. Once the records have been standardized, the decision tree is used again to check the accuracy of the classification. The chapter also presents a study of various spam-related issues, the filters in use, related work, and the potential scope of spam filtering.


Author(s):  
Olumide Babatope LONGE ◽  
Stella Chinye CHIEMEKE ◽  
Olufade F. Williams ONIFADE ◽  
Folake Adunni LONGE

The inefficiency of current spam filters against fraudulent (419) mails is not unrelated to the use by spammers of good-word attacks, topic drifts, and parasitic spamming, the wrong categorization and recategorization of electronic mails by e-mail clients, and of course the fuzzy factors of greed and gullibility on the part of recipients who respond to fraudulent spam mail offers. In this paper, we establish that mail token manipulation remains, above any other tactic, the most potent tool used by Nigerian scammers to fool statistical spam filters. While we hope that uncovering this evidence of manipulation will prove useful in future antispam research, our findings also alert spam filter developers to the need to build into their antispam architecture robust modules that can deal with the identified camouflages.


2014 ◽  
Vol 513-517 ◽  
pp. 2111-2114 ◽  
Author(s):  
Zong Jie Wang ◽  
Yi Liu ◽  
Zhong Jian Wang

Co-occurrence words emphasize the internal relations between words, so using them can compensate for the shortcomings of the independence assumption made by the Bayesian algorithm. To build the Token Dictionary, the Information Gain algorithm is used to choose tokens, and a Synonyms Dictionary is used to acquire more tokens. Over a large amount of training, the matching scores of the tokens are counted; the valuable tokens are selected according to their matching rate, and the Token Dictionary is established. The proposed method is applied in an e-mail classification experiment, and the results show that the accuracy of the spam filter is clearly improved.
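The token selection step can be illustrated with a small sketch. The abstract does not say how information gain is computed, so this assumes the common formulation based on the presence or absence of a token in a message; the function names and the toy corpus are hypothetical, and the synonym expansion and matching-score steps are omitted.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(docs, labels, token):
    """Entropy reduction from splitting the corpus on presence of `token`."""
    with_token = [y for d, y in zip(docs, labels) if token in d]
    without_token = [y for d, y in zip(docs, labels) if token not in d]
    total = len(labels)
    conditional = (
        (len(with_token) / total) * entropy(with_token)
        + (len(without_token) / total) * entropy(without_token)
    )
    return entropy(labels) - conditional


def select_tokens(docs, labels, k):
    """Return the k tokens with the highest gain (ties broken alphabetically)."""
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:k]


# Toy corpus (an assumption): each message is a set of tokens.
docs = [
    {"free", "prize", "click"},
    {"free", "offer", "click"},
    {"meeting", "agenda", "monday"},
    {"project", "meeting", "notes"},
]
labels = ["spam", "spam", "ham", "ham"]

print(select_tokens(docs, labels, 3))  # ['click', 'free', 'meeting']
```

Tokens that appear in only one class, such as "free" here, separate the corpus perfectly and score the maximum gain of 1 bit, while a token seen in a single message, such as "prize", scores much lower.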


2012 ◽  
Vol 9 (4) ◽  
pp. 327-335 ◽  
Author(s):  
Tiago A. Almeida ◽  
Akebo Yamakami