Boosting Accuracy of Classical Machine Learning Antispam Classifiers in Real Scenarios by Applying Rough Set Theory

Nowadays, spam deliveries represent a major problem to benefit from the wide range of Internet-based communication forms. Despite the existence of different well-known intelligent techniques for fighting spam, only some specific implementations of Naïve Bayes algorithm are finally used in real environments for performance reasons. As long as some of these algorithms suffer from a large number of false positive errors, in this work we propose a rough set postprocessing approach able to significantly improve their accuracy. In order to demonstrate the advantages of the proposed method, we carried out a straightforward study based on a publicly available standard corpus (SpamAssassin), which compares the performance of previously successful well-known antispam classifiers (i.e., Support Vector Machines, AdaBoost, Flexible Bayes, and Naïve Bayes) with and without the application of our developed technique. Results clearly evidence the suitability of our rough set postprocessing approach for increasing the accuracy of previous successful antispam classifiers when working in real scenarios.

Download Full-text

Combination with Machine Learning Algorithms for the Classification in E-Bussiness

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.230-232.625 ◽

2011 ◽

Vol 230-232 ◽

pp. 625-628

Author(s):

Lei Shi ◽

Xin Ming Ma ◽

Xiao Hong Hu

Keyword(s):

Machine Learning ◽

Support Vector Machines ◽

Set Theory ◽

Rough Set ◽

Rough Set Theory ◽

Machine Learning Algorithms ◽

Classification Model ◽

Support Vector ◽

Mathematical Tool ◽

Vector Machines

E-bussiness has grown rapidly in the last decade and massive amount of data on customer purchases, browsing pattern and preferences has been generated. Classification of electronic data plays a pivotal role to mine the valuable information and thus has become one of the most important applications of E-bussiness. Support Vector Machines are popular and powerful machine learning techniques, and they offer state-of-the-art performance. Rough set theory is a formal mathematical tool to deal with incomplete or imprecise information and one of its important applications is feature selection. In this paper, rough set theory and support vector machines are combined to construct a classification model to classify the data of E-bussiness effectively.

Download Full-text

A Semantic Scattering model for the automatic interpretation of English genitives

Natural Language Engineering ◽

10.1017/s1351324908004798 ◽

2009 ◽

Vol 15 (2) ◽

pp. 215-239 ◽

Cited By ~ 1

Author(s):

ADRIANA BADULESCU ◽

DAN MOLDOVAN

Keyword(s):

Support Vector Machines ◽

Decision Trees ◽

Naive Bayes ◽

Word Sense Disambiguation ◽

Naïve Bayes ◽

Semantic Relations ◽

Support Vector ◽

Word Sense ◽

Vector Machines ◽

Bayes Algorithm

AbstractAn important problem in knowledge discovery from text is the automatic extraction of semantic relations. This paper addresses the automatic classification of thesemantic relationsexpressed by English genitives. A learning model is introduced based on the statistical analysis of the distribution of genitives' semantic relations in a corpus. The semantic and contextual features of the genitive's noun phrase constituents play a key role in the identification of the semantic relation. The algorithm was trained and tested on a corpus of approximately 20,000 sentences and achieved an f-measure of 79.80 per cent for of-genitives, far better than the 40.60 per cent obtained using a Decision Trees algorithm, the 50.55 per cent obtained using a Naive Bayes algorithm, or the 72.13 per cent obtained using a Support Vector Machines algorithm on the same corpus using the same features. The results were similar for s-genitives: 78.45 per cent using Semantic Scattering, 47.00 per cent using Decision Trees, 43.70 per cent using Naive Bayes, and 70.32 per cent using a Support Vector Machines algorithm. The results demonstrate the importance of word sense disambiguation and semantic generalization/specialization for this task. They also demonstrate that different patterns (in our case the two types of genitive constructions) encode different semantic information and should be treated differently in the sense that different models should be built for different patterns.

Download Full-text

Effectiveness of Classification Methods on the Diabetes System

Asian Journal of Research in Computer Science ◽

10.9734/ajrcos/2021/v12i330287 ◽

2021 ◽

pp. 33-43

Author(s):

Ahmed T. Shawky ◽

Ismail M. Hagag

Keyword(s):

Machine Learning ◽

Naive Bayes ◽

Early Stage ◽

Research Paper ◽

Naïve Bayes ◽

Machine Learning Algorithms ◽

Support Vector ◽

K Nearest Neighbor ◽

Machine Learning Classification ◽

Bayes Algorithm

In today’s world using data mining and classification is considered to be one of the most important techniques, as today’s world is full of data that is generated by various sources. However, extracting useful knowledge out of this data is the real challenge, and this paper conquers this challenge by using machine learning algorithms to use data for classifiers to draw meaningful results. The aim of this research paper is to design a model to detect diabetes in patients with high accuracy. Therefore, this research paper using five different algorithms for different machine learning classification includes, Decision Tree, Support Vector Machine (SVM), Random Forest, Naive Bayes, and K- Nearest Neighbor (K-NN), the purpose of this approach is to predict diabetes at an early stage. Finally, we have compared the performance of these algorithms, concluding that K-NN algorithm is a better accuracy (81.16%), followed by the Naive Bayes algorithm (76.06%).

Download Full-text

Prediction of Heart Disease using Machine Learning

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b1081.0982s1019 ◽

2019 ◽

Vol 8 (2S10) ◽

pp. 474-477

Keyword(s):

Machine Learning ◽

Heart Disease ◽

Support Vector Machines ◽

Naive Bayes ◽

Naïve Bayes ◽

Support Vector ◽

Data Set ◽

Vector Machines ◽

Naive Bayes Classification ◽

Naïve Bayes Classification

Machine learning is one of the fast growing aspect in current world. Machine learning (ML) and Artificial Neural Network (ANN) are helpful in detection and diagnosis of various heart diseases. Naïve Bayes Classification is a vital approach of classification in machine learning. The heart disease consists of set of range disorders affecting the heart. It includes blood vessel problems such as irregular heart beat issues, weak heart muscles, congenital heart defects, cardio vascular disease and coronary artery disease. Coronary heart disorder is a familiar type of heart disease. It reduces the blood flow to the heart leading to a heart attack. In this paper the UCI machine learning repository data set consisting of patients suffering from heart disease is analyzed using Naïve Bayes classification and support vector machines. The classification accuracy of the patients suffering from heart disease is predicted using Naïve Bayes classification and support vector machines. Implementation is done using R language.

Download Full-text

A Rough Set and Cellular Genetic Fusion Algorithm for Acute Critical Disease Prediction

International Journal of Computers Communications & Control ◽

10.15837/ijccc.2020.6.3894 ◽

2020 ◽

Vol 15 (6) ◽

Author(s):

Hongxin Wang ◽

Lijing Jia ◽

Heng Zhuang ◽

Xueyan Li ◽

Yuzhuo Zhao ◽

...

Keyword(s):

Genetic Algorithm ◽

Support Vector Machine ◽

Logistic Regression ◽

Rough Set ◽

Rough Set Theory ◽

Naive Bayes ◽

Naïve Bayes ◽

Support Vector ◽

Disease Prediction ◽

Fusion Algorithm

This study is to solve the problems of an overly-broad scale of medical indicators, lack of retrospective research samples, insufficient depth of data mining, and low disease prediction accuracy. In this paper, we propose an intelligent screening algorithm that combines a genetic algorithm, cellular automata, and rough set theory. This algorithm can achieve high accuracy in predicting patient outcomes with a small number of indicators. And we compare it with the traditional genetic algorithm. We built the prediction model with 64 indicators based on the logistic regression (AUC 0.8628), support vector machine (AUC 0.5319), Naïve Bayes (AUC 0.7102), and AdaBoost algorithms (AUC 0.9095). Using the cellular genetic algorithm for attribute screening not only effectively reduces the number of indicators but also achieve almost the same accuracy of prediction with 8 indicators based on the logistic regression (AUC 0.8782), support vector machine (AUC 0.8525), Naïve Bayes (AUC 0.8408), and AdaBoost algorithms (AUC 0.8770). Compared with the traditional scoring system, the predictive model established in this paper can more accurately predict rebleeding accidents based on physiological test indicators and continuous patient indicators.

Download Full-text

Multidamage Detection of Bridges Using Rough Set Theory and Naive-Bayes Classifier

Mathematical Problems in Engineering ◽

10.1155/2018/6752456 ◽

2018 ◽

Vol 2018 ◽

pp. 1-13 ◽

Cited By ~ 1

Author(s):

Shuang Sun ◽

Li Liang ◽

Ming Li ◽

Xin Li

Keyword(s):

Damage Detection ◽

Set Theory ◽

Rough Set ◽

Rough Set Theory ◽

Naive Bayes ◽

Detection Method ◽

Naïve Bayes ◽

Bayes Classifier ◽

Naïve Bayes Classifier ◽

Two Stage

This paper is intended to introduce a two-stage detection method to solve the multidamage problem in bridges. Vibration analysis is conducted to acquire the dynamic fingerprints which are regarded as information sources. Bayesian fusion is used to integrate these sources and preliminarily locate the damage. Then, the RSNB method which combines rough set theory and Naive-Bayes classifier is proposed to simplify the sample dimensions and fuse the remaining attributes for damage extent detection. A numerical simulation of a real structure, the Sishui Bridge in Shenyang, China, is conducted to validate the effectiveness of the proposed detection method. Data fusion based method is compared with single-valued index method at the damage localization stage. The proposed RSNB method is compared with the Back Propagation Neural Network (BPNN) method at the damage qualification stage. The results show that the proposed two-stage damage detection method has better performances in regard to transparency, accuracy, efficiency, noise robustness, and stability. Furthermore, an ambient excitation modal test was carried out on the bridge to obtain the vibration responses and assess the damage condition with the proposed method. This novel approach is applicable for early damage detection and provides a basis for bridge management and maintenance.

Download Full-text