Rotation Forest model modification within the email spam classification

Author(s):  
А.О. Шанін

The increased use of email in daily business transactions and general communication, owing to its cost-effectiveness, has made email vulnerable to attacks, including spam. Spam emails are unsolicited messages that are very similar to one another and are sent to many recipients at random. This study analyzes the Rotation Forest model and modifies it for the spam classification problem, with the aim of creating a better classifier. The modified classifier, intended to improve stability, was evaluated on the Enron-Spam, Ling-Spam, and SpamAssassin datasets in terms of accuracy, F-measure, precision, and recall.
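The four evaluation measures named above are the standard ones for binary spam filtering. The abstract does not say how they were computed, so the following is a minimal sketch assuming spam is treated as the positive class; the function and class names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Confusion-matrix counts, treating spam as the positive class."""
    tp: int  # spam correctly flagged as spam
    fp: int  # legitimate mail (ham) wrongly flagged as spam
    fn: int  # spam that was let through as ham
    tn: int  # ham correctly let through


def confusion_from_labels(y_true, y_pred, positive="spam"):
    """Tally TP/FP/FN/TN from parallel lists of true and predicted labels."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return Confusion(tp, fp, fn, tn)


def scores(c):
    """Accuracy, precision, recall and F-measure (F1); 0.0 when undefined."""
    total = c.tp + c.fp + c.fn + c.tn
    accuracy = (c.tp + c.tn) / total if total else 0.0
    precision = c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0
    recall = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}


# Usage: five emails, three of them classified correctly.
y_true = ["spam", "spam", "ham", "ham", "spam"]
y_pred = ["spam", "ham", "ham", "spam", "spam"]
print(scores(confusion_from_labels(y_true, y_pred)))
```

Precision penalizes ham that is wrongly filtered, while recall penalizes spam that slips through; the F-measure balances the two.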

2012 ◽  
Vol 5s1 ◽  
pp. BII.S8958 ◽  
Author(s):  
Kirk Roberts ◽  
Sanda M. Harabagiu

In this paper, we report on the approaches that we developed for the 2011 i2b2 Shared Task on Sentiment Analysis of Suicide Notes. We cast the problem of detecting emotions in suicide notes as a supervised multi-label classification problem. Our classifiers use a variety of features based on (a) lexical indicators, (b) topic scores, and (c) similarity measures. Our best submission has a precision of 0.551, a recall of 0.485, and an F-measure of 0.516.
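In a multi-label task, precision and recall can be computed by pooling label decisions over all notes (micro-averaging). The abstract does not state which averaging was used, so the sketch below assumes micro-averaging and uses illustrative label names. It also checks that the reported F-measure is consistent with the reported precision and recall, since the balanced F-measure is the harmonic mean 2PR/(P + R).

```python
def micro_prf(gold, predicted):
    """Micro-averaged precision, recall and F-measure for multi-label output.

    gold and predicted are parallel lists; each element is the set of labels
    assigned to one document. Counts are pooled over all documents and labels.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        tp += len(g & p)  # labels predicted and correct
        fp += len(p - g)  # labels predicted but absent from the gold standard
        fn += len(g - p)  # gold labels that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f_measure(precision, recall)


def f_measure(precision, recall):
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Consistency check on the reported scores: P = 0.551, R = 0.485.
print(round(f_measure(0.551, 0.485), 3))  # 0.516
```

The check confirms that 0.516 follows from the reported precision and recall.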


Author(s):  
Hadj Ahmed Bouarara

The internet era promotes electronic commerce and facilitates access to many services. In today's digital society, the explosion in communication has revolutionized the field of electronic communication. Unfortunately, this technology has undeniably become a primary source of malicious activity, especially the plague of unwanted email (SPAM), which has grown tremendously in the last few years. This chapter presents new bio-inspired techniques (artificial social cockroaches [ASC], artificial haemostasis system [AHS], and artificial heart lungs system [AHLS]) and their application to SPAM detection. For the experiments, the authors used the benchmark SMS Spam Corpus v.0.1 and the validation measures recall, precision, F-measure, entropy, accuracy, and error. They optimize the sensitive parameters of each algorithm (text representation technique, distance measure, weightings, and threshold). The results compare favorably with those of artificial social bees and machine-learning algorithms (C4.5 decision tree and K-means).


2018 ◽  
Vol 272 ◽  
pp. 638-646 ◽  
Author(s):  
Hui-Juan Zhu ◽  
Zhu-Hong You ◽  
Ze-Xuan Zhu ◽  
Wei-Lei Shi ◽  
Xing Chen ◽  
...  

10.29007/f4j4 ◽  
2018 ◽  
Author(s):  
Behnam Sabeti ◽  
Pedram Hosseini ◽  
Gholamreza Ghassem-Sani ◽  
Seyed Abolghasem Mirroshandel

Sentiment analysis refers to the use of natural language processing to identify and extract subjective information from textual resources. One approach to sentiment extraction is to use a sentiment lexicon, a set of words associated with the sentiment orientation they express. In this paper, we describe the process of generating a general-purpose sentiment lexicon for Persian. A new graph-based method is introduced for seed selection and expansion based on an ontology. Sentiment lexicon generation is then mapped to a document classification problem. We used the K-nearest neighbors and nearest centroid methods for classification. These classifiers were evaluated on a set of hand-labeled synsets, and the final sentiment lexicon was generated by the best classifier. The results show acceptable accuracy and F-measure for the generated sentiment lexicon.
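Both classifiers named above assign a label according to distance from labelled examples: K-nearest neighbors takes a majority vote among the k closest training items, while nearest centroid compares against one mean vector per class. The following is a minimal sketch, assuming each synset has already been mapped to a numeric feature vector (the abstract does not say how) and using Euclidean distance as an illustrative choice; the toy vectors are made up.

```python
import math
from collections import Counter


def knn_predict(train, x, k=3):
    """train: list of (vector, label) pairs. Majority vote of the k nearest."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def fit_centroids(train):
    """Compute the mean vector of each class."""
    sums, counts = {}, Counter()
    for vec, label in train:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] += 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def nearest_centroid_predict(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: math.dist(centroids[label], x))


# Toy 2-D feature vectors for hand-labeled items (illustrative only).
train = [([1.0, 0.9], "positive"), ([0.9, 1.0], "positive"), ([0.8, 0.8], "positive"),
         ([-1.0, -0.8], "negative"), ([-0.9, -1.0], "negative"), ([-0.7, -0.9], "negative")]
centroids = fit_centroids(train)
print(knn_predict(train, [0.7, 0.6]))                    # positive
print(nearest_centroid_predict(centroids, [0.7, 0.6]))  # positive
```

Nearest centroid only stores one vector per class, so it is cheaper at prediction time than KNN, which must compare against every training item.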


Author(s):  
Palaiyah Solainayagi ◽  
Ramalingam Ponnusamy

Customer opinions expressed in product reviews currently play an essential role in online purchasing decisions. Customers often seek the opinions of other customers through online product reviews, blogs, social networking sites, and similar sources. Most product reviews contain a large number of words, and only a few users state their opinion directly, so the meaning of reviews is difficult to analyze and understand. To improve user satisfaction and the shopping experience, online sellers commonly allow their users to review, or express opinions about, the products they have sold. The main goal of this paper is to address the feature-extraction and opinion-classification problems for customer product reviews by extracting feature words and opinion words from the reviews. To this end, an Efficient Feature Extraction and Classification (EFEC) algorithm is proposed that extracts features from opinion words. Reviewers usually mention both positive and negative aspects of the reviewed product, even though their overall opinion of the product may be either positive or negative. The EFEC algorithm is used to predict the number of positive and negative opinions in reviews. In experimental evaluations, the proposed algorithm improves accuracy by 15.05%, precision by 13.7%, recall by 15.59%, and F-measure by 15.07% compared with existing methods.
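The abstract does not describe how EFEC works internally. Purely to illustrate the two sub-problems it names (pairing feature words with opinion words, and counting positive and negative opinions per review), here is a minimal lexicon-based sketch. The lexicons, the window size, and the function names are hypothetical, and this is not the EFEC algorithm.

```python
import re

# Hypothetical toy lexicons; a real system would use much larger curated lists.
FEATURES = {"battery", "screen", "camera"}
POSITIVE = {"good", "great", "excellent", "sharp", "fast"}
NEGATIVE = {"bad", "poor", "slow", "weak", "noisy"}


def tokenize(review):
    """Lowercase the review and keep alphabetic tokens (with apostrophes)."""
    return re.findall(r"[a-z']+", review.lower())


def count_opinions(review):
    """Return (number of positive opinion words, number of negative ones)."""
    tokens = tokenize(review)
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    return pos, neg


def feature_opinion_pairs(review, window=2):
    """Pair each feature word with opinion words at most `window` tokens away."""
    tokens = tokenize(review)
    pairs = []
    for i, token in enumerate(tokens):
        if token not in FEATURES:
            continue
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if tokens[j] in POSITIVE:
                pairs.append((token, tokens[j], "positive"))
            elif tokens[j] in NEGATIVE:
                pairs.append((token, tokens[j], "negative"))
    return pairs


review = "The screen is great but the battery is poor and charging is slow."
print(count_opinions(review))         # (1, 2)
print(feature_opinion_pairs(review))  # [('screen', 'great', 'positive'), ('battery', 'poor', 'negative')]
```

The example review mixes positive and negative aspects, as the abstract notes reviewers usually do; a fixed window is the simplest way to attach opinions to features, and it misses "slow" because no listed feature word is nearby.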


2020 ◽  
pp. 693-726
Author(s):  
Hadj Ahmed Bouarara ◽  
Reda Mohamed Hamou ◽  
Abdelmalek Amine

The internet era promotes electronic commerce and facilitates access to many services. In today's digital society, the explosion in communication has revolutionized the field of electronic communication. Unfortunately, this technology has undeniably become a primary source of malicious activity, especially the plague of unwanted email (SPAM), which has grown tremendously in the last few years. This paper presents new bio-inspired techniques (artificial social cockroaches (ASC), artificial haemostasis system (AHS), and artificial heart lungs system (AHLS)) and their application to SPAM detection. For their experiments, the authors used the benchmark SMS Spam Corpus v.0.1 and the validation measures recall, precision, F-measure, entropy, accuracy, and error. They optimized the sensitive parameters of each algorithm (text representation technique, distance measure, weightings, and threshold). The results compare favorably with those of artificial social bees and machine-learning algorithms (C4.5 decision tree and K-means).


2019 ◽  
Vol 1 (5) ◽  
Author(s):  
Maryam Shuaib ◽  
Shafi’i Muhammad Abdulhamid ◽  
Olawale Surajudeen Adebayo ◽  
Oluwafemi Osho ◽  
Ismaila Idris ◽  
...  

Electronics ◽  
2020 ◽  
Vol 9 (1) ◽  
pp. 99 ◽  
Author(s):  
Krzysztof Gajowniczek ◽  
Iga Grzegorczyk ◽  
Tomasz Ząbkowski ◽  
Chandrajit Bajaj

Constructing an ensemble model is a process of combining many diverse base predictive learners. It raises the questions of how to weight each model and how to tune the parameters of the weighting process. The most straightforward approach is simply to average the base models. However, numerous studies have shown that a weighted ensemble can provide better predictions than a simple average of models. The main goals of this article are to propose a new weighting algorithm applicable to each tree in the Random Forest model and to examine optimal parameter tuning comprehensively. Importantly, the approach is motivated by its flexibility, good performance, stability, and resistance to overfitting. The proposed scheme is examined and evaluated on the PhysioNet/Computing in Cardiology Challenge 2015 data set. It consists of signals (electrocardiograms and pulsatile waveforms) from intensive care patients that triggered an alarm for five cardiac arrhythmia types (Asystole, Bradycardia, Tachycardia, Ventricular Tachycardia, and Ventricular Flutter/Fibrillation). The classification task is to decide whether the alarm should or should not have been generated. The proposed weighting approach was shown to improve classification accuracy for the three most challenging of the five investigated arrhythmias compared with the standard Random Forest model.
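The difference between a simple average and a weighted ensemble can be seen on a single prediction. In this sketch each tree outputs a probability that an alarm is genuine, and the ensemble thresholds the combined probability at 0.5. The weights are illustrative placeholders, not the weighting algorithm proposed in the article, which the abstract does not specify.

```python
def simple_average(probs):
    """Unweighted mean of the per-tree probabilities."""
    return sum(probs) / len(probs)


def weighted_average(probs, weights):
    """Weighted mean of per-tree probabilities; weights need not sum to 1."""
    if len(probs) != len(weights):
        raise ValueError("need exactly one weight per tree")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(w * p for w, p in zip(weights, probs)) / total


def is_true_alarm(prob, threshold=0.5):
    """Decide whether the alarm should have been generated."""
    return prob >= threshold


# Three trees' probabilities that one alarm is genuine, with placeholder
# weights (one could, for example, weight trees by validation accuracy).
probs = [0.9, 0.2, 0.3]
weights = [0.8, 0.1, 0.1]
print(is_true_alarm(simple_average(probs)))             # False (mean is about 0.467)
print(is_true_alarm(weighted_average(probs, weights)))  # True (weighted mean is 0.77)
```

When one tree is much more reliable than the others, giving it a larger weight can flip the ensemble decision, which is why the choice and tuning of weights matter.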


2019 ◽  
Vol 8 (3) ◽  
pp. 4148-4153

The rapid growth of spam email has increased the need to upgrade existing spam detection and filtering methods. Several machine learning methods exist for classifying and detecting email spam, but they fall short in some cases. In this work, ensemble methods are adapted to detect email spam. Multinomial Naïve Bayes and J48 decision tree classifiers are used as base learners and combined in ensembles. The ensemble methods considered are bagging and boosting. Experiments are conducted on the CSDMC2010 Spam corpus, and the results are evaluated for the individual classifiers and for the bagging and boosting ensembles. System performance is assessed in terms of precision, recall, F-measure, and accuracy. The experimental outcomes indicate notable results for email spam detection using ensemble methods.
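Bagging trains each base learner on a bootstrap resample (drawn with replacement) of the training set and combines their predictions by majority vote. The following is a minimal standard-library sketch with Multinomial Naïve Bayes as the base learner, assuming emails are already tokenized into word lists; the class and function names are illustrative, and it does not reproduce the paper's experimental setup.

```python
import math
import random
from collections import Counter


class MultinomialNB:
    """Multinomial Naive Bayes over token counts with add-one smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc}
        self.log_prior, self.log_likelihood = {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            counts = Counter(tok for d in class_docs for tok in d)
            denom = sum(counts.values()) + len(self.vocab)
            self.log_likelihood[c] = {
                tok: math.log((counts[tok] + 1) / denom) for tok in self.vocab
            }
        return self

    def predict(self, doc):
        def score(c):
            # Tokens never seen in training are ignored.
            return self.log_prior[c] + sum(
                self.log_likelihood[c][t] for t in doc if t in self.vocab
            )
        return max(self.classes, key=score)


def bagging_fit(docs, labels, n_models=5, seed=0):
    """Train n_models classifiers, each on a bootstrap resample of the data."""
    rng = random.Random(seed)
    n = len(docs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(MultinomialNB().fit([docs[i] for i in idx],
                                          [labels[i] for i in idx]))
    return models


def bagging_predict(models, doc):
    """Majority vote over the ensemble members."""
    votes = Counter(m.predict(doc) for m in models)
    return votes.most_common(1)[0][0]
```

Usage: call `bagging_fit` on tokenized training emails and their "spam"/"ham" labels, then `bagging_predict` on a new token list. Resampling makes the members differ slightly, and voting averages out their individual errors; boosting instead trains members sequentially, focusing on previously misclassified emails.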

