Spam image email filtering using K-NN and SVM

<p>The growing use of the Internet has made electronic communication simple and fast, and e-mail is the best-known example. Sending and receiving e-mail is now a widely used means of communication. This popularity has brought a problem with it: spam. Spam mails are messages sent by unknown senders, for example advertisements, that hamper the use of the Internet. Spammers have introduced a new technique of embedding the spam content in an image attached to the mail. In this paper, we propose a method based on a combination of SVM and k-NN. SVMs tend to take a long time to train on a large data set. If "redundant" patterns are identified and deleted during pre-processing, the training time can be reduced significantly. We propose a k-nearest neighbor (k-NN) based pattern selection method. The method tries to select the patterns that are near the decision boundary and that are correctly labeled. The basic idea is to find close neighbors to a query sample and train a local SVM that preserves the distance function on the collection of neighbors. Our experimental studies on a publicly available dataset (Dredze) show that results improve to approximately 98%.</p>
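The pattern selection idea in this abstract can be sketched as follows. This is a simplified heuristic written for illustration, not the authors' algorithm: the function names, the "mixed neighbourhood" test, and the toy data are all assumptions. A sample is kept when its k nearest neighbours carry more than one label (it is near the boundary) and its own label agrees with the neighbourhood majority (it is probably labeled correctly). An SVM would then be trained only on the kept samples.

```python
import math
from collections import Counter


def knn_indices(points, i, k):
    """Indices of the k nearest neighbours of points[i] (Euclidean), excluding i itself."""
    dists = sorted((math.dist(points[i], p), j) for j, p in enumerate(points) if j != i)
    return [j for _, j in dists[:k]]


def select_boundary_patterns(points, labels, k=3):
    """Hypothetical selection rule: keep samples whose neighbourhood is mixed
    and whose own label matches the neighbourhood majority.
    Use an odd k with two classes so the majority vote cannot tie."""
    selected = []
    for i in range(len(points)):
        counts = Counter(labels[j] for j in knn_indices(points, i, k))
        mixed = len(counts) > 1                  # neighbours disagree: near the boundary
        majority = counts.most_common(1)[0][0]   # most frequent neighbour label
        if mixed and labels[i] == majority:
            selected.append(i)
    return selected


# Toy data (invented): two well-separated groups along the x axis.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0),
          (5.1, 0.0), (6.1, 0.0), (7.1, 0.0), (8.1, 0.0)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
kept = select_boundary_patterns(points, labels, k=3)
```

On this toy set only the two samples facing each other across the gap are kept, which is the reduction in training-set size that the abstract relies on to shorten SVM training.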

ADAPTIVE SPAM FILTERING USING DYNAMIC FEATURE SPACES

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213007003473 ◽

2007 ◽

Vol 16 (04) ◽

pp. 627-646 ◽

Cited By ~ 6

Author(s):

YAN ZHOU ◽

MADHURI S. MULEKAR ◽

PRAVEEN NERELLAPALLI

Keyword(s):

Decision Tree ◽

Adaptive Learning ◽

Nearest Neighbor ◽

Experimental Results ◽

Huffman Coding ◽

Support Vector ◽

Dynamic Feature ◽

Spam Filtering ◽

K Nearest Neighbor ◽

E Mail

Unsolicited bulk e-mail, also known as spam, has been an increasing problem for the e-mail community. This paper presents a new spam filtering strategy that 1) uses a practical entropy coding technique, Huffman coding, to dynamically encode the feature space of the e-mail collected over time, and 2) applies an online algorithm to adaptively enhance the learned spam concept as new e-mail data becomes available. The contributions of this work include a highly efficient spam filtering algorithm in which the input space is radically reduced to a single-dimension input vector, and an adaptive learning technique that is robust to vocabulary change, concept drift, and skewed class distributions. We compare our technique with several existing off-line learning techniques, including support vector machine, logistic regression, naïve Bayes, k-nearest neighbor, C4.5 decision tree, RBFNetwork, boosted decision tree, and stacking. We demonstrate the effectiveness of our technique by presenting experimental results on publicly available e-mail data. A more in-depth statistical analysis of the experimental results is also presented and discussed.
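For readers unfamiliar with Huffman coding, a minimal sketch of building a Huffman code from token counts is given below. It illustrates only the generic coding step; how the paper maps the encoded features to a single-dimension input is not described in the abstract and is not reproduced here. The function name and the token counts are invented for the example.

```python
import heapq


def huffman_code(freqs):
    """Build a Huffman code (symbol -> bit string) from a symbol -> count mapping."""
    if len(freqs) == 1:
        return {next(iter(freqs)): "0"}
    # Heap entries are (weight, tie_breaker, partial_code_table).
    # The unique tie_breaker keeps Python from ever comparing the dicts.
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)   # lightest subtree
        w2, _, c2 = heapq.heappop(heap)   # second lightest subtree
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


# Invented token counts: frequent tokens receive shorter codes.
codes = huffman_code({"a": 5, "b": 2, "c": 1, "d": 1})
```

Because the code table is rebuilt from counts, it can be regenerated as new mail arrives, which is one plausible reading of "dynamically encode" in the abstract.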

MiRNA-disease association prediction via hypergraph learning based on high-dimensionality features

BMC Medical Informatics and Decision Making ◽

10.1186/s12911-020-01320-w ◽

2021 ◽

Vol 21 (S1) ◽

Author(s):

Yu-Tian Wang ◽

Qing-Wen Wu ◽

Zhen Gao ◽

Jian-Cheng Ni ◽

Chun-Hou Zheng

Keyword(s):

Computational Models ◽

Cross Validation ◽

Nearest Neighbor ◽

Experimental Studies ◽

Prediction Method ◽

Disease Association ◽

High Dimensionality ◽

K Nearest Neighbor ◽

Hypergraph Learning ◽

Disease Associations

Background: MicroRNAs (miRNAs) have been confirmed to have a close relationship with various complex human diseases. The identification of disease-related miRNAs provides great insights into the underlying pathogenesis of diseases. However, it is still a big challenge to identify which miRNAs are related to diseases. As experimental methods are in general expensive and time-consuming, it is important to develop efficient computational models to discover potential miRNA-disease associations.

Methods: This study presents a novel prediction method called HFHLMDA, which is based on high-dimensionality features and hypergraph learning, to reveal the association between diseases and miRNAs. First, the miRNA functional similarity and the disease semantic similarity are integrated to form an informative high-dimensionality feature vector. Then, a hypergraph is constructed by the K-Nearest-Neighbor (KNN) method, in which each miRNA-disease pair and its k most relevant neighbors are linked as one hyperedge to represent the complex relationships among miRNA-disease pairs. Finally, the hypergraph learning model is designed to learn the projection matrix that is used to calculate the association score of each uncertain miRNA-disease pair.

Results: Compared with four state-of-the-art computational models, HFHLMDA achieved the best results of 92.09% and 91.87% in leave-one-out cross validation and fivefold cross validation, respectively. Moreover, in case studies on esophageal neoplasms, hepatocellular carcinoma, and breast neoplasms, 90%, 98%, and 96% of the top 50 predictions, respectively, have been manually confirmed by previous experimental studies.

Conclusion: MiRNAs have complex connections with many human diseases. In this study, we proposed a novel computational model to predict the underlying miRNA-disease associations. All results show that the proposed method is effective for miRNA-disease association prediction.
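The hypergraph construction step described under Methods can be sketched as an incidence matrix in which hyperedge j contains sample j and its k nearest neighbours. This is a minimal illustration that assumes Euclidean distance on the feature vectors; the distance measure actually used by HFHLMDA, the function name, and the toy features are assumptions.

```python
import math


def knn_hypergraph(features, k):
    """Return an incidence matrix H (nodes x hyperedges) as nested lists.
    H[i][j] == 1 means node i belongs to hyperedge j, and hyperedge j
    consists of node j plus its k nearest neighbours."""
    n = len(features)
    H = [[0] * n for _ in range(n)]
    for j in range(n):
        # sorted() is stable, so distance ties are broken by lower index
        others = sorted((i for i in range(n) if i != j),
                        key=lambda i: math.dist(features[j], features[i]))
        for i in [j] + others[:k]:
            H[i][j] = 1
    return H


# Toy feature vectors (invented): two tight pairs far apart.
features = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (6.0, 0.0)]
H = knn_hypergraph(features, k=1)
```

Each column of `H` has exactly k + 1 ones, so a hyperedge can link more than two samples at once, which is what distinguishes it from an ordinary k-NN graph edge.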

Weighted k-Nearest Neighbour for Image Spam Classification

Iraqi Journal of Science ◽

10.24996/ijs.2021.62.3.32 ◽

2021 ◽

pp. 1036-1045

Author(s):

Ahmad M. Salih ◽

Ban N. Nadim

Keyword(s):

Optical Character Recognition ◽

Data Exchange ◽

Nearest Neighbor ◽

Learning Algorithm ◽

Computational Cost ◽

Image Features ◽

Spam Detection ◽

K Nearest Neighbor ◽

Image Spam ◽

E Mail

E-mail is an efficient and reliable data exchange service. Spam consists of undesired e-mail messages that are sent randomly in bulk, usually for commercial purposes. Obfuscated image spamming is one of the new tricks used to bypass text-based and Optical Character Recognition (OCR)-based spam filters. Image spam detection based on visual image features has the advantage of reducing the computational cost and improving performance. In this paper, an image spam detection scheme is presented. Suitable image processing techniques were used to capture the image features that can differentiate spam images from non-spam ones. Weighted k-nearest neighbor, a simple yet powerful machine learning algorithm, was used as the classifier. The results confirm the effectiveness of the proposed scheme, which was evaluated on two datasets. The first is a real benchmark dataset, while the other is a realistic, modern, and more challenging dataset collected from social media and many publicly available image spam datasets. The obtained accuracy was 99.36% on the benchmark dataset and 91% on the proposed dataset.
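A weighted k-nearest neighbor classifier can be sketched as below. The abstract does not state which weighting the authors use, so inverse-distance weighting is assumed here; the function name, the `eps` smoothing constant, and the two-dimensional toy features are likewise illustrative.

```python
import math
from collections import defaultdict


def weighted_knn_predict(train_X, train_y, query, k=3, eps=1e-9):
    """Classify `query` by inverse-distance-weighted voting among its k nearest neighbours."""
    nearest = sorted((math.dist(x, query), y) for x, y in zip(train_X, train_y))[:k]
    votes = defaultdict(float)
    for d, y in nearest:
        votes[y] += 1.0 / (d + eps)   # closer neighbours count more; eps avoids division by zero
    return max(votes, key=votes.get)


# Toy features (invented): one "ham" sample and two "spam" samples.
train_X = [(0.0, 0.0), (2.0, 0.0), (2.5, 0.0)]
train_y = ["ham", "spam", "spam"]
label = weighted_knn_predict(train_X, train_y, (0.2, 0.0), k=3)
```

In the toy call, an unweighted majority vote over the three neighbours would favour "spam", while the weighted vote follows the single much closer "ham" sample. That difference is the reason to weight votes by distance.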

IMPACT OF FULL RANK PRINCIPAL COMPONENT ANALYSIS ON CLASSIFICATION ALGORITHMS FOR FACE RECOGNITION

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001412560058 ◽

2012 ◽

Vol 26 (03) ◽

pp. 1256005 ◽

Cited By ~ 4

Author(s):

FENGXI SONG ◽

JANE YOU ◽

DAVID ZHANG ◽

YONG XU

Keyword(s):

Principal Component Analysis ◽

Face Recognition ◽

Nearest Neighbor ◽

Experimental Studies ◽

Principal Component ◽

Full Rank ◽

Component Analysis ◽

Support Vector ◽

Classification Algorithms ◽

K Nearest Neighbor

Full rank principal component analysis (FR-PCA) is a special form of principal component analysis (PCA) that retains all nonzero components of PCA. Generally speaking, it is hard to estimate how the accuracy of a classifier will change after data are compressed by PCA. However, this paper reveals the interesting fact that the transformation by FR-PCA does not change the accuracy of many well-known classification algorithms. This implies that FR-PCA can safely be used as a preprocessing tool to compress high-dimensional data without degrading the accuracy of these classifiers. The main contribution of the paper is a theoretical proof that the transformation by FR-PCA does not change the accuracy of the k-nearest neighbor, minimum distance, support vector machine, large margin linear projection, and maximum scatter difference classifiers. In addition, through extensive experimental studies conducted on several benchmark face image databases, the paper demonstrates that FR-PCA can greatly improve the efficiency of the five above-mentioned classification algorithms in appearance-based face recognition.
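One way to see why the k-nearest neighbor result is plausible is the short argument below. It is a sketch under the usual centered-PCA assumptions and is not taken from the paper: the symbols $\mu$ (training mean), $P$ (matrix of orthonormal eigenvectors with nonzero eigenvalues), and $y = P^{\top}(x - \mu)$ are notation introduced here.

```latex
% Training samples x_1, ..., x_n with mean \mu; the columns of P form an
% orthonormal basis of S = span{ x_i - \mu }, so P P^T is the projector onto S.
\begin{align*}
x - x_i &= P P^{\top}(x - x_i) + (I - P P^{\top})(x - x_i) \\
        &= P P^{\top}(x - x_i) + (I - P P^{\top})(x - \mu)
           && \text{since } x_i - \mu \in S, \\
\|x - x_i\|^2 &= \|P^{\top}(x - x_i)\|^2 + \|(I - P P^{\top})(x - \mu)\|^2 \\
              &= \|y - y_i\|^2 + c(x),
\end{align*}
% where c(x) does not depend on i.
```

The term $c(x)$ is the same for every training sample, so the ordering of distances from a test sample $x$ to the training samples is identical before and after the transformation, and a k-nearest neighbor classifier makes the same decision. For two training samples the extra term vanishes and the distance itself is preserved.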

EEG in game user analysis: A framework for expertise classification during gameplay

10.1101/2021.01.29.428766 ◽

2021 ◽

Author(s):

Tehmina Hafeez ◽

Sanay Muhammad Umar Saeed ◽

Aamir Arsalan ◽

Syed Muhammad Anwar ◽

Muhammad Usman Ashraf ◽

...

Keyword(s):

Computer Science ◽

Video Game ◽

Evaluation Method ◽

Nearest Neighbor ◽

Brain Activity ◽

Computer Engineering ◽

K Nearest Neighbor ◽

Engineering And Technology ◽

Expertise Level ◽

E Mail

Abstract: Video games have become a ubiquitous part of demographically diverse cultures. Numerous studies have focused on analyzing the cognitive aspects involved in game playing that could help provide an optimal gaming experience by improving video game design. To this end, we present a framework for classifying the game player's expertise level using a wearable electroencephalography (EEG) headset. We hypothesize that the brain activity of expert and novice players differs, and that this difference can be classified using frequency domain features extracted from the player's EEG signals. A systematic channel reduction approach is presented using a correlation-based attribute evaluation method. This approach identifies two significant EEG channels, AF3 and P7, among the fourteen channels of the Emotiv EPOC headset. The features extracted from these EEG channels contribute the most to the classification of the video game player's expertise level. This finding is validated by performing statistical analysis (t-test) over the extracted features. Moreover, among the multiple classifiers used, K-nearest neighbor is the best at classifying the game player's expertise level, with up to 98.04% classification accuracy.

Author summary:

Tehmina Hafeez. Roles: Investigation, Writing – original draft. Affiliation: Department of Computer Engineering, University of Engineering and Technology, Taxila, 47050, Pakistan.

Sanay Muhammad Umar Saeed (Corresponding author). Roles: Conceptualization, Writing – review & editing. Affiliation: Department of Computer Engineering, University of Engineering and Technology, Taxila, 47050, Pakistan.

Aamir Arsalan. Roles: Methodology, Writing – review & editing. Affiliation: Department of Computer Engineering, University of Engineering and Technology, Taxila, 47050, Pakistan.

Syed Muhammad Anwar. Roles: Validation, Writing – review & editing. Affiliation: Department of Software Engineering, University of Engineering and Technology, Taxila, 47050, Pakistan.

Muhammad Usman Ashraf (Corresponding author). Roles: Validation, Writing – review & editing. Affiliation: Department of Computer Science, University of Management and Technology, Lahore (Sialkot), 51040, Pakistan.

Khalid Alsubhi. Roles: Conceptualization, Writing – review & editing. Affiliation: Department of Computer Science, King Abdul Aziz University, Jeddah, Saudi Arabia.
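Correlation-based channel ranking of the kind mentioned in the abstract can be sketched as follows. The exact evaluator used in the study is not given here, so this example assumes a plain Pearson correlation between one feature per channel and a numeric class label; the function names, channel names, and values are all invented.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def rank_channels(features_by_channel, labels):
    """Sort channel names by |correlation| between their feature and the class label."""
    scores = {ch: abs(pearson(vals, labels)) for ch, vals in features_by_channel.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Invented feature values for three hypothetical channels; labels: 0 = novice, 1 = expert.
labels = [0, 0, 1, 1]
features_by_channel = {
    "ch_a": [1.0, 1.1, 2.0, 2.1],   # tracks the label closely
    "ch_b": [1.0, 3.0, 2.0, 4.0],   # weakly related to the label
    "ch_c": [1.0, 2.0, 1.0, 2.0],   # unrelated to the label
}
ranking = rank_channels(features_by_channel, labels)
```

Keeping only the top-ranked channels is the channel reduction step; the classifier is then trained on features from those channels alone.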

E-Mail Spam Filtering

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.39004 ◽

2021 ◽

Vol 9 (11) ◽

pp. 1265-1269

Author(s):

Rohitkumar R Upadhyay

Keyword(s):

Nearest Neighbor ◽

Low Cost ◽

Online Communication ◽

Spam Filtering ◽

K Nearest Neighbor ◽

Knowledge Based ◽

Classification Evaluation ◽

Additive Regression ◽

E Mail ◽

Email Spam

Abstract: E-mail is the most common method of communication because of its accessibility, the rapid exchange of messages, and its low cost of distribution. E-mail is one of the most secure media for online communication and for transferring data or messages over the Internet. With its ever-growing popularity, the quantity of unsolicited data has also increased rapidly. Spam causes traffic issues and bottlenecks that limit memory, bandwidth, power, and computing speed. To filter data, different approaches exist that automatically detect and remove these unwanted messages. There are several email spam filtering techniques, such as knowledge-based techniques, clustering techniques, learning-based techniques, heuristic processes, and so on. This paper surveys various existing email spam filtering systems based on machine learning techniques (MLT) such as Naive Bayes, SVM, K-Nearest Neighbor, Bayes Additive Regression, KNN Tree, and rules. We give the classification, evaluation, and comparison of some email spam filtering systems and summarize the accuracy rates of various existing approaches. Keywords: e-mail spam, unsolicited bulk email, spam filtering methods.

Machine Learning Verdict of EEG Signals in Brain Computer Interface

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit1838114 ◽

2018 ◽

pp. 429-441

Author(s):

M. Jeyanthi ◽

C. Velayutham

Keyword(s):

Nearest Neighbor ◽

Technology Development ◽

Vital Role ◽

Svm Classifier ◽

K Nearest Neighbor ◽

Data Mining Technique ◽

Data Set ◽

Eeg Data ◽

Irrelevant Attributes

BCI plays a vital role in research in science and technology development. Classification is a data mining technique used to predict group membership for data instances. Analysis of BCI data is challenging because feature extraction and classification of these data are more difficult compared with those applied to raw data. In this paper, we extract statistical Haralick features from the raw EEG data. The features are then normalized, and binning is used to improve the accuracy of the predictive models by reducing noise and eliminating some irrelevant attributes. Classification is then performed on the BCI dataset using different techniques, such as Naïve Bayes, the k-nearest neighbor classifier, and the SVM classifier. Finally, we propose the SVM classification algorithm for the BCI data set.
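The normalization and binning steps can be sketched as below. The abstract does not say which normalization or binning scheme was used, so min-max scaling and equal-width bins are assumed, and the function names and sample values are illustrative.

```python
def min_max_normalize(values):
    """Scale values linearly into [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, n_bins):
    """Map each value in [0, 1] to a bin index in 0..n_bins-1."""
    # min() puts the value 1.0 into the last bin instead of a bin past the end
    return [min(int(v * n_bins), n_bins - 1) for v in values]


# Invented feature column.
raw = [2.0, 4.0, 6.0, 10.0]
normalized = min_max_normalize(raw)      # [0.0, 0.25, 0.5, 1.0]
binned = equal_width_bins(normalized, 4)  # [0, 1, 2, 3]
```

Replacing each value with its bin index discards small fluctuations inside a bin, which is how binning can reduce noise before classification.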

PENENTUAN DAERAH PRIORITAS PELAYANAN AKTA KELAHIRAN DENGAN METODE K-NN DAN K-MEANS (Determining Priority Areas for Birth Certificate Services Using the K-NN and K-Means Methods)

Komputasi: Jurnal Ilmiah Ilmu Komputer dan Matematika ◽

10.33751/komputasi.v17i1.1735 ◽

2020 ◽

Vol 17 (1) ◽

pp. 319-328

Author(s):

Ade Muchlis Maulana Anwar ◽

Prihastuti Harsani ◽

Aries Maesya

Keyword(s):

Nearest Neighbor ◽

Information Gain ◽

Birth Certificate ◽

Population Data ◽

Community Services ◽

Birth Certificates ◽

Similar Data ◽

K Nearest Neighbor ◽

Civil Registration ◽

The Family

Population data are individual or aggregate data structured as a result of population registration and civil registration activities. A birth certificate is a civil registration deed that records the birth of a baby; the reported birth is registered on the Family Card and given a Population Identification Number (NIK) as a basis for obtaining other community services. Of the 570,637 integrated birth certificate reports in the 2018 Population Administration Information System (SIAK), 503,946 were reported late and only 66,691 were reported publicly. Clustering is a method for grouping data so that similar items fall in the same group and dissimilar items fall in different groups. K-nearest neighbor is a method for classifying objects based on the training data closest in distance to the test data. K-means is a method for dividing a number of objects into groups by looking at each group's midpoint (centroid). In data mining preprocessing, the data are cleaned by filling in blank values with the most frequent value, and attributes are selected using the information gain method. Using 10,000 birth certificate records from 2019, the k-nearest neighbor method for predicting late reporting achieved an accuracy of 74.00%, and the k-means method for grouping priority service areas with K = 2 produced a Davies-Bouldin index of 1.179.
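The attribute selection step uses information gain, which can be computed as in the sketch below. The function names, attribute values, and labels are invented for illustration and do not come from the study's data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(attribute_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the attribute."""
    n = len(labels)
    groups = {}
    for a, y in zip(attribute_values, labels):
        groups.setdefault(a, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Invented example: does a hypothetical attribute help predict late reporting?
labels = ["late", "late", "on_time", "on_time"]
informative = ["rural", "rural", "urban", "urban"]   # splits the labels perfectly
uninformative = ["x", "y", "x", "y"]                 # tells us nothing about the label
gain_high = information_gain(informative, labels)
gain_low = information_gain(uninformative, labels)
```

Attributes are ranked by their gain and the lowest-scoring ones are dropped before the k-nearest neighbor and k-means steps.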

A Scalable K-Nearest Neighbor Algorithm for Recommendation System Problems

2020 43rd International Convention on Information, Communication and Electronic Technology (MIPRO) ◽

10.23919/mipro48935.2020.9245195 ◽

2020 ◽

Author(s):

A. Sagdic ◽

C. Tekinbas ◽

E. Arslan ◽

T. Kucukyilmaz

Keyword(s):

Recommendation System ◽

Nearest Neighbor ◽

K Nearest Neighbor ◽

Nearest Neighbor Algorithm ◽

K Nearest Neighbor Algorithm

Optimizing Error Rate in Intrusion Detection System Using Artificial Neural Network Algorithm

International Journal of Emerging Research in Management and Technology ◽

10.23956/ijermt.v6i9.102 ◽

2018 ◽

Vol 6 (9) ◽

pp. 152

Author(s):

S. Vijaya Rani ◽

G. N. K. Suresh Babu

Keyword(s):

Neural Network ◽

Artificial Neural Network ◽

Intrusion Detection ◽

Error Rate ◽

Learning Process ◽

Nearest Neighbor ◽

Detection System ◽

Support Vector ◽

K Nearest Neighbor ◽

Artificial Neural

Illegal hackers penetrate the servers and networks of corporate and financial institutions to gain money and extract vital information. Attacks range from a single computing system to many systems. Attackers gain access by sending malicious packets into the network through viruses, worms, Trojan horses, and similar means. They scan a network with various tools and collect information about the network and its hosts. It is therefore essential to detect attacks as they enter a network. The methods available for intrusion detection include Naive Bayes, decision trees, support vector machines, k-nearest neighbor, and artificial neural networks. A neural network consists of processing units connected in a complex manner and is able to store information and make it available for use. It acts like the human brain, acquiring knowledge from the environment through a training and learning process. Many algorithms are available for the learning process. This work analyzes malicious packets and predicts the error rate in the detection of injured packets using artificial neural network algorithms.
