Using Reduced Set of Features to Detect Spam in Twitter Data with Decision Tree and KNN Classifier Algorithms

On social media, users share their ideas and opinions with their neighbours and friends. Spammers send spam to genuine users to mislead them, and this spam is a serious problem for social media sites. Researchers have developed various methodologies to detect spam messages in social media, and they typically use a large number of features to construct their models. The original dataset generally contains many irrelevant and redundant features, and such a large number of features reduces spam detection accuracy. To improve spam detection accuracy in social media networks, the meaningless attributes must be removed from the high-dimensional social media dataset. To reduce the dimensionality of the dataset, we use a dimensionality reduction approach called principal component analysis (PCA). After the dimensionality is reduced, the samples are classified with the Decision Tree Induction classifier algorithm and the K Nearest Neighbour (KNN) algorithm. In our proposed work, these algorithms are used to determine whether data samples are spam or ham. We use a Twitter dataset to test the proposed approach. Experimental results show that the KNN classifier outperforms the Decision Tree classifier.
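The pipeline described in this abstract (reduce dimensionality with PCA, then classify with a decision tree and with KNN) can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation: the synthetic data stands in for the Twitter dataset, and the number of components (10), the number of neighbours (5), and the train/test split are assumptions chosen only for the example.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a labelled spam/ham feature table (NOT the Twitter data).
# Many of the 40 features are redundant, mimicking a noisy high-dimensional dataset.
X, y = make_classification(n_samples=600, n_features=40, n_informative=8,
                           n_redundant=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)


def evaluate(classifier, n_components=10):
    """Scale, project onto principal components, then fit and score a classifier."""
    model = make_pipeline(StandardScaler(), PCA(n_components=n_components), classifier)
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)  # accuracy on the held-out samples


scores = {
    "decision_tree": evaluate(DecisionTreeClassifier(random_state=0)),
    "knn": evaluate(KNeighborsClassifier(n_neighbors=5)),  # k=5 is an assumption
}
for name, accuracy in scores.items():
    print(f"{name}: {accuracy:.3f}")
```

Which classifier scores higher on this synthetic data says nothing about the result reported in the abstract; the sketch only shows how the two stages fit together.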

2021 ◽  
Author(s):  
Anwar Yahya Ebrahim ◽  
Hoshang Kolivand

The authentication of writers' handwritten autographs is widely practised throughout the world, and a thorough check of the autograph is important before reaching a conclusion about the signer. The Arabic autograph has unique characteristics: it includes lines and overlapping, which makes high accuracy more difficult to achieve. This project addresses that difficulty by selecting the best characteristics for Arabic autograph authentication, characterized by the number of attributes representing each autograph. The objective is to determine whether a given autograph is genuine or a forgery. The proposed method uses the Discrete Cosine Transform (DCT) to extract features, then Sparse Principal Component Analysis (SPCA) to select significant attributes for Arabic handwritten autograph recognition to aid the authentication step. Finally, a decision tree classifier is used for signature authentication. The suggested method, DCT with SPCA, achieves good outcomes on an Arabic autograph dataset when verified against various techniques.
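The three stages named in this abstract (DCT feature extraction, sparse PCA, decision tree) might be chained as in the sketch below. Everything specific here is an assumption made for illustration: the "images" are synthetic arrays rather than signature scans, the 8×8 low-frequency block, the 5 sparse components, and `alpha=1.0` are arbitrary choices, and `dct_features` is a hypothetical helper, not a function from the paper.

```python
import numpy as np
from scipy.fft import dctn
from sklearn.decomposition import SparsePCA
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)


def dct_features(image, block=8):
    """Return the low-frequency (top-left) block of the 2-D DCT as a flat vector."""
    coefficients = dctn(image, norm="ortho")
    return coefficients[:block, :block].ravel()


# Placeholder data: noise images, with a smooth pattern added to class 1 only.
# Real use would load genuine (1) and forged (0) signature images instead.
labels = rng.integers(0, 2, size=80)
axis = np.sin(np.linspace(0.0, np.pi, 32))
pattern = np.outer(axis, axis)
images = rng.random((80, 32, 32)) + labels[:, None, None] * pattern

features = np.array([dct_features(image) for image in images])  # shape (80, 64)
train, test = slice(0, 60), slice(60, 80)

# Fit sparse PCA on the training samples only, then project both subsets.
spca = SparsePCA(n_components=5, alpha=1.0, random_state=0)
train_reduced = spca.fit_transform(features[train])
test_reduced = spca.transform(features[test])

# DCT coefficients that carry a non-zero loading in at least one sparse component.
selected = np.flatnonzero(np.abs(spca.components_).sum(axis=0) > 0)

tree = DecisionTreeClassifier(random_state=0).fit(train_reduced, labels[train])
predictions = tree.predict(test_reduced)
print("selected DCT coefficients:", selected.size, "of", features.shape[1])
print("held-out accuracy:", (predictions == labels[test]).mean())
```

Because sparse PCA drives many loadings to exactly zero, the non-zero entries of `components_` indicate which DCT coefficients each component actually uses, which is one way to read "selecting significant attributes".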


2018 ◽  
Vol 2018 ◽  
pp. 1-13 ◽  
Author(s):  
Hongchen Wu ◽  
Huaxiang Zhang ◽  
Lizhen Cui ◽  
Xinjun Wang

Privacy issues have become a major concern in the web of resource sharing, and users often have difficulty managing their information disclosure in the context of high-quality experiences from social media and the Internet of Things. Recent studies have shown that users’ disclosure decisions may be influenced by heuristics from the crowd, leading to inconsistency in disclosure volumes and reduced prediction accuracy. Therefore, an analysis of why this influence occurs and how to optimize the user experience is highly important. We propose a novel heuristic model that defines the data structures of items and participants in social media, utilizes a modified decision-tree classifier that can predict participants’ disclosures, and puts forward a correlation analysis for detecting disclosure inconsistencies. The heuristic model is applied to a real-time dataset to evaluate the behavioral effects. The decision-tree classifier and the correlation analysis show that some participants’ information disclosure behaviors became decreasingly correlated during item requesting. Participants can be “persuaded” to change their disclosure behaviors, and users’ answers to mildly sensitive items tend to be more variable and less predictable. Using this approach, recommender systems in social media can know their users better and provide service with higher prediction accuracy.


Border Gateway Protocol (BGP) is utilized to send and receive data packets over the internet. Over the years, this protocol has suffered some massive hits caused by worms such as Nimda, Slammer, and Code Red, by hardware failures, and/or by prefix hijacking, which obstructed services for many. However, identifying anomalous messages traversing BGP allows such attacks to be discovered in time. In this paper, a machine learning approach is applied to identify such BGP messages. Principal Component Analysis was applied to reduce the dimensionality down to 2 components, followed by generation of Decision Tree, Random Forest, AdaBoost and GradientBoosting classifiers. After fine-tuning the parameters, the random forest classifier achieved an accuracy of 97.84%, closely followed by the decision tree classifier with 97.38%. The GradientBoosting classifier gave an accuracy of 95.41% and the AdaBoost classifier 94.43%.
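A comparison of this kind, with PCA down to 2 components feeding four tree-based classifiers, could be set up along the following lines. This is only a sketch with scikit-learn: the data is synthetic rather than real BGP message features, the estimator counts and 5-fold cross-validation are assumptions, and no parameter tuning is attempted, so the printed accuracies are unrelated to the figures quoted above.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic placeholder for labelled BGP message features (normal vs anomalous).
X, y = make_classification(n_samples=400, n_features=30, n_informative=6,
                           random_state=1)

classifiers = {
    "decision_tree": DecisionTreeClassifier(random_state=1),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=1),
    "adaboost": AdaBoostClassifier(n_estimators=50, random_state=1),
    "gradient_boosting": GradientBoostingClassifier(n_estimators=50, random_state=1),
}

mean_accuracy = {}
for name, classifier in classifiers.items():
    # PCA sits inside the pipeline, so it is refit on the training part of each fold.
    model = make_pipeline(StandardScaler(), PCA(n_components=2), classifier)
    mean_accuracy[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {mean_accuracy[name]:.3f}")
```

Keeping the scaler and PCA inside the pipeline avoids leaking information from the validation folds into the projection.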


2021 ◽  
Vol 12 (11) ◽  
pp. 1916-1924
Author(s):  
Tamanna Siddiqui, Et. al.

Sarcasm is defined as a cutting, often ironic remark intended to express ridicule or dislike. Irony detection is the task of correctly labeling text as ‘Sarcasm’ or ‘non-Sarcasm’. This is a challenging task owing to the lack of facial expressions and intonation in text. Social media and micro-blogging websites are extensively explored to extract opinions about a target, because a huge amount of text data is published openly on social media such as Twitter. Such large, openly available text data can be used for a variety of research. Here we use a text dataset to classify sarcasm, with experiments run on textual data extracted from a Twitter dataset. The dataset was downloaded from Kaggle and includes 1984 tweets collected from Twitter, which are already labelled. In this paper, we use these data to train classifiers with different algorithms to assess the ability of machine learning models to distinguish sarcasm from non-sarcasm. The process starts with text pre-processing and feature extraction (TF-IDF), then applies different classification algorithms: a Decision Tree classifier, a Multinomial Naïve Bayes classifier, Support Vector Machines, and a Logistic Regression classifier. After tuning the models for the best results with TF-IDF, we achieve 0.94% with Multinomial NB, 0.93% with the Decision Tree classifier, 0.97% with Logistic Regression, and 0.42% with Support Vector Machines (SVM). All of the resulting models improved except the SVM model, which has the lowest accuracy. The evaluation shows good accuracy in identifying people's sarcastic impressions.
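The workflow outlined here, TF-IDF feature extraction followed by several classifiers trained on the same features, can be illustrated with a small scikit-learn sketch. The eight tweets and their labels below are invented for the example and are not taken from the Kaggle dataset; a linear-kernel `SVC` is assumed for the SVM, and all other settings are library defaults apart from `max_iter`.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Invented example tweets; 1 = sarcasm, 0 = non-sarcasm.
tweets = [
    "oh great, another monday, just what i needed",
    "i love waiting two hours for a delayed train",
    "wow, what a brilliant idea to cancel the backup",
    "sure, because that always works out so well",
    "the meeting starts at ten tomorrow",
    "i enjoyed the concert last night",
    "the new library opens next week",
    "thanks for helping me move the boxes",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

classifiers = {
    "multinomial_nb": MultinomialNB(),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "svm": SVC(kernel="linear"),
}

new_tweets = ["oh great, the train is delayed again", "the concert starts at ten"]
predictions = {}
for name, classifier in classifiers.items():
    # Each pipeline learns its own TF-IDF vocabulary from the training tweets.
    model = make_pipeline(TfidfVectorizer(lowercase=True), classifier)
    model.fit(tweets, labels)
    predictions[name] = model.predict(new_tweets)
    print(name, predictions[name].tolist())
```

With so few training examples the predictions are not meaningful; a real evaluation would hold out part of the labelled tweets and report accuracy per model.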


Author(s):  
Hassan Najadat ◽  
Mohammad A. Alzubaidi ◽  
Islam Qarqaz

Reviews or comments that users leave on social media have great importance for companies and business entities. New product ideas can be evaluated based on customer reactions. However, this use of social media is complicated by those who post spam on social media in the form of reviews and comments. Designing methodologies to automatically detect and block social media spam is complicated by the fact that spammers continuously develop new ways to leave their spam comments. Researchers have proposed several methods to detect English spam reviews. However, few studies have been conducted to detect Arabic spam reviews. This article proposes a keyword-based method for detecting Arabic spam reviews. Keywords or Features are subsets of words from the original text that are labelled as important. A term's weight, Term Frequency–Inverse Document Frequency (TF-IDF) matrix, and filter methods (such as information gain, chi-squared, deviation, correlation, and uncertainty) have been used to extract keywords from Arabic text. The method proposed in this article detects Arabic spam in Facebook comments. The dataset consists of 3,000 Arabic comments extracted from Facebook pages. Four different machine learning algorithms are used in the detection process, including C4.5, kNN, SVM, and Naïve Bayes classifiers. The results show that the Decision Tree classifier outperforms the other classification algorithms, with a detection accuracy of 92.63%.

