Detecting Arabic Spam Reviews in Social Networks Based on Classification Algorithms

Reviews and comments that users leave on social media are of great importance to companies and business entities. New product ideas can be evaluated based on customer reactions. However, this use of social media is complicated by spam posted in the form of reviews and comments. Designing methods to automatically detect and block social media spam is difficult because spammers continuously develop new ways to post spam comments. Researchers have proposed several methods to detect English spam reviews, but few studies have addressed Arabic spam reviews. This article proposes a keyword-based method for detecting Arabic spam reviews. Keywords, or features, are subsets of words from the original text that are labelled as important. Term weights, the Term Frequency–Inverse Document Frequency (TF-IDF) matrix, and filter methods (such as information gain, chi-squared, deviation, correlation, and uncertainty) are used to extract keywords from Arabic text. The proposed method detects Arabic spam in Facebook comments. The dataset consists of 3,000 Arabic comments extracted from Facebook pages. Four machine learning algorithms are used in the detection process: C4.5, kNN, SVM, and Naïve Bayes. The results show that the Decision Tree classifier outperforms the other classification algorithms, with a detection accuracy of 92.63%.
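
The TF-IDF weighting mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say which TF-IDF variant the authors used, so the formula below (length-normalised term frequency times ln(N / df)) is an assumption, and the tokenized comments are hypothetical English placeholders rather than Arabic data from the paper.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized, non-empty document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical toy comments, already tokenized.
comments = [
    ["win", "free", "prize", "click", "link"],
    ["great", "product", "fast", "delivery"],
    ["free", "delivery", "great", "service"],
]
weights = tf_idf(comments)
```

Terms that occur in only one comment (such as "win") receive a higher weight than terms shared across comments (such as "free"), which is what makes high-weight terms candidates for keywords.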

Using Reduced Set of Features to Detect Spam in Twitter Data with Decision Tree and KNN Classifier Algorithms

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.f3616.078919 ◽

2019 ◽

Vol 8 (9) ◽

pp. 6-12

Keyword(s):

Social Media ◽

Decision Tree ◽

Principal Component ◽

Spam Detection ◽

Detection Accuracy ◽

Decision Tree Classifier ◽

Social Media Networks ◽

Original Dataset ◽

Knn Classifier ◽

Tree Classifier

On social media, users share their ideas and opinions with their neighbours and friends. Spammers send spam to genuine users to mislead them, which is a serious problem for social media sites. Researchers have developed various methodologies to detect spam messages in social media, typically using a large number of features to construct their models. However, the original dataset generally contains many irrelevant and redundant features, and such a large number of features reduces spam detection accuracy. To improve spam detection accuracy in social media networks, the meaningless attributes must be removed from the high-dimensional social media dataset. To reduce the dimensionality of the dataset, we use a dimensionality reduction approach called principal component analysis (PCA). After the dimensionality is reduced, the samples are classified using the Decision Tree Induction algorithm and the K Nearest Neighbour (KNN) algorithm. In the proposed work, these algorithms are used to determine whether data samples are spam or ham. We use a Twitter dataset to test the proposed approach. Experimental results show that the KNN classifier outperforms the Decision Tree classifier.
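
The classification step described in this abstract can be sketched with a minimal k-nearest-neighbour classifier. The PCA step is omitted here: the two-dimensional feature vectors below are hypothetical and are simply assumed to be already reduced. The function name, the value of k, and the data are illustrative and not taken from the paper.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Sort training points by Euclidean distance to the query.
    neighbours = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]

# Hypothetical, already-reduced feature vectors with spam/ham labels.
train_X = [(0.9, 0.8), (0.8, 0.9), (1.0, 0.7), (0.1, 0.2), (0.2, 0.1), (0.0, 0.3)]
train_y = ["spam", "spam", "spam", "ham", "ham", "ham"]

label_a = knn_predict(train_X, train_y, (0.85, 0.75))
label_b = knn_predict(train_X, train_y, (0.10, 0.10))
```

An odd k with two classes avoids tied votes, which is one common reason to choose k = 3 or k = 5 in binary problems.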

Cyber Bullying Detection for Twitter Using ML Classification Algorithms

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.38701 ◽

2021 ◽

Vol 9 (11) ◽

pp. 24-29

Author(s):

Muskan Patidar

Keyword(s):

Machine Learning ◽

Social Media ◽

Natural Language ◽

Naive Bayes ◽

Learning Algorithms ◽

Naïve Bayes ◽

Cyber Bullying ◽

Machine Learning Algorithms ◽

Support Vector ◽

Classification Algorithms

Abstract: Social networking platforms have given us more opportunities than ever before, and their benefits are undeniable. Despite these benefits, people may be humiliated, insulted, bullied, and harassed by anonymous users, strangers, or peers. Cyberbullying refers to the use of technology to humiliate and slander other people. It takes the form of hate messages sent through social media and email. With the exponential increase in social media users, cyberbullying has emerged as a form of bullying through electronic messages. As a possible solution to this problem, our project aims to detect cyberbullying in tweets using ML classification algorithms such as Naïve Bayes, KNN, Decision Tree, Random Forest, and Support Vector Machine. We will also apply the NLTK (Natural Language Toolkit), which provides unigram, bigram, trigram, and n-gram features, to Naïve Bayes to check its accuracy. Finally, we will compare the results of the proposed and baseline features with other machine learning algorithms. Findings of the comparison indicate the significance of the proposed features in cyberbullying detection. Keywords: Cyber bullying, Machine Learning Algorithms, Twitter, Natural Language Toolkit
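
The unigram, bigram, and trigram features mentioned in this abstract can be illustrated with a short sketch. It uses plain Python rather than NLTK so that it stays self-contained; the helper name `ngrams` and the example sentence are hypothetical and do not come from the paper.

```python
def ngrams(tokens, n):
    """Return all contiguous n-token windows of `tokens` as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Hypothetical tokenized tweet.
tokens = "nobody wants you here".split()

unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
```

Each distinct n-gram can then serve as one feature (for example, a count) for a classifier such as Naïve Bayes. Longer n-grams capture more context but occur less often, so the feature space becomes sparser.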

Analytical Models for Traffic Congestion and Accident Analysis

10.31979/mti.2021.2102 ◽

2021 ◽

Author(s):

Hongrui Liu ◽

Rahul Ramachandra Shetty

Keyword(s):

Traffic Congestion ◽

Regression Tree ◽

Machine Learning Algorithms ◽

Analytical Models ◽

Gradient Boosting ◽

Decision Tree Classifier ◽

The Road ◽

Tree Classifier ◽

Extreme Gradient Boosting ◽

On The Road

In the US, over 38,000 people die in road crashes each year, and 2.35 million are injured or disabled, according to the statistics report from the Association for Safe International Road Travel (ASIRT) in 2020. In addition, traffic congestion that keeps Americans stuck on the road wastes millions of hours and billions of dollars each year. Using statistical techniques and machine learning algorithms, this research developed accurate predictive models for traffic congestion and road accidents to increase understanding of the complex causes of these challenging issues. The research used the US Accidents data, consisting of 49 variables describing 4.2 million accident records from February 2016 to December 2020. Logistic regression, tree-based techniques such as Decision Tree Classifier and Random Forest Classifier (RF), and Extreme Gradient Boosting (XG-boost) were used to process the data and train the models. These models will assist people in making smart real-time transportation decisions to improve mobility and reduce accidents.

QSAR Models for Active Substances Against Pseudomonas aeruginosa Using Disk-diffusion Test Data

10.20944/preprints202102.0147.v1 ◽

2021 ◽

Author(s):

Cosmin Alexandru Bugeac ◽

Robert Ancuceanu ◽

Mihaela Dinu

Keyword(s):

Pseudomonas Aeruginosa ◽

Model Development ◽

Qsar Model ◽

Machine Learning Algorithms ◽

Disk Diffusion ◽

Support Vector ◽

Decision Tree Classifier ◽

K Nearest Neighbors ◽

Disk Diffusion Test ◽

Tree Classifier

Pseudomonas aeruginosa is a Gram-negative bacillus included among the six "ESKAPE" microbial species with an outstanding ability to "escape" currently used antibiotics, and developing new antibiotics against it is of the highest priority. Whereas minimum inhibitory concentration (MIC) values against Pseudomonas aeruginosa have previously been used for QSAR model development, disk diffusion results (inhibition zones) have apparently not been used for this purpose in the literature, so we decided to explore their use. We developed multiple QSAR models using several machine learning algorithms (Support Vector Classifier, K Nearest Neighbors, Random Forest Classifier, Decision Tree Classifier, AdaBoost Classifier, Logistic Regression, and Naive Bayes Classifier). The main descriptors used in building the models belonged to the families of adjacency matrix, constitutional descriptors, first highest eigenvalue of Burden matrix, centered Moreau-Broto autocorrelation, and averaged and centered Moreau-Broto autocorrelation descriptors. A total of 32 models were built, of which 28 were selected and stacked to create a meta-model. In terms of balanced accuracy, the best performance was provided by the KNN, SVM, and AdaBoost algorithms, but the ensemble method had slightly superior results in nested cross-validation.
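
Balanced accuracy, the metric this abstract uses to compare models, is the mean of sensitivity and specificity. A minimal sketch follows. The labels are hypothetical (1 is assumed to mean an active compound and 0 an inactive one), and the function assumes that both classes are present in `y_true`.

```python
def balanced_accuracy(y_true, y_pred, positive=1):
    """Mean of sensitivity and specificity for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn)  # share of positives that were found
    specificity = tn / (tn + fp)  # share of negatives that were found
    return (sensitivity + specificity) / 2

# Hypothetical imbalanced labels: 2 actives, 8 inactives.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = balanced_accuracy(y_true, y_pred)
```

In this toy case plain accuracy is 0.9 while balanced accuracy is 0.75, because half of the rare positive class is missed. This is why balanced accuracy is often preferred on imbalanced datasets.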

Sentiment Analysis on Social Media Big Data With Multiple Tweet Words

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.j9684.0881019 ◽

2019 ◽

Vol 8 (10) ◽

pp. 3429-3434 ◽

Cited By ~ 2

Keyword(s):

Machine Learning ◽

Social Media ◽

Big Data ◽

Sentiment Analysis ◽

Language Processing ◽

Sentiment Classification ◽

Support Vector ◽

Decision Tree Classifier ◽

Machine Learning Classification ◽

Tree Classifier

The main objective of this paper is to analyze reviews of e-commerce products in social media big data. The analysis gives online shoppers helpful results about product quality and gives businesses decision-making insight into the products that customers most like and buy. It covers all features or opinion words available in tweets, such as capitalized words, sequences of repeated letters, emoji, slang words, exclamatory words, intensifiers, modifiers, conjunctions, and negation words. Existing work has considered only two or three features when performing sentiment analysis with Natural Language Processing (NLP) techniques. In the proposed work, familiar machine learning classification models, namely Multinomial Naïve Bayes, Support Vector Machine, Decision Tree Classifier, and Random Forest Classifier, are used for sentiment classification. The sentiment classification serves as a decision support system for both customers and businesses.

A Hybrid Model for Android Malware Detection

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.k2250.1081219 ◽

2019 ◽

Vol 8 (12) ◽

pp. 2656-2662

Keyword(s):

Malware Detection ◽

Machine Learning Algorithms ◽

Detection Accuracy ◽

Dynamic Parameters ◽

Android Malware ◽

Detection Techniques ◽

Advantages And Disadvantages ◽

Android Malware Detection ◽

Tree Classifier ◽

Hybrid Detection

Android malware has risen exponentially over the past few years, posing serious threats such as system damage, financial loss, and mobile botnets. Various detection techniques have been proposed in the literature for Android malware detection. Some techniques analyze static parameters such as permissions or intents, whereas others focus on dynamic parameters such as network traffic or system calls. Static techniques are relatively easy to implement; however, recent stealthy malware evades static detection by means of update attacks. Dynamic detection can detect such stealthy malware, but it increases the computational overhead. Hence, both kinds of techniques have their own advantages and disadvantages. In this paper, we propose an innovative hybrid detection model that uses both static and dynamic features for malware analysis and detection. We first rank the static and dynamic parameters according to their information gain and then apply machine learning algorithms in the testing phase. The results indicate that the hybrid approach is better than both the static and dynamic approaches, and the proposed model achieves 98.9% detection accuracy with the Decision Tree classifier.
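
The ranking of parameters by information gain described in this abstract can be sketched as follows. The feature names, their values, and the labels are hypothetical and chosen only for illustration; the paper's actual feature set is not listed here.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

# Hypothetical apps: two malware samples followed by two benign ones.
labels = ["malware", "malware", "benign", "benign"]
features = {
    "sends_sms_permission": [1, 1, 0, 0],      # static, separates the classes
    "uses_internet_permission": [1, 1, 1, 1],  # static, carries no information
    "high_network_traffic": [1, 0, 0, 0],      # dynamic, partially informative
}

gains = {name: information_gain(values, labels) for name, values in features.items()}
ranking = sorted(gains, key=gains.get, reverse=True)
```

Static and dynamic parameters are ranked together in this sketch, so a hybrid model would simply keep the top-ranked features from the combined list before training a classifier.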

Ensemble-Based Machine Learning for Predicting Sudden Human Fall Using Health Data

Mathematical Problems in Engineering ◽

10.1155/2021/8608630 ◽

2021 ◽

Vol 2021 ◽

pp. 1-12

Author(s):

Utkarsh Saxena ◽

Soumen Moulik ◽

Soumya Ranjan Nayak ◽

Thomas Hanne ◽

Diptendu Sinha Roy

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Majority Voting ◽

Support Vector ◽

Human Beings ◽

Medical Terminology ◽

Decision Tree Classifier ◽

Tree Classifier ◽

Health Parameters

We attempt to predict the accidental fall of human beings due to sudden abnormal changes in their health parameters such as blood pressure, heart rate, and sugar level. In medical terminology, this problem is known as Syncope. The primary motivation is to prevent such falls by predicting abnormal changes in these health parameters that might trigger a sudden fall. We apply various machine learning algorithms such as logistic regression, a decision tree classifier, a random forest classifier, K-Nearest Neighbours (KNN), a support vector machine, and a naive Bayes classifier on a relevant dataset and verify our results with the cross-validation method. We observe that the KNN algorithm provides the best accuracy in predicting such a fall. However, the accuracy results of some other algorithms are also very close. Thus, we move one step further and propose an ensemble model, Majority Voting, which aggregates the prediction results of multiple machine learning algorithms and finally indicates the probability of a fall that corresponds to a particular human being. The proposed ensemble algorithm yields 87.42% accuracy, which is greater than the accuracy provided by the KNN algorithm.
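
The Majority Voting ensemble described in this abstract can be sketched in a few lines. The per-model predictions below are hypothetical. Reading the "probability of a fall" as the share of models that vote for a fall is an assumption, since the abstract does not state how that probability is computed.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine aligned per-model prediction lists into one label per sample."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]

def vote_share(predictions, positive="fall"):
    """Fraction of models that predict `positive` for each sample (assumed probability)."""
    return [votes.count(positive) / len(votes) for votes in zip(*predictions)]

# Hypothetical outputs of three base classifiers on four people.
knn_pred = ["fall", "no_fall", "fall", "no_fall"]
tree_pred = ["fall", "fall", "no_fall", "no_fall"]
svm_pred = ["no_fall", "fall", "fall", "no_fall"]

all_preds = [knn_pred, tree_pred, svm_pred]
ensemble = majority_vote(all_preds)
shares = vote_share(all_preds)
```

Each base model is wrong on a different sample in this toy case, so the vote can recover a label that no single model predicts consistently. That is the intuition behind the ensemble improving on the best individual classifier.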

A Comparative Study to Evaluate the Performance of Classification Algorithms in Mammogram Analysis

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i3.6.14960 ◽

2018 ◽

Vol 7 (3.6) ◽

pp. 154

Author(s):

S K. Sajan ◽

M Germanus Alex

Keyword(s):

Breast Cancer ◽

Neural Network ◽

Decision Tree ◽

Automated System ◽

Support Vector ◽

Classification Algorithms ◽

Neural Network Classifier ◽

Decision Tree Classifier ◽

Tree Classifier ◽

Mammogram Image

Breast cancer is a major threat to humans irrespective of geographical boundaries. Awareness of breast cancer has increased during the last decade, and many preventive measures are in practice to detect breast cancer before symptoms are felt. Mammography is a screening methodology currently in practice. In this paper, the mammogram image is analyzed using an automated system designed to classify the mammogram image as normal or malignant. The process involves image enhancement and image segmentation at the preprocessing level. Histogram equalization is used to transform low-contrast regions of the mammogram into regions with higher contrast, and the Fuzzy C Means (FCM) algorithm is used to segment the mammogram image into regions suitable for further analysis. After enhancement and segmentation, classification is performed using three algorithms: a decision tree classifier, a Neural Network classifier, and a Support Vector Machine (SVM). The performance of the classification algorithms is evaluated on speed, flexibility, robustness, scalability, interpretability, and time complexity, as well as on accuracy, sensitivity, and specificity. The classification results are compared with those of other classification algorithms. The neural network classifier produces better results than the other classifiers. The average diagnostic accuracy of the Neural Network classifier is around 91%. The decision tree approach is also found to be more flexible and easier to use than the other approaches.
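
The histogram equalization step mentioned in this abstract can be sketched on a flat list of grey levels. This is the textbook global formulation based on the cumulative distribution function; whether the authors used this exact variant is not stated, and the pixel values below are hypothetical.

```python
def equalize_histogram(pixels, levels=256):
    """Histogram-equalize a flat list of integer grey levels in [0, levels)."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Cumulative distribution function of the grey levels.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    total = len(pixels)
    if total == cdf_min:
        return list(pixels)  # only one grey level: nothing to stretch
    scale = (levels - 1) / (total - cdf_min)
    return [round((cdf[p] - cdf_min) * scale) for p in pixels]

# Hypothetical low-contrast patch: all values lie between 100 and 103.
patch = [100, 100, 101, 102, 102, 103]
stretched = equalize_histogram(patch)
```

The four neighbouring grey levels in the input are spread over the full 0 to 255 range in the output, which is the contrast increase that the preprocessing step relies on.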

Prediction of warning level in aircraft accidents using data mining techniques

The Aeronautical Journal ◽

10.1017/s0001924000009623 ◽

2014 ◽

Vol 118 (1206) ◽

pp. 935-952 ◽

Cited By ~ 6

Author(s):

A. B. Arockia Christopher ◽

S. Appavu alias Balamurugan

Keyword(s):

Data Mining ◽

Principal Components ◽

Information Gain ◽

Decision Tree Classifier ◽

Aircraft Accidents ◽

Analysis Process ◽

Tree Classifier ◽

Using Data ◽

Amount Of Knowledge ◽

Better Than

Abstract: Data mining is a data analysis process designed for large amounts of data. It proposes a methodology for evaluating risk and safety and describes the main issues of aircraft accidents. Aviation companies hold a huge amount of knowledge and collected data. This paper focuses on different feature selection techniques applied to the datasets of airline databases to understand and clean the dataset. The CFS subset evaluator, consistency subset evaluator, gain ratio feature evaluator, information gain attribute evaluator, OneR attribute evaluator, principal components attribute transformer, ReliefF attribute evaluator, and symmetrical uncertainty attribute evaluator are used in this study to reduce the number of initial attributes. Classification algorithms such as DT, KNN, SVM, NN, and NB are used to predict the warning level of the component as the class attribute. We have explored the use of different classification techniques on aviation components data. For this purpose, Weka software tools are used. This study also shows that the principal components attribute transformer with the decision tree classifier performs better than other attributes and techniques on airline data. Accuracy is also greatly improved. This work may be useful for an aviation company to make better predictions. Some safety recommendations are also addressed to airline companies.

Prediction of Students’ Performance based on Academic, Behaviour, Extra and Co-Curricular Activities

Webology ◽

10.14704/web/v18si01/web18058 ◽

2021 ◽

Vol 18 (Special Issue 01) ◽

pp. 262-279

Author(s):

T. Jenitha ◽

S. Santhi ◽

J. Monisha Privthy Jeba

Keyword(s):

Extracurricular Activities ◽

Family Background ◽

Machine Learning Algorithms ◽

Support Vector ◽

Academic Institutions ◽

Physical And Mental Health ◽

Decision Tree Classifier ◽

Academic Scholarship ◽

Tree Classifier ◽

Training Programmes

Academic institutions hold huge volumes of student data, such as academic scores, scores in co-curricular and extracurricular activities, family annual income, family background, and other supporting documents, so manually predicting an individual student's performance in all aspects is a difficult task. The proposed work uses data mining techniques to identify students who are eligible for scholarships and other benefits. Students are classified into different categories based on academic, behavior, extracurricular, and co-curricular activities. Machine learning algorithms such as Naive Bayes, Decision Tree Classifier, and Support Vector Machine are used to predict student performance. With the help of the proposed model, parents and instructors can monitor a student's performance and provide essential technical and moral support. The model also helps in providing academic scholarships and training to students, to support them financially and enrich their knowledge. It suggests that academic institutions organize induction or training programmes at the beginning of the semester. Technical training, motivational talks, yoga, and similar activities are organized by institutions with students' physical and mental health in mind. E-learning platforms also generate huge volumes of data and a plethora of information. In this work, various learning models are constructed and their accuracies are compared to determine which algorithm performs best.
