A machine learning approach for Arabic text classification using N-gram frequency statistics

2009 ◽ Vol 3 (1) ◽ pp. 72-77 ◽ Author(s): Laila Khreisat

2016 ◽ Vol 57 ◽ pp. 117-126 ◽ Author(s): Abinash Tripathy, Ankit Agrawal, Santanu Kumar Rath

2017 ◽ Vol 69 ◽ pp. 40-58 ◽ Author(s): Thiago Salles, Leonardo Rocha, Fernando Mourão, Marcos Gonçalves, Felipe Viegas, ...

2021 ◽ Author(s): Dana Dannélls, Shafqat Virk

Training machine learning models with high accuracy requires careful feature engineering, which involves finding the best feature combinations and extracting their values from the data. The task becomes extremely laborious for specific problems such as post-Optical Character Recognition (OCR) error detection because of the diversity of errors in the data. In this paper we present a machine learning approach that exploits character n-gram statistics as the only feature for the OCR error detection task. Our method achieves a significant improvement over the baseline, reaching state-of-the-art results of 91% and 89% F1 score on English and Swedish datasets, respectively. We report various experiments conducted to select the appropriate machine learning algorithm and to compare our approach with previously reported traditional approaches.
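The abstract does not include an implementation, but the core idea, using character n-gram statistics as the sole feature for detecting OCR errors, can be illustrated with a short scikit-learn sketch. The training tokens, labels, n-gram range, and the logistic regression classifier below are illustrative assumptions, not the authors' setup; the paper compares several learning algorithms.

```python
# Minimal sketch (assumed setup, not the authors' code): token-level OCR error
# detection using character n-gram counts as the only feature.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data: OCR tokens labelled 1 (erroneous) or 0 (correct).
tokens = ["tlie", "the", "recognitlon", "recognition", "qnick", "quick"]
labels = [1, 0, 1, 0, 1, 0]

# Character n-grams (here 1-3 characters) are the only feature extracted.
features = CountVectorizer(analyzer="char", ngram_range=(1, 3))
clf = make_pipeline(features, LogisticRegression(max_iter=1000))
clf.fit(tokens, labels)

# With a realistic amount of training data, misrecognized tokens such as
# "machlne" would be flagged as errors while "machine" would not.
print(clf.predict(["machlne", "machine"]))
```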


Author(s): Syed Md. Minhaz Hossain, Iqbal H. Sarker

Recently, spam emails have become a significant problem with the expanding usage of the Internet, so filtering emails has become, to some extent, a necessity. A spam filter is a system that detects undesired and malicious emails and blocks them from reaching users' inboxes. Spam filters check emails for anything "suspicious" in the text, email address, header, attachments, and language. In our proposed approach, we use different features such as word2vec, word n-grams, character n-grams, and a combination of variable-length n-grams for comparative analysis. Different machine learning models, such as support vector machine (SVM), decision tree (DT), logistic regression (LR), and multinomial naïve Bayes (MNB), are trained on the extracted features. We use different evaluation metrics, such as precision, recall, F1-score, and accuracy, to evaluate the experimental results. Among them, SVM achieves 97.6% accuracy, 98.8% precision, and a 94.9% F1-score using a combination of n-gram features.
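As a rough illustration of the strongest reported configuration, combined n-gram features with an SVM, the sketch below unions word and character n-gram representations and trains a linear SVM with scikit-learn. The example emails, the TF-IDF weighting, and the specific n-gram ranges are assumptions for illustration only and are not taken from the paper.

```python
# Illustrative sketch (assumed setup): spam detection with a combination of
# word and character n-gram features feeding a linear SVM.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical labelled emails: 1 = spam, 0 = ham.
emails = [
    "WIN a FREE prize now, click here!!!",
    "Meeting moved to 3pm, see agenda attached.",
    "Cheap meds, limited offer, buy today",
    "Can you review the quarterly report draft?",
]
labels = [1, 0, 1, 0]

# Combine word n-grams (1-2) and character n-grams (2-4) into one feature space.
combined_ngrams = FeatureUnion([
    ("word", TfidfVectorizer(analyzer="word", ngram_range=(1, 2))),
    ("char", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))),
])

model = make_pipeline(combined_ngrams, LinearSVC())
model.fit(emails, labels)

print(model.predict(["Free offer, click now", "Agenda for tomorrow's meeting"]))
```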

