An Experimental Study for the Effect of Stop Words Elimination for Arabic Text Classification Algorithms

This paper reports an experimental study of three techniques for Arabic text classification: Support Vector Machine (SVM) with Sequential Minimal Optimization (SMO), Naïve Bayesian (NB), and J48. It assesses the accuracy of each classifier and determines which is most accurate for Arabic text classification with stop-word elimination. Accuracy is measured using the percentage split (holdout) method and K-fold cross-validation, along with the time needed to classify Arabic text. The results show that the SMO classifier achieves the highest accuracy and the lowest error rate, and that the time needed to build the SMO model is much lower than for the other classification techniques.
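As a rough illustration of the evaluation setup this abstract describes, the sketch below shows stop-word elimination, a percentage (holdout) split, and the accuracy and error-rate calculation. It is not the authors' code: the stop-word list, the sample sentence, the labels, and all function names are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Set, Tuple

# Hypothetical, deliberately tiny stop-word list (a few common Arabic prepositions).
STOP_WORDS: Set[str] = {"في", "من", "على", "إلى", "عن"}


def remove_stop_words(tokens: Sequence[str], stop_words: Set[str] = STOP_WORDS) -> List[str]:
    """Drop every token that appears in the stop-word set."""
    return [tok for tok in tokens if tok not in stop_words]


def percentage_split(items: Sequence, train_fraction: float) -> Tuple[list, list]:
    """Holdout: the first train_fraction of the items trains, the rest tests."""
    cut = int(len(items) * train_fraction)
    return list(items[:cut]), list(items[cut:])


def accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of predictions that match the true labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Made-up sentence: "The student went to school in the morning."
tokens = "ذهب الطالب إلى المدرسة في الصباح".split()
filtered = remove_stop_words(tokens)  # the two prepositions are removed

# Made-up document ids, split 75% / 25%.
train, test = percentage_split(list(range(8)), train_fraction=0.75)

# Made-up predictions versus true labels for four test documents.
acc = accuracy(["sport", "news", "sport", "news"], ["sport", "news", "news", "news"])
error_rate = 1.0 - acc
```

In a real experiment the documents would be shuffled before splitting, and the same accuracy calculation would be repeated over each fold of a K-fold cross-validation.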

Hadith Arabic Text Classification Using Convolutional Neural Network and Support Vector Machine

Lecture Notes in Electrical Engineering - Computational Science and Technology ◽

10.1007/978-981-33-4069-5_30 ◽

2021 ◽

pp. 371-382

Author(s):

Irwan Mazlin ◽

Izani Mohamed Rawi ◽

Zaki Zakaria

Keyword(s):

Neural Network ◽

Support Vector Machine ◽

Convolutional Neural Network ◽

Text Classification ◽

Support Vector ◽

Arabic Text ◽

Arabic Text Classification

The Effect of Stemming on Arabic Text Classification

International Journal of Information Retrieval Research ◽

10.4018/ijirr.2011070104 ◽

2011 ◽

Vol 1 (3) ◽

pp. 54-70 ◽

Cited By ~ 11

Author(s):

Abdullah Wahbeh ◽

Mohammed Al-Kabi ◽

Qasem Al-Radaideh ◽

Emad Al-Shawakfa ◽

Izzat Alsmadi

Keyword(s):

Text Classification ◽

Digital Libraries ◽

Arabic Language ◽

Support Vector ◽

Svm Classifier ◽

Arabic Text ◽

Text Documents ◽

Information Retrieval Systems ◽

Arabic Text Classification ◽

The Web

The information world is rich in documents in different formats or applications, such as databases, digital libraries, and the Web. Text classification aids the search functionality offered by search engines and information retrieval systems in dealing with the large number of documents on the Web. Many research papers in the field of text classification have addressed English, Dutch, Chinese, and other languages, whereas fewer have addressed Arabic. This paper addresses the automatic classification of Arabic text documents. It applies text classification to Arabic text documents using stemming as part of the preprocessing steps. The results show that, without stemming, the support vector machine (SVM) classifier achieved the highest classification accuracy under the two test modes, with 87.79% and 88.54%. Stemming, on the other hand, negatively affected accuracy: the SVM accuracy under the two test modes dropped to 84.49% and 86.35%.
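The size of the reported drop can be worked out directly from the four figures in the abstract. The short calculation below assumes that the figures are listed in the same test-mode order in both sentences, which the abstract does not state explicitly.

```python
# SVM accuracies (%) quoted in the abstract, one value per test mode.
# Assumption: both lists follow the same test-mode order.
without_stemming = [87.79, 88.54]
with_stemming = [84.49, 86.35]

# Drop in percentage points for each test mode, rounded to two decimals.
drops = [round(a - b, 2) for a, b in zip(without_stemming, with_stemming)]
print(drops)  # [3.3, 2.19]
```

Under that assumption, stemming cost the SVM classifier about 3.30 percentage points in one test mode and about 2.19 in the other.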

An experimental study for Arabic text classification techniques

10.1117/12.946039 ◽

2012 ◽

Author(s):

Bassam Al-Shargabi ◽

Fekry Olayah

Keyword(s):

Experimental Study ◽

Text Classification ◽

Arabic Text ◽

Classification Techniques ◽

Arabic Text Classification

The Effect of Stemming on Arabic Text Classification

Information Retrieval Methods for Multidisciplinary Applications ◽

10.4018/978-1-4666-3898-3.ch013 ◽

2013 ◽

pp. 207-225 ◽

Cited By ~ 3

Author(s):

Abdullah Wahbeh ◽

Mohammed Al-Kabi ◽

Qasem Al-Radaideh ◽

Emad Al-Shawakfa ◽

Izzat Alsmadi

Keyword(s):

Text Classification ◽

Digital Libraries ◽

Arabic Language ◽

Support Vector ◽

Svm Classifier ◽

Arabic Text ◽

Text Documents ◽

Information Retrieval Systems ◽

Arabic Text Classification ◽

The Web

The information world is rich in documents in different formats or applications, such as databases, digital libraries, and the Web. Text classification aids the search functionality offered by search engines and information retrieval systems in dealing with the large number of documents on the Web. Many research papers in the field of text classification have addressed English, Dutch, Chinese, and other languages, whereas fewer have addressed Arabic. This paper addresses the automatic classification of Arabic text documents. It applies text classification to Arabic text documents using stemming as part of the preprocessing steps. The results show that, without stemming, the support vector machine (SVM) classifier achieved the highest classification accuracy under the two test modes, with 87.79% and 88.54%. Stemming, on the other hand, negatively affected accuracy: the SVM accuracy under the two test modes dropped to 84.49% and 86.35%.

Improving Arabic Text Classification Using P-Stemmer

Recent Advances in Computer Science and Communications ◽

10.2174/2666255813999200904114023 ◽

2020 ◽

Vol 13 ◽

Author(s):

Tarek Kanan ◽

Bilal Hawashin ◽

Shadi Alzubi ◽

Eyad Almaita ◽

Ahmad Alkhatib ◽

...

Keyword(s):

Language Processing ◽

Text Classification ◽

Text Categorization ◽

English Language ◽

Arabic Language ◽

Online News ◽

Support Vector ◽

Arabic Text ◽

Fast Learning ◽

Arabic Text Classification

Introduction: Stemming is an important preprocessing step in text classification and can contribute to increasing classification accuracy. Although many stemmers have been proposed for English, few have been proposed for Arabic text. Arabic has gained increasing attention in recent decades, and there is a vital need to further improve Arabic text classification. Method: This work combined the recently proposed P-Stemmer with various classifiers to find the optimal classifier for the P-Stemmer in terms of Arabic text classification. As part of this work, a synthesized dataset was collected. Result: The experiments show that the use of the P-Stemmer has a positive effect on classification. The degree of improvement was classifier-dependent, which is reasonable, as classifiers vary in their methodologies. Moreover, the experiments show that the best classifier with the P-Stemmer was NB. This is an interesting result, as this classifier is well known for its fast learning and classification time. Discussion: First, continued improvement of the P-Stemmer through further optimization steps is necessary to further improve Arabic text categorization. This can be done by combining more classifiers with the stemmer, by optimizing the other natural language processing steps, and by improving the set of stemming rules. Second, the lack of sufficient Arabic datasets, especially large ones, is still an issue. Conclusion: In this work, an improved P-Stemmer was proposed by combining its use with various classifiers. To evaluate its performance, and because of the lack of Arabic datasets, a novel Arabic dataset was synthesized from various online news pages. Next, the P-Stemmer was combined with Naïve Bayes, Random Forest, Support Vector Machines, K-Nearest Neighbor, and K-Star.
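Since the abstract singles out Naïve Bayes for its fast learning and classification, a minimal multinomial Naïve Bayes sketch may help show why training amounts to little more than counting. This is not the authors' implementation and does not include the P-Stemmer: the function names are hypothetical, the training documents are made up, and English placeholder tokens stand in for already tokenized and stemmed Arabic words.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple


def train_nb(docs: Sequence[Tuple[List[str], str]]):
    """Collect class counts, per-class token counts, and the vocabulary."""
    class_counts: Counter = Counter()
    token_counts: Dict[str, Counter] = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        class_counts[label] += 1
        token_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, token_counts, vocab


def predict_nb(model, tokens: List[str]) -> str:
    """Return the class with the highest log-posterior under add-one smoothing."""
    class_counts, token_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        denom = sum(token_counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # ignore tokens never seen in training
                score += math.log((token_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy, made-up training data (placeholder tokens, two classes).
training = [
    (["goal", "match", "team"], "sport"),
    (["team", "win", "match"], "sport"),
    (["market", "price", "bank"], "economy"),
    (["bank", "loan", "price"], "economy"),
]
model = train_nb(training)
label = predict_nb(model, ["match", "team", "price"])
```

Training is a single pass over the documents and prediction is one sum of logarithms per class, which is consistent with the speed the abstract attributes to this classifier.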

Combination of Support Vector Machine and K-Fold cross-validation for prediction of long-term degradation of the compressive strength of marine concrete

International Journal of Computational Physics Series ◽

10.29167/a1i1p120-130 ◽

2018 ◽

Vol 1 (1) ◽

pp. 120-130 ◽

Cited By ~ 1

Author(s):

Chunxiang Qian ◽

Wence Kang ◽

Hao Ling ◽

Hua Dong ◽

Chengyao Liang ◽

...

Keyword(s):

Support Vector Machine ◽

Environmental Factors ◽

Cross Validation ◽

Concrete Strength ◽

Simulation Method ◽

Support Vector ◽

Svm Model ◽

Artificial Neural Network Ann ◽

Influence Degree ◽

Fold Cross Validation

A Support Vector Machine (SVM) model optimized by K-fold cross-validation was built to predict and evaluate the degradation of concrete strength in a complicated marine environment. Several other models, such as an Artificial Neural Network (ANN) and a Decision Tree (DT), were also built and compared with the SVM to determine which could make the most accurate predictions. Both material factors and environmental factors that influence the results were considered. The material factors mainly involved the original concrete strength and the amount of cement replaced by fly ash and slag. The environmental factors consisted of the concentrations of Mg²⁺, SO₄²⁻, and Cl⁻, the temperature, and the exposure time. The prediction results indicated that the optimized SVM model appeared to perform better than the other models in predicting concrete strength. Based on the SVM model, a simulation method of variables limitation was used to determine the sensitivity of the various factors and their degree of influence on the degradation of concrete strength.
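The idea of optimizing a model by K-fold cross-validation can be sketched independently of the model itself. The example below is only an illustration under stated assumptions: a one-dimensional ridge regressor stands in for the SVM (which is not implemented here), the data are made up, and the function names and candidate values are hypothetical.

```python
from typing import List, Sequence, Tuple


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n) into k interleaved folds; each fold is the test set once."""
    folds = [list(range(i, n, k)) for i in range(k)]
    splits = []
    for i in range(k):
        train = [idx for j in range(k) if j != i for idx in folds[j]]
        splits.append((train, folds[i]))
    return splits


def fit_ridge_1d(xs: Sequence[float], ys: Sequence[float], lam: float) -> float:
    """Slope of a no-intercept ridge regression: sum(x*y) / (sum(x*x) + lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def cv_mse(xs: Sequence[float], ys: Sequence[float], lam: float, k: int) -> float:
    """Mean squared error averaged over the k held-out folds."""
    errors = []
    for train, test in k_fold_indices(len(xs), k):
        w = fit_ridge_1d([xs[i] for i in train], [ys[i] for i in train], lam)
        errors.append(sum((ys[i] - w * xs[i]) ** 2 for i in test) / len(test))
    return sum(errors) / len(errors)


# Made-up data lying exactly on y = 2x, so the unregularised fit is perfect.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0 * x for x in xs]

# Hypothetical candidate values for the regularisation strength.
candidates = [0.0, 1.0, 10.0]
best_lam = min(candidates, key=lambda lam: cv_mse(xs, ys, lam, k=3))
```

The same loop applies to any model with tunable settings: each candidate is scored only on data it was not trained on, and the candidate with the lowest average held-out error is kept.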
