An Experimental Study for the Effect of Stop Words Elimination for Arabic Text Classification Algorithms

Author(s):  
Bassam Al-Shargabi ◽  
Fekry Olayah ◽  
Waseem AL Romimah

In this paper, an experimental study was conducted on three techniques for Arabic text classification. These techniques are Support Vector Machine (SVM) with Sequential Minimal Optimization (SMO), Naïve Bayesian (NB), and J48. The paper assesses the accuracy for each classifier and determines which classifier is more accurate for Arabic text classification based on stop words elimination. The accuracy for each classifier is measured by Percentage split method (holdout), and K-fold cross validation methods, along with the time needed to classify Arabic text. The results show that the SMO classifier achieves the highest accuracy and the lowest error rate, and shows that the time needed to build the SMO model is much lower compared to other classification techniques.

Author(s):  
Bassam Al-Shargabi ◽  
Fekry Olayah ◽  
Waseem AL Romimah

In this paper, an experimental study was conducted on three techniques for Arabic text classification. These techniques are Support Vector Machine (SVM) with Sequential Minimal Optimization (SMO), Naïve Bayesian (NB), and J48. The paper assesses the accuracy for each classifier and determines which classifier is more accurate for Arabic text classification based on stop words elimination. The accuracy for each classifier is measured by Percentage split method (holdout), and K-fold cross validation methods, along with the time needed to classify Arabic text. The results show that the SMO classifier achieves the highest accuracy and the lowest error rate, and shows that the time needed to build the SMO model is much lower compared to other classification techniques.


2011 ◽  
Vol 1 (3) ◽  
pp. 54-70 ◽  
Author(s):  
Abdullah Wahbeh ◽  
Mohammed Al-Kabi ◽  
Qasem Al-Radaideh ◽  
Emad Al-Shawakfa ◽  
Izzat Alsmadi

The information world is rich of documents in different formats or applications, such as databases, digital libraries, and the Web. Text classification is used for aiding search functionality offered by search engines and information retrieval systems to deal with the large number of documents on the web. Many research papers, conducted within the field of text classification, were applied to English, Dutch, Chinese, and other languages, whereas fewer were applied to Arabic language. This paper addresses the issue of automatic classification or classification of Arabic text documents. It applies text classification to Arabic language text documents using stemming as part of the preprocessing steps. Results have showed that applying text classification without using stemming; the support vector machine (SVM) classifier has achieved the highest classification accuracy using the two test modes with 87.79% and 88.54%. On the other hand, stemming has negatively affected the accuracy, where the SVM accuracy using the two test modes dropped down to 84.49% and 86.35%.


Author(s):  
Abdullah Wahbeh ◽  
Mohammed Al-Kabi ◽  
Qasem Al-Radaideh ◽  
Emad Al-Shawakfa ◽  
Izzat Alsmadi

The information world is rich of documents in different formats or applications, such as databases, digital libraries, and the Web. Text classification is used for aiding search functionality offered by search engines and information retrieval systems to deal with the large number of documents on the web. Many research papers, conducted within the field of text classification, were applied to English, Dutch, Chinese, and other languages, whereas fewer were applied to Arabic language. This paper addresses the issue of automatic classification or classification of Arabic text documents. It applies text classification to Arabic language text documents using stemming as part of the preprocessing steps. Results have showed that applying text classification without using stemming; the support vector machine (SVM) classifier has achieved the highest classification accuracy using the two test modes with 87.79% and 88.54%. On the other hand, stemming has negatively affected the accuracy, where the SVM accuracy using the two test modes dropped down to 84.49% and 86.35%.


Author(s):  
Tarek Kanan ◽  
Bilal Hawashin ◽  
Shadi Alzubi ◽  
Eyad Almaita ◽  
Ahmad Alkhatib ◽  
...  

Introduction: Stemming is an important preprocessing step in text classification, and could contribute in increasing text classification accuracy. Although many works proposed stemmers for English language, few stemmers were proposed for Arabic text. Arabic language has gained increasing attention in the previous decades and the need is vital to further improve Arabic text classification. Method: This work combined the use of the recently proposed P-Stemmer with various classifiers to find the optimal classifier for the P-stemmer in term of Arabic text classification. As part of this work, a synthesized dataset was collected. Result: The previous experiments show that the use of P-Stemmer has a positive effect on classification. The degree of improvement was classifier-dependent, which is reasonable as classifiers vary in their methodologies. Moreover, the experiments show that the best classifier with the P-Stemmer was NB. This is an interesting result as this classifier is wellknown for its fast learning and classification time. Discussion: First, the continuous improvement of the P-Stemmer by more optimization steps is necessary to further improve the Arabic text categorization. This can be made by combining more classifiers with the stemmer, by optimizing the other natural language processing steps, and by improving the set of stemming rules. Second, the lack of sufficient Arabic datasets, especially large ones, is still an issue. Conclusion: In this work, an improved P-Stemmer was proposed by combining its use with various classifiers. In order to evaluate its performance, and due to the lack of Arabic datasets, a novel Arabic dataset was synthesized from various online news pages. Next, the P-Stemmer was combined with Naïve Bayes, Random Forest, Support Vector Machines, KNearest Neighbor, and K-Star.


2018 ◽  
Vol 1 (1) ◽  
pp. 120-130 ◽  
Author(s):  
Chunxiang Qian ◽  
Wence Kang ◽  
Hao Ling ◽  
Hua Dong ◽  
Chengyao Liang ◽  
...  

Support Vector Machine (SVM) model optimized by K-Fold cross-validation was built to predict and evaluate the degradation of concrete strength in a complicated marine environment. Meanwhile, several mathematical models, such as Artificial Neural Network (ANN) and Decision Tree (DT), were also built and compared with SVM to determine which one could make the most accurate predictions. The material factors and environmental factors that influence the results were considered. The materials factors mainly involved the original concrete strength, the amount of cement replaced by fly ash and slag. The environmental factors consisted of the concentration of Mg2+, SO42-, Cl-, temperature and exposing time. It was concluded from the prediction results that the optimized SVM model appeared to perform better than other models in predicting the concrete strength. Based on SVM model, a simulation method of variables limitation was used to determine the sensitivity of various factors and the influence degree of these factors on the degradation of concrete strength.


Sign in / Sign up

Export Citation Format

Share Document