Abstract
Stemming has long been used in data pre-processing in information retrieval, which aims to make affix words into root words. However, there are not many stemming methods for non-formal Indonesian text processing. The existing stemming method has high accuracy for formal Indonesian, but low for non-formal Indonesian. Thus, the stemming method which has high accuracy for non-formal Indonesian classifier model is still an open-ended challenge. This study introduces a new stemming method to solve problems in the non-formal Indonesian text data pre-processing. Furthermore, this study aims to provide comprehensive research on improving the accuracy of text classifier models by strengthening on stemming method. Using the Support Vector Machine algorithm, a text classifier model is developed, and its accuracy is checked. The experimental evaluation was done by testing 550 datasets in Indonesian using two different stemming methods. The results show that using the proposed stemming method, the text classifier model has higher accuracy than the existing methods with a score of 0.85 and 0.73, respectively. In the future, the proposed stemming method can be used to develop the Indonesian text classifier model which can be used for various purposes including text clustering, summarization, detecting hate speech, and other text processing applications.