A Comparison of Text Classification Techniques Applied to Indonesian Text Dataset
In organization, statement contained opinion and complaint to a service or program by it organization. can be proceed using machine learning and the result can be used by organization to improve and enhance their quality. This research attempted to classify the reports from social media based on complaint and non-complaint using machine learning algorithm named Logistic regression (LR) and eXtreme Gradient Boosting (XGBoost). Logistic Regression model using CountVectorizer feature extraction and TfidfVectorizer. Moreover, the XGBoost algorithm uses multiple parameters so that it can be improved by tuning the parameters, i.e. eta or learning rate, gamma, max_depth, min_child_weight, subsample, colsample_bytree and alpha. As the result, the best value for XGBoost with parameter are 'reg_alpha': 0.01, 'colsample_bytree': 0.9, 'learning_rate': 0.5, 'min_child_weight': 1, 'subsample': 0.8, 'max_depth': 3, 'gamma': 0.0, in wich the computational time is 13870.012468 and the best accuracy that achieved is 0.927943760984. Furthermore, the performance evaluation results for Logistic Regression using TfidfVectorizer and CountVectorizer feature extraction are 0.9181 and 0.9356.