Evaluation of Synthetic Datasets Generation for Intent Classification Tasks in Portuguese
A chatbot is an artificial-intelligence-based system designed to converse with users, commonly deployed as a virtual assistant that helps people or answers questions. Intent classification is an essential chatbot task that aims to identify what the user wants in a given dialogue. However, in many domains little data is available to train such systems properly. In this work, we evaluate two methods for generating synthetic data for chatbots, one based on template questions and the other on neural text generation. We build four datasets that are used to train chatbot components for the intent classification task. We intend to simulate the migration of a search-based portal to an interactive dialogue-based information service, using artificial datasets for initial model training. Our results show that, in our application domain, template-based datasets are slightly superior to neural-generated ones. Nevertheless, neural-generated datasets yield good results and are a viable option when one has limited access to domain experts to hand-code text templates.
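To make the template-based approach concrete, the following is a minimal sketch of how hand-written question templates could be expanded into labeled training examples for an intent classifier. The intent names, the Portuguese templates, the `service` slot, and the function `generate_examples` are all hypothetical and chosen only for illustration; they are not taken from the datasets evaluated in this work.

```python
import random

# Hypothetical hand-coded templates, grouped by intent label (illustrative only).
TEMPLATES = {
    "how_to_request": [
        "Como posso solicitar {service}?",
        "Onde posso solicitar {service}?",
    ],
    "cost": [
        "Quanto custa {service}?",
        "Qual é o preço para emitir {service}?",
    ],
}

# Hypothetical slot values that a domain expert might supply.
SLOT_VALUES = {
    "service": [
        "um passaporte",
        "uma carteira de identidade",
        "uma certidão de nascimento",
    ],
}


def generate_examples(templates, slot_values, seed=0):
    """Fill every template with every slot value and return (text, intent) pairs."""
    rng = random.Random(seed)
    examples = []
    for intent, patterns in templates.items():
        for pattern in patterns:
            for service in slot_values["service"]:
                examples.append((pattern.format(service=service), intent))
    # Shuffle so that examples of the same intent are not contiguous.
    rng.shuffle(examples)
    return examples


dataset = generate_examples(TEMPLATES, SLOT_VALUES)
```

Each resulting pair couples a synthetic user question with its intent label, so the list can be fed directly to a text classifier. A neural text generator would replace the fixed templates with sampled sentences, which reduces the need for expert-written templates at the cost of less control over the output.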