Intent Identification in Unattended Customer Queries Using an Unsupervised Approach
Customer satisfaction is crucial for companies worldwide. An integrated customer-service strategy typically includes omnichannel communication systems, in which chatbots are widely used. Such chatbots rely on supervised learning, yet the training data they require are originally unlabelled. Labelling these data manually is infeasible given their considerable volume. Moreover, customer behaviour is often hidden in the data, even from experts. This work proposes a methodology to find unknown entities and intents automatically using unsupervised learning. It is based on natural language processing (NLP) for text data preparation and on machine learning (ML) for clustering model identification. Several combinations of preprocessing, vectorisation, dimensionality reduction and clustering techniques were investigated. The case study refers to a Brazilian electric energy company, with a data set of failed customer queries, that is, queries that the company did not meet for any reason. They correspond to about 30% (4,044 queries) of the original data set. The best intent model identified employed stemming for preprocessing, word frequency analysis for vectorisation, latent Dirichlet allocation (LDA) for dimensionality reduction, and mini-batch k-means for clustering. This system was able to allocate 62% of the failed queries to one of the seven intents found. The newly labelled data can be used, for instance, to train NLP-based chatbots, contributing to a greater generalisation capacity and, ultimately, to increased customer satisfaction.
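The best-performing combination named above (stemming, word frequency vectorisation, LDA, mini-batch k-means) can be illustrated with a minimal sketch. It assumes scikit-learn is available and is not the authors' implementation: the English toy queries, the `crude_stem` suffix stripper (a stand-in for a real stemmer), the helper `cluster_intents`, and all parameter values (three topics, three clusters, the random seed) are hypothetical choices made only for illustration.

```python
import re

from sklearn.cluster import MiniBatchKMeans
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy queries; the study's real data set is not reproduced here.
QUERIES = [
    "power outage in my street",
    "outage reported, no power since morning",
    "my bill is too high this month",
    "billing error on the last bill",
    "request new connection for my house",
    "new connection requested for a shop",
]


def crude_stem(token):
    """Toy suffix stripper standing in for a real stemmer (assumption)."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def tokenize(text):
    """Lowercase, keep alphabetic tokens, and stem each one."""
    return [crude_stem(t) for t in re.findall(r"[a-z]+", text.lower())]


def cluster_intents(queries, n_topics=3, n_clusters=3, seed=0):
    """Return (cluster label per query, document-topic matrix)."""
    # Word frequency vectorisation over stemmed tokens.
    vectorizer = CountVectorizer(tokenizer=tokenize, token_pattern=None)
    counts = vectorizer.fit_transform(queries)

    # LDA maps each query to a low-dimensional topic distribution.
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=seed)
    topics = lda.fit_transform(counts)

    # Mini-batch k-means groups queries in topic space; each cluster
    # is a candidate intent to be inspected and named by a person.
    kmeans = MiniBatchKMeans(n_clusters=n_clusters, random_state=seed, n_init=3)
    labels = kmeans.fit_predict(topics)
    return labels, topics


labels, topics = cluster_intents(QUERIES)
for query, label in zip(QUERIES, labels):
    print(label, query)
```

On a corpus this small the clusters carry no real meaning; the sketch only shows how the four stages connect. In practice the number of topics and clusters would be selected by comparing candidate models, as the abstract describes.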