Improving Text Analysis Using Sentence Conjunctions and Punctuation
User-generated content in the form of customer reviews, blogs, and tweets is an emerging and rich source of data for marketers. Topic models have been successfully applied to such data, demonstrating that empirical text analysis benefits greatly from a latent variable approach that summarizes high-level interactions among words. We propose a new topic model that allows for serial dependency of topics in text. That is, topics may carry over from word to word in a document, violating the bag-of-words assumption in traditional topic models. In the proposed model, topic carryover is informed by sentence conjunctions and punctuation. Typically, such observed information is eliminated prior to analyzing text data (i.e., preprocessing) because words such as “and” and “but” do not differentiate topics. We find that these elements of grammar contain information relevant to topic changes. We examine the performance of our models using multiple data sets and establish boundary conditions for when our model leads to improved inference about customer evaluations. Implications and opportunities for future research are discussed.