Improving Text Analysis Using Sentence Conjunctions and Punctuation

User-generated content in the form of customer reviews, blogs, and tweets is an emerging and rich source of data for marketers. Topic models have been successfully applied to such data, demonstrating that empirical text analysis benefits greatly from a latent variable approach that summarizes high-level interactions among words. We propose a new topic model that allows for serial dependency of topics in text. That is, topics may carry over from word to word in a document, violating the bag-of-words assumption in traditional topic models. In the proposed model, topic carryover is informed by sentence conjunctions and punctuation. Typically, such information is removed before analysis (i.e., during preprocessing) because words such as “and” and “but” do not differentiate topics. We find that these elements of grammar contain information relevant to topic changes. We examine the performance of our models using multiple data sets and establish boundary conditions for when our model leads to improved inference about customer evaluations. Implications and opportunities for future research are discussed.
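
To make grammar-informed topic carryover concrete, here is a minimal Python sketch of how topics might be assigned token by token. It is a hypothetical illustration, not the authors' model: the function `assign_topics`, the carryover probabilities, and the rule that conjunctions and punctuation receive no topic of their own are all assumptions, and the actual model estimates such quantities from data rather than fixing them by hand.

```python
import random

def assign_topics(tokens, topic_weights, carryover, default_stay=0.5, rng=None):
    """Assign a topic index to each content token (hypothetical sketch).

    Tokens listed in `carryover` (conjunctions, punctuation) get no topic
    (None); instead they set the probability that the next content token
    keeps the previous topic. All probabilities are illustrative assumptions.
    """
    rng = rng or random.Random()
    topics = []
    prev_topic = None
    prev_marker = None
    for tok in tokens:
        if tok in carryover:
            prev_marker = tok  # remember the grammatical cue for the next word
            topics.append(None)
            continue
        p_stay = carryover.get(prev_marker, default_stay)
        if prev_topic is not None and rng.random() < p_stay:
            topic = prev_topic  # topic carries over from the previous word
        else:
            # draw a fresh topic from the document's topic proportions
            topic = rng.choices(range(len(topic_weights)), weights=topic_weights)[0]
        topics.append(topic)
        prev_topic = topic
        prev_marker = None
    return topics


# Hypothetical cues: "and" tends to continue a topic; "but" and "." tend to switch.
CUES = {"and": 0.9, "but": 0.2, ".": 0.1}
tokens = "battery lasts long and charges fast but screen scratches .".split()
print(assign_topics(tokens, [0.5, 0.5], CUES, rng=random.Random(0)))
```

For example, with `carryover={"and": 1.0}` the word after “and” always inherits the previous topic, whereas a bag-of-words model would draw it independently.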

A latent variable approach to measuring wartime sexual violence

Journal of Peace Research ◽

10.1177/0022343320961147 ◽

2020 ◽

Vol 57 (6) ◽

pp. 728-739

Author(s):

Jule Krüger ◽

Ragnhild Nordås

Keyword(s):

Sexual Violence ◽

Latent Variable ◽

Best Practice ◽

Temporal Trends ◽

Credible Interval ◽

Future Research ◽

Variable Model ◽

Multiple Indicators ◽

Variable Approach ◽

Security Problem

Conflict-related sexual violence is an international security problem and is sometimes used as a weapon of war. It is also a complex and hard-to-observe phenomenon, constituting perhaps one of the most hidden forms of wartime violence. Latent variable models (LVM) offer a promising avenue to account for differences in observed measures. Three annual human rights sources have reported on the sexual violence practices of armed conflict actors around the world since 1989, and their reports have been coded into ordinal indicators of conflict-year prevalence. Because information diverges significantly across these measures, we currently have a poor scientific understanding of the trends and patterns of the problem. In this article, we use an LVM approach to leverage information across multiple indicators of wartime sexual violence to estimate its true extent, to express uncertainty in the form of a credible interval, and to account for temporal trends in the underlying data. We argue that a dynamic LVM parametrization constitutes the best fit in this context. It outperforms a static latent variable model, as well as analyses of the observed indicators. Based on our findings, we argue that an LVM approach currently constitutes the best practice for this line of inquiry, and we conclude with suggestions for future research.
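
A dynamic latent variable model of this kind can be sketched as an ordered-probit measurement model combined with a random-walk prior on latent prevalence. The Python below computes an unnormalized log posterior for one conflict actor's latent trajectory. The source labels "A", "B", "C", the cutpoints, the standard-normal initial prior, and the step size `sigma` are hypothetical choices for illustration, not the specification used in the article.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function (handles +/- infinity)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ordinal_probs(theta, cutpoints):
    """Ordered-probit probabilities P(y = k | theta) for k = 0..len(cutpoints)."""
    bounds = [-math.inf] + list(cutpoints) + [math.inf]
    return [norm_cdf(bounds[k + 1] - theta) - norm_cdf(bounds[k] - theta)
            for k in range(len(bounds) - 1)]

def normal_logpdf(x, mean, sd):
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - (x - mean) ** 2 / (2.0 * sd * sd)

def log_posterior(thetas, observations, cutpoints, sigma=0.5):
    """Unnormalized log posterior of a latent yearly trajectory (hypothetical sketch).

    thetas: latent prevalence per year for one conflict actor.
    observations: per year, a dict {source: ordinal level or None}.
    cutpoints: per source, increasing thresholds (hypothetical values).
    Dynamic prior: theta_0 ~ N(0, 1), theta_t ~ N(theta_{t-1}, sigma).
    """
    lp = normal_logpdf(thetas[0], 0.0, 1.0)
    for t in range(1, len(thetas)):
        lp += normal_logpdf(thetas[t], thetas[t - 1], sigma)  # random-walk step
    for theta, obs in zip(thetas, observations):
        for source, level in obs.items():
            if level is None:
                continue  # a source may not report in a given year
            lp += math.log(ordinal_probs(theta, cutpoints[source])[level])
    return lp


# Three hypothetical sources, each coded on an ordinal scale from 0 to 3.
CUTS = {"A": [-1.0, 0.0, 1.0], "B": [-0.5, 0.5, 1.5], "C": [-1.5, 0.0, 1.0]}
obs = [{"A": 3, "B": 2, "C": None}, {"A": 3, "B": 3, "C": 3}, {"A": 2, "B": None, "C": 3}]
print(log_posterior([1.5, 2.0, 1.8], obs, CUTS))
```

In practice one would sample from this posterior (e.g., with MCMC) to obtain point estimates and credible intervals; a static variant would roughly correspond to replacing the random-walk term with an independent prior for each year.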

Stimulating critical thinking in a virtual learning community with instructor moderations and peer reviews

Knowledge Management & E-Learning: An International Journal ◽

10.34105/j.kmel.2011.03.036 ◽

2011 ◽

pp. 534-547

Keyword(s):

Critical Thinking ◽

Learning Community ◽

Online Teaching ◽

Teaching And Learning ◽

Virtual Learning ◽

Future Research ◽

Virtual Learning Community ◽

Peer Reviews ◽

Multiple Data ◽

Multiple Data Sets

This mixed methods study investigated the dynamic impacts of instructor moderations and peer reviews on critical thinking (CT) in a virtual learning community. Multiple data sets were collected from online discourse, participants’ written reflections, and learning artifacts, and were analyzed and triangulated with both quantitative and qualitative methods. Both instructor moderations and peer reviews strongly influenced learners’ CT in multiple ways and stimulated CT development throughout the semester. As learners developed more CT skills, the need for instructor moderations decreased, while peer reviews peaked in the quantity, length, and depth of discussion. Peer reviews in this study also demonstrated effective questioning patterns, which were well received by the students being questioned or criticized and resulted in changes and improvements in the final learning artifacts. Practical implications for online teaching, learning, and community building are discussed, together with suggestions for future research.

Method of Moments for Topic Models with Mixed Discrete and Continuous Features

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/333 ◽

2021 ◽

Author(s):

Joachim Giesen ◽

Paul Kahlmeyer ◽

Sören Laue ◽

Matthias Mitterreiter ◽

Frank Nussbaum ◽

...

Keyword(s):

Method Of Moments ◽

Latent Variable ◽

Latent Dirichlet Allocation ◽

Latent Class ◽

Topic Model ◽

Likelihood Function ◽

Natural Extension ◽

Topic Models ◽

Discrete Variables ◽

Continuous State

Topic models are characterized by a latent class variable that represents the different topics. Traditionally, their observable variables are modeled as discrete variables, as, for instance, in the prototypical latent Dirichlet allocation (LDA) topic model. In LDA, words in text documents are encoded by discrete count vectors with respect to some dictionary. The classical approach for learning topic models optimizes a likelihood function that is non-concave due to the presence of the latent variable. Hence, this approach mostly relies on search heuristics such as the EM algorithm for parameter estimation. Recently, it was shown that topic models can be learned with strong algorithmic and statistical guarantees through Pearson's method of moments. Here, we extend this line of work to topic models that feature discrete as well as continuous observable variables (features). Moving beyond discrete variables as in LDA allows for more sophisticated features and a natural extension of topic models to modalities other than text, such as images. We provide algorithmic and statistical guarantees for the method of moments applied to the extended topic model, which we corroborate experimentally on synthetic data. We also demonstrate the applicability of our model on real-world document data with embedded images that we preprocess into state-of-the-art continuous feature vectors.
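
The moment-based idea can be illustrated in the purely discrete (LDA-style) special case. In a single-topic model where two words of a document are drawn independently given the topic, the cross-moment of their one-hot encodings x1 and x2 satisfies M2 = E[x1 x2^T] = sum_k w_k mu_k mu_k^T, where w_k are topic proportions and mu_k are topic-word distributions. The hypothetical Python sketch below checks this identity by simulation. It does not implement the paper's estimator, which handles mixed discrete and continuous features; method-of-moments estimators for topic models also typically use third-order moments to identify the parameters.

```python
import random

def population_m2(weights, topic_word):
    """M2[i][j] = sum_k w_k * mu_k[i] * mu_k[j] for a single-topic model."""
    vocab_size = len(topic_word[0])
    return [[sum(w * mu[i] * mu[j] for w, mu in zip(weights, topic_word))
             for j in range(vocab_size)] for i in range(vocab_size)]

def sample_docs(weights, topic_word, n_docs, rng):
    """Each document: one latent topic, then two words drawn i.i.d. from it."""
    vocab_size = len(topic_word[0])
    docs = []
    for _ in range(n_docs):
        k = rng.choices(range(len(weights)), weights=weights)[0]
        w1, w2 = rng.choices(range(vocab_size), weights=topic_word[k], k=2)
        docs.append((w1, w2))
    return docs

def empirical_m2(docs, vocab_size):
    """Average outer product of the one-hot vectors of the two words."""
    m2 = [[0.0] * vocab_size for _ in range(vocab_size)]
    for w1, w2 in docs:
        m2[w1][w2] += 1.0 / len(docs)
    return m2


weights = [0.6, 0.4]                             # hypothetical topic proportions
topic_word = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]  # hypothetical topic-word distributions
docs = sample_docs(weights, topic_word, 20000, random.Random(42))
true_m2 = population_m2(weights, topic_word)
est_m2 = empirical_m2(docs, 3)
print(max(abs(a - b) for ra, rb in zip(true_m2, est_m2) for a, b in zip(ra, rb)))
```

Because the empirical moment converges to a known low-rank combination of the topic parameters, moment-based estimators can avoid the non-concave likelihood altogether.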

Career Break, Not a Brake on Career: A Study of the Reasons and Enablers of Women’s Re-entry to Technology Careers in India

Business Perspectives and Research ◽

10.1177/2278533720964328 ◽

2020 ◽

pp. 227853372096432

Author(s):

Swati Singh ◽

Sita Vanka

Keyword(s):

Text Analysis ◽

Women Professionals ◽

Future Research ◽

Analysis Software ◽

Work Centrality ◽

Technology Sector ◽

Future Research Agenda ◽

Collection Data ◽

Unexplored Area ◽

High Level

Career re-entry of women in the technology sector remains an unexplored area. With information technology (IT) organisations increasingly focused on attracting, retaining and promoting women in the workplace, career re-entry among women professionals merits attention. The purpose of this study is to investigate the reasons for and enablers of career re-entry among women who plan to re-enter the IT sector in India. This study employed a qualitative research method and used interviews as a tool for data collection. Data collected through interviews with re-entry women (n = 28) were analysed with the qualitative analysis software ATLAS.ti. Further, text analysis was performed with Voyant Tools. Findings suggest that a strong career identity, a high level of work centrality and an urge to regain financial independence motivated women to return to IT careers. Findings revealed seven distinct enablers of career re-entry. Based on these findings, a model of the support ecosystem is discussed that presents the intricate relationship between the enablers of career re-entry, the support ecosystem and career resumption. Moreover, findings indicate that women’s active agency, a support ecosystem and favourable life events lead to career re-entry. Managerial and theoretical implications of the findings are discussed. The article concludes with limitations and a future research agenda.

No Longer Lost in Translation: Evidence that Google Translate Works for Comparative Bag-of-Words Text Applications

10.31219/osf.io/cuxha ◽

2017 ◽

Cited By ~ 1

Author(s):

Erik de Vries ◽

Martijn Schoonvelde ◽

Gijs Schumacher

Keyword(s):

Machine Translation ◽

Text Analysis ◽

Gold Standard ◽

Topic Model ◽

Topic Models ◽

Bag Of Words ◽

Text Corpora ◽

Automated Text Analysis ◽

Lost In Translation

Automated text analysis allows researchers to analyze large quantities of text. Yet, comparative researchers face a big challenge: across countries, people speak different languages. To address this issue, some analysts have suggested using Google Translate to convert all texts into English before starting the analysis (Lucas et al., 2015). But in doing so, do we get lost in translation? This paper evaluates the usefulness of machine translation for bag-of-words models, such as topic models. We use the europarl dataset and compare term-document matrices as well as topic model results from gold standard translated text and machine-translated text. We evaluate results at both the document and the corpus level. We first find term-document matrices for both text corpora to be highly similar, with significant but minor differences across languages. What is more, we find considerable overlap in the set of features generated from human-translated and machine-translated texts. With regard to LDA topic models, we find topical prevalence and topical content to be highly similar with only small differences across languages. We conclude that Google Translate is a useful tool for comparative researchers when using bag-of-words text models.

No Longer Lost in Translation: Evidence that Google Translate Works for Comparative Bag-of-Words Text Applications

Political Analysis ◽

10.1017/pan.2018.26 ◽

2018 ◽

Vol 26 (4) ◽

pp. 417-430 ◽

Cited By ~ 29

Author(s):

Erik de Vries ◽

Martijn Schoonvelde ◽

Gijs Schumacher

Keyword(s):

Machine Translation ◽

Text Analysis ◽

Gold Standard ◽

Topic Model ◽

Topic Models ◽

Bag Of Words ◽

Text Corpora ◽

Automated Text Analysis ◽

Lost In Translation

Automated text analysis allows researchers to analyze large quantities of text. Yet, comparative researchers are presented with a big challenge: across countries people speak different languages. To address this issue, some analysts have suggested using Google Translate to convert all texts into English before starting the analysis (Lucas et al. 2015). But in doing so, do we get lost in translation? This paper evaluates the usefulness of machine translation for bag-of-words models—such as topic models. We use the europarl dataset and compare term-document matrices (TDMs) as well as topic model results from gold standard translated text and machine-translated text. We evaluate results at both the document and the corpus level. We first find TDMs for both text corpora to be highly similar, with minor differences across languages. What is more, we find considerable overlap in the set of features generated from human-translated and machine-translated texts. With regard to LDA topic models, we find topical prevalence and topical content to be highly similar with again only small differences across languages. We conclude that Google Translate is a useful tool for comparative researchers when using bag-of-words text models.
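
One simple way to compare term-document matrices built from gold-standard and machine-translated versions of the same documents is document-level cosine similarity over a shared vocabulary. The Python sketch below assumes a naive lowercase tokenizer and raw counts; the abstract does not state which similarity measures the authors used, so cosine similarity and the helper names here are illustrative assumptions.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Naive tokenizer (assumption): lowercase alphabetic runs only."""
    return re.findall(r"[a-z]+", text.lower())

def term_document_matrix(docs, vocab):
    """One row of raw term counts per document, columns ordered by `vocab`."""
    counts = [Counter(tokenize(d)) for d in docs]
    return [[c[term] for term in vocab] for c in counts]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def document_similarities(gold_docs, mt_docs):
    """Cosine similarity between paired gold and machine-translated documents."""
    vocab = sorted({t for d in gold_docs + mt_docs for t in tokenize(d)})
    gold = term_document_matrix(gold_docs, vocab)
    mt = term_document_matrix(mt_docs, vocab)
    return [cosine(g, m) for g, m in zip(gold, mt)]


gold = ["The committee approves the budget.", "Fishing quotas were debated."]
mt = ["The committee approved the budget.", "Fishing quotas were debated."]
print(document_similarities(gold, mt))
```

Here the second pair is identical and scores 1.0, while a single inflectional difference in the first pair ("approves" vs "approved") lowers its score to 6/7; corpus-level comparisons can then summarize such document scores.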

Power of Modified Brown-Forsythe and Mixed-Model Approaches in Split-Plot Designs

Methodology ◽

10.1027/1614-2241/a000124 ◽

2017 ◽

Vol 13 (1) ◽

pp. 9-22 ◽

Cited By ~ 1

Author(s):

Pablo Livacic-Rojas ◽

Guillermo Vallejo ◽

Paula Fernández ◽

Ellián Tuero-Herrero

Keyword(s):

Repeated Measures ◽

Statistical Power ◽

Mixed Model ◽

Covariance Structure ◽

Simulation Method ◽

Future Research ◽

Repeated Measures Design ◽

Fixed And Random Effects ◽

Split Plot ◽

High Level

Low precision in inferences from data analyzed with univariate or multivariate Analysis of Variance (ANOVA) models in repeated-measures designs is associated with non-normally distributed data, nonspherical covariance structures and free variation of the variances and covariances, lack of knowledge of the error structure underlying the data, and the wrong choice of covariance structure by different selectors. In this study, we compare the statistical power of the Modified Brown-Forsythe (MBF) procedure and two Mixed-Model procedures (Akaike’s Information Criterion and the Correctly Identified Model [CIM]). The data were analyzed with a Monte Carlo simulation method in the statistical package SAS 9.2, using a split-plot design and six manipulated variables. The results show that the procedures exhibit high statistical power for within-subjects and interaction effects, and moderate and low power for between-groups effects under the different conditions analyzed. For the latter, only the Modified Brown-Forsythe procedure shows high power, mainly for groups with 30 cases and Unstructured (UN) and Heterogeneous Autoregressive (ARH) matrices. For this reason, we recommend this procedure, since it exhibits higher power for all effects and does not require assuming a particular type of covariance matrix underlying the data. Future research should compare power using corrected selectors in single-level and multilevel designs for fixed and random effects.
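
Monte Carlo power estimation follows the same basic loop regardless of the test: simulate many data sets under a known effect, apply the test to each, and report the share of rejections. The Python sketch below shows that loop for a simple two-group between-subjects comparison. It deliberately swaps in a Welch-type statistic with a normal critical value (a large-sample approximation) in place of the Modified Brown-Forsythe procedure and the mixed-model selectors, which are considerably more involved; all names and settings are hypothetical.

```python
import math
import random
import statistics

def welch_z_test(x, y, z_crit=1.96):
    """Reject equal means if |Welch statistic| exceeds a normal critical value.

    Large-sample approximation for illustration only; not the MBF procedure.
    """
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    stat = (statistics.mean(x) - statistics.mean(y)) / se
    return abs(stat) > z_crit

def monte_carlo_power(effect, n, sd_x, sd_y, reps, rng, test=welch_z_test):
    """Share of simulated data sets in which `test` rejects the null hypothesis."""
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(effect, sd_x) for _ in range(n)]  # group with shifted mean
        y = [rng.gauss(0.0, sd_y) for _ in range(n)]     # reference group
        rejections += test(x, y)
    return rejections / reps


# Hypothetical setting: 30 cases per group, unequal variances, effect of 0.8.
print(monte_carlo_power(0.8, 30, 1.0, 1.5, 1000, random.Random(3)))
```

Setting `effect=0.0` estimates the Type I error rate instead of power, which is how such simulations typically check that a procedure holds its nominal alpha.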

Distinguishing Adequate Versus Inadequate Cancer Health Literacy Levels: A Discrete Latent Variable Approach

PsycEXTRA Dataset ◽

10.1037/e581432013-001 ◽

2013 ◽

Author(s):

Levent Dumenci ◽

Robin Matsuyama ◽

Robert Perera ◽

Laura Kuhn ◽

Laura Siminoff

Keyword(s):

Health Literacy ◽

Latent Variable ◽

Variable Approach ◽

Latent Variable Approach

Heterogeneous Effects of the De Jure and De Facto Business Environment: Findings from Multiple Data Sets on the Business Environment

10.1596/1813-9450-9115 ◽

2020 ◽

Author(s):

Christine Zhenwei Qiang ◽

He Wang ◽

L. Colin Xu

Keyword(s):

Business Environment ◽

Data Sets ◽

Multiple Data ◽

Heterogeneous Effects ◽

Multiple Data Sets

Executive Functions and Metacognitive Monitoring Are Not Interchangeable in Educational Settings: Their Shared and Unique Contribution to Academic Outcomes

10.31234/osf.io/4jhnz ◽

2019 ◽

Author(s):

Rina PY Lai ◽

Michelle Renee Ellefson ◽

Claire Hughes

Keyword(s):

Executive Functions ◽

Academic Outcomes ◽

Cognitive Skills ◽

Latent Variable ◽

Unique Contribution ◽

Metacognitive Monitoring ◽

Cognitive Predictors ◽

Domain Specific ◽

Variable Approach ◽

Latent Variable Approach

Executive functions and metacognition are two cognitive predictors with well-established connections to academic performance. Despite their shared theoretical characteristics, the extent of their overlap or independence with respect to multiple academic outcomes remains under-researched. To address this gap, the present study applies a latent-variable approach to test a novel theoretical model that delineates the structural link between executive functions, metacognition, and academic outcomes. In whole-class sessions, 469 children aged 9 to 14 years (M = 11.93; SD = 0.92) completed four computerized executive function tasks (inhibition, working memory, cognitive flexibility, and planning), a self-reported metacognitive monitoring questionnaire, and three standardized tests of academic ability. The results suggest that executive functions and metacognitive monitoring are not interchangeable in the educational context and that they make both shared and unique contributions to diverse academic outcomes. The findings are important for elucidating the relationship between two domain-general cognitive skills (executive functions and metacognition) and domain-specific academic skills.