Creating and Testing Specialized Dictionaries for Text Analysis
East European Journal of Psycholinguistics

2019
Vol 6 (1)
pp. 65-75
Author(s):
Roman Taraban
Jessica Pittman
Taleen Nalabandian
Winson Fu Zun Yang
William Marcy
...

Practitioners in many domains (e.g., clinical psychologists, college instructors, and researchers) collect written responses from clients. A well-developed method that has been applied to texts from sources like these is the computer application Linguistic Inquiry and Word Count (LIWC). LIWC uses the words in texts as cues to a person's thought processes, emotional states, intentions, and motivations. In the present study, we adopt analytic principles from LIWC and develop and test an alternative method of text analysis using naïve Bayes methods. We further show how output from the naïve Bayes analysis can be used to mark up student work in order to provide immediate, constructive feedback to students and instructors.
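
The abstract does not specify the model. As a hedged illustration only, the sketch below is a minimal multinomial naïve Bayes text classifier with Laplace smoothing, written from scratch. The tokenizer, the function names (`train_nb`, `predict_nb`, `word_evidence`), and the toy documents and labels are all hypothetical, and none of it should be read as the authors' implementation. The per-word log-likelihood ratio in `word_evidence` is one plausible way to turn classifier output into word-level markup of a student's text.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Deliberately simple tokenizer: lowercase, keep runs of letters/apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def train_nb(docs: list[str], labels: list[str], alpha: float = 1.0):
    """Fit a multinomial naive Bayes model with add-alpha (Laplace) smoothing."""
    class_sizes = Counter(labels)
    word_counts = {c: Counter() for c in class_sizes}
    for doc, label in zip(docs, labels):
        word_counts[label].update(tokenize(doc))
    vocab = set().union(*word_counts.values())
    log_prior = {c: math.log(n / len(docs)) for c, n in class_sizes.items()}
    log_lik = {}
    for c, counts in word_counts.items():
        denom = sum(counts.values()) + alpha * len(vocab)
        log_lik[c] = {w: math.log((counts[w] + alpha) / denom) for w in vocab}
    return log_prior, log_lik


def predict_nb(model, text: str) -> str:
    """Return the class with the highest posterior log-probability."""
    log_prior, log_lik = model
    scores = {
        # Words never seen in training are skipped rather than penalized.
        c: log_prior[c] + sum(log_lik[c][w] for w in tokenize(text) if w in log_lik[c])
        for c in log_prior
    }
    return max(scores, key=scores.get)


def word_evidence(model, text: str, target: str, other: str) -> list[tuple[str, float]]:
    """Hypothetical markup helper: per-token log-likelihood ratio.

    Positive values favor `target` over `other`; these scores could be used
    to highlight words in a student's response.
    """
    _, log_lik = model
    return [(w, log_lik[target][w] - log_lik[other][w])
            for w in tokenize(text) if w in log_lik[target]]


# Toy training data (invented for illustration).
docs = [
    "the force causes the block to accelerate",
    "i feel happy and excited about class",
    "net force equals mass times acceleration",
    "i was sad and worried about the exam",
]
labels = ["physics", "affect", "physics", "affect"]
model = train_nb(docs, labels)
print(predict_nb(model, "mass and acceleration of the block"))  # physics
print(word_evidence(model, "net force", "physics", "affect"))
```

Working in log space avoids numeric underflow when many word probabilities are multiplied, and smoothing keeps a single unseen class/word pair from driving a class score to zero.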

2019
Vol 6 (2)
pp. 107-118
Author(s):
Roman Taraban
Abusal Khaleel

Machine methods for automatically analyzing text have been investigated for decades. Yet the availability and usability of these methods for classifying and scoring specialized essays in small samples, as is typical for ordinary coursework, remain unclear. In this paper we analyzed 156 essays submitted by students in a first-year college rhetoric course. Using cognitive and affective measures within Linguistic Inquiry and Word Count (LIWC), we tested whether machine analyses could (i) distinguish among essay topics, (ii) distinguish between high and low writing quality, and (iii) identify differences due to changes in rhetorical context across writing assignments. All three tests yielded positive results. We consider ways that LIWC may benefit college instructors in assessing student compositions and in monitoring the effectiveness of the course curriculum. We also consider extensions of machine assessments for instructional applications.
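
The cognitive and affective measures mentioned here are LIWC category scores, which LIWC expresses as the percentage of a text's words that match each category's dictionary entries; dictionary entries ending in `*` match any word beginning with that stem. The sketch below reproduces only that counting logic. It is not LIWC: the three-category dictionary is hypothetical, and although the category names echo LIWC labels, the word lists are invented for illustration.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary in the spirit of LIWC; the real LIWC
# dictionaries are much larger and ship with the program.
TOY_DICTIONARY = {
    "posemo": ["happ*", "good", "love*"],
    "negemo": ["sad", "worr*", "hate*"],
    "cogproc": ["because", "think*", "know*", "reason*"],
}


def matches(word: str, entry: str) -> bool:
    # A trailing "*" acts as a prefix wildcard, as in LIWC-style entries.
    if entry.endswith("*"):
        return word.startswith(entry[:-1])
    return word == entry


def category_percentages(text: str, dictionary=TOY_DICTIONARY) -> dict[str, float]:
    """Percentage of all words in `text` that fall in each category."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for word in words:
        for category, entries in dictionary.items():
            if any(matches(word, e) for e in entries):
                counts[category] += 1  # a word may count toward several categories
    total = len(words)
    return {c: (100.0 * counts[c] / total if total else 0.0) for c in dictionary}


print(category_percentages("I think I know why: because I love good arguments."))
# {'posemo': 20.0, 'negemo': 0.0, 'cogproc': 30.0}
```

Because scores are normalized by text length, essays of different lengths can be compared directly, for example by averaging each category over essays grouped by topic or by quality rating.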


2011
Vol 109 (1)
pp. 73-76
Author(s):
David Lester
Stephanie McSwain

Changes in the words used in the poems of Sylvia Plath were examined using the Linguistic Inquiry and Word Count, a computer program for analyzing the content of texts. Major changes in the content of her poems were observed over the course of Plath's career, as well as in the final year of her life. As the time of her suicide came closer, words expressing positive emotions became more frequent, while words concerned with causation and insight became less frequent.


2019
Vol 26 (8-9)
pp. 759-766
Author(s):
Young Ji Lee
Charles Kamen
Liz Margolies
Ulrike Boehmer

Objective: The study sought to explore online health communities (OHCs) for sexual minority women (SMW) with cancer by conducting computational text analysis on their posts.
Materials and Methods: Eight moderated OHCs were hosted by the National LGBT Cancer Network from 2013 to 2015. Forty-six SMW wrote a total of 885 posts across the OHCs, which were analyzed using Linguistic Inquiry and Word Count and latent Dirichlet allocation. Pearson correlations were calculated between Linguistic Inquiry and Word Count word categories and participant engagement in the OHCs. Latent Dirichlet allocation was used to derive the main topics.
Results: Participants (average age 46 years; 89% white/non-Hispanic) who used more sadness, female-reference, drives, and religion-related words were more likely to post in the OHCs. Ten topics emerged: coping, holidays and vacation, cancer diagnosis and treatment, structure of day-to-day life, self-care, loved ones, physical recovery, support systems, body image, and symptom management. Coping was the most common topic; symptom management was the least common.
Discussion: Highly engaged SMW in the OHCs connected to others via their shared female gender identity. Topics discussed in these OHCs were similar to those in OHCs for heterosexual women, and sexual identity was not a dominant topic. The presence of OHC moderators may have driven participation. Formal comparisons between sexual minority and heterosexual women's OHCs are needed.
Conclusions: Our findings contribute to a better understanding of the experiences of SMW cancer survivors and can inform the development of tailored OHC-based interventions for SMW who are survivors of cancer.
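
The engagement analysis above rests on Pearson correlations between category scores and participation. A minimal from-scratch version of the coefficient is sketched below, assuming one category score and one post count per participant; the data are hypothetical and are not from the study. In practice `scipy.stats.pearsonr` also returns a p-value, which this sketch does not compute.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson product-moment correlation between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, at least 2 each")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a sample has zero variance")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-participant values: percentage of words in a LIWC-style
# "sadness" category, and number of posts written (illustrative numbers only).
sad_pct = [0.4, 1.2, 0.8, 2.0, 1.5]
posts = [3, 10, 6, 25, 12]
print(round(pearson_r(sad_pct, posts), 3))
```

With only dozens of participants, a few very active posters can dominate such a coefficient, so checking a scatter plot or a rank-based correlation alongside it is a reasonable precaution.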


2020
pp. 0261927X2096564
Author(s):
Kate G. Blackburn
Weixi Wang
Rhea Pedler
Rachel Thompson
Diana Gonzales

This study analyzed thousands of women's online conversations about their miscarriage or abortion experiences, classified as unplanned and planned traumas, respectively. Linguistic Inquiry and Word Count text analysis revealed that people experiencing a planned trauma use distancing language more frequently and engage in more emotion regulation than those who experienced trauma unexpectedly. On the other hand, planned trauma conversations used more self-focused language and more social-based language. Implications and future directions for trauma research are discussed.


2017
Author(s):
Yupa Umigi Al-khairi
Yudi Wibisono
Budi Laksono Putro

For people working in fashion, knowing fashion trends is important. One way to identify trends is to detect the fashion topics being discussed on social media. This study implements the Latent Dirichlet Allocation algorithm to detect fashion topics on Twitter. The collected tweets were classified with the Naive Bayes method and then cleaned by removing URLs, symbols, and numbers and converting every word to lowercase. The tweets were then turned into bags of words and grouped with the Latent Dirichlet Allocation algorithm. In the experiments, the configuration with 20 topics and 1000 iterations obtained the best UMass score, -56.342, and the configuration with 50 topics and 1000 iterations obtained the best PMI score, 6.272.
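
The UMass score used above to compare topic configurations is a document co-occurrence measure (Mimno et al., 2011). For a topic's top words ordered by probability, each pair contributes log((D(w_i, w_j) + 1) / D(w_j)), where D counts documents containing the given words and w_j ranks above w_i. The sketch below is a minimal, hypothetical implementation with invented toy tweets; it is not the code used in the study.

```python
import math


def umass_coherence(top_words: list[str], documents: list[list[str]]) -> float:
    """UMass coherence for one topic, summed over ordered word pairs.

    `top_words`: the topic's highest-probability words, most probable first.
    `documents`: tokenized texts from the reference corpus.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words: str) -> int:
        # Number of documents containing every one of `words`.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):  # j < i: top_words[j] ranks above top_words[i]
            d_j = doc_freq(top_words[j])
            if d_j == 0:
                raise ValueError(f"{top_words[j]!r} does not occur in the corpus")
            score += math.log((doc_freq(top_words[i], top_words[j]) + 1) / d_j)
    return score


# Toy, already-cleaned tweets (invented for illustration).
tweets = [
    ["batik", "dress", "trend"],
    ["batik", "shirt"],
    ["dress", "trend", "sale"],
    ["sneakers", "trend"],
]
print(umass_coherence(["trend", "dress", "batik"], tweets))
```

Scores closer to zero are read as more coherent. Implementations differ in whether they sum or average over word pairs (gensim's `CoherenceModel` with `coherence="u_mass"` is one widely used variant), so absolute values are only comparable within one implementation and one choice of top-N words.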


2021
Vol 5 (1)
pp. 123-131
Author(s):
Ni Luh Putu Merawati Putu
Ahmad Zuli Amrullah
Ismarmiaty

Lombok Island is a popular tourist destination. Tourists share comments on many topics about their Lombok experiences through social media accounts, which makes it difficult to identify public sentiment and topics manually. The opinions tourists express on social media therefore merit further research. This study aims to classify tourists' opinions into two classes, positive and negative, using the Naive Bayes method, and to model topics using Latent Dirichlet Allocation (LDA). The stages of this research include data collection, data cleaning, data transformation, and data classification. Performance testing of the Naive Bayes classification model yielded an accuracy of 92%, a precision of 100%, a recall of 84%, and a specificity of 100%. For LDA topic modeling within each class, the highest coherence value for the positive class, 0.613, was obtained at the 8th topic, and for the negative class, 0.528, at the 12th topic. Naive Bayes and LDA are judged effective for analyzing sentiment and modeling topics in Lombok tourism.
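
The four reported metrics all derive from one binary confusion matrix. The sketch below computes them from paired true and predicted labels, treating "positive" sentiment as the positive class; the function name and the toy labels are hypothetical and unrelated to the study's data.

```python
def binary_metrics(y_true: list[str], y_pred: list[str],
                   positive: str = "positive") -> dict[str, float]:
    """Accuracy, precision, recall and specificity from paired labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")  # undefined when denominator is 0

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": ratio(tp, tp + fp),    # predicted positives that are truly positive
        "recall": ratio(tp, tp + fn),       # true positives that were found (sensitivity)
        "specificity": ratio(tn, tn + fp),  # true negatives correctly rejected
    }


# Toy labels (invented for illustration).
y_true = ["positive", "positive", "positive", "negative", "negative"]
y_pred = ["positive", "positive", "negative", "negative", "negative"]
print(binary_metrics(y_true, y_pred))
```

In this toy case the classifier makes no false positives, so precision and specificity are both 1.0 while recall is 2/3. Any classifier with zero false positives shows that same pairing of perfect precision and specificity, which is the pattern reported in the abstract.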


Author(s):  
Sanaz Aghazadeh
Kris Hoang
Bradley Pomeroy

This paper provides methodological guidance for judgment and decision-making (JDM) researchers in accounting who are interested in using the Linguistic Inquiry and Word Count (LIWC) text analysis program to analyze research participants' written responses to open-ended questions. We discuss how LIWC's measures of psychological constructs were developed and validated in psycholinguistic research. We then use data from an audit JDM study to illustrate how LIWC can be used, guiding researchers in identifying suitable measures, performing quality control procedures, and reporting the analysis. We also discuss research design considerations that can strengthen the inferences drawn from LIWC analysis. The paper concludes with examples where LIWC analysis has the potential to reveal participants' deep, complex, effortful psychological processing and affective states from their written responses.


2019
Vol 64 (1)
pp. 97-117
Author(s):
William A. Donohue
Qi Hao
Richard Spreng
Charles Owen

The purpose of this article is to illustrate innovations in text analysis associated with understanding conflict-related communication events. Two innovations are explored: LIWC (Linguistic Inquiry and Word Count), and text modeling with the open-source data analysis software R and with SPSS Modeler. The LIWC analysis revisits the 2009 study by Donohue and Druckman and the 2014 study by Donohue, Liang, and Druckman, focusing on text analyses of the Oslo I Accords between the Palestinians and Israelis to illustrate this approach. The R and SPSS text modeling uses the same data set as the LIWC analysis to provide a different picture of each leader's rhetoric during the period in which the Oslo I Accords were being negotiated. Each innovation provides different insights into the mind-set of the two groups of leaders as the secret talks were emerging. The article concludes by discussing the implications of each approach for understanding the communication exchanges.


2021 ◽  
Author(s):  
Peter Boot

Linguistic Inquiry and Word Count (LIWC) is a text analysis program developed by James Pennebaker and colleagues. At the basis of LIWC is a dictionary that assigns words to categories. This dictionary is specific to English. Researchers who want to use LIWC on non-English texts have typically relied on translations of the dictionary into the language of the texts. Dictionary translation, however, is a labour-intensive procedure. In this paper, we investigate an alternative approach: using Machine Translation (MT) to translate the texts to be analysed into English, and then analysing them with the English dictionary. We test several LIWC versions, languages and MT engines, and consistently find that the machine-translated text approach performs better than the translated-dictionary approach. We argue that for languages for which effective MT technology is available, there is no need to create new LIWC dictionary translations.

