Semantic Measures: Using Natural Language Processing to Measure, Differentiate and Describe Psychological Constructs

2018 ◽  
Author(s):  
Oscar Nils Erik Kjell ◽  
Katarina Kjell ◽  
Danilo Garcia ◽  
Sverker Sikström

Psychological constructs, such as emotions, thoughts and attitudes, are often measured by asking individuals to reply to questions using closed-ended numerical rating scales. However, when asking people about their state of mind in a natural context (“How are you?”), we receive open-ended answers using words (“fine and happy!”) and not closed-ended answers using numbers (“7”) or categories (“A lot”). Nevertheless, to date it has been difficult to objectively quantify responses to open-ended questions. We develop an approach using open-ended questions in which the responses are analyzed using natural language processing (Latent Semantic Analysis). This approach of using open-ended, semantic questions is compared with traditional rating scales in nine studies (N=92-854), including two different study paradigms. The first paradigm requires participants to describe psychological aspects of external stimuli (facial expressions) and the second paradigm involves asking participants to report their subjective well-being and mental health problems. The results demonstrate that the approach using semantic questions yields good statistical properties with competitive, or higher, validity and reliability compared with corresponding numerical rating scales. As these semantic measures are based on natural language and measure, differentiate and describe psychological constructs, they have the potential of complementing and extending traditional rating scales.
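The idea of scoring an open-ended answer against a construct can be illustrated with a minimal sketch. It assumes a plain word-count representation and cosine similarity, which is a simplification and not the Latent Semantic Analysis used in the paper (LSA would first map words into a reduced semantic space). The norm words, the responses, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count its alphabetic word tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


# Hypothetical reference ("norm") words for a construct, not taken from the paper.
happy_norm = word_counts("happy fine joyful content glad")

# Two hypothetical open-ended answers to "How are you?".
response_a = word_counts("fine and happy!")
response_b = word_counts("tired and worried")

score_a = cosine_similarity(response_a, happy_norm)
score_b = cosine_similarity(response_b, happy_norm)
print(score_a > score_b)  # the first answer is closer to the norm words
```

With word counts alone, an answer scores zero unless it shares exact words with the norm; a semantic space such as LSA is what lets related but non-identical words contribute.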

2018 ◽  
Author(s):  
Oscar Nils Erik Kjell

How to define and measure individuals’ well-being is important, as this has an impact on both research and society at large. This thesis concerns how to define and measure the self-reported well-being of individuals, which involves both theorizing as well as developing and applying empirical and statistical methods in order to gain a better understanding of well-being. The first paper critically reviews the literature on well-being. It identifies an individualistic bias in current approaches and accompanying measures related to well-being and happiness; for example, through an over-emphasis on the importance of self-centered aspects of well-being (e.g., the unprecedented focus on satisfaction with life) whilst disregarding the importance of harmony in life, interconnectedness and psychological balance in relation to well-being. It is also discussed how closed-ended well-being measures impose the researchers’ values and limit the ability of respondents to express themselves in regard to their perceived well-being. The second paper addresses concerns regarding this individualistic bias by developing the harmony in life scale, which focuses on interconnectedness and psychological balance. In addition, an open-ended approach is developed in the paper, allowing individuals to freely describe their pursuit of well-being by means of open-ended responses analyzed using statistical semantics (including techniques from artificial intelligence such as natural language processing and machine learning). The results show that the harmony in life scale and the traditional satisfaction with life scale form a two-factor model of well-being, where the harmony in life scale explains more unique variance in measures of psychological well-being, stress, depression and anxiety, but not happiness. 
It is further demonstrated that participants describe their pursuit of harmony in life using words related to interconnectedness (including words such as: peace, balance, cooperation), whereas they describe their pursuit of satisfaction with life using words related to independence (including words such as: money, achievement, fulfillment). It is concluded that the harmony in life scale complements the satisfaction with life scale for a more comprehensive understanding of subjective well-being. The third paper focuses on developing and evaluating a method for measuring and describing psychological constructs using open-ended questions analyzed by means of statistical semantics rather than closed-ended numerical rating scales. This semantic measures approach is tested and compared with traditional rating scales in nine studies, including two different paradigms involving reports regarding objective stimuli (i.e., the evaluation of facial expressions) and reports regarding subjective states (i.e., the self-reporting of harmony in life, satisfaction with life, depression and worry). The results indicate that semantic measures encompass higher, or competitive, levels of reliability and validity compared to traditional numerical rating scales. In addition, semantic measures appear to be better suited for differentiating between psychological constructs, such as harmony in life versus satisfaction with life as well as depression versus worry. In this thesis, the findings from these three papers are elaborated and integrated into two independent perspectives. The first perspective focuses on the theoretical and empirical differences between harmony in life and satisfaction with life within a context of societal and national progress. It is concluded that harmony in life complements satisfaction with life. The second perspective focuses on the open-ended, statistical semantics approach. 
It is proposed that statistical semantics may beneficially be used more widely as a research tool within psychological research.


2020 ◽  
Author(s):  
Daniel Mark Low ◽  
Laurie Rumker ◽  
Tanya Talkar ◽  
John Torous ◽  
Guillermo Cecchi ◽  
...  

Background: The COVID-19 pandemic is exerting a devastating impact on mental health, but it is not clear how people with different types of mental health problems were differentially impacted as the initial wave of cases hit. Objective: We leverage natural language processing (NLP) with the goal of characterizing changes in fifteen of the world's largest mental health support groups (e.g., r/schizophrenia, r/SuicideWatch, r/Depression) found on the website Reddit, along with eleven non-mental health groups (e.g., r/PersonalFinance, r/conspiracy) during the initial stage of the pandemic. Methods: We create and release the Reddit Mental Health Dataset including posts from 826,961 unique users from 2018 to 2020. Using regression, we analyze trends from 90 text-derived features such as sentiment analysis, personal pronouns, and a “guns” semantic category. Using supervised machine learning, we classify posts into their respective support group and interpret important features to understand how different problems manifest in language. We apply unsupervised methods such as topic modeling and unsupervised clustering to uncover concerns throughout Reddit before and during the pandemic. Results: We find that the r/HealthAnxiety forum showed spikes in posts about COVID-19 early on in January, approximately two months before other support groups started posting about the pandemic. There were many features that significantly increased during COVID-19 for specific groups including the categories “economic stress”, “isolation”, and “home” while others such as “motion” significantly decreased. We find that support groups related to attention deficit hyperactivity disorder (ADHD), eating disorders (ED), and anxiety showed the most negative semantic change during the pandemic out of all mental health groups. Health anxiety emerged as a general theme across Reddit through independent supervised and unsupervised machine learning analyses. 
For instance, we provide evidence that the concerns of a diverse set of individuals are converging in this unique moment of history; we discover that the more users posted about COVID-19, the more linguistically similar (less distant) the mental health support groups became to r/HealthAnxiety (ρ = -0.96, P<.001). Using unsupervised clustering, we find the Suicidality and Loneliness clusters more than doubled in amount of posts during the pandemic. Specifically, the support groups for borderline personality disorder and post-traumatic stress disorder became significantly associated with the Suicidality cluster. Furthermore, clusters surrounding Self-Harm and Entertainment emerged. Conclusions: By using a broad set of NLP techniques and analyzing a baseline of pre-pandemic posts, we uncover patterns of how specific mental health problems manifest in language, identify at-risk users, and reveal the distribution of concerns across Reddit which could help provide better resources to its millions of users. We then demonstrate that textual analysis is sensitive to uncover mental health complaints as they arise in real time, identifying vulnerable groups and alarming themes during COVID-19, and thus may have utility during the ongoing pandemic and other world-changing events such as elections and protests from the present or the past.
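The reported ρ is a rank correlation between how much a group posted about COVID-19 and its linguistic distance to r/HealthAnxiety. A minimal sketch of a Spearman rank correlation follows, assuming average ranks for ties. The five data points are hypothetical toy values, not figures from the study, and the function names are illustrative.

```python
import math


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(ranks(x), ranks(y))


# Hypothetical toy data: share of posts mentioning COVID-19 per time window,
# and distance between a support group and r/HealthAnxiety in that window.
covid_share = [0.00, 0.01, 0.03, 0.08, 0.15]
distance = [0.92, 0.90, 0.85, 0.80, 0.71]

rho = spearman_rho(covid_share, distance)
print(round(rho, 2))  # strongly negative: more COVID-19 posts, smaller distance
```

A negative ρ here means distance shrinks as COVID-19 posting grows, which is the direction of the association reported in the abstract.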


2021 ◽  
Author(s):  
Taishiro Kishimoto ◽  
Hironobu Nakamura ◽  
Yoshinobu Kano ◽  
Yoko Eguchi ◽  
Momoko Kitazawa ◽  
...  

Introduction: Psychiatric disorders are diagnosed according to diagnostic criteria such as the DSM-5 and ICD-11. Basically, psychiatrists extract symptoms and make a diagnosis by conversing with patients. However, such processes often lack objectivity. In contrast, specific linguistic features can be observed in some psychiatric disorders, such as a loosening of associations in schizophrenia. The purposes of the present study are to quantify the language features of psychiatric disorders and neurocognitive disorders using natural language processing and to identify features that differentiate disorders from one another and from healthy subjects. Methods: This study will have a multi-center prospective design. Participants with major depressive disorder, bipolar disorder, schizophrenia, anxiety disorder (including obsessive compulsive disorder), and major and minor neurocognitive disorders, as well as healthy subjects, will be recruited. A psychiatrist or psychologist will conduct 30-to-60-min interviews with each participant and these interviews will be recorded using a microphone headset. In addition, the severity of disorders will be assessed using clinical rating scales. Data will be collected from each participant at least twice during the study period and up to a maximum of five times. Discussion: The overall goal of this proposed study, the Understanding Psychiatric Illness Through Natural Language Processing (UNDERPIN), is to develop objective and easy-to-use biomarkers for diagnosing and assessing the severity of each psychiatric disorder using natural language processing. As of August 2021, we have collected a total of >900 datasets from >350 participants. To the best of our knowledge, this data sample is one of the largest in this field. Trial registration: UMIN000032141, University Hospital Medical Information Network (UMIN).


10.2196/22635 ◽  
2020 ◽  
Vol 22 (10) ◽  
pp. e22635 ◽  
Author(s):  
Daniel M Low ◽  
Laurie Rumker ◽  
Tanya Talkar ◽  
John Torous ◽  
Guillermo Cecchi ◽  
...  

Background The COVID-19 pandemic is impacting mental health, but it is not clear how people with different types of mental health problems were differentially impacted as the initial wave of cases hit. Objective The aim of this study is to leverage natural language processing (NLP) with the goal of characterizing changes in 15 of the world’s largest mental health support groups (eg, r/schizophrenia, r/SuicideWatch, r/Depression) found on the website Reddit, along with 11 non–mental health groups (eg, r/PersonalFinance, r/conspiracy) during the initial stage of the pandemic. Methods We created and released the Reddit Mental Health Dataset including posts from 826,961 unique users from 2018 to 2020. Using regression, we analyzed trends from 90 text-derived features such as sentiment analysis, personal pronouns, and semantic categories. Using supervised machine learning, we classified posts into their respective support groups and interpreted important features to understand how different problems manifest in language. We applied unsupervised methods such as topic modeling and unsupervised clustering to uncover concerns throughout Reddit before and during the pandemic. Results We found that the r/HealthAnxiety forum showed spikes in posts about COVID-19 early on in January, approximately 2 months before other support groups started posting about the pandemic. There were many features that significantly increased during COVID-19 for specific groups including the categories “economic stress,” “isolation,” and “home,” while others such as “motion” significantly decreased. We found that support groups related to attention-deficit/hyperactivity disorder, eating disorders, and anxiety showed the most negative semantic change during the pandemic out of all mental health groups. Health anxiety emerged as a general theme across Reddit through independent supervised and unsupervised machine learning analyses. 
For instance, we provide evidence that the concerns of a diverse set of individuals are converging in this unique moment of history; we discovered that the more users posted about COVID-19, the more linguistically similar (less distant) the mental health support groups became to r/HealthAnxiety (ρ=–0.96, P<.001). Using unsupervised clustering, we found the suicidality and loneliness clusters more than doubled in the number of posts during the pandemic. Specifically, the support groups for borderline personality disorder and posttraumatic stress disorder became significantly associated with the suicidality cluster. Furthermore, clusters surrounding self-harm and entertainment emerged. Conclusions By using a broad set of NLP techniques and analyzing a baseline of prepandemic posts, we uncovered patterns of how specific mental health problems manifest in language, identified at-risk users, and revealed the distribution of concerns across Reddit, which could help provide better resources to its millions of users. We then demonstrated that textual analysis is sensitive to uncover mental health complaints as they appear in real time, identifying vulnerable groups and alarming themes during COVID-19, and thus may have utility during the ongoing pandemic and other world-changing events such as elections and protests.


2019 ◽  
Vol 24 (1) ◽  
pp. 92-115 ◽  
Author(s):  
Oscar N. E. Kjell ◽  
Katarina Kjell ◽  
Danilo Garcia ◽  
Sverker Sikström

2021 ◽  
Vol 12 ◽  
Author(s):  
Juan Antonio Lossio-Ventura ◽  
Angela Yuson Lee ◽  
Jeffrey T. Hancock ◽  
Natalia Linos ◽  
Eleni Linos

COVID-19 has presented an unprecedented challenge to human welfare. Indeed, we have witnessed people experiencing a rise of depression, acute stress disorder, and worsening levels of subclinical psychological distress. Finding ways to support individuals' mental health has been particularly difficult during this pandemic. An opportunity for intervention to protect individuals' health and well-being is to identify the existing sources of consolation and hope that have helped people persevere through the early days of the pandemic. In this paper, we identified positive aspects, or “silver linings,” that people experienced during the COVID-19 crisis using computational natural language processing methods and qualitative thematic content analysis. These silver linings revealed sources of strength that included finding a sense of community, closeness, gratitude, and a belief that the pandemic may spur positive social change. People's abilities to engage in benefit-finding and leverage protective factors can be bolstered and reinforced by public health policy to improve society's resilience to the distress of this pandemic and potential future health crises.


2020 ◽  
Author(s):  
Esra Kahya Özyirmidokuz ◽  
Kumru Uyar ◽  
Raian Ali ◽  
Eduard Alexandru Stoica ◽  
Betül Karakaş

BACKGROUND Measuring online Turkish happiness requires a Turkish happiness dictionary that reflects cultural and linguistic norms and social values, rather than a translation-oriented method. Analyzing data while neglecting cultural characteristics will not be reliable. The Turkish translation of an English word in the Affective Norms for English Words (ANEW) dictionary does not express the same feeling as the Turkish word. In addition, existing emotional dictionaries are not developed specifically for social networks with emoticons. OBJECTIVE This research presents the Turkish Happiness Index (THI), a set of psychological normative happiness scores for measuring the average level of happiness in large-scale unstructured online Turkish data. A well-being informatics analysis is also carried out using the THI. METHODS The Turkish Happiness Index was generated entirely from social networks. 20,000 words were extracted from social networks with web text mining. Natural language processing algorithms were applied. After data reduction, a quantitative research methodology was applied. The happiness scores were determined based on 667 participants’ subjective happiness levels and their thoughts about the 1874 Turkish words. An alexithymia scale was also used to identify the emotional awareness of the participants. The words were evaluated on the valence dimension using the Self-Assessment Manikin on an online platform. NLP was used to measure the happiness of online Turkish data. Data were collected from Facebook with the negative #war and positive #family hashtags over one month using a third-party software tool. After converting the data to documents, natural language processing algorithms including tokenization, transformation, filtering and stemming were applied. The happiness levels of the documents based on hashtags were determined using the Turkish Happiness Index dictionary. 
RESULTS The THI, which contains 345 words and their happiness scores in the Turkish language, was developed. The THI is given in Appendix 1. We also compare words across dictionaries to understand the cultural differences. CONCLUSIONS The THI provides researchers with standard materials through which they can automatically measure the online happiness of large-scale Turkish data. The THI can be used in real-time big data analytics.
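The dictionary-based scoring step described in the methods can be sketched as follows. This is a simplified illustration, assuming tokenization and stop-word filtering only (stemming is omitted) and averaging the scores of matched words. The index entries, their scores on a 1 to 9 valence scale, the stop-word list, and the example posts are all hypothetical and are not taken from the actual THI.

```python
import re

# Hypothetical valence scores (1 = unhappy, 9 = happy); not the real THI values.
toy_index = {"aile": 8.0, "mutlu": 8.5, "savaş": 1.5}

# Hypothetical stop-word list used for the filtering step.
stop_words = {"ve", "bir", "bu"}


def tokenize(text):
    """Split text into lowercase word tokens; '#' and punctuation are dropped."""
    return re.findall(r"\w+", text.lower())


def happiness_score(text, index, stop_words):
    """Average the index scores of the words in the text that appear in the index.

    Returns None when no word of the text is covered by the index.
    """
    tokens = [t for t in tokenize(text) if t not in stop_words]
    scores = [index[t] for t in tokens if t in index]
    if not scores:
        return None
    return sum(scores) / len(scores)


# Hypothetical posts collected under a positive and a negative hashtag.
family_post = "#aile ve mutlu bir gün"
war_post = "bu #savaş"

print(happiness_score(family_post, toy_index, stop_words))  # mean of 8.0 and 8.5
print(happiness_score(war_post, toy_index, stop_words))     # single match: 1.5
```

Returning None for uncovered text keeps documents with no dictionary words from being counted as neutral, which matters when the dictionary is small relative to the vocabulary.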


2021 ◽  
Vol 3 ◽  
Author(s):  
Aurelie Mascio ◽  
Robert Stewart ◽  
Riley Botelle ◽  
Marcus Williams ◽  
Luwaiza Mirza ◽  
...  

Background: Cognitive impairments are a neglected aspect of schizophrenia despite being a major factor of poor functional outcome. They are usually measured using various rating scales; however, these necessitate trained practitioners and are rarely routinely applied in clinical settings. Recent advances in natural language processing techniques allow us to extract such information from unstructured portions of text at a large scale and in a cost effective manner. We aimed to identify cognitive problems in the clinical records of a large sample of patients with schizophrenia, and assess their association with clinical outcomes. Methods: We developed a natural language processing based application identifying cognitive dysfunctions from the free text of medical records, and assessed its performance against a rating scale widely used in the United Kingdom, the cognitive component of the Health of the Nation Outcome Scales (HoNOS). Furthermore, we analyzed cognitive trajectories over the course of patient treatment, and evaluated their relationship with various socio-demographic factors and clinical outcomes. Results: We found a high prevalence of cognitive impairments in patients with schizophrenia, and a strong correlation with several socio-demographic factors (gender, education, ethnicity, marital status, and employment) as well as adverse clinical outcomes. Results obtained from the free text were broadly in line with those obtained using the HoNOS subscale, and shed light on additional associations, notably related to attention and social impairments for patients with higher education. Conclusions: Our findings demonstrate that cognitive problems are common in patients with schizophrenia, can be reliably extracted from clinical records using natural language processing, and are associated with adverse clinical outcomes. 
Harvesting the free text from medical records provides a larger coverage in contrast to neurocognitive batteries or rating scales, and access to additional socio-demographic and clinical variables. Text mining tools can therefore facilitate large scale patient screening and early symptoms detection, and ultimately help inform clinical decisions.


2020 ◽  
Author(s):  
Daniel M Low ◽  
Laurie Rumker ◽  
Tanya Talkar ◽  
John Torous ◽  
Guillermo Cecchi ◽  
...  

BACKGROUND The COVID-19 pandemic is impacting mental health, but it is not clear how people with different types of mental health problems were differentially impacted as the initial wave of cases hit. OBJECTIVE The aim of this study is to leverage natural language processing (NLP) with the goal of characterizing changes in 15 of the world’s largest mental health support groups (eg, r/schizophrenia, r/SuicideWatch, r/Depression) found on the website Reddit, along with 11 non–mental health groups (eg, r/PersonalFinance, r/conspiracy) during the initial stage of the pandemic. METHODS We created and released the Reddit Mental Health Dataset including posts from 826,961 unique users from 2018 to 2020. Using regression, we analyzed trends from 90 text-derived features such as sentiment analysis, personal pronouns, and semantic categories. Using supervised machine learning, we classified posts into their respective support groups and interpreted important features to understand how different problems manifest in language. We applied unsupervised methods such as topic modeling and unsupervised clustering to uncover concerns throughout Reddit before and during the pandemic. RESULTS We found that the r/HealthAnxiety forum showed spikes in posts about COVID-19 early on in January, approximately 2 months before other support groups started posting about the pandemic. There were many features that significantly increased during COVID-19 for specific groups including the categories “economic stress,” “isolation,” and “home,” while others such as “motion” significantly decreased. We found that support groups related to attention-deficit/hyperactivity disorder, eating disorders, and anxiety showed the most negative semantic change during the pandemic out of all mental health groups. Health anxiety emerged as a general theme across Reddit through independent supervised and unsupervised machine learning analyses. 
For instance, we provide evidence that the concerns of a diverse set of individuals are converging in this unique moment of history; we discovered that the more users posted about COVID-19, the more linguistically similar (less distant) the mental health support groups became to r/HealthAnxiety (ρ=–0.96, P<.001). Using unsupervised clustering, we found the suicidality and loneliness clusters more than doubled in the number of posts during the pandemic. Specifically, the support groups for borderline personality disorder and posttraumatic stress disorder became significantly associated with the suicidality cluster. Furthermore, clusters surrounding self-harm and entertainment emerged. CONCLUSIONS By using a broad set of NLP techniques and analyzing a baseline of prepandemic posts, we uncovered patterns of how specific mental health problems manifest in language, identified at-risk users, and revealed the distribution of concerns across Reddit, which could help provide better resources to its millions of users. We then demonstrated that textual analysis is sensitive to uncover mental health complaints as they appear in real time, identifying vulnerable groups and alarming themes during COVID-19, and thus may have utility during the ongoing pandemic and other world-changing events such as elections and protests.

