Multiple-Choice Question Answering Models for Automatic Depression Severity Estimation

2021 ◽  
Vol 7 (1) ◽  
pp. 23
Author(s):  
Jorge Gabín ◽  
Anxo Pérez ◽  
Javier Parapar

Depression is one of the most prevalent mental health disorders. Although there are effective treatments, the main challenge lies in providing early and effective risk detection. Medical experts use self-report questionnaires to elaborate their diagnoses, but these questionnaires have some limitations. Social stigmas and a lack of awareness often negatively affect the success of these self-report questionnaires. This article describes techniques to automatically estimate depression severity from users' writings on social media. We explored the use of pre-trained language models over the subject’s writings. We addressed the task “Measuring the Severity of the Signs of Depression” of eRisk 2020, an initiative within the CLEF Conference. In this task, participants have to fill in the Beck Depression Inventory-II (BDI-II). Our proposal explores the application of pre-trained Multiple-Choice Question Answering (MCQA) models to predict users’ answers to the BDI-II questionnaire from their posts on social media. These MCQA models are built on the BERT (Bidirectional Encoder Representations from Transformers) architecture. Our results showed that multiple-choice question answering models can be a suitable alternative for estimating depression severity, even when only small amounts of training data are available (20 users).
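For a concrete picture of the MCQA formulation, the sketch below scores the options of one BDI-II item against a user's writings with a multiple-choice head from the Hugging Face Transformers library. The checkpoint name, the example text, and the item options are placeholders, and the newly initialized head would need fine-tuning on eRisk-style data before its predictions mean anything; this illustrates the general setup, not the authors' exact pipeline.

```python
# Minimal sketch (not the authors' pipeline): score the options of one BDI-II item
# with a BERT multiple-choice head. Checkpoint, text, and options are placeholders.
import torch
from transformers import AutoTokenizer, AutoModelForMultipleChoice

model_name = "bert-base-uncased"  # assumption: any BERT-style checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Note: the multiple-choice head is randomly initialized here and must be fine-tuned.
model = AutoModelForMultipleChoice.from_pretrained(model_name)

user_text = "Lately I can't sleep and nothing feels worth doing."  # excerpt from posts
options = [
    "I do not feel sad.",
    "I feel sad much of the time.",
    "I am sad all the time.",
    "I am so sad or unhappy that I can't stand it.",
]

# Pair the same context with each option; inputs get shape (1, num_choices, seq_len).
encoding = tokenizer(
    [user_text] * len(options),
    options,
    padding=True,
    truncation=True,
    return_tensors="pt",
)
inputs = {k: v.unsqueeze(0) for k, v in encoding.items()}

with torch.no_grad():
    logits = model(**inputs).logits  # shape (1, num_choices)
predicted_option = int(logits.argmax(dim=-1))
print(f"Predicted BDI-II option index: {predicted_option}")
```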

2021 ◽  
Author(s):  
Anxo Pérez ◽  
Javier Parapar ◽  
Alvaro Barreiro

BACKGROUND: Depression is one of the most common mental health illnesses. Although effective treatments exist, the biggest obstacle lies in efficient and early detection of the disorder. Self-report questionnaires are the instruments medical experts use to elaborate a diagnosis. However, these questionnaires have certain limitations: factors such as a lack of awareness and social stigmas negatively affect their success. In this context, social media platforms provide an indirect means of communication that can serve as a source of evidence for detecting patients at risk.

OBJECTIVE: This paper describes techniques to automatically estimate the degree of depression of users on social media. We explore neural language models that exploit various aspects of the subject's writings. Our proposals focus on automatically completing the Beck Depression Inventory-II (BDI-II), a validated psychometric test consisting of 21 items, each associated with a different symptom of depression.

METHODS: We present three approaches for automatically filling in the BDI-II questionnaire based on neural language models. The first captures the overall use of language and communication patterns evidenced by individuals. The second narrows the user's representation by using only answers to the BDI-II items extracted from their posts; for that, we use state-of-the-art Question Answering models based on bidirectional encoder representations. Finally, we propose a mixed model that selects, per item, whether to fill it in using the first or the second model. The rationale is that users readily comment on some items in their texts, which makes the second method appropriate, whereas for more private or sensitive items the first method is the better alternative, since users avoid writing about them explicitly.

RESULTS: We addressed the task "Measuring the Severity of the Signs of Depression" of eRisk 2020, an initiative within the CLEF Conference. In this task, participants have to fill in the BDI-II for the collection delivered by the task. We measured our results using the same accuracy metrics proposed by the competition and compared them with the other 17 methods presented by participants. Our proposals outperformed almost all participants on every official metric.

CONCLUSIONS: Our results showed that techniques based on neural language models are a feasible alternative for estimating depression rating scales, even when only small amounts of training data are available (20 users). We observe that, depending on the symptom, it is more appropriate either to rely on general language patterns or to look for direct mentions of that particular symptom. In summary, the results of this study demonstrate the potential of automatic text-mining models to serve as a tool that helps diagnose depression.
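A minimal sketch of the extraction-based ingredient of the mixed model is given below, assuming an off-the-shelf extractive QA checkpoint, a made-up item question, and a word-overlap heuristic for mapping the extracted span to a BDI-II option; the authors' actual answer-mapping strategy is not reproduced here.

```python
# Illustrative sketch (not the authors' code): for items that users tend to mention
# explicitly, extract a span with an extractive QA model and map it to the closest
# BDI-II option by word overlap. Item question and options are assumptions.
from transformers import pipeline

qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

ITEM_QUESTIONS = {16: "How has your sleeping pattern changed?"}
ITEM_OPTIONS = {
    16: [
        "I have not experienced any change in my sleeping pattern.",
        "I sleep somewhat more or less than usual.",
        "I sleep a lot more or a lot less than usual.",
        "I sleep most of the day or wake up early and can't get back to sleep.",
    ]
}

def qa_answer_item(item_id: int, user_posts: str) -> int:
    """Extract a span about the item, then pick the option with most word overlap."""
    span = qa(question=ITEM_QUESTIONS[item_id], context=user_posts)["answer"].lower()
    overlaps = [
        len(set(span.split()) & set(opt.lower().split()))
        for opt in ITEM_OPTIONS[item_id]
    ]
    return max(range(len(overlaps)), key=overlaps.__getitem__)

posts = "These days I barely sleep, I wake up at 4am and can't get back to sleep."
print(qa_answer_item(16, posts))  # -> index of the closest option
```

For items where no explicit answer is expected in the texts, the mixed model would instead defer to the first approach, which models the user's overall language.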


2020 ◽  
Vol 34 (05) ◽  
pp. 8082-8090
Author(s):  
Tushar Khot ◽  
Peter Clark ◽  
Michal Guerquin ◽  
Peter Jansen ◽  
Ashish Sabharwal

Composing knowledge from multiple pieces of text is a key challenge in multi-hop question answering. We present a multi-hop reasoning dataset, Question Answering via Sentence Composition (QASC), that requires retrieving facts from a large corpus and composing them to answer a multiple-choice question. QASC is the first dataset to offer two desirable properties: (a) the facts to be composed are annotated in a large corpus, and (b) the decomposition into these facts is not evident from the question itself. The latter makes retrieval challenging, as the system must introduce new concepts or relations in order to discover potential decompositions. Further, the reasoning model must then learn to identify valid compositions of these retrieved facts using common-sense reasoning. To help address these challenges, we provide annotation for supporting facts as well as their composition. Guided by these annotations, we present a two-step approach to mitigate the retrieval challenges. We use other multiple-choice datasets as additional training data to strengthen the reasoning model. Our proposed approach improves over current state-of-the-art language models by 11% (absolute). The reasoning and retrieval problems, however, remain unsolved, as this model still lags human performance by 20%.
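To make the two-step retrieval idea concrete, the toy sketch below re-queries a tiny made-up corpus with the new terms introduced by the first retrieved fact, so that concepts absent from the question can surface in the second retrieval step; it is a deliberate simplification using TF-IDF, not the QASC authors' retriever.

```python
# Toy two-step retrieval for fact composition (simplified, not the QASC system).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Tiny made-up corpus standing in for the large QASC fact corpus.
corpus = [
    "differential heating of air produces wind",
    "wind is used for producing electricity by wind turbines",
    "plants use sunlight to make their own food",
]
vectorizer = TfidfVectorizer(stop_words="english").fit(corpus)
doc_vecs = vectorizer.transform(corpus)

def retrieve(query: str, exclude: str = "") -> str:
    """Return the most similar fact to the query, skipping an already-used fact."""
    sims = cosine_similarity(vectorizer.transform([query]), doc_vecs)[0]
    ranked = sorted(range(len(corpus)), key=lambda i: -sims[i])
    return next(corpus[i] for i in ranked if corpus[i] != exclude)

question = "What can differential heating of air be harnessed to produce?"
fact1 = retrieve(question)
# Step 2: re-query with the concepts fact1 introduced (e.g. "wind"), which the
# question alone never mentions.
new_terms = set(fact1.split()) - set(question.lower().split())
fact2 = retrieve(question + " " + " ".join(new_terms), exclude=fact1)
print(fact1)
print(fact2)
```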


2021 ◽  
Vol 2021 ◽  
pp. 1-17
Author(s):  
Changchang Zeng ◽  
Shaobo Li

Machine reading comprehension (MRC) is a challenging natural language processing (NLP) task with wide application potential in areas such as question-answering robots and human-computer interaction in mobile virtual reality systems. Recently, the emergence of pretrained models (PTMs) has brought this research field into a new era, in which the training objective plays a key role. The masked language model (MLM) is a self-supervised training objective widely used in various PTMs. As training objectives have evolved, many variants of MLM have been proposed, such as whole-word masking, entity masking, phrase masking, and span masking, which differ in the length of the masked spans. Similarly, MRC tasks differ in the length of their answers, which may be a word, a phrase, or a sentence. Thus, for MRC tasks with different answer lengths, it is worth studying whether the masking length used in MLM pretraining is related to downstream performance. If this hypothesis holds, it can guide us in pretraining an MLM with a masking length distribution suited to a given MRC task. In this paper, we try to uncover how much of MLM’s success on machine reading comprehension tasks comes from the correlation between the masking length distribution and the answer length distribution of the MRC dataset. To address this question, (1) we propose four MRC tasks with different answer length distributions, namely the short span extraction task, the long span extraction task, the short multiple-choice cloze task, and the long multiple-choice cloze task; (2) we create four Chinese MRC datasets for these tasks; (3) we pretrain four masked language models according to the answer length distributions of these datasets; and (4) we conduct ablation experiments on the datasets to verify our hypothesis. The experimental results support the hypothesis: on all four machine reading comprehension datasets, the model whose masking length distribution matches the answer length distribution outperforms the model without such a correlation.
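The following sketch, under the assumption of a simple geometric span-length distribution, illustrates the knob the paper varies: the same masking budget can be spent on single tokens (word-like answers) or on longer spans (phrase- or sentence-like answers). The token strings and rates are placeholders, not the paper's pretraining configuration.

```python
# Sketch of span masking with a configurable span-length distribution.
import numpy as np

def mask_spans(tokens, mask_token="[MASK]", mask_rate=0.15, mean_span_len=3.0):
    """Mask roughly mask_rate of the tokens using spans of geometric length."""
    tokens = list(tokens)
    rng = np.random.default_rng()
    budget = max(1, round(mask_rate * len(tokens)))
    masked = set()
    while budget > 0:
        # Larger mean_span_len -> masked spans closer to phrase/sentence answers.
        span_len = int(min(budget, rng.geometric(1.0 / mean_span_len)))
        start = int(rng.integers(0, len(tokens) - span_len + 1))
        span = range(start, start + span_len)
        if masked.isdisjoint(span):
            masked.update(span)
            budget -= span_len
    return [mask_token if i in masked else tok for i, tok in enumerate(tokens)]

sentence = ("machine reading comprehension requires a model to answer questions "
            "about a given passage of text").split()
print(mask_spans(sentence, mean_span_len=1.0))  # single-token (word-like) masking
print(mask_spans(sentence, mean_span_len=3.0))  # longer spans, closer to phrase answers
```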


AI Magazine ◽  
2020 ◽  
Vol 41 (4) ◽  
pp. 39-53
Author(s):  
Peter Clark ◽  
Oren Etzioni ◽  
Tushar Khot ◽  
Daniel Khashabi ◽  
Bhavana Mishra ◽  
...  

AI has achieved remarkable mastery over games such as Chess, Go, and Poker, and even Jeopardy!, but the rich variety of standardized exams has remained a landmark challenge. Even as recently as 2016, the best AI system could achieve merely 59.3 percent on an 8th-grade science exam. This article reports success on the Grade 8 New York Regents Science Exam, where for the first time a system scores more than 90 percent on the exam’s non-diagram, multiple-choice (NDMC) questions. In addition, our Aristo system, building upon the success of recent language models, exceeded 83 percent on the corresponding Grade 12 Science Exam NDMC questions. The results, on unseen test questions, are robust across different test years and different variations of this kind of test. They demonstrate that modern natural language processing methods can achieve mastery of this task. While not a full solution to general question answering (the questions are limited to 8th-grade multiple-choice science), this represents a significant milestone for the field.

