Learning to Classify the Wrong Answers for Multiple Choice Question Answering (Student Abstract)

2020
Vol 34 (10)
pp. 13843-13844
Author(s):
Hyeondey Kim
Pascale Fung

Multiple-Choice Question Answering (MCQA) is the most challenging area of Machine Reading Comprehension (MRC) and Question Answering (QA), since it requires not only natural language understanding but also problem-solving techniques. We propose a novel method, Wrong Answer Ensemble (WAE), which can easily be applied to various MCQA tasks. When solving MCQA problems, humans intuitively exclude unlikely options. Mimicking this strategy, we train our model with both a wrong-answer loss and a correct-answer loss, so that the model generalizes better and learns to exclude plausible but wrong options. An experiment on a dialogue-based examination dataset shows the effectiveness of our approach: our method improves the results of a fine-tuned transformer by 2.7%.
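The abstract does not spell out the exact form of the two losses, but the idea of pairing a correct-answer loss with a wrong-answer penalty can be sketched directly. Below is a minimal, hypothetical PyTorch version: cross-entropy toward the key plus a per-option binary term that pushes each distractor's score down. The `alpha` weight and the BCE form of the wrong-answer term are assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def wae_loss(logits, correct_idx, alpha=0.5):
    """Correct-answer loss plus a wrong-answer penalty (hypothetical form).

    logits:      (batch, num_options) option scores from an MCQA model.
    correct_idx: (batch,) index of the key for each question.
    alpha:       assumed weighting between the two terms.
    """
    # Correct-answer loss: standard cross-entropy toward the key.
    ca_loss = F.cross_entropy(logits, correct_idx)

    # Wrong-answer loss: score each option as a binary decision and push
    # every distractor toward "incorrect", mimicking how humans rule
    # options out. Only the distractor terms are kept.
    targets = torch.zeros_like(logits)
    targets.scatter_(1, correct_idx.unsqueeze(1), 1.0)
    per_option = F.binary_cross_entropy_with_logits(
        logits, targets, reduction="none")
    wa_loss = (per_option * (1.0 - targets)).sum(dim=-1).mean()

    return ca_loss + alpha * wa_loss

# Example: 2 questions, 4 options each.
loss = wae_loss(torch.randn(2, 4), torch.tensor([1, 3]))
```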

2020
Author(s):
Ming Yan
Hao Zhang
Di Jin
Joey Tianyi Zhou

2021
Vol 2021
pp. 1-17
Author(s):
Changchang Zeng
Shaobo Li

Machine reading comprehension (MRC) is a challenging natural language processing (NLP) task with wide application potential in question-answering robots, human-computer interaction in mobile virtual reality systems, and related fields. Recently, the emergence of pretrained models (PTMs) has brought this research field into a new era, in which the training objective plays a key role. The masked language model (MLM) is a self-supervised training objective widely used in various PTMs. As training objectives have developed, many variants of MLM have been proposed, such as whole-word masking, entity masking, phrase masking, and span masking, each of which masks tokens of a different length. Similarly, the answer length varies across machine reading comprehension tasks: an answer is often a word, a phrase, or a sentence. Thus, for MRC tasks with different answer lengths, whether the masking length of the MLM is related to performance is a question worth studying. If this hypothesis holds, it can guide us in pretraining an MLM with a mask-length distribution suited to the target MRC task. In this paper, we try to uncover how much of MLM's success on machine reading comprehension tasks comes from the correlation between the masking length distribution and the answer lengths in the MRC dataset. To address this question, (1) we propose four MRC tasks with different answer-length distributions, namely the short span extraction, long span extraction, short multiple-choice cloze, and long multiple-choice cloze tasks; (2) we create four Chinese MRC datasets for these tasks; (3) we pretrain four masked language models according to the answer-length distributions of these datasets; and (4) we conduct ablation experiments on the datasets to verify our hypothesis. The experimental results confirm the hypothesis: on all four machine reading comprehension datasets, the model whose masking length distribution correlates with the answer length outperforms the model without such correlation.
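To make the hypothesis concrete, here is a minimal sketch (not the authors' code) of span masking in which the span-length distribution is a tunable parameter. The `span_lengths` and `weights` values are illustrative; matching them to the answer-length distribution of the target MRC dataset is exactly the knob the paper studies.

```python
import random

def span_mask(tokens, mask_token="[MASK]", mask_ratio=0.15,
              span_lengths=(1, 2, 3), weights=(0.6, 0.3, 0.1)):
    """Mask contiguous spans whose lengths follow a chosen distribution."""
    tokens = list(tokens)
    budget = max(1, int(len(tokens) * mask_ratio))  # tokens to mask
    masked = 0
    while masked < budget:
        # Sample a span length, then a start position for the span.
        length = random.choices(span_lengths, weights=weights)[0]
        start = random.randrange(max(1, len(tokens) - length + 1))
        for i in range(start, min(start + length, len(tokens))):
            tokens[i] = mask_token
        masked += length
    return tokens

print(span_mask("machine reading comprehension is a challenging "
                "natural language processing task".split()))
```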


2021
Vol 7 (1)
pp. 23
Author(s):
Jorge Gabín
Anxo Pérez
Javier Parapar

Depression is one of the most prevalent mental health disorders. Although effective treatments exist, the main challenge lies in providing early and effective risk detection. Medical experts use self-report questionnaires to inform their diagnoses, but these questionnaires have limitations: social stigma and a lack of awareness often undermine their success. This article describes techniques to automatically estimate depression severity from users' social media posts. We explored the use of pre-trained language models over subjects' writings, addressing the task "Measuring the Severity of the Signs of Depression" of eRisk 2020, an initiative within the CLEF conference. In this task, participants have to fill in the Beck Depression Inventory (BDI-II). Our proposal applies pre-trained Multiple-Choice Question Answering (MCQA) models, built on the BERT (Bidirectional Encoder Representations from Transformers) architecture, to predict users' answers to the BDI-II questionnaire from their social media posts. Our results showed that multiple-choice question answering models can be a suitable alternative for estimating depression severity, even when only small amounts of training data are available (20 users).
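As an illustration of the kind of model the authors describe, the sketch below scores the options of one BDI-II-style item with Hugging Face's multiple-choice head on top of BERT. The context and option texts are illustrative stand-ins, and the checkpoint `bert-base-uncased` is a generic choice rather than the authors' fine-tuned model.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMultipleChoice

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMultipleChoice.from_pretrained("bert-base-uncased")

# One BDI-II-style item: the user's writings as context, each option
# as a candidate answer. Texts here are illustrative stand-ins.
context = "Lately I struggle to get out of bed and nothing feels fun."
options = ["I do not feel sad.",
           "I feel sad much of the time.",
           "I am sad all the time.",
           "I am so sad I can't stand it."]

# Pair the context with every option; the model expects inputs shaped
# (batch, num_choices, seq_len), hence the unsqueeze.
enc = tokenizer([context] * len(options), options,
                return_tensors="pt", padding=True, truncation=True)
inputs = {k: v.unsqueeze(0) for k, v in enc.items()}

with torch.no_grad():
    logits = model(**inputs).logits       # (1, num_choices)
print("predicted option:", logits.argmax(dim=-1).item())
```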


Author(s):  
Gennaro Costagliola
Filomena Ferrucci
Vittorio Fuccella

Online testing, also known as Computer Assisted Assessment (CAA), is a sector of e-learning aimed at assessing learners' knowledge through e-learning means. In recent years, the means of knowledge evaluation have evolved to meet the need to evaluate large numbers of learners within tight time frames: objective tests, which can be assessed more rapidly, have gained greater weight in determining learners' results. The multiple-choice question type is extremely popular in objective tests since, among other advantages, a large number of tests based on it can easily be corrected automatically. These items are composed of a stem and a list of options. The stem is the text that states the question. The only correct answer is called the key, whilst the incorrect answers are called distractors (Woodford & Bancroft, 2005).
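The stem/key/distractor terminology maps naturally onto a small data structure. The following sketch (illustrative, not from the chapter) shows one way to represent an item and correct it automatically, which is the property that makes multiple-choice tests cheap to grade at scale.

```python
from dataclasses import dataclass

@dataclass
class MultipleChoiceItem:
    stem: str             # text that states the question
    options: list[str]    # all answer options shown to the learner
    key: int              # index of the only correct answer

    def distractors(self) -> list[str]:
        return [o for i, o in enumerate(self.options) if i != self.key]

    def score(self, chosen: int) -> bool:
        # Automatic correction: compare the learner's choice to the key.
        return chosen == self.key

item = MultipleChoiceItem(
    stem="Which protocol resolves IP addresses to MAC addresses?",
    options=["DNS", "ARP", "DHCP", "ICMP"],
    key=1,
)
print(item.score(1), item.distractors())
```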


2019
Vol 184 (9-10)
pp. 509-514
Author(s):
Ana Elizabeth Markelz
Alice Barsoumian
Heather Yun

Abstract

Introduction: There are many unique aspects to the practice of military Infectious Diseases (ID). The San Antonio Uniformed Services Health Consortium ID Fellowship is a combined Army and Air Force active duty program. Program leadership believed the ID military unique curriculum (MUC) was well integrated into the program; we sought to verify this assumption to guide the decision to formalize the ID MUC. This study describes our strategy for the refinement and implementation of an ID-specific MUC, assesses the fellow and faculty response to these changes, and provides an example for other programs to follow.

Methods: We identified important ID areas through lessons learned from personal military experience, data from the ID Army Knowledge Online e-mail consult service, input from military ID physicians, and the Army and Air Force ID consultants to the Surgeons General. The consultants provided feedback on perceived gaps, appropriateness, and strategy. Because of restrictions in available curricular time, we devised a three-pronged strategy for revision: adapt current curricular practices to include MUC content, develop new learning activities targeted at the key content areas, and sustain existing, effective MUC experiences. Learners were assessed by multiple-choice question correct-answer rate, performance during a simulation exercise, and a burn-rotation evaluation. Correct-answer rates were analyzed by level of training using the Mann-Whitney U test. Program assessment was conducted through anonymous feedback in midyear and end-of-year program evaluations.

Results: Twelve military unique ID content areas were identified; diseases of pandemic potential and blood-borne pathogen management were added after consultant input. Five experiences were adapted to include military content: the core and noon conference series, simulation exercises, multiple-choice quizzes, and infection control essay questions. A burn intensive care unit (ICU) rotation, a Transport Isolation System exercise, and a tour of trainee health facilities were the new learning activities introduced. The formal tropical medicine course, the course on infection prevention in the deployed environment, research opportunities, and participation in trainee health outbreak investigations were sustained activities. Ten fellows participated in the military-unique spaced-education multiple-choice question series; twenty-seven questions were attempted 814 times. Of the questions, 50.37% were answered correctly on the first attempt, increasing to 100% correct by the end of the activity. No difference was seen in the initial correct-answer rate between the four senior fellows (median 55% [IQR 49.75, 63.25]) and the six first-year fellows (median 44% [IQR 39.25, 53]) (p = 0.114). Six fellows participated in the simulated deployment scenario; no failure of material synthesis was noted during the exercise, and all fellows satisfied the stated objectives. One fellow successfully completed the piloted burn ICU rotation. Fellows and faculty reported high satisfaction with the new curriculum.

Conclusions: Military GME programs are required by Congress to address the unique aspects of military medicine. Senior fellows' performance on the spaced-interval multiple-choice quizzes did not differ from that of junior fellows, supporting our concern that the ID MUC needed to be enhanced. Enhancement of the MUC experience can be accomplished with minimal increases in curricular and faculty time.
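For readers unfamiliar with the statistic used here, the comparison of correct-answer rates by training level is a standard two-sample Mann-Whitney U test. A minimal SciPy sketch follows; the per-fellow rates are made up, since the abstract reports only medians and IQRs.

```python
from scipy.stats import mannwhitneyu

# Made-up first-attempt correct-answer rates (%); the abstract reports
# only medians and IQRs, not the per-fellow values.
senior = [49, 53, 58, 68]               # four senior fellows
first_year = [38, 40, 42, 46, 52, 54]   # six first-year fellows

stat, p = mannwhitneyu(senior, first_year, alternative="two-sided")
print(f"U = {stat}, p = {p:.3f}")
```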


2021
Vol 1 (1)
pp. 8-17
Author(s):  
Ersika Puspita Dani

Abstract. This study deals with students' difficulties in using participles in sentences. The purposes of the study were to find out whether or not the students had difficulties in using participles in sentences and to identify the types of difficulties they faced. The population of the study was the 2020/2021 Mechanic Otomotif (MO) students at SMK YAPIM Kabanjahe; in the sampling, every member of the population had an equal chance of being selected, and the total sample was 30 students. The instrument used to collect the data was a multiple-choice test, and the research applied a descriptive quantitative design. The reliability of the test, computed with the KR-21 formula, was 0.89, which means the test was very good. The findings showed that the students had difficulties in using participles: Present Participle (8.67%), Past Participle (9.34%), and Perfect Participle (9.33%). The Perfect Participle was regarded as the most difficult type, especially when used after certain verbs and when replacing a relative pronoun; it was followed by the Past Participle, especially when replacing a relative pronoun and after certain verbs, and finally the Present Participle. The percentage for each difficulty was obtained by dividing the number of wrong answers by the total number of correct answers on the test.
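The KR-21 reliability coefficient used here has a closed form, KR21 = (k / (k - 1)) * (1 - M(k - M) / (k * s^2)), where k is the number of items, M the mean total score, and s^2 the variance of total scores. A small sketch follows; the scores below are made up, as the study reports only the final value of 0.89.

```python
from statistics import mean, pvariance

def kr21(scores, k):
    """KR-21 reliability: scores are total scores per student, k is
    the number of test items."""
    m = mean(scores)
    var = pvariance(scores)  # population variance of total scores
    return (k / (k - 1)) * (1 - (m * (k - m)) / (k * var))

# Made-up totals for a 40-item test; the study reports only the
# resulting coefficient (0.89), not the raw scores.
print(round(kr21([31, 25, 35, 28, 22, 33, 27, 30], 40), 2))
```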

