question difficulty
Recently Published Documents


TOTAL DOCUMENTS

75
(FIVE YEARS 22)

H-INDEX

9
(FIVE YEARS 1)

2021 ◽  
Vol 11 (24) ◽  
pp. 12023
Author(s):  
Hyun-Je Song ◽  
Su-Hwan Yoon ◽  
Seong-Bae Park

This paper addresses question difficulty estimation, whose goal is to estimate the difficulty level of a given question in question-answering (QA) tasks. Since a question in these tasks is composed of a questionary sentence and a set of information components such as a description and candidate answers, it is important to model the relationship among the information components to estimate the difficulty level of the question. However, existing approaches to this task modeled only simple relationships, such as that between a questionary sentence and a description, and such simple relationships are insufficient to predict the difficulty level accurately. Therefore, this paper proposes an attention-based model to capture the complicated relationships among the information components. The proposed model first represents bi-directional relationships between the questionary sentence and each information component using dual multi-head co-attention, since the questionary sentence is a key factor in QA questions and it both affects and is affected by the information components. Then, the model captures inter-information relationships over the bi-directional representations through a self-attention layer. These inter-information relationships help accurately predict the difficulty of questions that require reasoning over multiple kinds of information components. Experimental results on three well-known, real-world QA data sets show that the proposed model outperforms the previous state-of-the-art and pre-trained language model baselines. The proposed model is also shown to be robust to an increasing number of information components.
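The attention pipeline described in the abstract can be sketched in a few lines. This is a minimal, illustrative NumPy version of bi-directional co-attention followed by self-attention over the fused representations; it is not the authors' actual multi-head implementation, and the dimensions, pooling, and random embeddings are assumptions made for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Scaled dot-product attention: (n_q, d) queries over (n_k, d) keys/values."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

rng = np.random.default_rng(0)
d = 16
question = rng.normal(size=(5, d))    # token embeddings of the questionary sentence
component = rng.normal(size=(8, d))   # e.g. a description or a candidate answer

# Bi-directional co-attention: the question attends to the component and vice versa
q_given_c = attention(question, component, component)   # (5, d)
c_given_q = attention(component, question, question)    # (8, d)

# Pool each side, stack, and let a self-attention pass model the
# inter-information relationship across the fused representations
pooled = np.stack([q_given_c.mean(axis=0), c_given_q.mean(axis=0)])  # (2, d)
fused = attention(pooled, pooled, pooled)                            # (2, d)
print(fused.shape)  # (2, 16)
```

In the real model one such co-attention pair would be computed per information component, and the difficulty level predicted from the self-attended stack.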


2021 ◽  
Author(s):  
Jun He ◽  
Li Peng ◽  
Bo Sun ◽  
Lejun Yu ◽  
Yinghui Zhang

Author(s):  
Pandu Meidian Pratama

The Indonesian Language Proficiency Test (UKBI) aims to measure written and oral proficiency in Indonesian, both for native and foreign speakers. In its development, UKBI became adaptive, presenting a different level of question difficulty to each test taker, so that participants receive different sets and numbers of questions. The method used in this research is a simple qualitative and quantitative descriptive method, with primary data obtained through survey results. Of the 51 participants who took part in the activity, 30 joined the UKBI simulation, with simulation scores ranging from a low of 60 to a high of 100. After attending the language clinic, 15 participants were very interested in taking the UKBI test, 16 were interested, and 2 were less interested (92% expressed interest).


2021 ◽  
pp. 089443932110329
Author(s):  
Amanda Fernández-Fontelo ◽  
Pascal J. Kieslich ◽  
Felix Henninger ◽  
Frauke Kreuter ◽  
Sonja Greven

Survey research aims to collect robust and reliable data from respondents. However, despite researchers’ efforts in designing questionnaires, survey instruments may be imperfect, and question structure may not be as clear as it could be, thus creating a burden for respondents. If it were possible to detect such problems, this knowledge could be used to predict problems in a questionnaire during pretesting, inform real-time interventions through responsive questionnaire design, or indicate and correct measurement error after the fact. Previous research has used paradata, specifically response times, to detect difficulties and help improve user experience and data quality. Today, richer data sources are available, for example, the movements respondents make with their mouse, as an additional detailed indicator of the respondent–survey interaction. This article uses machine learning techniques to explore the predictive value of mouse-tracking data regarding a question’s difficulty. We use data from a survey on respondents’ employment history and demographic information, in which we experimentally manipulate the difficulty of several questions. Using measures derived from mouse movements, we predict whether respondents answered the easy or the difficult version of a question, using and comparing several state-of-the-art supervised learning methods. We also develop a personalization method that adjusts for respondents’ baseline mouse behavior and evaluate its performance. For all three manipulated survey questions, we find that including the full set of mouse movement measures and accounting for individual differences in these measures improve prediction performance over response-time-only models.
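As a minimal illustration of the kind of measures that can be derived from mouse movements, the sketch below computes path length, mean speed, and the number of horizontal direction reversals (a common hesitation indicator) from a cursor trajectory. The feature set and the example trajectories are hypothetical, not the article's actual measures or data.

```python
import numpy as np

def mouse_features(xy, t):
    """Derive simple mouse-tracking measures from one trajectory.

    xy: (n, 2) cursor positions; t: (n,) timestamps in seconds.
    Returns total path length, mean speed, and the number of x-flips
    (horizontal direction reversals).
    """
    steps = np.diff(xy, axis=0)
    dist = np.linalg.norm(steps, axis=1).sum()
    duration = t[-1] - t[0]
    dx = np.sign(np.diff(xy[:, 0]))
    flips = np.count_nonzero(np.diff(dx[dx != 0]) != 0)
    return {"path_length": dist, "mean_speed": dist / duration, "x_flips": flips}

# A straight drag versus a back-and-forth "hesitant" trajectory
t = np.linspace(0.0, 2.0, 6)
straight = np.column_stack([np.linspace(0, 100, 6), np.zeros(6)])
wiggly = np.array([[0, 0], [40, 0], [10, 0], [60, 0], [30, 0], [100, 0]], float)

print(mouse_features(straight, t)["x_flips"])  # 0
print(mouse_features(wiggly, t)["x_flips"])    # 4
```

In a prediction pipeline, such per-trajectory features would be fed to a supervised classifier, optionally after subtracting each respondent's baseline values to personalize the measures.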


Author(s):  
Ulrike Padó ◽  
Sebastian Padó

Abstract The 'short answer' question format is a widely used tool in educational assessment, in which students write one to three sentences in response to an open question. The answers are subsequently rated by expert graders. The agreement between these graders is crucial for reliable analysis, both in terms of educational strategies and in terms of developing automatic models for short answer grading (SAG), an active research topic in NLP. This makes it important to understand the properties that influence grader agreement (such as question difficulty, answer length, and answer correctness). However, the twin challenges to such an understanding are the wide range of SAG corpora in use (which differ along a number of dimensions) and the hierarchical structure of potentially relevant properties (which can be located at the corpus, answer, or question level). This article uses generalized mixed effects models to analyze the effect of various such properties on grader agreement in six major SAG corpora for two main assessment tasks (language and content assessment). Overall, we find broad agreement among corpora, with a number of properties behaving similarly across corpora (e.g., shorter answers and correct answers are easier to grade). Some properties show more corpus-specific behavior (e.g., the question difficulty level), and some corpora are more in line with general tendencies than others. In sum, we obtain a nuanced picture of how the major short answer grading corpora are similar and dissimilar, from which we derive suggestions for corpus development and analysis.
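The article analyzes grader agreement with generalized mixed effects models; as a simpler, self-contained illustration of quantifying agreement between two graders, the sketch below computes chance-corrected agreement (Cohen's kappa) in NumPy. The grade vectors are made up for the example and are not from any of the SAG corpora.

```python
import numpy as np

def cohens_kappa(a, b):
    """Chance-corrected agreement between two graders' label vectors."""
    a, b = np.asarray(a), np.asarray(b)
    labels = np.union1d(a, b)
    po = np.mean(a == b)                                          # observed agreement
    pe = sum(np.mean(a == l) * np.mean(b == l) for l in labels)   # expected by chance
    return (po - pe) / (1 - pe)

g1 = [1, 1, 0, 1, 0, 1, 1, 0]   # grader 1: 1 = correct, 0 = incorrect
g2 = [1, 1, 0, 1, 1, 1, 1, 0]   # grader 2 disagrees on one answer
print(round(cohens_kappa(g1, g2), 3))  # 0.714
```

A mixed effects model goes a step further than such a summary statistic: it predicts per-answer agreement while letting intercepts vary by corpus and question, which is what allows properties at different hierarchy levels to be compared.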


2021 ◽  
Vol 8 (1) ◽  
Author(s):  
Sujan Kumar Saha

Abstract In this paper, we present a system for the automatic evaluation of the quality of a question paper. The question paper plays a major role in educational assessment, and its quality is crucial to fulfilling the purpose of the assessment. In many education sectors, question papers are prepared manually; a prior analysis of a question paper might help find errors in it and better achieve the goals of the assessment. In this experiment, we focus on higher education in the technical domain. First, we conducted a student survey to identify the key factors that affect the quality of a question paper. The top factors we identified are question relevance, question difficulty, and time requirement. We explored strategies to handle these factors and implemented them, employing various concepts and techniques. The system finally assigns a numerical quality score against these factors. The system is evaluated using a set of question papers collected from various sources, and the experimental results show that the proposed system is quite promising.
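One simple way to combine per-factor scores into a single numerical quality score, in the spirit of the system described above, is a weighted average. The factor names, weights, and scores below are hypothetical, introduced only for illustration; the paper does not specify this exact aggregation.

```python
def quality_score(factors, weights):
    """Weighted aggregate quality score for a question paper.

    factors: per-factor scores in [0, 1]; weights: relative importance.
    """
    assert set(factors) == set(weights), "each factor needs a weight"
    total = sum(weights.values())
    return sum(factors[k] * weights[k] for k in factors) / total

# hypothetical per-factor scores for one question paper
paper = {"relevance": 0.9, "difficulty": 0.7, "time_requirement": 0.8}
weights = {"relevance": 0.5, "difficulty": 0.3, "time_requirement": 0.2}
print(round(quality_score(paper, weights), 2))  # 0.82
```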


2021 ◽  
pp. 362-375
Author(s):  
Huan Dai ◽  
Yupei Zhang ◽  
Yue Yun ◽  
Xuequn Shang

2021 ◽  
Author(s):  
Ekaterina Loginova ◽  
◽  
Luca Benedetto ◽  
Dries Benoit ◽  
Paolo Cremonesi ◽  
...  
Keyword(s):  

2020 ◽  
Vol 7 (1) ◽  
Author(s):  
Mats Lundström ◽  
Maria Kugelberg ◽  
Per Montan ◽  
Ingela Nilsson ◽  
Madeleine Zetterberg ◽  
...  

Abstract

Background: The Swedish National Cataract Register (NCR) collects data on cataract surgery outcomes, including patient-reported outcomes measured with the Catquest-9SF questionnaire during March each year, and has done so for over 11 years. Previous studies from the NCR have shown that preoperative visual acuity has improved over time. The main purpose of this study was to evaluate the Rasch scoring performance of the Catquest-9SF in this changing environment. A second purpose was to describe clinical data over the same period for those who completed the questionnaire.

Methods: The performance of the Catquest-9SF was analysed by a separate Rasch analysis for each year, resulting in a preoperative and a postoperative score for each participating patient in the annual cohorts. Clinical data and questionnaire scoring were analysed for each year in the period 2008–2018 inclusive.

Results: Data were available for 42,023 eyes across 11 annual cohorts (2008–2018). The psychometric properties of the questionnaire were stable during the study period. Person separation (precision) for the whole period was 2.58 and varied between 2.45 and 2.72. Person reliability was 0.87 and varied between 0.86 and 0.88. The targeting of question difficulty to person ability became less accurate over time, meaning that the item activities became easier for respondents to carry out without difficulty. The average targeting for the whole period was −2.06 logits, changing from −1.92 in 2008 to −2.31 in 2018. The person score improved both before and after surgery, indicating that patients undergo surgery at a more able level and obtain better outcomes. The average improvement from surgery decreased from 3.41 logits in 2008 to 3.21 logits in 2018 (p = 0.003). Over time, patient age decreased from 75 to 74 years (p < 0.001) and the proportion of women decreased from 63.9% to 57.9% (p < 0.001). The mean preoperative visual acuity in both the operated eye and the better eye improved over time (0.47 to 0.40 logMAR, p < 0.001, and 0.22 to 0.19 logMAR, p < 0.001, respectively), as did the mean postoperative visual acuity in the operated eye (0.14 to 0.09 logMAR, p < 0.001).

Conclusions: The Catquest-9SF retained stable psychometric properties over this 11-year period, although more recent cohorts included slightly younger patients with somewhat better vision.
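For readers unfamiliar with Rasch scoring, the dichotomous Rasch model relates a person's ability and an item's difficulty (both in logits) to the probability of a "no difficulty" response. The sketch below is a generic illustration of why targeting around −2 logits means the items are easy for the sample; it is not the Catquest-9SF's actual (polytomous) model.

```python
import math

def rasch_p(ability, difficulty):
    """Dichotomous Rasch model: probability that a person with the given
    ability reports no difficulty on an item of the given difficulty
    (both measured in logits)."""
    return 1 / (1 + math.exp(-(ability - difficulty)))

# When ability and difficulty match, the outcome is a coin flip
print(round(rasch_p(0.0, 0.0), 3))   # 0.5
# With items sitting about 2 logits below the sample's ability level
# (roughly the targeting the study reports), most responses are easy
print(round(rasch_p(0.0, -2.0), 3))  # 0.881
```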

