Person Fit Across Subgroups: An Achievement Testing Example

Dynamic Assessment Language Tasks and the Prediction of Performance on Year-End Language Skills in Preschool Dual Language Learners

American Journal of Speech-Language Pathology ◽

10.1044/2019_ajslp-19-00120 ◽

2020 ◽

Vol 29 (3) ◽

pp. 1226-1240

Author(s):

Janet L. Patterson ◽

Barbara L. Rodríguez ◽

Philip S. Dale

Keyword(s):

Head Start ◽

Language Learners ◽

Language Impairment ◽

Dual Language ◽

Dynamic Assessment ◽

Dual Language Learners ◽

Achievement Testing ◽

Potential Applications ◽

Language Screening ◽

Relationship Of

Purpose: Early identification is a key element for accessing appropriate services for preschool children with language impairment. However, there is a high risk of misidentifying typically developing dual language learners as having language impairment if inappropriate tools designed for monolingual children are used. In this study of children with bilingual exposure, we explored performance on brief dynamic assessment (DA) language tasks using graduated prompting because this approach has potential applications for screening. We asked if children's performance on DA language tasks earlier in the year was related to their performance on a year-end language achievement measure. Method: Twenty 4-year-old children from Spanish-speaking homes attending Head Start preschools in the southwestern United States completed three DA graduated prompting language tasks 3–6 months prior to the Head Start preschools' year-end achievement testing. The DA tasks, Novel Adjective Learning, Similarities in Function, and Prediction, were administered in Spanish, but correct responses in English or Spanish were accepted. The year-end achievement measure, the Learning Accomplishment Profile–Third Edition (LAP3), was administered by the children's Head Start teachers, who also credited correct responses in either language. Results: Children's performance on two of the three DA language tasks was significantly and positively related to year-end LAP3 language scores, and there was a moderate and significant relationship for one of the DA tasks, even when controlling for age and initial LAP3 scores. Conclusions: Although the relationship of performance on DA with year-end performance varies across tasks, the findings indicate potential for using a graduated prompting approach to language screening with young dual language learners. Further research is needed to select the best tasks for administration in a graduated prompting framework and determine accuracy of identification of language impairment.


Detecting Random Responses in a Personality Scale Using IRT-Based Person-Fit Indices

European Journal of Psychological Assessment ◽

10.1027/1015-5759/a000369 ◽

2019 ◽

Vol 35 (1) ◽

pp. 126-136 ◽

Cited By ~ 2

Author(s):

Tour Liu ◽

Tian Lan ◽

Tao Xin

Keyword(s):

Experimental System ◽

Real Data ◽

Fit Indices ◽

Random Response ◽

Person Fit ◽

Index Method ◽

Single Person ◽

Person Fit Index ◽

Random Responses ◽

Response Behaviors

Abstract. Random responding is a very common aberrant response behavior in personality tests and may negatively affect the reliability, validity, or other analytical aspects of psychological assessment. Typically, researchers use a single person-fit index to identify random responses. This study recommends a three-step person-fit analysis procedure. Unlike the typical single person-fit methods, the three-step procedure identifies both global misfit and local misfit individuals using different person-fit indices. This procedure was able to identify more local misfit individuals than the single-index method, and a graphical method was used to visualize the particular items in which random response behaviors appear. This method may be useful to researchers in that it provides more information about response behaviors, allowing better evaluation of scale administration and development of more plausible explanations. Real data were used in this study instead of simulated data. In order to create real random responses, an experimental test administration was designed. Four different random response samples were produced using this experimental system.
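As a rough illustration of what a single person-fit index computes, the sketch below implements the standardized log-likelihood statistic (commonly written l_z) for dichotomous items under a Rasch model. The abstract does not say which indices the authors used, and personality scales are usually polytomous, so the choice of l_z, the Rasch model, and the item difficulties, trait level, and response patterns here are all illustrative assumptions rather than the study's procedure.

```python
import math


def rasch_prob(theta, b):
    """Probability of endorsing an item of difficulty b at trait level theta."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def lz_statistic(responses, theta, difficulties):
    """Standardized log-likelihood person-fit statistic for 0/1 responses.

    Large negative values suggest a response pattern that is unlikely
    under the model (for example, random responding).
    """
    log_lik = 0.0   # observed log-likelihood of the pattern
    expected = 0.0  # its expectation under the model
    variance = 0.0  # its variance under the model
    for u, b in zip(responses, difficulties):
        p = rasch_prob(theta, b)
        q = 1.0 - p
        log_lik += u * math.log(p) + (1 - u) * math.log(q)
        expected += p * math.log(p) + q * math.log(q)
        variance += p * q * math.log(p / q) ** 2
    return (log_lik - expected) / math.sqrt(variance)


# Hypothetical five-item scale and a respondent with theta = 0.
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
consistent = [1, 1, 1, 0, 0]  # endorses the easy items only
reversed_ = [0, 0, 1, 1, 1]   # endorses the hard items only

lz_consistent = lz_statistic(consistent, 0.0, difficulties)
lz_reversed = lz_statistic(reversed_, 0.0, difficulties)
print(lz_consistent, lz_reversed)
```

A global index of this kind summarizes misfit over the whole scale in one number, which is why a procedure that also looks for local misfit on particular items can flag respondents that a single index misses.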


Implications of Person Fluctuation for the Stability and Validity of Test Scores

Methodology ◽

10.1027/1614-2241.2.4.142 ◽

2006 ◽

Vol 2 (4) ◽

pp. 142-148 ◽

Cited By ~ 1

Author(s):

Pere J. Ferrando

Keyword(s):

Item Response ◽

Test Scores ◽

Temporal Stability ◽

Person Fit ◽

Item Parameters ◽

Individual Trait ◽

The Stability ◽

The Individual ◽

Different Levels ◽

Fluctuation Model

In the IRT person-fluctuation model, the individual trait levels fluctuate within a single test administration whereas the items have fixed locations. This article studies the relations between the person and item parameters of this model and two central properties of item and test scores: temporal stability and external validity. For temporal stability, formulas are derived for predicting and interpreting item response changes in a test-retest situation on the basis of the individual fluctuations. As for validity, formulas are derived for obtaining disattenuated estimates and for predicting changes in validity in groups with different levels of fluctuation. These latter formulas are related to previous research in the person-fit domain. The results obtained and the relations discussed are illustrated with an empirical example.


Assessing Person Fit With the Information Matrix Test

Methodology ◽

10.1027/1614-2241/a000085 ◽

2015 ◽

Vol 11 (1) ◽

pp. 3-12 ◽

Cited By ~ 2

Author(s):

Jochen Ranger ◽

Jörg-Tobias Kuhn

Keyword(s):

Simulation Study ◽

Type I Error ◽

Information Matrix ◽

Small Samples ◽

Type I ◽

Person Fit ◽

Power Of The Test ◽

Order Expansion ◽

Trait Stability ◽

Information Matrix Test

In this manuscript, a new approach to the analysis of person fit is presented that is based on the information matrix test of White (1982). This test can be interpreted as a test of trait stability during the measurement situation. The test statistic approximately follows a χ²-distribution. In small samples, the approximation can be improved by a higher-order expansion. The performance of the test is explored in a simulation study. This simulation study suggests that the test adheres to the nominal Type I error rate well, although it tends to be conservative in very short scales. The power of the test is compared to the power of four alternative tests of person fit. This comparison corroborates that the power of the information matrix test is similar to the power of the alternative tests. Advantages and areas of application of the information matrix test are discussed.


The Convolution Neural Network Combined with the HT Person Fit Statistic to Develop an APP for Detecting Dengue Fever in Children: Development and Usability Study (Preprint)

10.2196/preprints.16347 ◽

2019 ◽

Author(s):

CHIEN WEI ◽

Chi Chow Julie ◽

Chou Willy

Keyword(s):

Neural Network ◽

Dengue Fever ◽

Prediction Accuracy ◽

Early Stage ◽

Family Members ◽

Convolution Neural Network ◽

Public Health Issue ◽

Person Fit ◽

Model Accuracy ◽

Comparison Results

Background: Dengue fever (DF) is an important public health issue in Asia. However, the disease is extremely hard to detect using traditional dichotomous (i.e., absent vs. present) evaluations of symptoms. Convolution neural network (CNN), a well-established deep learning method, can improve prediction accuracy on account of its usage of a large number of parameters for modeling. Whether the HT person fit statistic can be combined with CNN to increase the prediction accuracy of the model and develop an application (APP) to detect DF in children remains unknown. Objectives: The aim of this study is to build a model for the automatic detection and classification of DF with symptoms to help patients, family members, and clinicians identify the disease at an early stage. Methods: We extracted 19 feature variables of DF-related symptoms from 177 pediatric patients (69 diagnosed with DF) using CNN to predict DF risk. The accuracy of two sets of characteristics (19 symptoms and four other variables, including person mean, standard deviation, and two HT-related statistics matched to DF+ and DF−) for predicting DF was then compared. Data were separated into training and testing sets, and the former was used to predict the latter. We calculated the sensitivity (Sens), specificity (Spec), and area under the receiver operating characteristic curve (AUC) across studies for comparison. Results: We observed that (1) the 23-item model yields a higher accuracy rate (0.95) and AUC (0.94) than the 19-item model (accuracy = 0.92, AUC = 0.90) based on the 177-case training set; (2) the Sens values are almost always higher than the corresponding Spec values (in 90% of 10 scenarios) for predicting DF; (3) the Sens and Spec values of the 23-item model are consistently higher than those of the 19-item model. An APP was subsequently designed to detect DF in children. Conclusion: The 23-item model yielded higher accuracy rates (0.95) and AUC (0.94) than the 19-item model (accuracy = 0.92, AUC = 0.90). An APP could be developed to help patients, family members, and clinicians discriminate DF from other febrile illnesses at an early stage.
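The three evaluation measures named in the abstract (Sens, Spec, and AUC) can be sketched as follows. This is a minimal illustration on made-up labels and scores, not the authors' code or data; the 0.5 decision threshold, the function names, and the eight hypothetical cases are assumptions. AUC is computed here as the proportion of (DF+, DF−) pairs in which the DF+ case receives the higher score, with ties counted as half.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = DF+, 0 = DF-; scores are model-predicted risks.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold

sens, spec = sensitivity_specificity(y_true, y_pred)
area = auc(y_true, scores)
print(sens, spec, area)
```

Sens and Spec depend on the chosen threshold, whereas AUC summarizes ranking quality across all thresholds, which is why the two can move differently when models are compared.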


Sustainable public health partnerships: Results from 41 European, Asian and American regions

European Journal of Public Health ◽

10.1093/eurpub/ckaa166.1233 ◽

2020 ◽

Vol 30 (Supplement_5) ◽

Author(s):

J Ese ◽

C Ihlebak

Keyword(s):

Public Health ◽

Public Sector ◽

Social Contexts ◽

Research Literature ◽

Person Fit ◽

Time Data ◽

Incentive Schemes ◽

The Public ◽

Health Partnerships ◽

Group Interviews

Background: Public health problems often constitute so-called “wicked problems”, and the importance of involving multiple stakeholders in order to address such problems is acknowledged, for instance through the SDG17 guidelines. Partnerships between academia and the public sector have been deemed especially promising. However, sustainable partnerships might be difficult due to divergent understandings and interests. Although there is a substantial research literature on academic-public partnerships in general, partnerships addressing public health specifically are less investigated. The aim of the project was therefore to identify enablers for sustainable public health partnerships between academia and the public sector. Methods: A mixed methods design was used. A survey regarding partnerships was sent to 41 European, Asian and American regions, with a response rate of 72%. Based on survey data, an interview guide was developed and four best cases (Canada, Bulgaria, the Netherlands and Norway) were identified. Site visits and group interviews with representatives from stakeholders of the partnerships were conducted. Interview data and answers to open-ended questions from questionnaires were analysed. Results: Three main findings became apparent through the analysis. Important enablers were: 1) person-to-person fit between individuals, 2) national incentive schemes for collaboration, and 3) formal partnership agreements that provided a framework that allowed for manoeuvring. The enablers identified are on a macro, meso and micro level. Furthermore, they can be categorised as political, organisational, and social. Conclusions: The data support the notion that partnerships are complex social structures that need to be initiated and managed on different levels and with different measures. At the same time, data demonstrate that across different geographical, political, and social contexts the same enablers are reappearing as important for sustaining public health partnerships. Key messages: Similar enablers for sustaining public health partnerships are found across geographical, political, and social contexts. Important enablers for partnerships are person-to-person fit, national incentive schemes, and formal agreements.


The Emergence of an Item-writing Technology

Review of Educational Research ◽

10.3102/00346543050002293 ◽

1980 ◽

Vol 50 (2) ◽

pp. 293-314 ◽

Cited By ~ 16

Author(s):

Gale Roid ◽

Tom Haladyna

Keyword(s):

Empirical Studies ◽

Achievement Tests ◽

Emerging Technology ◽

Achievement Testing ◽

Design Domain ◽

Item Development ◽

Item Writing ◽

Writing Technology ◽

Facet Design

The emerging technology of item writing for achievement tests is reviewed. Several different approaches to item development are discussed. A continuum of item-writing methods is proposed ranging from informal-subjective methods to algorithmic-objective methods. Examples of techniques include objective-based item writing, amplified objectives, item forms, facet design, domain-referenced concept testing, and computerized techniques. Each item-writing technique is critically reviewed, and empirical studies of methods are described. Recommendations for further research and for applications to achievement testing are presented.


Exact person fit indexes for the Rasch model for arbitrary alternatives

Psychometrika ◽

10.1007/bf02294184 ◽

2000 ◽

Vol 65 (1) ◽

pp. 29-42 ◽

Cited By ~ 7

Author(s):

Ivo Ponocny

Keyword(s):

Rasch Model ◽

Person Fit ◽

Fit Indexes ◽

The Rasch Model


The Blind Learning Aptitude Test

Journal of Visual Impairment & Blindness ◽

10.1177/0145482x7907300403 ◽

1979 ◽

Vol 73 (4) ◽

pp. 134-139

Author(s):

T. Ernest Newland

Keyword(s):

Educational Achievement ◽

High Reliability ◽

Achievement Testing ◽

Aptitude Test ◽

Intelligence Scale ◽

Individual Test ◽

Blind Children ◽

Basic Orientation

Against a general background of learning aptitude and educational achievement testing of blind children, and of basic orientation for such work, the development and nature of the Blind Learning Aptitude Test (BLAT), an individual test, are described. Standardized upon 961 widely representative educationally blind children, it had high reliability and its validity particularly with respect to the more complex school learnings was clearly indicated. It had particular value not only when its results were combined with those of the Hayes-Binet (though understandably lower when combined with those of the verbal portion of the Wechsler Intelligence Scale for Children), but especially when used with blind children coming from backgrounds offering limited cultural nurturance.


Exploring the correspondence between traditional score resolution methods and person fit indices in rater-mediated writing assessments

Assessing Writing ◽

10.1016/j.asw.2018.12.002 ◽

2019 ◽

Vol 39 ◽

pp. 25-38 ◽

Cited By ~ 1

Author(s):

Stefanie A. Wind ◽

A. Adrienne Walker

Keyword(s):

Fit Indices ◽

Person Fit ◽

Writing Assessments
