score reliability
Recently Published Documents


TOTAL DOCUMENTS: 122 (FIVE YEARS: 17)

H-INDEX: 23 (FIVE YEARS: 2)

2021 ◽  
Author(s):  
Martin Rempel ◽  
Peter Schaumann ◽  
Ulrich Blahak ◽  
Volker Schmidt

Reliable precipitation forecasts on the very-short-range time scale are essential for precise warnings and can increase the lead time available to decision makers in civil protection and emergency services. In operational weather forecasting, the prediction of and warning against heavy convective precipitation within the first two hours rely on radar-based nowcasting methods, while convection-permitting ensemble prediction systems are used for later lead times.

Within the SINFONY project (Seamless INtegrated FOrecastiNg sYstem) of Deutscher Wetterdienst (DWD), an integrated convective-scale ensemble system for very-short-range forecasting is being developed. To facilitate the optimal combination of the previously independent nowcasting and numerical weather prediction (NWP) systems, STEPS-DWD, an adaptation of the widely used STEPS approach (e.g., Seed, 2003; Bowler et al., 2006), was put into test operation as the nowcasting ensemble. The NWP component is ICON-D2-RUC, which currently provides hourly initialized ensemble forecasts up to +8 h at a horizontal resolution of 2.2 km. Core components of this model version are a two-moment microphysics scheme and the additional assimilation of high-resolution remote sensing data such as 3D radar data and Meteosat SEVIRI data.

Based on these two ensemble systems, STEPS-DWD and ICON-D2-RUC, two methods for combining their forecasts are presented. The first method adapts the approach of Nerini et al. (2019), in which forecasts of reflectivities and rain rates are combined in physical space using an ensemble Kalman filter. With a temporal resolution of five minutes and a spatial resolution of 1 × 1 km, this provides an estimate of the further development up to +6 h while preserving the realistic appearance of the precipitation systems.

In addition, a new statistical method is presented in which forecast precipitation totals are combined in probability space using neural networks (NN; cf. Schaumann et al., 2021). The goal is to obtain, with a single training, forecasts that are both seamless and calibrated, and whose exceedance probabilities are consistent across all thresholds. Three data sets of three months each were used for optimization: data sets A and B contain Ensemble-MOS and RadVOR forecasts at a horizontal resolution of 20 km, while data set C contains forecasts from a three-hourly initialized ICON-D2-RUC as well as STEPS-DWD at a resolution of 2.2 km. The hyperparameters of the NN were optimized on data set A, and the resulting NN were validated on data sets B and C using rolling-origin validation. Forecasts are produced with a temporal resolution of 1 h up to +6 h.

For both methods, several verification metrics (FSS, bias, Brier skill score, reliability, and reliability diagrams) show that the combined forecasts are equal to or better than those of the individual systems at all lead times.
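As a toy illustration of combining forecasts in probability space, the sketch below blends nowcast and NWP exceedance probabilities with a lead-time-dependent weight. The exponential weighting function, its e-folding time, and all numbers are illustrative assumptions; the method described above is a trained neural network, not this fixed rule.

```python
import numpy as np

def blend_exceedance_probs(p_nowcast, p_nwp, lead_time_h, tau=2.0):
    """Blend two exceedance-probability fields for one threshold.

    p_nowcast, p_nwp : arrays of P(precip > threshold) on a common grid
    lead_time_h      : forecast lead time in hours
    tau              : assumed e-folding time (h) of nowcast skill

    The weight decays with lead time, so the blend relaxes from the
    radar-based nowcast toward the NWP ensemble -- a simple stand-in
    for the trained NN combination described in the abstract.
    """
    w = np.exp(-lead_time_h / tau)          # nowcast weight in [0, 1]
    return w * np.asarray(p_nowcast) + (1.0 - w) * np.asarray(p_nwp)

# Example: at +3 h the NWP ensemble already dominates the blend.
p_now = np.array([0.9, 0.7, 0.1])
p_nwp = np.array([0.5, 0.6, 0.2])
print(blend_exceedance_probs(p_now, p_nwp, lead_time_h=3.0))
```

Because the same weight applies to every threshold, a monotone blend like this keeps exceedance probabilities consistent across thresholds, which is the consistency property the NN method is trained to achieve.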


2021 ◽  
Vol 10 (4) ◽  
pp. 2121-2131
Author(s):  
Mustafa Ali ◽  
Mohammed A.

<p style="text-align: justify;">The academic buoyancy scale (ABS) is one of the most widely used instruments for measuring academic buoyancy. To obtain meaningful and valid comparisons across groups using ABS, however, measurement invariance should be ascertained a priori. To that end, we examined its measurement invariance, validity evidence based on relations to other variables, and score reliability using categorical omega across culture and gender among Egyptian and Omani undergraduates. Participants were 345 college students: Egyptian sample (N=191) and Omani sample (N=154). To assess measurement invariance across culture and gender, multiple–group confirmatory factor analysis was performed with four successive invariance models: (a) configural, (b) metric, (c) scalar, and (d) residual. Results revealed that the unidimensional baseline model had adequate fit to the data in the full sample. Moreover, measurement invariance was found to hold across culture but not across gender and consequently the ABS could be used to yield valid cross-cultural comparisons between the Egyptian and Omani students. Conversely, it cannot be used to yield valid inferences related to comparing gender groups within each culture. Validity evidence based on relations to other variables was supported by the significantly moderate correlation between ABS and academic achievement (GPA; r =.435 and r = .457, P < .01) for the Egyptian and Omani samples, respectively. With regard to score reliability, categorical omega coefficients were moderate across both samples. Educational and psychological implications, limitations and suggestions for improving the scale are discussed.</p>


2021 ◽  
Vol 10 (5) ◽  
pp. 1053
Author(s):  
Agnieszka Ćwirlej-Sozańska ◽  
Bernard Sozański ◽  
Mateusz Kupczyk ◽  
Justyna Leszczak ◽  
Andrzej Kwolek ◽  
...  

Background: Huntington’s disease is a progressive neurodegenerative disorder that usually manifests in adulthood and is inherited in an autosomal dominant manner. The main aim of the study was to assess the psychometric properties of the 12-item WHO Disability Assessment Schedule (WHODAS) 2.0 for studying the level of disability in people with Huntington’s disease. Method: This cross-sectional study covered 128 people with Huntington’s disease living in Poland. We examined scale score reliability, internal consistency, convergent validity, and known-group validity. The disability and quality of life of people with Huntington’s disease were also assessed. Results: The score reliability of the entire tool for the research group was high. Cronbach’s α for the whole scale was 0.97, and for individual domains it ranged from 0.79 to 0.95. Temporal consistency was 0.99 for the overall result and ranged from 0.91 to 0.99 for particular domains, confirming that the scale is consistent over time. All of the 12-item WHODAS 2.0 domains correlated negatively with all of the Huntington Quality of Life Instrument (H-QoL-I) domains, and all correlation coefficients were statistically significant at the level of p < 0.001. A linear regression model showed that with each one-point decrease in BMI, the level of disability increases by an average of 0.83 points on the 12-item WHODAS 2.0 scale, and with each additional year of the disease, by an average of 1.39 points. Conclusions: This is the first study assessing disability by means of the WHODAS 2.0 in the HD patient population in Poland, and it is also one of the few studies evaluating the validity of the WHODAS 2.0 scale for assessing the disability of people with HD in accordance with the recommendations of DSM-5. We have confirmed that the 12-item WHODAS 2.0 is an effective tool for assessing disability and changes in functioning among people with Huntington’s disease.
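For reference, a minimal sketch of the Cronbach's α computation reported above; the item responses are made-up values, not study data.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, n_items) score matrix.

    alpha = k/(k-1) * (1 - sum(item variances) / variance(total score))
    """
    x = np.asarray(items, dtype=float)
    k = x.shape[1]
    item_vars = x.var(axis=0, ddof=1)        # variance of each item
    total_var = x.sum(axis=1).var(ddof=1)    # variance of the sum score
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Hypothetical responses of 5 people to a 4-item scale (0-4 ratings).
scores = np.array([
    [0, 1, 0, 1],
    [2, 2, 3, 2],
    [1, 1, 1, 2],
    [4, 3, 4, 4],
    [3, 3, 2, 3],
])
print(round(cronbach_alpha(scores), 3))
```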


2021 ◽  
pp. 107699862098694
Author(s):  
Zhengguo Gu ◽  
Wilco H. M. Emons ◽  
Klaas Sijtsma

Clinical, medical, and health psychologists use difference scores obtained from pretest–posttest designs employing the same test to assess intraindividual change possibly caused by an intervention addressing, for example, anxiety, depression, eating disorders, or addiction. The reliability of difference scores is important for interpreting observed change. This article compares the well-documented traditional method and the unfamiliar, rarely used item-level method for estimating difference-score reliability. We simulated data under various conditions typical of change assessment in pretest–posttest designs. The item-level method had smaller bias and greater precision than the traditional method and may be recommended for practical use.
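For context, a minimal sketch of one common form of the traditional classical-test-theory estimate of difference-score reliability that the article compares against; the input values are illustrative assumptions.

```python
def difference_score_reliability(var_pre, rel_pre, var_post, rel_post, r_xy):
    """Traditional CTT reliability of a difference score D = post - pre.

    rho_D = (var_pre*rel_pre + var_post*rel_post - 2*r_xy*sd_pre*sd_post)
            / (var_pre + var_post - 2*r_xy*sd_pre*sd_post)

    A high pretest-posttest correlation shrinks the numerator faster
    than the denominator, which is why difference scores are often
    (but not always) less reliable than their constituent scores.
    """
    sd_pre, sd_post = var_pre ** 0.5, var_post ** 0.5
    cov_term = 2 * r_xy * sd_pre * sd_post
    return (var_pre * rel_pre + var_post * rel_post - cov_term) / \
           (var_pre + var_post - cov_term)

# Hypothetical values: equally reliable tests, moderately correlated.
print(round(difference_score_reliability(100, 0.85, 100, 0.85, 0.6), 3))
```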


2020 ◽  
Author(s):  
Kazuhiro Yamaguchi ◽  
Jonathan Templin

Quantifying the reliability of latent variable estimates in diagnostic classification models has been a difficult topic, complicated by the classification-based nature of these models. In this study, we derive observed score reliability indices based on diagnostic classification models as an extension of classical test theory-based reliability. Additionally, we derive conditional observed sum- and sub-score distributions. In this manner, various conditional expectations and conditional standard error of measurement estimates can be calculated for both total- and sub-scores of a test. The proposed methods provide a variety of expectations and standard errors for attribute estimates, which we demonstrate in an analysis of an empirical test.
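As an illustration of how a conditional sum-score distribution can be built from item-response probabilities given a latent attribute class, here is a minimal sketch using the standard recursion over items; the item probabilities are made-up numbers, not the authors' model.

```python
import numpy as np

def sum_score_distribution(p_correct):
    """Distribution of the total score given per-item success probabilities.

    Uses a Lord-Wingersky-style recursion: after each item, the
    probability mass over possible sum scores is updated by convolving
    with a Bernoulli(p_item). In a diagnostic classification model,
    p_correct would be the item-response probabilities conditional on
    one attribute profile.
    """
    dist = np.array([1.0])                  # P(score = 0) before any item
    for p in p_correct:
        new = np.zeros(len(dist) + 1)
        new[:-1] += dist * (1 - p)          # item answered incorrectly
        new[1:] += dist * p                 # item answered correctly
        dist = new
    return dist

# Hypothetical item probabilities for one attribute class.
dist = sum_score_distribution([0.9, 0.8, 0.6, 0.7])
scores = np.arange(len(dist))
expected = np.dot(scores, dist)                          # conditional expectation
sem = np.sqrt(np.dot((scores - expected) ** 2, dist))    # conditional SEM
print(dist.round(4), round(expected, 3), round(sem, 3))
```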


2020 ◽  
Author(s):  
Peter E Clayson ◽  
Scott Baldwin ◽  
Michael J. Larson

In studies of event-related brain potentials (ERPs), difference scores between conditions in a task are frequently used to isolate neural activity for use as a dependent or independent variable. Adequate score reliability is a prerequisite for studies examining relationships between ERPs and external correlates, but there is a widely held view that difference scores are inherently unreliable and unsuitable for studies of individual differences. This view fails to consider the nuances of difference-score reliability that are relevant to ERP research. In the present study, we provide formulas from classical test theory and generalizability theory for estimating the internal consistency of subtraction-based and residualized difference scores. These formulas are then applied to error-related negativity (ERN) and reward positivity (RewP) difference scores from the same sample of 117 participants. Analyses demonstrate that ERN difference scores can be reliable, which supports their use in studies of individual differences. However, RewP difference scores yielded poor reliability due to the high correlation between the constituent reward and non-reward ERPs. Findings emphasize that difference-score reliability largely depends on the internal consistency of the constituent scores and the correlation between them. Furthermore, generalizability theory yielded higher internal consistency estimates for subtraction-based difference scores than classical test theory did. Despite some beliefs that difference scores are inherently unreliable, ERP difference scores can show adequate reliability and be useful for isolating neural activity in studies of individual differences.
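To make the two difference-score types concrete, below is a minimal sketch of subtraction-based and residualized difference scores for two ERP conditions; the amplitude values are made-up, and the study's reliability formulas themselves are not reproduced here.

```python
import numpy as np

def difference_scores(cond_a, cond_b):
    """Subtraction-based and residualized difference scores.

    Subtraction: simple condition difference per participant.
    Residualized: regress condition B on condition A by ordinary least
    squares and keep the residuals, removing variance shared with A.
    """
    a = np.asarray(cond_a, dtype=float)
    b = np.asarray(cond_b, dtype=float)
    subtraction = b - a
    slope, intercept = np.polyfit(a, b, 1)     # OLS fit of b on a
    residualized = b - (intercept + slope * a)
    return subtraction, residualized

# Hypothetical mean amplitudes (microvolts) for correct vs. error trials.
correct = np.array([2.1, 3.4, 1.8, 4.0, 2.9])
error = np.array([-4.5, -2.1, -5.0, -1.2, -3.3])
sub, res = difference_scores(correct, error)
print(sub.round(2), res.round(2))
```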


2020 ◽  
Author(s):  
Peter E Clayson ◽  
Kaylie Amanda Carbine ◽  
Scott Baldwin ◽  
Joseph A. Olsen ◽  
Michael J. Larson

The reliability of event-related brain potential (ERP) scores depends on the study context and how those scores will be used, and reliability must therefore be routinely evaluated. Many factors can influence ERP score reliability, and generalizability (G) theory provides a multifaceted approach to estimating the internal consistency and temporal stability of scores that is well suited to ERPs. G theory's approach possesses a number of advantages over classical test theory that make it ideal for pinpointing sources of error in scores. The current primer outlines the G-theory approach to estimating internal consistency (coefficients of equivalence) and test-retest reliability (coefficients of stability), and applies it to evaluate the reliability of ERP measurements. The primer outlines how to estimate reliability coefficients that consider the impact of the number of trials, events, occasions, and groups. The uses of two different G-theory reliability coefficients (i.e., generalizability and dependability) in ERP research are elaborated, and a dataset from the companion manuscript, which examines N2 amplitudes to Go/NoGo stimuli, serves as an example of applying these coefficients to ERPs. The developed algorithms are implemented in the ERP Reliability Analysis (ERA) Toolbox, open-source software designed for estimating score reliability using G theory. The toolbox facilitates the application of G theory in an effort to simplify the study-by-study evaluation of ERP score reliability. The formulas provided in this primer should enable researchers to pinpoint the sources of measurement error in ERP scores from multiple recording sessions and subsequently plan studies that optimize score reliability.
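As a simplified illustration of the two coefficients discussed above, here is a sketch of generalizability and dependability coefficients for a fully crossed persons × trials design, with variance components estimated from the standard ANOVA expected-mean-square equations. The single-trial data are simulated, and the designs handled by the ERA Toolbox are more elaborate (multiple facets such as events, occasions, and groups) than this one-facet example.

```python
import numpy as np

def g_theory_coefficients(x):
    """Generalizability (relative) and dependability (absolute)
    coefficients for a fully crossed persons x trials design.

    x is an (n_persons, n_trials) matrix of single-trial scores.
    """
    n_p, n_t = x.shape
    grand = x.mean()
    ss_p = n_t * ((x.mean(axis=1) - grand) ** 2).sum()
    ss_t = n_p * ((x.mean(axis=0) - grand) ** 2).sum()
    ss_res = ((x - grand) ** 2).sum() - ss_p - ss_t
    ms_p = ss_p / (n_p - 1)
    ms_t = ss_t / (n_t - 1)
    ms_res = ss_res / ((n_p - 1) * (n_t - 1))
    var_p = max((ms_p - ms_res) / n_t, 0.0)   # person variance
    var_t = max((ms_t - ms_res) / n_p, 0.0)   # trial variance
    var_res = ms_res                          # person x trial + error
    g = var_p / (var_p + var_res / n_t)                 # relative error only
    phi = var_p / (var_p + (var_t + var_res) / n_t)     # absolute error
    return g, phi

# Simulated single-trial amplitudes: 4 participants x 6 trials.
rng = np.random.default_rng(0)
person_effects = rng.normal(0, 2, size=(4, 1))
data = person_effects + rng.normal(0, 1, size=(4, 6))
g, phi = g_theory_coefficients(data)
print(round(g, 3), round(phi, 3))
```

Dependability (Φ) counts the trial main effect as error while generalizability (Eρ²) does not, which is why Φ is never larger than Eρ² for the same data.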

