scoring accuracy
Recently Published Documents


Total documents: 65 (five years: 30)

H-index: 9 (five years: 1)

2022 ◽  
Author(s):  
Miranda Julia Say ◽  
Ciarán O'Driscoll

Background: Despite its wide use in dementia diagnosis on the basis of cut-off points, the inter-rater variability of the ACE-III has been poorly studied. Methods: 31 healthcare professionals from an older adults’ mental health team scored two ACE-III protocols based on mock patients in a computerised form. Scoring accuracy, as well as total and domain-specific scoring variability, were calculated; factors relevant to participants were obtained, including their level of experience and self-rated confidence administering the ACE-III. Results: There was considerable inter-rater variability (up to 18 points for one of the cases), and one case’s mean score was significantly higher (by four points) than the true score. The Fluency, Visuospatial and Attention domains had greater levels of variability than Language and Memory. Higher levels of scoring accuracy were not associated with either greater levels of experience or higher self-confidence in administering the ACE-III. Conclusions: The results suggest that the ACE-III is susceptible to scoring error and considerable inter-rater variability, which highlights the critical importance of initial, and continued, administration and scoring training.
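The quantities named in this abstract (inter-rater spread, deviation of the mean from the true score, and scoring accuracy) can be illustrated with a minimal Python sketch. The abstract does not define how scoring accuracy was computed, so the exact-match rate used here is an assumption, and the function name `rater_summary` and the rater totals are hypothetical.

```python
from statistics import mean


def rater_summary(rater_totals, true_total):
    """Summarise how a group of raters scored one protocol.

    Assumption: "scoring accuracy" is taken as the share of raters whose
    total equals the true total; the study may have defined it differently.
    """
    return {
        # Spread between the highest and lowest total given by any rater.
        "range": max(rater_totals) - min(rater_totals),
        # Positive values mean raters over-scored on average.
        "mean_bias": mean(rater_totals) - true_total,
        # Fraction of raters who reproduced the true total exactly.
        "exact_match_rate": sum(t == true_total for t in rater_totals)
        / len(rater_totals),
    }


# Hypothetical totals from six raters for one mock patient.
totals = [82, 85, 79, 88, 82, 90]
summary = rater_summary(totals, true_total=82)
print(summary)
```

A range and a mean bias are reported separately because a protocol can show wide disagreement between raters even when the average score is close to the true score.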


eLife ◽  
2021 ◽  
Vol 10 ◽  
Author(s):  
Raphael Vallat ◽  
Matthew P Walker

The clinical and societal measurement of human sleep has increased exponentially in recent years. However, unlike other fields of medical analysis that have become highly automated, basic and clinical sleep research still relies on human visual scoring. Such human-based evaluations are time-consuming, tedious, and can be prone to subjective bias. Here, we describe a novel algorithm trained and validated on more than 30,000 hr of polysomnographic sleep recordings across heterogeneous populations around the world. This tool offers high sleep-staging accuracy that matches human scoring accuracy and interscorer agreement regardless of the population studied. The software is designed to be especially easy to use, computationally undemanding, open source, and free. Our hope is that this software facilitates the broad adoption of an industry-standard automated sleep staging software package.


2021 ◽  
pp. 014662162110517
Author(s):  
Seang-Hwane Joo ◽  
Philseok Lee ◽  
Stephen Stark

Collateral information has been used to address subpopulation heterogeneity and increase estimation accuracy in some large-scale cognitive assessments. The methodology that takes collateral information into account has not been developed and explored in published research with models designed specifically for noncognitive measurement. Because accurate noncognitive measurement is becoming increasingly important, we sought to examine the benefits of using collateral information in latent trait estimation with an item response theory model that has proven valuable for noncognitive testing, namely, the generalized graded unfolding model (GGUM). Our presentation introduces an extension of the GGUM that incorporates collateral information, henceforth called Explanatory GGUM. We then present a simulation study that examined Explanatory GGUM latent trait estimation as a function of sample size, test length, number of background covariates, and correlation between the covariates and the latent trait. Results indicated the Explanatory GGUM approach provides scoring accuracy and precision superior to traditional expected a posteriori (EAP) and full Bayesian (FB) methods. Implications and recommendations are discussed.


2021 ◽  
pp. 002224372110525
Author(s):  
Ishita Chakraborty ◽  
Minkyung Kim ◽  
K. Sudhir

The authors address two significant challenges in using online text reviews to obtain fine-grained attribute-level sentiment ratings. First, in contrast to methods that rely on word frequency, they develop a deep learning convolutional-LSTM hybrid model to account for language structure. The convolutional layer accounts for spatial structure (adjacent word groups or phrases) and the LSTM accounts for the sequential structure of language (sentiment distributed and modified across non-adjacent phrases). Second, they address the problem of missing attributes in text when constructing attribute sentiment scores, as reviewers write only about a subset of attributes and remain silent on others. They develop a model-based imputation strategy using a structural model of heterogeneous rating behavior. Using Yelp restaurant review data, they show superior attribute sentiment scoring accuracy with their model. They find three reviewer segments with different motivations: status seeking, altruism/want voice, and need to vent/praise. Reviewers write to inform and vent/praise, but not based on attribute importance. The heterogeneous model-based imputation performs better than other common imputations and, importantly, leads to managerially significant corrections in restaurant attribute ratings. More broadly, the results suggest that social science research should pay more attention to reducing measurement error in variables constructed from text.


2021 ◽  
Author(s):  
Lucy M Carter ◽  
Caroline Gordon ◽  
Chee Seng Yee ◽  
Ian Bruce ◽  
David Isenberg ◽  
...  

Objective: The BILAG-2004 index is a comprehensive disease activity instrument for SLE, but administrative burden and frequency of errors limit its use in routine practice. We aimed to develop a tool for more accurate, time-efficient scoring of the BILAG-2004 index with full fidelity to the existing instrument. Methods: Frequency of BILAG-2004 items was collated from a BILAG-biologics registry (BILAG-BR) dataset. Easy-BILAG prototypes were drafted to address known issues affecting speed and accuracy. After expert verification, accuracy and usability of the finalised Easy-BILAG were validated against the standard format BILAG-2004 index in a workbook exercise of 10 case vignettes. 33 professionals with a range of expertise from 14 UK centres completed the validation exercise. Results: Easy-BILAG incorporates all items present in ≥5% of BILAG-BR records, plus full constitutional and renal domains, into a rapid single-page assessment. An embedded glossary and colour-coding assist scoring of each domain. A second page captures rarer manifestations when needed. In the validation exercise, Easy-BILAG yielded higher median scoring accuracy (96.7%) than standard BILAG-2004 documentation (87.8%, p=0.001), with better inter-rater agreement. Easy-BILAG was completed faster (59.5 min) than the standard format (80.0 min, p=0.04) for 10 cases. An advantage in accuracy was observed with Easy-BILAG use among general hospital rheumatologists (91.3 vs 75.0, p=0.02), leading to accuracy equivalent to that of tertiary centre rheumatologists. Clinicians rated Easy-BILAG as intuitive, convenient, and well adapted for routine practice. Conclusion: Easy-BILAG facilitates more rapid and accurate scoring of BILAG-2004 across all clinical settings, which could improve patient care and biologics prescribing. Easy-BILAG should be adopted wherever BILAG-2004 assessment is required.


Author(s):  
Floris Wardenaar ◽  
Scott Armistead ◽  
Kayla Boeckman ◽  
Brooke Butterick ◽  
Darya Youssefi ◽  
...  

Context: Urine color (Uc) is used to assess urine concentration when lab techniques are not feasible. Objective: To compare the accuracy of Uc scoring using four different light conditions and two different scoring techniques with a 7-color Uc chart. Additionally, to assess the generalizability of the results, a subsample was compared to scores obtained from fresh samples. Design: Descriptive laboratory study. Samples: 178 previously frozen urine samples were scored, and 78 of these were compared with the scores obtained from the same samples when fresh. Main outcome measure: Urine color and accuracy for classifying urine samples were calculated using receiver operating characteristic (ROC) analysis, allowing comparison of diagnostic capacity against a 1.020 urine specific gravity (USG) cut-off and definition of an optimal Uc cut-off value. Results: Uc was significantly different between light conditions (P<0.01), with the highest accuracy (80.3%) in correctly classifying low or high urine concentrations occurring at the brightest light condition. Samples scored under lower light intensity were rated 1.5–2 shades darker on the 7-color Uc scale than under bright conditions (P<0.001), with no further practical differences in accuracy between scoring techniques. Frozen samples scored 0.5–1 shade darker than freshly measured Uc (P<0.004), but the two were moderately correlated (r=0.64). A Bland-Altman plot showed that reporting bias mainly affects darker Uc without impacting the diagnostic ability of the method. Conclusions: Uc scoring, accuracy and Uc cut-off values are affected by lighting condition but not by scoring technique, with higher accuracy and a one-shade lower Uc cut-off value at the brightest light (i.e. LED flashlight).
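How an optimal Uc cut-off might be chosen against the 1.020 USG reference can be sketched in a few lines of Python. The abstract does not say which criterion the authors used to pick the cut-off, so Youden's J (sensitivity + specificity − 1) is an assumption here, as are the function name `best_cutoff`, the treatment of USG ≥ 1.020 as the positive class, and the sample values, which are invented for illustration.

```python
def best_cutoff(scores, is_concentrated):
    """Pick the score cut-off that maximises Youden's J.

    A sample is classified as concentrated when score >= cutoff.
    Returns (cutoff, sensitivity, specificity, accuracy).
    """
    positives = sum(is_concentrated)
    negatives = len(is_concentrated) - positives
    best = None
    for cutoff in sorted(set(scores)):
        pairs = list(zip(scores, is_concentrated))
        true_pos = sum(1 for s, y in pairs if y and s >= cutoff)
        true_neg = sum(1 for s, y in pairs if not y and s < cutoff)
        sensitivity = true_pos / positives
        specificity = true_neg / negatives
        youden_j = sensitivity + specificity - 1
        accuracy = (true_pos + true_neg) / len(scores)
        if best is None or youden_j > best[0]:
            best = (youden_j, cutoff, sensitivity, specificity, accuracy)
    return best[1:]


# Hypothetical Uc chart scores (1-7) and matching USG readings.
uc = [1, 2, 2, 3, 3, 4, 4, 5, 6, 7]
usg = [1.005, 1.010, 1.012, 1.015, 1.016, 1.018, 1.025, 1.027, 1.030, 1.032]
labels = [g >= 1.020 for g in usg]  # assumed reference: USG >= 1.020

cutoff, sens, spec, acc = best_cutoff(uc, labels)
print(cutoff, sens, spec, acc)
```

Running the same routine once per lighting condition would show whether the selected cut-off shifts between conditions, which is the kind of comparison the abstract reports.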


2021 ◽  
Author(s):  
Brendan Colvert ◽  
Marzia Rigolli ◽  
Amanda Craine ◽  
Michael Criqui ◽  
Francisco Contijoch

Purpose: Cardiac CT has a clear clinical role in the evaluation of coronary artery disease and assessment of coronary artery calcium (CAC) but the use of ionizing radiation limits clinical use. Beam shaping 'bow-tie' filters determine the radiation dose and the effective scan field-of-view diameter (SFOV) by delivering higher X-ray fluence to a region centered at the isocenter. A method for positioning the heart near the isocenter could enable reduced SFOV imaging and reduce dose in cardiac scans. However, a predictive approach to center the heart, the extent to which heart centering can reduce the SFOV, and the associated dose reductions have not been assessed. The purpose of this study is to build a heart-centered patient positioning model, to test whether it reduces the SFOV required for accurate CAC scoring, and to quantify the associated reduction in radiation dose. Methods: The location of 38,184 calcium lesions (3,151 studies) in the Multi-Ethnic Study of Atherosclerosis (MESA) were utilized to build a predictive heart-centered positioning model and compare the impact of SFOV on CAC scoring accuracy in heart-centered and conventional body-centered scanning. Then, the positioning model was applied retrospectively to an independent, contemporary cohort of 118 individuals (81 with CAC>0) at our institution to validate the model's ability to maintain CAC accuracy while reducing the SFOV. In these patients, the reduction in dose associated with a reduced SFOV beam-shaping filter was quantified. Results: Heart centering reduced the SFOV diameter 25.7% relative to body centering while maintaining high CAC scoring accuracy (0.82% risk reclassification rate). In our validation cohort, imaging at this reduced SFOV with heart-centered positioning and tailored beam-shaping filtration led to a 26.9% median dose reduction (25-75th percentile: 21.6 to 29.8%) without any calcium risk reclassification. 
Conclusions: Heart-centered patient positioning enables a significant radiation dose reduction while maintaining CAC accuracy.


2021 ◽  
Vol 7 (1) ◽  
Author(s):  
Xinran Wang ◽  
Liang Wang ◽  
Hong Bu ◽  
Ningning Zhang ◽  
Meng Yue ◽  
...  

Programmed death ligand-1 (PD-L1) expression is a key biomarker to screen patients for PD-1/PD-L1-targeted immunotherapy. However, a subjective assessment guide on PD-L1 expression of tumor-infiltrating immune cells (IC) scoring is currently adopted in clinical practice with low concordance. Therefore, a repeatable and quantifiable PD-L1 IC scoring method of breast cancer is desirable. In this study, we propose a deep learning-based artificial intelligence-assisted (AI-assisted) model for PD-L1 IC scoring. Three rounds of ring studies (RSs) involving 31 pathologists from 10 hospitals were carried out, using the current guideline in the first two rounds (RS1, RS2) and our AI scoring model in the last round (RS3). A total of 109 PD-L1 (Ventana SP142) immunohistochemistry (IHC) stained images were assessed and the role of the AI-assisted model was evaluated. With the assistance of AI, the scoring concordance across pathologists was boosted to excellent in RS3 (0.950, 95% confidence interval (CI): 0.936–0.962) from moderate in RS1 (0.674, 95% CI: 0.614–0.735) and RS2 (0.736, 95% CI: 0.683–0.789). The 2- and 4-category scoring accuracy were improved by 4.2% (0.959, 95% CI: 0.953–0.964) and 13% (0.815, 95% CI: 0.803–0.827) (p < 0.001). The AI results were generally accepted by pathologists with 61% “fully accepted” and 91% “almost accepted”. The proposed AI-assisted method can help pathologists at all levels to improve the PD-L1 assay (SP-142) IC assessment in breast cancer in terms of both accuracy and concordance. The AI tool provides a scheme to standardize the PD-L1 IC scoring in clinical practice.


2021 ◽  
pp. 1-38
Author(s):  
Li-Ping Yang ◽  
Tao Xin ◽  
Fang Luo ◽  
Sheng Zhang ◽  
Xue-Tao Tian

Nowadays, automated essay evaluation (AEE) systems play an important role in evaluating essays and have been successfully used in large-scale writing assessments. However, existing AEE systems mostly focus on grammar or shallow content measurements rather than higher-order traits such as ideas. This paper proposes a new formulation of graph-based features for concept maps using word embeddings to evaluate the quality of ideas for Chinese compositions. The concept map derived from the student’s composition is composed of the concepts appearing in the essay and the co-occurrence relationship between the concepts. By utilizing real compositions written by eighth-grade students from a large-scale assessment, the scoring accuracy of the computer evaluation system (named AECC-I: Automated Evaluation for Chinese Compositions—Ideas) is higher than the baselines. The results indicate that the proposed method deepens the construct-relevant coverage of automatic ideas evaluation in compositions and that it can provide constructive feedback for students.


2021 ◽  
Author(s):  
Robert N Collins ◽  
David R. Mandel ◽  
Christopher W. Karvetski ◽  
Charley M Wu ◽  
Jonathan D. Nelson

Previous research shows that variation in coherence (i.e., degrees of respect for axioms of probability calculus), when used as a basis for performance-weighted aggregation, can improve the accuracy of probability judgments. However, many aspects of coherence-weighted aggregation remain a mystery, including both prescriptive issues (e.g., how best to use coherence measures) and theoretical issues (e.g., why coherence-weighted aggregation is effective). Using data from previous experiments employing either general-knowledge or statistical information-integration tasks, we addressed many of these issues. Of prescriptive relevance, we examined the effectiveness of coherence-weighted aggregation as a function of judgment elicitation method, group size, weighting function, and the aggressivity of the function’s tuning parameter. Of descriptive relevance, we propose that coherence-weighted aggregation can improve accuracy via two distinct, task-dependent routes: a deterministic route in which the bases for scoring accuracy depend on conformity to coherence principles (e.g., Bayesian information integration) and a diagnostic route in which coherence serves as a cue to correct knowledge. The findings provide support for the efficacy of both routes, but they also highlight why coherence weighting, especially in its most aggressive forms, sometimes imposes costs to accuracy. We conclude by sketching a decision-theoretic approach to how the wisdom of the coherent within the wisdom of the crowd can be sensibly leveraged.
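The idea of coherence-weighted aggregation can be made concrete with a small Python sketch. Everything specific in it is an assumption rather than the authors' procedure: incoherence is measured as the violation of the complement rule, |P(A) + P(not A) − 1|, the weighting function is an exponential decay, and `beta` stands in for the tuning parameter whose aggressivity the abstract mentions. The judgments are invented.

```python
import math


def coherence_weighted_mean(judgments, beta=5.0):
    """Aggregate P(A) across judges, down-weighting incoherent judges.

    judgments: list of (p_a, p_not_a) pairs, one pair per judge.
    beta: assumed tuning parameter; 0 gives an unweighted mean, larger
          values penalise incoherence more aggressively.
    """
    weights = []
    for p_a, p_not_a in judgments:
        # A coherent judge's P(A) and P(not A) sum to exactly 1.
        incoherence = abs(p_a + p_not_a - 1.0)
        weights.append(math.exp(-beta * incoherence))
    total = sum(weights)
    return sum(w * p_a for w, (p_a, _) in zip(weights, judgments)) / total


# Hypothetical judgments; the second judge violates the complement rule.
judges = [(0.7, 0.3), (0.9, 0.5), (0.6, 0.4)]
unweighted = coherence_weighted_mean(judges, beta=0.0)
weighted = coherence_weighted_mean(judges, beta=5.0)
print(unweighted, weighted)
```

With these made-up numbers the weighted estimate moves away from the incoherent judge's 0.9. Whether that shift improves accuracy depends on the task, which is the point the abstract makes about aggressive weighting sometimes imposing costs.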

