An Iterative Parametric Bootstrap Approach to Evaluating Rater Fit

2021 ◽  
pp. 014662162110131
Author(s):  
Wenjing Guo ◽  
Stefanie A. Wind

When analysts evaluate performance assessments, they often use modern measurement theory models to identify raters who frequently give ratings that differ from what would be expected given the quality of the performance. To detect problematic scoring patterns, two rater fit statistics, the infit and outfit mean square error (MSE) statistics, are routinely used. However, the interpretation of these statistics is not straightforward. Researchers commonly rely on established rule-of-thumb critical values to interpret infit and outfit MSE statistics. Unfortunately, prior studies have shown that these rule-of-thumb values may not be appropriate in many empirical situations. Parametric bootstrapped critical values for infit and outfit MSE statistics provide a promising alternative approach to identifying item and person misfit in item response theory (IRT) analyses. However, researchers have not examined the performance of this approach for detecting rater misfit. In this study, we illustrate a bootstrap procedure that researchers can use to identify critical values for infit and outfit MSE statistics, and we use a simulation study to assess the false-positive and true-positive rates of these two statistics. We observed that the false-positive rates were highly inflated and the true-positive rates were relatively low. We therefore propose an iterative parametric bootstrap procedure to overcome these limitations. The results indicated that using the iterative procedure to establish 95% critical values of infit and outfit MSE statistics yielded better-controlled false-positive rates and higher true-positive rates than using the traditional parametric bootstrap procedure or rule-of-thumb critical values.
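The basic (non-iterative) idea of a parametric bootstrap critical value can be sketched as follows. This is a minimal illustration under strong assumptions that are not taken from the abstract: a dichotomous Rasch model stands in for the rater model, the performance locations `thetas` and rater severities `deltas` are treated as known rather than re-estimated in every replicate, and only the outfit MSE is computed. All function names are hypothetical, and the iterative refinement proposed by the authors is not reproduced here.

```python
import math
import random

def rasch_prob(theta, delta):
    """Probability of a rating of 1 under a dichotomous Rasch model (assumed model)."""
    return 1.0 / (1.0 + math.exp(-(theta - delta)))

def simulate(thetas, deltas, rng):
    """Draw one model-consistent data set: rows are performances, columns are raters."""
    return [[1 if rng.random() < rasch_prob(t, d) else 0 for d in deltas]
            for t in thetas]

def outfit_mse(data, thetas, deltas):
    """Outfit MSE per rater: mean of squared standardized residuals over performances."""
    stats = []
    for j, d in enumerate(deltas):
        total = 0.0
        for i, t in enumerate(thetas):
            p = rasch_prob(t, d)
            total += (data[i][j] - p) ** 2 / (p * (1.0 - p))
        stats.append(total / len(thetas))
    return stats

def bootstrap_critical_values(thetas, deltas, n_boot=200, level=0.95, seed=0):
    """Upper critical value per rater from data simulated under the fitted model."""
    rng = random.Random(seed)
    draws = [[] for _ in deltas]
    for _ in range(n_boot):
        sim = simulate(thetas, deltas, rng)
        for j, stat in enumerate(outfit_mse(sim, thetas, deltas)):
            draws[j].append(stat)
    crit = []
    for d in draws:
        d.sort()
        # Empirical quantile: the order statistic at the requested level.
        idx = min(len(d) - 1, int(math.ceil(level * len(d))) - 1)
        crit.append(d[idx])
    return crit

# Hypothetical parameter values, for illustration only.
thetas = [-2.0 + 4.0 * i / 49 for i in range(50)]
deltas = [-0.5, 0.0, 0.5]
crit = bootstrap_critical_values(thetas, deltas)

# Flag raters whose observed outfit MSE exceeds their bootstrap critical value.
observed = outfit_mse(simulate(thetas, deltas, random.Random(1)), thetas, deltas)
flagged = [obs > c for obs, c in zip(observed, crit)]
```

In a real analysis the model would be refit to each simulated data set before the fit statistics are computed, so the critical values also reflect estimation error.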

2011 ◽  
Vol 35 (2) ◽  
pp. 180-190 ◽  
Author(s):  
Rens van de Schoot ◽  
Dagmar Strohmeier

In the present paper, a parametric bootstrap procedure, as described by van de Schoot, Hoijtink, and Deković (2010), is applied to demonstrate that a direct test of an informative hypothesis offers more informative results than testing traditional null hypotheses against catch-all rivals. Also, more power can be gained when informative hypotheses are tested directly. In this paper we (a) compare the results of traditional analyses with the results of this novel methodology; (b) introduce applied researchers to the parametric bootstrap procedure for the evaluation of informative hypotheses; and (c) provide the results of a simulation study to demonstrate power gains when using inequality constraints. We argue that researchers should directly evaluate inequality-constrained hypotheses if there is a strong theory about the ordering of relevant parameters. In this way, researchers can make use of all knowledge available from previous investigations, while also learning more from their data than they would from traditional null-hypothesis testing.


2017 ◽  
Vol 41 (5) ◽  
pp. 372-387 ◽  
Author(s):  
R. Philip Chalmers ◽  
Victoria Ng

When tests consist of a small number of items, the use of latent trait estimates for secondary analyses is problematic. One area in particular where latent trait estimates have been problematic is when testing for item misfit. This article explores the use of plausible-value imputations to lessen the severity of the inherent measurement unreliability in shorter tests, and proposes a parametric bootstrap procedure to generate empirical sampling characteristics for null-hypothesis tests of item fit. Simulation results suggest that the proposed item-fit statistics provide conservative to nominal error detection rates. Power to detect item misfit tended to be less than Stone’s [Formula: see text] item-fit statistic but higher than the [Formula: see text] statistic proposed by Orlando and Thissen, especially in tests with 20 or more dichotomously scored items.


Electronics ◽  
2020 ◽  
Vol 9 (11) ◽  
pp. 1894
Author(s):  
Chun Guo ◽  
Zihua Song ◽  
Yuan Ping ◽  
Guowei Shen ◽  
Yuhei Cui ◽  
...  

Remote Access Trojans (RATs) are among the most serious security threats that organizations face today. At present, the two major RAT detection approaches are host-based and network-based. To combine the strengths of both, this article proposes a phased RAT detection method that combines double-side features (PRATD). In PRATD, host-side and network-side features are combined to build detection models, which helps distinguish RATs from benign programs because RATs not only generate traffic on the network but also leave traces on the host at run time. In addition, PRATD trains two different detection models for the two runtime states of RATs to improve the True Positive Rate (TPR). Experiments on network and host records collected from five kinds of benign programs and 20 well-known RATs show that PRATD can effectively detect RATs: it achieves a TPR as high as 93.609% with a False Positive Rate (FPR) as low as 0.407% for known RATs, and a TPR of 81.928% with an FPR of 0.185% for unknown RATs, which suggests it is a competitive candidate for RAT detection.


2018 ◽  
Vol 10 (9) ◽  
pp. 83 ◽  
Author(s):  
Wentao Wang ◽  
Xuan Ke ◽  
Lingxia Wang

A data center network is vulnerable to concealed low-rate distributed denial of service (L-DDoS) attacks because its data flows are characterized by delay, diversity, and synchronization. Several studies have addressed the detection of L-DDoS attacks, but most of them detect L-DDoS attacks only at a fixed rate. These methods yield low true-positive rates and high false-positive rates when detecting multi-rate L-DDoS attacks. Software defined networking (SDN) is a new network architecture that allows the network to be controlled centrally. We use an SDN controller to collect and analyze data packets entering the data center network and calculate Renyi entropies based on the IP addresses of the data packets, and then combine them with a hidden Markov model to obtain a probability model, HMM-R, that detects L-DDoS attacks at different rates. Compared with four common attack detection algorithms (KNN, SVM, SOM, BP), HMM-R is superior in terms of the true positive rate, the false positive rate, and adaptivity.
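The entropy feature mentioned above can be illustrated with the standard definition of the Renyi entropy of order alpha, H_alpha = log2(sum of p_i^alpha) / (1 - alpha), applied to the empirical distribution of IP addresses in a window of packets. The sketch below is an assumption-laden illustration: the function name, the choice of source addresses, the window contents, and the default order `alpha=2.0` are all hypothetical, and the hidden Markov model stage of HMM-R is not shown.

```python
import math
from collections import Counter

def renyi_entropy(items, alpha=2.0):
    """Renyi entropy (in bits) of the empirical distribution of `items`."""
    counts = Counter(items)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("need at least one item")
    probs = [c / n for c in counts.values()]
    if math.isclose(alpha, 1.0):
        # The limit alpha -> 1 is the Shannon entropy.
        return -sum(p * math.log2(p) for p in probs)
    return math.log2(sum(p ** alpha for p in probs)) / (1.0 - alpha)

# Hypothetical packet windows, identified by source IP address.
spread_window = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
concentrated_window = ["10.0.0.9"] * 7 + ["10.0.0.1"]

h_spread = renyi_entropy(spread_window)
h_concentrated = renyi_entropy(concentrated_window)
```

A window dominated by a single address gives a lower entropy than one spread evenly across addresses, which is the kind of shift an entropy-based detector looks for.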


Neurology ◽  
2021 ◽  
pp. 10.1212/WNL.0000000000011789
Author(s):  
Hiroya NISHIDA ◽  
Kuniko KOHYAMA ◽  
Satoko KUMADA ◽  
Jun-ichi TAKANASHI ◽  
Akihisa OKUMURA ◽  
...  

OBJECTIVE: To evaluate the validity of the 2016 clinical diagnostic criteria proposed for probable anti-NMDA receptor (NMDAR) encephalitis in children, we tested the criteria in a Japanese pediatric cohort. METHODS: We retrospectively reviewed clinical information of patients with neurological symptoms whose CSF was analyzed for NMDAR antibodies (Abs) in our laboratory from January 1, 2015, to March 31, 2019. RESULTS: Overall, 137 cases were included. Of the 41 cases diagnosed as probable anti-NMDAR encephalitis (“criteria-positive”) according to the 2016 criteria, 13 were positive and 28 were negative for anti-NMDAR Abs. Of the 96 criteria-negative cases, three were positive and 93 were negative for anti-NMDAR Abs. The sensitivity of the criteria was 81.2%, specificity was 76.9%, positive predictive value (PPV) was 31.7%, and negative predictive value was 96.9%. Compared with the true-positive group, the false-positive group contained more male than female patients (male:female, 4:9 in the true-positive vs. 19:9 in the false-positive group, p = 0.0425). The majority of the cases with false-positive diagnoses were associated with neurological autoimmunity. CONCLUSION: The clinical diagnostic criteria are reliable for deciding to start immunomodulatory therapy in the criteria-positive cases. Low PPV may be caused by a lower prevalence of NMDAR encephalitis and/or lower level of suspicion for encephalitis in the pediatric population. Physicians should therefore continue differential diagnosis, focusing especially on other forms of encephalitis. Classification of Evidence: This study provides Class IV evidence that the proposed diagnostic criteria for anti-NMDAR encephalitis in children have a sensitivity of 81.2% and a specificity of 76.9%.
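The reported accuracy figures follow from the 2x2 counts given in the results (13 true positives, 28 false positives, 3 false negatives, 93 true negatives). A minimal sketch of that arithmetic, using the standard definitions and a hypothetical helper name:

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard accuracy measures from the cells of a 2x2 diagnostic table."""
    return {
        "sensitivity": tp / (tp + fn),  # criteria-positive among antibody-positive
        "specificity": tn / (tn + fp),  # criteria-negative among antibody-negative
        "ppv": tp / (tp + fp),          # antibody-positive among criteria-positive
        "npv": tn / (tn + fn),          # antibody-negative among criteria-negative
    }

# Counts taken from the abstract above.
m = diagnostic_metrics(tp=13, fp=28, fn=3, tn=93)
for name, value in m.items():
    print(f"{name}: {100 * value:.2f}%")
```

The unrounded values are 13/16, 93/121, 13/41, and 93/96, which agree with the percentages in the abstract.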


1979 ◽  
Vol 25 (12) ◽  
pp. 2034-2037 ◽  
Author(s):  
L B Sheiner ◽  
L A Wheeler ◽  
J K Moore

Abstract The percentage of mislabeled specimens detected (true-positive rate) and the percentage of correctly labeled specimens misidentified (false-positive rate) were computed for three previously proposed delta check methods and two linear discriminant functions. The true-positive rate was computed from a set of pairs of specimens, each having one member replaced by a member from another pair chosen at random. The relationship between true-positive and false-positive rates was similar among the delta check methods tested, indicating equal performance for all of them over the range of false-positive rates of interest. At a practical false-positive operating rate of about 5%, delta check methods detect only about 50% of mislabeled specimens; even if the actual mislabeling rate is moderate (e.g., 1%), only about 10% of specimens flagged by a delta check will actually have been mislabeled.
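The closing claim is an application of Bayes' rule: the share of flagged specimens that are truly mislabeled depends on the true-positive rate, the false-positive rate, and the underlying mislabeling rate. A short sketch of that calculation with the round figures quoted in the abstract (the function name is hypothetical):

```python
def positive_predictive_value(tpr, fpr, prevalence):
    """Fraction of flagged specimens that are truly mislabeled (Bayes' rule)."""
    flagged_true = tpr * prevalence            # mislabeled and flagged
    flagged_false = fpr * (1.0 - prevalence)   # correctly labeled but flagged
    return flagged_true / (flagged_true + flagged_false)

# About 50% detection, about 5% false positives, 1% mislabeling rate.
ppv = positive_predictive_value(tpr=0.50, fpr=0.05, prevalence=0.01)
print(f"{100 * ppv:.1f}% of flagged specimens are mislabeled")
```

With these inputs the result is 0.005 / 0.0545, a little over 9%, consistent with the abstract's "about 10%".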


2019 ◽  
Vol 19 (1) ◽  
Author(s):  
Don van Ravenzwaaij ◽  
John P. A. Ioannidis

Abstract Background Until recently a typical rule that has often been used for the endorsement of new medications by the Food and Drug Administration has been the existence of at least two statistically significant clinical trials favoring the new medication. This rule has consequences for the true positive (endorsement of an effective treatment) and false positive rates (endorsement of an ineffective treatment). Methods In this paper, we compare true positive and false positive rates for different evaluation criteria through simulations that rely on (1) conventional p-values; (2) confidence intervals based on meta-analyses assuming fixed or random effects; and (3) Bayes factors. We varied threshold levels for statistical evidence, thresholds for what constitutes a clinically meaningful treatment effect, and number of trials conducted. Results Our results show that Bayes factors, meta-analytic confidence intervals, and p-values often have similar performance. Bayes factors may perform better when the number of trials conducted is high and when trials have small sample sizes and clinically meaningful effects are not small, particularly in fields where the number of non-zero effects is relatively large. Conclusions Thinking about realistic effect sizes in conjunction with desirable levels of statistical evidence, as well as quantifying statistical evidence with Bayes factors may help improve decision-making in some circumstances.


2019 ◽  
Vol 41 (06) ◽  
pp. 688-694
Author(s):  
Ron Bardin ◽  
Noga Perl ◽  
Reuven Mashiach ◽  
Eitan Ram ◽  
Sharon Orbach-Zinger ◽  
...  

Abstract Purpose To investigate the accuracy of ultrasound in the diagnosis of adnexal torsion. Materials and Methods Retrospective cohort analysis of 322 women, presenting to a tertiary medical center with acute abdominal pain, who underwent gynecological examination, sonographic evaluation and laparoscopic surgery, between 2010 and 2016. Findings for adnexal torsion were compared among three groups: positive sonographic findings consistent with surgically confirmed adnexal torsion (true positive, n = 228); negative sonographic findings inconsistent with surgically confirmed adnexal torsion (false negative, n = 42); and positive sonographic findings with a surgical diagnosis other than adnexal torsion (false positive, n = 52). Outcome measures were sensitivity and positive predictive value of ultrasound, and its specific features, for the diagnosis of adnexal torsion. Results The sensitivity of ultrasound for adnexal torsion diagnosis was 84.4 %, and the positive predictive value was 81.4 %. Edematous ovary and/or tube, as well as positive whirlpool sign had the highest sensitivity and positive predictive value. The false-negative group had the highest frequency of ovarian cysts (p = 0.0086) and the lowest frequency of ovarian edema (p < 0.0001). The false-positive group had the lowest proportion of pregnant women (p = 0.0022). Significantly more women in the true-positive group had a prior event of adnexal torsion (p = 0.026). Conclusion Ultrasound examination is highly accurate in the diagnosis of adnexal torsion. Clinicians should be aware of the presence of demographic and clinical characteristics that may positively or negatively affect sonographic diagnostic accuracy.
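As a check on the reported figures, the two outcome measures can be worked out from the group sizes given above (TP = 228, FN = 42, FP = 52), using the standard definitions of sensitivity and positive predictive value:

```latex
\begin{align*}
\text{Sensitivity} &= \frac{TP}{TP + FN} = \frac{228}{228 + 42} = \frac{228}{270} \approx 0.844 \\
\text{PPV} &= \frac{TP}{TP + FP} = \frac{228}{228 + 52} = \frac{228}{280} \approx 0.814
\end{align*}
```

Because the cohort has no true-negative group, specificity and negative predictive value cannot be computed from these counts.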

