Validation of automatic passenger counting: introducing the t-test-induced equivalence test

2019 · Vol 47 (6) · pp. 3031-3045
Author(s): Michael Siebert, David Ellenberger

Abstract Automatic passenger counting (APC) in public transport was introduced in the 1970s and has been advancing rapidly in recent years. Still, real-world applications continue to face events that are difficult to classify. The resulting imprecision must be handled as statistical noise, and methods have therefore been defined to ensure that measurement errors do not exceed certain bounds. Various recommendations for such an APC validation have been made to establish criteria that limit the bias and the variability of the measurement errors. In those works, the misinterpretation of non-significance in statistical hypothesis tests for the detection of differences (e.g. Student's t-test) proves to be prevalent, although existing methods developed under the term equivalence testing in biostatistics (i.e. bioequivalence trials, Schuirmann in J Pharmacokinet Pharmacodyn 15(6):657–680, 1987) would be appropriate instead. This heavily affects the calibration and validation process of APC systems and has been the reason for unexpected results when sample sizes were not suitably chosen: large sample sizes were assumed to improve the assessment of systematic measurement errors of the devices from a user's perspective as well as from a manufacturer's perspective, but the regular t-test fails to achieve that. We introduce a variant of the t-test, the revised t-test, which addresses both type I and type II errors appropriately and allows a comprehensible transition from the long-established t-test in a widely used industrial recommendation. This test is appealing but susceptible to numerical instability. Finally, we analytically reformulate it as a numerically stable equivalence test, which is thus easier to use. Our results therefore make it possible to induce an equivalence test from a t-test and increase the comparability of the two tests, especially for decision makers.
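Schuirmann's two one-sided tests (TOST) procedure cited above is straightforward to sketch. The following minimal example contrasts TOST with the ordinary t-test; it is not the paper's revised t-test, and the simulated counting errors and the tolerance bound `delta` are illustrative assumptions.

```python
# TOST equivalence test vs. ordinary one-sample t-test on hypothetical
# APC counting errors (automatic minus manual count per observation).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
errors = rng.normal(loc=0.05, scale=0.8, size=60)  # invented error data

delta = 0.25  # equivalence margin: |mean error| <= delta counts as "accurate"
n = errors.size
mean = errors.mean()
se = errors.std(ddof=1) / np.sqrt(n)

# Ordinary t-test of H0: mean = 0. Non-significance does NOT prove equivalence.
p_ttest = 2 * stats.t.sf(abs(mean / se), df=n - 1)

# TOST: equivalence is shown only if BOTH one-sided tests reject.
p_lower = stats.t.sf((mean + delta) / se, df=n - 1)   # H0: mean <= -delta
p_upper = stats.t.cdf((mean - delta) / se, df=n - 1)  # H0: mean >= +delta
p_tost = max(p_lower, p_upper)

print(f"t-test p = {p_ttest:.3f}  (difference from zero)")
print(f"TOST   p = {p_tost:.3f}  (equivalence within ±{delta})")
```

Note how the two tests answer different questions: with very large samples the t-test rejects for trivially small biases, while TOST rewards larger samples with a better chance of demonstrating that the bias lies within the tolerance.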

1980 · Vol 5 (4) · pp. 337-349
Author(s): Philip H. Ramsey

It is noted that disagreements have arisen in the literature about the robustness of the t test in normal populations with unequal variances. Hsu's procedure is applied to determine exact Type I error rates for t. Employing fairly liberal but objective standards for assessing robustness, it is shown that the t test is not always robust to violations of the assumption of equal population variances, even when sample sizes are equal. Several guidelines are suggested, including the point that applying t at α = .05 without regard for unequal variances would require equal sample sizes of at least 15 by one of the standards considered. In many cases, especially those with unequal Ns, an alternative such as Welch's procedure is recommended.
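Welch's procedure is available directly in SciPy via `equal_var=False`; a minimal sketch, with invented sample sizes and variances:

```python
# Student's pooled-variance t vs. Welch's t on samples with unequal variances.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a = rng.normal(0, 1.0, size=12)   # smaller variance
b = rng.normal(0, 3.0, size=30)   # larger variance and larger n

t_s, p_s = stats.ttest_ind(a, b, equal_var=True)    # Student's t (pooled)
t_w, p_w = stats.ttest_ind(a, b, equal_var=False)   # Welch's procedure

print(f"Student t: t = {t_s:.2f}, p = {p_s:.3f}")
print(f"Welch   t: t = {t_w:.2f}, p = {p_w:.3f}")
```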


1994 · Vol 19 (3) · pp. 275-291
Author(s): James Algina, T. C. Oshima, Wen-Ying Lin

Type I error rates were estimated for three tests that compare means by using data from two independent samples: the independent samples t test, Welch's approximate degrees of freedom test, and James's second-order test. Type I error rates were estimated for skewed distributions, equal and unequal variances, equal and unequal sample sizes, and a range of total sample sizes. Welch's test and James's test have very similar Type I error rates and tend to control the Type I error rate as well as, or better than, the independent samples t test does. The results provide guidance about the total sample sizes required for controlling Type I error rates.
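The kind of simulation behind such estimates is easy to reproduce in outline. The sketch below estimates empirical Type I error for the pooled t test and Welch's test only (James's second-order test is omitted, as it has no standard SciPy implementation); pairing the smaller sample with the larger variance is an illustrative choice known to be hard on the pooled test.

```python
# Monte Carlo estimate of empirical Type I error under unequal variances and
# unequal sample sizes: both population means are zero, so every rejection
# is a Type I error.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n1, n2, sd1, sd2 = 10, 40, 3.0, 1.0   # small sample paired with large variance
alpha, reps = 0.05, 20_000

rej_student = rej_welch = 0
for _ in range(reps):
    a = rng.normal(0.0, sd1, n1)
    b = rng.normal(0.0, sd2, n2)
    rej_student += stats.ttest_ind(a, b, equal_var=True).pvalue < alpha
    rej_welch += stats.ttest_ind(a, b, equal_var=False).pvalue < alpha

print(f"pooled t empirical alpha: {rej_student / reps:.3f}")  # typically inflated
print(f"Welch  t empirical alpha: {rej_welch / reps:.3f}")    # near nominal .05
```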


2014 · Vol 3 · pp. 03.CP.3.1
Author(s): Duane V. Knudson, Crawford Lindsey
Keyword(s): Type I · Type II

2020
Author(s): Alyssa Counsell, Rob Cribbie

Measurement invariance (MI) is often concluded from a nonsignificant chi-square difference test. Researchers have also proposed using change in goodness-of-fit indices (ΔGOFs) instead. Both of these commonly used methods for testing MI have important limitations. To combat these issues, an equivalence test (EQ) has been proposed to replace the chi-square difference test commonly used to test MI. Due to concerns with the EQ's power, an adjusted version (EQ-A) was also created, but little evaluation of either procedure has been provided. The current study evaluated the Type I error and power of both the EQ and the EQ-A, and compared their performance to that of the traditional chi-square difference test and ΔGOFs. The EQ was the only procedure that maintained empirical error rates below the nominal alpha level. Results also highlight that the EQ requires larger sample sizes than traditional difference-based approaches, or equivalence bounds based on larger-than-conventional RMSEA values (e.g., > .05), to ensure adequate power. We do not recommend the proposed adjustment (EQ-A) over the EQ.


2020
Author(s): Alyssa Counsell, Rob Cribbie, David B Flora

Measurement invariance (MI) is often concluded from a nonsignificant chi-square difference test. Researchers have also proposed using change in goodness-of-fit indices (ΔGOFs) instead. Both of these commonly used methods for testing MI have important limitations. To combat these issues, Yuan and Chan (2016) proposed using an equivalence test (EQ) to replace the chi-square difference test commonly used to test MI. Due to their concerns with the EQ's power, Yuan and Chan also created an adjusted version (EQ-A), but provided little evaluation of either procedure. The current study evaluated the Type I error and power of both the EQ and the EQ-A, and compared their performance to that of the traditional chi-square difference test and ΔGOFs. The EQ for nested model comparisons was the only procedure that always maintained empirical error rates below the nominal alpha level. Results also highlight that the EQ requires larger sample sizes than traditional difference-based approaches, or equivalence bounds based on larger-than-conventional RMSEA values (e.g., > .05), to ensure adequate power. We do not recommend Yuan and Chan's proposed adjustment (EQ-A) over the EQ.
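The logic of the EQ can be sketched generically: instead of comparing the chi-square difference statistic to a central chi-square critical value, one asks whether it falls below a lower quantile of a noncentral chi-square whose noncentrality encodes the largest tolerable misfit. The sketch below uses the common RMSEA-to-noncentrality convention ncp = (N − 1) · df · ε²; it is a simplified illustration of this logic, not necessarily Yuan and Chan's exact formulation, and all inputs are invented.

```python
# Equivalence test for a nested-model chi-square difference: declare the
# constrained model "close enough" only if the observed difference statistic
# falls below the alpha-quantile of the noncentral chi-square implied by the
# tolerable-misfit bound.
from scipy import stats

def equivalence_chisq_diff(T_diff, df_diff, N, rmsea_bound=0.05, alpha=0.05):
    """Return (critical value, equivalence supported?)."""
    ncp = (N - 1) * df_diff * rmsea_bound**2      # tolerable-misfit noncentrality
    crit = stats.ncx2.ppf(alpha, df_diff, ncp)    # lower-tail critical value
    return crit, T_diff < crit

crit, ok = equivalence_chisq_diff(T_diff=8.3, df_diff=6, N=500)
print(f"critical value = {crit:.2f}, equivalence supported: {ok}")
```

This inversion of the usual decision rule is also why the power findings above run against familiar intuitions: larger samples and wider RMSEA bounds raise the critical value and make equivalence easier to demonstrate.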


1995 · Vol 77 (1) · pp. 155-159
Author(s): John E. Overall, Robert S. Atlas, Janet M. Gibson

Welch (1947) proposed an adjusted t test that can be used to correct the serious bias in Type I error protection that is otherwise present when both sample sizes and variances are unequal. The implications of the Welch adjustment for the power of tests for the difference between two treatments across k levels of a concomitant factor are evaluated in this article for k × 2 designs with unequal sample sizes and unequal variances. Analyses confirm that, although Type I error is uniformly controlled, the power of the Welch test of significance for the main effect of treatments remains rather seriously dependent on the direction of the correlation between unequal variances and unequal sample sizes. Nevertheless, given that analysis of variance is not an acceptable option in such cases, the Welch t test appears to have an important role to play in the analysis of experimental data.
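That directional dependence is easy to demonstrate by simulation. The sketch below (effect size, variances, and sample sizes are invented) estimates Welch-test power under the two opposite pairings of sample size and variance:

```python
# Monte Carlo power of Welch's test with a fixed true mean difference,
# under the two opposite pairings of unequal n and unequal variance.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
effect, alpha, reps = 1.0, 0.05, 20_000

def welch_power(n1, sd1, n2, sd2):
    hits = 0
    for _ in range(reps):
        a = rng.normal(0.0, sd1, n1)
        b = rng.normal(effect, sd2, n2)
        hits += stats.ttest_ind(a, b, equal_var=False).pvalue < alpha
    return hits / reps

# Direct pairing: the larger sample also has the larger variance.
print(f"power, direct pairing : {welch_power(40, 3.0, 10, 1.0):.3f}")
# Inverse pairing: the larger sample has the smaller variance.
print(f"power, inverse pairing: {welch_power(40, 1.0, 10, 3.0):.3f}")
```

The inverse pairing leaves the noisier group with few observations, inflating the standard error of the mean difference and cutting power even though Type I error stays controlled.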


2020 · pp. 37-55
Author(s): A. E. Shastitko, O. A. Markova

Digital transformation has led to changes in the business models of traditional players in existing markets. Moreover, new entrants and new markets have appeared, in particular platforms and multi-sided markets. The emergence and rapid development of platforms are driven primarily by the existence of so-called indirect network externalities. This raises the question of whether the existing instruments of competition law enforcement and market analysis remain relevant when analyzing markets with digital platforms. This paper discusses the advantages and disadvantages of various tools for defining markets with platforms. In particular, we characterize how the SSNIP test behaves when applied to markets with platforms. Furthermore, we analyze adjustments to market-definition tests for platforms in terms of possible type I and type II errors. All in all, it turns out that to reduce the likelihood of type I and type II errors when applying market-definition techniques to markets with platforms, one should consider the type of platform analyzed: transaction platforms without pass-through and non-transaction matching platforms should be treated as players in a multi-sided market, whereas non-transaction platforms should be analyzed as players in several interrelated markets. However, if the platform is allowed to adjust prices, an additional challenge emerges: the regulator and companies may manipulate the results of the SSNIP test by applying different models of competition.
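As background for readers unfamiliar with the SSNIP mechanics (the arithmetic below is standard single-market critical-loss analysis, not taken from the paper): a hypothetical monopolist's 5-10% price rise is profitable only if the share of sales actually lost stays below the critical loss X / (X + m), where X is the relative price rise and m the percentage margin.

```python
# Standard one-sided critical-loss arithmetic behind a SSNIP test.
# On a multi-sided platform this single-market calculation breaks down,
# because customers lost on one side also destroy revenue on the other
# side -- the feedback driving the type I / type II errors discussed above.

def critical_loss(price_rise: float, margin: float) -> float:
    """Fraction of sales a hypothetical monopolist can afford to lose."""
    return price_rise / (price_rise + margin)

for x in (0.05, 0.10):  # the customary SSNIP price increases
    print(f"SSNIP {x:.0%}, margin 40%: critical loss = {critical_loss(x, 0.40):.1%}")
```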

