reliability estimates

Author(s):  
Jack Hutchinson ◽  
Luke Strickland ◽  
Simon Farrell ◽  
Shayne Loft

Objective: Examine (1) the extent to which humans can accurately estimate automation reliability and calibrate to changes in reliability, and how this is impacted by the recent accuracy of automation; and (2) factors that impact the acceptance of automated advice, including true automation reliability, reliability perception, and the difference between an operator’s perception of automation reliability and perception of their own reliability. Background: Existing evidence suggests humans can adapt to changes in automation reliability but generally underestimate reliability. Cognitive science indicates that humans heavily weight evidence from more recent experiences. Method: Participants monitored the behavior of maritime vessels (contacts) in order to classify them, and then received advice from automation regarding classification. Participants were assigned to either an initially high (90%) or low (60%) automation reliability condition. After some time, reliability switched to 75% in both conditions. Results: Participants initially underestimated automation reliability. After the change in true reliability, estimates in both conditions moved towards the common true reliability, but did not reach it. There were recency effects, with lower future reliability estimates immediately following incorrect automation advice. With lower initial reliability, automation acceptance rates tracked true reliability more closely than perceived reliability. A positive difference between participant assessments of the reliability of automation and their own reliability predicted greater automation acceptance. Conclusion: Humans underestimate the reliability of automation, and we have demonstrated several critical factors that impact the perception of automation reliability and automation use. Application: The findings have potential implications for training and adaptive human-automation teaming.


2021 ◽  
Vol 36 (4) ◽  
pp. 545-663
Author(s):  
Muhammad Faran ◽  
Farah Malik

Music is a universal phenomenon; however, despite its unifying properties, musical taste and preference may still vary as a function of ethnicity and culture. The present study therefore aimed to adapt and validate the Short Test of Music Preferences (STOMP) for music and non-music Pakistani students. In Phase I, the cultural adaptation of the scale was carried out and the content validity index (Lawshe, 1975) was established. In Phase II, the STOMP was validated using confirmatory factor analysis. For the empirical evaluation, a sample of 561 undergraduate students (286 music and 275 non-music students) aged 18-26 years was recruited. The psychometric evaluation of the STOMP yielded excellent validity and reliability estimates for first-order constructs. Moreover, strict measurement invariance was established for the STOMP across music and non-music students. The validation of this scale is a small step toward enabling music psychology research to measure the construct indigenously.


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Amer Ibrahim Al-Omari ◽  
Amal S. Hassan ◽  
Naif Alotaibi ◽  
Mansour Shrahili ◽  
Heba F. Nagy

In survival analysis, the two-parameter inverse Lomax distribution is an important lifetime distribution. In this study, the estimation of R = P(Y < X) is investigated when the stress and strength random variables follow independent inverse Lomax distributions. Using the maximum likelihood approach, we obtain the estimator of R under simple random sampling (SRS), ranked set sampling (RSS), and extreme ranked set sampling (ERSS). Four different estimators are developed under the ERSS framework. Two estimators are obtained when the strength and stress populations have the same set size. The other two estimators are obtained when the strength and stress populations have different set sizes. Through a simulation experiment, the suggested estimates are compared with the corresponding estimates under SRS. The reliability estimates under ERSS are also compared with those under RSS. It is found that the reliability estimates based on the RSS and ERSS schemes are more efficient than the equivalent estimates under SRS for the same number of measured units. The reliability estimates based on the RSS scheme are more appropriate than the others in most situations. For a small even set size, the reliability estimate under the ERSS scheme is more efficient than those under RSS and SRS. However, in a few cases, reliability estimates under ERSS are more accurate than those under the RSS and SRS schemes.
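The stress-strength quantity R = P(Y < X) in the abstract above can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the paper: the inverse Lomax CDF is written as F(x) = (1 + beta/x)^(-alpha), both variables share a known scale beta, and sampling is plain SRS only (the RSS and ERSS schemes are not implemented). Under those assumptions R reduces to alpha_x / (alpha_x + alpha_y), and the shape MLE has a closed form. All function names are hypothetical.

```python
import math
import random


def sample_inverse_lomax(alpha, beta, rng):
    """Draw one value by inverting the assumed CDF F(x) = (1 + beta/x) ** (-alpha)."""
    u = rng.random()
    while u == 0.0:  # random() may return exactly 0, where log(u) is undefined
        u = rng.random()
    # Solve u = (1 + beta/x) ** (-alpha) for x; expm1 keeps precision when u is near 1.
    return beta / math.expm1(-math.log(u) / alpha)


def mle_shape(sample, beta):
    """Closed-form MLE of the shape alpha when the scale beta is known (assumption)."""
    return len(sample) / sum(math.log1p(beta / x) for x in sample)


def stress_strength_reliability(strength, stress, beta):
    """Plug-in estimate of R = P(Y < X) under a common, known scale beta."""
    a_x = mle_shape(strength, beta)
    a_y = mle_shape(stress, beta)
    return a_x / (a_x + a_y)


def simulate(alpha_x=3.0, alpha_y=1.5, beta=1.0, n=2000, seed=1):
    """Return (estimated R, true R) for one simulated SRS data set."""
    rng = random.Random(seed)
    strength = [sample_inverse_lomax(alpha_x, beta, rng) for _ in range(n)]
    stress = [sample_inverse_lomax(alpha_y, beta, rng) for _ in range(n)]
    r_hat = stress_strength_reliability(strength, stress, beta)
    r_true = alpha_x / (alpha_x + alpha_y)
    return r_hat, r_true


if __name__ == "__main__":
    print(simulate())
```

Calling `simulate()` with a different `seed` or a smaller `n` gives a rough feel for how the plug-in estimate varies around the true value under SRS.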


2021 ◽  
Vol 158 (A4) ◽  
Author(s):  
Y Garbatov ◽  
C Guedes Soares

Reliability assessment of a corroded deck of a tanker ship subjected to non-linear general corrosion wastage is performed, accounting for an initial period without corrosion due to the presence of a corrosion protection system, and a non-linear increase in wastage up to a steady-state value. The reliability model is based on the analysis of corrosion depth data. Two types of uncertainty are accounted for. The first is related to the corrosion degradation trend as a function of time, which is identified by a sequence-independent data analysis. The second is related to the variation of the corrosion degradation around its trend, which is treated as a stochastic process and is defined based on time series analysis. The time series analysis determines the autocorrelation and spectral density functions of the stochastic process by applying the fast Fourier transform. The reliability estimates for a corroded deck of a cargo tank of a tanker ship are analysed using a time-variant formulation, and the effect of inspections is incorporated through a Bayesian updating formulation.


2021 ◽  
Author(s):  
Jack Hutchinson ◽  
Simon Farrell ◽  
Luke Joseph Gough Strickland ◽  
Shayne Loft

Human perception of automation reliability and automation acceptance behaviours are key to effective human-automation teaming. This study examined factors that impact perceptions of automation reliability over time and the acceptance of automated advice. Participants completed a maritime vessel classification task in which they classified vessels (contacts) with the assistance of automation. In Experiment 1, automation reliability successively switched from high to low (or vice versa). In Experiment 2, automation reliability decreased by varying magnitudes before returning to high. Participants did not initially calibrate to true reliability, and experiencing low automation reliability reduced future reliability estimates during subsequent periods of high reliability. Automation acceptance was predicted by positive differences between participants' perception of automation reliability and confidence in their own classification reliability. Experiencing low automation reliability caused perceptions of reliability and automation acceptance rates to diverge. These findings have important implications for training and adaptive human-automation teaming in complex and dynamic environments.


2021 ◽  
Author(s):  
Thomas Pronk ◽  
Rebecca Hirst ◽  
Reinout Wiers ◽  
Jaap M. J. Murre

Research deployed via the internet and administered via smartphones could have access to more diverse samples than lab-based research. Diverse samples could have relatively high variation in their traits and so yield relatively reliable measurements of individual differences in these traits. Cognitive tasks have been reported to yield relatively low reliabilities (Hedge et al., 2018), which could potentially be addressed by smartphone-mediated administration in diverse samples. We formulate several criteria to determine whether a cognitive task is suitable for individual differences research on commodity smartphones: no very brief or precise stimulus timing, relative response times (RTs), a maximum of two response options, and a small number of graphical stimuli. The Flanker Task meets these criteria. We compared the reliability of individual differences in the Flanker Effect across samples and devices in a pre-registered study. We found no evidence that a more diverse sample yields higher reliabilities. We also found no evidence that commodity smartphones yield lower reliabilities than commodity laptops. Hence, diverse samples might not improve reliability above student samples, but smartphones may well measure individual differences with cognitive tasks reliably. In exploratory analyses, we examined different reliability coefficients, split-half reliabilities, and the development of reliability estimates as a function of task length.
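A split-half reliability of the kind mentioned in the abstract above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' procedure: it uses a single odd/even split of trials (the study may have used other splitting schemes), treats each participant's data as a list of per-trial scores, and applies the Spearman-Brown correction to project the half-length correlation to full task length. For a difference score such as a flanker effect, the effect would first be computed within each half. Function names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spearman_brown(r_half):
    """Project a correlation between two half-tests to the full-length test."""
    return 2.0 * r_half / (1.0 + r_half)


def split_half_reliability(scores):
    """scores: one list of per-trial scores per participant (assumed layout).

    Trials 1, 3, 5, ... form one half and trials 2, 4, 6, ... the other.
    """
    odd = [sum(s[0::2]) / len(s[0::2]) for s in scores]
    even = [sum(s[1::2]) / len(s[1::2]) for s in scores]
    return spearman_brown(pearson(odd, even))


if __name__ == "__main__":
    # Made-up response times (ms) for four participants, six trials each.
    demo = [
        [410, 420, 405, 430, 415, 425],
        [520, 510, 530, 515, 525, 505],
        [465, 470, 455, 480, 460, 475],
        [600, 590, 610, 595, 605, 585],
    ]
    print(split_half_reliability(demo))
```

Because a single split depends on trial order, averaging the coefficient over many random splits is a common refinement; the sketch omits that for brevity.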


2021 ◽  
Vol 21 (4) ◽  
pp. 1021-1027
Author(s):  
Brian P. Shaw

Researchers and psychometricians have long used Cronbach’s α as a measure of reliability. However, there have been growing calls to replace Cronbach’s α with measures that have more defensible assumptions. One of the most common and straightforward recommended reliability estimates is ω. After a review of reliability and its estimation in Stata, I introduce the community-contributed command omegacoef. This command reports McDonald’s ω in a format similar to the base alpha command. omegacoef provides Stata users the ability to easily compute estimates of reliability with the confidence that the necessary statistical assumptions are met.
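To make the contrast between the two coefficients concrete, here is a minimal Python sketch (not Stata, and not the `omegacoef` implementation). Cronbach's α is computed from raw item scores; ω is computed from standardized loadings and unique variances of a one-factor model, which are assumed here to have been estimated elsewhere. Function names and the layout of the inputs are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """rows: one list of k item scores per respondent (assumed layout).

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
    """
    k = len(rows[0])
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)


def mcdonald_omega(loadings, uniquenesses):
    """omega = (sum of loadings)^2 / ((sum of loadings)^2 + sum of unique variances).

    The loadings and unique variances are taken as given; fitting the
    one-factor model that produces them is outside this sketch.
    """
    common = sum(loadings) ** 2
    return common / (common + sum(uniquenesses))


if __name__ == "__main__":
    # Made-up responses: five respondents, three items.
    data = [[3, 4, 3], [2, 2, 3], [5, 4, 5], [1, 2, 1], [4, 5, 4]]
    print(cronbach_alpha(data))
    # Made-up standardized loadings and their unique variances (1 - loading^2).
    print(mcdonald_omega([0.8, 0.7, 0.6], [0.36, 0.51, 0.64]))
```

The ω formula lets each item have its own loading, whereas α implicitly treats items as equally related to the underlying construct, which is the assumption the abstract describes as harder to defend.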


Author(s):  
Raphaela I. Zehtner ◽  
Cosima L. Baeurle ◽  
Bertram Walter ◽  
Rudolf Stark ◽  
Andrea Hermann

Abstract. Background: This study aimed to develop a German version of the Family Expressiveness Questionnaire (FEQ; Halberstadt, 1983, 1986), which investigates emotional expressiveness within the family context while growing up. While a theoretically derived four-factor structure was postulated, 2- and 3-scale versions have been applied in research. Methods: In Study 1 (N = 650), these existing models were tested against each other. A confirmatory factor analysis was conducted for the solution that best fitted the data with half of the sample, and results were cross-validated in the other half. Construct validation was investigated in Study 2 (N = 225). Results: An acceptable model fit for a three-factor solution was attained in Study 1. In Study 2, correlation patterns indicated a good convergent and discriminant validity. Reliability estimates in both studies were in an acceptable to excellent range. Conclusion: Findings suggest that the FEQ German version is a psychometrically sound instrument for assessing expressiveness within the family.


Author(s):  
Benedict C. O. F. Fehringer

Abstract. Visualization and spatial relations (mental rotation) are two important factors of spatial thinking. Visualization refers to complex visual-spatial transformations, whereas spatial relations refer to simple mental rotation of visualized objects. Conventional spatial relations tests, however, have been found to be highly correlated with visualization tests because solving items through mental rotation might involve visualization ability due to the complexity of the visual materials of these tests. In two studies (N = 51, N = 109), a new computer-based test for spatial relations, the R-Cube-SR Test, was developed and validated. The R-Cube-SR Test utilizes simple, single-colored cubes as rotated visual materials. Reliability estimates of the reaction times reach ω = .87. Correlations with standard tests of spatial relations (up to r = .55) were significantly higher than with visualization tests, such as the new R-Cube-Vis Test (Fehringer, 2020), which uses the same visual materials. This was supported by CFAs. It is concluded that the new R-Cube-SR Test is a valid measure of spatial relations. Together, the R-Cube-Vis and R-Cube-SR, as specific tests for their respective factors, are now able to provide a differential diagnosis of a participant’s spatial thinking ability using the same visual materials.


2021 ◽  
Author(s):  
Joshua Levy ◽  
Carly Bobak ◽  
Nasim Azizgolshani ◽  
Xiaoying Liu ◽  
Bing Ren ◽  
...  

The public health burden of non-alcoholic steatohepatitis (NASH), a liver condition characterized by excessive lipid accumulation and subsequent tissue inflammation and fibrosis, has burgeoned with the spread of western lifestyle habits. Progression of fibrosis into cirrhosis is assessed using histological staging scales (e.g., NASH Clinical Research Network (NASH CRN)). These scales are used to monitor disease progression as well as to evaluate the effectiveness of therapies. However, clinical drug trials for NASH are typically underpowered due to lower than expected inter-/intra-rater reliability, which impacts measurements at screening, baseline, and endpoint. Bridge ratings represent a phenomenon where pathologists assign two adjacent stages simultaneously during assessment and may further complicate these analyses when ad hoc procedures are applied. Statistical techniques, dubbed Bridge Category Models, have been developed to account for bridge ratings, but not for the scenario where multiple pathologists assess biopsies across time points. Here, we develop hierarchical Bayesian extensions for these statistical methods to account for repeat observations and use these methods to assess the impact of bridge ratings on the inter-/intra-rater reliability of the NASH CRN staging scale. We also report on how pathologists may differ in their assignment of bridge ratings to highlight different staging practices. Our findings suggest that Bridge Category Models can capture additional fibrosis staging heterogeneity with greater precision, which translates to potentially higher reliability estimates in contrast to the information lost through ad hoc approaches.

