The Comparison Accuracy Estimation of Test Reliability Coefficients for National Chemistry Examination in Jambi Province on Academic Year 2014/2015

2017 ◽  
Vol 2 (1) ◽  
pp. 34
Author(s):  
Rida Sarwiningsih

This research compares the estimation accuracy of internal-consistency reliability coefficients under classical test theory. Reliability was estimated with several formulations: the Split-Half Method, the Cronbach Alpha formula, and the Kuder Richardson formula. The test reliability coefficients were computed with each formula, and the results were then compared in terms of their estimation accuracy. This is a quantitative descriptive study. The data are responses to the national chemistry examination in Jambi province in academic year 2014/2015. Student answer sheets were drawn using a proportional stratified random sampling technique, yielding 200 students' responses from 162 schools (132 public schools and 30 private schools) in Jambi province. The data were dichotomous and were analyzed using the Split-Half Method, the Cronbach Alpha formula, and the Kuder Richardson formula. Five reliability criteria were used: 0.5, 0.6, 0.7, 0.8, and 0.9. The results indicate that (a) the reliability coefficients of classical test theory developed by measurement experts (Split-Half Method, Cronbach Alpha formula, and Kuder Richardson formula) vary in estimation accuracy; (b) the average reliability coefficients have an estimation precision of about 0.78 to 0.8; and (c) the reliability coefficient was 0.78 with the Spearman Brown formula, 0.78 with the Rulon formula, 0.77 with the Flanagan formula, 0.838 with the Cronbach Alpha formula, 0.838 with the KR20 formula, and 0.821 with the KR21 formula.
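The coefficients named in this abstract can be sketched in a few lines of Python. This is a minimal illustration using the textbook formulas, not the author's procedure: the odd/even split, the use of population variances, the function names, and the tiny 0/1 response matrix are all assumptions made for the example.

```python
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def split_half(scores):
    """Spearman-Brown, Rulon, and Flanagan estimates from an odd/even split."""
    odd = [sum(row[0::2]) for row in scores]    # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in scores]   # items 2, 4, 6, ...
    total = [o + e for o, e in zip(odd, even)]
    diff = [o - e for o, e in zip(odd, even)]
    r = pearson(odd, even)
    return {
        "spearman_brown": 2 * r / (1 + r),
        "rulon": 1 - pvariance(diff) / pvariance(total),
        "flanagan": 2 * (1 - (pvariance(odd) + pvariance(even)) / pvariance(total)),
    }


def cronbach_alpha(scores):
    """Cronbach's alpha from item variances and total-score variance."""
    k = len(scores[0])
    totals = [sum(row) for row in scores]
    item_vars = sum(pvariance([row[j] for row in scores]) for j in range(k))
    return k / (k - 1) * (1 - item_vars / pvariance(totals))


def kr20(scores):
    """KR-20 for dichotomous (0/1) items, using p * q per item."""
    k = len(scores[0])
    totals = [sum(row) for row in scores]
    pq = 0.0
    for j in range(k):
        p = mean([row[j] for row in scores])
        pq += p * (1 - p)
    return k / (k - 1) * (1 - pq / pvariance(totals))


def kr21(scores):
    """KR-21, which assumes all items are equally difficult."""
    k = len(scores[0])
    totals = [sum(row) for row in scores]
    m = mean(totals)
    return k / (k - 1) * (1 - m * (k - m) / (k * pvariance(totals)))


# Hypothetical responses: 6 examinees x 4 items (1 = correct, 0 = incorrect).
data = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
]

print(split_half(data))
print(cronbach_alpha(data), kr20(data), kr21(data))
```

For 0/1 data the variance of an item equals p(1 - p), so alpha and KR-20 coincide, which is consistent with the identical 0.838 reported for both. KR-21 can never exceed KR-20, which matches the lower 0.821.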

2020 ◽  
Vol 9 (1) ◽  
pp. 85-99
Author(s):  
Marian Popa

Cronbach's alpha coefficient is still commonly used in research dedicated to the development of psychological tests. However, there is a certain lack of understanding of its real significance and, especially, of its limits. The article presents the fundamental postulates of classical test theory and analyzes, in this context, the main issues affecting the calculation and interpretation of Cronbach's alpha coefficient: unidimensionality, internal consistency, number of items, characteristics of the data, and sampling error. Finally, recommendations of good practice for using and reporting Cronbach's alpha are summarized.


2021 ◽  
Vol 5 (2) ◽  
pp. 210-221
Author(s):  
Anis Faridah

The problem motivating this research is that the final semester assessment items for the History subject were developed without an item-analysis stage, so the quality of the items was unknown. This is a quantitative descriptive study. Its purpose is to describe the characteristics of the final semester exam items for grade XI in the History subject at SMA Negeri 1 Pangkalpinang using the classical test theory approach. The subjects were 138 grade XI students in the Social Sciences major. The results show that the final exam items for grade XI History at SMA Negeri 1 Pangkalpinang are fit for use: 39 items (97.5%) were empirically valid, with a reliability coefficient of 0.818. In addition, 27 items (67.5%) meet the criteria for difficulty level, discriminating power, and distractor functioning, so they can be used directly to measure students' ability without revision. A further 12 items (30%) need revision, and 1 item (2.5%) was declared invalid and cannot be used to measure students' ability in the History subject.


Author(s):  
Susanne Hempel

This chapter discusses reliability. It outlines the nature and purpose of reliability, classical test theory, measures of reliability (measure orientated reliability, parallel test, and test-retest) as well as internal consistency, inter-item correlation, coefficient alpha, and categorical judgements.


MADRASAH ◽  
2020 ◽  
Vol 12 (1) ◽  
pp. 29-39
Author(s):  
Nuril Huda ◽  
Tutik Sri Wahyuni

This research aims to: 1) determine the characteristics of the science items in the try-out of the National Standard School Exams (USBN) in the academic year 2018/2019 based on Classical Test Theory (CTT); 2) determine the number of science items in the USBN try-out in the academic year 2018/2019 at each cognitive level. This is descriptive research with a quantitative approach. The data were the computer answer sheets of 5022 students who took the Elementary School USBN try-out on February 21, 2019 in Tulungagung Regency. The results showed that: 1) the characteristics of the science items in the USBN try-out in the academic year 2018/2019 based on CTT were: a) validity: 35 items valid; b) reliability: 0.818, which is very high; c) difficulty level: 4 items (11.43%) difficult, 9 items (25.71%) moderate, 16 items (45.71%) easy, and 6 items (17.14%) very easy; d) discriminating power: 3 items (8.57%) bad, 12 items (34.29%) good enough, 15 items (42.86%) moderate, and 5 items (14.29%) good; e) quality of options: 17 items (48.57%) need no revision, 9 items (25.71%) need revision of one option, 5 items (14.29%) need revision of 2 options, and 4 items (11.43%) need revision of 3 options; f) 13 items (37.14%) have quite good or good characteristics, so they can be included in the question bank; 2) by cognitive level, 11 items (31.43%) fall in category L1 (knowledge), 10 items (28.57%) in category L1 (understanding), 4 items (11.43%) in category L2 (application), and 10 items (28.57%) in category L3 (reasoning). The 13 items entered in the question bank were dominated by cognitive level L1 (knowledge and understanding).
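Item difficulty and discriminating power of the kind reported above are often computed as follows in classical item analysis. The abstract does not say which indices the authors used, so this is only a sketch under assumptions: difficulty is taken as the proportion correct, discrimination as the upper-minus-lower group index with a 27% split, and the response matrix and function names are hypothetical.

```python
def item_difficulty(scores, j):
    """Proportion of examinees answering item j correctly (higher = easier)."""
    return sum(row[j] for row in scores) / len(scores)


def discrimination_index(scores, j, fraction=0.27):
    """Upper-group minus lower-group proportion correct on item j.

    Examinees are ranked by total score; the top and bottom `fraction`
    of the ranking form the two comparison groups.
    """
    ranked = sorted(scores, key=sum, reverse=True)
    n = max(1, round(len(ranked) * fraction))
    upper, lower = ranked[:n], ranked[-n:]
    return (sum(row[j] for row in upper) - sum(row[j] for row in lower)) / n


# Hypothetical responses: 6 examinees x 4 items (1 = correct, 0 = incorrect).
data = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
]

for j in range(len(data[0])):
    print(j, item_difficulty(data, j), discrimination_index(data, j))
```

Cut-offs that turn these numbers into labels such as "easy" or "good" vary between textbooks, so they are deliberately left out here.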


2019 ◽  
Author(s):  
Robert Foster

This paper develops a generalized framework which allows for the use of parametric classical test theory inference with non-normal models. Using the theory of natural exponential families and Bayesian theory of their conjugate priors, theoretical properties of test scores under the framework are derived, including a formula for parallel-test reliability in terms of the test length and a parameter of the underlying population distribution of abilities. This framework is shown to satisfy the general properties of classical test theory, and several common classical test theory results are shown to reduce to parallel-test reliability in this framework. An empirical Bayes method for estimating reliability, both with point estimates and with intervals, is described using maximum likelihood. This method is applied to an example set of data and compared to classical test theory estimators of reliability, and a simulation study is performed to show the coverage of the interval estimates of reliability derived from the framework.


2020 ◽  
Author(s):  
Pingguang Lei ◽  
Zheng Yang ◽  
Wei Li ◽  
Jingqing Ou ◽  
Yingli Cun ◽  
...  

Background: Quality of life (QOL) is now a worldwide concern in clinical oncology, and the specific instrument FACT-Hep (Functional Assessment of Cancer Therapy-Hepatobiliary questionnaire) is widely used in English-speaking countries. However, specific instruments for hepatocellular carcinoma patients in China are scarce, and no formal validation of the Simplified Chinese version of the FACT-Hep has been carried out. This study aimed to validate the Chinese FACT-Hep using a combination of Classical Test Theory and Generalizability Theory. Methods: The Chinese version of the FACT-Hep and the QLICP-LI were used to measure QOL three times, before and after treatment, in a sample of 114 in-patients with hepatocellular carcinoma. The scale was evaluated using validity and reliability indicators: Cronbach's α, Pearson r, intra-class correlation (ICC), and standardized response mean (SRM). Generalizability Theory (G theory) was also applied to address the dependability of measurements and to estimate multiple sources of variance. Results: The internal-consistency Cronbach's α coefficients were greater than 0.70 for all domains, and test-retest reliability coefficients for all domains and the overall scale were greater than 0.80, ranging from 0.81 to 0.96 (except emotional well-being, 0.74). G-coefficients and Ф-coefficients further confirmed the reliability of the scale with exact variance components. The PWB and FWB domains and the overall scale changed significantly after treatment, with SRM ranging from 0.40 to 0.69. Conclusions: The Chinese version of the FACT-Hep has good validity, reliability, and responsiveness, and can be used to measure QOL for patients with hepatocellular carcinoma in China.
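The standardized response mean used here as the responsiveness indicator is conventionally the mean change score divided by the standard deviation of the change scores. A minimal sketch of that convention follows; the pre- and post-treatment scores are invented for illustration and are not data from the study.

```python
from statistics import mean, stdev


def standardized_response_mean(before, after):
    """Mean of paired change scores divided by their sample standard deviation."""
    changes = [b - a for a, b in zip(before, after)]
    return mean(changes) / stdev(changes)


# Hypothetical domain scores for four patients before and after treatment.
pre = [50, 55, 60, 65]
post = [55, 58, 66, 70]

print(standardized_response_mean(pre, post))
```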


Author(s):  
James Austin

Classical testing theory, including its origins within psychological measurement, the fundamental principles of true scores and measurement error, psychometrics, and statistical assumptions are the focus of this chapter. Random and systematic forms of measurement error are addressed, and the standard error of measurement is defined. Major approaches to defining and estimating test reliability and validity are reviewed, and practical applications of classical test theory to K-12 music education assessment are considered, including large-scale standardized testing as well as measurement levels, item analysis, and techniques for enhancing the reliability and validity of classroom-level assessments. Finally, the transition from classical test theory to modern test theory is explored.
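In classical test theory the standard error of measurement is usually written as the score standard deviation times the square root of one minus the reliability. The short sketch below applies that textbook relation; the function name and the example numbers are assumptions for illustration and are not taken from the chapter.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability), with reliability in [0, 1]."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1 - reliability)


# Hypothetical classroom test: score SD of 10 points, reliability of 0.84.
sem = standard_error_of_measurement(10, 0.84)

# Approximate 95% band around an observed score of 72.
band = (72 - 1.96 * sem, 72 + 1.96 * sem)
print(sem, band)
```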


2020 ◽  
Author(s):  
Peter E Clayson ◽  
Scott Baldwin ◽  
Michael J. Larson

In studies of event-related brain potentials (ERPs), difference scores between conditions in a task are frequently used to isolate neural activity for use as a dependent or independent variable. Adequate score reliability is a prerequisite for studies examining relationships between ERPs and external correlates, but there is a widely held view that difference scores are inherently unreliable and unsuitable for studies of individual differences. This view fails to consider the nuances of difference score reliability that are relevant to ERP research. In the present study, we provide formulas from classical test theory and generalizability theory for estimating the internal consistency of subtraction-based and residualized difference scores. These formulas are then applied to error-related negativity (ERN) and reward positivity (RewP) difference scores from the same sample of 117 participants. Analyses demonstrate that ERN difference scores can be reliable, which supports their use in studies of individual differences. However, RewP difference scores yielded poor reliability due to the high correlation between the constituent reward and non-reward ERPs. Findings emphasize that difference score reliability largely depends on the internal consistency of constituent scores and the correlation between those scores. Furthermore, generalizability theory estimates yielded higher internal consistency estimates for subtraction-based difference scores than classical test theory estimates did. Despite some beliefs that difference scores are inherently unreliable, ERP difference scores can show adequate reliability and be useful for isolating neural activity in studies of individual differences.
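The dependence of difference score reliability on the constituent reliabilities and their correlation can be illustrated with the standard classical test theory formula for a subtraction-based difference D = X - Y. This is a sketch of that textbook formula, not necessarily the exact expressions given in the paper, and the input values are hypothetical.

```python
def difference_score_reliability(sd_x, sd_y, rel_x, rel_y, r_xy):
    """Classical reliability of D = X - Y.

    sd_x, sd_y   : standard deviations of the two constituent scores
    rel_x, rel_y : reliabilities of the two constituent scores
    r_xy         : correlation between the constituent scores
    """
    cov = r_xy * sd_x * sd_y
    true_var = sd_x ** 2 * rel_x + sd_y ** 2 * rel_y - 2 * cov
    observed_var = sd_x ** 2 + sd_y ** 2 - 2 * cov
    return true_var / observed_var


# Hypothetical constituents, each with reliability 0.80 and unit variance.
moderate = difference_score_reliability(1.0, 1.0, 0.80, 0.80, 0.50)
high = difference_score_reliability(1.0, 1.0, 0.80, 0.80, 0.70)
print(moderate, high)
```

Holding the constituent reliabilities fixed, raising the correlation between the two scores lowers the reliability of their difference, which is the pattern the abstract describes for the RewP scores.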


1994 ◽  
Vol 19 (1) ◽  
pp. 73-90 ◽  
Author(s):  
Ronald D. Armstrong ◽  
Douglas H. Jones ◽  
Zhaobo Wang

A network-flow model is formulated for constructing parallel tests based on classical test theory using test reliability for the criterion. The model enables practitioners to specify a test difficulty distribution for the values of the item difficulties as well as test composition requirements. Use of the network-flow algorithm ensures high computational efficiency, allowing wide applications of optimal test construction using microcomputers. The results of an empirical study show that the generated tests have acceptably high test reliability.

