Balancing the demands of validity and reliability in practice: Case study of a changing system of primary science summative assessment

Sarah G. Earle

doi:10.14324/lre.18.2.06

London Review of Education ◽

10.14324/lre.18.2.06 ◽

2020 ◽

Vol 18 (2) ◽

Author(s):

Sarah G. Earle

Keyword(s):

Reliability And Validity ◽

Summative Assessment ◽

Teacher Assessment ◽

Primary Science ◽

Validity And Reliability ◽

Trade Off

Teacher summative judgements of children’s attainment in science, which are statutory at age 11 in England, require consideration of both valid sampling of the construct and reliable comparison of outcomes. In order to develop understanding of the enacted ‘trade off’ between validity and reliability, this three-year case study, within the Teacher Assessment in Primary Science (TAPS) project, was undertaken during a period of statutory assessment change in England. The case demonstrates an ongoing balancing act between the demands of reliability and validity, and resulted in the development of a teacher assessment seesaw, which provides a model for both interpreting and supporting practice, within and beyond primary science.

Using Principal Component Scores to Enhance the Validity and Reliability of Big Five Personality Measures

Journal of Individual Differences ◽

10.1027/1614-0001/a000225 ◽

2017 ◽

Vol 38 (2) ◽

pp. 83-93

Author(s):

Jeffrey M. Cucina ◽

Nicholas L. Vasilopoulos ◽

Arwen H. DeCostanza

Keyword(s):

Big Five ◽

Discriminant Validity ◽

Reliability And Validity ◽

Principal Component ◽

Big Five Personality ◽

Validity And Reliability ◽

Scoring Method ◽

Five Factors ◽

Big Five Factors ◽

Component Scores

Abstract. Varimax rotated principal component scores (VRPCS) have previously been offered as a possible solution to the non-orthogonality of scores for the Big Five factors. However, few researchers have examined the reliability and validity of VRPCS. To address this gap, we use a lab study and a field study to investigate whether using VRPCS increases orthogonality, reliability, and criterion-related validity. Compared to the traditional unit-weighting scoring method, the use of VRPCS enhanced the reliability and discriminant validity of the Big Five factors, although there was little improvement in criterion-related validity. Results are discussed in terms of the benefit of using VRPCS instead of traditional unit-weighted sum scores.

Developing Schedule With Linear Programming (Case Study: STTF II Project Komplek Sukamukti Banjaran)

International Journal of Innovation in Enterprise System ◽

10.25124/ijies.v4i02.77 ◽

2020 ◽

Vol 4 (02) ◽

pp. 34-45

Author(s):

Naufal Dzikri Afifi ◽

Ika Arum Puspita ◽

Mohammad Deni Akbar

Keyword(s):

Linear Programming ◽

Objective Function ◽

Project Scheduling ◽

Time Limit ◽

Minimum Time ◽

Service Cost ◽

Trade Off ◽

Overtime Work ◽

The Cost

Shift to The Front II Komplek Sukamukti Banjaran is one of the projects implemented by a company engaged in telecommunications. Like every project, it has a time limit specified in the contract. Project scheduling plays an important role in predicting both the cost and the time of a project. Every project should be completed on or before the date specified in the contract. Delay in a project can be anticipated by accelerating the duration of completion using the crashing method with the application of linear programming. Linear programming helps with the iteration in the crashing calculation because, if linear programming is not used, the iteration has to be repeated. The objective function in this scheduling is to minimize the cost. This study aims to find a trade-off between the cost and the minimum time expected to complete the project. The duration was accelerated by adding 4 hours, 3 hours, 2 hours, or 1 hour of overtime work. The normal time for this project is 35 days with a service fee of Rp. 52,335,690. From the results of the crashing analysis, the alternative chosen is to add 1 hour of overtime, giving a duration of 34 days with a total service cost of Rp. 52,375,492. This acceleration will affect the entire project because Shift to The Front II covers 33 different locations, and if all of these locations can be accelerated, the completion of the entire project will be more effective.
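The time-cost trade-off reported in this abstract can be checked with simple arithmetic. The sketch below is not the authors' linear programming model; it only computes the extra cost and the crash cost slope (extra cost per day saved) from the figures quoted above, assuming the chosen alternative shortens the schedule from 35 to 34 days. The function and variable names are illustrative.

```python
# Figures quoted in the abstract (cost in Rupiah, duration in days).
NORMAL_DAYS = 35
NORMAL_COST = 52_335_690
CRASHED_DAYS = 34          # assumption: 1 hour of overtime yields a 34-day schedule
CRASHED_COST = 52_375_492


def crash_cost_slope(normal_days, normal_cost, crashed_days, crashed_cost):
    """Return the extra cost per day saved (the crash cost slope)."""
    days_saved = normal_days - crashed_days
    if days_saved <= 0:
        raise ValueError("crashed duration must be shorter than normal duration")
    return (crashed_cost - normal_cost) / days_saved


extra_cost = CRASHED_COST - NORMAL_COST
slope = crash_cost_slope(NORMAL_DAYS, NORMAL_COST, CRASHED_DAYS, CRASHED_COST)

print(extra_cost)  # 39802
print(slope)       # 39802.0
```

Under that assumption, saving one day costs an additional Rp. 39,802 in service fees.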

Formative and summative assessment of science in English primary schools: evidence from the Primary Science Quality Mark

Research in Science & Technological Education ◽

10.1080/02635143.2014.913129 ◽

2014 ◽

Vol 32 (2) ◽

pp. 216-228 ◽

Cited By ~ 5

Author(s):

Sarah Earle

Keyword(s):

Primary Schools ◽

Summative Assessment ◽

Primary Science ◽

Quality Mark

Selecting packaging material for dry food products by trade‐off of sustainability and performance: A case study on cookies and milk powder

Packaging Technology and Science ◽

10.1002/pts.2561 ◽

2021 ◽

Vol 34 (5) ◽

pp. 303-318

Author(s):

Maarten Baele ◽

An Vermeulen ◽

Dimitri Adons ◽

Roos Peeters ◽

Angelique Vandemoortele ◽

...

Keyword(s):

Milk Powder ◽

Food Products ◽

Packaging Material ◽

Trade Off ◽

Dry Food ◽

And Performance

Validity and reliability of the Mandarin version of the Treatment Burden Questionnaire among stroke patients in Mainland China

Family Practice ◽

10.1093/fampra/cmab004 ◽

2021 ◽

Author(s):

Qi Zhang ◽

Ke Zhang ◽

Miao Li ◽

Jiaxin Gu ◽

Xintong Li ◽

...

Keyword(s):

Internal Consistency ◽

Convergent Validity ◽

Intraclass Correlation ◽

Reliability And Validity ◽

Therapeutic Interventions ◽

Treatment Burden ◽

Validity And Reliability ◽

Stroke Patients ◽

Retest Reliability ◽

Test Retest Reliability

Abstract. Objectives: To examine the validity and reliability of the Mandarin version of the Treatment Burden Questionnaire (TBQ) among stroke patients. Background: Stroke patients need long-term management of symptoms and life situation, and treatment burden has recently emerged as a new concept that can influence health outcomes during the rehabilitation process. Methods: Convenience sampling was used to recruit 187 stroke patients in a tertiary hospital in Tianjin for a formal investigation. Item analysis, reliability tests and validity tests were carried out. The reliability test included internal consistency and test–retest reliability. The validity test covered content, structure and convergent validity. Results: Of the 187 completed questionnaires, only 180 (96.3%) were suitable for analysis. According to the experts’ evaluation, the I-CVI of each item ranged from 0.833 to 1.000, and the S-CVI was 0.967. The exploratory factor analysis yielded three factors accounting for a cumulative 53.054% of the variance. Convergent validity was demonstrated using Morisky’s Medication Adherence Scale 8 (r = –0.450, P < 0.01). All correlations between items and global scores ranged from 0.403 to 0.638. Internal consistency reliability and test–retest reliability were found to be acceptable, as indicated by a Cronbach’s α of 0.824 and an intraclass correlation coefficient of 0.846, respectively. Conclusions: The Mandarin TBQ had acceptable validity and reliability. The use of the TBQ in the assessment of treatment burden among stroke survivors may benefit health resource allocation and help tailor therapeutic interventions to construct minimally disruptive care.
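For readers unfamiliar with the internal consistency statistic reported here, the following is a minimal sketch of how Cronbach's α is conventionally computed from item-level responses. It is not the authors' analysis: the `responses` matrix is hypothetical illustration data, not data from the study, and the function name is an assumption. The last lines also re-check the usable-questionnaire share quoted in the abstract.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondent rows (one score per item)."""
    k = len(rows[0])  # number of items
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: 5 respondents x 4 items (NOT data from the study).
responses = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 5, 4, 5],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))

# Share of usable questionnaires quoted in the abstract: 180 of 187.
usable_share = round(180 / 187 * 100, 1)
print(usable_share)  # 96.3
```

Values of α closer to 1 indicate that the items vary together; the abstract's reported 0.824 comes from the study's own data, which is not reproduced here.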

Original CAN reliability and validity study Phelan M, Slade M, Thornicroft G, Dunn G, Holloway F, Wykes T, Strathdee G, Loftus L, McCrone P & Hayward P (1995) The Camberwell Assessment of Need: the validity and reliability of an instrument to assess the needs of people with severe mental illness, British Journal of Psychiatry, 167, 589–95.

Camberwell Assessment of Need (CAN) ◽

10.1017/9781911623441.017 ◽

2020 ◽

pp. 117-125

Keyword(s):

Mental Illness ◽

British Journal ◽

Severe Mental Illness ◽

Reliability And Validity ◽

Validity Study ◽

Validity And Reliability ◽

Camberwell Assessment Of Need ◽

Assessment Of Need

The Clinician’s Subjective Experience during the Interaction with Adolescent Psychiatric Patients: Validity and Reliability of the Assessment of Clinician’s Subjective Experience

Psychopathology ◽

10.1159/000513769 ◽

2021 ◽

pp. 1-8

Author(s):

Angelo Picardi ◽

Sara Panunzi ◽

Sofia Misuraca ◽

Chiara Di Maggio ◽

Andrea Maugeri ◽

...

Keyword(s):

Clinical Examination ◽

Subjective Experience ◽

Rating Scale ◽

Mood State ◽

Psychiatric Patients ◽

Temporal Stability ◽

Young Patients ◽

Reliability And Validity ◽

Mental States ◽

Validity And Reliability

Introduction: The last decade has witnessed a resurgence of interest in the clinician’s subjectivity and its role in the diagnostic assessment. Integrating the criteriological, third-person approach to patient evaluation and psychiatric diagnosis with other approaches that take into account the patient’s subjective and intersubjective experience may bear particular importance in the assessment of very young patients. The ACSE (Assessment of Clinician’s Subjective Experience) instrument may provide a practical way to probe the intersubjective field of the clinical examination; however, its reliability and validity in child and adolescent psychiatrists seeing very young patients is still to be determined. Methods: Thirty-three clinicians and 278 first-contact patients aged 12–17 years participated in this study. The clinicians completed the ACSE instrument and the Brief Psychiatric Rating Scale after seeing the patient, and the Profile of Mood State (POMS) just before seeing the patient and immediately after. The ACSE was completed again for 45 patients over a short (1–4 days) retest interval. Results: All ACSE scales showed high internal consistency and moderate to high temporal stability. Also, they displayed meaningful correlations with the changes in conceptually related POMS scales during the clinical examination. Discussion: The findings corroborate and extend previous work on adult patients and suggest that the ACSE provides a valid and reliable measure of the clinician’s subjective experience in adolescent psychiatric practice, too. The instrument may prove to be useful to help identify patients in the early stages of psychosis, in whom subtle alterations of being with others may be the only detectable sign. Future studies are needed to determine the feasibility and usefulness of integrating the ACSE within current approaches to the evaluation of at-risk mental states.

Digital Competence Assessment Methods in Higher Education: A Systematic Literature Review

Education Sciences ◽

10.3390/educsci11080402 ◽

2021 ◽

Vol 11 (8) ◽

pp. 402

Author(s):

Linda Helene Sillat ◽

Kairit Tammets ◽

Mart Laanpere

Keyword(s):

Higher Education ◽

Literature Review ◽

Systematic Literature Review ◽

Reliability And Validity ◽

Assessment Tools ◽

Competence Assessment ◽

Validity And Reliability ◽

Digital Competence ◽

Need To Evaluate ◽

Assessment In Higher Education

The rapid increase in recent years in the number of digital competency frameworks, models, and strategies has made the argument for evaluating and assessing digital competence increasingly popular. To support the process of digital competence assessment, it is consequently necessary to understand the different approaches and methods. This paper carries out a systematic literature review and includes an analysis of the existing proposals and conceptions of digital competence assessment processes and methods in higher education, with the aim of better understanding the field of research. The review has three objectives: (i) describe the characteristics of digital competence assessment processes and methods in higher education; (ii) provide an overview of current trends; and, finally, (iii) identify challenges and issues in digital competence assessment in higher education with a focus on the reliability and validity of the proposed methods. On the basis of the findings, and as a result of the COVID-19 pandemic, digital competence assessment in higher education requires more attention, with a specific focus on instrument validity and reliability. Furthermore, it will be of great importance to further investigate the use of assessment tools to support systematic digital competence assessment processes. The analysis includes possible opportunities and ideas for future lines of work in digital competence evaluation in higher education.

Do We Need to Redesign the Current Assessment System in Bangladesh? A Review of Theory and Practice

Global Journal of Educational Studies ◽

10.5296/gjes.v5i2.15585 ◽

2019 ◽

Vol 5 (2) ◽

pp. 37

Author(s):

Faieza Chowdhury

Keyword(s):

Formative Assessment ◽

Student Learning ◽

Reliability And Validity ◽

Summative Assessment ◽

Theory And Practice ◽

Assessment System ◽

Summative Assessments ◽

Effective Manner ◽

Current Assessment ◽

Improve Student Learning

In the current highly competitive global environment, teachers are under tremendous pressure to assess student learning in the most effective manner. Two tools that teachers commonly use to assess students in their classes are formative and summative assessment. In formative assessment, teachers gather data in order to improve student learning, and in summative assessment they use the data to assess students’ learning at the end of a specific course of study. The scores on both types of assessment should meet the minimum standards of both reliability and validity. In this article we highlight the differences between the two forms of assessment, discuss the theories pertaining to summative and formative assessment, identify how educators at tertiary level in Bangladesh commonly use the two types of assessment, and report teachers' opinions on whether the current assessment system is appropriate or needs further improvement. Findings from the study indicate that most teachers have an incomplete and unharmonious understanding of assessment, often failing to clearly distinguish between formative and summative assessments.

Participation in Decision Making as a Property of Complex Adaptive Systems: Developing and Testing a Measure

Nursing Research and Practice ◽

10.1155/2013/706842 ◽

2013 ◽

Vol 2013 ◽

pp. 1-16 ◽

Cited By ~ 9

Author(s):

Ruth A. Anderson ◽

Donde Plowman ◽

Kirsten Corazzini ◽

Pi-Ching Hsieh ◽

Hui Fang Su ◽

...

Keyword(s):

Decision Making ◽

Complex Adaptive Systems ◽

Adaptive Systems ◽

Discriminant Validity ◽

Reliability And Validity ◽

Organizational Level ◽

Validity And Reliability ◽

Complex Adaptive ◽

Participation In Decision Making ◽

Level Property

Objectives. To (1) describe participation in decision-making as a systems-level property of complex adaptive systems and (2) present empirical evidence of reliability and validity of a corresponding measure. Method. Study 1 was a mail survey of a single respondent (administrators or directors of nursing) in each of 197 nursing homes. Study 2 was a field study using a random, proportionally stratified sampling procedure that included 195 organizations with 3,968 respondents. Analysis. In Study 1, we analyzed the data to reduce the number of scale items and establish initial reliability and validity. In Study 2, we strengthened the psychometric test using a large sample. Results. Results demonstrated validity and reliability of the participation in decision-making instrument (PDMI) while measuring participation of workers in two distinct job categories (RNs and CNAs). We established reliability of item scores aggregated at the organizational level. We established validity of the multidimensional properties using convergent and discriminant validity and confirmatory factor analysis. Conclusions. Participation in decision making, when modeled as a systems-level property of organization, has multiple dimensions and is more complex than is traditionally measured. Managers can use this model to form decision teams that maximize the depth and breadth of expertise needed and to foster connection among them.
