A Detailed Item Response Theory Analysis of Algorithms and Programming Concepts in App Inventor Projects

2021 · Vol 29 · pp. 1377–1402
Author(s): Nathalia da Cruz Alves, Christiane Gresse Von Wangenheim, Jean Carlo Rossa Hauck, Adriano Ferreti Borgatto

Teaching computing in K-12 is often introduced with a focus on algorithms and programming concepts, using block-based programming environments such as App Inventor. Yet learning programming is a complex process, and novices struggle with several difficulties. Thus, to be effective, instructional units need to be designed with regard not only to content but also to its sequencing, taking into account difficulties related to the concepts and the idiosyncrasies of programming environments. Such systematic sequencing can be based on large-scale project analyses, treating students' volition, incentive, and opportunity to apply the relevant program constructs as latent psychometric constructs and using Item Response Theory to obtain quantitative 'difficulty' estimates for each concept. This article therefore presents the results of a large-scale data-driven analysis of the demonstrated use in practice of algorithms and programming concepts in App Inventor. Based on a dataset of more than 88,000 App Inventor projects assessed automatically with the CodeMaster rubric, we perform an analysis using Item Response Theory. The results demonstrate that the ease of some concepts can be explained not only by their inherent characteristics but also by characteristics of App Inventor as a programming environment. These results can help teachers as well as instructional and curriculum designers with the sequencing, scaffolding, and assessment design of programming education in K-12.
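To make the estimation step concrete, below is a minimal sketch of how dichotomous "concept demonstrated / not demonstrated" data can yield Rasch-style difficulty estimates. It is not the authors' CodeMaster pipeline; the joint maximum-likelihood fit, the simulated data, and all names are illustrative assumptions.

    import numpy as np

    def rasch_prob(theta, b):
        # P(concept demonstrated) under the Rasch (1PL) model.
        return 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))

    def fit_rasch_jml(X, n_iter=500, lr=0.5):
        # Crude joint maximum-likelihood fit by gradient ascent.
        # X: binary persons-by-concepts matrix (1 = concept used in a project).
        # Perfect all-0/all-1 response rows are not handled; this is a sketch,
        # not a production calibration.
        n_persons, n_items = X.shape
        theta = np.zeros(n_persons)   # student abilities
        b = np.zeros(n_items)         # concept difficulties
        for _ in range(n_iter):
            P = rasch_prob(theta, b)
            resid = X - P             # score residuals = log-likelihood gradient
            theta += lr * resid.mean(axis=1)
            b -= lr * resid.mean(axis=0)
            b -= b.mean()             # anchor the scale (mean difficulty = 0)
        return theta, b

    # Demo on simulated data: b_hat should recover the ordering of true_b.
    rng = np.random.default_rng(0)
    true_theta = rng.normal(0.0, 1.0, 500)
    true_b = np.array([-1.5, -0.5, 0.0, 0.5, 1.5])
    X = (rng.random((500, 5)) < rasch_prob(true_theta, true_b)).astype(float)
    _, b_hat = fit_rasch_jml(X)
    print(np.round(b_hat, 2))

Raw proportion-of-use alone would conflate concept difficulty with student ability; separating the two on a common latent scale is precisely what motivates using IRT for sequencing decisions.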


2020 · Vol 2 (1) · pp. 90–105
Author(s): Jimmy Y. Zhong

Focusing on the 12 allocentric/survey-based strategy items of the Navigation Strategy Questionnaire (Zhong & Kozhevnikov, 2016), the current study applied item response theory-based analysis to determine whether a bidimensional model could better describe the latent structure of the survey-based strategy. Results from item and model fit diagnostics, categorical response curves, and item information curves showed that the item with the lowest rotated component loading (.27), SURVEY12, could be considered for exclusion in future studies, and that a bidimensional model with three preference-related items constituting a content factor offered a better representation of the latent structure than a unidimensional model per se. Mean scores from these three items also correlated significantly with a pointing-to-landmarks task, to the same relative magnitude as the mean scores from all items and from all items excluding SURVEY12. These findings give early evidence that the three preference-related items could constitute a subscale for deriving quick estimates of large-scale allocentric spatial processing in healthy adults in both experimental and clinical settings. Potential cognitive and brain mechanisms are discussed, followed by calls for future studies to gather further evidence confirming the predictive validity of the full scale and subscale, along with the design of new items focusing on environmental familiarity.
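For reference, the item information curves mentioned above follow, in the dichotomous two-parameter logistic case, the standard IRT expressions below; the questionnaire's Likert-type items call for a polytomous (e.g., graded response) generalization, so this is only the basic dichotomous form:

    I_j(\theta) = a_j^2 \, P_j(\theta)\bigl(1 - P_j(\theta)\bigr), \qquad
    I(\theta) = \sum_j I_j(\theta), \qquad
    \mathrm{SE}(\theta) = 1/\sqrt{I(\theta)}

Since factor loadings correspond roughly to discrimination a_j, a weakly loading item such as SURVEY12 contributes little information to the scale, which is consistent with the recommendation to consider excluding it.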


2017 · Vol 35 (2) · pp. 297–317
Author(s): Tanya Longabach, Vicki Peyton

K–12 English language proficiency tests that assess multiple content domains (e.g., listening, speaking, reading, writing) often have subsections based on these content domains; scores assigned to these subsections are commonly known as subscores. In today's accountability-oriented educational environment, testing programs face increasing demands to report subscores in addition to total test scores. Although reporting subscores can provide much-needed information for teachers, administrators, and students about proficiency in the test domains, a major drawback of subscore reporting is lower reliability compared to the test as a whole. In addition, viewing language domains as if they were not interrelated, and reporting subscores without considering the relationship between domains, may contradict the theory of language acquisition. This study explored several methods of assigning subscores to the four domains of a state English language proficiency test: classical test theory (CTT)-based number correct, unidimensional item response theory (UIRT), augmented item response theory (A-IRT), and multidimensional item response theory (MIRT). It compared the reliability and precision of these methods across language domains and grade bands. The first two methods assess proficiency in the domains separately, without considering the relationship between domains; the last two take that relationship into account. The reliability and precision of the CTT and UIRT methods were similar and lower than those of A-IRT and MIRT for most domains and grade bands; MIRT was found to be the most reliable method. Policy implications and limitations of this study, as well as directions for further research, are discussed.
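As background on the augmentation idea: classical subscore augmentation builds on Kelley's regressed-score estimate, which shrinks an observed subscore toward the group mean in proportion to its reliability; A-IRT and MIRT generalize this by also borrowing strength from the correlated domains. The notation below is generic, not the article's:

    \hat{\tau}_s = \bar{X}_s + \rho_s \,(X_s - \bar{X}_s)

where X_s is the observed subscore for domain s, \bar{X}_s the group mean, and \rho_s the subscore reliability. The less reliable a subscore, the more it is pulled toward the mean, which is why borrowing information from correlated domains can improve both reliability and precision.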


2017 · Vol 78 (5) · pp. 805–825
Author(s): Dimiter M. Dimitrov

This article presents new developments in the methodology of an approach to scoring and equating tests with binary items, referred to as delta scoring (D-scoring), which is being piloted with large-scale assessments at the National Center for Assessment in Saudi Arabia. This presentation builds on previous work on delta scoring and adds procedures for scaling and equating, an item response function, and estimation of true values and standard errors of D scores. Also, unlike previous work on this topic, where D-scoring involved estimates of item and person parameters in the framework of item response theory, the approach presented here does not require item response theory calibration.
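For orientation, a conventional item response function for binary items is the two-parameter logistic form shown below. This is the standard IRT form, given here only for contrast: the point of the article is that D-scoring obtains an analogous function without calibrating such item and person parameters.

    P_j(\theta) = \frac{1}{1 + e^{-a_j(\theta - b_j)}}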


2018 · Vol 22 (2) · pp. 130–142
Author(s): Thomas Mbenu Nulangi, Djemari Mardapi

This study aimed to describe (1) the characteristics of items based on Item Response Theory; (2) the level of cheating in the implementation of the national examination, based on Angoff's B-Index method, the Pair 1 method, the Pair 2 method, the Modified Error Similarity Analysis (MESA) method, and the G2 method; and (3) the most accurate method for detecting cheating in the mathematics national examination at the senior secondary school level in the academic year 2015/2016 in East Nusa Tenggara Province. The item response theory analysis showed that 17 (42.5%) items of the mathematics national examination fit the 3-PL model, with a maximum of the information function of 58.0128 at θ = 1.6 and a measurement error of 0.1313. The number of pairs detected as cheating was 63 by Angoff's B-Index method, 52 by the Pair 1 method, 141 by the Pair 2 method, 67 by the MESA method, and 183 by the G2 method. Ranked by the number of cheating pairs detected, the methods were, in descending order, the G2 method, the Pair 2 method, the MESA method, Angoff's B-Index method, and the Pair 1 method. Ranked by accuracy of cheating detection based on the computed standard error, the methods were, in descending order, Angoff's B-Index method, the G2 method, the MESA method, the Pair 1 method, and the Pair 2 method.
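Two of the reported quantities are linked by standard IRT identities, shown here as a consistency check (D ≈ 1.7 is the usual scaling constant in the 3-PL model):

    P_j(\theta) = c_j + \frac{1 - c_j}{1 + e^{-D a_j (\theta - b_j)}}, \qquad
    \mathrm{SE}(\theta) = 1/\sqrt{I(\theta)}

At the reported maximum, 1/\sqrt{58.0128} ≈ 0.1313, which matches the reported measurement error at θ = 1.6.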


2021 · Vol 6 (1) · pp. 93
Author(s): Rahmat Danni, Ajeng Wahyuni, Tauratiya Tauratiya

This study describes the characteristics of the items on the Arabic final-semester examination at MAN 1 Pangkalpinang using an item response theory approach. The research was motivated by the fact that the Arabic final assessment items had not been developed through the proper test-construction stages. The study is quantitative; its subjects were 176 students of class XI at MAN 1 Pangkalpinang. The data consist of responses to the Arabic final-semester examination, comprising 40 multiple-choice items with five options each. The results showed that the Arabic final-semester examination (1) proved valid, with all 40 items (100%) showing adequate factor loadings; (2) proved reliable, with a reliability coefficient of 0.884; (3) contains 33 items (82.5%) with a good level of difficulty and discriminating power, which can be stored in the question bank and reused in subsequent activities, while 7 items (17.5%), namely items 10, 26, 27, 29, 32, 34, and 35, do not meet the criteria for a good level of difficulty and need to be revised or eliminated; and (4) is suitable for students with low to moderate ability (θ) in the range of -3.5 to +1.5 logits. Future research could analyze constructed-response Arabic items on a large scale or develop high-quality higher-order thinking items in Arabic.
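As a rough illustration of the difficulty-and-discrimination screening in result (3), here is a minimal classical-test-theory sketch; the study itself used an IRT approach, and the screening thresholds and names below are illustrative assumptions:

    import numpy as np

    def item_stats(X):
        # X: binary persons-by-items matrix (1 = correct answer).
        p = X.mean(axis=0)  # difficulty: proportion answering correctly
        stats = []
        for j in range(X.shape[1]):
            rest = X.sum(axis=1) - X[:, j]  # total score excluding item j
            r_pb = np.corrcoef(X[:, j], rest)[0, 1]  # corrected point-biserial
            stats.append((p[j], r_pb))  # (difficulty, discrimination)
        return stats

    def flag_items(stats, p_lo=0.3, p_hi=0.8, r_min=0.2):
        # Indices of items outside commonly used screening bounds
        # (items everyone answers identically yield NaN and also get flagged).
        return [j for j, (p, r) in enumerate(stats)
                if not (p_lo <= p <= p_hi and r >= r_min)]

Given a scored response matrix X, flag_items(item_stats(X)) would return the indices of items, like the seven flagged above, that need revision before entering a question bank.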


Author(s): Dani Gamerman, Tufi M. Soares, Flávio Gonçalves

This article discusses the use of a Bayesian model that incorporates differential item functioning (DIF) to analyse whether cultural differences may affect the performance of students from different countries on the various test items that make up the OECD's Programme for International Student Assessment (PISA) test of mathematics ability. The PISA tests in mathematics and other subjects are used to compare the educational attainment of fifteen-year-old students in different countries. The article first provides background on PISA, DIF, and item response theory (IRT) before describing a hierarchical three-parameter logistic model for the probability of a correct response on an individual item, used to determine the extent of DIF remaining in the 2003 mathematics test. The results of the Bayesian analysis illustrate the importance of appropriately accounting for all sources of heterogeneity present in educational testing and highlight the advantages of the Bayesian paradigm when applied to large-scale educational assessment.
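One common way to embed DIF in a hierarchical three-parameter logistic model, sketched here as a generic parameterization rather than the article's exact specification, is to give each item a country-specific shift in difficulty:

    P(X_{ijk} = 1 \mid \theta_j) = c_i + \frac{1 - c_i}{1 + e^{-a_i(\theta_j - b_i - d_{ik})}}, \qquad
    d_{ik} \sim N(0, \tau_i^2)

where d_{ik} is the DIF effect of item i in country k. Items whose posterior for d_{ik} places substantial mass away from zero exhibit DIF, and the hierarchical prior keeps these country-level shifts distinct from genuine differences in the ability distributions.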

