Evaluation of the quality of multiple choice test bank for the module of Introduction to Anthropology by using the RASCH model and QUEST software

Author(s):  
Quang Ngoc Bui

The paper presents (1) an overview of the historical development of objective multiple-choice testing methods alongside the development of measurement science, and of the process of evaluating learners' academic performance by this method; (2) the application of classical and modern test theories to analyze and evaluate the quality of the multiple-choice test bank for the module Introduction to Anthropology using the RASCH model and QUEST software, carried out by determining the difficulty of the test items, the quality of the distractors, the discrimination among test items, the correlation between each item score and the total score, the probability of each option being chosen, the measurement scale for the learners' competence, the "threshold level" of difficulty for a multiple-choice question, the measurement error, the reliability of the test, etc.; and thereby (3) some solutions toward the optimal application of objective multiple-choice tests at the University of Social Sciences and Humanities, Vietnam National University Ho Chi Minh City.
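The Rasch analysis described in this abstract rests on a simple item characteristic curve: the probability of a correct response depends only on the difference between a learner's ability and an item's difficulty, both expressed in logits on a common scale. A minimal sketch (the function name and example values are illustrative, not taken from the paper):

```python
import math

def rasch_probability(theta, b):
    """Dichotomous Rasch model: probability that a person with
    ability theta answers an item of difficulty b correctly
    (both measured in logits on a common scale)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# When ability equals difficulty, the probability is exactly 0.5 --
# this is the sense in which an item's "threshold level" is the
# ability at which a learner has a 50% chance of success.
p_at_threshold = rasch_probability(0.0, 0.0)   # 0.5
p_easier_item = rasch_probability(0.0, -1.0)   # > 0.5
```

Software such as QUEST estimates the ability and difficulty parameters jointly from the full response matrix; the curve above is the model they are fitted to.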

2021 ◽  
Vol 8 (6) ◽  
pp. 398-407
Author(s):  
Jonald Pimentel ◽  
Marah Luriely A. Villaruz

A study was conducted to determine whether rearranging the answer choices of items in a multiple-choice test affects how examinees perceive the difficulty of those items. Two fifteen-item test instruments (a pretest and a modified posttest) were made with the same items, but the answer choices in the modified posttest were rearranged. The responses of the 205 examinees who took the two tests over a two-week interval were modeled using the Rasch model. Results show that the item difficulty estimates for the majority of the fifteen items differed between the two tests. The majority of the items showed an increase in difficulty as perceived by the examinees. The effect on difficulty may be due to the time interval between the two administrations: first, students forget what they learned and see the items as more difficult (time factor), and second, the rearrangement of the choices in each item of the posttest changed the way students dealt with the items, which partly contributed to the increase in the difficulty level of the majority of the items.
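One crude way to see such a difficulty shift is to convert each item's proportion correct into a logit-scale difficulty estimate and compare pretest and posttest values. A rough sketch with made-up proportions, not the study's data (Rasch software estimates difficulties jointly from the whole response matrix rather than item by item like this):

```python
import math

def logit_difficulty(p_correct):
    # Rough Rasch-style difficulty estimate from the proportion of
    # examinees answering correctly: harder items have a lower
    # proportion correct and therefore a higher logit difficulty.
    return -math.log(p_correct / (1.0 - p_correct))

pretest_p = [0.80, 0.65, 0.50]   # hypothetical proportions correct
posttest_p = [0.70, 0.55, 0.45]  # same items, choices rearranged

shifts = [logit_difficulty(q) - logit_difficulty(p)
          for p, q in zip(pretest_p, posttest_p)]
# A positive shift means the item became harder on the posttest.
```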


2010 ◽  
Vol 35 (1) ◽  
pp. 12-16 ◽  
Author(s):  
Sandra L. Clifton ◽  
Cheryl L. Schriner

PRASI ◽  
2020 ◽  
Vol 15 (02) ◽  
pp. 57
Author(s):  
Ni Putu Liana Santy ◽  
Ni Luh Putu Eka Sulistia Dewi ◽  
Anak Agung Gede Yudha Paramartha

This study analyzed the quality of a multiple-choice test used as the mid-term test made by the English teachers in one school in Singaraja. The study was essential to conduct since the items of a multiple-choice test must be of good quality to assess students' achievement levels. A content analysis method was used to analyze 100 items from 3 different instruments. In collecting the data, a checklist analysis form was used to compare the items of the teacher-made multiple-choice test with the norms that serve as a standard for constructing a good multiple-choice test, and the findings were then clarified through interviews. From the data obtained, 72% of students got poor scores in the mid-term test. Of the items, 1% have sufficient quality, 8% have good quality, and the rest have very good quality. The most common mistakes found concern punctuation and capitalization. This is supported by the results of the interview, which show that the teachers did not know the norms of punctuation and capitalization precisely. It can be concluded that the teachers already follow the norms of constructing a multiple-choice test, and that the quality of the multiple-choice test is not the only factor that affects students' achievement levels.
Keywords: instrument quality, norms, teacher-made multiple-choice test


2020 ◽  
Vol 4 (3) ◽  
pp. 272
Author(s):  
M.S.D. Indrayani ◽  
A.A.I.N. Marhaeini ◽  
A.A.G.Y. Paramartha ◽  
L.G.E. Wahyuni

This study aimed at investigating and analyzing the quality of teacher-made multiple-choice tests used as summative assessment for the English subject. The quality of the tests was judged against the norms for constructing a good multiple-choice test. The research design was descriptive. Document study and interviews were used as methods of collecting the data. The data were analyzed by comparing the multiple-choice tests against the 18 norms for constructing a good multiple-choice test, and then analyzed using the formula suggested by Nurkencana. The results showed that the quality of the teacher-made multiple-choice tests is very good, with 79 items (99%) qualified as very good and 1 item (1%) qualified as good. Some problems related to certain norms were still found, so it is suggested that the teachers pay attention to these unfulfilled norms. To minimize the issues, it is further suggested to carry out peer review, rechecking, and editing.


2018 ◽  
Vol 40 (1) ◽  
pp. 5 ◽  
Author(s):  
Tim Stoeckel ◽  
Phil Bennett ◽  
Tomoko Ishii

This paper describes the development and initial validation of a Japanese-English bilingual version of the New General Service List Test (NGSLT; Stoeckel & Bennett, 2015). The New General Service List (NGSL; Browne, 2013) consists of 2,800 high frequency words and is intended to provide maximal coverage of texts for learners of English. The NGSLT is a diagnostic instrument designed to identify gaps in knowledge of words on the NGSL. The NGSLT is a multiple-choice test that consists of 5 levels, each assessing knowledge of 20 randomly sampled words from a 560-word frequency-based level of the NGSL. A bilingual version of the NGSLT was developed to minimize the risk of conflating vocabulary knowledge with understanding of the answer choices. A validation study with 382 Japanese high school and university learners found the instrument to be reliable (α = .97) and unidimensional and to demonstrate good fit to the Rasch model. [Japanese abstract, translated:] This paper discusses the development and validation of a Japanese version of the vocabulary size test (NGSLT) based on the New General Service List (NGSL). The NGSL (Browne, 2013) is a list of 2,800 high-frequency words compiled to provide high text coverage, and the NGSLT (Stoeckel & Bennett, 2015) is a test that diagnoses learners' knowledge of that list. The NGSL was divided into five 560-word levels, 20 words were randomly sampled from each level, and a 100-item multiple-choice test was created. The Japanese version was created out of concern that learners might answer incorrectly because they did not understand the answer choices. Validation with 382 university and high school learners confirmed that this Japanese version is highly reliable (α = .97), that measurement is unidimensional, and that the data fit the Rasch model.
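The reliability figure reported here (α = .97) is Cronbach's alpha, which can be computed directly from a matrix of dichotomous item scores. A self-contained sketch (the tiny response matrix is invented for illustration):

```python
def cronbach_alpha(item_scores):
    """Cronbach's alpha from per-item score lists.
    item_scores[i][j] is examinee j's score (0/1) on item i."""
    k = len(item_scores)                 # number of items
    n = len(item_scores[0])              # number of examinees

    def sample_var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    sum_item_vars = sum(sample_var(item) for item in item_scores)
    totals = [sum(item[j] for item in item_scores) for j in range(n)]
    return (k / (k - 1)) * (1.0 - sum_item_vars / sample_var(totals))

# Two items that always agree yield perfect internal consistency.
alpha = cronbach_alpha([[0, 1, 0, 1],
                        [0, 1, 0, 1]])   # 1.0
```

On a real 100-item test the same formula applies; values near 1 indicate that the items measure the examinees consistently.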


2021 ◽  
Vol 3 (1) ◽  
pp. 11-20
Author(s):  
Ulfah Zahiroh ◽  
Pangoloan Soleman Ritonga

This research aimed at determining the quality of test items in terms of their validity, reliability, difficulty level, discrimination power, and distractor effectiveness. A quantitative descriptive method was used. Interviews and documentation were the data collection techniques. The data sources were the even-semester exam questions in multiple-choice form, the students' answer sheets, and the answer key. The Anates 4.0.9 program was used to analyze the quality of the test items. The analysis of multiple-choice item quality in the semester final exam of the Chemistry subject at the eleventh grade of State Islamic Senior High School 2 Kepulauan Meranti showed that in the validity analysis there were 6 valid items (17%) and 29 non-valid items (83%); the reliability analysis yielded a reliability score of 0.955; in the difficulty level analysis there were 12 easy items (34%), 17 medium items (49%), and 6 hard items (17%); in the discrimination power analysis there were 4 very good items (11.5%), 1 good item (3%), 19 items (54%) that should be revised, and 11 items (31.5%) that should be eliminated; and in the distractor effectiveness analysis there were 26 very good options (19%), 10 good options (7%), 25 poor options (18%), 55 bad options (39%), and 24 very bad options (17%). Therefore, it could be concluded that the quality of the test items was poor.
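The difficulty and discrimination figures reported in such studies come from classical item analysis, which tools like Anates automate. The two core indices can be sketched in a few lines (the upper/lower-group fraction of 27% is a common convention; the response data below are invented):

```python
def difficulty_index(item_responses):
    # Proportion of examinees answering the item correctly (1 = correct).
    # Conventionally: high p = easy item, low p = hard item.
    return sum(item_responses) / len(item_responses)

def discrimination_index(item_responses, total_scores, frac=0.27):
    # Upper-lower group discrimination: D = p_upper - p_lower.
    # Items that strong examinees get right and weak examinees get
    # wrong discriminate well; D near 0 or negative flags an item
    # for revision or elimination.
    order = sorted(range(len(total_scores)),
                   key=lambda i: total_scores[i], reverse=True)
    g = max(1, int(len(order) * frac))
    upper = [item_responses[i] for i in order[:g]]
    lower = [item_responses[i] for i in order[-g:]]
    return sum(upper) / g - sum(lower) / g

# Hypothetical item answered correctly only by the strongest examinees.
responses = [1, 1, 1, 0, 0, 0]
totals = [38, 35, 30, 14, 10, 7]
p = difficulty_index(responses)              # 0.5 -> medium difficulty
d = discrimination_index(responses, totals)  # 1.0 -> very good
```

Distractor effectiveness is judged similarly, by counting how often each wrong option is chosen and by whom.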


2018 ◽  
Vol 7 (3.20) ◽  
pp. 109
Author(s):  
Hasni Shamsuddin ◽  
Nordin Abdul Razak ◽  
Ahmad Zamri Khairani

Rasch model analysis is an important tool for analysing students' performance at the item level. The purpose of this study was to calibrate 14-year-old students' performance on a mathematics test based on the item difficulty parameter. 307 Form 2 students provided responses for this study. A 40-item multiple-choice test was developed to gauge the responses. Results show that two of the items needed to be dropped since they did not meet the Rasch model's expectations. Analysis of the remaining items showed that the students were most competent on items related to Directed Numbers (mean = -1.445 logits), while they were least competent in the topic of Circles (mean = 1.065 logits). We also provide a calibration of performance at the item level. In addition, we discuss how the findings might help teachers address students' difficulties in these topics.


2020 ◽  
Vol 4 (1) ◽  
pp. 44-63
Author(s):  
Pandu J Laksono

This study aims to develop a three-tier multiple-choice test instrument on chemical equilibrium material to identify misconceptions in students. The development model used was that of Borg and Gall (1983): (1) research and data collection, (2) planning, (3) developing product drafts, (4) conducting initial field trials, (5) revising the initial product, (6) conducting limited field tests, and (7) product improvement. The research was conducted with 60 respondents. Validation used the Aiken method with 5 expert validators. The conclusion is that the three-tier multiple-choice test instrument developed was declared feasible and met the criteria of a good test, with an average Aiken validity of 0.87. The instrument was declared feasible in terms of test reliability (0.806, in the high category) and had a discrimination power of 0.351, categorized as good. The difficulty level analysis found 20% of the items categorized as easy, 71.11% as moderate, and 8.89% as difficult. Based on the distractor index, it was concluded that most distractors functioned as intended, with most being chosen by more than 5% of examinees, so they were declared effective. In terms of practicality, the three-tier multiple-choice test instrument fell into the good category with a percentage of 78.28%.
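The validity figure of 0.87 reported here is Aiken's V, computed from expert ratings of each item. A minimal sketch (the rating-scale bounds and the example ratings are assumptions for illustration, not the study's data):

```python
def aiken_v(ratings, lo=1, hi=5):
    """Aiken's content-validity coefficient V for one item.
    ratings: the scores given by the expert validators on a
    lo..hi scale; V ranges from 0 (lowest) to 1 (highest)."""
    n = len(ratings)
    c = hi - lo + 1                      # number of rating categories
    s = sum(r - lo for r in ratings)     # summed distances above the floor
    return s / (n * (c - 1))

# Five validators rating one item on a 1-5 scale.
v = aiken_v([4, 5, 4, 5, 4])   # 0.85
```

Averaging V over all items gives the instrument-level figure that is compared against a feasibility threshold.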


2018 ◽  
Vol 4 (1) ◽  
pp. 100
Author(s):  
Desi Afriani ◽  
Kusno Kusno

This research aims to describe the composition of the cognitive processes and knowledge dimensions in the item analysis of the even-semester final test in mathematics for the fourth grade of SMP cluster 1 Banyumas in academic year 2014/2015, and to describe the quality of the test based on theoretical and empirical analysis. Based on the analysis, the cognitive process composition of the items consisted of understanding (35.14%) and applying (64.86%), and the knowledge composition consisted of conceptual knowledge (35.14%) and procedural knowledge (64.86%). The content validity analysis of the multiple-choice, short-answer, and essay sections, with percentages of 96%, 90%, and 80% respectively, fulfilled the criteria of a good test. The construct and face validity analysis of the multiple-choice, short-answer, and essay sections, with percentages of 92%, 90%, and 80% respectively, also fulfilled the criteria of a good test. The empirical analysis showed that the item validity of the multiple-choice and short-answer items was predominantly in the fair category and that of the essays predominantly in the high category; the reliability coefficients of the multiple-choice, short-answer, and essay sections were 0.62, 0.50, and 0.63 respectively; the distractors of the multiple-choice, short-answer, and essay items were predominantly in the fair category; and the difficulty level of the multiple-choice, short-answer, and essay items was predominantly in the fair category.

