Shiken 24.1 - TEVAL - Shiken: A Journal of Language Testing and Evaluation in Japan
Latest Publications



Published By The Japan Association For Language Teaching (JALT)

1881-5537

Author(s):  
David Allen, Trevor Holster

A robust finding in psycholinguistics is that cognates and loanwords, which are words that typically share some degree of form and meaning across languages, provide the second language learner with benefits in language use when compared to words that do not share form and meaning across languages. This cognate effect has been shown to exist for Japanese learners of English; that is, words such as table are processed faster and more accurately in English because they have a loanword equivalent in Japanese (i.e., テーブル /te:buru/ ‘table’). Previous studies have also shown that the degree of phonological and semantic similarity, as measured on a numerical scale from ‘completely different’ to ‘identical’, influences processing. However, there has been relatively little appraisal of such cross-linguistic similarity ratings themselves. Therefore, the present study investigated the structure of the similarity ratings using Rasch analysis, which is an analytic approach frequently used in the design and validation of language assessments. The findings showed that a 4-point scale may be optimal for phonological similarity ratings of cognates and a 2-point scale may be most appropriate for semantic similarity ratings. Furthermore, the study revealed that while a few raters and items misfitted the Rasch model, there was substantial agreement in ratings, especially for semantic similarity. The results validate the ratings for use in research and demonstrate the utility of Rasch analysis in the design and validation of research instruments in psychology.



Author(s):  
Sam Reid, Peter Chin

Critical thinking (CT) is taking on an increasingly important role in Japanese tertiary education. Teachers tasked with developing CT in a second-language (L2) context may need a way of assessing students’ abilities. However, a number of difficulties face L2 students taking a test designed for first-language (L1) speakers. They may be disadvantaged by linguistic and perhaps cultural issues. This study describes an exploratory attempt to make a CT test that can be administered to learners of English and which allows them to display selected elements of CT, specifically analyzing arguments and judging or evaluating. A comparison of L1 and L2 performance in the test showed the results to be comparable. Analysis of two different question topics showed differences in CT skills displayed. Issues with rating accuracy are linked to the format of the test. We argue that this test format is suitable for many students in Japan and elsewhere who have intermediate levels of English.



Author(s):  
Bartolo Bazan

The listening span task is a measure of working memory that requires participants to process sets of increasing numbers of utterances and store the last word of each utterance for recall at the end of each set. Measures to date have contained an exceedingly demanding processing component, possibly leaving insufficient resources to meet the word recall requirement, which may have reduced the sensitivity of the measure to distinguish different levels of working memory. Further, tasks thus far have asked participants to verify the content of utterances based on world knowledge, which may have confounded the measurement of working memory capacity with world knowledge. An additional weakness is that they lack sound psychometric construct validity evidence, which clouds what these tools actually measure. This pilot study presents a listening span task that addresses these methodological shortcomings, which was administered to 31 Japanese junior high school students. The participants listened to ten sets (two sets each of two, three, four, five and six utterances) of short casual utterances, judged whether they made sense in Japanese, and recalled the last word of each utterance in the set. Performance was assessed through a scoring procedure new to listening span tasks in which credit is given for the words recalled in order of appearance until memory failure. The data were analyzed using the Rasch model, which produces evidence for different aspects of validity and indicates whether the items in a test measure a unidimensional construct. The results provided validity evidence for the use of the new listening span task and revealed that the instrument measured a single unidimensional construct.
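The scoring rule described in this abstract can be illustrated with a minimal sketch. The abstract does not specify the details, so the following assumes that "memory failure" means the first position at which the recalled word does not match the target word, and that matching ignores case and surrounding whitespace. The function name `score_set` and its parameters are hypothetical and are not taken from the study.

```python
from typing import List


def score_set(targets: List[str], recalled: List[str]) -> int:
    """Count words recalled correctly, in order, until the first failure.

    Assumption (not from the study): a "memory failure" is the first
    position where the recalled word differs from the target word, or
    where the participant stops recalling.
    """
    score = 0
    for target, response in zip(targets, recalled):
        # Compare case-insensitively, ignoring surrounding whitespace.
        if response.strip().lower() != target.strip().lower():
            break  # first failure ends credit for this set
        score += 1
    return score
```

Under these assumptions, a participant who recalls the first two of three final words and then gives a wrong word would receive two points for that set, and a participant who recalls the right words in the wrong order would receive none.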



Author(s):  
Eric Shepherd Martin

This paper details the development and validation of a listening self-efficacy instrument for EFL/ESL learners with beginner- to intermediate-level English language proficiency. Self-efficacy, or the belief in one's ability to perform a task successfully, is believed to determine how likely individuals will be to cope with difficulties relating to the task domain (e.g., listening, speaking, reading, or writing), and to sustain their effort in spite of obstacles (Bandura, 1997). To date, few instruments have been developed to evaluate English L2 listening self-efficacy. The instrument presented here was distributed among a sample of first- and second-year Japanese university students (N = 121), and, unlike most previously developed questionnaires, was validated through the use of Rasch analysis. The results of the administration of the questionnaire showed that learners' responses differed predictably and considerably, thereby suggesting the utility of the instrument for future use by EFL/ESL practitioners.



Author(s):  
Jeffrey Martin

This study critically evaluated an anonymized peer feedback and assessment design for L2 writers enhanced by the use of Rasch analysis. This approach centered on the acts of giving assessment (Topping, 1998) and feedback (Lundstrom & Baker, 2009) in exchange with multiple peers. Each participant received feedback comments and class-wide statistical measures summarized for students without the need for their peers to rate all papers. Anonymity was maintained to bring unencumbered attention to the role of the reader (Booth et al., 2008) and to provide a space for interpretation and reflection on the potentially contrasting data and experiences that emerge. This process is argued to drive cognitive development and improve L2 writing skills. An initial trial with 15 high-proficiency EFL learners indicated that the design facilitated an effective exchange for each participant. The effects of anonymously including teacher comments also brought informative insights about the perception of feedback and its sources. Issues were found regarding overly narrow use of the rating scales by some participants. A 6-point rating scale is proposed for more differentiated scoring. Overall, positive engagement and reception by the participants suggest that this peer-assisted learning approach holds promise for L2 writers.



Author(s):  
David Allen

This article describes a recent education reform initiative concerning English education in Japan, specifically the proposed introduction of four-skills tests as part of the university entrance admissions process. The first aim is to summarize, in English, some of the key issues and events concerning the reform. To this end, background information and a timeline of key events since 2016 are provided. The second aim is to contrast proposals made by two academic organizations, the Japan Language Testing Association (JLTA) and the Science Council of Japan: Language and Literature Committee (SCJ). It is shown that, while agreeing on a number of specific issues related to the reform, these two organizations take starkly different positions in terms of their general orientation, which, it is argued, reflects the background of the organizational members and their views on foreign language education in Japan. These contrasting positions are discussed with reference to the metaphor of throwing the baby out with the bathwater. Finally, it is argued that a number of criticisms levelled at the proposed use of private four-skills tests illustrate a reluctance to engage with issues related to the currently used university entrance exams; in other words, these criticisms are made while ignoring the elephant in the room.



Author(s):  
Bartolo Bazan

Working memory refers to the capacity to temporarily retain a limited amount of information that is available for manipulation by higher-order cognitive processes. Several assessment instruments, such as the speaking span task, have been associated with the measurement of working memory span. However, despite the widespread use of the speaking span task, no study, to the best of my knowledge, has attempted to validate it using Rasch Measurement Theory. Rasch analysis can potentially shed light on the dimensionality of a complex construct such as working memory as well as examine whether a collection of items is working together to construct a coherent and reliable measure for a targeted population. This pilot study reports a Rasch analysis of a novel speaking span task, which was administered individually to 31 Japanese junior high school students and scored using a newly developed scoring system. Two separate analyses were conducted on the task: an analysis of the individual items using the Rasch dichotomous model and an analysis of the super items (sets) using the partial credit model. The results indicate that the task measures a coherent unidimensional latent variable and is thus a useful tool for measuring the construct. Moreover, Rasch analysis was shown to be a suitable method for evaluating working memory tests.
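For readers unfamiliar with the two models named in this abstract, the following sketch shows their standard textbook forms. It is not the study's analysis, which would involve estimating person and item parameters with dedicated Rasch software. The function names, and the use of `theta` for person ability and `difficulty` or `thresholds` for item parameters, are illustrative assumptions.

```python
import math
from typing import List


def rasch_probability(theta: float, difficulty: float) -> float:
    """Probability of success on a dichotomous item under the Rasch model.

    theta is the person ability and difficulty the item difficulty,
    both on the same logit scale.
    """
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def partial_credit_probabilities(theta: float, thresholds: List[float]) -> List[float]:
    """Probabilities of scores 0..m on one item under the partial credit model.

    thresholds holds the m step difficulties of the item. The numerator
    for score x is exp of the sum of (theta - threshold_k) over the first
    x steps; score 0 has an empty sum, i.e. exp(0) = 1.
    """
    cumulative = [0.0]
    for step in thresholds:
        cumulative.append(cumulative[-1] + (theta - step))
    weights = [math.exp(value) for value in cumulative]
    total = sum(weights)
    return [weight / total for weight in weights]
```

With a single threshold, the partial credit model reduces to the dichotomous Rasch model, which is why the same framework can be applied both to individual items and to sets scored as super items.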



Author(s):  
David Allen

This article presents a history of Shiken from its first publication in 1997 until 2019, followed by suggestions for areas of future research in assessment to which the publication may be well suited to contribute. In the historical overview, data are presented about the following: the origins, titles, editors, and distribution; the article types; and the contents of research articles and the designs and methodologies they have employed. Regarding research article content, four prominent themes were identified: mass market tests, entrance exams, statistics, and validity/reliability. Regarding design and methods, research articles have tended to focus on English language tests with university students in Japan, while utilizing test and/or instrument data and quantitative methods of analysis. Recommendations for future research areas include investigations into the validity of test interpretations and uses of four-skills, vocabulary and other tests used in Japan, and language assessment literacy. Recommendations for future research design and methods include focusing more on a range of test stakeholders; various contexts, such as pre-tertiary education; and the use of qualitative and mixed methods.


