Machine-Scored Syntax: Comparison of the CLAN Automatic Scoring Program to Manual Scoring

2020 ◽  
Vol 51 (2) ◽  
pp. 479-493
Author(s):  
Jenny A. Roberts ◽  
Evelyn P. Altenberg ◽  
Madison Hunter

Purpose: The results of automatic machine scoring of the Index of Productive Syntax from the Computerized Language ANalysis (CLAN) tools of the Child Language Data Exchange System of TalkBank (MacWhinney, 2000) were compared to manual scoring to determine the accuracy of the machine-scored method. Method: Twenty transcripts of 10 children from archival data of the Weismer Corpus from the Child Language Data Exchange System at 30 and 42 months were examined. Measures of absolute point difference and point-to-point accuracy were compared, as well as points erroneously given and missed. Two new measures for evaluating automatic scoring of the Index of Productive Syntax were introduced: Machine Item Accuracy (MIA) and Cascade Failure Rate; these measures further analyze points erroneously given and missed. Differences in total scores, subscale scores, and individual structures were also reported. Results: Mean absolute point difference between machine and hand scoring was 3.65, point-to-point agreement was 72.6%, and MIA was 74.9%. There were large differences in subscales, with Noun Phrase and Verb Phrase subscales generally providing greater accuracy and agreement than Question/Negation and Sentence Structures subscales. There were significantly more erroneous than missed items in machine scoring, attributed to problems of mistagging of elements, imprecise search patterns, and other errors. Cascade failure resulted in an average of 4.65 points lost per transcript. Conclusions: The CLAN program showed relatively inaccurate outcomes in comparison to manual scoring on both traditional and new measures of accuracy. Recommendations for improvement of the program include accounting for second exemplar violations and applying cascaded credit, among other suggestions. It was proposed that research on machine-scored syntax routinely report accuracy measures detailing erroneous and missed scores, including MIA, so that researchers and clinicians are aware of the limitations of a machine-scoring program. Supplemental Material: https://doi.org/10.23641/asha.11984364
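The two traditional measures named in the abstract, absolute point difference and point-to-point agreement, can be illustrated with a minimal sketch. The abstract does not give the study's exact formulas, so this assumes the common definitions: the mean of the absolute differences between machine and manual total scores, and the percentage of items on which both scorers assign the same score. The function names and all scores below are hypothetical and are not taken from the study. MIA and Cascade Failure Rate are not sketched because the abstract does not define them.

```python
from typing import Sequence


def mean_absolute_point_difference(
    machine_totals: Sequence[int], manual_totals: Sequence[int]
) -> float:
    """Average of |machine total - manual total| across transcripts."""
    if len(machine_totals) != len(manual_totals) or not machine_totals:
        raise ValueError("need two non-empty sequences of equal length")
    diffs = [abs(m - h) for m, h in zip(machine_totals, manual_totals)]
    return sum(diffs) / len(diffs)


def point_to_point_agreement(
    machine_items: Sequence[int], manual_items: Sequence[int]
) -> float:
    """Percentage of items on which both scorers gave the same score."""
    if len(machine_items) != len(manual_items) or not machine_items:
        raise ValueError("need two non-empty sequences of equal length")
    matches = sum(1 for m, h in zip(machine_items, manual_items) if m == h)
    return 100.0 * matches / len(machine_items)


# Hypothetical total scores for three transcripts (not data from the study).
machine_totals = [60, 72, 55]
manual_totals = [63, 70, 55]

# Hypothetical per-item scores (0, 1, or 2 points) for one transcript.
machine_items = [2, 1, 0, 2, 2, 0, 1, 2]
manual_items = [2, 2, 0, 2, 1, 0, 1, 2]

if __name__ == "__main__":
    print(mean_absolute_point_difference(machine_totals, manual_totals))
    print(point_to_point_agreement(machine_items, manual_items))
```

Under these assumed definitions, two scorers can reach similar totals while disagreeing on many individual items, which is one reason to report an item-level measure alongside the total-score difference.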

2002 ◽  
Vol 23 (2) ◽  
pp. 304-306
Author(s):  
Diane E. Beals

Since the late 1980s, the Child Language Data Exchange System (CHILDES) has defined the state of the art of collection, analysis, archiving, and data sharing of transcriptions of children's language. Starting from scratch in 1987, Brian MacWhinney, along with many other leaders in child language, developed highly useful tools for the computerization of transcripts and their analysis. I have used the transcription conventions and analysis programs since 1989 and have seen the system evolve from a simple DOS-based program to one that handles much broader and more complex analyses within more user-friendly Windows and Macintosh platforms. This latest (third) edition of the manual that accompanies the CHILDES system reflects a more stable version of the Conventions for Human Analysis of Transcripts (CHAT) and Child Language Analysis (CLAN) programs than prior editions, which felt like works in progress. This version is written as a finished product with procedures and programs that have settled down into stable patterns of operation.


1985 ◽  
Vol 12 (2) ◽  
pp. 271-295 ◽  
Author(s):  
Brian MacWhinney ◽  
Catherine Snow

The study of language acquisition underwent a major revolution in the late 1950s as a result of the dissemination of technology permitting high-quality tape-recording of children in the family setting. This new technology led to major breakthroughs in the quality of both data and theory. The field is now at the threshold of a possible second major breakthrough stimulated by the dissemination of personal computing. Researchers are now able to transcribe tape-recorded data into computer files. With this new medium it is easy to conduct global searches for word combinations across collections of files. It is also possible to enter new codings of the basic text line. Because of the speed and accuracy with which computer files can be copied, it is now much easier to share data between researchers. To foster this sharing of computerized data, a group of child language researchers has established the Child Language Data Exchange System (CHILDES). This article details the formation of the CHILDES, the governance of the system, the nature of the database, the shape of the coding conventions, and the types of computer programs being developed.


1990 ◽  
Vol 13 (2) ◽  
pp. 187-199
Author(s):  
Kim Plunkett

The Child Language Data Exchange System (CHILDES) is the largest child language archive in the world. The archive includes a wide range of languages covering both normal and abnormal populations. The database is freely accessible to the research community and the user is supported with guidelines for carrying out transcription work and software packages for the automatic analysis of transcriptions. The article provides a brief overview of the CHAT transcription notation and the CLAN programs that can be used to analyse transcripts written in CHAT format. Current drawbacks of the CHILDES system are discussed and some pointers to future developments are highlighted.


2005 ◽  
Vol 10 (2) ◽  
pp. 223-230 ◽  
Author(s):  
Fabíola de Sousa Braz Aquino ◽  
Nádia Maria Ribeiro Salomão

The present study investigated the use of maternal directive utterances, which can serve to direct, control, and maintain the child's attention in interactive exchanges. Directive utterances may serve different functions in interactions, and their use may vary with characteristics such as gender. This study analyzed possible variations in the use of maternal directives as a function of the child's gender. Sixteen mother-child dyads, with children aged 24-30 months, participated. The dyads were filmed in a natural setting for 20 minutes. Transcription of the sessions followed the guidelines of the Child Language Data Exchange System (CHILDES) computational system. The Mann-Whitney test revealed variations in the use of maternal directives, with more directives addressed to boys than to girls. The results were discussed in light of the children's linguistic level and the contexts in which the utterances occurred.


2009 ◽  
Vol 30 (3) ◽  
pp. 463-484 ◽  
Author(s):  
Sarah Robins ◽  
Rebecca Treiman

In six analyses using the Child Language Data Exchange System known as CHILDES, we explored whether and how parents and their 1.5- to 5-year-old children talk about writing. Parent speech might include information about the similarity between print and speech and about the difference between writing and drawing. Parents could convey similarity between print and speech by using the words "say," "name," and "word" to refer to both spoken and written language. Parents could differentiate writing and drawing by making syntactic and semantic distinctions in their discussion of the two symbol systems. Our results indicate that parent speech includes these types of information. However, young children themselves sometimes confuse writing and drawing in their speech.


1990 ◽  
Vol 17 (2) ◽  
pp. 457-472 ◽  
Author(s):  
Brian MacWhinney ◽  
Catherine Snow

In a previous issue of this Journal, MacWhinney & Snow (1985) laid out the basic sketch for an international system for exchanging and analysing child language transcript data. This system – the Child Language Data Exchange System (CHILDES) – has developed three major tools for child language research: (1) the CHILDES database of transcripts, (2) the CHAT system for transcribing and coding data, and (3) the CLAN programs for analysing CHAT files. Here we sketch out the current shape of these three major tools and the organizational form of the CHILDES system. A forthcoming book (MacWhinney, in press) documents these tools in detail.


1992 ◽  
Vol 19 (2) ◽  
pp. 459-471 ◽  
Author(s):  
Brian MacWhinney ◽  
Catherine Snow

Edwards (1992) presents a set of examples from the Child Language Data Exchange System (CHILDES) as prototypes of bad transcription practice. Her discussion is based upon four basic confusions. First, Edwards confuses old and discarded versions of CHAT with current CHAT. Second, she confuses the CHAT standards with the implementation of these standards during the process of reformatting older corpora. Third, she confuses transcription for automatic analysis with transcription for documentation. Fourth, she confuses the CHAT guidelines with the larger CHILDES system. We argue that these confusions have misled Edwards into developing an overly rigid set of principles for data analysis which, if followed literally, could choke off progress in the analysis of spontaneous language samples.

