Machine-Scored Syntax: Comparison of the CLAN Automatic Scoring Program to Manual Scoring

Jenny A. Roberts; Evelyn P. Altenberg; Madison Hunter

doi:10.1044/2019_lshss-19-00056

Language Speech and Hearing Services in Schools ◽

2020 ◽

Vol 51 (2) ◽

pp. 479-493

Author(s):

Jenny A. Roberts ◽

Evelyn P. Altenberg ◽

Madison Hunter

Keyword(s):

Data Exchange ◽

Child Language ◽

Exchange System ◽

Absolute Point ◽

Language Analysis ◽

Search Patterns ◽

Language Data ◽

Report Accuracy ◽

Point To Point ◽

Automatic Scoring

Purpose: The results of automatic machine scoring of the Index of Productive Syntax from the Computerized Language ANalysis (CLAN) tools of the Child Language Data Exchange System of TalkBank (MacWhinney, 2000) were compared to manual scoring to determine the accuracy of the machine-scored method. Method: Twenty transcripts of 10 children from archival data of the Weismer Corpus from the Child Language Data Exchange System at 30 and 42 months were examined. Measures of absolute point difference and point-to-point accuracy were compared, as well as points erroneously given and missed. Two new measures for evaluating automatic scoring of the Index of Productive Syntax were introduced: Machine Item Accuracy (MIA) and Cascade Failure Rate. These measures further analyze points erroneously given and missed. Differences in total scores, subscale scores, and individual structures were also reported. Results: Mean absolute point difference between machine and hand scoring was 3.65, point-to-point agreement was 72.6%, and MIA was 74.9%. There were large differences in subscales, with Noun Phrase and Verb Phrase subscales generally providing greater accuracy and agreement than Question/Negation and Sentence Structures subscales. There were significantly more erroneous than missed items in machine scoring, attributed to problems of mistagging of elements, imprecise search patterns, and other errors. Cascade failure resulted in an average of 4.65 points lost per transcript. Conclusions: The CLAN program showed relatively inaccurate outcomes in comparison to manual scoring on both traditional and new measures of accuracy. Recommendations for improvement of the program include accounting for second exemplar violations and applying cascaded credit, among other suggestions. It was proposed that research on machine-scored syntax routinely report accuracy measures detailing erroneous and missed scores, including MIA, so that researchers and clinicians are aware of the limitations of a machine-scoring program. Supplemental Material: https://doi.org/10.23641/asha.11984364
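The difference between a total-score comparison and an item-level comparison can be illustrated with a minimal Python sketch. It assumes that each item receives 0, 1, or 2 points, that absolute point difference is the absolute difference between the two total scores, and that point-to-point agreement is the percentage of items given the same score by both methods. These definitions, the item labels, and the scores are illustrative assumptions, not taken from the study. MIA and Cascade Failure Rate are not sketched because the abstract does not define them.

```python
from dataclasses import dataclass


@dataclass
class AgreementSummary:
    absolute_point_difference: int
    point_to_point_agreement: float  # percentage of items scored identically


def compare_scores(manual: dict[str, int], machine: dict[str, int]) -> AgreementSummary:
    """Compare two scorings of the same transcript, item by item."""
    if manual.keys() != machine.keys():
        raise ValueError("both scorings must cover the same items")
    if not manual:
        raise ValueError("at least one item is required")
    # Difference between total scores (assumed definition).
    total_diff = abs(sum(machine.values()) - sum(manual.values()))
    # Share of items on which both methods gave the same score (assumed definition).
    matches = sum(1 for item in manual if manual[item] == machine[item])
    return AgreementSummary(total_diff, 100.0 * matches / len(manual))


# Hypothetical item labels and scores, not data from the study.
manual = {"N1": 2, "N2": 2, "V1": 1, "Q1": 0, "S1": 2}
machine = {"N1": 2, "N2": 1, "V1": 1, "Q1": 1, "S1": 2}
summary = compare_scores(manual, machine)
print(summary)
```

In this made-up case the totals are equal, so the absolute point difference is 0, yet only three of five items agree. This is one reason a total-score comparison alone can hide item-level errors that cancel out.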

The CHILDES Project: Tools for Analyzing Talk: Vol. 1. Transcription format and programs; Vol. 2. The database (3rd ed.). B. MacWhinney. Mahwah, NJ: Erlbaum, 2000. Pp. 366 (Vol. 1); Pp. 418 (Vol. 2).

Applied Psycholinguistics ◽

10.1017/s0142716402222079 ◽

2002 ◽

Vol 23 (2) ◽

pp. 304-306

Author(s):

Diane E. Beals

Keyword(s):

Data Sharing ◽

Data Exchange ◽

State Of The Art ◽

Child Language ◽

The State ◽

Exchange System ◽

Language Analysis ◽

Language Data ◽

User Friendly

Since the late 1980s, the Child Language Data Exchange System (CHILDES) has defined the state of the art of collection, analysis, archiving, and data sharing of transcriptions of children's language. Starting from scratch in 1987, Brian MacWhinney, along with many other leaders in child language, developed highly useful tools for the computerization of transcripts and their analysis. I have used the transcription conventions and analysis programs since 1989 and have seen the system evolve from a simple DOS-based program to one that handles much broader and more complex analyses within more user-friendly Windows and Macintosh platforms. This latest (third) edition of the manual that accompanies the CHILDES system reflects a more stable version of the Codes for the Human Analysis of Transcripts (CHAT) and Computerized Language Analysis (CLAN) programs than prior editions, which felt like works in progress. This version is written as a finished product with procedures and programs that have settled down into stable patterns of operation.

Child Language Data Exchange System

Encyclopedia of Language Development ◽

10.4135/9781483346441.n24 ◽

2014 ◽

Author(s):

Steven Gillis

Keyword(s):

Data Exchange ◽

Child Language ◽

Exchange System ◽

Language Data

Child language data exchange system

Journal of Child Language ◽

10.1017/s0305000900006139 ◽

1984 ◽

Vol 11 (3) ◽

pp. 721-721

Keyword(s):

Data Exchange ◽

Child Language ◽

Exchange System ◽

Language Data

The child language data exchange system

Journal of Child Language ◽

10.1017/s0305000900006449 ◽

1985 ◽

Vol 12 (2) ◽

pp. 271-295 ◽

Cited By ~ 418

Author(s):

Brian MacWhinney ◽

Catherine Snow

Keyword(s):

Data Exchange ◽

Child Language ◽

New Technology ◽

Exchange System ◽

Language Data ◽

Share Data ◽

Recorded Data ◽

Family Setting ◽

Dissemination Of Technology

The study of language acquisition underwent a major revolution in the late 1950s as a result of the dissemination of technology permitting high-quality tape-recording of children in the family setting. This new technology led to major breakthroughs in the quality of both data and theory. The field is now at the threshold of a possible second major breakthrough stimulated by the dissemination of personal computing. Researchers are now able to transcribe tape-recorded data into computer files. With this new medium it is easy to conduct global searches for word combinations across collections of files. It is also possible to enter new codings of the basic text line. Because of the speed and accuracy with which computer files can be copied, it is now much easier to share data between researchers. To foster this sharing of computerized data, a group of child language researchers has established the Child Language Data Exchange System (CHILDES). This article details the formation of the CHILDES, the governance of the system, the nature of the database, the shape of the coding conventions, and the types of computer programs being developed.
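The kind of global search for word combinations described in this abstract can be sketched in a few lines of Python. This is an illustration only, not the CLAN implementation: the function name, the file names, and the utterances are hypothetical, and the transcripts are held as in-memory strings rather than read from disk. The `*CHI:` and `*MOT:` prefixes follow the CHAT convention of marking speaker lines with an asterisk and a speaker code.

```python
import re


def find_combination(transcripts: dict[str, str], words: list[str]) -> list[tuple[str, int, str]]:
    """Return (transcript name, line number, line) for each line containing
    the given words in sequence, separated only by whitespace."""
    pattern = re.compile(
        r"\b" + r"\s+".join(re.escape(w) for w in words) + r"\b",
        re.IGNORECASE,
    )
    hits = []
    for name, text in transcripts.items():
        for line_no, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((name, line_no, line))
    return hits


# Hypothetical transcript contents keyed by hypothetical file names.
transcripts = {
    "sample1.cha": "*CHI:\tmore juice .\n*MOT:\tyou want more juice ?",
    "sample2.cha": "*CHI:\tno juice .",
}
hits = find_combination(transcripts, ["more", "juice"])
for name, line_no, line in hits:
    print(name, line_no, line)
```

Searching line by line keeps each hit tied to a single utterance, which matches the way a transcript is organized into speaker lines.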

Computational Tools for Analysing Talk

Nordic Journal of Linguistics ◽

10.1017/s0332586500002262 ◽

1990 ◽

Vol 13 (2) ◽

pp. 187-199

Author(s):

Kim Plunkett

Keyword(s):

Data Exchange ◽

Child Language ◽

Research Community ◽

Exchange System ◽

Computational Tools ◽

Future Developments ◽

Software Packages ◽

Wide Range ◽

The World ◽

Language Data

The Child Language Data Exchange System (CHILDES) is the largest child language archive in the world. The archive includes a wide range of languages covering both normal and abnormal populations. The database is freely accessible to the research community and the user is supported with guidelines for carrying out transcription work and software packages for the automatic analysis of transcriptions. The article provides a brief overview of the CHAT transcription notation and the CLAN programs that can be used to analyse transcripts written in CHAT format. Current drawbacks of the CHILDES system are discussed and some pointers to future developments highlighted.

Estilos diretivos maternos apresentados a meninos e meninas [Maternal directive styles presented to boys and girls]

Estudos de Psicologia (Natal) ◽

10.1590/s1413-294x2005000200009 ◽

2005 ◽

Vol 10 (2) ◽

pp. 223-230 ◽

Cited By ~ 3

Author(s):

Fabíola de Sousa Braz Aquino ◽

Nádia Maria Ribeiro Salomão

Keyword(s):

Data Exchange ◽

Child Language ◽

Exchange System ◽

Language Data

This study investigated mothers' use of directive utterances, which can serve to direct, control, and maintain the child's attention during interactive exchanges. Directive utterances can serve different functions in interactions, and their use can vary with characteristics such as gender. The study analyzed possible variations in the use of maternal directives as a function of the child's gender. Sixteen mother-child dyads, with children aged 24 to 30 months, took part. The dyads were filmed in a natural environment for 20 minutes. Session transcriptions followed the guidelines of the Child Language Data Exchange System (CHILDES). A Mann-Whitney test revealed variations in the use of maternal directives, with more directives addressed to boys than to girls. The results were discussed in light of the children's linguistic level and the contexts in which the utterances occurred.

Talking about writing: What we can learn from conversations between parents and their young children

Applied Psycholinguistics ◽

10.1017/s0142716409090237 ◽

2009 ◽

Vol 30 (3) ◽

pp. 463-484 ◽

Cited By ~ 14

Author(s):

Sarah Robins ◽

Rebecca Treiman

Keyword(s):

Young Children ◽

Data Exchange ◽

Child Language ◽

Written Language ◽

Exchange System ◽

Symbol Systems ◽

Language Data ◽

The Difference ◽

Types Of Information

In six analyses using the Child Language Data Exchange System known as CHILDES, we explored whether and how parents and their 1.5- to 5-year-old children talk about writing. Parent speech might include information about the similarity between print and speech and about the difference between writing and drawing. Parents could convey similarity between print and speech by using the words "say", "name", and "word" to refer to both spoken and written language. Parents could differentiate writing and drawing by making syntactic and semantic distinctions in their discussion of the two symbol systems. Our results indicate that parent speech includes these types of information. However, young children themselves sometimes confuse writing and drawing in their speech.

The Child Language Data Exchange System: an update

Journal of Child Language ◽

10.1017/s0305000900013866 ◽

1990 ◽

Vol 17 (2) ◽

pp. 457-472 ◽

Cited By ~ 271

Author(s):

Brian MacWhinney ◽

Catherine Snow

Keyword(s):

Data Exchange ◽

Child Language ◽

International System ◽

Organizational Form ◽

Previous Issue ◽

Exchange System ◽

System A ◽

Forthcoming Book ◽

Language Research ◽

Language Data

In a previous issue of this Journal, MacWhinney & Snow (1985) laid out the basic sketch for an international system for exchanging and analysing child language transcript data. This system – the Child Language Data Exchange System (CHILDES) – has developed three major tools for child language research: (1) the CHILDES database of transcripts, (2) the CHAT system for transcribing and coding data, and (3) the CLAN programs for analysing CHAT files. Here we sketch out the current shape of these three major tools and the organizational form of the CHILDES system. A forthcoming book (MacWhinney, in press) documents these tools in detail.

The wheat and the chaff: or four confusions regarding CHILDES

Journal of Child Language ◽

10.1017/s0305000900011491 ◽

1992 ◽

Vol 19 (2) ◽

pp. 459-471 ◽

Cited By ~ 4

Author(s):

Brian MacWhinney ◽

Catherine Snow

Keyword(s):

Data Analysis ◽

Data Exchange ◽

Child Language ◽

Automatic Analysis ◽

Exchange System ◽

Spontaneous Language ◽

Language Data ◽

Rigid Set

Edwards (1992) presents a set of examples from the Child Language Data Exchange System (CHILDES) as prototypes of bad transcription practice. Her discussion is based upon four basic confusions. First, Edwards confuses old and discarded versions of CHAT with current CHAT. Second, she confuses the relation between CHAT standards with the implementation of these standards during the process of reformatting older corpora. Third, she confuses transcription for automatic analysis with transcription for documentation. Fourth, she confuses the CHAT guidelines with the larger CHILDES system. We argue that these confusions have misled Edwards into developing an overly rigid set of principles for data analysis which, if followed literally, could choke off progress in the analysis of spontaneous language samples.
