Inter-annotator agreement in spoken language annotation: Applying uα-family coefficients to discourse segmentation

2021 ◽  
Vol 25 (2) ◽  
pp. 478-506
Author(s):  
Salvador Pons Bordería ◽  
Elena Pascual Aliaga

As databases make Corpus Linguistics a common tool for most linguists, corpus annotation becomes an increasingly important process. Corpus users need not only raw data but also annotated data, submitted to tagging or parsing processes through annotation protocols. One problem with corpus annotation lies in its reliability, that is, in the probability that its results can be replicated by independent researchers. Inter-annotator agreement (IAA) is the process which evaluates the probability that, applying the same protocol, different annotators reach similar results. To measure agreement, different statistical metrics are used. This study applies IAA for the first time to the Valencia Español Coloquial (Val.Es.Co.) discourse segmentation model, designed for segmenting and labelling spoken language into discourse units. Whereas most IAA studies merely label a set of pre-defined units, this study applies IAA to the Val.Es.Co. protocol, which involves a more complex two-fold process: first, the speech continuum needs to be divided into units; second, the units have to be labelled. Krippendorff's uα-family statistical metrics (Krippendorff et al. 2016) allow measuring IAA in both segmentation and labelling tasks. Three expert annotators segmented a spontaneous conversation into subacts, the minimal discursive unit of the Val.Es.Co. model, and labelled the resulting units according to a set of 10 subact categories. Krippendorff's uα coefficients were applied in several rounds to elucidate whether the inclusion of a larger number of categories and their distinction had an impact on the agreement results. The conclusions show high levels of IAA, especially in the annotation of procedural subact categories, where results reach coefficients over 0.8. This study validates the Val.Es.Co. model as an optimal method to fully analyze a conversation into pragmatically-based discourse units.

2015 ◽  
pp. 113-122
Author(s):  
Violetta Koseska-Toszewa ◽  
Joanna Satoła-Staśkowiak ◽  
Wojciech Sosnowski

From the Problems of Dictionaries and Multi-lingual Corpora

The article describes the work on a number of dictionaries being developed by the Corpus Linguistics and Semantics Group of the Institute of Slavic Studies PAS. They include “Contemporary Bulgarian-Polish Dictionary”, “Bulgarian-Polish Online Dictionary” and “Russian-Bulgarian-Polish Dictionary”. The dictionaries differ in their numbers of entries, as well as in the degree of their connection with the parallel corpora being developed under the “Clarin” project. All the discussed dictionaries are similar in their use of traditional syntactic classifiers and of semantic classifiers, the latter introduced for the first time in lexicographical practice. Thanks to the “Polish-Bulgarian-Russian Corpus”, the Group has managed to verify the results of contrasting Polish and Bulgarian in the light of scope-based logical quantification. Thanks to the Russian material added to the trilingual corpus, the researchers have managed to confirm that, from the viewpoint of “incomplete quantification”, Russian and Polish (synthetic languages) behave similarly and are opposed to the analytic Bulgarian.


In Language Assessment Across Modalities: Paired-Papers on Signed and Spoken Language Assessment, volume editors Tobias Haug, Wolfgang Mann, and Ute Knoch bring together, for the first time, researchers, clinicians, and practitioners from two different fields: signed language and spoken language. The volume examines theoretical and practical issues related to 12 topics ranging from test development and language assessment of bi-/multilingual learners to construct issues of second-language assessment (including the Common European Framework of Reference [CEFR]) and language assessment literacy in second-language assessment contexts. Each topic is addressed separately for spoken and signed language by experts from the relevant field. This is followed by a joint discussion in which the chapter authors highlight key issues in each field and their possible implications for the other field. What makes this volume unique is that it is the first of its kind to bring experts from signed and spoken language assessment to the same table. The dialogues that result from this collaboration not only help to establish a shared appreciation and understanding of challenges experienced in the new field of signed language assessment but also breathe new life into and provide a new perspective on some of the issues that have occupied the field of spoken language assessment for decades. It is hoped that this will open the door to new and exciting cross-disciplinary collaborations.


1972 ◽  
Vol 18 (9) ◽  
pp. 1013-1018
Author(s):  
M A Evenson ◽  
M A Olson

A high-speed, high-performance, continuous-flow analyzer is described that operates at two to three times the usual analysis rate without necessitating corrections of the raw data and with no decrease in accuracy or precision. At faster speeds (180-300 samples/h) inductive sample interaction (%Ii), opposite in direction to carry-over, is for the first time quantitatively measured. A correction equation for %Ii was developed, and when it is applied to raw data, the accuracy of the results is significantly improved. Operating characteristics of the high-speed analyzer are described and the desirability of automatic computer corrections is discussed for the high-speed system.


2013 ◽  
Vol 2 (2) ◽  
pp. 224-241
Author(s):  
Yevgen Matusevych ◽  
Ad Backus ◽  
Martin Reynaert

This article is about the type of language that is offered to learners in textbooks, using the example of Russian. Many modern textbooks of Russian as a foreign language aim at efficient development of oral communication skills. However, some expressions used in the textbooks are not typical of everyday language. We claim that textbooks’ content should be reassessed based on actual language use, following theoretical and methodological models of cognitive and corpus linguistics. We extracted language patterns from three textbooks, and compared them with alternative patterns that carry similar meaning by (1) calculating the frequency of occurrence of each pattern in a corpus of spoken language, and (2) using Russian native speakers’ intuitions about what is more common. The results demonstrated that better alternatives could be found for 39 to 53 percent of all the recurrent patterns in the textbooks. We further investigated the typical shortcomings of the extracted patterns.


2020 ◽  
Vol 8 (2) ◽  
pp. 464-473
Author(s):  
Lipikajyoti Dowarah

Purpose of the study: The research aims to study the implementation of the Mid-Day Meal Scheme in Government primary schools of the Tinsukia district of Assam. Methodology: For the purpose of the study, a descriptive methodology is used. Data are randomly collected through a self-structured questionnaire. Tables and figures are used to analyze the collected raw data. Main Findings: Results showed that the Mid-Day Meal Scheme plays an important role in reducing classroom hunger among students. However, the functioning of the Mid-Day Meal Scheme in many surveyed schools does not follow the government's guidelines for the scheme, particularly with respect to the timely supply of grains, storage facilities for grains, and plates for distributing meals to the children. Applications of this study: This research can be used by policymakers, teachers, and parents, and also in social welfare activities. Novelty/Originality of this study: For the first time, implementation of the Mid-Day Meal Scheme has been studied in the Tinsukia district of Assam.


2021 ◽  
Vol 61 (5) ◽  
pp. 13-25

Consumer decision making, as an important process in the marketing sphere, has been discussed in detail, but so far researchers have as a rule not focused on how a purchase happens for the very first time. The current text is an attempt to develop the foundations and a conceptual framework of the first purchase in marketing and to outline its significance for current or future consumption, especially for fast-moving consumer goods (FMCG). Together with a review of the extent to which the first purchase is considered and interpreted in the academic tradition in the field (and in practice), an attempt has been made to outline it as a phenomenon, since it can have significant benefits for better understanding consumer behavior and for the further improvement of marketing communications. In this regard, it can be assumed that the first purchase is the initial step of acquiring consumer experience, which determines whether the product will continue to be purchased or not. Of course, all this is largely valid for the b2c (business to consumer) markets, that is, for products for individual and household consumption; for the b2b (business to business) markets, the particularities may differ significantly and need to be the subject of additional research efforts.


2017 ◽  
Vol 8 (2) ◽  
pp. 149-166
Author(s):  
Ludivine Crible ◽  
Maria-Josep Cuenca

It is generally acknowledged that discourse markers are used differently in speech and writing, yet many general descriptions and most annotation frameworks are written-based, thus partially unfit to be applied in spoken corpora. This paper identifies the major distinctive features of discourse markers in spoken language, which can be associated with problems related to their scope and structure, their meaning and their tendency to co-occur. The description is based on authentic examples and is followed by methodological recommendations on how to deal with these phenomena in more exhaustive, speech-friendly annotation models.


2018 ◽  
Vol 2 (1) ◽  
pp. 72-91
Author(s):  
Jonas Bens

In The Prosecutor v. Ahmad Al Faqi Al Mahdi, the International Criminal Court (ICC) tried the destruction of UNESCO World Heritage sites as a war crime for the first time. In this case, the value of things in relation to the value of persons became the central issue. Based on courtroom ethnography conducted during the proceedings and informed by affect and emotion research, this article identifies the rhetorical practice of sentimentalising persons and things as an important process of legal meaning making. Through sentimentalising, all parties rhetorically produce normative arrangements of bodies by way of emotionally differentiating the relevant persons, things and other entities from and affectively relating them to each other. Sentimentalising provides an affective-emotional frame in which to determine the degree of guilt and innocence, justice and injustice.


2016 ◽  
Vol 24 (4) ◽  
pp. 568-591
Author(s):  
Lars Engwall ◽  
Tina Hedmo

This paper focuses on the processes through which scientific fields are organized over time. It is argued that new approaches in scientific work are hampered by authority structures within national systems for research and by established approaches within disciplines, but that these obstacles can be overcome by means of external funding, particularly through new funding sources, as well as through the international developments of an innovation. As far as the latter are concerned, they are expected to lead first to informal collaboration among scholars. With the passage of time, this informal collaboration becomes more and more formalized. In order to analyse such processes, the paper presents a model with three phases labelled creating, gathering and communicating. This model is then used in an empirical study of corpus linguistics, i.e. the systematic analysis of well-defined populations of written and/or spoken language material. The paper shows how corpus linguistics was developed by scientific innovators who were initially questioned. With the passage of time they created a number of international organizations, which have eventually become more and more formalized, many of them publishing their own journals. In this way the paper demonstrates the significance of organizing for the development of scientific fields.


2009 ◽  
Vol 10 (2) ◽  
pp. 286-309
Author(s):  
Dawn Archer ◽  
Jonathan Culpeper

In this paper, we argue that there is another approach to the study of historical pragmatics beyond those explicitly mentioned in Jacobs and Jucker (1995). We label this approach “sociophilology”. Moreover, we demonstrate how this approach can be effectively pursued by combining two corpus linguistics techniques: corpus annotation and “keyness” analysis. Specifically, we draw from the Sociopragmatic Corpus (1640–1760), an annotated subsection of comedy plays and drama proceedings taken from the Corpus of Dialogues 1560–1760, as a means of identifying the statistically-based style markers, or key items, associated with a number of social role dyads (including examiner to examinee and master/mistress to servant). We will show how such an approach might be used to uncover differential distributions of personal pronouns, interjections, imperative verbs, politeness formulae, etc., and how, by combining qualitative analysis with quantitative analysis, one can scrutinise such material for pragmatic import.

