Large-scale study of speech acts' development using automatic labelling

Studies of children's language use in the wild (e.g., in the context of child-caregiver social interaction) have been slowed by the time- and resource-consuming task of hand-annotating utterances for communicative intents (speech acts). Existing studies have typically focused on rather small samples of children, raising the question of how their findings generalize both to larger, more representative populations and to a richer set of interaction contexts. Here we propose a simple automatic model for speech act labeling in early childhood based on the INCA-A coding scheme (Ninio et al., 1994). After validating the model against ground-truth labels, we automatically annotated the entire English-language portion of the CHILDES corpus. The major theoretical result was that earlier findings generalize quite well at a large scale. Our model will be shared with the community so that researchers can apply it to their own data to investigate various questions related to the development of language use.

Large-scale study of speech acts' development in early childhood

10.31234/osf.io/xs8k6 ◽

2021 ◽

Author(s):

Mitja Nikolaus ◽

Eliot Maes ◽

Jeremy Auguste ◽

Laurent Prévot ◽

Abdellah Fourtassi

Keyword(s):

Early Childhood ◽

Speech Acts ◽

Large Scale ◽

English Language ◽

Language Use ◽

Ground Truth ◽

Small Samples ◽

Language Data ◽

In The Wild ◽

Child Caregiver

Studies of children's language use in the wild (e.g., in the context of child-caregiver social interaction) have been slowed by the time- and resource-consuming task of hand-annotating utterances for communicative intents (speech acts). Existing studies have typically focused on rather small samples of children, raising the question of how their findings generalize both to larger, more representative populations and to a richer set of interaction contexts. Here we propose a simple automatic model for speech act labeling in early childhood based on the INCA-A coding scheme (Ninio, Snow, Pan, & Rollins, 1994). After validating the model against ground-truth labels, we automatically annotated the entire English-language portion of the CHILDES corpus. The major theoretical result was that earlier findings generalize quite well at a large scale. Further, we introduced two complementary measures of the age of acquisition of speech acts, which allow us to rank speech acts by their order of emergence in production and comprehension. Our model will be shared with the community so that researchers can apply it to their own data to investigate various questions related to language use in both typical and atypical populations of children.
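
The abstract does not say which agreement statistic was used when validating the model against ground-truth labels. As a minimal sketch, assuming one common choice for categorical annotation, Cohen's kappa between automatic and hand-coded labels can be computed as below. The label strings are hypothetical placeholders, not actual INCA-A codes, and the function is illustrative rather than the authors' evaluation code.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of utterances given the same label by both.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's marginal label distribution.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    p_expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if p_expected == 1.0:
        # Both annotators used one identical label throughout; kappa is
        # undefined here, so treat it as perfect agreement by convention.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical hand-coded vs. automatic labels for six utterances.
hand = ["question", "statement", "statement", "request", "question", "statement"]
auto = ["question", "statement", "request", "request", "question", "statement"]
print(round(cohens_kappa(hand, auto), 2))  # 0.75
```

Kappa is preferred over raw accuracy in this kind of check because it discounts agreement expected by chance, which matters when a few speech act categories dominate the data.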

Comparative Analysis of Supervised and Unsupervised Approaches Applied to Large-Scale “In The Wild” Face Verification

Symmetry ◽

10.3390/sym12111832 ◽

2020 ◽

Vol 12 (11) ◽

pp. 1832

Author(s):

Tomasz Hachaj ◽

Patryk Mazurek

Keyword(s):

Pattern Recognition ◽

Large Scale ◽

Statistical Significance ◽

Ground Truth ◽

Classification Algorithm ◽

Adjusted Rand Index ◽

Face Verification ◽

Data Set ◽

In The Wild ◽

Unsupervised Approaches

Deep learning-based feature extraction methods and transfer learning have become common approaches in the field of pattern recognition. Deep convolutional neural networks trained using triplet-based loss functions allow for the generation of face embeddings, which can be directly applied to face verification and clustering. Knowledge about the ground truth of face identities might improve the effectiveness of the final classification algorithm; however, it is also possible to use ground-truth clusters previously discovered using an unsupervised approach. The aim of this paper is to evaluate the potential improvement in the classification results of state-of-the-art supervised classification methods trained with and without ground-truth knowledge. In this study, we use two sufficiently large data sets containing more than 200,000 “taken in the wild” images with varying resolution, visual quality, and face pose, which, in our opinion, guarantee the statistical significance of the results. We examine several clustering and supervised pattern recognition algorithms and find that knowledge about the ground truth has a very small influence on the Fowlkes–Mallows score (FMS) of the classification algorithm. In the case of the classification algorithm that obtained the highest accuracy in our experiment, the FMS improved by only 5.3% (from 0.749 to 0.791) in the first data set and by 6.6% (from 0.652 to 0.718) in the second data set. Our results show that, except in highly secure systems in which face verification is a key component, face identities discovered by unsupervised approaches can be safely used for training supervised classifiers. We also found that the Silhouette Coefficient (SC) of unsupervised clustering is positively correlated with the Adjusted Rand Index, V-measure score, and Fowlkes–Mallows score, so the SC can be used as an indicator of clustering performance when the ground truth of face identities is not known.
All of these conclusions are important for large-scale face verification problems, because skipping the verification of people’s identities before supervised training saves considerable time and resources.
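
The Fowlkes–Mallows score used above is a pair-counting measure: FMS = TP / sqrt((TP + FP)(TP + FN)), where TP counts pairs of images placed in the same cluster by both the ground truth and the predicted labeling. scikit-learn provides this as `sklearn.metrics.fowlkes_mallows_score`; the dependency-free sketch below implements the same standard definition for illustration and is not the authors' evaluation code. The example labelings are hypothetical.

```python
from collections import Counter
from math import comb, sqrt


def fowlkes_mallows(true_labels, pred_labels):
    """Fowlkes-Mallows score between two flat clusterings (pair counting)."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("labelings must have the same length")
    # TP: pairs that share a cluster in both labelings (same contingency cell).
    tp = sum(comb(c, 2) for c in Counter(zip(true_labels, pred_labels)).values())
    # TP + FN: pairs that share a cluster in the ground truth.
    true_pairs = sum(comb(c, 2) for c in Counter(true_labels).values())
    # TP + FP: pairs that share a cluster in the predicted labeling.
    pred_pairs = sum(comb(c, 2) for c in Counter(pred_labels).values())
    if true_pairs == 0 or pred_pairs == 0:
        return 0.0
    return tp / sqrt(true_pairs * pred_pairs)


# Hypothetical identities for six face images vs. clusters found without labels.
truth = [0, 0, 0, 1, 1, 1]
found = [0, 0, 1, 1, 2, 2]
print(round(fowlkes_mallows(truth, found), 3))  # 0.471
```

Because only pair co-membership is counted, the score is invariant to how cluster IDs are named, which is what allows identities discovered by an unsupervised method to be compared directly with ground-truth identities.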

Text-linguistic approaches to register variation

10.1075/rs.18007.bib ◽

2019 ◽

Vol 1 (1) ◽

pp. 42-75 ◽

Cited By ~ 3

Author(s):

Douglas Biber

Keyword(s):

Large Scale ◽

English Language ◽

Language Use ◽

Language Variation ◽

Patterns Of Use ◽

Comprehensive Framework ◽

Occurrence Patterns ◽

Linguistic Approach ◽

Learner Language

Douglas Biber, Regents’ Professor of Applied Linguistics at Northern Arizona University, authors this article exploring the connections between register and a text-linguistic approach to language variation. He has spent the last 30 years pursuing a research program that explores the inherent link between register and language use, including at the phraseological, grammatical, and lexico-grammatical levels. His seminal book Variation across Speech and Writing (1988, Cambridge University Press) launched multi-dimensional (MD) analysis, a comprehensive framework and methodology for the large-scale study of register variation. This approach was innovative in taking a text-linguistic approach to characterize language use across situations of use through the quantitative and functional analysis of linguistic co-occurrence patterns and underlying dimensions of language use. MD analysis is now used widely to study register variation over time, in general and specialized registers, in learner language, and across a range of languages. In 1999, the Longman Grammar of Spoken and Written English (Biber et al.) became the first comprehensive descriptive reference book to systematically consider register variation in describing the grammatical and lexico-grammatical patterns of use in English. Douglas Biber’s quantitative linguistic research has consistently demonstrated the importance of register as a predictor of language variation. In his own words, “register always matters” (Gray 2013: 360, Interview with Douglas Biber, English Language & Linguistics).

Pragmatics and Language Teaching

Journal of Language Teaching and Research ◽

10.17507/jltr.1105.21 ◽

2020 ◽

Vol 11 (5) ◽

pp. 841

Author(s):

Raifu O. Farinde ◽

Wasiu A. Oyedokun-Alli

Keyword(s):

Speech Acts ◽

English Language ◽

Language Use ◽

Language Teaching ◽

Gricean Pragmatics

The main goal of language teaching is that, at the end of the period of learning, learners should be able to communicate in the language effectively. The main source of language is language use, so students must be given plenty of opportunities to use the language. This is where the principles of pragmatics come into language teaching: pragmatics provides ample opportunities for students to learn English communicatively and practically. In this study, I shall focus particularly on the application of pragmatics to language teaching, with emphasis on Gricean pragmatics and Searle’s speech acts. The question of why pragmatics should be given a more prominent place in the language teaching syllabus is also addressed.

Networked Hospitality and Placemaking in the Sharing Economy

Revista Turismo em Análise ◽

10.11606/issn.1984-4867.v30i3p516-538 ◽

2019 ◽

Vol 30 (3) ◽

pp. 516-538

Author(s):

Lénia Marques ◽

Nigel Williams

Keyword(s):

Text Analysis ◽

Large Scale ◽

English Language ◽

Language Use ◽

Sharing Economy ◽

Quantitative Approach ◽

Narrative Construction ◽

Main Aspect ◽

Reputational Capital ◽

Similarities And Differences

This article investigates the similarities and differences in the tangible and intangible elements (factors and language use) contributing to placemaking in English-language Airbnb reviews in Paris (59,057 reviews), Barcelona (19,291 reviews), and London (30,403 reviews). This paper provides new insights into the narrative construction of reputational capital, which is connected to placemaking strategies. A combined quantitative approach using large-scale text analysis enabled the analysis of review content and style, and patterns in word usage were identified. Findings suggest that tangible and intangible elements work together in the discourse, contributing to a place-narrative built on the host’s reputational capital. The host-guest interaction is the main aspect of the reviews, followed by the importance of transport and local amenities. Cities have different profiles in the composition of the word clusters, which indicates differences in the guests’ perceived experience.

An historical analysis of species references in American English

Corpora ◽

10.3366/cor.2019.0177 ◽

2019 ◽

Vol 14 (3) ◽

pp. 327-349

Author(s):

Craig Frayne

Keyword(s):

Environmental Change ◽

Sentiment Analysis ◽

Quantitative Methods ◽

English Language ◽

Language Use ◽

American English ◽

Historical Analysis ◽

The Past ◽

Corpus Studies ◽

Google Books

This study uses the two largest available American English corpora, Google Books and the Corpus of Historical American English (COHA), to investigate relations between ecology and language. The paper introduces ecolinguistics as a promising theme for corpus research. While some previous ecolinguistic research has used corpus approaches, there is a case to be made for quantitative methods that draw on larger datasets. Building on other corpus studies that have made connections between language use and environmental change, this paper investigates whether, and if so how, linguistic references to other species have changed over the past two centuries. The methodology consists of two main parts: an examination of the frequency of common names of species, followed by aspect-level sentiment analysis of concordance lines. Results point to both opportunities and challenges associated with applying corpus methods to ecolinguistic research.
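
The abstract does not state how species-name frequencies were normalized. Since diachronic corpora usually contain different amounts of text per period, a common convention (assumed here, not taken from the paper) is to compare occurrences per million words rather than raw hit counts. The counts and section sizes below are hypothetical.

```python
def per_million(hits, corpus_size):
    """Normalize a raw hit count to occurrences per million words."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return hits * 1_000_000 / corpus_size


# Hypothetical counts for one species name in two decades of a corpus
# whose decade sections differ in size.
decades = {
    "1850s": {"hits": 120, "words": 16_000_000},
    "1990s": {"hits": 90, "words": 30_000_000},
}

for decade, d in decades.items():
    print(decade, per_million(d["hits"], d["words"]))
# 1850s 7.5
# 1990s 3.0
```

In this made-up case the raw count drops by a quarter, but the normalized frequency drops by 60%, which is why raw counts alone can understate or overstate change across periods.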

THEORY OF SPEECH ACTS AND COMMUNICATIVE COMPETENCES: PRAGMATIC ANALYSIS OF THE ILLOCUTIVE ACT “EXPRESSION OF REFUSALS”

International journal of word art ◽

10.26739/2181-9297-2020-5-13 ◽

2020 ◽

Vol 5 (3) ◽

pp. 77-81

Author(s):

Sayyora Azimova ◽

Keyword(s):

Speech Acts ◽

English Language ◽

Speech Act ◽

Pragmatic Analysis ◽

Pragmatic Interpretation

This article is devoted to the pragmatic interpretation of the illocutionary act of the speech act “expression of refusals”. The article discusses different ways of expressing denial. It is intended not only for English-language professionals but also for use in handling aggressive conflicts and their pragmatic resolution, which naturally arise in communication in all other languages as well.

THEORY OF SPEECH ACTS AND COMMUNICATIVE COMPETENCES: PRAGMATIC ANALYSIS OF THE ILLOCUTIVE ACT “EXPRESSION OF REFUSALS”

International journal of word art ◽

10.26739/2181-9297-2020-6-31 ◽

2020 ◽

Vol 6 (3) ◽

pp. 227-231

Author(s):

Sayyora Azimova ◽

Keyword(s):

Speech Acts ◽

English Language ◽

Speech Act ◽

Pragmatic Analysis ◽

Pragmatic Interpretation

This article is devoted to the pragmatic interpretation of the illocutionary act of the speech act “expression of refusals”. The article discusses different ways of expressing denial. It is intended not only for English-language professionals but also for use in handling aggressive conflicts and their pragmatic resolution, which naturally arise in communication in all other languages as well.

First grammar books in the Habsburg Monarchy: individual initiative and regulatory interference by the state (1760s–1770s)

A day in the calendar. Celebrations and memorial days as an instrument of national consolidation in Central, Eastern and South-Eastern Europe from the nineteenth to the twenty-first century - Central-European Studies ◽

10.31168/2619-0877.2019.2.6 ◽

2020 ◽

Vol 2019 (2 (11)) ◽

pp. 137-157

Author(s):

Olga V. Khavanova ◽

Keyword(s):

Eighteenth Century ◽

State Policy ◽

Large Scale ◽

Language Use ◽

Linguistic Diversity ◽

Mother Tongue ◽

German Language ◽

Habsburg Monarchy ◽

Private Initiative ◽

The One

The second half of the eighteenth century in the lands under the sceptre of the House of Austria was a period of development of a language policy addressing the ethno-linguistic diversity of the monarchy’s subjects. On the one hand, the sphere of use of the German language was becoming wider, embracing more and more segments of administration, education, and culture. On the other hand, the authorities were well aware that communication in the languages and vernaculars of the nationalities living in the Austrian Monarchy was one of the principal instruments for spreading decrees and announcements from the central and local authorities to the less-educated strata of the population. Consequently, a large-scale reform of primary education was launched, aimed at making the whole population literate regardless of social status, nationality (mother tongue), or confession. In parallel with the centrally coordinated state policy on education and language use, subjects (both language experts and amateur polyglots) joined the process of writing grammar books intended to ease communication between the different nationalities of the Habsburg lands. This article considers some examples of such editions, with primary attention given to the correlation between private initiative and governmental policies, the mechanisms for verifying the textbooks to be published, their content, and their potential readers. It demonstrates that for grammar-book authors it was very important to be integrated into the patronage networks at court and in administrative bodies, and it stresses that the Vienna court controlled the selection and financing of grammar books for publication depending on their quality and their ability to satisfy the aims and goals of state policy.

Model and Method for Contributor’s Quality Assessment in Community Image Tagging Systems

Information and Control Systems ◽

10.31799/1684-8853-2018-4-45-51 ◽

2018 ◽

pp. 45-51

Author(s):

A. V. Ponomarev

Keyword(s):

Large Scale ◽

Wide Spectrum ◽

Preference Relation ◽

Pairwise Comparison ◽

Ground Truth ◽

Comparison Method ◽

Characteristic Matrix ◽

Image Tagging ◽

Proposed Model

Introduction: Large-scale human-computer systems that involve people of varying skill and motivation in information processing are currently used in a wide spectrum of applications. An acute problem in such systems is assessing the expected quality of each contributor, for example, in order to penalize incompetent or inaccurate contributors and to promote diligent ones.

Purpose: To develop a method of assessing a contributor’s expected quality in community tagging systems. The method should use only the generally unreliable and incomplete information provided by contributors (with the ground-truth tags unknown).

Results: A mathematical model of community image tagging (including a model of the contributor) is proposed, along with a method of assessing a contributor’s expected quality. The method is based on comparing the tag sets provided by different contributors for the same images and is a modification of the pairwise comparison method in which the preference relation is replaced by a special domination characteristic. The contributors’ expected quality is evaluated as a positive eigenvector of the pairwise domination characteristic matrix. Simulation of community tagging has confirmed that the proposed method adequately estimates the expected quality of contributors to a community tagging system, provided that the contributors' behavior fits the proposed model.

Practical relevance: The results can be used in the development of systems based on the coordinated efforts of a community (primarily, community tagging systems).
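
The quality estimate above is a positive eigenvector of a pairwise matrix. For a matrix with strictly positive entries, the Perron–Frobenius theorem guarantees a unique (up to scale) positive dominant eigenvector, and power iteration is a simple way to find it. The sketch below assumes such a positive matrix; the paper's actual domination characteristic is not reproduced, and the 3-contributor matrix is hypothetical, chosen so the answer is easy to check by hand.

```python
def principal_eigenvector(matrix, tol=1e-12, max_iter=10_000):
    """Power iteration for the dominant eigenvector of a positive square matrix.

    The result is normalized to sum to 1, so entries read as relative scores.
    """
    n = len(matrix)
    v = [1.0 / n] * n  # start from uniform quality estimates
    for _ in range(max_iter):
        # Multiply the current estimate by the matrix.
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        total = sum(w)
        if total <= 0:
            raise ValueError("matrix must have positive entries")
        w = [x / total for x in w]
        # Stop once successive estimates agree to within tol.
        if max(abs(a - b) for a, b in zip(w, v)) < tol:
            return w
        v = w
    raise RuntimeError("power iteration did not converge")


# Hypothetical domination characteristics: D[i][j] > 1 means contributor i's
# tags tend to dominate contributor j's on shared images.
D = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
print([round(x, 3) for x in principal_eigenvector(D)])  # [0.571, 0.286, 0.143]
```

Here the dominant eigenvector is proportional to (4, 2, 1), so the first contributor receives the highest expected-quality score. Normalizing at every step keeps the iterates bounded and makes the final vector directly comparable across contributors.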
