Large-scale study of speech acts' development using automatic labelling

2021 ◽  
Author(s):  
Mitja Nikolaus ◽  
Juliette Maes ◽  
Jeremy Auguste ◽  
Laurent Prévot ◽  
Abdellah Fourtassi

Studies of children's language use in the wild (e.g., in the context of child-caregiver social interaction) have been slowed by the time- and resource-consuming task of hand-annotating utterances for communicative intents/speech acts. Existing studies have typically focused on rather small samples of children, raising the question of how their findings generalize both to larger and more representative populations and to a richer set of interaction contexts. Here we propose a simple automatic model for speech act labeling in early childhood based on the INCA-A coding scheme (Ninio et al., 1994). After validating the model against ground truth labels, we automatically annotated the entire English-language data from the CHILDES corpus. The major theoretical result was that earlier findings generalize quite well at a large scale. Our model will be shared with the community so that researchers can use it with their data to investigate various questions related to language use development.

2021 ◽  
Author(s):  
Mitja Nikolaus ◽  
Eliot Maes ◽  
Jeremy Auguste ◽  
Laurent Prévot ◽  
Abdellah Fourtassi

Studies of children's language use in the wild (e.g., in the context of child-caregiver social interaction) have been slowed by the time- and resource-consuming task of hand-annotating utterances for communicative intents/speech acts. Existing studies have typically focused on rather small samples of children, raising the question of how their findings generalize both to larger and more representative populations and to a richer set of interaction contexts. Here we propose a simple automatic model for speech act labeling in early childhood based on the INCA-A coding scheme (Ninio, Snow, Pan, & Rollins, 1994). After validating the model against ground truth labels, we automatically annotated the entire English-language data from the CHILDES corpus. The major theoretical result was that earlier findings generalize quite well at a large scale. Further, we introduced two complementary measures for the age of acquisition of speech acts, which allow us to rank different speech acts according to their order of emergence in production and comprehension. Our model will be shared with the community so that researchers can use it with their data to investigate various questions related to language use in both typical and atypical populations of children.


Symmetry ◽  
2020 ◽  
Vol 12 (11) ◽  
pp. 1832
Author(s):  
Tomasz Hachaj ◽  
Patryk Mazurek

Deep learning-based feature extraction methods and transfer learning have become common approaches in the field of pattern recognition. Deep convolutional neural networks trained using triplet-based loss functions allow for the generation of face embeddings, which can be directly applied to face verification and clustering. Knowledge about the ground truth of face identities might improve the effectiveness of the final classification algorithm; however, it is also possible to use ground truth clusters previously discovered using an unsupervised approach. The aim of this paper is to evaluate the potential improvement of classification results of state-of-the-art supervised classification methods trained with and without ground truth knowledge. In this study, we use two sufficiently large data sets containing more than 200,000 “taken in the wild” images, each with various resolutions, visual quality, and face poses which, in our opinion, guarantee the statistical significance of the results. We examine several clustering and supervised pattern recognition algorithms and find that knowledge about the ground truth has a very small influence on the Fowlkes–Mallows score (FMS) of the classification algorithm. In the case of the classification algorithm that obtained the highest accuracy in our experiment, the FMS improved by only 5.3% (from 0.749 to 0.791) in the first data set and by 6.6% (from 0.652 to 0.718) in the second data set. Our results show that, except in highly secure systems in which face verification is a key component, face identities discovered by unsupervised approaches can be safely used for training supervised classifiers. We also found that the Silhouette Coefficient (SC) of unsupervised clustering is positively correlated with the Adjusted Rand Index, V-measure score, and Fowlkes–Mallows score, so the SC can serve as an indicator of clustering performance when the ground truth of face identities is not known.
All of these conclusions are important findings for large-scale face verification problems, because skipping the verification of people’s identities before supervised training saves considerable time and resources.
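
For readers unfamiliar with the Fowlkes–Mallows score reported in the abstract above, the following is a minimal sketch of its standard pair-counting definition, FMS = TP / sqrt((TP + FP)(TP + FN)), where TP counts item pairs grouped together in both labelings. The function name `fowlkes_mallows` and the toy labels are illustrative assumptions and are not taken from the paper, which does not describe its implementation.

```python
from collections import Counter
from math import sqrt


def fowlkes_mallows(labels_true, labels_pred):
    """Pair-counting Fowlkes-Mallows score between two labelings.

    Hypothetical helper for illustration; not the authors' code.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")

    def pairs(counter):
        # Number of unordered item pairs that fall into the same group.
        return sum(c * (c - 1) // 2 for c in counter.values())

    # Pairs that share a group in BOTH labelings (true positives).
    tp = pairs(Counter(zip(labels_true, labels_pred)))
    same_true = pairs(Counter(labels_true))  # TP + FN
    same_pred = pairs(Counter(labels_pred))  # TP + FP
    if same_true == 0 or same_pred == 0:
        return 0.0
    return tp / sqrt(same_true * same_pred)


# Toy example: cluster ids are permuted, but the grouping is identical.
identities = [0, 0, 1, 1]
clusters = [1, 1, 0, 0]
print(fowlkes_mallows(identities, clusters))  # prints 1.0
```

The score depends only on which items are grouped together, not on the label values themselves, which is why it can compare unsupervised cluster ids against known identities.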


2019 ◽  
Vol 1 (1) ◽  
pp. 42-75 ◽  
Author(s):  
Douglas Biber

Douglas Biber, Regents’ Professor of Applied Linguistics at Northern Arizona University, authors this article exploring the connections between register and a text-linguistic approach to language variation. He has spent the last 30 years pursuing a research program that explores the inherent link between register and language use, including at the phraseological, grammatical, and lexico-grammatical levels. His seminal book Variation across Speech and Writing (1988, Cambridge University Press) launched multi-dimensional (MD) analysis, a comprehensive framework and methodology for the large-scale study of register variation. This approach was innovative in taking a text-linguistic approach to characterize language use across situations of use through the quantitative and functional analysis of linguistic co-occurrence patterns and underlying dimensions of language use. MD analysis is now used widely to study register variation over time, in general and specialized registers, in learner language, and across a range of languages. In 1999, the Longman Grammar of Spoken and Written English (Biber et al.) became the first comprehensive descriptive reference book to systematically consider register variation in describing the grammatical and lexico-grammatical patterns of use in English. Douglas Biber’s quantitative linguistic research has consistently demonstrated the importance of register as a predictor of language variation. In his own words, “register always matters” (Gray 2013: 360, Interview with Douglas Biber, English Language & Linguistics).


2020 ◽  
Vol 11 (5) ◽  
pp. 841
Author(s):  
Raifu O. Farinde ◽  
Wasiu A. Oyedokun-Alli

The main goal of language teaching is that, at the end of the period of learning, the learners should be able to communicate effectively in that language. The main source of language is language use. The students must therefore be given plenty of opportunity to use the language. This is where the principles of pragmatics come into language teaching. Pragmatics provides ample opportunities for the students to learn the English language communicatively and practically. In this study, I shall focus particularly on the application of pragmatics to language teaching, with emphasis on Gricean pragmatics and Searle’s speech acts. The question of why pragmatics should be assigned a more prominent place in the language teaching syllabus is also addressed.


2019 ◽  
Vol 30 (3) ◽  
pp. 516-538
Author(s):  
Lénia Marques ◽  
Nigel Williams

This article investigates the similarities and differences in the tangible and intangible elements (factors and language use) contributing to placemaking in Airbnb English-language reviews in Paris (59,057 reviews), Barcelona (19,291 reviews) and London (30,403 reviews). This paper provides new insights into the narrative construction of reputational capital, which is connected to placemaking strategies. A combined quantitative approach using large-scale text analysis enabled the analysis of review content and style. Patterns in word usage were identified. Findings suggest that tangible and intangible elements work together in the discourse, contributing to the place-narrative built on the host’s reputational capital. The host-guest interaction is the main aspect of the reviews, followed by the importance of transport and local amenities. Cities have different profiles in the composition of the word clusters, which indicates differences in the guests’ perceived experience.


Corpora ◽  
2019 ◽  
Vol 14 (3) ◽  
pp. 327-349
Author(s):  
Craig Frayne

This study uses the two largest available American English language corpora, Google Books and the Corpus of Historical American English (COHA), to investigate relations between ecology and language. The paper introduces ecolinguistics as a promising theme for corpus research. While some previous ecolinguistic research has used corpus approaches, there is a case to be made for quantitative methods that draw on larger datasets. Building on other corpus studies that have made connections between language use and environmental change, this paper investigates whether linguistic references to other species have changed in the past two centuries and, if so, how. The methodology consists of two main parts: an examination of the frequency of common names of species followed by aspect-level sentiment analysis of concordance lines. Results point to both opportunities and challenges associated with applying corpus methods to ecolinguistic research.


2020 ◽  
Vol 5 (3) ◽  
pp. 77-81
Author(s):  
Sayyora Azimova ◽  

This article is devoted to the pragmatic interpretation of the illocutionary action of the speech act “expression of refusals”. The article discusses different ways of reflecting cases of denial. This article was written not only for English language professionals, but also for use in aggressive conflicts and their pragmatic resolution, which naturally occur in the process of communication in all other languages.


2020 ◽  
Vol 6 (3) ◽  
pp. 227-231
Author(s):  
Sayyora Azimova ◽  

This article is devoted to the pragmatic interpretation of the illocutionary action of the speech act “expression of refusals”. The article discusses different ways of reflecting cases of denial. This article was written not only for English language professionals, but also for use in aggressive conflicts and their pragmatic resolution, which naturally occur in the process of communication in all other languages.


Author(s):  
Olga V. Khavanova ◽  

The second half of the eighteenth century in the lands under the sceptre of the House of Austria was a period of development of a language policy addressing the ethno-linguistic diversity of the monarchy’s subjects. On the one hand, the sphere of use of the German language was becoming wider, embracing more and more segments of administration, education, and culture. On the other hand, the authorities were perfectly aware of the fact that communication in the languages and vernaculars of the nationalities living in the Austrian Monarchy was one of the principal instruments of spreading decrees and announcements from the central and local authorities to the less-educated strata of the population. Consequently, a large-scale reform of primary education was launched, aimed at making the whole population literate, regardless of social status, nationality (mother tongue), or confession. In parallel with the centrally coordinated state policy of education and language use, subjects (both language experts and amateur polyglots) joined the process of writing grammar books, which were intended to ease communication between the different nationalities of the Habsburg lands. This article considers some examples of such editions with primary attention given to the correlation between private initiative and governmental policies, mechanisms of verifying the textbooks to be published, their content, and their potential readers. This paper demonstrates that for grammar-book authors, it was very important to be integrated into the patronage networks at the court and in administrative bodies and stresses that the Vienna court controlled the process of selection and financing of grammar books to be published depending on their quality and ability to satisfy the aims and goals of state policy.


Author(s):  
A. V. Ponomarev

Introduction: Large-scale human-computer systems involving people of various skills and motivation into the information processing process are currently used in a wide spectrum of applications. An acute problem in such systems is assessing the expected quality of each contributor; for example, in order to penalize incompetent or inaccurate ones and to promote diligent ones. Purpose: To develop a method of assessing the expected contributor’s quality in community tagging systems. This method should only use generally unreliable and incomplete information provided by contributors (with ground truth tags unknown). Results: A mathematical model is proposed for community image tagging (including the model of a contributor), along with a method of assessing the expected contributor’s quality. The method is based on comparing tag sets provided by different contributors for the same images, being a modification of pairwise comparison method with preference relation replaced by a special domination characteristic. Expected contributors’ quality is evaluated as a positive eigenvector of a pairwise domination characteristic matrix. Community tagging simulation has confirmed that the proposed method allows you to adequately estimate the expected quality of community tagging system contributors (provided that the contributors' behavior fits the proposed model). Practical relevance: The obtained results can be used in the development of systems based on coordinated efforts of community (primarily, community tagging systems).
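
The idea of reading contributor quality off a positive eigenvector can be sketched with plain power iteration. This is only an illustration under stated assumptions: the abstract does not define the domination characteristic, so the matrix below is a hypothetical, strictly positive pairwise matrix, and the function name `principal_eigenvector` is invented here rather than taken from the paper.

```python
def principal_eigenvector(matrix, iterations=200, tol=1e-12):
    """Approximate the dominant eigenvector of a positive square matrix
    by power iteration, normalised so that its entries sum to 1.

    Hypothetical helper for illustration; not the author's method.
    """
    n = len(matrix)
    v = [1.0 / n] * n  # start from a uniform quality estimate
    for _ in range(iterations):
        # One matrix-vector product: w = matrix @ v
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        total = sum(w)
        if total == 0:
            raise ValueError("matrix maps the current vector to zero")
        w = [x / total for x in w]
        # Stop once successive estimates agree to within the tolerance.
        if max(abs(a - b) for a, b in zip(v, w)) < tol:
            return w
        v = w
    return v


# Assumed toy matrix: entry [i][j] says how strongly contributor i
# dominates contributor j (contributor 0 dominates 1 by a factor of 2).
domination = [
    [1.0, 2.0],
    [0.5, 1.0],
]
quality = principal_eigenvector(domination)
print(quality)
```

For a strictly positive matrix, the Perron-Frobenius theorem guarantees that a positive dominant eigenvector exists, which is what makes its entries usable as a ranking of contributors in this toy setting.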

