Tuning to Real-Life Language Statistics: Online Processing of Multiword Sequences in Native and Non-Native Speakers Across Language Registers

2020 ◽  
Author(s):  
Elma Kerz ◽  
Daniel Wiechmann ◽  
Felicity Frinsel ◽  
Morten H. Christiansen

A large body of research over the past two decades has demonstrated that children and adults are equipped with statistical learning mechanisms that facilitate their language processing and boost their acquisition. However, this research has been conducted primarily using artificial languages that are highly simplified relative to real language input. Here, we aimed to determine to what extent adult native and non-native speakers show sensitivity to real-life language statistics obtained from large-scale analyses of authentic language use. Using a within-subject design, we conducted a series of behavioral experiments geared towards assessing the sensitivity to two types of distributional statistics (frequency and entropy) during online processing of multiword sequences across four registers of English (spoken, fiction, news and academic language). Our results show that both native and non-native speakers are able to 'tune to' multiple distributional statistics inherent in different types of real language input.
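The two distributional statistics named in the abstract can be illustrated on a toy scale. The sketch below is only an assumption about how such measures might be computed: the abstract does not give the authors' exact definitions, so `sequence_frequency` and `continuation_entropy` are hypothetical helper names, the corpus is invented, and entropy is taken here as the Shannon entropy of the words that follow a given prefix.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-word sequences in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def sequence_frequency(tokens, n):
    """Count how often each n-word sequence occurs in the corpus."""
    return Counter(ngrams(tokens, n))


def continuation_entropy(tokens, prefix):
    """Shannon entropy (in bits) of the words that follow `prefix`.

    Assumed definition for illustration; not taken from the paper.
    """
    k = len(prefix)
    followers = Counter(
        gram[-1] for gram in ngrams(tokens, k + 1) if gram[:k] == tuple(prefix)
    )
    total = sum(followers.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in followers.values())


# Hypothetical toy corpus standing in for a large register-specific corpus.
corpus = "i do not know . i do not care . i do not know .".split()
freq = sequence_frequency(corpus, 3)
entropy = continuation_entropy(corpus, ["i", "do", "not"])
```

In this toy corpus the sequence "i do not" is frequent, and its continuation is moderately uncertain because two different words can follow it. A prefix with only one possible continuation would have an entropy of zero.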

2021 ◽  
Vol 55 (1) ◽  
pp. 1-2
Author(s):  
Bhaskar Mitra

Neural networks with deep architectures have demonstrated significant performance improvements in computer vision, speech recognition, and natural language processing. The challenges in information retrieval (IR), however, are different from these other application areas. A common form of IR involves ranking documents, or short passages, in response to keyword-based queries. Effective IR systems must deal with the query-document vocabulary mismatch problem by modeling relationships between different query and document terms and how they indicate relevance. Models should also consider lexical matches when the query contains rare terms not seen during training (such as a person's name or a product model number), to avoid retrieving semantically related but irrelevant results. In many real-life IR tasks, the retrieval involves extremely large collections (such as the document index of a commercial Web search engine) containing billions of documents. Efficient IR methods should take advantage of specialized IR data structures, such as the inverted index, to efficiently retrieve from large collections. Given an information need, the IR system also mediates how much exposure an information artifact receives by deciding whether it should be displayed, and where it should be positioned, among other results. Exposure-aware IR systems may optimize for additional objectives, besides relevance, such as parity of exposure for retrieved items and content publishers. In this thesis, we present novel neural architectures and methods motivated by the specific needs and challenges of IR tasks. We ground our contributions with a detailed survey of the growing body of neural IR literature [Mitra and Craswell, 2018].
Our key contribution towards improving the effectiveness of deep ranking models is developing the Duet principle [Mitra et al., 2017], which emphasizes the importance of incorporating evidence based on both patterns of exact term matches and similarities between learned latent representations of query and document. To efficiently retrieve from large collections, we develop a framework to incorporate query term independence [Mitra et al., 2019] into any arbitrary deep model that enables large-scale precomputation and the use of an inverted index for fast retrieval. In the context of stochastic ranking, we further develop optimization strategies for exposure-based objectives [Diaz et al., 2020]. Finally, this dissertation also summarizes our contributions towards benchmarking neural IR models in the presence of large training datasets [Craswell et al., 2019] and explores the application of neural methods to other IR tasks, such as query auto-completion.
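To make the link between query term independence and inverted-index retrieval concrete, here is a minimal sketch. It assumes that a document's score is the sum of per-term scores computed independently for each query term, so those scores can be precomputed offline and stored in posting lists. The functions `build_index` and `rank`, and the numeric scores, are hypothetical and stand in for the output of a trained model; this is not the thesis's implementation.

```python
from collections import defaultdict


def build_index(term_doc_scores):
    """Turn precomputed (term, doc_id, score) triples into an inverted index."""
    index = defaultdict(list)
    for term, doc_id, score in term_doc_scores:
        index[term].append((doc_id, score))
    return index


def rank(query, index, k=10):
    """Score each document as the sum of its per-term scores; return the top k."""
    totals = defaultdict(float)
    for term in query.split():
        # Only posting lists for the query's own terms are touched.
        for doc_id, score in index.get(term, []):
            totals[doc_id] += score
    # Sort by descending score, breaking ties by document id.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))[:k]


# Hypothetical precomputed scores (assumed values, for illustration only).
precomputed = [
    ("neural", "d1", 1.5), ("neural", "d2", 0.5),
    ("ranking", "d2", 2.0), ("ranking", "d3", 1.0),
]
index = build_index(precomputed)
results = rank("neural ranking", index)
```

Because each term contributes independently, no model evaluation is needed at query time in this sketch: retrieval reduces to merging a few posting lists, which is the property that makes very large collections tractable.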


2015 ◽  
Vol 38 (1) ◽  
Author(s):  
Huili Wang ◽  
Liwen Ma ◽  
Youyou Wang ◽  
Melissa Troyer ◽  
Qiang Li

The processing of relative clauses has received much attention from linguists. The finding that object relatives are easier to process than subject relatives in Chinese challenges the notion that subject relative clauses are preferred universally. A large body of literature provides theories related to sentence processing mechanisms for native speakers but leaves one area relatively untouched: how bilinguals process sentences. This study examines how individuals with a Chinese L1 background process subject-extracted subject relative clauses (SS) and subject-extracted object relative clauses (SO), using event-related potentials (ERPs) to probe real-time language processing and provide a direct manifestation of brain activity. The findings support the subject relative clause preference, owing to the strong influence of English relative clause markedness and bilinguals’ relatively lower working memory capacity.


2020 ◽  
Author(s):  
Theresa Redl ◽  
Stefan L. Frank ◽  
Peter de Swart ◽  
Helen de Hoop

A self-paced reading experiment tested if a generically-used masculine personal pronoun leads to a male bias in online processing. We presented Dutch native speakers (N=95, 47 male) with generic statements featuring the masculine pronoun hij ‘he’ (e.g., Someone who always promises that he will really be on time, such as Ms/Mr Knoop, will sometimes be late anyway). We further presented participants with control items expressing the same meaning, but without the pronoun. Reading times were significantly higher when a female individual was given as an example (i.e., Ms Knoop in the example above) following the masculine generic pronoun hij ‘he’, but not in the control condition. This effect did not interact with participant gender. This shows that the generically-intended masculine personal pronoun leads to a male bias in online processing for male as well as female participants. Masculine personal pronouns are still commonly used for generic reference in many languages such as Dutch, but the results of this experiment refute the notion that a pronoun such as hij ‘he’ can be readily processed as gender-neutral.


2021 ◽  
Vol 12 ◽  
Author(s):  
Agnieszka Otwinowska ◽  
Marta Marecka ◽  
Alba Casado ◽  
Joanna Durlik ◽  
Jakub Szewczyk ◽  
...  

Multi-word expressions (MWEs) are fixed, conventional phrases often used by native speakers of a given language (L1). The type of MWEs investigated in this study were collocations. For bilinguals who have intensive contact with the second language (L2), collocational patterns can be transferred from the L2 to the L1 as a result of cross-linguistic influence (CLI). For example, bilingual migrants can accept collocations from their L2 translated to their L1 as correct. In this study, we asked whether such CLI is possible in native speakers living in the L1 environment and whether it depends on their L2 English proficiency. To this end, we created three lists of expressions in Polish: (1) well-formed Polish verb-noun collocations (e.g., ma sens – ∗has sense), (2) collocational calques from English (loan translations), where the English verb was replaced by a Polish translation equivalent (e.g., ∗robi sens – makes sense), and, as a reference, (3) absurd verb-noun expressions, where the verb did not collocate with the noun (e.g., ∗zjada sens – ∗eats sense). We embedded the three types of collocations in sentences and presented them to L1 Polish participants of varying L2 English proficiency in two experiments. We investigated whether L2 calques would (1) be explicitly judged as non-native in the L1 and (2) evoke a different brain response from native L1 Polish equivalents in the event-related potentials (ERPs). We also explored whether the sensitivity to CLI in calques depended on participants’ level of proficiency in L2 English. The results indicated that native speakers of Polish assessed the calques from English as less acceptable than the correct Polish collocations. Still, there was no difference in online processing of correct collocations and calques as measured by the ERPs. This suggests a dissociation between explicit offline judgments and indices of online language processing. Interestingly, English L2 proficiency did not modulate these effects.
The results indicate that the influence of English on Polish is so pervasive that collocational calques from this language are likely to become accepted and used by Poles.


2020 ◽  
Author(s):  
Fritz Guenther ◽  
Luca Rinaldi

Large-scale linguistic data is nowadays available in abundance. Here, we demonstrate that the surface-level statistical structure of language alone opens a window into how our brain represents the world. To this end, we examine the statistical occurrence of words referring to body parts in very different languages, covering nearly 4 billion native speakers. Our findings indicate that the human body as extracted from language resembles the distorted human-like figure known as the sensory homunculus, whose form depicts the amount of cortical area dedicated to somatosensory functions of each body part. This links the way conceptual knowledge is represented and communicated in language to how the brain processes information from the sensory systems.


2013 ◽  
Vol 4 (1) ◽  
Author(s):  
Georgie Columbus

Multiword units (MWUs) are word combinations which sit within the continuum of formulaic language. Many experimental studies have focused on the online processing of MWUs by native and non-native speakers, and the processing of idioms in particular. However, some studies use a mix of various MWU subtypes, while other studies have varying definitions for the same subtypes. For results from MWU studies to be useful to theories of language processing, storage and access, clearer classifications are needed for MWU subtypes. This study aims to empirically validate MWU categories as described by certain phraseologists in the European tradition. This will be done using MWUs from the British National Corpus, from across the continuum of frequent to infrequent occurrence and co-occurrence. Hence, in this paper I will describe the empirical findings that may validate the classifications for MWU categories of restricted collocations, idioms, and lexical bundles, using corpus-based measures and human ratings.


Interpreting ◽  
2017 ◽  
Vol 19 (1) ◽  
pp. 1-20 ◽  
Author(s):  
Ena Hodzik ◽  
John N. Williams

We report a study on prediction in shadowing and simultaneous interpreting (SI), both considered as forms of real-time, ‘online’ spoken language processing. The study comprised two experiments, focusing on: (i) shadowing of German head-final sentences by 20 advanced students of German, all native speakers of English; (ii) SI of the same sentences into English head-initial sentences by 22 advanced students of German, again native English speakers, and also by 11 trainee and practising interpreters. Latency times for input and production of the target verbs were measured. Drawing on studies of prediction in English-language reading production, we examined two cues to prediction in both experiments: contextual constraints (semantic cues in the context) and transitional probability (the statistical likelihood of words occurring together in the language concerned). While context affected prediction during both shadowing and SI, transitional probability appeared to favour prediction during shadowing but not during SI. This suggests that the two cues operate on different levels of language processing in SI.
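Transitional probability, described above as the statistical likelihood of words occurring together, is commonly estimated from corpus counts as the number of times a word pair occurs divided by the number of times its first word occurs. The sketch below illustrates that estimate under this assumption; the abstract does not state how the study computed it, and the function name and sample sentence are hypothetical.

```python
from collections import Counter


def transitional_probability(tokens, first, second):
    """Estimate P(second | first) as count(first second) / count(first)."""
    # Count `first` only where a following word exists, so the ratio
    # is a proper conditional probability.
    first_count = Counter(tokens[:-1])[first]
    pair_count = Counter(zip(tokens, tokens[1:]))[(first, second)]
    return pair_count / first_count if first_count else 0.0


# Hypothetical toy text standing in for a large corpus.
tokens = "the dog barked and the dog slept and the cat slept".split()
tp_the_dog = transitional_probability(tokens, "the", "dog")
tp_dog_barked = transitional_probability(tokens, "dog", "barked")
```

Here "dog" follows "the" in two of the three occurrences of "the", so a listener who has just heard "the" has a fairly strong statistical cue for what comes next.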


Author(s):  
Krzysztof Jurczuk ◽  
Marcin Czajkowski ◽  
Marek Kretowski

This paper concerns the evolutionary induction of decision trees (DT) for large-scale data. Such a global approach is one of the alternatives to the top-down inducers. It searches for the tree structure and tests simultaneously and thus gives improvements in the prediction and size of resulting classifiers in many situations. However, it is a population-based, iterative approach that can be too computationally demanding to apply directly to big data mining. The paper demonstrates that this barrier can be overcome by smart distributed/parallel processing. Moreover, we ask whether the global approach can truly compete with the greedy systems for large-scale data. For this purpose, we propose a novel multi-GPU approach. It incorporates the knowledge of global DT induction and evolutionary algorithm parallelization together with efficient utilization of GPU memory and computing resources. The searches for the tree structure and tests are performed simultaneously on a CPU, while the fitness calculations are delegated to GPUs. A data-parallel decomposition strategy and the CUDA framework are applied. Experimental validation is performed on both artificial and real-life datasets. In both cases, the obtained acceleration is very satisfactory. The solution is able to process even billions of instances in a few hours on a single workstation equipped with 4 GPUs. The impact of data characteristics (size and dimension) on convergence and speedup of the evolutionary search is also shown. When the number of GPUs grows, nearly linear scalability is observed, which suggests that data size boundaries for evolutionary DT mining are fading.


Author(s):  
Gianluca Bardaro ◽  
Alessio Antonini ◽  
Enrico Motta

Over the last two decades, several deployments of robots for in-house assistance of older adults have been trialled. However, these solutions are mostly prototypes and remain unused in real-life scenarios. In this work, we review the historical and current landscape of the field, to try and understand why robots have yet to succeed as personal assistants in daily life. Our analysis focuses on two complementary aspects: the capabilities of the physical platform and the logic of the deployment. The former analysis shows regularities in hardware configurations and functionalities, leading to the definition of a set of six application-level capabilities (exploration, identification, remote control, communication, manipulation, and digital situatedness). The latter focuses on the impact of robots on the daily life of users and categorises the deployment of robots for healthcare interventions using three types of services: support, mitigation, and response. Our investigation reveals that the value of healthcare interventions is limited by a stagnation of functionalities and a disconnection between the robotic platform and the design of the intervention. To address this issue, we propose a novel co-design toolkit, which uses an ecological framework for robot interventions in the healthcare domain. Our approach connects robot capabilities with known geriatric factors, to create a holistic view encompassing both the physical platform and the logic of the deployment. As a case study-based validation, we discuss the use of the toolkit in the pre-design of the robotic platform for a pilot intervention, part of the EU large-scale pilot of the EU H2020 GATEKEEPER project.


Technologies ◽  
2020 ◽  
Vol 9 (1) ◽  
pp. 2
Author(s):  
Ashish Jaiswal ◽  
Ashwin Ramesh Babu ◽  
Mohammad Zaki Zadeh ◽  
Debapriya Banerjee ◽  
Fillia Makedon

Self-supervised learning has gained popularity because of its ability to avoid the cost of annotating large-scale datasets. It is capable of adopting self-defined pseudolabels as supervision and using the learned representations for several downstream tasks. Specifically, contrastive learning has recently become a dominant component in self-supervised learning for computer vision, natural language processing (NLP), and other domains. It aims at embedding augmented versions of the same sample close to each other while trying to push away embeddings from different samples. This paper provides an extensive review of self-supervised methods that follow the contrastive approach. The work explains commonly used pretext tasks in a contrastive learning setup, followed by different architectures that have been proposed so far. Next, we present a performance comparison of different methods for multiple downstream tasks such as image classification, object detection, and action recognition. Finally, we conclude with the limitations of the current methods and the need for further techniques and future directions to make meaningful progress.
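The pull-together, push-apart objective described above can be sketched with an InfoNCE-style loss, one common formulation of a contrastive objective. The abstract does not name a specific loss, so this choice is an assumption, as are the function names, the temperature value, and the toy embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def contrastive_loss(anchor, positive, negatives, temperature=0.5):
    """InfoNCE-style loss (assumed form, for illustration).

    The loss is low when the anchor is more similar to its positive
    (an augmented view of the same sample) than to the negatives
    (embeddings of different samples).
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    # Numerically stable negative log-softmax of the positive logit.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum - logits[0]


# Hypothetical 2-D embeddings.
anchor = [1.0, 0.0]
aligned = contrastive_loss(anchor, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]])
misaligned = contrastive_loss(anchor, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
```

With these toy vectors the loss is smaller when the positive really is the nearest embedding, which is the signal a training procedure would follow to pull augmented views together and push other samples away.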

