Linguistic levelling in Spanish: The analogical strong preterites

Abstract: Certain Peninsular Spanish varieties have two third-person plural forms in the simple past indicative of verbs with ‘strong’ (stem-stressed) preterites. While this phenomenon is documented in large-scale linguistic atlas surveys, its current geographic distribution and diachronic origins remain under-studied. This paper sets out to: 1) establish the geographic distribution of these variants; the data sources differ in methodology and epoch, which makes them particularly interesting to compare, and the comparison shows that these analogical strong preterites have suffered a drastic decline over the last century; 2) use historical corpus data to show that the vernacular variant is by no means a recent phenomenon; 3) examine external history as a source of explanation in linguistic reconstruction, showing that this process of analogical levelling took place after the reconquest and resettlement of these regions. These findings support the hypothesis of a feature which spread over the centuries by linguistic diffusion.

Syncretism of plural forms in Spanish Dialects

The Linguistic Review ◽

10.1515/tlr-2021-2066 ◽

2021 ◽

Vol 0 (0) ◽

Author(s):

María Mare

Keyword(s):

Syntactic Structure ◽

Person Plural ◽

Person Features ◽

Third Person ◽

Lexical Items ◽

First Person Plural ◽

The Moment ◽

The Third Person ◽

Shed Light ◽

Spanish Dialects

Abstract: One of the main discussions about the interaction between morphology and syntax revolves around the richness or poverty of features and whether this richness or poverty is located in the syntactic structure or in the lexical items. A phenomenon subject to this debate has been syncretism, especially in theories that assume late insertion such as Distributed Morphology. This paper delves into the syncretism observed between the first person plural and the third person in the clitic domain in some Spanish dialects. Our analysis will lead to a revision of the distribution of person features and their relationship with plural number, while at the same time it will shed light on other morphological alternations displayed in Spanish dialects, namely subject-verb unagreement and mesoclisis in imperatives. In order to explain the behavior of the data under discussion, I propose that lexical items are specified for all the relevant features at the moment of insertion, although the values of these features can be neutralized. I argue that the distribution proposed allows for some fundamental generalizations about the vocabulary inventories in Spanish varieties, and shows that the variation pattern exhibits an *ABA effect, i.e., only contiguous cells in a paradigm are syncretic.

A Large-Scale COVID-19 Twitter Chatter Dataset for Open Scientific Research—An International Collaboration

Epidemiologia ◽

10.3390/epidemiologia2030024 ◽

2021 ◽

Vol 2 (3) ◽

pp. 315-324

Author(s):

Juan M. Banda ◽

Ramya Tekumalla ◽

Guanyu Wang ◽

Jingyuan Yu ◽

Tuo Liu ◽

...

Keyword(s):

Large Scale ◽

Social Dynamics ◽

Additional Data ◽

Open Data ◽

Data Sources ◽

Research Projects ◽

Research Groups ◽

The World ◽

Data Source

As the COVID-19 pandemic continues to spread worldwide, an unprecedented amount of open data is being generated for medical, genetics, and epidemiological research. The unparalleled rate at which many research groups around the world are releasing data and publications on the ongoing pandemic is allowing other scientists to learn from local experiences and data generated on the front lines of the COVID-19 pandemic. However, there is a need to integrate additional data sources that map and measure the role of social dynamics of such a unique worldwide event in biomedical, biological, and epidemiological analyses. For this purpose, we present a large-scale curated dataset of over 1.12 billion tweets, growing daily, related to COVID-19 chatter generated from 1 January 2020 to 27 June 2021 at the time of writing. This dataset is a freely available additional data source that researchers worldwide can use for a wide range of research projects, such as epidemiological analyses, studies of emotional and mental responses to social distancing measures, the identification of sources of misinformation, and stratified measurement of sentiment towards the pandemic in near real time, among many others.

EdgeDIPN

Proceedings of the VLDB Endowment ◽

10.14778/3430915.3430922 ◽

2020 ◽

Vol 14 (3) ◽

pp. 320-328

Author(s):

Long Guo ◽

Lifeng Hua ◽

Rongfei Jia ◽

Fei Fang ◽

Binqiang Zhao ◽

...

Keyword(s):

Real Time ◽

Large Scale ◽

Attention Mechanism ◽

Data Sources ◽

User Intent ◽

Multiple User ◽

Shopping Experience ◽

Data Source ◽

Intent Prediction ◽

The Right

With the rapid growth of e-commerce in recent years, e-commerce platforms are becoming a primary place for people to find, compare and ultimately purchase products. To improve the online shopping experience for consumers and increase sales for sellers, it is important to understand user intent accurately and to be notified of changes in it in a timely manner. In this way, the right information can be offered to the right person at the right time. To achieve this goal, we propose a unified deep intent prediction network, named EdgeDIPN, which is deployed at the edge, i.e., on the mobile device, and is able to monitor multiple user intents at different granularities simultaneously in real time. We propose to train EdgeDIPN with multi-task learning, by which EdgeDIPN can share representations between different tasks for better performance while also saving edge resources. In particular, we propose a novel task-specific attention mechanism which enables different tasks to pick out the most relevant features from different data sources. To extract the shared representations more effectively, we utilize two kinds of attention mechanisms: the multi-level attention mechanism tries to identify the important actions within each data source, and the inter-view attention mechanism learns the interactions between different data sources. In the experiments conducted on a large-scale industrial dataset, EdgeDIPN significantly outperforms the baseline solutions. Moreover, EdgeDIPN has been deployed in the operational system of Alibaba. Online A/B testing results in several business scenarios reveal the potential of monitoring user intent in real time. To the best of our knowledge, EdgeDIPN is the first full-fledged real-time user intent understanding center deployed at the edge and serving hundreds of millions of users in a large-scale e-commerce platform.

Contributions of precipitation and temperature to the large scale geographic distribution of fleshy-fruited plant species: Growth form matters

Scientific Reports ◽

10.1038/s41598-018-35436-x ◽

2018 ◽

Vol 8 (1) ◽

Cited By ~ 2

Author(s):

Yuan Zhao ◽

Honglin Cao ◽

Wubing Xu ◽

Guoke Chen ◽

Juyu Lian ◽

...

Keyword(s):

Plant Species ◽

Geographic Distribution ◽

Large Scale ◽

Growth Form ◽

Precipitation And Temperature

A comprehensive evaluation of cache utilization characteristics in large scale WSN considering network driven cache replacement techniques

MATEC Web of Conferences ◽

10.1051/matecconf/201818805004 ◽

2018 ◽

Vol 188 ◽

pp. 05004

Author(s):

Christos Panagiotou ◽

Christos Antonopoulos ◽

Stavros Koubias

Keyword(s):

Smart City ◽

Large Scale ◽

Network Performance ◽

Comprehensive Evaluation ◽

Resource Conservation ◽

Data Sources ◽

Data Cache ◽

Cache Replacement ◽

Large Scale Networks ◽

Result Analysis

Wireless sensor networks (WSNs), as adopted in current smart city deployments, must cope with demanding traffic factors and remain resilient to failures. Furthermore, caching of data in a WSN can significantly benefit resource conservation and network performance. However, data sources generate data volumes that cannot fit in the restricted data cache resources of the caching nodes. This unavoidably leads to data items being evicted and replaced. This paper aims to experimentally evaluate the prominent caching techniques in large-scale networks that resemble the smart city paradigm, assessing network performance with respect to critical application and network parameters. Analysis of the results provides valuable insights concerning the behaviour of caching in typical large-scale WSN scenarios.

On honorification in Czech: The Czech polite forms: Theory and corpus data

Juznoslovenski filolog ◽

10.2298/jfi0965101p ◽

2009 ◽

pp. 101-108

Author(s):

Jarmila Panevova

Keyword(s):

Person Plural ◽

Single Person ◽

Verb Forms ◽

Corpus Data ◽

Semantic Distinction

The author claims that the Czech polite forms (so-called 'vykani') for addressing the 2nd person should be understood as a legitimate part of the Czech conjugation paradigm. If we address a single person in a polite way, some Czech analytical verb forms exhibit 'hybrid' agreement (the auxiliaries are in the plural, while the participle is in the singular). However, the paradigm for singular and plural polite forms (addressing a single person, or two or more persons, respectively) is not symmetrical. The question of whether 2nd person plural polite forms are ambiguous (between the polite meaning and the 2nd person plural non-polite meaning), or whether the semantic distinction 'polite - non-polite' is neutralized in the plural, is open for further discussion. Some corpus data illustrating the contexts for the 2nd person polite forms are also analyzed here.

Characterization and selection of Japanese electronic health record databases used as data sources for non-interventional observational studies

10.21203/rs.3.rs-184585/v1 ◽

2021 ◽

Author(s):

Yumi Wakabayashi ◽

Masamitsu Eitoku ◽

Narufumi Suganuma

Keyword(s):

Electronic Health Record ◽

Observational Studies ◽

Large Scale ◽

Data Sources ◽

Flow Diagram ◽

Health Record ◽

Medical Institutions ◽

Data Source ◽

Electronic Health ◽

Using Data

Abstract: Background: Interventional studies are the fundamental method for obtaining answers to clinical questions. However, these studies are sometimes difficult to conduct because of insufficient financial or human resources or the rarity of the disease in question. One means of addressing these issues is to conduct a non-interventional observational study using electronic health record (EHR) databases as the data source, although how best to evaluate the suitability of an EHR database when planning a study remains to be clarified. The aim of the present study is to identify and characterize the data sources that have been used for conducting non-interventional observational studies in Japan and to propose a flow diagram to help researchers determine the most appropriate EHR database for their study goals. Methods: We compiled a list of published articles reporting observational studies conducted in Japan by searching PubMed for relevant articles published in the last 3 years and by searching database providers’ publication lists related to studies using their databases. For each article, we reviewed the abstract and/or full text to obtain information about data source, target disease or therapeutic area, number of patients, and study design (prospective or retrospective). We then characterized the identified EHR databases. Results: In Japan, non-interventional observational studies have been mostly conducted using data stored locally at individual medical institutions (713/1463) or collected from several collaborating medical institutions (351/1463). Whereas the studies conducted with large-scale integrated databases (195/1463) were mostly retrospective (68.2%), 27.2% of the single-center studies, 46.2% of the multi-center studies, and 74.4% of the post-marketing surveillance studies identified in the present study were conducted prospectively.
Conclusions: Our analysis revealed that non-interventional observational studies in Japan were conducted using data stored locally at individual medical institutions or collected from collaborating medical institutions. Disease registries, disease databases, and large-scale databases would enable researchers to conduct studies with large sample sizes to provide robust data from which strong inferences could be drawn. Using our flow diagram, researchers planning non-interventional observational studies should consider the strengths and limitations of each available database and choose the most appropriate one for their study goals. Trial registration: Not applicable.
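
The study counts quoted in the Results can be converted to percentage shares of the 1463 identified studies. The following is a minimal sketch of that arithmetic only; the category labels and the function name `percentage_shares` are shorthand invented for this illustration and do not come from the paper.

```python
# Study counts quoted in the Results above. The category labels are
# shorthand chosen for this sketch, not terms defined by the authors.
TOTAL_STUDIES = 1463
COUNTS = {
    "single_institution": 713,
    "collaborating_institutions": 351,
    "large_scale_integrated_db": 195,
}


def percentage_shares(counts, total):
    """Return each count as a percentage of total, rounded to one decimal."""
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


SHARES = percentage_shares(COUNTS, TOTAL_STUDIES)
for label, share in SHARES.items():
    print(f"{label}: {share}%")
```

On these figures, locally stored data account for just under half of the identified studies, and large-scale integrated databases for roughly one in eight.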

Regressive Vowel Harmony in Libyan Arabic

(Faculty of Arts Journal) مجلة كلية الآداب - جامعة مصراتة ◽

10.36602/faj.2019.n14.09 ◽

2019 ◽

pp. 36-50

Author(s):

Yousef Mokhtar Elramli ◽

Tareq Bashir Maiteq

Keyword(s):

Vowel Harmony ◽

Skeletal Structure ◽

Front Vowel ◽

Person Plural ◽

The Third ◽

Third Person ◽

The Third Person ◽

The City

The aim of this paper is to study regressive vowel harmony induced by a suffixal back round vowel in the Libyan Arabic dialect spoken in the city of Misrata. The skeletal structure in the collected words is a /CVCVC-/ stem followed by the third person plural suffix /-u/. Consequently, the derived form of the examined words becomes /CVCVCV/. Following a rule of re-syllabification, the coda of the ultimate syllable in the stem becomes the onset of the newly formed syllable (ultimate in the derived form). Thus, in the presence of the suffix /-u/ in the derived form, all vowels in the word must harmonise with the [+round] feature of /-u/ unless there is a high front vowel /i/ intervening. In such cases, the high front vowel is defined as an opaque segment that is incompatible with the feature [+round]. Syllable and morpheme boundaries within words do not seem to contribute to blocking the regressive spreading of harmony. An autosegmental approach is adopted here to analyze these words. It is concluded that there are two sources in underlying representations for regressive vowel harmony in Libyan Arabic. One source is floating [+round] and another source is [+round].
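
The spreading pattern described in this abstract (right-to-left spreading of [+round] from the suffix /-u/, blocked by an opaque /i/) can be sketched as a toy procedure. This is an illustrative simplification and not the authors' autosegmental analysis: the function name, the one-character-per-segment representation, and the assumption that a rounded non-high vowel surfaces as "u" are all hypothetical, and the stems are schematic CVCVC strings rather than attested Misrata forms.

```python
OPAQUE_VOWEL = "i"  # high front vowel: blocks spreading, per the abstract

# Hypothetical surface mapping for vowels that receive [+round].
# The abstract does not give the surface forms; "a" -> "u" is an assumption.
ROUNDED_FORM = {"a": "u", "u": "u"}


def regressive_round_harmony(stem, suffix="u"):
    """Spread [+round] leftwards from the suffix through a CVCVC stem.

    Each character is treated as one segment. Spreading stops at the
    first opaque /i/ it meets; that vowel and everything to its left
    keep their original quality. Consonants are skipped.
    """
    segments = list(stem)
    for idx in range(len(segments) - 1, -1, -1):  # right to left
        segment = segments[idx]
        if segment == OPAQUE_VOWEL:
            break  # opaque segment blocks any further spreading
        if segment in ROUNDED_FORM:
            segments[idx] = ROUNDED_FORM[segment]
    return "".join(segments) + suffix


# Schematic stems (not attested data):
print(regressive_round_harmony("katab"))  # kutubu: both vowels harmonise
print(regressive_round_harmony("kitab"))  # kitubu: /i/ blocks, only "a" rounds
print(regressive_round_harmony("katib"))  # katibu: /i/ blocks immediately
```

The loop runs from the stem edge adjacent to the suffix towards the beginning of the word, which is what makes the harmony regressive; syllable and morpheme boundaries play no role in the sketch, in line with the abstract's remark that they do not block spreading.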

Community-curated and standardised metadata of published ancient metagenomic samples with AncientMetagenomeDir

10.1101/2020.09.02.279570 ◽

2020 ◽

Author(s):

James A. Fellows Yates ◽

Aida Andrades Valtueña ◽

Ashild J. Vågene ◽

Becky Cribdon ◽

Irina M. Velsko ◽

...

Keyword(s):

Large Scale ◽

Genetic Data ◽

Data Retrieval ◽

Data Sources ◽

Valuable Data ◽

Dna And Rna ◽

Wide Range ◽

Evolutionary Studies ◽

Meta Analyses ◽

Microbial Samples

Abstract: Ancient DNA and RNA are valuable data sources for a wide range of disciplines. Within the field of ancient metagenomics, the number of published genetic datasets has risen dramatically in recent years, and tracking this data for reuse is particularly important for large-scale ecological and evolutionary studies of individual microbial taxa, microbial communities, and metagenomic assemblages. AncientMetagenomeDir (archived at https://doi.org/10.5281/zenodo.3980833) is a collection of indices of published genetic data deriving from ancient microbial samples that provides basic, standardised metadata and accession numbers to allow rapid data retrieval from online repositories. These collections are community-curated and span multiple sub-disciplines in order to ensure adequate breadth and consensus in metadata definitions, as well as longevity of the database. Internal guidelines and automated checks to facilitate compatibility with established sequence-read archives and term-ontologies ensure consistency and interoperability for future meta-analyses. This collection will also assist in standardising metadata reporting for future ancient metagenomic studies.

Dimensionality Reduction With Multi-Fold Deep Denoising Autoencoder

Advances in Systems Analysis, Software Engineering, and High Performance Computing - Deep Learning Techniques and Optimization Strategies in Big Data Analytics ◽

10.4018/978-1-7998-1192-3.ch010 ◽

2020 ◽

pp. 154-165

Author(s):

Pattabiraman V. ◽

Parvathi R.

Keyword(s):

Large Scale ◽

Curse Of Dimensionality ◽

Data Sources ◽

Sensor Data ◽

Future Research ◽

Computational Power ◽

Nonlinear Methods ◽

Research Areas ◽

Learning Techniques ◽

Natural Data

Natural data arising directly from various data sources, such as text, image, video, audio, and sensor data, has the inherent property of a very large number of dimensions, or features. While these features add richness and perspective to the data, the sparsity associated with them adds to the computational complexity of learning and makes the data hard to visualize and interpret, so that large-scale computational power is required to draw insights from it. This is famously called the “curse of dimensionality.” This chapter discusses how the curse of dimensionality is addressed using conventional methods and analyzes their performance on given complex datasets. It also discusses the advantages of nonlinear methods over linear methods, and of neural networks, which could be a better approach than other nonlinear methods. Finally, it discusses future research areas such as the application of deep learning techniques as a cure for this curse.
