Are corpus-based predictions mirrored in the preferential choices and ratings of native speakers? Predicting the alternation between the Estonian adessive case and the adposition peal ‘on’

Author(s):  
Jane Klavan ◽  
Ann Veismann

Recent work in usage-based linguistics stresses the importance of combining corpus-based analyses with experimental studies. A number of studies have compared the performance of a corpus-based statistical model against the behaviour of native speakers in a linguistic experiment. The present paper takes this line of analysis further by combining corpus-based work with two sources of experimental data. A mixed-effects logistic regression model is fitted to corpus data on the alternation between the Estonian adessive case and the adposition peal ‘on’ in present-day written Estonian. To evaluate the goodness of the corpus-based model, its performance is compared to the behaviour of native speakers in a forced choice task and a rating task.

Summary (translated from Estonian). Jane Klavan and Ann Veismann: Do language users' choices and ratings reflect corpus-based probability estimates? The use of the Estonian adessive and the adposition peal in present-day written language. Contemporary usage-based linguistics stresses the need to combine corpus-based analysis with experimental studies. Several studies have compared the goodness of a corpus-based statistical model with the behaviour of native speakers in linguistic experiments. The present article continues this line of research by comparing two linguistic experiments with corpus data. It evaluates the goodness of a corpus-based mixed model by comparing it with the behaviour of native speakers of Estonian in a forced choice task and a rating task. The phenomenon under study is the parallel use of the Estonian adessive and the adposition peal to express spatial relations in present-day written language.

Keywords: constructional alternations; corpus linguistics; forced choice task; rating task; statistical modelling; Estonian

2021 ◽  
Author(s):  
Katrin Nissen ◽  
Stefan Rupp ◽  
Björn Guse ◽  
Uwe Ulbrich ◽  
Sergiy Vorogushyn ◽  
...  

We present a logistic regression model that describes how the probability of rockfall events in Germany changes in response to meteorological and hydrological conditions.

The rockfall events are taken from the landslide database for Germany (Damm and Klose, 2015). The meteorological variables tested as predictors are daily precipitation from the REGNIE data set (Rauthe et al., 2013), hourly precipitation from the RADKLIM radar climatology (Winterrath et al., 2018) and temperature from the E-OBS data set (Cornes et al., 2018). As there is no observational soil moisture data set covering the entire country, we used soil moisture modelled with the hydrological model mHM (Samaniego et al., 2010), which was calibrated using gauge measurements.

To select the best statistical model, we tested a large number of physically plausible combinations of meteorological and hydrological predictors. Each model was checked using cross-validation. The decision on the final model was based on the value of the logarithmic skill score and on expert judgement.

The final statistical model includes the local percentile of daily precipitation, total relative soil moisture and freeze-thaw cycles in the previous weeks as predictors. Daily precipitation was found to be the most important parameter in the model: an increase of daily precipitation from its median to its 80th percentile approximately doubles the probability of a rockfall event. Higher soil moisture and the occurrence of freeze-thaw cycles also increase the probability of rockfall events.

Cornes, R. C. et al., 2018: An ensemble version of the E-OBS temperature and precipitation data sets. Journal of Geophysical Research: Atmospheres, 123, 9391–9409.
Damm, B., Klose, M., 2015: The landslide database for Germany: Closing the gap at national level. Geomorphology, 249, 82–93.
Rauthe, M. et al., 2013: A Central European precipitation climatology – Part I: Generation and validation of a high-resolution gridded daily data set (HYRAS), Vol. 22(3), pp. 235–256.
Samaniego, L. et al., 2010: Multiscale parameter regionalization of a grid-based hydrologic model at the mesoscale. Water Resour. Res., 46, W05523.
Winterrath, T. et al., 2018: RADKLIM Version 2017.002: Reprocessed gauge-adjusted radar data, one-hour precipitation sums (RW), DOI: 10.5676/DWD/RADKLIM_RW_V2017.002.
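
The rockfall abstract names a logarithmic skill score as its selection criterion but does not define it. A minimal sketch, assuming the common definition (one minus the ratio of the model's mean negative log-likelihood to that of a constant base-rate forecast); the function names are illustrative and this is not necessarily the authors' exact formulation:

```python
import math


def log_score(probs, outcomes, eps=1e-12):
    """Mean negative log-likelihood of binary outcomes (lower is better)."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        # Clip so that a forecast of exactly 0 or 1 does not produce log(0).
        p = min(max(p, eps), 1.0 - eps)
        total -= math.log(p) if y == 1 else math.log(1.0 - p)
    return total / len(outcomes)


def log_skill_score(probs, outcomes):
    """Skill relative to a constant base-rate forecast.

    0 means no better than the base rate, values near 1 mean near-perfect
    forecasts. Assumes both outcome classes occur; otherwise the reference
    score is (almost) zero and the ratio is not meaningful.
    """
    base_rate = sum(outcomes) / len(outcomes)
    reference = log_score([base_rate] * len(outcomes), outcomes)
    return 1.0 - log_score(probs, outcomes) / reference


# Hypothetical cross-validated forecasts for four days (1 = rockfall observed).
observed = [1, 0, 0, 1]
print(log_skill_score([0.9, 0.1, 0.1, 0.9], observed))
```

In a cross-validation setting, `probs` would be the out-of-fold predicted probabilities, so the score rewards candidate predictor sets that generalise rather than those that merely fit the training events.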


2014 ◽  
Vol 6 (2) ◽  
pp. 271-299 ◽  
Author(s):  
STEFANIE WULFF ◽  
NICHOLAS LESTER ◽  
MARIA T. MARTINEZ-GARCIA

In certain English finite complement clauses, inclusion of the complementizer that is optional. Previous research has identified various factors that influence when native speakers tend to produce or omit the complementizer, including syntactic weight, clause juncture constraints, and predicate frequency. The present study addresses the question of to what extent German and Spanish learners of English as a second language (L2) produce and omit the complementizer under similar conditions. 3,622 instances of English adjectival, object, and subject complement constructions were retrieved from the International Corpus of English and the German and Spanish components of the International Corpus of Learner English. A logistic regression model suggests that L2 learners’ and natives’ production is largely governed by the same factors. However, in comparison with native speakers, L2 learners display a lower rate of complementizer omission. They are more impacted by processing-related factors such as complexity and clause juncture, and less sensitive to verb-construction cue validity.


2013 ◽  
Vol 18 (3) ◽  
pp. 327-356 ◽  
Author(s):  
Stefan Th. Gries ◽  
Stefanie Wulff

This paper exemplifies an approach to learner corpus data that adopts a multifactorial definition of ‘context’. We apply a logistic regression to 2,986 attestations of genitive alternation (the squirrel’s nest vs. the nest of the squirrel) from the Chinese and German sub-sections of the International Corpus of Learner English and the British component of the International Corpus of English that were coded for 12 factors. Importantly, the speakers’ L1 was included as a predictor to be able to compare properly the native speakers with the learners as well as the two learner groups with each other. The final regression model predicts all speakers’ genitive choices very accurately (> 93%) and suggests that (i) the learners rely heavily on processing-related factors, which can be overridden by semantic constraints, and (ii) learners’ choices are differentially modulated by their L1. We close with a discussion of how this context-based, multifactorial approach goes beyond traditional learner corpus research.
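
A classification accuracy such as the one reported for the genitive model is easier to interpret next to a majority-class baseline. A minimal sketch, assuming predictions are made by thresholding fitted probabilities at 0.5; the data and function names are hypothetical and the abstract does not describe how its accuracy was computed:

```python
def classification_accuracy(probs, outcomes, threshold=0.5):
    """Share of cases where the thresholded probability matches the outcome."""
    predicted = [1 if p >= threshold else 0 for p in probs]
    hits = sum(1 for y_hat, y in zip(predicted, outcomes) if y_hat == y)
    return hits / len(outcomes)


def majority_baseline(outcomes):
    """Accuracy obtained by always predicting the more frequent variant."""
    share = sum(outcomes) / len(outcomes)
    return max(share, 1.0 - share)


# Hypothetical fitted probabilities of the s-genitive (1) vs. the of-genitive (0).
fitted = [0.9, 0.2, 0.6, 0.4]
chosen = [1, 0, 0, 0]
print(classification_accuracy(fitted, chosen), majority_baseline(chosen))
```

When one variant dominates the sample, the baseline is already high, so the gap between the two numbers says more about the model than the accuracy alone.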


2017 ◽  
Vol 6 (3) ◽  
pp. 132 ◽  
Author(s):  
Idelphonse Leandre Tawanou Gbohounme ◽  
Oscar Owino Ngesa ◽  
Jude Eggoh

The logistic regression (LR) model is the most common model for the analysis of binary data. However, atypical observations in the data can have an undue effect on the parameter estimates. Many researchers have developed robust statistical models to address this problem of outliers. Gelman (2004) proposed GRLR, a robust model that trims the probability of success in LR. The trimming value in this model is fixed, and the user is required to specify it in advance. This study develops the SsRLR model, which allows the data themselves to select the alpha value. We propose a Restricted LR model as a substitute for LR in the presence of outliers. We prove that the SsRLR model is more robust to the presence of leverage points in the data. Parameter estimation is done using a full Bayesian approach implemented in WinBUGS 14 software.
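
The abstract does not spell out how the success probability is trimmed. A minimal sketch, assuming the form alpha + (1 - 2*alpha) * inv_logit(eta), which is one common way to keep a logistic probability inside [alpha, 1 - alpha]; the function names and the fixed alpha are illustrative and are not the authors' GRLR or SsRLR specification:

```python
import math


def inv_logit(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def trimmed_prob(eta, alpha):
    """Success probability bounded to [alpha, 1 - alpha] (assumed form)."""
    return alpha + (1.0 - 2.0 * alpha) * inv_logit(eta)


def neg_log_lik(eta, y, alpha):
    """Negative log-likelihood contribution of one binary observation.

    With alpha = 0 this is ordinary logistic regression, where an extreme
    eta can push the probability to exactly 0 or 1 and break the logarithm.
    """
    p = trimmed_prob(eta, alpha)
    return -math.log(p) if y == 1 else -math.log(1.0 - p)


# An outlier: linear predictor strongly negative, yet a success is observed.
print(neg_log_lik(-10.0, 1, alpha=0.0))   # large penalty under plain LR
print(neg_log_lik(-10.0, 1, alpha=0.05))  # bounded penalty with trimming
```

Because the trimmed probability never drops below alpha, a single surprising observation contributes at most -log(alpha) to the negative log-likelihood, which limits how far it can pull the coefficient estimates.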


2020 ◽  
Vol 30 (Supplement_5) ◽  
Author(s):  
J Matos ◽  
C Matias Dias ◽  
A Félix

Background: Studies on the impact of multimorbidity on absence from work indicate that the number and type of chronic diseases may increase absenteeism and that the risk of absence from work is higher in people with two or more chronic diseases. This study analyzed the association between multimorbidity and greater frequency and duration of work absence in the Portuguese population between the ages of 25 and 65 during 2015.

Methods: This is an epidemiological, observational, cross-sectional study with an analytical component, based on data from the 1st National Health Examination Survey. Univariate, bivariate and multivariate analyses of the study variables were performed, and a multivariate logistic regression model was constructed.

Results: The prevalence of absenteeism was 55.1%. Education was associated with absence from work (p = 0.0157), as was professional activity (p = 0.0086). No association could be verified between absence from work and the presence of chronic diseases (p = 0.9358) or the presence of multimorbidity (p = 0.4309). The prevalence of multimorbidity was 31.8%. Age (p < 0.0001), education (p < 0.001) and yield (p = 0.0009) were associated with multimorbidity. The number of days absent from work did not increase with the number of chronic diseases. In the optimized logistic regression model, the only variables associated with absence from work were age (p = 0.0391) and education (p = 0.0089).

Conclusions: The scientific evidence generated will contribute to the current discussion on the need for the health and social security system to develop policies for patients with multimorbidity.

Key messages: The prevalence of absenteeism and multimorbidity in Portugal was 55.1% and 31.8%, respectively. In the optimized model, age and education were associated with absence from work.

