Topic marking in a Shanghainese corpus: from observation to prediction

2017 ◽  
Vol 13 (2) ◽  
pp. 291-319 ◽  
Author(s):  
Weifeng Han ◽  
Antti Arppe ◽  
John Newman

Abstract: Shanghainese is an extremely topic-prominent language with many topic markers in competition with one another, often without any obvious basis for selecting one topic marker over another. We explore the influence of five variables on the choice among the five most frequent topic markers in a corpus of spoken Shanghainese: topic length, syntactic category of the topic, function of the topic, comment type, and genre. We carry out a multivariate statistical analysis of the data using a polytomous logistic regression model. Our approach yields a satisfying quantification of the role of each factor in the choice of topic marker, as well as an estimate of the probabilities associated with combinations of factors. This study also serves as an introduction to the polytomous package (Arppe 2013) in the statistical software R.
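
A polytomous model assigns each outcome (here, each topic marker) a probability that depends on the predictors. The Python sketch below uses one common formulation, the multinomial (softmax) logit, in which each marker gets a linear score and the scores are normalised into probabilities. It is not the code of the polytomous package, which may estimate the model differently. The marker labels TM1 to TM3, the predictor names and the coefficients are all hypothetical, and only three markers are shown for brevity.

```python
import math

def multinomial_probs(x, intercepts, coefs):
    """P(marker k | x) under a multinomial (softmax) logit model.

    x          -- predictor name -> value (e.g. 0/1 dummy codes)
    intercepts -- marker -> intercept
    coefs      -- marker -> {predictor name -> coefficient}
    """
    scores = {
        k: intercepts[k] + sum(coefs[k].get(name, 0.0) * value for name, value in x.items())
        for k in intercepts
    }
    top = max(scores.values())  # shift all scores so exp() cannot overflow
    weights = {k: math.exp(s - top) for k, s in scores.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

# Hypothetical markers, predictors and coefficients (TM1 is the reference category).
intercepts = {"TM1": 0.0, "TM2": -0.5, "TM3": 0.2}
coefs = {
    "TM1": {},
    "TM2": {"long_topic": 1.2},
    "TM3": {"long_topic": -0.4, "dialogue_genre": 0.8},
}
probs = multinomial_probs({"long_topic": 1, "dialogue_genre": 0}, intercepts, coefs)
print(probs)  # TM2 has the highest score (0.7), so it gets the highest probability
```

Probabilities for a combination of factor values are obtained simply by changing the dummy codes passed in `x`.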

2000 ◽  
Vol 22 (2) ◽  
pp. 209-228 ◽  
Author(s):  
John C. Paolillo

Felix (1988) claimed to demonstrate that UG-based knowledge of grammaticality causes nonnative speakers (NNSs) to have more accurate grammaticality judgments on sentences that are ungrammatical according to UG than on those that are grammatical. Birdsong (1994) criticized the methodology employed, noting that it ignores “response bias” (a propensity to judge sentences as ungrammatical) as a potential explanation. Felix and Zobl (1994) dismissed this criticism as merely methodological. In this paper, Birdsong's criticism is upheld by considering a statistical model of the data. At the same time, a more complete logistic regression model allows a fuller statistical analysis, revealing tentative support for the asymmetry claim, as well as differential learning states for different constructions and a tendency toward transfer avoidance. These theoretically significant effects were unnoticed in the earlier discussion of this research. For SLA research on grammaticality judgments to proceed fruitfully, appropriate statistical models need to be considered in designing the research.
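
The response-bias argument can be made concrete with a logistic model of a single judge. In this hypothetical Python sketch (an assumed parameterisation, not Paolillo's fitted model), the log-odds of answering "ungrammatical" is a bias term plus a sensitivity term for whether the item really is ungrammatical. With zero sensitivity, that is, no grammatical knowledge at all, a positive bias alone already makes accuracy on ungrammatical items higher than on grammatical ones.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def judgment_accuracy(bias, sensitivity):
    """Return (accuracy on ungrammatical items, accuracy on grammatical items).

    Assumed model: logit P(judge 'ungrammatical') = bias + sensitivity * s,
    with s = +0.5 for ungrammatical and -0.5 for grammatical items.
    bias > 0 is a general propensity to reject; sensitivity is real discrimination.
    """
    p_reject_ungrammatical = logistic(bias + 0.5 * sensitivity)
    p_reject_grammatical = logistic(bias - 0.5 * sensitivity)
    return p_reject_ungrammatical, 1.0 - p_reject_grammatical

# Bias alone, zero knowledge: ungrammatical items look "easier" (about 0.73 vs 0.27).
print(judgment_accuracy(bias=1.0, sensitivity=0.0))
# Knowledge alone, no bias: accuracy is the same for both item types.
print(judgment_accuracy(bias=0.0, sensitivity=2.0))
```

Estimating bias and sensitivity as separate terms is what allows a model to ask whether any asymmetry remains once response bias has been accounted for.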


2016 ◽  
Vol 27 (1) ◽  
pp. 1-33 ◽  
Author(s):  
Dagmar Divjak ◽  
Ewa Dąbrowska ◽  
Antti Arppe

Abstract: Linguistic convention typically allows speakers several options. Evidence is accumulating that the various options are preferred in different contexts, yet the criteria governing the selection of the appropriate form are often far from obvious. Most researchers who attempt to discover the factors determining a preference rely on the linguistic analysis and statistical modeling of data extracted from large corpora. In this paper, we address the question of how to evaluate such models and explicitly compare the performance of a statistical model derived from a corpus with that of native speakers in selecting one of six Russian TRY verbs. Building on earlier work, we trained a polytomous logistic regression model to predict verb choice given the sentential context. We compare the predictions the model makes for 60 unseen sentences to the choices adult native speakers make in those same sentences. We then look in more detail at the interplay of the contextual properties and model computationally how individual differences in assessing the importance of contextual properties may affect the linguistic knowledge of native speakers. Finally, we compare the probability the model assigns to encountering each of the six verbs in the 60 test sentences to the acceptability ratings the adult native speakers give to those sentences. We discuss the implications of our findings for both usage-based theory and empirical linguistic methodology.
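
Comparing model predictions with speaker choices on held-out sentences can be scored in several ways. This Python sketch assumes two simple measures chosen for illustration, not necessarily those used in the paper: how often the model's most probable verb matches the speakers' most frequent choice, and the average probability the model gave to the verbs speakers actually chose. The verb labels and numbers are hypothetical, and only three verbs are used instead of six.

```python
from collections import Counter

def compare_with_speakers(model_probs, speaker_choices):
    """Compare model predictions with native-speaker choices.

    model_probs     -- one dict per test sentence: verb -> model probability
    speaker_choices -- one list per test sentence: the verb each speaker chose
    Returns (share of sentences where the model's top verb is the speakers' most
    frequent choice, mean probability the model gave to the verbs speakers chose).
    """
    matches = 0
    chosen_probs = []
    for probs, choices in zip(model_probs, speaker_choices):
        model_top = max(probs, key=probs.get)
        speaker_top = Counter(choices).most_common(1)[0][0]  # ties: first counted wins
        if model_top == speaker_top:
            matches += 1
        chosen_probs.extend(probs[verb] for verb in choices)
    return matches / len(model_probs), sum(chosen_probs) / len(chosen_probs)

model_probs = [
    {"verb_A": 0.6, "verb_B": 0.3, "verb_C": 0.1},
    {"verb_A": 0.2, "verb_B": 0.2, "verb_C": 0.6},
]
speaker_choices = [["verb_A", "verb_A", "verb_B"], ["verb_B", "verb_B", "verb_C"]]
agreement, mean_prob = compare_with_speakers(model_probs, speaker_choices)
print(agreement, mean_prob)  # 0.5 and 2.5 / 6
```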


2010 ◽  
pp. 144-170
Author(s):  
Sean Eom

The previous two chapters examined two alternative approaches to retrieving cocitation counts: custom databases and cocitation frequency count extraction systems. The cocitation frequency counts serve as input to SAS or SPSS for multivariate statistical analysis. The primary purpose of this chapter is to give an overview of several important steps in author cocitation analysis (ACA). ACA consists of six major steps, which include the selection of the author set for analysis, the collection of cocitation frequency counts, the statistical analysis of those counts, and the validation and interpretation of the statistical output.
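
The cocitation frequency count for two authors is the number of citing documents that cite both of them. A minimal Python sketch, assuming each citing document has already been reduced to a list of cited author names (the names below are hypothetical), builds these counts and the symmetric author-by-author matrix that would be passed on for multivariate analysis. How the diagonal is filled varies between ACA studies; leaving it at zero here is an assumption.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(reference_lists):
    """Count how often each pair of authors is cited together in one citing document.

    reference_lists -- one list of cited author names per citing document
    """
    counts = Counter()
    for refs in reference_lists:
        # An author cited several times in one document still counts once for that document.
        for pair in combinations(sorted(set(refs)), 2):
            counts[pair] += 1
    return counts

def cocitation_matrix(authors, counts):
    """Symmetric author-by-author matrix; the diagonal is left at 0 here."""
    index = {name: i for i, name in enumerate(authors)}
    matrix = [[0] * len(authors) for _ in authors]
    for (a, b), n in counts.items():
        if a in index and b in index:
            matrix[index[a]][index[b]] = n
            matrix[index[b]][index[a]] = n
    return matrix

docs = [["Smith", "Jones", "Lee"], ["Smith", "Jones"], ["Lee", "Smith", "Smith"]]
counts = cocitation_counts(docs)
authors = ["Jones", "Lee", "Smith"]
matrix = cocitation_matrix(authors, counts)
print(matrix)  # [[0, 1, 2], [1, 0, 2], [2, 2, 0]]
```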


Author(s):  
Hilary I Okagbue ◽  
Sheila A Bishop ◽  
Anjoreoluwa E Boluwajoko ◽  
Adaeze M Ezenkwe ◽  
Glory N Anene ◽  
...  

An effective study plan is a predictor of good academic performance. However, little evidence is available on the role of gender and age in students' study plans. This paper investigated the role of gender and age in the adoption of a study plan that can guarantee success. A questionnaire was designed and administered to undergraduate students of a world-class, privately funded university located in Ogun State, Nigeria. Simple random sampling was used, and 294 students responded. A chi-square test of independence revealed that gender and age are not associated with frequency of study, study environment, study content preferences, or study motivation. There is no gender difference in the preferred type of study, the factors driving study, the motivation for study, or satisfaction with the study plan, whereas age is significantly associated. The logistic regression model was significant and correctly classified 66.3% of cases of satisfaction with the study plan. Gender was not significant, whereas students' age predicted their satisfaction with their study plan: older students have higher odds of being satisfied. As students progress from the first year to the final year, they tend to adopt a study plan that can help them obtain high grades and graduate with good results. An artificial neural network correctly classified 71.4% of cases using age as the only input, because only age contributed significantly to the logistic regression model. Timely academic advising or mentorship is advocated, especially for first-year students.
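
Two logistic-regression summaries of the kind reported above, a percentage of cases correctly classified and a change in odds with age, can be computed as follows. This Python sketch assumes the conventional 0.5 probability cutoff for classification; the outcomes, predicted probabilities, and the age coefficient of 0.2 are hypothetical.

```python
import math

def percent_correctly_classified(y_true, p_pred, cutoff=0.5):
    """Share (in %) of cases whose predicted class (p >= cutoff) matches the observed 0/1 outcome."""
    hits = sum(1 for y, p in zip(y_true, p_pred) if (p >= cutoff) == (y == 1))
    return 100.0 * hits / len(y_true)

def odds_ratio(coef, delta=1.0):
    """Multiplicative change in the odds for a `delta`-unit increase in the predictor."""
    return math.exp(coef * delta)

satisfied = [1, 0, 1, 1, 0]
predicted = [0.8, 0.4, 0.3, 0.9, 0.6]
print(percent_correctly_classified(satisfied, predicted))  # 3 of 5 correct -> 60.0
print(odds_ratio(0.2))  # hypothetical age coefficient per year: odds multiplied by ~1.22
```

A positive age coefficient gives an odds ratio above 1, which is what "older students have higher odds of being satisfied" expresses.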


2021 ◽  
Vol 68 (3) ◽  
pp. 773-788
Author(s):  
Isidora Ljumović ◽  
Aida Hanić ◽  
Vlado Kovačević

The purpose of this paper is to provide insight into the role of reward-based crowdfunding in farm financing, with a focus on its likelihood of success. The study uses a sample of 1,566 projects from the Kickstarter platform between 2014 and 2020. In addition to the basic project elements, we included the level of urbanisation and the relative importance of agriculture in the country's economy to assess the importance of crowdfunding. We ran a logistic regression model to investigate the factors that motivate investment decisions. We found a statistically significant negative correlation between the self-set campaign goal and project success, a small positive effect of the number of backers, and a positive effect of the importance of agriculture in the country's economy on crowdfunding success. In an era of rapid innovation and the rise of social networks, this paper contributes to the current literature on reward-based crowdfunding in the agri-food industry.
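
A binary logistic regression of this kind relates the log-odds of campaign success to predictors such as the campaign goal. The Python sketch below fits a one-predictor model by Newton-Raphson on hypothetical data; it is not the authors' specification, which also includes the number of backers and the country-level variables. A negative fitted slope corresponds to a negative association between goal and success.

```python
import math

def fit_logistic(x, y, iterations=25):
    """Fit P(y=1) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson (equivalently IRLS)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # score (gradient of the log-likelihood)
            g1 += (yi - p) * xi
            h00 += w                # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: (b0, b1) += inverse(information) @ score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

goal = [1, 2, 3, 4, 5, 6, 7, 8]       # hypothetical goal size (e.g. binned)
success = [1, 1, 0, 1, 0, 1, 0, 0]    # hypothetical campaign outcomes
b0, b1 = fit_logistic(goal, success)
print(b0, b1)  # b1 < 0: larger goals, lower odds of success
```

The data deliberately overlap between outcomes; with perfectly separated data the maximum-likelihood estimates would not exist and the iterations would diverge.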


1997 ◽  
Vol 77 (4) ◽  
pp. 601-608 ◽  
Author(s):  
J. W. Dürr ◽  
H. G. Monardes ◽  
R. I. Cue ◽  
J. C. Philpot

A total of 1 558 080 lactation records from PATLQ Holstein cows were used to describe the annual trends in reasons for disposal in Quebec dairy herds from 1981 to 1994. Differences in culling trends between official and owner sampler herds, between parities, and between Quebec agricultural regions were compared. Statistical analysis was carried out by means of a logistic regression model, and the significance of trends was tested by linear contrasts. Involuntary culling had a clearly ascending trend during the period of study (from 23% in 1981 to 32% in 1994), as opposed to culling for low production (voluntary), which had a descending trend (from 16% in 1981 to 4.5% in 1994). This increase in involuntary culling was mainly due to increasing trends in culling for reproductive problems, mastitis, and feet and leg problems. Official herds had a greater proportion of cows with sale codes and fewer cows culled for mastitis than owner sampler herds, and the trend for sale codes was ascending for official herds and stable for owner sampler herds. Culling for low production was more intensive in first parity, but all parities showed a descending trend over time. The proportion of cows with sale codes decreased with parity number. For all involuntary reasons, the proportion of cows culled increased with parity number. Key words: Reasons for disposal, Holstein, Quebec, culling
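
Testing a trend by a linear contrast means weighting the yearly estimates with weights that sum to zero and increase linearly with time. This Python sketch applies centred year weights to the yearly log-odds of culling and returns a slope and a Wald-type z statistic. It assumes independent years and the large-sample variance 1/(n p (1 - p)) for each yearly logit, which is simpler than contrasts computed from a fitted model; the yearly counts are hypothetical. The last line converts the reported endpoints for involuntary culling (23% and 32%) into a change in log-odds.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def linear_trend_test(years, events, totals):
    """Linear contrast on yearly log-odds: returns (slope per year, z statistic).

    Simplifying assumptions: years are independent and Var(logit p_t) is
    approximated by 1 / (n_t * p_t * (1 - p_t)).
    """
    mean_year = sum(years) / len(years)
    weights = [t - mean_year for t in years]  # centred weights sum to 0
    est, var = 0.0, 0.0
    for c, k, n in zip(weights, events, totals):
        p = k / n
        est += c * logit(p)
        var += c * c / (n * p * (1.0 - p))
    ss = sum(c * c for c in weights)
    return est / ss, est / math.sqrt(var)

years = list(range(1981, 1995))
totals = [1000] * len(years)                        # hypothetical cows per year
events = [230 + 7 * i for i in range(len(years))]   # hypothetical involuntary culls
slope, z = linear_trend_test(years, events, totals)
print(slope, z)
print(logit(0.32) - logit(0.23))  # reported 1981 -> 1994 change, about 0.45 on the log-odds scale
```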


2011 ◽  
Vol 139 (12) ◽  
pp. 1919-1927 ◽  
Author(s):  
S. E. Virtanen ◽
L. K. Salonen ◽
R. Laukkanen ◽
M. Hakkinen ◽
H. Korkeala

Summary: A survey of 788 pigs from 120 farms was conducted to determine the within-farm prevalence of pathogenic Yersinia enterocolitica, and a questionnaire on management conditions was subsequently mailed to the farms. A univariate statistical analysis, with carriage and shedding as outcomes, was conducted using random-effects logistic regression with farm as a clustering factor. Variables with a P value < 0.15 were included in the respective multivariate random-effects logistic regression model. The use of municipal water was found to be a protective factor against carriage and faecal shedding of the pathogen. Organic production and buying feed from a certain feed manufacturer were also protective against total carriage. Tonsillar carriage, a different feed manufacturer, fasting pigs before transport to the slaughterhouse, higher-level farm health classification, and snout contacts between pigs were risk factors for faecal shedding. We concluded that differences in management can explain different prevalences of Y. enterocolitica between farms.
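
The two-stage variable selection described above (univariate screening at P < 0.15, then a multivariate model) can be sketched as follows. This Python example assumes two-sided Wald p-values computed from each univariate coefficient and its standard error; the source does not say which test was used, the random-effects fits themselves are not shown, and the predictor names and estimates are hypothetical.

```python
import math

def wald_p_value(coef, se):
    """Two-sided Wald p-value: 2 * (1 - Phi(|coef / se|)) = erfc(|z| / sqrt(2))."""
    return math.erfc(abs(coef / se) / math.sqrt(2.0))

def screen_variables(univariate_fits, threshold=0.15):
    """Keep predictors whose univariate p-value is below the threshold."""
    return [name for name, (coef, se) in univariate_fits.items()
            if wald_p_value(coef, se) < threshold]

# Hypothetical univariate estimates (log-odds scale, standard error).
univariate_fits = {
    "predictor_a": (-1.1, 0.45),   # z ~ -2.44, p ~ 0.015 -> kept
    "predictor_b": (-0.8, 0.70),   # z ~ -1.14, p ~ 0.25  -> dropped
    "predictor_c": (0.9, 0.50),    # z =  1.80, p ~ 0.072 -> kept
}
selected = screen_variables(univariate_fits)
print(selected)  # ['predictor_a', 'predictor_c']
```

The lenient 0.15 threshold is a screening device: it lets through variables that may only become clearly significant once other factors are adjusted for in the multivariate model.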

