Effects of subtitle speed on proportional reading time

Translation Cognition & Behavior ◽

10.1075/tcb.00057.sza ◽

2021 ◽

Author(s):

Agnieszka Szarkowska ◽

Breno Silva ◽

David Orrego-Carmona

Keyword(s):

Eye Tracking ◽

Strong Evidence ◽

Mixed Models ◽

Reading Time ◽

English Proficiency ◽

Linear Mixed Models ◽

Statistical Techniques

How much time do viewers spend reading subtitles, and does it depend on the subtitle speed? To answer these questions, we re-analyse previous data while promoting two methodological advancements in eye-tracking audiovisual research: (1) the use of proportional reading time (PRT) as a metric of time spent on subtitle reading and (2) the analysis of data via linear mixed models (LMMs). We tested 19 Polish L1 viewers with advanced English proficiency who watched two clips with an English soundtrack and Polish subtitles. First, we compared PRT at two different subtitle speeds: 12 characters per second (cps) and 20 cps. Then, we used actual subtitle speed rates to better understand the speed-PRT relationship. The results showed a significantly higher PRT for 20 cps compared to 12 cps, with the models predicting a PRT of 45.24% at 20 cps. We also found strong evidence of the advantage of LMMs over more commonly used statistical techniques.
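The two quantities named in this abstract, subtitle speed in cps and PRT, can be illustrated with a minimal Python sketch. It assumes that PRT is the gaze time on a subtitle divided by the time the subtitle is displayed, and that cps counts every character including spaces; the abstract does not spell out either definition, so both are assumptions. The function names and the sample values are hypothetical, not data from the study.

```python
def characters_per_second(text: str, duration_s: float) -> float:
    """Subtitle speed in cps; assumes spaces and punctuation are counted."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return len(text) / duration_s


def proportional_reading_time(dwell_ms: float, display_ms: float) -> float:
    """Assumed PRT definition: percentage of display time spent on the subtitle."""
    if display_ms <= 0:
        raise ValueError("display_ms must be positive")
    return 100.0 * dwell_ms / display_ms


# Illustrative values only, not taken from the paper.
subtitle = "Nie wiem, co powiedziec."  # 24 characters
print(characters_per_second(subtitle, 2.0))      # 12.0
print(proportional_reading_time(900.0, 2000.0))  # 45.0
```

Normalising by display time is what makes the metric comparable across subtitles of different durations, which is presumably why a proportion rather than a raw dwell time is used.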

The Opposition of Surprisal and Semantic Similarity in the Prediction of Language Processing: Evidence from Eye-tracking Data

10.31234/osf.io/zypk9 ◽

2020 ◽

Author(s):

Kun Sun

Keyword(s):

Eye Tracking ◽

Semantic Similarity ◽

Cognitive Processing ◽

Language Processing ◽

Language Comprehension ◽

Word Processing ◽

Reading Time ◽

Computational Models ◽

Tracking Data ◽

Dynamic Approach

Expectations or predictions about upcoming content play an important role during language comprehension and processing. One important aspect of recent studies of language comprehension and processing concerns the estimation of the upcoming words in a sentence or discourse. Many studies have used eye-tracking data to explore computational and cognitive models for contextual word predictions and word processing, and to investigate the factors that influence word prediction. However, these studies are problematic on several levels, including the stimuli, corpora, and statistical tools they applied. Although various computational models have been proposed for simulating contextual word predictions, past studies have usually relied on a single computational model, which often cannot give an adequate account of cognitive processing in language comprehension. To avoid these problems, this study draws upon a massive natural and coherent discourse as stimuli in collecting reading-time data. It trains two state-of-the-art computational models (surprisal, and semantic (dis)similarity from word vectors obtained by linear discriminative learning (LDL)), which measure knowledge of both the syntagmatic and paradigmatic structure of language. We develop a 'dynamic approach' to compute semantic (dis)similarity. It is the first time that these two computational models have been merged. Models are evaluated using advanced statistical methods. To test the efficiency of our approach, we also compare it with a recently developed cosine method of computing semantic (dis)similarity from word vectors. The two computational and fixed-effect statistical models can be used to cross-verify the findings, thus ensuring that the result is reliable. All results support that surprisal and semantic similarity are opposed in the prediction of the reading time of words, although both can make good predictions. Additionally, our 'dynamic' approach performs better than the popular cosine method. The findings of this study are therefore of significance for acquiring a better understanding of how humans process words in a real-world context and how they make predictions in language cognition and processing.
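For orientation, the two kinds of predictor contrasted in this abstract can be sketched in Python using their standard textbook definitions: surprisal as the negative log probability of a word given its context, and cosine similarity between two word vectors. This does not reproduce the paper's LDL vectors or its 'dynamic' approach; the function names and the sample values are hypothetical.

```python
import math
from typing import Sequence


def surprisal(prob: float) -> float:
    """Surprisal in bits: -log2 P(word | context)."""
    if not 0.0 < prob <= 1.0:
        raise ValueError("prob must be in (0, 1]")
    return -math.log2(prob)


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length word vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("zero vector has no direction")
    return dot / (norm_u * norm_v)


# Illustrative values only: a word with conditional probability 0.25
# carries 2 bits of surprisal; orthogonal vectors have similarity 0.
print(surprisal(0.25))                             # 2.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # 0.0
```

A less predictable word has higher surprisal, whereas a word closer in meaning to its context has higher similarity, which gives one intuition for why the two measures could pull reading-time predictions in opposite directions.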

Heat, Hills and the High Season: A Model-Based Comparative Analysis of Spatio-Temporal Factors Affecting Shared Bicycle Use in Three Southern European Islands

Sustainability ◽

10.3390/su13063274 ◽

2021 ◽

Vol 13 (6) ◽

pp. 3274

Author(s):

Suzanne Maas ◽

Paraskevas Nikolaou ◽

Maria Attard ◽

Loukas Dimitriou

Keyword(s):

Correlation Analysis ◽

Mixed Models ◽

Linear Mixed Models ◽

Positive Association ◽

Sustainable Mobility ◽

Weather Factors ◽

Gran Canaria ◽

Bivariate Correlation ◽

Temporal Factors

Bicycle sharing systems (BSSs) have been implemented in cities worldwide in an attempt to promote cycling. Despite exhibiting characteristics considered to be barriers to cycling, such as hot summers, hilliness and car-oriented infrastructure, Southern European island cities and tourist destinations Limassol (Cyprus), Las Palmas de Gran Canaria (Canary Islands, Spain) and the Valletta conurbation (Malta) are all experiencing the implementation of BSSs and policies to promote cycling. In this study, a year of trip data and secondary datasets are used to analyze dock-based BSS usage in the three case-study cities. How land use, socio-economic, network and temporal factors influence BSS use at station locations, both as an origin and as a destination, was examined using bivariate correlation analysis and through the development of linear mixed models for each case study. Bivariate correlations showed significant positive associations with the number of cafes and restaurants, vicinity to the beach or promenade and the percentage of foreign population at the BSS station locations in all cities. A positive relation with cycling infrastructure was evident in Limassol and Las Palmas de Gran Canaria, but not in Malta, as no cycling infrastructure is present in the island’s conurbation, where the BSS is primarily operational. Elevation had a negative association with BSS use in all three cities. In Limassol and Malta, where seasonality in weather patterns is strongest, a negative effect of rainfall and a positive effect of higher temperature were observed. Although there was a positive association between BSS use and the number of visiting tourists in Limassol and Malta, this is predominantly explained through the multi-collinearity with weather factors rather than by intensive use of the BSS by tourists. 
The linear mixed models showed more fine-grained results and explained differences in BSS use at stations, including differences for station use as an origin and as a destination. The insights from the correlation analysis and linear mixed models can be used to inform policies promoting cycling and BSS use and support sustainable mobility policies in the case-study cities and cities with similar characteristics.

Confidence, prediction, and tolerance in linear mixed models

Statistics in Medicine ◽

10.1002/sim.8386 ◽

2019 ◽

Vol 38 (30) ◽

pp. 5603-5622 ◽

Cited By ~ 3

Author(s):

Bernard G. Francq ◽

Dan Lin ◽

Walter Hoyer

Keyword(s):

Mixed Models ◽

Linear Mixed Models

Power for balanced linear mixed models with complex missing data processes

Communications in Statistics - Theory and Methods ◽

10.1080/03610926.2021.1909732 ◽

2021 ◽

pp. 1-19

Author(s):

Kevin P. Josey ◽

Brandy M. Ringham ◽

Anna E. Barón ◽

Margaret Schenkman ◽

Katherine A. Sauder ◽

...

Keyword(s):

Missing Data ◽

Mixed Models ◽

Linear Mixed Models

l2-Penalized temporal logit-mixed models for the estimation of regional obesity prevalence over time

Statistical Methods in Medical Research ◽

10.1177/09622802211017583 ◽

2021 ◽

pp. 096228022110175

Author(s):

Jan P Burgard ◽

Joscha Krause ◽

Ralf Münnich ◽

Domingo Morales

Keyword(s):

Parameter Estimation ◽

Medical Treatment ◽

Mixed Models ◽

Generalized Linear Mixed Models ◽

Linear Mixed Models ◽

Obesity Prevalence ◽

Model Parameter ◽

Model Parameter Estimation ◽

Public Health Reporting ◽

Over Time

Obesity is considered to be one of the primary health risks in modern industrialized societies. Estimating the evolution of its prevalence over time is an essential element of public health reporting. This requires the application of suitable statistical methods on epidemiologic data with substantial local detail. Generalized linear-mixed models with medical treatment records as covariates mark a powerful combination for this purpose. However, the task is methodologically challenging. Disease frequencies are subject to both regional and temporal heterogeneity. Medical treatment records often show strong internal correlation due to diagnosis-related grouping. This frequently causes excessive variance in model parameter estimation due to rank-deficiency problems. Further, generalized linear-mixed models are often estimated via approximate inference methods as their likelihood functions do not have closed forms. These problems combined lead to unacceptable uncertainty in prevalence estimates over time. We propose an l2-penalized temporal logit-mixed model to solve these issues. We derive empirical best predictors and present a parametric bootstrap to estimate their mean-squared errors. A novel penalized maximum approximate likelihood algorithm for model parameter estimation is stated. With this new methodology, the regional obesity prevalence in Germany from 2009 to 2012 is estimated. We find that the national prevalence ranges between 15 and 16%, with significant regional clustering in eastern Germany.
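The role of the l2 penalty can be illustrated with a minimal Python sketch of a ridge-penalized logistic objective. This is a generic fixed-effects simplification and not the authors' model: the random effects, the temporal structure and the approximate likelihood algorithm described in the abstract are all omitted, and the function name, penalty weight `lam` and sample data are hypothetical.

```python
import math
from typing import Sequence


def penalized_neg_log_likelihood(
    beta: Sequence[float],
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float,
) -> float:
    """Bernoulli negative log-likelihood with logit link plus lam * ||beta||^2.

    Simplification: fixed effects only, no random effects.
    """
    nll = 0.0
    for row, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, row))  # linear predictor
        # log(1 + exp(eta)), computed in a numerically stable way
        log1pexp = max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))
        nll += log1pexp - yi * eta
    return nll + lam * sum(b * b for b in beta)


# Illustrative data only: two strongly correlated covariates.
X = [[1.0, 1.0], [2.0, 2.1], [3.0, 2.9], [4.0, 4.2]]
y = [0, 0, 1, 1]
print(penalized_neg_log_likelihood([0.0, 0.0], X, y, lam=1.0))
print(penalized_neg_log_likelihood([5.0, -5.0], X, y, lam=1.0))
```

With strongly correlated covariates, large coefficients of opposite sign can fit the data almost as well as small ones; the penalty term makes such solutions costly, which is the usual way a ridge penalty reduces the variance of the parameter estimates.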
