EPJ Data Science | ScienceGate

A large scale study of reader interactions with images on Wikipedia

EPJ Data Science, 2022, Vol 11 (1). DOI: 10.1140/epjds/s13688-021-00312-8

Author(s): Daniele Rama, Tiziano Piccardi, Miriam Redi, Rossano Schifanella

Keyword(s): Visual Arts, Large Scale, Scale Analysis, Information Need, Visual Content, Large Scale Analysis, User Communities, Order Of Magnitude, The Web

Abstract: Wikipedia is the largest source of free encyclopedic knowledge and one of the most visited sites on the Web. To increase readers’ understanding of an article, Wikipedia editors add images within the text of the article’s body. However, despite the widespread use of images on web platforms and the huge volume of visual content on Wikipedia, little is known about the importance of images in the context of free knowledge environments. To bridge this gap, we collect data about English Wikipedia reader interactions with images over one month and perform the first large-scale analysis of how interactions with images happen on Wikipedia. First, we quantify the overall engagement with images, finding that one in 29 pageviews results in a click on at least one image, one order of magnitude higher than interactions with other types of article content. Second, we study what factors are associated with image engagement and observe that clicks on images occur more often in shorter articles, in articles about visual arts or transportation, and in biographies of less well-known people. Third, we look at interactions with Wikipedia article previews and find that images help support readers’ information needs when navigating through the site, especially for more popular pages. The findings in this study deepen our understanding of the role of images for free knowledge and provide a guide for Wikipedia editors and web user communities to enrich the world’s largest source of encyclopedic knowledge.
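The headline figure, one in 29 pageviews with at least one image click, is a ratio over pageviews rather than over clicks. As a minimal sketch, assuming a hypothetical event log of `(pageview_id, event_type)` pairs with illustrative event names `"view"` and `"image_click"` (not the schema used in the study), such a rate could be computed as follows:

```python
def image_click_rate(events):
    """Share of pageviews with at least one image click.

    `events` is a hypothetical iterable of (pageview_id, event_type) pairs;
    event_type is "view" or "image_click" (illustrative schema only).
    Several clicks within the same pageview count once.
    """
    clicked = {}  # pageview_id -> True if any image was clicked
    for pageview_id, event_type in events:
        if event_type == "image_click":
            clicked[pageview_id] = True
        else:
            clicked.setdefault(pageview_id, False)
    if not clicked:
        return 0.0
    return sum(clicked.values()) / len(clicked)


# Illustrative log: 29 pageviews, one of which has two image clicks.
log = [(i, "view") for i in range(29)] + [(0, "image_click"), (0, "image_click")]
print(f"{image_click_rate(log):.4f}")  # 0.0345
```

Counting each pageview once, regardless of how many images were clicked, matches the "at least one image" wording; 1/29 is roughly 3.4% of pageviews.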

Imagine a Walkable City: Physical activity and urban imageability across 19 major cities

EPJ Data Science, 2021, Vol 10 (1). DOI: 10.1140/epjds/s13688-021-00313-7

Author(s): Marios Constantinides, Sagar Joglekar, Sanja Šćepanović, Daniele Quercia

Keyword(s): Physical Activity, Spatial Configuration, Economic Status, Positive Association, Weather Conditions, Coarse Grained, Compensation Mechanism, Bad Weather, Developed World, Urban Characteristics

Abstract: Can the shape of a city promote physical activity? The question of why individuals engage in physical activity has been widely researched, but that research has predominantly focused on socio-demographic characteristics (e.g., age, gender, economic status) and coarse-grained spatial characteristics (e.g., population density), overlooking key urban characteristics such as whether a city is navigable or, as urban theorist Kevin Lynch put it, whether it is ‘imageable’ (whether its spatial configuration is economic of mental effort). That is mainly because, at scale, it is neither easy to model imageability nor feasible to measure physical activity. We modeled urban imageability with a single scalable metric of entropy, and then measured physical activity from 233K wearable devices over three years, across 19 major cities in the developed world. We found that, after controlling for greenery, wealth, walkability, presence of landmarks, and weather conditions, the legibility hypothesis still holds: the more imageable a city, the more its dwellers engage in physical activity. Interestingly, wealth (GDP per capita) has a positive association with physical activity only in cities with inclement climate, effectively acting as a compensation mechanism for bad weather.
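The abstract does not say what the entropy metric of imageability is computed over. Purely as an illustration, the sketch below computes Shannon entropy, H = sum of p_i log(1/p_i), over a city’s distribution of hypothetical visual-element labels (for example, labels assigned to street-level images). The labels and their source are assumptions, and how the authors map entropy to imageability is not shown.

```python
import math
from collections import Counter


def shannon_entropy(labels, base=2.0):
    """Shannon entropy H = sum_i p_i * log(1 / p_i) of the empirical label distribution."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum((c / total) * math.log(total / c, base) for c in counts.values())


# Hypothetical labels for elements seen along two cities' streets.
varied_city = ["park", "church", "tower", "bridge"]
repetitive_city = ["block", "block", "block", "block"]
print(shannon_entropy(varied_city))      # 2.0 (four equally frequent labels)
print(shannon_entropy(repetitive_city))  # 0.0 (a single label)
```

Entropy is a single number per city, which is what makes such a metric cheap to compute at scale once the labels exist.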

Companies under stress: the impact of shocks on the production network

EPJ Data Science, 2021, Vol 10 (1). DOI: 10.1140/epjds/s13688-021-00310-w

Author(s): Róbert Pálovics, Primož Dolenc, Jure Leskovec

Keyword(s): Financial Crisis, Performance Indicators, Matched Pairs, Production Network, Production Networks, The Future, Trading Partners, Financial Crisis Of 2008, The Impact, Different Levels

Abstract: In this paper we analyze the effect of shocks in production networks. Our work is based on a rich dataset that contains information about companies from Slovenia right after the financial crisis of 2008. The processed data spans 8 years and covers the transaction history as well as performance indicators and various metadata of the companies. We define sales shocks at different levels and identify the companies impacted by them. Next, we investigate stress, the potential immediate upstream and downstream impact of a shock within the production network. We base our main findings on a matched pairs analysis of stressed companies. We find that both shock and stress are associated with reporting bankruptcy in the future and that stress foremost impacts the future sales of customers. Furthermore, we find evidence that stress results not only in performance losses but also in the reconfiguration of the production network. We show that stressed companies actively seek new trading partners, and that these new links often share the industry of the shocked company. These results suggest that both stressed customers and suppliers react quickly to stress and adjust their trading relationships.
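Stress is defined in the abstract as the potential immediate upstream and downstream impact of a shock. A minimal sketch of identifying stressed companies, assuming the production network is given as hypothetical `(supplier, customer)` transaction pairs and that every direct trading partner of a shocked company counts as stressed; the paper’s actual criteria (for example, transaction-volume thresholds) are not given here.

```python
from collections import defaultdict


def stressed_neighbors(transactions, shocked):
    """Return (stressed_suppliers, stressed_customers) for a set of shocked firms.

    `transactions` is an iterable of (supplier, customer) pairs, i.e. edges
    pointing downstream from seller to buyer. Only immediate neighbors are
    returned, and firms that are themselves shocked are excluded.
    """
    suppliers_of = defaultdict(set)
    customers_of = defaultdict(set)
    for supplier, customer in transactions:
        suppliers_of[customer].add(supplier)
        customers_of[supplier].add(customer)

    shocked = set(shocked)
    upstream, downstream = set(), set()
    for firm in shocked:
        upstream |= suppliers_of[firm]    # firms selling to the shocked firm
        downstream |= customers_of[firm]  # firms buying from the shocked firm
    return upstream - shocked, downstream - shocked


edges = [("A", "B"), ("B", "C"), ("D", "B"), ("C", "E")]
up, down = stressed_neighbors(edges, {"B"})
print(sorted(up), sorted(down))  # ['A', 'D'] ['C']
```

Firm E is not returned because it is two steps downstream of B; the sketch only covers the "immediate" neighborhood the abstract describes.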

The presence of occupational structure in online texts based on word embedding NLP models

EPJ Data Science, 2021, Vol 10 (1). DOI: 10.1140/epjds/s13688-021-00311-9

Author(s): Zoltán Kmetty, Júlia Koltai, Tamás Rudas

Keyword(s): Social Distance, Social Stratification, Text Analysis, Semantic Space, Affirmative Answer, Occupational Structure, The Social, Textual Data, Organizational Aspect, Online Texts

Abstract: Research on social stratification is closely linked to analyzing the prestige associated with different occupations. This research focuses on the positions of occupations in the semantic space represented by large amounts of textual data. The results are compared to standard results in social stratification to see whether the classical results are reproduced and whether additional insights can be gained into the social positions of occupations. The paper gives an affirmative answer to both questions. The results show a fundamental similarity between the occupational structure obtained from text analysis and the structure described by prestige and social distance scales. While our research reinforces many theories and empirical findings of the traditional body of literature on social stratification and, in particular, occupational hierarchy, it points to the importance of a factor not discussed in the mainline of stratification literature so far: the power and organizational aspect.
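One common way to read occupational positions off an embedding space, shown here only as an illustration and not necessarily the authors’ method, is to build a semantic axis from the difference between anchor words (for example "rich" minus "poor") and score each occupation by its cosine similarity with that axis. The toy 3-dimensional vectors below are made up; real vectors would come from a word embedding model trained on the text corpus.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def project_onto_axis(embeddings, words, positive_anchors, negative_anchors):
    """Score each word by its cosine similarity with the axis running from the
    mean of the negative anchors to the mean of the positive anchors."""
    pos = mean_vector([embeddings[w] for w in positive_anchors])
    neg = mean_vector([embeddings[w] for w in negative_anchors])
    axis = [p - q for p, q in zip(pos, neg)]
    return {w: cosine(embeddings[w], axis) for w in words}


# Made-up 3-d vectors, for illustration only.
toy_embeddings = {
    "rich": [1.0, 0.0, 0.0],
    "poor": [-1.0, 0.0, 0.0],
    "doctor": [0.8, 0.2, 0.1],
    "cleaner": [-0.7, 0.3, 0.1],
}
scores = project_onto_axis(toy_embeddings, ["doctor", "cleaner"], ["rich"], ["poor"])
print(scores)  # doctor scores positive, cleaner negative
```

Averaging several anchor words per pole, rather than using a single pair, usually makes such an axis less sensitive to the quirks of any one word.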

A profile-based sentiment-aware approach for depression detection in social media

EPJ Data Science, 2021, Vol 10 (1). DOI: 10.1140/epjds/s13688-021-00309-3

Author(s): José de Jesús Titla-Tlatelpa, Rosa María Ortega-Mendoza, Manuel Montes-y-Gómez, Luis Villaseñor-Pineda

Keyword(s): Social Media, Main Idea, Text Representation, Computational Tools, New Approach, Linguistic Markers, Depression Detection, Benchmark Datasets, Discriminative Value, Severe Mental Health Problem

Abstract: Depression is a severe mental health problem. Due to its relevance, the development of computational tools for its detection has attracted increasing attention in recent years. In this context, several research works have addressed the problem using word-based approaches (e.g., a bag of words). This type of representation has been shown to be useful, indicating that words act as linguistic markers of depression. However, we believe that, in addition to the words themselves, their contexts implicitly contain valuable information that could be inferred and exploited to enhance the detection of signs of depression. Specifically, we explore the use of users’ characteristics and the sentiments expressed in their messages as context insights. The main idea is that a word’s discriminative value depends on the characteristics of the person who is writing and on the polarity of the messages in which it occurs. Hence, this paper introduces a new approach based on specializing the classification framework to user profiles (e.g., men or women) and on considering the sentiments expressed in the messages through a new text representation that captures their polarity (e.g., positive or negative). The proposed approach was evaluated on benchmark datasets from social media; the results are encouraging, since they outperform those of computationally more expensive state-of-the-art methods.
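The abstract combines two ideas: one classifier per user profile, and a text representation that captures message polarity. A minimal sketch of one way to realize both, assuming each message already carries a polarity label from some external sentiment analyzer and each user carries a profile label. Suffixing every word with its message’s polarity is an illustrative choice that may differ from the paper’s representation, and training the per-profile classifiers is not shown.

```python
from collections import Counter, defaultdict


def polarity_tagged_bow(messages):
    """Bag of words in which every token carries the polarity of its message.

    `messages` is a list of (text, polarity) pairs; polarity labels such as
    "pos", "neg", "neu" are assumed to come from an external sentiment tool.
    """
    bag = Counter()
    for text, polarity in messages:
        for token in text.lower().split():
            bag[f"{token}_{polarity}"] += 1
    return bag


def split_by_profile(users):
    """Group users' representations by profile so one classifier can be
    trained per profile. `users` maps user_id -> (profile, messages)."""
    groups = defaultdict(dict)
    for user_id, (profile, messages) in users.items():
        groups[profile][user_id] = polarity_tagged_bow(messages)
    return dict(groups)


bag = polarity_tagged_bow([("I feel alone", "neg"), ("I feel great", "pos")])
print(bag["feel_neg"], bag["feel_pos"])  # 1 1
```

Because "feel" becomes two distinct features, a downstream classifier can weight the same word differently depending on whether it appears in a negative or a positive message.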

Characterizing partisan political narrative frameworks about COVID-19 on Twitter

EPJ Data Science, 2021, Vol 10 (1). DOI: 10.1140/epjds/s13688-021-00308-4

Author(s): Elise Jing, Yong-Yeol Ahn

Keyword(s): Small Businesses, Social Issues, Local Politics, Critical Role, The United States, Crisis Response, Framing Analysis, Membership Categorization, Political Narrative

Abstract: The COVID-19 pandemic is a global crisis that has been testing every society and exposing the critical role of local politics in crisis response. In the United States, there has been a strong partisan divide between the Democratic and Republican parties’ narratives about the pandemic, which resulted in polarization of individual behaviors and divergent policy adoption across regions. As shown in this case, as well as in most major social issues, strongly polarized narrative frameworks facilitate such narratives. To understand polarization and other social chasms, it is critical to dissect these diverging narratives. Here, taking the Democratic and Republican political social media posts about the pandemic as a case study, we demonstrate that a combination of computational methods can provide useful insights into the different contexts, framing, and characters and relationships that construct the narrative frameworks from which individual posts draw. Leveraging a dataset of tweets from U.S. politicians, including the ex-president, members of Congress, and state governors, we found that the Democrats’ narrative tends to be more concerned with the pandemic as well as financial and social support, while the Republicans discuss other political entities, such as China, more often. We then perform an automatic framing analysis to characterize the ways in which they frame their narratives, and find that the Democrats emphasize the government’s role in responding to the pandemic, while the Republicans emphasize the roles of individuals and support for small businesses. Finally, we present a semantic role analysis that uncovers the important characters and relationships in their narratives as well as how they facilitate a membership categorization process. Our findings concretely expose the gaps in the “elusive consensus” between the two parties. Our methodologies may be applied to computationally study narratives in various domains.

Finding disease outbreak locations from human mobility data

EPJ Data Science, 2021, Vol 10 (1). DOI: 10.1140/epjds/s13688-021-00306-6

Author(s): Frank Schlosser, Dirk Brockmann

Keyword(s): Human Mobility, Disease Outbreaks, Theoretical Models, Disease Outbreak, Digital Data, Contact Tracing, Movement Trajectories, Mobility Data, Movement Data, Synthetic Datasets

Abstract: Finding the origin location of an infectious disease outbreak quickly is crucial in mitigating its further dissemination. Current methods to identify outbreak locations early on rely on interviewing affected individuals and correlating their movements, which is a manual, time-consuming, and error-prone process. Other methods, such as contact tracing, genomic sequencing, or theoretical models of epidemic spread, offer help, but they are not applicable at the onset of an outbreak because they require highly processed information or established transmission chains. Digital data sources such as mobile phones offer new ways to find outbreak sources in an automated way. Here, we propose a novel method to determine outbreak origins from geolocated movement data of individuals affected by the outbreak. Our algorithm scans movement trajectories for shared locations and identifies the outbreak origin as the most dominant among them. We test the method using various empirical and synthetic datasets and demonstrate that it is able to single out the true outbreak location with high accuracy, requiring data from only N = 4 individuals. The method can be applied to scenarios with multiple outbreak locations, and is even able to estimate the number of outbreak sources if it is unknown, while being robust to noise. Our method is the first to offer a reliable, accurate out-of-the-box approach to identify outbreak locations in the initial phase of an outbreak. It can be easily and quickly applied in a crisis situation, improving on previous manual approaches. The method is not only applicable in the context of disease outbreaks, but can also be used to find shared locations in movement data in other contexts.
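The core idea in the abstract, scanning trajectories for shared locations and picking the most dominant one, can be illustrated with a deliberately simplified counting version. This sketch assumes trajectories have already been discretized into location IDs (for example grid cells) and simply counts how many individuals visited each location; it does not reproduce the robustness to noise or the multiple-source estimation the abstract mentions, and it is not the published algorithm.

```python
from collections import Counter


def most_shared_location(trajectories):
    """Return (location, n_individuals) for the location visited by the most
    individuals, or None if there are no visits.

    `trajectories` is a list of per-person lists of discrete location IDs;
    repeat visits by the same person count only once.
    """
    counts = Counter()
    for visits in trajectories:
        counts.update(set(visits))  # one vote per person per location
    if not counts:
        return None
    return counts.most_common(1)[0]


# Hypothetical trajectories of N = 4 affected individuals.
trajectories = [
    ["home1", "cafe", "market"],
    ["home2", "market", "gym"],
    ["market", "park"],
    ["home4", "office", "market", "market"],
]
print(most_shared_location(trajectories))  # ('market', 4)
```

Counting each person once per location keeps a single frequent visitor from dominating the result, which matters when only a handful of trajectories are available.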

Emotions in online rumor diffusion

EPJ Data Science, 2021, Vol 10 (1). DOI: 10.1140/epjds/s13688-021-00307-5

Author(s): Nicolas Pröllochs, Dominik Bär, Stefan Feuerriegel

Keyword(s): Linear Regression, Regression Model, Human Behavior, Linear Regression Model, Large Scale, Basic Emotions, Time Horizons, Spreading Dynamics, Diffusion Dynamics, Generalized Linear Regression

Abstract: Emotions are regarded as a dominant driver of human behavior, and yet their role in online rumor diffusion is largely unexplored. In this study, we empirically examine the extent to which emotions explain the diffusion of online rumors. We analyze a large-scale sample of 107,014 online rumors from Twitter, as well as their cascades. For each rumor, the embedded emotions were measured based on eight so-called basic emotions from Plutchik’s wheel of emotions (i.e., anticipation–surprise, anger–fear, trust–disgust, joy–sadness). We then estimated, using a generalized linear regression model, how emotions are associated with the spread of online rumors in terms of (1) cascade size, (2) cascade lifetime, and (3) structural virality. Our results suggest that rumors conveying anticipation, anger, and trust generate more reshares, spread over longer time horizons, and become more viral. In contrast, smaller size, lifetime, and virality are found for surprise, fear, and disgust. We further study how the presence of 24 dyadic emotional interactions (i.e., feelings composed of two emotions) is associated with diffusion dynamics. Here, we find that rumor cascades with high degrees of aggressiveness are larger in size, longer-lived, and more viral. Altogether, emotions embedded in online rumors are important determinants of the spreading dynamics.
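Structural virality is commonly defined (Goel et al., 2016) as the average shortest-path distance between all pairs of nodes in a cascade’s diffusion tree; we assume the paper follows this definition. A minimal sketch, with a cascade given as hypothetical `(parent, child)` reshare pairs:

```python
from collections import defaultdict, deque


def structural_virality(edges):
    """Mean shortest-path distance over all unordered node pairs of a cascade tree.

    `edges` is a list of (parent, child) reshare links; the tree is treated as
    undirected. Runs one breadth-first search per node, O(n^2) overall, which
    is fine for small cascades.
    """
    adj = defaultdict(list)
    for parent, child in edges:
        adj[parent].append(child)
        adj[child].append(parent)
    nodes = list(adj)
    n = len(nodes)
    if n < 2:
        return 0.0
    total = 0
    for source in nodes:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
    # Each unordered pair was counted twice (once from each endpoint).
    return total / (n * (n - 1))


star = [("root", "a"), ("root", "b"), ("root", "c")]   # broadcast-like
chain = [("root", "a"), ("a", "b"), ("b", "c")]        # person-to-person
print(structural_virality(star))             # 1.5
print(round(structural_virality(chain), 3))  # 1.667
```

Both cascades have the same size (four nodes), but the chain, where each reshare comes from the previous resharer, scores higher; this is what lets structural virality separate broadcast-style spread from viral, multi-generation spread.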

Predictive modeling to study lifestyle politics with Facebook likes

EPJ Data Science, 2021, Vol 10 (1). DOI: 10.1140/epjds/s13688-021-00305-7

Author(s): Stiene Praet, Peter Van Aelst, Patrick van Erkel, Stephan Van der Veeken, David Martens

Keyword(s): Social Media, Survey Data, Party System, Political Polarization, Daily Lives, Political Preference, Social Media Data, Cultural Preferences, Wide Range, Media Data

Abstract: “Lifestyle politics” suggests that political and ideological opinions are strongly connected to our consumption choices, music and food taste, cultural preferences, and other aspects of our daily lives. With growing political polarization, this idea has become all the more relevant to a wide range of social scientists. Empirical research in this domain, however, is confronted with a practical challenge: this type of detailed information on people’s lifestyles is very difficult to operationalize, and extremely time-consuming and costly to query in a survey. A potentially valuable alternative data source to capture these values and lifestyle choices is social media data. In this study, we explore the value of Facebook “like” data to complement traditional survey data to study lifestyle politics. We collect a unique dataset of Facebook likes and survey data from more than 6500 participants in Belgium, a fragmented multi-party system. Based on both types of data, we infer the political and ideological preferences of our respondents. The results indicate that non-political Facebook likes are indicative of political preference and are useful for describing voters in terms of common interests, cultural preferences, and lifestyle features. This shows that social media data can be a valuable complement to traditional survey data to study lifestyle politics.

On estimating the predictability of human mobility: the role of routine

EPJ Data Science, 2021, Vol 10 (1). DOI: 10.1140/epjds/s13688-021-00304-8

Author(s): Douglas do Couto Teixeira, Jussara M. Almeida, Aline Carneiro Viana

Keyword(s): Human Behavior, Human Mobility, Time Period, Current Location, Different Types, History Of, Mobility Behavior

Abstract: Given the difficulties in predicting human behavior, one may wish to establish bounds on our ability to accurately perform such predictions. In the case of mobility-related behavior, there exists a fundamental technique to estimate the predictability of an individual’s mobility, as expressed in a given dataset. Although useful in several scenarios, this technique treats human mobility as a monolithic entity, which poses challenges to understanding the different types of behavior that may be hard to predict. In this paper, we propose to study predictability in terms of two components of human mobility: routine and novelty, where routine is related to preferential returns, and novelty is related to exploration. Viewing one’s mobility in terms of these two components allows us to identify important patterns about the predictability of one’s mobility. Additionally, we argue that mobility behavior in the novelty component is hard to predict if we rely on the history of visited locations (as the predictability technique does), and therefore we focus here on analyzing what affects the predictability of one’s routine. To that end, we propose a technique that allows us to (i) quantify the effect of novelty on predictability, and (ii) gauge how much one’s routine deviates from a reference routine that is completely predictable, thereby estimating the amount of hard-to-predict behavior in one’s routine. Finally, we rely on previously proposed metrics, as well as a newly proposed one, to understand what affects the predictability of a person’s routine.

Our experiments show that our metrics are able to capture most of the variability in one’s routine (adjusted R² of up to 84.9% and 96.0% on GPS and CDR datasets, respectively), and that routine behavior can be largely explained by three types of patterns: (i) stationary patterns, in which a person stays in her current location for a given time period, (ii) regular visits, in which people visit a few preferred locations with occasional visits to other places, and (iii) diversity of trajectories, in which people change the order in which they visit certain locations.
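The "fundamental technique" mentioned above is, we assume, the widely used entropy-based bound of Song et al. (2010): estimate the entropy S (in bits) of a person’s location sequence, then solve Fano’s equality S = H(Π) + (1 - Π) log2(N - 1), with H(Π) = -Π log2 Π - (1 - Π) log2(1 - Π) and N distinct locations, for the maximum predictability Π^max. The sketch below covers only the second step, by bisection; estimating S from the location history (for example with a Lempel-Ziv estimator) is not shown.

```python
import math


def fano_rhs(p, n):
    """Right-hand side of Fano's equality: H(p) + (1 - p) * log2(n - 1)."""
    h = 0.0
    for q in (p, 1.0 - p):
        if q > 0:
            h -= q * math.log2(q)
    return h + (1.0 - p) * math.log2(n - 1)


def max_predictability(entropy, n, tol=1e-10):
    """Upper bound on predictability given entropy (bits) and n distinct locations.

    Solves entropy = fano_rhs(p, n) for p in [1/n, 1] by bisection; fano_rhs
    decreases on that interval, from log2(n) at p = 1/n down to 0 at p = 1.
    """
    if n < 2:
        return 1.0  # a single location is always predictable
    if not 0.0 <= entropy <= math.log2(n):
        raise ValueError("entropy must lie in [0, log2(n)]")
    lo, hi = 1.0 / n, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if fano_rhs(mid, n) > entropy:
            lo = mid  # too much uncertainty remains, so the bound lies higher
        else:
            hi = mid
    return (lo + hi) / 2


# Round trip: the entropy implied by 70% predictability over 50 locations.
s = fano_rhs(0.7, 50)
print(round(max_predictability(s, 50), 6))  # 0.7
```

Bisection is used because the equality has no closed-form inverse; monotonicity on [1/N, 1] guarantees a unique root there.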

EPJ Data Science
Latest Publications

Published By Springer-Verlag
