The Story of Goldilocks and Three Twitter’s APIs: A Pilot Study on Twitter Data Sources and Disclosure

Author(s):  
Yoonsang Kim ◽  
Rachel Nordgren ◽  
Sherry Emery

Public health and social science increasingly use Twitter for behavioral and marketing surveillance. However, few studies provide sufficient detail about Twitter data collection to allow direct comparison between studies or to support replication. The three primary application programming interfaces (APIs) for Twitter data are Streaming, Search, and Firehose. To date, no clear guidance exists about the advantages and limitations of each API, or about the comparability of the amount, content, and user accounts of tweets retrieved from each. Such information is crucial to the validity, interpretation, and replicability of research findings. This study examines whether tweets collected with the same search filters over the same time period, but through different APIs, yield comparable datasets. We collected tweets about anti-smoking, e-cigarettes, and tobacco using the aforementioned APIs. The retrieved tweets largely overlapped among the three APIs, but each also retrieved unique tweets, and the extent of overlap varied over time and by topic, producing different trends and potentially supporting diverging inferences. Researchers need to understand how different data sources influence the amount, content, and user accounts of the data they retrieve from social media in order to assess the implications of their choice of data source.
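To make the comparison concrete, the sketch below pulls tweets for the same keyword filter from two of the three sources, the Search and Streaming APIs (the Firehose requires commercial access), and measures how much the two result sets overlap. It assumes tweepy 3.x against the Twitter v1.1 endpoints; the credentials, keyword list, and sample sizes are placeholders rather than the study's actual filters.

```python
# Minimal sketch (not the authors' pipeline): pull tweets matching the same
# keyword filter from the Search and Streaming APIs with tweepy 3.x, so the
# two result sets can be compared for overlap. Keys and keywords are placeholders.
import tweepy

KEYWORDS = ["e-cigarette", "vaping", "tobacco"]   # illustrative filter terms

auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

# 1) Search API: retrospective sample of recent tweets matching the query.
search_ids = {
    status.id for status in tweepy.Cursor(
        api.search, q=" OR ".join(KEYWORDS), count=100
    ).items(1000)
}

# 2) Streaming API: prospective sample of tweets as they are posted.
class IdCollector(tweepy.StreamListener):
    def __init__(self):
        super().__init__()
        self.ids = set()

    def on_status(self, status):
        self.ids.add(status.id)
        if len(self.ids) >= 1000:      # stop after a small sample
            return False

listener = IdCollector()
stream = tweepy.Stream(auth=api.auth, listener=listener)
stream.filter(track=KEYWORDS)

# Overlap between the two sources, as a share of the streamed sample.
overlap = len(search_ids & listener.ids) / max(len(listener.ids), 1)
print(f"Search/Stream overlap: {overlap:.1%}")
```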

2017 ◽  
Vol 36 (2) ◽  
pp. 195-211 ◽  
Author(s):  
Patrick Rafail

Twitter data are widely used in the social sciences. The Twitter Application Programming Interface (API) allows researchers to build large databases of user activity efficiently. Despite the potential of Twitter as a data source, less attention has been paid to issues of sampling, and in particular to the implications of different sampling strategies for overall data quality. This research proposes a set of conceptual distinctions between four types of populations that emerge when analyzing Twitter data and suggests sampling strategies that facilitate more comprehensive data collection from the Twitter API. Using three applications drawn from large databases of Twitter activity, this research also compares the results from the proposed sampling strategies, which provide defensible representations of the population of activity, to those collected with more frequently used hashtag samples. The results suggest that hashtag samples misrepresent important aspects of Twitter activity and may lead researchers to erroneous conclusions.
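One way to picture the distinction between a hashtag sample and a more comprehensive sample of activity is sketched below: the hashtag query only returns tweets that contain the tag, whereas collecting the timelines of the users it surfaces also captures their surrounding activity. This is an illustration of the general idea under tweepy 3.x, not Rafail's exact sampling procedure; the hashtag and sample sizes are placeholders.

```python
# Hedged sketch: compare a hashtag sample with a broader, user-centred sample
# built by expanding to the posting users' recent timelines.
import tweepy

auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

# Step 1: seed sample keyed to a hashtag.
hashtag_tweets = list(
    tweepy.Cursor(api.search, q="#protest", count=100).items(500)
)

# Step 2: expand to the posting users' recent timelines.
user_ids = {t.user.id for t in hashtag_tweets}
timeline_tweets = []
for uid in user_ids:
    try:
        timeline_tweets.extend(
            tweepy.Cursor(api.user_timeline, user_id=uid, count=200).items(200)
        )
    except tweepy.TweepError:
        continue   # skip protected or deleted accounts

# The hashtag sample is typically a small, topically skewed subset of the
# same users' overall activity.
with_tag = sum("#protest" in t.text.lower() for t in timeline_tweets)
print(f"{len(hashtag_tweets)} hashtag tweets; "
      f"{len(timeline_tweets)} timeline tweets, of which {with_tag} carry the tag")
```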


2019 ◽  
Vol 6 (1) ◽  
pp. 205395171982761 ◽  
Author(s):  
Christoph Raetzsch ◽  
Gabriel Pereira ◽  
Lasse S Vestergaard ◽  
Martin Brynskov

This article addresses the role of application programming interfaces (APIs) for integrating data sources in the context of smart cities and communities. On top of the built infrastructures in cities, APIs make it possible to weave new kinds of seams from static and dynamic data sources into the urban fabric. Contributing to debates about “urban informatics” and the governance of urban information infrastructures, this article provides a technically informed and critically grounded approach to evaluating APIs as crucial but often overlooked elements within these infrastructures. The conceptualization of what we term City APIs is informed by three perspectives: In the first part, we review established criticisms of proprietary social media APIs and their crucial function in current web architectures. In the second part, we discuss how the design process of APIs defines conventions of data exchange that also reflect negotiations between API producers and API consumers about the affordances and mental models of the underlying computer systems. In the third part, we present recent urban data innovation initiatives, especially CitySDK and OrganiCity, to underline the centrality of API design and governance for new kinds of civic and commercial services developed within and for cities. By bridging the fields of criticism, design, and implementation, we argue that City APIs, as elements of infrastructures, reveal how urban renewal processes become crucial sites of socio-political contestation between data science, technological development, urban management, and civic participation.


Author(s):  
Tomi Dahlberg ◽  
Päivi Hokkanen ◽  
Mike Newman

The authors investigate the determinants of CIOs' organizational role and tasks. They first review previous studies, which they classify as either evolutionary studies or CIO role studies, and which they regard as characteristic of the use of particular technologies or particular periods of time. The authors modify Leavitt's well-known organization diagnostic model to describe the factors that shape the role and tasks of CIOs across time, industries, and technologies. They validate the model against interviews with 36 CIOs in six industries, covering the period from the 1960s to the present. The authors also show that the model can be used to categorize prior research findings. They then use the model to describe how technology influences business strategy, and how business strategy and technology shape CIOs' role and tasks and vice versa. The authors find that the modified Leavitt model usefully describes the factors that define CIOs' role and tasks at any particular time in any specific organization, and shows how those tasks change.


2021 ◽  
Vol 376 (1829) ◽  
pp. 20200283
Author(s):  
Katharine Sherratt ◽  
Sam Abbott ◽  
Sophie R. Meakin ◽  
Joel Hellewell ◽  
James D. Munday ◽  
...  

The time-varying reproduction number (Rt: the average number of secondary infections caused by each infected person) may be used to assess changes in transmission potential during an epidemic. While new infections are not usually observed directly, they can be estimated from data. However, data may be delayed and potentially biased. We investigated the sensitivity of Rt estimates to different data sources representing COVID-19 in England, and we explored how this sensitivity could track epidemic dynamics in population sub-groups. We sourced public data on test-positive cases, hospital admissions, and deaths with confirmed COVID-19 in seven regions of England from March through August 2020. We estimated Rt using a model that mapped unobserved infections to each data source. We then compared differences in Rt with the demographic and social context of surveillance data over time. Our estimates of transmission potential varied for each data source, with the relative inconsistency of estimates varying across regions and over time. Rt estimates based on hospital admissions and deaths were more spatio-temporally synchronous with each other than with estimates from all test-positive cases. We found that these differences may be linked to biased representations of sub-populations in each data source, including spatially clustered testing, and outbreaks in hospitals, care homes, and young age groups that reflected the link between age and severity of disease. We highlight that policy makers could better target interventions by considering the source populations of Rt estimates. Further work should clarify the best way to combine and interpret Rt estimates from different data sources based on the desired use. This article is part of the theme issue ‘Modelling that shaped the early COVID-19 pandemic response in the UK’.
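For readers unfamiliar with Rt estimation, the sketch below applies a deliberately simplified, Cori-style renewal estimate (Rt as today's incidence divided by a serial-interval-weighted sum of past incidence) to two made-up surveillance series for the same region. It is not the authors' model, which maps unobserved infections to each data source and handles reporting delays, but it illustrates why feeding the same estimator different data sources yields different Rt trajectories. The serial-interval parameters and counts are assumptions.

```python
# Simplified, Cori-style Rt estimate: Rt ~ I_t / sum_s(I_{t-s} * w_s), with w
# the serial-interval distribution. Illustrative only; NOT the authors' model.
import numpy as np
from scipy.stats import gamma

def serial_interval_pmf(max_days=20, mean=5.0, sd=2.5):
    """Discretised gamma serial interval (assumed parameters, not fitted)."""
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    days = np.arange(1, max_days + 1)
    pmf = gamma.pdf(days, a=shape, scale=scale)   # crude discretisation
    return pmf / pmf.sum()

def estimate_rt(incidence, w):
    """Naive Rt: incidence today over the weighted sum of past incidence."""
    incidence = np.asarray(incidence, dtype=float)
    rt = np.full(len(incidence), np.nan)
    for t in range(1, len(incidence)):
        lags = incidence[max(0, t - len(w)):t][::-1]   # I_{t-1}, I_{t-2}, ...
        denom = np.sum(lags * w[:len(lags)])
        if denom > 0:
            rt[t] = incidence[t] / denom
    return rt

# Illustrative use with two made-up surveillance series for the same region:
cases      = [120, 140, 160, 150, 170, 180, 200, 210, 220, 230, 250, 240]
admissions = [ 10,  12,  11,  14,  13,  15,  16,  15,  17,  18,  19,  18]
w = serial_interval_pmf()
print("Rt from cases:     ", np.round(estimate_rt(cases, w), 2))
print("Rt from admissions:", np.round(estimate_rt(admissions, w), 2))
```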


2019 ◽  
Vol 40 (s1) ◽  
pp. 31-49
Author(s):  
Anja Bechmann

This study investigates the Facebook posting behaviour of 922 posting users over a time span of seven years (from 2007 to 2014), using an innovative combination of survey data and private profile feed post counts obtained through the Facebook Application Programming Interface (API) prior to the changes in 2015. A digital inequality lens is applied to study the effect of socio-demographic characteristics as well as time on posting behaviour. The findings indicate differences, for example in terms of gender and age, but some of this inequality is becoming smaller over time. The data set also shows inequality in the poster ratio in different age groups. Across all the demographic groups, the results show an increase in posting frequency in the time period observed, and limited evidence is found that young age groups have posted less on Facebook in more recent years.
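A minimal sketch of the kind of analysis described, assuming hypothetical column names rather than Bechmann's actual variables: survey demographics are joined to per-user yearly post counts from an API export, and posting frequency and the poster ratio are then compared across groups and years.

```python
# Hedged sketch with hypothetical data: join survey demographics to per-user
# yearly post counts and compare posting behaviour across groups over time.
import pandas as pd

survey = pd.DataFrame({
    "user_id":   [1, 2, 3, 4],
    "gender":    ["f", "m", "f", "m"],
    "age_group": ["18-24", "18-24", "45-54", "45-54"],
})
posts = pd.DataFrame({            # one row per user-year with a post count
    "user_id": [1, 1, 2, 2, 3, 3, 4, 4],
    "year":    [2008, 2013, 2008, 2013, 2008, 2013, 2008, 2013],
    "n_posts": [40, 95, 25, 80, 5, 30, 0, 20],
})

merged = posts.merge(survey, on="user_id")

# Mean yearly posting frequency per demographic group.
freq = merged.groupby(["year", "gender", "age_group"])["n_posts"].mean()
print(freq)

# Poster ratio per age group and year: share of users with at least one post.
poster_ratio = (
    merged.assign(is_poster=merged["n_posts"] > 0)
          .groupby(["year", "age_group"])["is_poster"].mean()
)
print(poster_ratio)
```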


Author(s):  
Michelle Degli Esposti ◽  
David K Humphreys ◽  
Lucy Bowes

Background: Child maltreatment is a major public health problem affecting one quarter of children in England and Wales. Good epidemiological data are needed to establish how many and which children are most at risk, and to evaluate the impact of policies and interventions. However, a comprehensive data source on child maltreatment is currently lacking. Aim: We aimed to create a rich data source on the incidence of Child maltreatment over Time (iCoverT) in England and Wales. Methods: We developed systematic methods to search for and identify administrative data sources that regularly measured child maltreatment. Data sources were investigated and assessed against pre-specified eligibility criteria and a bespoke quality-assessment tool. Relevant data were extracted, digitalised, and harmonised over time. All data and their accompanying documentation were prepared to form an open-access data source: the iCoverT (osf.io/cf7mv). Results: We identified 13 unique sources of administrative data, six of which met our eligibility criteria: Child protection statistics, Children in care, Criminal statistics, Homicide index, Mortality statistics, and NSPCC statistics. Data and documentation were prepared and combined to form the iCoverT, comprising 272 variables and over 43,500 data points, and spanning more than 150 years. A subsequent time series analysis demonstrated the utility of the iCoverT, identifying large overall decreases in child maltreatment from 1858 to 2016 (e.g. a 90% decrease in child homicides, or 2.7 fewer per 100,000 children) but worrying recent increases from 2000 to 2016. Conclusion: We systematically developed a rich data source on child maltreatment in England and Wales. Our methodology overcomes practical obstacles and offers a new approach for harnessing administrative data for research. The resulting data source is a valuable public health surveillance tool, which can be used to monitor national levels of child maltreatment and to evaluate the effectiveness of child protection initiatives.
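The harmonisation step can be pictured with the sketch below: separately digitised administrative series with different column names are reshaped into a single long-format table, from which rate changes over the full time span can be computed. The source names come from the abstract, but the figures are illustrative values back-calculated to be consistent with the quoted 90% decrease (about 2.7 fewer child homicides per 100,000 children), not actual iCoverT data points.

```python
# Hedged sketch of harmonising heterogeneous administrative series into one
# long-format table. Figures are illustrative, not iCoverT values.
import pandas as pd

homicide = pd.DataFrame({"year": [1858, 2016],
                         "child_homicides_per_100k": [3.0, 0.3]})
protection = pd.DataFrame({"Year": [1995, 2016],
                           "children_on_plan": [30000, 50000]})

long_rows = []
for df, source, value_col in [
    (homicide, "Homicide index", "child_homicides_per_100k"),
    (protection.rename(columns={"Year": "year"}),
     "Child protection statistics", "children_on_plan"),
]:
    tidy = df.rename(columns={value_col: "value"})
    tidy["source"] = source
    tidy["variable"] = value_col
    long_rows.append(tidy[["source", "variable", "year", "value"]])

icover_like = pd.concat(long_rows, ignore_index=True)
print(icover_like)

# Percentage change in a rate between the first and last observed year.
h = icover_like.query("variable == 'child_homicides_per_100k'").sort_values("year")
change = (h["value"].iloc[-1] - h["value"].iloc[0]) / h["value"].iloc[0]
print(f"Change in child homicide rate: {change:.0%}")   # -90% with these numbers
```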


Author(s):  
Christopher Hood ◽  
Rozana Himaz

This chapter draws on historical statistics reporting financial outcomes for spending, taxation, debt, and deficit in the UK over a century to (a) identify quantitatively and compare the main fiscal squeeze episodes (i.e. major revenue increases, spending cuts, or both) in terms of type (soft and hard squeezes, spending squeezes, and revenue squeezes), depth, and length; (b) compare these periods of austerity against measures of fiscal consolidation in terms of deficit reduction; and (c) identify economic and financial conditions before and after the various squeezes. It explores the extent to which the identification and classification of squeeze episodes are sensitive to the thresholds set and the data sources used. The chapter identifies major changes over time in the depth and types of squeeze that emerge from this analysis.
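The threshold sensitivity the chapter describes can be illustrated with a small sketch: the same fiscal series yields different lists of squeeze years depending on where the cut-offs for "major" revenue increases or spending cuts are placed. The thresholds, classification rules, and figures below are illustrative assumptions, not the chapter's definitions or data.

```python
# Hedged sketch: classify years into squeeze types under different thresholds,
# showing how the episode list depends on the cut-offs chosen. Data invented.
import pandas as pd

fiscal = pd.DataFrame({
    "year":                [1921, 1922, 1931, 1932, 1977, 1978, 2011, 2012],
    "spending_change_gdp": [-2.5, -1.8, -0.9, -1.2, -1.1, -0.4, -0.8, -1.0],  # % of GDP
    "revenue_change_gdp":  [ 0.3,  0.1,  1.4,  0.9,  0.2,  0.5,  0.6,  0.7],
})

def classify(df, spend_cut=1.0, rev_rise=1.0):
    """Label each year by squeeze type under the given thresholds."""
    out = df.copy()
    out["spending_squeeze"] = out["spending_change_gdp"] <= -spend_cut
    out["revenue_squeeze"] = out["revenue_change_gdp"] >= rev_rise
    out["type"] = out.apply(
        lambda r: "hard (both)" if r.spending_squeeze and r.revenue_squeeze
        else "spending" if r.spending_squeeze
        else "revenue" if r.revenue_squeeze
        else "none",
        axis=1,
    )
    return out[["year", "type"]]

# The same data yields different episode lists under different thresholds.
print(classify(fiscal, spend_cut=1.0, rev_rise=1.0))
print(classify(fiscal, spend_cut=0.5, rev_rise=0.5))
```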


Author(s):  
Catherine E. De Vries

This chapter introduces a benchmark theory of public opinion towards European integration. Rather than relying on generic labels like support or scepticism, the chapter suggests that public opinion towards the EU is both multidimensional and multilevel in nature. People’s attitudes towards Europe are essentially based on a comparison between the benefits of the status quo of membership and those associated with an alternative state, namely one’s country being outside the EU. This comparison is termed the ‘EU differential’. When comparing these benefits, people rely on both their evaluations of the outcomes (policy evaluations) and the system that produces them (regime evaluations). This chapter presents a fine-grained conceptualization of what it means to be an EU supporter or Eurosceptic; it also designs a careful empirical measurement strategy to capture variation, both cross-nationally and over time. The chapter cross-validates these measures against a variety of existing and newly developed data sources.

