The Story of Goldilocks and Three Twitter’s APIs: A Pilot Study on Twitter Data Sources and Disclosure

Author(s):  
Yoonsang Kim ◽  
Rachel Nordgren ◽  
Sherry Emery

Public health and social science increasingly use Twitter for behavioral and marketing surveillance. However, few studies provide sufficient detail about Twitter data collection to allow direct comparison between studies or to support replication. The three primary application programming interfaces (APIs) for Twitter data are Streaming, Search, and Firehose. To date, no clear guidance exists about the advantages and limitations of each API, or about the comparability of the amount, content, and user accounts of tweets retrieved from each. Such information is crucial to the validity, interpretation, and replicability of research findings. This study examines whether tweets collected with the same search filters over the same time period, but through different APIs, yield comparable datasets. We collected tweets about anti-smoking, e-cigarettes, and tobacco using the aforementioned APIs. The retrieved tweets largely overlapped among the three APIs, but each also retrieved unique tweets, and the extent of overlap varied over time and by topic, producing different trends and potentially supporting diverging inferences. Researchers need to understand how different data sources influence the amount, content, and user accounts of the data they retrieve from social media in order to assess the implications of their choice of data source.
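To make the comparison concrete, the sketch below pulls tweets for the same keyword filter from two of the three sources, the Search and Streaming APIs (the Firehose requires commercial access), and measures how much the two result sets overlap. It assumes tweepy 3.x against the Twitter v1.1 endpoints; the credentials, keyword list, and sample sizes are placeholders rather than the study's actual filters.

```python
# Minimal sketch (not the authors' pipeline): pull tweets matching the same
# keyword filter from the Search and Streaming APIs with tweepy 3.x, so the
# two result sets can be compared for overlap. Keys and keywords are placeholders.
import tweepy

KEYWORDS = ["e-cigarette", "vaping", "tobacco"]   # illustrative filter terms

auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

# 1) Search API: retrospective sample of recent tweets matching the query.
search_ids = {
    status.id for status in tweepy.Cursor(
        api.search, q=" OR ".join(KEYWORDS), count=100
    ).items(1000)
}

# 2) Streaming API: prospective sample of tweets as they are posted.
class IdCollector(tweepy.StreamListener):
    def __init__(self):
        super().__init__()
        self.ids = set()

    def on_status(self, status):
        self.ids.add(status.id)
        if len(self.ids) >= 1000:      # stop after a small sample
            return False

listener = IdCollector()
stream = tweepy.Stream(auth=api.auth, listener=listener)
stream.filter(track=KEYWORDS)

# Overlap between the two sources, as a share of the streamed sample.
overlap = len(search_ids & listener.ids) / max(len(listener.ids), 1)
print(f"Search/Stream overlap: {overlap:.1%}")
```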

2017 ◽  
Vol 36 (2) ◽  
pp. 195-211 ◽  
Author(s):  
Patrick Rafail

Twitter data are widely used in the social sciences. The Twitter Application Programming Interface (API) allows researchers to build large databases of user activity efficiently. Despite the potential of Twitter as a data source, less attention has been paid to issues of sampling, and in particular to the implications of different sampling strategies for overall data quality. This research proposes a set of conceptual distinctions between four types of populations that emerge when analyzing Twitter data and suggests sampling strategies that facilitate more comprehensive data collection from the Twitter API. Using three applications drawn from large databases of Twitter activity, this research also compares the results from the proposed sampling strategies, which provide defensible representations of the population of activity, to those collected with more frequently used hashtag samples. The results suggest that hashtag samples misrepresent important aspects of Twitter activity and may lead researchers to erroneous conclusions.
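One way to picture the distinction between a hashtag sample and a more comprehensive sample of activity is sketched below: the hashtag query only returns tweets that contain the tag, whereas collecting the timelines of the users it surfaces also captures their surrounding activity. This is an illustration of the general idea under tweepy 3.x, not Rafail's exact sampling procedure; the hashtag and sample sizes are placeholders.

```python
# Hedged sketch: compare a hashtag sample with a broader, user-centred sample
# built by expanding to the posting users' recent timelines.
import tweepy

auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

# Step 1: seed sample keyed to a hashtag.
hashtag_tweets = list(
    tweepy.Cursor(api.search, q="#protest", count=100).items(500)
)

# Step 2: expand to the posting users' recent timelines.
user_ids = {t.user.id for t in hashtag_tweets}
timeline_tweets = []
for uid in user_ids:
    try:
        timeline_tweets.extend(
            tweepy.Cursor(api.user_timeline, user_id=uid, count=200).items(200)
        )
    except tweepy.TweepError:
        continue   # skip protected or deleted accounts

# The hashtag sample is typically a small, topically skewed subset of the
# same users' overall activity.
with_tag = sum("#protest" in t.text.lower() for t in timeline_tweets)
print(f"{len(hashtag_tweets)} hashtag tweets; "
      f"{len(timeline_tweets)} timeline tweets, of which {with_tag} carry the tag")
```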


2019 ◽  
Vol 6 (1) ◽  
pp. 205395171982761 ◽  
Author(s):  
Christoph Raetzsch ◽  
Gabriel Pereira ◽  
Lasse S Vestergaard ◽  
Martin Brynskov

This article addresses the role of application programming interfaces (APIs) for integrating data sources in the context of smart cities and communities. On top of the built infrastructures in cities, APIs make it possible to weave new kinds of seams from static and dynamic data sources into the urban fabric. Contributing to debates about “urban informatics” and the governance of urban information infrastructures, this article provides a technically informed and critically grounded approach to evaluating APIs as crucial but often overlooked elements within these infrastructures. The conceptualization of what we term City APIs is informed by three perspectives: In the first part, we review established criticisms of proprietary social media APIs and their crucial function in current web architectures. In the second part, we discuss how the design process of APIs defines conventions of data exchange that also reflect negotiations between API producers and API consumers about the affordances and mental models of the underlying computer systems. In the third part, we present recent urban data innovation initiatives, especially CitySDK and OrganiCity, to underline the centrality of API design and governance for new kinds of civic and commercial services developed within and for cities. By bridging the fields of criticism, design, and implementation, we argue that City APIs, as elements of infrastructures, reveal how urban renewal processes become crucial sites of socio-political contestation between data science, technological development, urban management, and civic participation.


Author(s):  
Tomi Dahlberg ◽  
Päivi Hokkanen ◽  
Mike Newman

The authors investigate the determinants of CIOs' organizational role and tasks. They first review previous studies, which they classify as either evolutionary studies or CIO role studies, and which they regard as characteristic of the use of particular technologies or particular periods of time. The authors modify Leavitt's well-known organization diagnostic model to describe the factors that shape the role and tasks of CIOs across time, industries, and technologies. They validate the model against interviews with 36 CIOs in six industries, covering the period from the 1960s to the present. The authors also show that the model can be used to categorize prior research findings. They then use the model to describe how technology influences business strategy, and how business strategy and technology shape CIOs' role and tasks and vice versa. The authors find that the modified Leavitt model usefully describes the factors that define CIOs' role and tasks at any particular time in any specific organization, and shows how those tasks change.


2021 ◽  
Vol 376 (1829) ◽  
pp. 20200283
Author(s):  
Katharine Sherratt ◽  
Sam Abbott ◽  
Sophie R. Meakin ◽  
Joel Hellewell ◽  
James D. Munday ◽  
...  

The time-varying reproduction number (Rt: the average number of secondary infections caused by each infected person) may be used to assess changes in transmission potential during an epidemic. While new infections are not usually observed directly, they can be estimated from data. However, data may be delayed and potentially biased. We investigated the sensitivity of Rt estimates to different data sources representing COVID-19 in England, and we explored how this sensitivity could track epidemic dynamics in population sub-groups. We sourced public data on test-positive cases, hospital admissions, and deaths with confirmed COVID-19 in seven regions of England from March through August 2020. We estimated Rt using a model that mapped unobserved infections to each data source. We then compared differences in Rt with the demographic and social context of surveillance data over time. Our estimates of transmission potential varied for each data source, with the relative inconsistency of estimates varying across regions and over time. Rt estimates based on hospital admissions and deaths were more spatio-temporally synchronous with each other than with estimates from all test-positive cases. We found that these differences may be linked to biased representations of sub-populations in each data source, including spatially clustered testing, and outbreaks in hospitals, care homes, and young age groups that reflected the link between age and severity of disease. We highlight that policy makers could better target interventions by considering the source populations of Rt estimates. Further work should clarify the best way to combine and interpret Rt estimates from different data sources based on the desired use. This article is part of the theme issue ‘Modelling that shaped the early COVID-19 pandemic response in the UK’.
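For readers unfamiliar with Rt estimation, the sketch below applies a deliberately simplified, Cori-style renewal estimate (Rt as today's incidence divided by a serial-interval-weighted sum of past incidence) to two made-up surveillance series for the same region. It is not the authors' model, which maps unobserved infections to each data source and handles reporting delays, but it illustrates why feeding the same estimator different data sources yields different Rt trajectories. The serial-interval parameters and counts are assumptions.

```python
# Simplified, Cori-style Rt estimate: Rt ~ I_t / sum_s(I_{t-s} * w_s), with w
# the serial-interval distribution. Illustrative only; NOT the authors' model.
import numpy as np
from scipy.stats import gamma

def serial_interval_pmf(max_days=20, mean=5.0, sd=2.5):
    """Discretised gamma serial interval (assumed parameters, not fitted)."""
    shape = (mean / sd) ** 2
    scale = sd ** 2 / mean
    days = np.arange(1, max_days + 1)
    pmf = gamma.pdf(days, a=shape, scale=scale)   # crude discretisation
    return pmf / pmf.sum()

def estimate_rt(incidence, w):
    """Naive Rt: incidence today over the weighted sum of past incidence."""
    incidence = np.asarray(incidence, dtype=float)
    rt = np.full(len(incidence), np.nan)
    for t in range(1, len(incidence)):
        lags = incidence[max(0, t - len(w)):t][::-1]   # I_{t-1}, I_{t-2}, ...
        denom = np.sum(lags * w[:len(lags)])
        if denom > 0:
            rt[t] = incidence[t] / denom
    return rt

# Illustrative use with two made-up surveillance series for the same region:
cases      = [120, 140, 160, 150, 170, 180, 200, 210, 220, 230, 250, 240]
admissions = [ 10,  12,  11,  14,  13,  15,  16,  15,  17,  18,  19,  18]
w = serial_interval_pmf()
print("Rt from cases:     ", np.round(estimate_rt(cases, w), 2))
print("Rt from admissions:", np.round(estimate_rt(admissions, w), 2))
```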


2019 ◽  
Vol 40 (s1) ◽  
pp. 31-49
Author(s):  
Anja Bechmann

This study investigates the Facebook posting behaviour of 922 posting users over a time span of seven years (from 2007 to 2014), using an innovative combination of survey data and private profile feed post counts obtained through the Facebook Application Programming Interface (API) prior to the changes in 2015. A digital inequality lens is applied to study the effect of socio-demographic characteristics as well as time on posting behaviour. The findings indicate differences, for example in terms of gender and age, but some of this inequality is becoming smaller over time. The data set also shows inequality in the poster ratio in different age groups. Across all the demographic groups, the results show an increase in posting frequency in the time period observed, and limited evidence is found that young age groups have posted less on Facebook in more recent years.
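A minimal sketch of the kind of analysis described, assuming hypothetical column names rather than Bechmann's actual variables: survey demographics are joined to per-user yearly post counts from an API export, and posting frequency and the poster ratio are then compared across groups and years.

```python
# Hedged sketch with hypothetical data: join survey demographics to per-user
# yearly post counts and compare posting behaviour across groups over time.
import pandas as pd

survey = pd.DataFrame({
    "user_id":   [1, 2, 3, 4],
    "gender":    ["f", "m", "f", "m"],
    "age_group": ["18-24", "18-24", "45-54", "45-54"],
})
posts = pd.DataFrame({            # one row per user-year with a post count
    "user_id": [1, 1, 2, 2, 3, 3, 4, 4],
    "year":    [2008, 2013, 2008, 2013, 2008, 2013, 2008, 2013],
    "n_posts": [40, 95, 25, 80, 5, 30, 0, 20],
})

merged = posts.merge(survey, on="user_id")

# Mean yearly posting frequency per demographic group.
freq = merged.groupby(["year", "gender", "age_group"])["n_posts"].mean()
print(freq)

# Poster ratio per age group and year: share of users with at least one post.
poster_ratio = (
    merged.assign(is_poster=merged["n_posts"] > 0)
          .groupby(["year", "age_group"])["is_poster"].mean()
)
print(poster_ratio)
```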


Author(s):  
Michelle Degli Esposti ◽  
David K Humphreys ◽  
Lucy Bowes

Background: Child maltreatment is a major public health problem affecting one quarter of children in England and Wales. Good epidemiological data are needed to establish how many and which children are most at risk, and to evaluate the impact of policies and interventions. However, a comprehensive data source on child maltreatment is currently lacking. Aim: We aimed to create a rich data source on the incidence of Child maltreatment over Time (iCoverT) in England and Wales. Methods: We developed systematic methods to search for and identify administrative data sources that regularly measured child maltreatment. Data sources were investigated and assessed against pre-specified eligibility criteria and a bespoke quality-assessment tool. Relevant data were extracted, digitalised, and harmonised over time. All data and their accompanying documentation were prepared to form an open-access data source: the iCoverT (osf.io/cf7mv). Results: We identified 13 unique sources of administrative data, six of which met our eligibility criteria: Child protection statistics, Children in care, Criminal statistics, Homicide index, Mortality statistics, and NSPCC statistics. Data and documentation were prepared and combined to form the iCoverT, comprising 272 variables and over 43,500 data points, and spanning more than 150 years. A subsequent time series analysis demonstrated the utility of the iCoverT, identifying large overall decreases in child maltreatment from 1858 to 2016 (e.g. a 90% decrease in child homicides, or 2.7 fewer per 100,000 children) but worrying recent increases from 2000 to 2016. Conclusion: We systematically developed a rich data source on child maltreatment in England and Wales. Our methodology overcomes practical obstacles and offers a new approach for harnessing administrative data for research. The resulting data source is a valuable public health surveillance tool, which can be used to monitor national levels of child maltreatment and to evaluate the effectiveness of child protection initiatives.
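The harmonisation step can be pictured with the sketch below: separately digitised administrative series with different column names are reshaped into a single long-format table, from which rate changes over the full time span can be computed. The source names come from the abstract, but the figures are illustrative values back-calculated to be consistent with the quoted 90% decrease (about 2.7 fewer child homicides per 100,000 children), not actual iCoverT data points.

```python
# Hedged sketch of harmonising heterogeneous administrative series into one
# long-format table. Figures are illustrative, not iCoverT values.
import pandas as pd

homicide = pd.DataFrame({"year": [1858, 2016],
                         "child_homicides_per_100k": [3.0, 0.3]})
protection = pd.DataFrame({"Year": [1995, 2016],
                           "children_on_plan": [30000, 50000]})

long_rows = []
for df, source, value_col in [
    (homicide, "Homicide index", "child_homicides_per_100k"),
    (protection.rename(columns={"Year": "year"}),
     "Child protection statistics", "children_on_plan"),
]:
    tidy = df.rename(columns={value_col: "value"})
    tidy["source"] = source
    tidy["variable"] = value_col
    long_rows.append(tidy[["source", "variable", "year", "value"]])

icover_like = pd.concat(long_rows, ignore_index=True)
print(icover_like)

# Percentage change in a rate between the first and last observed year.
h = icover_like.query("variable == 'child_homicides_per_100k'").sort_values("year")
change = (h["value"].iloc[-1] - h["value"].iloc[0]) / h["value"].iloc[0]
print(f"Change in child homicide rate: {change:.0%}")   # -90% with these numbers
```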


Author(s):  
Christopher Hood ◽  
Rozana Himaz

This chapter draws on historical statistics reporting financial outcomes for spending, taxation, debt, and deficit in the UK over a century to (a) identify quantitatively and compare the main fiscal squeeze episodes (i.e. major revenue increases, spending cuts, or both) in terms of type (soft and hard squeezes, spending squeezes, and revenue squeezes), depth, and length; (b) compare these periods of austerity against measures of fiscal consolidation in terms of deficit reduction; and (c) identify economic and financial conditions before and after the various squeezes. It explores the extent to which the identification and classification of squeeze episodes are sensitive to the thresholds set and the data sources used. The chapter identifies major changes over time in the depth and types of squeeze that emerge from this analysis.
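The threshold sensitivity the chapter describes can be illustrated with a small sketch: the same fiscal series yields different lists of squeeze years depending on where the cut-offs for "major" revenue increases or spending cuts are placed. The thresholds, classification rules, and figures below are illustrative assumptions, not the chapter's definitions or data.

```python
# Hedged sketch: classify years into squeeze types under different thresholds,
# showing how the episode list depends on the cut-offs chosen. Data invented.
import pandas as pd

fiscal = pd.DataFrame({
    "year":                [1921, 1922, 1931, 1932, 1977, 1978, 2011, 2012],
    "spending_change_gdp": [-2.5, -1.8, -0.9, -1.2, -1.1, -0.4, -0.8, -1.0],  # % of GDP
    "revenue_change_gdp":  [ 0.3,  0.1,  1.4,  0.9,  0.2,  0.5,  0.6,  0.7],
})

def classify(df, spend_cut=1.0, rev_rise=1.0):
    """Label each year by squeeze type under the given thresholds."""
    out = df.copy()
    out["spending_squeeze"] = out["spending_change_gdp"] <= -spend_cut
    out["revenue_squeeze"] = out["revenue_change_gdp"] >= rev_rise
    out["type"] = out.apply(
        lambda r: "hard (both)" if r.spending_squeeze and r.revenue_squeeze
        else "spending" if r.spending_squeeze
        else "revenue" if r.revenue_squeeze
        else "none",
        axis=1,
    )
    return out[["year", "type"]]

# The same data yields different episode lists under different thresholds.
print(classify(fiscal, spend_cut=1.0, rev_rise=1.0))
print(classify(fiscal, spend_cut=0.5, rev_rise=0.5))
```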


Author(s):  
Catherine E. De Vries

This chapter introduces a benchmark theory of public opinion towards European integration. Rather than relying on generic labels like support or scepticism, the chapter suggests that public opinion towards the EU is both multidimensional and multilevel in nature. People’s attitudes towards Europe are essentially based on a comparison between the benefits of the status quo of membership and those associated with an alternative state, namely one’s country being outside the EU. This comparison is termed the ‘EU differential’. When comparing these benefits, people rely on both their evaluations of the outcomes (policy evaluations) and the system that produces them (regime evaluations). This chapter presents a fine-grained conceptualization of what it means to be an EU supporter or Eurosceptic; it also designs a careful empirical measurement strategy to capture variation, both cross-nationally and over time. The chapter cross-validates these measures against a variety of existing and newly developed data sources.

