Linking Survey and Twitter Data: Informed Consent, Disclosure, Security, and Archiving

Linked survey and Twitter data present an unprecedented opportunity for social scientific analysis, but the ethical implications for such work are complex—requiring a deeper understanding of the nature and composition of Twitter data to fully appreciate the risks of disclosure and harm to participants. In this article, we draw on our experience of three recent linked data studies, briefly discussing the background research on data linkage and the complications around ensuring informed consent. Particular attention is paid to the vast array of data available from Twitter and in what manner it might be disclosive. In light of this, the issues of maintaining security, minimizing risk, archiving, and reuse are applied to linked Twitter and survey data. In conclusion, we reflect on how our ability to collect and work with Twitter data has outpaced our technical understandings of how the data are constituted and observe that understanding one’s data is an essential prerequisite for ensuring best ethical practice.

Who Tweets in the United Kingdom? Profiling the Twitter Population Using the British Social Attitudes Survey 2015

Social Media + Society ◽

10.1177/2056305117698981 ◽

2017 ◽

Vol 3 (1) ◽

pp. 205630511769898 ◽

Cited By ~ 12

Author(s):

Luke Sloan

Keyword(s):

Ground Truth ◽

Social Attitudes ◽

Demographic Characteristics ◽

National Statistics ◽

The United Kingdom ◽

Twitter Data ◽

Scientific Analysis ◽

Twitter Users ◽

Disproportionate Number ◽

Social Scientific

The headache any researcher faces while using Twitter data for social scientific analysis is that we do not know who tweets. In this article, we report on results from the British Social Attitudes Survey (BSA) 2015 on Twitter use. We focus on associations between using Twitter and three demographic characteristics—age, sex, and class (defined here as National Statistics Socio-Economic Classification [NS-SEC]). In addition to this, we compare findings from BSA 2015, treated as ground truth (known characteristics), with previous attempts to map the demographic nature of UK Twitter users using computational methods resulting in demographic proxies. Where appropriate, the datasets are compared with UK Census 2011 data to illustrate that Twitter users are not representative of the wider population. We find that there are a disproportionate number of male Twitter users, in relation to both the Census 2011 and previous proxy estimates; that Twitter users are predominantly young, but there are more older users than previously estimated; and that there are strong class effects associated with Twitter use.

Centre for Health Record Linkage

International Journal for Population Data Science ◽

10.23889/ijpds.v4i2.1142 ◽

2020 ◽

Vol 4 (2) ◽

Cited By ~ 1

Author(s):

Katie Irvine ◽

Rick Hall ◽

Lee Taylor

Keyword(s):

Record Linkage ◽

Linked Data ◽

Best Practice ◽

Data Linkage ◽

New South ◽

Data Driven ◽

Health Record ◽

Data Governance ◽

South Wales ◽

Data Informed

Context: The Centre for Health Record Linkage (CHeReL) was established in 2006 as a dedicated health and human services data linkage facility for two Australian jurisdictions, New South Wales and the geographically-nested Australian Capital Territory. The two jurisdictions have their own Governments and separate Health and Human Service systems. Purpose and Operations: The primary purpose of the CHeReL is to make linked administrative and routinely collected health data available to researchers and government within relevant regulatory and governance frameworks. The CHeReL’s data governance and technical operations draw on international best practice and have been refined by learnings from other data linkage centres. Outcomes: Over twelve years of operation, more than 2,320 unique investigators from 140 institutions have used the CHeReL, producing 615 publications in peer-reviewed literature. A robust pipeline of new development is expected to further amplify the use of linked data for cutting edge medical research and support a vision of data-informed policy and data-driven government services.

Just Because the Data Is There, It Doesn’t Mean It’s Yours to Take

Emerging Library & Information Perspectives ◽

10.5206/elip.v4i1.13554 ◽

2021 ◽

Vol 4 (1) ◽

pp. 34-61

Author(s):

Kate McCandless

Keyword(s):

Informed Consent ◽

Literature Review ◽

Research Ethics ◽

Private Information ◽

Public And Private ◽

Back Seat ◽

Twitter Data ◽

Research Ethics Boards ◽

The University ◽

Data Informed

In research conducted using Twitter data, informed consent has taken a back seat. This literature review examines the perspectives of users, researchers and research ethics boards to provide nuance and context to the issue. Users are generally unaware that their data can be taken for research purposes and that they have agreed to be studied within the platform’s terms of service. This is concerning for researchers and users alike, as it continues to blur the line between public and private information. Users want to be informed when they are being studied. When informed consent is not obtained, researchers are not respecting the data and the humans who created it. If researchers were required to obtain informed consent when engaging with Twitter data, the resulting research would be more ethical and protect everyone involved: the researcher, the user, and the university.

Ethics in Social Research

10.1093/oso/9780198786580.003.0006 ◽

2018 ◽

Author(s):

Steve Bruce

Keyword(s):

Informed Consent ◽

Data Collection ◽

Research Ethics ◽

Social Research ◽

Medical Interventions ◽

Ethical Implications ◽

Social Researcher ◽

Naturally Occurring ◽

The Social ◽

Collection Phase

It is right that social researchers consider the ethical implications of their work, but discussion of research ethics has been distorted by the primacy of the ‘informed consent’ model for policing medical interventions. It is remarkably rare for the data collection phase of social research to be in any sense harmful, and in most cases seeking consent from, say, members of a church congregation would disrupt the naturally occurring phenomena we wish to study. More relevant is the way we report our research. It is in the disparity between how people would like to see themselves described and explained and how the social researcher describes and explains them that we find the greatest potential for ill-feeling, and even here it is slight.

Social Scientific Analysis of Nuclear Weapons

Journal of Conflict Resolution ◽

10.1177/0022002717721389 ◽

2017 ◽

Vol 61 (9) ◽

pp. 1853-1874 ◽

Cited By ~ 3

Author(s):

Erik Gartzke ◽

Matthew Kroenig

Keyword(s):

Nuclear Weapons ◽

Scientific Analysis ◽

Social Scientific

Vasectomy reversal and prostate cancer risk: A multi-centre collaborative demonstration project of the International Population Data Linkage Network

International Journal for Population Data Science ◽

10.23889/ijpds.v3i1.730 ◽

2018 ◽

Vol 3 (1) ◽

Author(s):

James Boyd ◽

Sean Randall ◽

Emma Fuller

Keyword(s):

Prostate Cancer ◽

United Kingdom ◽

Cancer Risk ◽

Linked Data ◽

Data Linkage ◽

Prostate Cancer Risk ◽

Population Data ◽

Demonstration Project ◽

The United Kingdom ◽

Vasectomy Reversal

This first collaborative demonstration project of the International Population Data Linkage Network (IPDLN) has recently been completed. This project collated data from five data linkage centres across Australia, the United Kingdom and Canada to investigate the effect of vasectomy reversal on prostate cancer risk in vasectomized men. We discuss the study and the challenges of organising and analysing multi-centre linked data studies.

Enhancing Joint Replacement Outcomes Through Registry Linkage with National Health Administrative Data in Australia

International Journal for Population Data Science ◽

10.23889/ijpds.v5i5.1576 ◽

2020 ◽

Vol 5 (5) ◽

Author(s):

Katherine Duszynski ◽

Stephen E Graves ◽

Nicole Pratt ◽

Maria Inacio ◽

Richard De Steiger ◽

...

Keyword(s):

Health Service ◽

Administrative Data ◽

Joint Replacement ◽

Linked Data ◽

Data Linkage ◽

Service Utilisation ◽

Prescription Data ◽

National Data ◽

Joint Replacement Surgery ◽

Replacement Surgery

Introduction: Monitoring of joint replacement (JR) data from the Australian Orthopaedic Association National Joint Replacement Registry (AOANJRR) has reduced revision rates and improved surgical practice. Outcome assessment post-arthroplasty is limited, however, to revision (reoperation) surgery and mortality outcomes. The AOANJRR National Data Linkage project seeks to broaden the scope of outcomes investigation in Australia by linking registry and health administrative datasets. Objectives and Approach: Using linked registry and administrative data, the project seeks to describe and quantify national/regional trends and variation in major complications (infection, dislocation, arthrofibrosis, chronic pain, venous thromboembolism, cardiac events), malignancy and health service utilisation (readmissions, emergency encounters and inpatient rehabilitation) following hip, knee and shoulder joint replacement surgery. Evidence will be generated on how these outcomes are associated with and vary according to patient, surgical, implant, hospital and pharmacological factors. As Australia lacks a national identifier, seven linkage agencies are probabilistically linking AOANJRR hip, knee and shoulder replacement data (1999-2017) with 20 datasets. Datasets include government-subsidised health services, procedural and prescription data. Hospital separations and emergency attendance data from Australia’s eight jurisdictions together with national cancer registry and rehabilitation service data are also planned for linkage. Linked data are maintained in a secure remote access computing environment. Results: To date, national Medicare Benefits Schedule, Pharmaceutical Benefits Scheme and the Australian Cancer Database data have been linked with >900,000 AOANJRR patients, representing 607.6 million health service records (1999-2018), 467.7 million prescriptions (2002-2018) and 184,000 cancer records, respectively. Remaining linked data will be available in mid-2020. Some initial summary results across a selected range of studies will be presented. Conclusion / Implications: This national data-linkage program will identify areas for improvement in joint replacement surgery and modifiable risk factors contributing to poor patient outcomes.

Twenty Years of Data Linkage in The Australian Longitudinal Study on Women’s Health

International Journal for Population Data Science ◽

10.23889/ijpds.v5i5.1500 ◽

2020 ◽

Vol 5 (5) ◽

Author(s):

Colleen Loos ◽

Gita Mishra ◽

Annette Dobson ◽

Leigh Tooth

Keyword(s):

Longitudinal Study ◽

Data Collection ◽

Women's Health ◽

Linked Data ◽

Data Linkage ◽

Data Access ◽

National Study ◽

Australian Longitudinal Study ◽

Data Collections

Introduction: Linked health record collections, when combined with large longitudinal surveys, are a rich research resource to inform policy development and clinical practice across multiple sectors. Objectives and Approach: The Australian Longitudinal Study on Women’s Health (ALSWH) is a national study of over 57,000 women in four cohorts. Survey data collection commenced in 1996. Over the past 20 years, ALSWH has also established an extensive data linkage program. The aim of this poster is to provide an overview of ALSWH’s program of regularly updated linked data collections for use in parallel with ongoing surveys, and to demonstrate how data are made widely available to research collaborators. Results: ALSWH surveys collect information on health conditions, ageing, reproductive characteristics, access to health services, lifestyle, and socio-demographic factors. Regularly updated linked national and state administrative data collections add information on health events, health outcomes, diagnoses, treatments, and patterns of service use. ALSWH’s national linked data collections include Medicare Benefits Schedule, Pharmaceutical Benefits Scheme, the National Death Index, the Australian Cancer Database, and the National Aged Care Data Collection. State and Territory hospital collections include Admitted Patients, Emergency Department and Perinatal Data. There are also substudies, such as the Mothers and their Children’s Health Study (MatCH), which involves linkage to children’s educational records. ALSWH has an internal Data Access Committee along with systems and protocols to facilitate collaborative multi-sectoral research using de-identified linked data. Conclusion / Implications: As a large scale Australian longitudinal multi-jurisdictional data linkage and sharing program, ALSWH is a useful model for anyone planning similar research.

Record Linkage Methodology for the Social Data Linkage Environment at Statistics Canada

International Journal for Population Data Science ◽

10.23889/ijpds.v1i1.49 ◽

2017 ◽

Vol 1 (1) ◽

Author(s):

Colin Babyak ◽

Abdelnasser Saidi

Keyword(s):

Computational Complexity ◽

Record Linkage ◽

False Positive ◽

Linked Data ◽

Data Linkage ◽

Social Data ◽

Probabilistic Record Linkage ◽

The Social ◽

Data Source ◽

Statistics Canada

Objectives: The objectives of this talk are to introduce Statistics Canada’s Social Data Linkage Environment (SDLE) and to explain the methodology behind the creation of the central depository and how both deterministic and probabilistic record linkage techniques are used to maintain and expand the environment. Approach: We will start with a brief overview of the SDLE and then continue with a discussion of how both deterministic linkages and probabilistic linkages (using Statistics Canada’s generalized record linkage software, G-Link) have been combined to create and maintain a very large central depository, which can in turn be linked to virtually any social data source for the ultimate end goal of analysis. Results: Although Canada has a population of about 36 million people, the central depository contains some 300 million records to represent them, due to multiple addresses, names, etc. Although this allows for a significant reduction in missing links, it raises the spectre of additional false positive matches and has added computational complexity which we have had to overcome. Conclusion: The combination of deterministic and probabilistic record linkage strategies has been effective in creating the central depository for the SDLE. As more and more data are linked to the environment and we continue to refine our methodology, we can now move on to the ultimate goal of the SDLE, which is to analyze this vast wealth of linked data.
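
As an illustration of how deterministic and probabilistic linkage can be combined, the following is a minimal sketch and not a description of G-Link or of the SDLE's actual methodology. It assumes a hypothetical shared `id` field for the deterministic pass and a Fellegi-Sunter style log-weight score for the probabilistic pass. The field names, the m/u probabilities, and the threshold are all invented for illustration.

```python
import math

# Hypothetical m/u probabilities per field (assumed for illustration only):
# m = P(field agrees | same person), u = P(field agrees | different people).
FIELD_PARAMS = {
    "surname": (0.95, 0.01),
    "birth_year": (0.98, 0.05),
    "postcode": (0.80, 0.02),
}
MATCH_THRESHOLD = 6.0  # assumed cut-off on the summed log2 weight


def match_weight(a, b):
    """Sum Fellegi-Sunter style log2 agreement/disagreement weights."""
    total = 0.0
    for field, (m, u) in FIELD_PARAMS.items():
        if a.get(field) is not None and a.get(field) == b.get(field):
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def link(source, depository):
    """Return {source_index: depository_index} using two passes."""
    links = {}
    by_id = {r["id"]: j for j, r in enumerate(depository) if r.get("id")}
    for i, rec in enumerate(source):
        # Pass 1: deterministic match on an exact shared identifier.
        if rec.get("id") in by_id:
            links[i] = by_id[rec["id"]]
            continue
        # Pass 2: probabilistic match, keeping the best-scoring candidate
        # whose weight reaches the threshold.
        best_j, best_w = None, MATCH_THRESHOLD
        for j, cand in enumerate(depository):
            w = match_weight(rec, cand)
            if w >= best_w:
                best_j, best_w = j, w
        if best_j is not None:
            links[i] = best_j
    return links
```

Lowering the threshold in a sketch like this reduces missed links at the cost of more false positive matches, which is the trade-off the abstract describes. A production system would also use blocking to avoid comparing every pair of records.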

Case Study for Stroke: National Stroke Data Linkage Program

International Journal for Population Data Science ◽

10.23889/ijpds.v5i5.1496 ◽

2020 ◽

Vol 5 (5) ◽

Author(s):

Monique F Kilkenny ◽

Joosup Kim ◽

Lachlan Dalli ◽

Amminadab Eliakundu ◽

Muideen Olaiya

Keyword(s):

Linked Data ◽

Data Linkage ◽

Stroke Care ◽

Aged Care ◽

Public Hospitals ◽

Economic Evaluations ◽

Care Services ◽

Linkage Program ◽

The Impact

Introduction: Stroke is a leading cause of death and disability. Since 2012, our innovative national data linkage program has enabled the successful linkage of data from the Australian Stroke Clinical Registry (AuSCR) with national and state-based datasets to investigate the continuum of stroke care and associated outcomes. Objectives and Approach: Using stroke as a case study, in this symposium we will describe the use of linked data to undertake clinical and economic evaluations and contribute new knowledge for policy and practice. We have undertaken a range of iterative and innovative projects linking the AuSCR (used now in >80 public hospitals across Australia with follow-up survey of patients between 90 and 180 days) with various administrative datasets. Linkages with the National Death Index, inpatient admissions and emergency presentations, Pharmaceutical Benefits Scheme (PBS), Medicare Benefits Schedule (MBS), Aged Care services, Ambulance Victoria, Australian Rehabilitation Outcomes Centre and general practice network datasets (POLAR) have been achieved. Results: The symposium will provide case studies and results from four data linkage projects involving the AuSCR: 1) Stroke123 (NHMRC: #1034415), a study to investigate the impact of quality of acute care on admission/emergency presentations and survival; 2) PRECISE (NHMRC: #1141848), a study to evaluate models of primary care involving linkages with PBS/MBS, aged care services and admissions/emergency data; 3) AMBULANCE, a study to investigate how pre-hospital care affects acute stroke care involving linkages with the ambulance and admissions/emergency datasets; and 4) POLAR, a study to understand the long-term management of stroke involving linkages with primary health data. Conclusion / Implications: The National Stroke Data Linkage Program has been visionary and remains highly contemporary in the field of linked data. A unique feature of this program is the active participation of clinicians and policy-makers to ensure the evidence generated has direct benefits for accelerating change in practice and informing policy.
