Research note: Examining potential bias in large-scale censored data

2021 ◽  
Author(s):  
Jennifer Allen ◽  
Markus Mobius ◽  
David M. Rothschild ◽  
Duncan J. Watts

We examine potential bias in Facebook’s 10-trillion-cell URLs dataset, consisting of URLs shared on its platform and their engagement metrics. Despite the unprecedented size of the dataset, it was altered to protect user privacy in two ways: 1) by adding differentially private noise to engagement counts, and 2) by censoring the data with a 100-public-share threshold for a URL’s inclusion. To understand how these alterations affect conclusions drawn from the data, we estimate the prevalence of fake news in the massive, censored URLs dataset and compare it to an estimate from a smaller, representative dataset. We show that censoring can substantially alter the conclusions drawn from the Facebook dataset. Because of the 100-public-share threshold, descriptive statistics from the Facebook URLs dataset overestimate the share of fake news and news overall by as much as 4X. We conclude with more general implications for censoring data.
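The censoring mechanism described above is easy to illustrate: if fake-news URLs accumulate public shares at a different rate than other URLs, truncating the dataset at 100 public shares skews any prevalence estimate computed from it. A minimal simulation of this effect (all numbers here, including the share-count distributions, are illustrative assumptions, not figures from the paper):

```python
import random

random.seed(0)

# Hypothetical population of URLs: 2% are "fake news". Assume fake-news
# URLs tend to accrue more public shares than other URLs (an illustrative
# assumption standing in for whatever the true engagement pattern is).
urls = []
for _ in range(100_000):
    fake = random.random() < 0.02
    mean_shares = 80 if fake else 20
    shares = random.expovariate(1 / mean_shares)
    urls.append((fake, shares))

def fake_share(pop):
    """Fraction of URLs in the population that are fake news."""
    return sum(f for f, _ in pop) / len(pop)

full = fake_share(urls)
# Apply the 100-public-share inclusion threshold, as in the dataset.
censored_pop = [u for u in urls if u[1] >= 100]
censored = fake_share(censored_pop)
print(f"full: {full:.3f}  censored: {censored:.3f}  ratio: {censored / full:.1f}x")
```

Because the threshold keeps a far larger fraction of the heavier-tailed group, the censored estimate of fake-news prevalence lands well above the true population value, which is the direction of bias the abstract reports.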

2020 ◽  
Vol 29 (3S) ◽  
pp. 638-647 ◽  
Author(s):  
Janine F. J. Meijerink ◽  
Marieke Pronk ◽  
Sophia E. Kramer

Purpose: The SUpport PRogram (SUPR) study was carried out in the context of a private-academic partnership and is the first study to evaluate, on a large scale in a hearing aid dispensing setting, the long-term effects of a communication program (SUPR) for older hearing aid users and their communication partners. The purpose of this research note is to reflect on the lessons learned during the development, implementation, and evaluation phases of the SUPR project.
Procedure: This research note describes the procedures followed during the different phases of the SUPR project and critically discusses the strengths and weaknesses of the approach taken.
Conclusion: This research note may provide researchers and intervention developers with useful insights into how aural rehabilitation interventions such as SUPR can be developed by incorporating the needs of the different stakeholders, evaluated by using a robust research design (including a large sample size and a longer-term follow-up assessment), and implemented widely by collaborating with a private partner (a hearing aid dispensing practice chain).


2020 ◽  
Author(s):  
Richard Rogers

Ushering in the contemporary ‘fake news’ crisis, Craig Silverman of Buzzfeed News reported that fake news outperformed mainstream news on Facebook in the three months prior to the 2016 US presidential elections. Here the report’s methods and findings are revisited for 2020. Examining Facebook user engagement with election-related stories, and applying Silverman’s classification of fake news, it was found that the problem has worsened, implying that the measures undertaken to date have not remedied the issue. If, however, one were to classify ‘fake news’ more strictly, as Facebook and certain media organizations do with the notion of ‘false news’, the scale of the problem shrinks. A smaller-scale problem could imply a greater role for fact-checkers (rather than deferring to mass-scale content moderation), while a larger one could lead to the further politicisation of source adjudication, where broadly labelling particular sources as ‘fake’, ‘problematic’, and/or ‘junk’ results in backlash.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Vedran Sekara ◽  
Laura Alessandretti ◽  
Enys Mones ◽  
Håkan Jonsson

Large-scale collection of human behavioural data by companies raises serious privacy concerns. We show that behaviour captured in the form of application usage data collected from smartphones is highly unique, even in large datasets encompassing millions of individuals. This makes behaviour-based re-identification of users across datasets possible. We study 12 months of data from 3.5 million people in 33 countries and show that although four apps are enough to uniquely re-identify 91.2% of individuals using a simple strategy based on public information, there are considerable seasonal and cultural variations in re-identification rates. We find that people have more unique app-fingerprints during summer months, making them easier to re-identify. Further, we find significant variations in uniqueness across countries, and reveal that American users are the easiest to re-identify, while Finns have the least unique app-fingerprints. We show that differences across countries can largely be explained by two characteristics of the country-specific app-ecosystems: the popularity distribution and the size of app-fingerprints. Our work highlights problems with current policies intended to protect user privacy and emphasizes that policies cannot be ported directly between countries. We anticipate this will nuance the discussion around re-identifiability in digital datasets and improve digital privacy.
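The core re-identification test can be sketched in a few lines: draw users from a skewed app-popularity distribution, take k = 4 apps from a target's fingerprint, and count how many users in the dataset contain that combination. Everything below (the number of apps, the popularity curve, the dataset size) is an illustrative assumption, not the paper's data:

```python
import random

random.seed(1)

APPS = [f"app{i}" for i in range(50)]

def sample_user():
    # Skewed popularity: lower-index apps are installed far more often,
    # a stand-in for the country-specific popularity distributions.
    return frozenset(a for i, a in enumerate(APPS) if random.random() < 1 / (i + 2))

users = [sample_user() for _ in range(1000)]

def unique_fraction(users, k=4):
    """Fraction of users uniquely matched by k apps from their own fingerprint."""
    hits = 0
    for u in users:
        if len(u) < k:
            continue  # fingerprint too small to probe with k apps
        probe = frozenset(random.sample(sorted(u), k))
        matches = sum(1 for v in users if probe <= v)
        if matches == 1:
            hits += 1
    return hits / len(users)

print(f"uniquely re-identified with 4 apps: {unique_fraction(users):.1%}")
```

Varying the popularity exponent or the typical fingerprint size in this toy model moves the re-identification rate, which is the paper's explanation for the cross-country differences.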


2021 ◽  
Vol 2021 ◽  
pp. 1-14
Author(s):  
Xiaofeng Wu ◽  
Fangyuan Ren ◽  
Yiming Li ◽  
Zhenwei Chen ◽  
Xiaoling Tao

With the rapid development of Internet of Things (IoT) technology, it has been widely used in various fields. IoT devices, acting as information collection units, can be combined with an information processing and storage unit composed of multiple servers to build an information management system. However, the large amount of sensitive data that IoT devices transmit through the system over real-world wireless networks raises a series of security issues, and the system becomes inefficient when a large number of devices connect concurrently. If each device is authenticated individually, the authentication overhead is huge and the network burden excessive. To address these problems, we propose an efficient authentication protocol for IoT devices in information management systems. In the proposed scheme, aggregated certificateless signcryption is used to complete mutual authentication and encrypted transmission of data, and a cloud server is introduced to ensure service continuity and stability. The scheme is suitable for scenarios in which large-scale IoT terminal devices connect to the information management system simultaneously. It not only reduces the authentication overhead but also ensures user privacy and data integrity. The experimental results and security analysis indicate that the proposed scheme is suitable for information management systems.
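The overhead argument is the key idea: verifying n devices one by one costs n separate checks, whereas an aggregated signature admits a single combined check. The paper's aggregated certificateless signcryption is not reproduced here; as a stand-in, this is a toy Schnorr-style batch verification over a deliberately tiny (and therefore insecure) group, which shows the shape of the aggregation:

```python
import hashlib
import secrets

# Toy Schnorr signatures in the order-Q subgroup of Z_P^*, P = 2Q + 1.
# Parameters are tiny and insecure; this only illustrates aggregation.
P, Q, G = 2039, 1019, 4  # G = 2^2 generates the order-Q subgroup

def h(R: int, msg: bytes) -> int:
    return int.from_bytes(hashlib.sha256(R.to_bytes(2, "big") + msg).digest(), "big") % Q

def keygen():
    x = secrets.randbelow(Q - 1) + 1
    return x, pow(G, x, P)  # private x, public y = G^x mod P

def sign(x: int, msg: bytes):
    k = secrets.randbelow(Q - 1) + 1
    R = pow(G, k, P)
    s = (k + h(R, msg) * x) % Q
    return R, s

def batch_verify(pubs, msgs, sigs) -> bool:
    # One check for n devices: G^(sum s_i) == prod R_i * y_i^e_i (mod P).
    # A production batch verifier must weight each term with a random
    # scalar to block cancellation forgeries; omitted for brevity.
    lhs = pow(G, sum(s for _, s in sigs) % Q, P)
    rhs = 1
    for y, msg, (R, _) in zip(pubs, msgs, sigs):
        rhs = rhs * R % P * pow(y, h(R, msg), P) % P
    return lhs == rhs

keys = [keygen() for _ in range(5)]
msgs = [f"device-{i}".encode() for i in range(5)]
sigs = [sign(x, m) for (x, _), m in zip(keys, msgs)]
print(batch_verify([y for _, y in keys], msgs, sigs))  # True
```

The single modular exponentiation on the left replaces one full verification per device, which is the cost saving the abstract claims for the aggregated scheme.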


2019 ◽  
Vol IV (II) ◽  
pp. 1-6
Author(s):  
Mark Perkins

The huge proliferation of textual (and other) data in digital and organisational sources has led to new techniques of text analysis. The potential thereby unleashed may be underpinned by further theoretical developments of the theory of Discourse Stream Analysis (DSA), as presented here. These include the notion of change in the discourse stream in terms of discourse stream fronts, linguistic elements evolving in real time, and notions of time itself in terms of relative speed, subject orientation, and perception. Big data has also given rise to fake news, the manipulation of messages on a large scale. Fake news is conveyed in fake discourse streams and has led to a new field of description and analysis.


Author(s):  
Dilip Kumar Sharma ◽  
Sonal Garg

Spotting fake news is a critical problem nowadays, and social media are largely responsible for propagating it. Fake news propagated over digital platforms generates confusion and induces biased perspectives in people, so detecting misinformation on these platforms is essential to mitigate its adverse impact. Many approaches have been implemented in recent years, yet despite this productive work, fake news identification still poses many challenges due to the lack of a comprehensive, publicly available benchmark dataset; in particular, there is no large-scale dataset consisting of Indian news only. This paper therefore presents the IFND (Indian Fake News Dataset). The dataset consists of both text and images, and the majority of its content covers events from 2013 to 2021. Content is scraped using the Parsehub tool. To increase the amount of fake news in the dataset, an intelligent augmentation algorithm is used to generate meaningful fake news statements. The latent Dirichlet allocation (LDA) technique is employed for topic modelling to assign categories to news statements. Various machine learning and deep learning classifiers are applied to the text and image modalities to evaluate the proposed IFND dataset, and a multi-modal approach that considers both textual and visual features for fake news detection is also proposed. The proposed IFND dataset achieved satisfactory results. This study affirms that the availability of such a large dataset can stimulate research on this laborious problem and lead to better prediction models.


2019 ◽  
Vol 34 (3) ◽  
pp. 654-669 ◽  
Author(s):  
Michele Di Maio ◽  
Nathan Fiala

During survey data collection, respondents’ answers may be influenced by the behavior and characteristics of the enumerator, the so-called enumerator effect. Using a large-scale experiment in Uganda that randomly pairs enumerators and respondents, the study explores the types of questions for which the enumerator effect may exist. The effect is found to be minimal for many questions but large for political preference questions, where it can account for over 30 percent of the variation in responses. The study then explores which enumerator characteristics, and which combinations of enumerator and respondent characteristics, could account for this effect. Finally, the conclusion provides practical suggestions on how to minimize enumerator effects, and the potential bias they introduce, in various types of data collection.
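The "share of variation" statistic is a between/within variance decomposition: with enumerators randomly assigned, the fraction of total response variance attributable to enumerator identity is the between-enumerator sum of squares over the total sum of squares. A simulated sketch (the enumerator effect sizes and noise levels are assumed for illustration, not taken from the study):

```python
import random

random.seed(4)

# Each enumerator shifts answers to a sensitive question by a fixed
# amount (their "effect"); respondents add individual noise on top.
enumerators = {e: random.gauss(0, 1.0) for e in range(20)}
data = []  # (enumerator id, response)
for e, effect in enumerators.items():
    for _ in range(50):  # 50 randomly assigned respondents each
        data.append((e, effect + random.gauss(0, 1.5)))

grand = sum(y for _, y in data) / len(data)
ss_total = sum((y - grand) ** 2 for _, y in data)
ss_between = 0.0
for e in enumerators:
    ys = [y for ee, y in data if ee == e]
    mean_e = sum(ys) / len(ys)
    ss_between += len(ys) * (mean_e - grand) ** 2
print(f"share of variance from enumerators: {ss_between / ss_total:.0%}")
```

Shrinking the assumed enumerator effect toward zero drives this share toward zero, which is the pattern the study reports for non-political questions.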

