Addressing Beacon re-identification attacks: quantification and mitigation of privacy risks

2017 ◽  
Vol 24 (4) ◽  
pp. 799-805 ◽  
Author(s):  
Jean Louis Raisaro ◽  
Florian Tramèr ◽  
Zhanglong Ji ◽  
Diyue Bu ◽  
Yongan Zhao ◽  
...  

Abstract The Global Alliance for Genomics and Health (GA4GH) created the Beacon Project as a means of testing the willingness of data holders to share genetic data in the simplest technical context—a query for the presence of a specified nucleotide at a given position within a chromosome. Each participating site (or “beacon”) is responsible for assuring that genomic data are exposed through the Beacon service only with the permission of the individual to whom the data pertains and in accordance with the GA4GH policy and standards. While recognizing the inference risks associated with large-scale data aggregation, and the fact that some beacons contain sensitive phenotypic associations that increase privacy risk, the GA4GH adjudged the risk of re-identification based on the binary yes/no allele-presence query responses as acceptable. However, recent work demonstrated that, given a beacon with specific characteristics (including relatively small sample size and an adversary who possesses an individual’s whole genome sequence), the individual’s membership in a beacon can be inferred through repeated queries for variants present in the individual’s genome. In this paper, we propose three practical strategies for reducing re-identification risks in beacons. The first two strategies manipulate the beacon such that the presence of rare alleles is obscured; the third strategy budgets the number of accesses per user for each individual genome. Using a beacon containing data from the 1000 Genomes Project, we demonstrate that the proposed strategies can effectively reduce re-identification risk in beacon-like datasets.
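
A toy Python sketch makes the query model and the two kinds of mitigation concrete. It assumes a hypothetical in-memory data model, a mapping from individual IDs to sets of (chromosome, position, allele) tuples. It also uses two simplified stand-ins for the strategies above. The first is a hypothetical `min_carriers` threshold that answers "no" for alleles carried by too few individuals, which is one simple way of obscuring rare alleles. The second is a hypothetical per-user, per-individual budget that stops an individual's genome from contributing to a user's answers once that budget is spent. These are illustrations under stated assumptions, not the paper's exact algorithms.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import DefaultDict, Dict, Set, Tuple

# Hypothetical variant encoding: (chromosome, position, allele).
Variant = Tuple[str, int, str]


@dataclass
class ToyBeacon:
    """Toy beacon answering yes/no allele-presence queries (illustrative only)."""

    genomes: Dict[str, Set[Variant]]  # individual ID -> variants that individual carries
    min_carriers: int = 1  # hypothetical: report "no" for alleles with fewer carriers
    budget_per_individual: int = 10  # hypothetical per-user, per-individual query budget
    _spent: DefaultDict[Tuple[str, str], int] = field(
        default_factory=lambda: defaultdict(int), init=False, repr=False
    )

    def query(self, user: str, variant: Variant) -> bool:
        # Only individuals whose budget for this user is not exhausted contribute.
        active = [
            ind for ind in self.genomes
            if self._spent[(user, ind)] < self.budget_per_individual
        ]
        carriers = [ind for ind in active if variant in self.genomes[ind]]
        # Absent alleles and alleles rarer than the threshold both answer "no".
        if len(carriers) < max(1, self.min_carriers):
            return False
        # A "yes" answer leaks information about every carrier, so charge each one.
        for ind in carriers:
            self._spent[(user, ind)] += 1
        return True
```

With `min_carriers=1` and a large budget, the class behaves like a plain beacon. Raising either parameter trades answer completeness (some true "yes" answers become "no") for lower re-identification risk.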

2020 ◽  
Vol 33 (3-4) ◽  
pp. 160-174 ◽  
Author(s):  
Jacy L. Young

In the late 19th century, the questionnaire was one means of taking the case study into the multitudes. This article engages with Forrester’s idea of thinking in cases as a means of interrogating questionnaire-based research in early American psychology. Psychologists explicitly framed questionnaire research as a practice involving both natural historical and statistical forms of scientific reasoning. At the same time, questionnaire projects failed to realize the latter aspiration of synthesizing masses of collected data into a coherent whole. Difficulties in managing the mass of descriptive information that questionnaires generated ensured the continuing presence of individuals in the results of this research, as individual cases were excerpted and discussed alongside a cast of others. As a consequence, questionnaire research embodied an amalgam of case, natural historical, and statistical thinking. Ultimately, large-scale data collection undertaken with questionnaires failed in its aim to construct composite exemplars or ‘types’ of particular kinds of individuals, that is, to produce the singular from the multitudes.


2021 ◽  
Author(s):  
Angeline Tsui ◽  
Virginia A. Marchman ◽  
Michael C. Frank

Young children typically begin learning words during their first two years of life. At the same time, they vary substantially in their language learning. These similarities and differences call for a quantitative theory that can predict and explain which aspects of early language are consistent and which are variable. However, current developmental research practices limit our ability to build such quantitative theories because of small sample sizes and challenges related to reproducibility and replicability. In this chapter, we suggest that three approaches (meta-analysis, multi-site collaborations, and secondary data aggregation) can together address some of the limitations of current developmental research. We review the strengths and limitations of each approach and end by discussing the potential impacts of combining them.


2020 ◽  
Vol 8 (3) ◽  
pp. 305-319 ◽  
Author(s):  
Dániel Hegedűs

The web 2.0 phenomenon and social media have, without question, reshaped our everyday experiences. The changes they have generated affect how we consume, communicate and present ourselves, to name just a few aspects of life, and they have also opened up new perspectives for sociology. Though many social practices persist in a somewhat altered form, brand-new types of entities have emerged on different social media platforms; one of them is the video blogger. These actors have gained great visibility through so-called micro-celebrity practices and have become potential large-scale distributors of ideas, values and knowledge. Celebrities, in this case micro-celebrities (video bloggers), may disseminate such cognitive patterns through their constructed discourse, which is objectified in the online space through a peculiar digital face (a social media profile) where fans can react, share and comment according to the affordances of the digital space. Most importantly, all of these interactions are accessible to scholars examining the fan and celebrity practices of our era. This research attempts to reconstruct these discursive interactions on the Facebook pages of ten top Hungarian video bloggers. All findings are based on large-scale data collection using the Netvizz application. In interpreting the results, a further consideration was that celebrity discourses may act as a sort of disciplinary force in (post)modern society, normalizing the individual to some extent by providing adequate schemas of attitude, mentality and ways of consumption.


2016 ◽  
Vol 55 (03) ◽  
pp. 284-291
Author(s):  
Junghyun Park ◽  
Seokjoon Yoon ◽  
Minki Kim

Summary
Background: Sophisticated anti-fraud systems for the healthcare sector have been built on several statistical methods. Although existing methods have been developed to detect fraud in the healthcare sector, these algorithms consume considerable time and cost and lack a theoretical basis for handling large-scale data. Objectives: Based on mathematical theory, this study proposes a new approach to using Benford’s Law in which we closely examine individual-level data to identify specific fees for in-depth analysis. Methods: We extended the mathematical theory to demonstrate how large-scale data conform to Benford’s Law. We then empirically tested its applicability using actual large-scale healthcare data from Korea’s Health Insurance Review and Assessment (HIRA) National Patient Sample (NPS). To test the conformity of the large-scale data to Benford’s Law, we used the mean absolute deviation (MAD) statistic. Results: We conducted our study on 32 diseases, comprising 25 representative diseases and 7 DRG-regulated diseases. An empirical test on the 25 representative diseases showed the applicability of Benford’s Law to large-scale data in the healthcare industry. For the seven DRG-regulated diseases, we examined individual-level data to identify specific fees for in-depth analysis. Across the eight categories of medical costs, we assessed the strength of certain irregularities based on the details of each DRG-regulated disease. Conclusions: Using the degree of abnormality, we propose priority actions for government health departments and private insurance institutions to bring unnecessary medical expenses under control. However, detecting deviations from Benford’s Law at conventional significance levels requires relatively high contamination ratios.
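
The MAD test mentioned above can be sketched in a few lines of Python. This assumes the common first-digit form, MAD = (1/9) Σ_d |A_d - E_d|, where A_d is the observed proportion of values with leading digit d and E_d = log10(1 + 1/d) is the Benford proportion. The paper may use a different digit test (for example, first-two digits), and the function names here are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, Optional


def first_digit(x: float) -> Optional[int]:
    """Leading significant digit of x, or None for zero."""
    if x == 0:
        return None
    # Scientific notation puts the leading significant digit first, e.g. 0.0123 -> "1.23...e-02".
    return int(f"{abs(x):.15e}"[0])


def benford_expected(d: int) -> float:
    """Benford first-digit probability log10(1 + 1/d) for d in 1..9."""
    return math.log10(1 + 1 / d)


def benford_mad(values: Iterable[float]) -> float:
    """Mean absolute deviation between observed and Benford first-digit proportions."""
    digits = [d for d in map(first_digit, values) if d is not None]
    if not digits:
        raise ValueError("need at least one nonzero value")
    counts = Counter(digits)
    n = len(digits)
    return sum(abs(counts[d] / n - benford_expected(d)) for d in range(1, 10)) / 9
```

For example, `benford_mad([2 ** k for k in range(1, 1001)])` returns a value close to zero, because powers of 2 follow Benford's Law closely. A dataset whose values all start with 1 gives a much larger MAD. Larger values indicate stronger departures from the expected digit distribution.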


Parasitology ◽  
2005 ◽  
Vol 132 (3) ◽  
pp. 331-338 ◽  
Author(s):  
L. K. SILVA ◽  
S. LIU ◽  
R. E. BLANTON

Human parasites are often distributed in metapopulations, which makes random sampling for genetic epidemiology difficult. The typical approach to sampling Schistosoma mansoni involves laboratory passage to obtain individual worms, which results in small sample sizes and selection bias. By contrast, the naturally pooled samples from egg output in stool or urine directly represent the genetic composition of current populations. To test whether pooled samples could be used to estimate population allele frequencies, DNA from individual cloned parasites was pooled and amplified by PCR for 7 microsatellites. In polyacrylamide gel analysis, the relative band intensities of the products from the major alleles in the pooled samples differed by 0–6% from the summed intensities of the individual clones (mean = 2.1% ± 2.1% S.D.). The number of PCR cycles (25–40) did not influence the accuracy of the estimate. Varying the frequency of one allele in pooled samples from 32% to 69% likewise did not affect accuracy. Allele frequency estimates from aggregate samples such as eggs will be a better foundation for studies of parasite population dynamics as well as the basis for large-scale association studies of host and parasite characteristics.
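
The comparison described above comes down to simple arithmetic on band intensities, sketched here in Python. The sketch makes three assumptions. Each band's intensity is taken to be proportional to the amount of that allele. The reported differences are treated as absolute differences, in percentage points, between relative intensities. Intensities for one microsatellite are supplied as an allele-to-intensity mapping. The function names are hypothetical.

```python
from typing import Dict, Iterable


def relative_intensities(bands: Dict[str, float]) -> Dict[str, float]:
    """Normalize band intensities for one microsatellite so they sum to 1."""
    total = sum(bands.values())
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    return {allele: value / total for allele, value in bands.items()}


def pooled_vs_summed(
    pool: Dict[str, float], clones: Iterable[Dict[str, float]]
) -> Dict[str, float]:
    """Per-allele difference, in percentage points, between the pooled sample's
    relative intensities and those obtained by summing the individual clones."""
    summed: Dict[str, float] = {}
    for clone in clones:
        for allele, value in clone.items():
            summed[allele] = summed.get(allele, 0.0) + value
    expected = relative_intensities(summed)  # frequencies implied by the clones
    observed = relative_intensities(pool)    # frequencies estimated from the pool
    alleles = set(expected) | set(observed)
    return {a: 100 * abs(observed.get(a, 0.0) - expected.get(a, 0.0)) for a in alleles}
```

For example, `pooled_vs_summed({"a": 70, "b": 30}, [{"a": 100}, {"a": 50, "b": 50}])` gives a difference of about 5 percentage points for each allele (up to floating-point rounding). The clones imply frequencies of 75% and 25%, while the pool shows 70% and 30%.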


2013 ◽  
Vol 26 (20) ◽  
pp. 7957-7965 ◽  
Author(s):  
Timothy DelSole ◽  
Liwei Jia ◽  
Michael K. Tippett

Abstract This paper proposes a new approach to linearly combining multimodel forecasts, called scale-selective ridge regression, which ensures that the weighting coefficients satisfy certain smoothness constraints. The smoothness constraint reflects the “prior assumption” that seasonally predictable patterns tend to be large scale. In the absence of a smoothness constraint, regression methods typically produce noisy weights and hence noisy predictions. Constraining the weights to be smooth ensures that the multimodel combination is no less smooth than the individual model forecasts. The proposed method is equivalent to minimizing a cost function comprising the familiar mean square error plus a “penalty function” that penalizes weights with large spatial gradients. The method reduces to pointwise ridge regression for a suitable choice of constraint. The method is tested using the Ensemble-Based Predictions of Climate Changes and Their Impacts (ENSEMBLES) hindcast dataset during 1960–2005. The cross-validated skill of the proposed forecast method is shown to be larger than the skill of either ordinary least squares or pointwise ridge regression, although the significance of this difference is difficult to test owing to the small sample size. The model weights derived from the method are much smoother than those obtained from ordinary least squares or pointwise ridge regression. Interestingly, regressions in which the weights are completely independent of space give comparable overall skill. The scale-selective ridge is numerically more intensive than pointwise methods since the solution requires solving equations that couple all grid points together.
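
The cost function described above can be illustrated with a small NumPy sketch. It assumes a simplified 1-D grid in which neighbouring grid points are consecutive indices. The data term is squared error summed over grid points, and the penalty is λ Σ_m ||D w_m||², where w_m holds model m's weights across the grid and D is a first-difference operator. Solving the resulting normal equations couples all grid points, which mirrors the computational cost noted above. Replacing DᵀD with the identity recovers pointwise ridge regression. The function names and the `smooth` flag are hypothetical, and the scaling of `lam` need not match the paper's.

```python
import numpy as np


def difference_operator(n_grid: int) -> np.ndarray:
    """First-difference matrix D, shape (n_grid - 1, n_grid), on a 1-D grid."""
    return np.diff(np.eye(n_grid), axis=0)


def scale_selective_ridge(
    X: np.ndarray, y: np.ndarray, lam: float, smooth: bool = True
) -> np.ndarray:
    """Combine M model forecasts at G grid points with spatially penalized weights.

    X: (G, T, M) forecasts; y: (G, T) observations. Returns weights W of shape (G, M).
    Minimizes sum_g ||y_g - X_g W_g||^2 + lam * sum_m W[:, m]^T P W[:, m], where
    P = D^T D (smooth=True) or the identity (smooth=False, i.e. pointwise ridge).
    """
    G, _, M = X.shape
    A = np.zeros((G * M, G * M))
    b = np.zeros(G * M)
    for g in range(G):
        s = slice(g * M, (g + 1) * M)  # weights of grid point g occupy this block
        A[s, s] = X[g].T @ X[g]
        b[s] = X[g].T @ y[g]
    if smooth:
        D = difference_operator(G)
        P = D.T @ D  # penalizes weight differences between neighbouring grid points
    else:
        P = np.eye(G)  # penalizes weight magnitude at each grid point separately
    # kron(P, I_M) applies the same spatial penalty to every model's weight field;
    # with smooth=True it couples all grid points in a single linear system.
    A += lam * np.kron(P, np.eye(M))
    return np.linalg.solve(A, b).reshape(G, M)
```

Setting `lam=0` reproduces per-grid-point ordinary least squares. A very large `lam` with `smooth=True` pushes each model's weights toward a constant across the grid, which corresponds to the space-independent weights mentioned in the abstract.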


2008 ◽  
Vol 31 (4) ◽  
pp. 19
Author(s):  
I Pasic ◽  
A Shlien ◽  
A Novokmet ◽  
C Zhang ◽  
U Tabori ◽  
...  

Introduction: Osteosarcoma (OS), a frequent Li-Fraumeni syndrome (LFS)-associated neoplasm, is a common bone malignancy of children and adolescents. Sporadic OS is also characterized by a young age of onset and high genomic instability, suggesting a genetic contribution to the disease. This study examined the contribution of copy number variants (CNVs), novel DNA structural variation elements, to OS susceptibility. Given our finding of excessive constitutional DNA CNV in LFS patients, often coinciding with cancer-related genes, we hypothesized that constitutional CNV may also provide clues about the aetiology of LFS-related sporadic neoplasms such as OS. Methods: CNV in blood DNA of 26 patients with sporadic OS was compared with that of 263 normal control samples from the International HapMap project, as well as 62 local controls. DNA was hybridized to Affymetrix genome-wide human SNP array 6.0 and analysed with Partek Genomic Suite. Results: There was no detectable difference in average number of CNVs, CNV length, or total structural variation (product of average CNV number and length) between individuals with OS and controls. Although these data are preliminary (small sample size), they argue against the presence of constitutional genomic instability in individuals with sporadic OS. Conclusion: We found that the majority of tumours from patients with sporadic OS show copy number (CN) loss at chr3q13.31, raising the possibility that chr3q13.31 may represent a “driver” region in OS aetiology. In at least one OS tumour displaying CN loss at chr3q13.31, we demonstrate decreased expression of a known tumour suppressor gene located at chr3q13.31. We are investigating the role of chr3q13.31 in the development of OS.


2009 ◽  
Vol 28 (11) ◽  
pp. 2737-2740
Author(s):  
Xiao ZHANG ◽  
Shan WANG ◽  
Na LIAN
