On the Privacy and Utility Properties of Triple Matrix-Masking

2020
Vol 10 (2)
Author(s): Aidong Adam Ding, Guanhong Miao, Samuel Shangwu Wu

Privacy protection is an important requirement in many statistical studies. A recently proposed data collection method, triple matrix-masking, retains exact summary statistics without exposing the raw data at any point in the process. In this paper, we provide a theoretical formulation and proofs showing that a modified version of the procedure is strong collection obfuscating: no party in the data collection process is able to gain knowledge of the individual-level data, even with some partially masked data information in addition to the publicly published data. This provides a theoretical foundation for using such a procedure to collect masked data that allows exact statistical inference for linear models, while preserving a well-defined notion of privacy protection for each individual participant in the study. This paper fits into a line of work tackling the problem of how to create useful synthetic data without having a trustworthy data aggregator. We achieve this by splitting the trust between two parties, the “masking service provider” and the “data collector.”
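The claim that masking can retain exact summary statistics rests on a basic linear-algebra fact, sketched below under simplifying assumptions. This is not the authors' three-stage protocol: a single party does all the masking here, which is exactly what the two-party design avoids, and the helper names and simulated data are hypothetical. Left-multiplying the data by a random orthogonal matrix `A` scrambles every row, yet leaves `X^T X`, `X^T y` and `y^T y` unchanged, so least-squares estimates computed from the masked data match those from the raw data.

```python
import numpy as np

def random_orthogonal(n, rng):
    """Draw a random n x n orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    # Flip column signs by sign(diag(r)) so the draw is Haar-uniform; q stays orthogonal either way.
    return q * np.sign(np.diag(r))

def ols(X, y):
    """Ordinary least-squares coefficients."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.standard_normal((n, p - 1))])  # hypothetical design
y = X @ np.array([1.0, 2.0, -0.5]) + rng.standard_normal(n)

A = random_orthogonal(n, rng)
X_masked, y_masked = A @ X, A @ y  # masked rows no longer correspond to individuals

beta_raw = ols(X, y)
beta_masked = ols(X_masked, y_masked)
print(np.allclose(X.T @ X, X_masked.T @ X_masked))  # True: A^T A = I
print(np.allclose(beta_raw, beta_masked))           # True: identical OLS fit
```

Because the residual sum of squares is also invariant under `A`, standard errors and test statistics for the linear model are preserved as well, not just the point estimates.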

Author(s): Yan Wang, Feng Hao, Yunxia Liu

Population change and environmental degradation have become two of the most pressing issues for sustainable development in the contemporary world, while the effect of population aging on pro-environmental behavior remains controversial. In this paper, we examine the effects of individual and population aging on pro-environmental behavior through multilevel analyses of cross-national data from 31 countries. Hierarchical linear models with random intercepts are employed to analyze the data. The findings reveal a positive relationship between aging and pro-environmental behavior. At the individual level, older people are more likely to engage in pro-environmental behavior (b = 0.052, p < 0.001), and at the national level, living in a country with a greater share of older persons encourages individuals to behave sustainably (b = 0.023, p < 0.01). We also found that older people are more environmentally active in aging societies. The findings imply that human longevity may offer opportunities for improving the natural environment.


2021
pp. 003329412110268
Author(s): Jaime Ballard, Adeya Richmond, Suzanne van den Hoogenhof, Lynne Borden, Daniel Francis Perkins

Background Multilevel data can be missing at the individual level or at a nested level, such as family, classroom, or program site. Better understanding of higher-level missing data is needed to develop evaluation designs and statistical methods that address it. Methods Participants included 9,514 individuals in 47 youth and family programs nationwide who completed multiple self-report measures before and after program participation. Data were marked as missing or not missing at the item, scale, and wave levels for both individuals and program sites. Results Site-level missing data represented a substantial portion of missing data, accounting for 0–46% of missing data at pre-test and 35–71% at post-test. Youth were the most likely to have missing data, although site-level missing data did not differ by the age of participants served. In this dataset, youth had the most surveys to complete, so their missing data may reflect survey fatigue. Conclusions Much of the missing data for individuals can be explained by sites not administering those questions or scales. These results suggest a need for statistical methods that account for site-level missing data, and for research design methods that reduce the prevalence or impact of site-level missing data. Researchers can generate buy-in with sites during the community collaboration stage, assessing problematic items for revision or removal and the need for ongoing site support, particularly at post-test. We recommend that researchers collecting multilevel data report the amount and mechanism of missing data at each level.
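The distinction between site-level and individual-level missingness can be made operational in several ways; the abstract does not give the exact rule used. As a minimal sketch, assume a hypothetical rule: if an item is missing for every participant at a site, the site is treated as not having administered it (site-level missing); otherwise each missing cell is attributed to the individual. The function name and records below are illustrative only.

```python
from collections import defaultdict

def classify_missing(records):
    """Split missing cells into site-level and individual-level missingness.

    records: iterable of (site, participant_id, item, value), with value None if missing.
    Hypothetical rule: an item missing for every participant at a site counts as
    site-level (not administered); otherwise missing cells count as individual-level.
    """
    missing_flags = defaultdict(list)  # (site, item) -> [True if missing, ...]
    for site, _pid, item, value in records:
        missing_flags[(site, item)].append(value is None)

    counts = {"site": 0, "individual": 0}
    for flags in missing_flags.values():
        n_missing = sum(flags)
        if n_missing == 0:
            continue
        level = "site" if n_missing == len(flags) else "individual"
        counts[level] += n_missing
    return counts


records = [
    ("A", 1, "q1", 3), ("A", 2, "q1", None),     # q1 partly missing at site A
    ("A", 1, "q2", None), ("A", 2, "q2", None),  # q2 never collected at site A
    ("B", 3, "q1", 4), ("B", 3, "q2", 5),        # site B is complete
]
counts = classify_missing(records)
print(counts)  # {'site': 2, 'individual': 1}
site_share = counts["site"] / sum(counts.values())  # share of missing cells due to sites
```

One caveat of this simple rule: at a site with a single respondent, any missing item is classified as site-level, so real analyses would likely cross-check against which instruments each site reports having administered.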


2020
Author(s): Xing Zhao, Feng Hong, Jianzhong Yin, Wenge Tang, Gang Zhang, ...

Cohort purpose: The China Multi-Ethnic Cohort (CMEC) is a community population-based prospective observational study that aims to address the urgent need to understand the prevalence of non-communicable diseases (NCDs), their risk factors, and associated conditions among ethnic minorities in resource-constrained settings in China.
Cohort basics: A total of 99 556 participants aged 30 to 79 years (for the Tibetan population, participants aged 18 to 30 years were also included) from the Tibetan, Yi, Miao, Bai, Bouyei, and Dong ethnic groups in Southwest China were recruited between May 2018 and September 2019.
Follow-up and attrition: All surviving participants will be invited for re-interviews every 3–5 years with concise questionnaires to review risk exposures and disease incidence. In addition, participants' vital status will be followed up annually through linkage with established electronic disease registries.
Design and measures: The CMEC baseline survey collected data through an electronic questionnaire administered in face-to-face interviews, medical examinations, and clinical laboratory tests. We also collected biological specimens, including blood, saliva, and stool, for long-term storage. In addition to individual-level data, we collected regional-level data for each investigation site.
Collaboration and data access: Collaborations are welcome. Please send specific ideas to the corresponding author at: [email protected].


2021
Vol 4
Author(s): Michael Platzer, Thomas Reutterer

AI-based data synthesis has seen rapid progress over the last several years and is increasingly recognized for its promise to enable privacy-respecting, high-fidelity data sharing. This is reflected by the growing availability of both commercial and open-source software solutions for synthesizing private data. However, despite these recent advances, adequately evaluating the quality of generated synthetic datasets is still an open challenge. We aim to close this gap and introduce a novel holdout-based empirical assessment framework for quantifying the fidelity as well as the privacy risk of synthetic data solutions for mixed-type tabular data. Fidelity is measured via statistical distances between the lower-dimensional marginal distributions of the synthetic and the training data, which provide a model-free and easy-to-communicate empirical metric for the representativeness of a synthetic dataset. Privacy risk is assessed by calculating, for each synthetic record, the distance to the closest record in the training data. By showing that the synthetic samples are just as close to the training data as to the holdout data, we obtain strong evidence that the synthesizer has indeed learned to generalize patterns and is independent of individual training records. We empirically demonstrate the framework for seven distinct synthetic data solutions across four mixed-type datasets and then compare these to traditional data perturbation techniques. Both a Python-based implementation of the proposed metrics and the demonstration study setup are available as open source. The results highlight the need to systematically assess both the fidelity and the privacy of this emerging class of synthetic data generators.
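The holdout comparison of distances to closest record (DCR) can be sketched in a few lines. This is a simplified illustration, not the authors' open-source implementation: it assumes purely numeric features and plain Euclidean distance, whereas the framework targets mixed-type data, and the helper names are hypothetical. With equally sized training and holdout sets, a generator that generalizes should produce records that are closer to the training set only about half the time, while a generator that copies training rows pushes this share toward 1.

```python
import numpy as np

def dcr(synthetic, reference):
    """Distance to closest record: Euclidean distance from each synthetic row
    to its nearest row in `reference`."""
    diffs = synthetic[:, None, :] - reference[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)

def share_closer_to_training(synthetic, training, holdout):
    """Fraction of synthetic rows strictly closer to the training set than to the holdout set."""
    return float(np.mean(dcr(synthetic, training) < dcr(synthetic, holdout)))

rng = np.random.default_rng(1)
data = rng.standard_normal((400, 4))
training, holdout = data[:200], data[200:]  # equal-sized split

leaky = training + 1e-6 * rng.standard_normal(training.shape)  # near-copies of training rows
fresh = rng.standard_normal((200, 4))                           # independent draws, same distribution

leaky_share = share_closer_to_training(leaky, training, holdout)  # 1.0: memorization
fresh_share = share_closer_to_training(fresh, training, holdout)  # close to 0.5
```

The pairwise-difference approach is quadratic in memory and only suits small examples; larger datasets would typically use a nearest-neighbour index instead.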


Author(s): Sarah Lowe, Laura McGinn, Marcos Quintela, Luke Player, Karen Tingay

Background: Flying Start (FS) is the Welsh Government's (WG) flagship Early Years programme for families with children under 4 years of age. The programme has run since 2006 and offers four entitlements: free part-time childcare for 2–3-year-olds; enhanced health visiting; parenting support; and speech, language, and communication support.
Objectives: Although we know which areas in Wales receive FS support, individual-level data on which child received which entitlements are not available. Area-level outcomes can be used as proxy indicators, but the individual impact of receiving FS support cannot be examined. The project aims to evaluate FS by linking the FS cohort to a range of outcomes, including health, education, and social care.
Methods: A Dataflow Development Project (DDP) has been launched to install SAIL (Secure Anonymised Information Linkage) appliances in 6 pilot Local Authorities in Wales, to test acquiring individual-level FS data from these Local Authorities and linking it with other datasets in SAIL.
Findings: The project will report some emerging findings from the analysis of pilot data.
Implications: There is growing interest in using linked administrative data to evaluate government initiatives, and mounting enthusiasm within Local Government. If successful, this model is likely to be adopted by related WG programmes, improving the evidence base, facilitating effective evaluation, and adding to the data available for re-use in Wales.


2017
Vol 59 (7/8)
pp. 856–870
Author(s): Soodeh Mohammadinezhad, Maryam Sharifzadeh

Purpose The purpose of this paper is to investigate the importance of academic courses for agricultural entrepreneurship. Design/methodology/approach A modified Global Entrepreneurship and Development Index (GEDI) was used to determine entrepreneurial dimensions among 19 graduates of agricultural colleges residing in Iran. A fuzzy analytical hierarchy process was applied to elicit agricultural graduates' preferences on the effectiveness of university courses (core, free elective, and restricted elective). Findings Results suggested the importance of professional restricted elective courses in providing students with necessary skills. These courses were successful in providing a context for developing an entrepreneurial profile. Research limitations/implications Whether entrepreneurship stems from innate talent or acquired skills has long been debated. The paper builds on the premise that entrepreneurs are made through education and the continuing reconstruction of experience; further research is required as the field develops in experience and complexity. Practical implications The paper provides strategies for modifying practical routes in higher education to enhance entrepreneurial orientation among students. Originality/value The paper is innovative at a conceptual level in translating GEDI elements into individual-level variables based on GEDI configuration theory. This approach is particularly useful for addressing the bottleneck problems of entrepreneurship profiles and focuses on the information conveyed by the weights of the individual-level data.
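The abstract does not say which fuzzy AHP variant was used, so the sketch below swaps in one common choice, Buckley's fuzzy geometric-mean method with centroid defuzzification. The pairwise judgments are hypothetical triangular fuzzy numbers invented for illustration; they are not the study's data, and the resulting weights say nothing about its findings. Each entry `(l, m, u)` reads "row criterion is between l and u times as important as column criterion, most plausibly m".

```python
import numpy as np

def fuzzy_ahp_weights(tfn):
    """Crisp criterion weights from a reciprocal matrix of triangular fuzzy numbers.

    tfn: array of shape (n, n, 3) holding (lower, middle, upper) judgments.
    Buckley's method: fuzzy row geometric means, fuzzy normalization, centroid defuzzification.
    """
    n = tfn.shape[0]
    geo = np.prod(tfn, axis=1) ** (1.0 / n)  # (n, 3): row geometric means of l, m, u
    total = geo.sum(axis=0)                   # (3,): sums of l, m, u over criteria
    # Fuzzy division r_i / sum(r): lower / sum(upper), middle / sum(middle), upper / sum(lower)
    fuzzy_w = np.column_stack([geo[:, 0] / total[2],
                               geo[:, 1] / total[1],
                               geo[:, 2] / total[0]])
    crisp = fuzzy_w.mean(axis=1)              # centroid (l + m + u) / 3
    return crisp / crisp.sum()

def recip(t):
    """Reciprocal of a triangular fuzzy number: (l, m, u) -> (1/u, 1/m, 1/l)."""
    l, m, u = t
    return (1 / u, 1 / m, 1 / l)

# Hypothetical judgments for criteria [core, free elective, restricted elective].
one = (1, 1, 1)
core_vs_free = (1, 2, 3)
restricted_vs_core = (2, 3, 4)
restricted_vs_free = (3, 4, 5)
tfn = np.array([
    [one,                 core_vs_free,       recip(restricted_vs_core)],
    [recip(core_vs_free), one,                recip(restricted_vs_free)],
    [restricted_vs_core,  restricted_vs_free, one],
], dtype=float)

weights = fuzzy_ahp_weights(tfn)  # restricted elective gets the largest weight here
```

Chang's extent analysis is another widely used variant and can rank criteria differently for the same judgments, which is one reason reporting the variant matters.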


1987
Vol 20 (1)
pp. 3–33
Author(s): John R. Hibbing

This is an analysis of the effects of economic factors on voting behavior in the United Kingdom. Aggregate- and individual-level data are used. When the results are compared to findings generated by the United States case, some intriguing differences appear. To mention just two examples, unemployment and inflation seem to be much more important in the United Kingdom than in the United States, and changes in real per capita income are positively related to election results in the United States and negatively related in the United Kingdom. More generally, while the aggregate results are strong and the individual-level results weak in the United States, in the United Kingdom the situation is practically reversed.


2018
Vol 47 (4)
pp. 428–438
Author(s): Kim Bloomfield, Gabriele Berg-Beckhoff, Abdu Kedir Seid, Christiane Stock

Aims: Greater area-level relative deprivation has been related to poorer health behaviours, but studies specifically on alcohol use and abuse have been equivocal. The main purpose of the present study was to investigate how area-level relative deprivation in Denmark relates to alcohol use and misuse in the country. Methods: As individual-level data, we used the national alcohol and drug survey of 2011 (n = 5133). Data were procured from Statistics Denmark to construct an index of relative deprivation at the parish level (n = 2119). The deprivation index has two components, each of which was divided into quintiles. Multilevel linear and logistic regressions analysed the influence of area deprivation on mean alcohol use and on hazardous drinking, as measured by the Alcohol Use Disorders Identification Test. Results: Men who lived in parishes designated as ‘very deprived’ on the socioeconomic component were more likely to consume less alcohol; women who lived in parishes designated as ‘deprived’ on the housing component were less likely to drink hazardously. At the individual level, however, education was positively related to mean alcohol consumption, and higher individual income was positively related to mean consumption for women. Higher-educated men were more likely to drink hazardously. Conclusions: Area-level measures of relative deprivation were not strongly related to alcohol use, whereas in the same models individual-level socioeconomic variables had a more noticeable influence. This suggests that in a stronger welfare state, the impact of area-level relative deprivation may not be as great. Further work is needed to develop more sensitive measures of relative deprivation.


2019
Vol 10 (1)
pp. 20190048
Author(s): Wasiur R. KhudaBukhsh, Boseung Choi, Eben Kenah, Grzegorz A. Rempała

In this paper, we show that solutions to ordinary differential equations describing the large-population limits of Markovian stochastic epidemic models can be interpreted as survival or cumulative hazard functions when analysing data on individuals sampled from the population. We refer to the individual-level survival and hazard functions derived from population-level equations as a survival dynamical system (SDS). To illustrate how population-level dynamics imply probability laws for individual-level infection and recovery times that can be used for statistical inference, we show numerical examples based on synthetic data. In these examples, we show that an SDS analysis compares favourably with a complete-data maximum-likelihood analysis. Finally, we use the SDS approach to analyse data from a 2009 influenza A(H1N1) outbreak at Washington State University.
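To make the survival interpretation concrete, here is a minimal sketch assuming the standard SIR model with mass-action incidence, normalized so that the susceptible fraction starts at s(0) = 1 and the infected fraction at i(0) = ρ. Under that reading, s(t) serves as the survival function of the infection time for an initially susceptible individual, β·i(t) is its hazard, and −log s(t) is the cumulative hazard. The parameter values and the function name are hypothetical, and the hand-written RK4 integrator stands in for whatever solver the authors used.

```python
import numpy as np

def sir_sds(beta, gamma, rho, t_max, n_steps=10_000):
    """Integrate ds/dt = -beta*s*i, di/dt = beta*s*i - gamma*i with s(0)=1, i(0)=rho (RK4).

    Returns the time grid, s(t) (read as the infection-time survival function), and i(t).
    """
    t = np.linspace(0.0, t_max, n_steps + 1)
    dt = t[1] - t[0]
    y = np.empty((n_steps + 1, 2))
    y[0] = (1.0, rho)

    def f(state):
        s, i = state
        return np.array([-beta * s * i, beta * s * i - gamma * i])

    for k in range(n_steps):
        k1 = f(y[k])
        k2 = f(y[k] + dt / 2 * k1)
        k3 = f(y[k] + dt / 2 * k2)
        k4 = f(y[k] + dt * k3)
        y[k + 1] = y[k] + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return t, y[:, 0], y[:, 1]

beta, gamma, rho = 0.5, 0.2, 0.01  # hypothetical rates and initial infected fraction
t, s, i = sir_sds(beta, gamma, rho, t_max=100.0)

hazard = beta * i          # infection hazard at time t
density = beta * s * i     # = -ds/dt, density of the infection time
cum_hazard = -np.log(s)    # cumulative hazard; d/dt(-log s) = beta * i
# Trapezoidal integral of the hazard, which should match cum_hazard[-1]
hazard_integral = np.sum((hazard[1:] + hazard[:-1]) / 2 * np.diff(t))
```

The survival function is improper: s(t) levels off above zero because some susceptible individuals escape infection, so a likelihood built on it has to allow for individuals who are never infected.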


1990
Vol 15 (1)
pp. 9–38
Author(s): Albert E. Beaton, Eugene G. Johnson

The average response method (ARM) of scaling nonbinary data was developed to scale the data from the writing assessments conducted by the National Assessment of Educational Progress (NAEP). The ARM applies linear models and multiple imputation techniques to characterize the predictive distribution of the person-level average of ratings over a pool of exercises when each person has responded to only a few of the exercises. Derivations of “plausible values” from the individual-level distributions of potential scale scores are given. Conditions are provided for the unbiasedness of estimates based on the plausible values, and the potential magnitude of the bias when these conditions are not met is indicated. Also discussed is how the plausible values allow for an accounting of the uncertainties due to the sampling of individuals and to the incomplete information on each sampled individual. The technique is illustrated using data from the assessment of writing.
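A common way to carry both sources of uncertainty into a final estimate is to compute the statistic once per plausible value and combine the results with Rubin's multiple-imputation rules. The sketch below assumes that combining step; it does not reproduce the ARM derivation of the plausible values themselves, and the function name and numbers are hypothetical. The within component reflects the sampling of individuals, and the between component reflects incomplete information on each individual.

```python
from statistics import mean, variance

def combine_plausible_values(estimates, sampling_variances):
    """Combine a statistic computed once per plausible value (Rubin's rules).

    estimates: the statistic computed with each of the M plausible values.
    sampling_variances: its sampling variance under each plausible value.
    Returns (point estimate, total variance), where
    total variance = mean sampling variance + (1 + 1/M) * between-plausible-value variance.
    """
    m = len(estimates)
    point = mean(estimates)
    within = mean(sampling_variances)  # uncertainty from sampling individuals
    between = variance(estimates)      # spread across plausible values (sample variance)
    return point, within + (1 + 1 / m) * between

point, total_var = combine_plausible_values([10.0, 12.0, 11.0], [1.0, 1.0, 1.0])
# point = 11.0; total_var = 1.0 + (4/3) * 1.0
```

Using a single plausible value, or the mean of an individual's plausible values as if it were an observed score, drops the between component and understates the total variance.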
