Improved Differentially Private Analysis of Variance

2019 ◽  
Vol 2019 (3) ◽  
pp. 310-330 ◽  
Author(s):  
Marika Swanberg ◽  
Ira Globus-Harris ◽  
Iris Griffith ◽  
Anna Ritz ◽  
Adam Groce ◽  
...  

Abstract Hypothesis testing is one of the most common types of data analysis and forms the backbone of scientific research in many disciplines. Analysis of variance (ANOVA) in particular is used to detect dependence between a categorical and a numerical variable. Here we show how one can carry out this hypothesis test under the restrictions of differential privacy. We show that the F-statistic, the optimal test statistic in the public setting, is no longer optimal in the private setting, and we develop a new test statistic F1 with much higher statistical power. We show how to rigorously compute a reference distribution for the F1 statistic and give an algorithm that outputs accurate p-values. We implement our test and experimentally optimize several parameters. We then compare our test to the only previous work on private ANOVA testing, using the same effect size as that work. We see an order of magnitude improvement, with our test requiring only 7% as much data to detect the effect.
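
The general recipe the abstract describes can be sketched as: compute an ANOVA-style test statistic, perturb it with Laplace noise calibrated to a sensitivity bound, and obtain p-values from a simulated reference distribution of the noisy statistic. The sketch below follows that recipe with the classical between/within variance ratio and a placeholder sensitivity bound; it is not the paper's F1 statistic or its sensitivity analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

def private_anova_stat(groups, epsilon, sensitivity):
    """Laplace-perturbed between/within variance ratio (illustrative only).

    `sensitivity` is a placeholder bound on how much the statistic can change
    when one record changes; the paper derives its own bound for F1.
    """
    all_y = np.concatenate(groups)
    grand = all_y.mean()
    ssb = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
    ssw = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return ssb / ssw + rng.laplace(scale=sensitivity / epsilon)

def reference_p_value(obs, groups, epsilon, sensitivity, n_sim=2000):
    """Monte Carlo p-value of the noisy statistic under a common-Gaussian null."""
    sizes = [len(g) for g in groups]
    null_stats = [
        private_anova_stat([rng.normal(size=n) for n in sizes], epsilon, sensitivity)
        for _ in range(n_sim)
    ]
    return float(np.mean(np.array(null_stats) >= obs))

# toy usage
groups = [rng.normal(loc=m, size=30) for m in (0.0, 0.3, 0.6)]
obs = private_anova_stat(groups, epsilon=1.0, sensitivity=1.0)
print(obs, reference_p_value(obs, groups, epsilon=1.0, sensitivity=1.0))
```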

2017 ◽  
Vol 7 (1) ◽  
Author(s):  
R. Lehmann ◽  
A. Voß-Böhme

Abstract Baarda's outlier test is one of the best established theories in geodetic practice. The optimal test statistic of the local model test for a single outlier is known as the normalized residual. Other model disturbances can also be detected and identified with this test. It enjoys the property of being a uniformly most powerful invariant (UMPI) test, but is not a uniformly most powerful (UMP) test. In this contribution we will prove that in the class of test statistics following a common central or non-central χ²
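
For context, the normalized residual mentioned in the abstract is the classical w-statistic of Baarda's data snooping. A minimal sketch for a Gauss-Markov model with unit weights is given below; it is illustrative only, and real geodetic software handles general weight matrices and datum constraints.

```python
import numpy as np
from scipy.stats import norm

def baarda_w_test(A, y, sigma0, alpha=0.001):
    """Normalized residuals (w-statistics) for a Gauss-Markov model with unit weights.

    A      : (n, u) design matrix
    y      : (n,) observation vector
    sigma0 : known a priori standard deviation of the observations
    Returns the w-statistics and a boolean mask of suspected outliers.
    """
    n, _ = A.shape
    x_hat, *_ = np.linalg.lstsq(A, y, rcond=None)          # least-squares estimate
    v = y - A @ x_hat                                       # residuals
    # cofactor matrix of the residuals: Qvv = I - A (A^T A)^{-1} A^T
    Qvv = np.eye(n) - A @ np.linalg.solve(A.T @ A, A.T)
    w = v / (sigma0 * np.sqrt(np.diag(Qvv)))
    crit = norm.ppf(1 - alpha / 2)                          # two-sided critical value
    return w, np.abs(w) > crit
```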


2020 ◽  
pp. 107699862093109
Author(s):  
Chun Wang ◽  
Jing Lu

In cognitive diagnostic assessment, multiple fine-grained attributes are measured simultaneously. Attribute hierarchies are considered important structural features of cognitive diagnostic models (CDMs) that provide useful information about the nature of attributes. Templin and Bradshaw first introduced a hierarchical diagnostic classification model (HDCM) that directly takes into account attribute hierarchies, and hence HDCM is nested within more general CDMs. They also formulated an empirically driven hypothesis test to statistically test one hypothesized link (between two attributes) at a time. However, their likelihood ratio test statistic does not have a known reference distribution, so it is cumbersome to perform hypothesis testing at scale. In this article, we studied two exploratory approaches that could learn the attribute hierarchies directly from data, namely, the latent variable selection (LVS) approach and the regularized latent class modeling (RLCM) approach. An identification constraint was proposed for the LVS approach. Simulation results revealed that both approaches could successfully identify different types of attribute hierarchies when the underlying CDM is either the deterministic input, noisy "and" gate (DINA) model or the saturated log-linear CDM. The LVS approach outperformed the RLCM approach, especially when the total number of attributes increases.
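
When a likelihood ratio statistic lacks a known reference distribution, as noted in the abstract, one generic workaround (not the approach taken in this article) is a parametric bootstrap of the statistic. A hedged sketch is below; `fit_null`, `fit_alt`, and `simulate_null` are hypothetical callables the user would supply for the models being compared.

```python
import numpy as np

def bootstrap_lrt_pvalue(fit_null, fit_alt, simulate_null, data, n_boot=500, seed=0):
    """Generic parametric-bootstrap p-value for a likelihood ratio test.

    fit_null(data), fit_alt(data) : return (max log-likelihood, fitted parameters)
    simulate_null(params, rng)    : draw one data set from the fitted null model
    """
    rng = np.random.default_rng(seed)
    ll0, params0 = fit_null(data)
    ll1, _ = fit_alt(data)
    lrt_obs = 2 * (ll1 - ll0)
    boot = []
    for _ in range(n_boot):
        sim = simulate_null(params0, rng)
        b_ll0, _ = fit_null(sim)
        b_ll1, _ = fit_alt(sim)
        boot.append(2 * (b_ll1 - b_ll0))
    return float(np.mean(np.array(boot) >= lrt_obs))
```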


2017 ◽  
Vol 27 (9) ◽  
pp. 2775-2794 ◽  
Author(s):  
Yan Zhuang ◽  
Ying Guan ◽  
Libin Qiu ◽  
Meisheng Lai ◽  
Ming T Tan ◽  
...  

Longitudinal ordinal data are common in biomedical research. Although various methods for the analysis of such data have been proposed in the past few decades, they are limited in several ways. For instance, the constraints on parameters in the proportional odds model may result in convergence problems; the rank-based aligned rank transform method imposes constraints on other parameters, and parametric models rest on distributional assumptions. We propose a novel rank-based non-parametric method that models the profile rather than the distribution of the data, allowing effective statistical inference without these constraints. We first construct the test statistic for the interaction, and then construct the test statistics for the main effects separately, with or without the interaction; an "adjusted coefficient" is derived for the case of ties. A simulation study compares the rank-based non-parametric method with rank-transformed analysis of variance. The results show that the type I errors of both methods stay close to the nominal level, but the statistical power of the rank-based non-parametric method is greater than that of rank-transformed analysis of variance, suggesting higher efficiency of the former. We then apply the rank-based non-parametric method to two real studies on acne and osteoporosis, and the results also illustrate its effectiveness, particularly when the distribution is skewed.
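
The comparator in the simulation study, rank-transformed analysis of variance, is straightforward to sketch: replace the ordinal scores by their mid-ranks and run an ordinary two-way ANOVA on the ranks. The toy example below does exactly that (group, visit, and their interaction); it is not the authors' new statistic, and it ignores the within-subject correlation that a proper longitudinal analysis would model.

```python
import numpy as np
import pandas as pd
from scipy.stats import rankdata
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# toy data: two treatment groups, three visits, ordinal response on a 1-5 scale
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "group": np.repeat(["A", "B"], 60),
    "visit": np.tile(np.repeat([1, 2, 3], 20), 2),
    "score": rng.integers(1, 6, size=120),
})

# rank transform: ordinal scores replaced by mid-ranks (ties averaged)
df["rank_score"] = rankdata(df["score"])

# ordinary two-way ANOVA on the ranks: main effects and interaction
model = smf.ols("rank_score ~ C(group) * C(visit)", data=df).fit()
print(anova_lm(model, typ=2))
```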


2008 ◽  
Vol 47 (06) ◽  
pp. 267-274 ◽  
Author(s):  
F. Boldt ◽  
C. Kobe ◽  
W. Eschner ◽  
H. Schicha ◽  
F. Sudbrock

Summary Aim: After application of radiopharmaceuticals the patient becomes a radioactive source, which leads to radiation exposure in the proximity. The photon dose rates after administration of different radiopharmaceuticals used in diagnostic nuclear medicine were measured at several distances and different time intervals. These data are of importance for estimating the exposure of technologists and members of the public. Patients, method: In this study dose rates were measured for 67 patients after application of the following radiopharmaceuticals: 99mTc-HDP as well as 99mTc-pertechnetate, 18F-fluorodeoxyglucose, 111In-octreotide and Zevalin®, and 123I-mIBG in addition to 123I-NaI. The dose rates were measured immediately following application at six different distances to the patient. After two hours the measurements were repeated and – whenever possible – after 24 hours and seven days. Results: Immediately following application the highest dose rates were below 1 mSv/h, with maxima of 780 μSv/h for 18F (370 MBq), 250 μSv/h for 99mTc (700 MBq), 150 μSv/h for 111In (185 MBq) and 132 μSv/h for 123I (370 MBq). At a distance of 0.5 m the values decrease by an order of magnitude. Two hours after application the values are diminished to 1/3 (99mTc, 18F) and to nearly ½ (123I), but remain in the same order of magnitude for the longer-lived 111In radiopharmaceuticals. Conclusion: For greater distances the doses remain below the limits outlined in the national legislation.
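
As a rough first-order orientation, exposure near a patient can be estimated from physical decay plus inverse-square distance scaling. The sketch below does only that, using standard half-lives; it deliberately ignores biological excretion and the near-field geometry of the patient, both of which matter in practice (the measured 2 h reductions for 99mTc and 18F are larger than physical decay alone would predict).

```python
import math

# physical half-lives (hours) of the nuclides mentioned in the study
HALF_LIFE_H = {"F-18": 1.83, "Tc-99m": 6.01, "In-111": 67.3, "I-123": 13.2}

def dose_rate_estimate(rate_ref_uSv_h, ref_distance_m, nuclide, t_hours, distance_m):
    """Crude estimate: physical decay plus inverse-square distance scaling.

    rate_ref_uSv_h : dose rate measured at the reference distance at t = 0
    Ignores biological clearance and treats the patient as a point source.
    """
    decay = math.exp(-math.log(2) * t_hours / HALF_LIFE_H[nuclide])
    return rate_ref_uSv_h * decay * (ref_distance_m / distance_m) ** 2

# hypothetical example: 100 uSv/h measured at 1 m, estimated at 2 h and 2 m
print(dose_rate_estimate(100.0, 1.0, "Tc-99m", t_hours=2, distance_m=2))
```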


2021 ◽  
Vol 5 (1) ◽  
pp. 10
Author(s):  
Mark Levene

A bootstrap-based hypothesis test of the goodness-of-fit for the marginal distribution of a time series is presented. Two metrics, the empirical survival Jensen–Shannon divergence (ESJS) and the Kolmogorov–Smirnov two-sample test statistic (KS2), are compared on four data sets—three stablecoin time series and a Bitcoin time series. We demonstrate that, after applying first-order differencing, all the data sets fit heavy-tailed α-stable distributions with 1 < α < 2 at the 95% confidence level. Moreover, ESJS is more powerful than KS2 on these data sets, since the widths of the derived confidence intervals for KS2 are, proportionately, much larger than those of ESJS.
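
The KS2 side of that comparison can be sketched with a simple bootstrap confidence interval for the two-sample Kolmogorov–Smirnov statistic between the differenced series and draws from a candidate marginal law. The example below uses a Gaussian reference for brevity, whereas the paper fits heavy-tailed α-stable laws, and it does not implement the ESJS metric.

```python
import numpy as np
from scipy import stats

def ks2_bootstrap_ci(x, reference_sampler, n_boot=1000, alpha=0.05, seed=0):
    """Bootstrap confidence interval for the two-sample KS statistic.

    x                 : 1-D observed series (e.g. first differences)
    reference_sampler : callable n -> n draws from the candidate marginal law
    """
    rng = np.random.default_rng(seed)
    boot = []
    for _ in range(n_boot):
        xb = rng.choice(x, size=len(x), replace=True)   # resample the data
        ref = reference_sampler(len(x))                 # draw from candidate law
        boot.append(stats.ks_2samp(xb, ref).statistic)
    lo, hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
    return lo, hi

# toy usage: heavy-tailed series tested against a (mis-specified) Gaussian reference
rng = np.random.default_rng(1)
x = rng.standard_t(df=3, size=500)
print(ks2_bootstrap_ci(x, lambda n: rng.normal(size=n)))
```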


1992 ◽  
Vol 75 (3_suppl) ◽  
pp. 1124-1126
Author(s):  
John F. Walsh

A statistical test is developed based on the comparison of sums of squared errors associated with two competing models. A model based on cell means is compared to a representation that specifies the means for the treatment conditions. Comparing models is more general than the traditional H0 in analysis of variance wherein all the cell means are assumed equal. The test statistic, Proportional Increase in Error, is computed using the SAS statistical system.
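
The underlying idea is the classical comparison of sums of squared errors between nested models. A minimal sketch of that comparison is below; it is generic Python rather than the article's SAS implementation, and the exact scaling of the Proportional Increase in Error statistic used in the article may differ.

```python
import numpy as np

def sse(y, fitted):
    """Sum of squared errors of a fitted model."""
    return float(((np.asarray(y) - np.asarray(fitted)) ** 2).sum())

def model_comparison(y, fitted_full, fitted_restricted, df_full, df_restricted):
    """Extra-sum-of-squares comparison of a full and a restricted (nested) model.

    df_* are the error degrees of freedom of each model.  Returns the
    proportional increase in error when moving to the restricted model and
    the classical nested-model F statistic.
    """
    sse_f = sse(y, fitted_full)
    sse_r = sse(y, fitted_restricted)
    pie = (sse_r - sse_f) / sse_f
    f_stat = ((sse_r - sse_f) / (df_restricted - df_full)) / (sse_f / df_full)
    return pie, f_stat
```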


1990 ◽  
Vol 22 (3) ◽  
pp. 271-282 ◽  
Author(s):  
Michael Borenstein ◽  
Jacob Cohen ◽  
Hannah R. Rothstein ◽  
Simcha Pollack ◽  
John M. Kane

2019 ◽  
Vol 2019 (1) ◽  
pp. 26-46 ◽  
Author(s):  
Thee Chanyaswad ◽  
Changchang Liu ◽  
Prateek Mittal

Abstract A key challenge facing the design of differential privacy in the non-interactive setting is to maintain the utility of the released data. To overcome this challenge, we utilize the Diaconis-Freedman-Meckes (DFM) effect, which states that most projections of high-dimensional data are nearly Gaussian. Hence, we propose the RON-Gauss model that leverages the novel combination of dimensionality reduction via random orthonormal (RON) projection and the Gaussian generative model for synthesizing differentially-private data. We analyze how RON-Gauss benefits from the DFM effect, and present multiple algorithms for a range of machine learning applications, including both unsupervised and supervised learning. Furthermore, we rigorously prove that (a) our algorithms satisfy the strong ɛ-differential privacy guarantee, and (b) RON projection can lower the level of perturbation required for differential privacy. Finally, we illustrate the effectiveness of RON-Gauss under three common machine learning applications – clustering, classification, and regression – on three large real-world datasets. Our empirical results show that (a) RON-Gauss outperforms previous approaches by up to an order of magnitude, and (b) loss in utility compared to the non-private real data is small. Thus, RON-Gauss can serve as a key enabler for real-world deployment of privacy-preserving data release.
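
A rough sketch of the pipeline the abstract outlines is given below: project normalized records onto random orthonormal directions, perturb the Gaussian sufficient statistics, and sample synthetic data from the resulting model. The noise calibration here is a placeholder, not the paper's sensitivity analysis, and the real RON-Gauss algorithms differ in detail for the supervised variants.

```python
import numpy as np

def ron_gauss_sketch(X, dim, epsilon, seed=0):
    """Illustrative sketch of a RON-Gauss-style release for unsupervised data.

    1. project row-normalized data onto `dim` random orthonormal directions,
    2. perturb the mean and covariance of the projections (placeholder Laplace
       noise; the paper derives its own bounds and calibration),
    3. sample synthetic records from the resulting Gaussian model.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # normalize rows to unit norm so each record has bounded influence
    Xn = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    # random orthonormal projection via QR of a Gaussian matrix
    Q, _ = np.linalg.qr(rng.normal(size=(d, dim)))
    Z = Xn @ Q
    # placeholder noise scales for unit-norm rows (illustrative only)
    mu = Z.mean(axis=0) + rng.laplace(scale=2.0 / (n * epsilon), size=dim)
    cov = np.cov(Z, rowvar=False) + rng.laplace(scale=2.0 / (n * epsilon),
                                                size=(dim, dim))
    cov = (cov + cov.T) / 2 + 1e-6 * np.eye(dim)   # symmetrize, add jitter
    return rng.multivariate_normal(mu, cov, size=n, check_valid="ignore")

# toy usage
X = np.random.default_rng(3).normal(size=(200, 10))
print(ron_gauss_sketch(X, dim=4, epsilon=1.0).shape)
```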


2006 ◽  
Vol 45 (9) ◽  
pp. 1181-1189 ◽  
Author(s):  
D. S. Wilks

Abstract The conventional approach to evaluating the joint statistical significance of multiple hypothesis tests (i.e., "field," or "global," significance) in meteorology and climatology is to count the number of individual (or "local") tests yielding nominally significant results and then to judge the unusualness of this integer value in the context of the distribution of such counts that would occur if all local null hypotheses were true. The sensitivity (i.e., statistical power) of this approach is potentially compromised both by the discrete nature of the test statistic and by the fact that the approach ignores the confidence with which locally significant tests reject their null hypotheses. An alternative global test statistic that has neither of these problems is the minimum p value among all of the local tests. Evaluation of field significance using the minimum local p value as the global test statistic, which is also known as the Walker test, has strong connections to the joint evaluation of multiple tests in a way that controls the "false discovery rate" (FDR, or the expected fraction of local null hypothesis rejections that are incorrect). In particular, using the minimum local p value to evaluate field significance at a level α_global is nearly equivalent to the slightly more powerful global test based on the FDR criterion. An additional advantage shared by Walker's test and the FDR approach is that both are robust to spatial dependence within the field of tests. The FDR method not only provides a more broadly applicable and generally more powerful field significance test than the conventional counting procedure but also allows better identification of locations with significant differences, because fewer than α_global × 100% (on average) of apparently significant local tests will have resulted from local null hypotheses that are true.
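
Both procedures discussed here are short to state in code: Walker's test declares field significance when the smallest local p value falls below 1 − (1 − α_global)^(1/K), and the FDR approach is the Benjamini–Hochberg step-up rule. A minimal sketch, assuming independent (or at most weakly dependent) local p values in the toy example:

```python
import numpy as np

def walker_global_test(p_values, alpha_global=0.05):
    """Field significance from the smallest local p value (Walker's test)."""
    K = len(p_values)
    p_walker = 1 - (1 - alpha_global) ** (1 / K)   # threshold for the minimum
    return min(p_values) <= p_walker

def fdr_rejections(p_values, q=0.05):
    """Benjamini-Hochberg procedure: indices of local nulls rejected at FDR level q."""
    p = np.asarray(p_values)
    K = len(p)
    order = np.argsort(p)                           # ascending p values
    thresh = q * np.arange(1, K + 1) / K
    below = np.nonzero(p[order] <= thresh)[0]
    if below.size == 0:
        return np.array([], dtype=int)
    return order[: below.max() + 1]                 # reject up to the largest passing rank

# toy field of 100 local tests, 10 of which carry a real signal
rng = np.random.default_rng(2)
p = np.concatenate([rng.uniform(size=90), rng.uniform(0, 0.01, size=10)])
print(walker_global_test(p), fdr_rejections(p, q=0.05))
```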


Author(s):  
Shuo Han ◽  
George J. Pappas

Many modern dynamical systems, such as smart grids and traffic networks, rely on user data for efficient operation. These data often contain sensitive information that the participating users do not wish to reveal to the public. One major challenge is to protect the privacy of participating users when utilizing user data. Over the past decade, differential privacy has emerged as a mathematically rigorous approach that provides strong privacy guarantees. In particular, differential privacy has several useful properties, including resistance to both postprocessing and the use of side information by adversaries. Although differential privacy was first proposed for static-database applications, this review focuses on its use in the context of control systems, in which the data under processing often take the form of data streams. Through two major applications—filtering and optimization algorithms—we illustrate the use of mathematical tools from control and optimization to convert a nonprivate algorithm to its private counterpart. These tools also enable us to quantify the trade-offs between privacy and system performance.
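
To make the "nonprivate algorithm to private counterpart" idea concrete, the sketch below perturbs each released output of a simple running-average filter with Laplace noise. The per-release privacy budget and the sensitivity bound are placeholder assumptions; the review surveys much more refined filtering and optimization mechanisms.

```python
import numpy as np

def private_running_average(stream, epsilon_per_release, sensitivity, seed=0):
    """Release a running average of user data with Laplace output perturbation.

    `sensitivity` is a placeholder bound on how much one user's reading can
    change a single released value; budgets compose across releases, so
    repeated outputs consume additional epsilon.
    """
    rng = np.random.default_rng(seed)
    total, outputs = 0.0, []
    for t, x in enumerate(stream, start=1):
        total += x
        noise = rng.laplace(scale=sensitivity / epsilon_per_release)
        outputs.append(total / t + noise)
    return outputs

# toy stream of readings
print(private_running_average([3.2, 2.9, 3.5, 3.1], epsilon_per_release=0.5, sensitivity=1.0))
```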

