statistical queries
Recently Published Documents



2021 ◽  
Vol 11 (2) ◽  
Author(s):  
Yuval Dagan ◽  
Vitaly Feldman

Local differential privacy (LDP) is a model where users send privatized data to an untrusted central server whose goal is to solve some data analysis task. In the non-interactive version of this model, the protocol consists of a single round in which a server sends requests to all users and then receives their responses. This version is deployed in industry due to its practical advantages and has attracted significant research interest. Our main result is an exponential lower bound on the number of samples necessary to solve the standard task of learning a large-margin linear separator in the non-interactive LDP model. Via a standard reduction, this lower bound implies an exponential lower bound for stochastic convex optimization and, specifically, for learning linear models with a convex, Lipschitz, and smooth loss. These results answer the questions posed by Smith, Thakurta, and Upadhyay (IEEE Symposium on Security and Privacy 2017) and Daniely and Feldman (NeurIPS 2019). Our lower bound relies on a new technique for constructing pairs of distributions with nearly matching moments but whose supports can be nearly separated by a large-margin hyperplane. These lower bounds also hold in the model where communication from each user is limited, and they follow from a lower bound on learning using non-adaptive statistical queries.
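To make the single-round protocol concrete, here is a minimal sketch of one non-interactive LDP round that answers a single statistical query (the fraction of users satisfying a predicate) with randomized response. It is not the construction from the paper: the function names, the choice of randomized response, and the parameter `epsilon` are assumptions made for illustration.

```python
import math
import random


def randomize_bit(bit, epsilon, rng):
    """User side: report the true bit with probability e^eps / (1 + e^eps), else flip it."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if rng.random() < p_truth else 1 - bit


def estimate_fraction(reports, epsilon):
    """Server side: debias the mean of the randomized reports.

    E[report] = (1 - p) + b * (2p - 1) for a true bit b, so we invert that map.
    """
    p = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


def run_round(data, predicate, epsilon, seed=0):
    """One round: every user privatizes predicate(x) locally, the server aggregates."""
    rng = random.Random(seed)
    reports = [randomize_bit(int(predicate(x)), epsilon, rng) for x in data]
    return estimate_fraction(reports, epsilon)
```

Because the server fixes the predicate before seeing any response, this corresponds to a non-adaptive statistical query, the setting in which the lower bound above is stated.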



Author(s):  
Vitaly Feldman ◽  
Cristóbal Guzmán ◽  
Santosh Vempala

Stochastic convex optimization, in which the objective is the expectation of a random convex function, is an important and widely used method with numerous applications in machine learning, statistics, operations research, and other areas. We study the complexity of stochastic convex optimization given only statistical query (SQ) access to the objective function. We show that well-known and popular first-order iterative methods can be implemented using only statistical queries. For many cases of interest, we derive nearly matching upper and lower bounds on the estimation (sample) complexity, including linear optimization in the most general setting. We then present several consequences for machine learning, differential privacy, and proving concrete lower bounds on the power of convex optimization-based methods. The key ingredient of our work is SQ algorithms and lower bounds for estimating the mean vector of a distribution over vectors supported on a convex body in $\mathbb{R}^d$. This natural problem has not been previously studied, and we show that our solutions can be used to get substantially improved SQ versions of Perceptron and other online algorithms for learning halfspaces.
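As a rough illustration of running a first-order method with only SQ access, the sketch below runs projected gradient descent on the objective F(x) = E[½‖x − w‖²], whose minimizer is the mean vector. Each gradient coordinate is obtained from one statistical query. The `SQOracle` class, the tolerance `tau`, and the naive coordinate-wise estimation are assumptions for illustration and are not the algorithms analyzed in the paper.

```python
import random


class SQOracle:
    """Hypothetical SQ oracle: returns E[q(w)] over the samples, off by at most tau."""

    def __init__(self, samples, tau, seed=0):
        self.samples = samples
        self.tau = tau
        self.rng = random.Random(seed)

    def query(self, q):
        # q is expected to map a sample to a value in [-1, 1].
        exact = sum(q(w) for w in self.samples) / len(self.samples)
        return exact + self.rng.uniform(-self.tau, self.tau)


def sq_gradient_descent(oracle, dim, steps=100, lr=0.5):
    """Projected gradient descent on [-1, 1]^dim using one query per coordinate per step."""
    x = [0.0] * dim
    for _ in range(steps):
        grad = []
        for i in range(dim):
            xi = x[i]
            # (xi - w[i]) / 2 lies in [-1, 1] when xi and w[i] are both in [-1, 1];
            # multiplying the answer by 2 recovers the i-th gradient coordinate.
            grad.append(2.0 * oracle.query(lambda w, i=i, xi=xi: (xi - w[i]) / 2.0))
        # Gradient step followed by projection back onto the cube.
        x = [min(1.0, max(-1.0, xi - lr * gi)) for xi, gi in zip(x, grad)]
    return x
```

With `lr=0.5`, the per-coordinate error roughly halves each step until it reaches the noise floor of about `2 * tau`, so the oracle tolerance directly limits the achievable accuracy.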



2021 ◽  
Author(s):  
Daniela Tirsch ◽  
Joana R. C. Voigt ◽  
Christina E. Viviano ◽  
Janice L. Bishop ◽  
Melissa D. Lane ◽  
...  

Tyrrhena Terra hosts an intriguing variety of aqueously altered materials accompanied by unaltered mafic rocks. Our study region extends from the southern rim of the Isidis impact basin, including the Libya Montes region, southward to the Hellas Basin rim (Fig. 1). The NW part is dominated by lava flows from Syrtis Major that grade southwards into the Tyrrhena Terra highlands, dissected by fluvial channels and overprinted by abundant impact craters. These landforms, together with lobate and fan-shaped deposits within impact craters, are evidence for a variable history of erosion and deposition. Ancient phyllosilicate-rich materials have been exposed and uplifted from the subsurface, as they often occur in crater ejecta and central crater uplifts.

Our previous studies used CRISM spectral data together with CTX, HiRISE, and HRSC images as well as their derived topography data to create geomorphological maps of the southern Isidis region and Tyrrhena Terra. These datasets were used to map and characterize the types and occurrences of phyllosilicates, chlorite, opal, zeolites, carbonates, olivines, and pyroxenes and to assess the relationships between selected aqueous outcrops and surface features.

In this work, we build on these results by seeking correlations between aqueous mineral detections and our geomorphological map to assess 1) whether there are relationships between specific units and mineral occurrences, and 2) whether there are trends across the study region in terms of mineral occurrence and abundance.

The mineralogical map originates from a study that spans not only the inter-Isidis-Hellas region, but also extends northwards to Nili Fossae and westwards to Terra Sabaea. The focus of that study was on the metamorphic- and hydrothermally-related alteration history using CRISM targeted and mapping data, including hundreds of calibrated MTRDR images. These mineral detections were available to us as a mapped shape file, enabling us to assess the minerals in context with the geomorphological map. We utilized ESRI’s ArcGIS system and conducted multiple statistical queries in terms of mineral occurrence/type versus map unit in order to reveal possible trends within and across the study region.

Fe/Mg-phyllosilicates are the dominant aqueous mineral type within the study region and are more abundant in the central region than near either the Isidis or Hellas impact basin. Chlorites increase in abundance with distance from both impact basins, which could be an indication of hydrothermal processes from geothermal flux. The large Hellas impact event appears to have produced more varied temperatures and water chemistries, resulting in increased mineral variability near its rim.



Author(s):  
Philip Derbeko ◽  
Shlomi Dolev ◽  
Ehud Gudes ◽  
Jeffrey D. Ullman


2020 ◽  
Vol 3 (3) ◽  
pp. 155-168
Author(s):  
Benoit Thieurmel ◽  
Martin Masson

The collection of information in the database of a medical registry is valuable first of all because it lets a doctor and a care team analyze their results and compare themselves with other teams for the purpose of sharing experience and knowledge. Since 1986, the French Language Peritoneal Dialysis Registry (RDPLF) has collected data from 45,000 patients with renal failure treated at home in French-speaking countries. A partnership has been created between the RDPLF and Datastorm (https://www.datastorm.fr), the expertise and consultancy subsidiary of the ENSAE-ENSAI Group (National Schools of Economics and Statistics), in order to develop an application that allows users to carry out simple statistical queries on the RDPLF database by means of a user-friendly web interface. Thus, any doctor or member of the healthcare team can evaluate, without any special statistical skills, results by region and by French-speaking country. Special access also allows any center to compare its own results with those of a reference region. The generated graphics can be used for presentations during team meetings or for work. The application is based on the R software (https://www.r-project.org) and its Shiny visualization interface (https://shiny.rstudio.com). We report on how the application was developed and on its functionalities (based on preselected criteria: incidence rate, prevalence, survival, infection rate, distribution of treatments, nursing aspects). This article describes how both nurses and doctors can easily carry out studies with the application. Its bilingual interface also opens it up to English-speaking communities and thus facilitates international communication.



2020 ◽  
Author(s):  
Artem Kaznatcheev

Valiant [1] proposed to treat Darwinian evolution as a special kind of computational learning from statistical queries. The statistical queries represent a genotype’s fitness over a distribution of challenges. And this distribution of challenges along with the best response to them specify a given abiotic environment or static fitness landscape. Valiant’s model distinguished families of environments that are “adaptable-to” from those that are not. But this model of evolution omits the vital ecological interactions between different evolving agents – it neglects the rich biotic environment that is central to the struggle for existence.

In this article, I extend algorithmic Darwinism to include the ecological dynamics of frequency-dependent selection as a population-dependent bias to the distribution of challenges that specify an environment. This extended algorithmic Darwinism replaces simple invasion of wild-type by a mutant-type of higher scalar fitness with an evolutionary game between wild-type and mutant-type based on their frequency-dependent fitness function. To analyze this model, I develop a game landscape view of evolution, as a generalization of the classic fitness landscape approach that is popular in biology.

I show that this model of eco-evo dynamics on game landscapes can provide an exponential speed-up over the purely evolutionary dynamics of the strict algorithmic Darwinism proposed by Valiant. In particular, I prove that the noisy-Parity environment – which is known to be not adaptable-to under strict algorithmic Darwinism (and conjectured to be not PAC-learnable) – is adaptable-to by eco-evo dynamics. Thus, the ecology of frequency-dependent selection does not just increase the tempo of evolution, but fundamentally transforms its mode.

The eco-evo dynamic for adapting to the noisy-Parity environment proceeds by two stages: (1) a quick stage of point-mutations that moves the population to one of exponentially many local fitness peaks; followed by (2) a slower stage where each ‘step’ follows a double-mutation by a point-mutation. This second stage allows the population to hop between local fitness peaks to reach the unique global fitness peak in polynomial time. The evolutionary game dynamics of finite populations are essential for finding a short adaptive path to the global fitness peak during the second stage of the adaptation process. This highlights the rich interface between computational learning theory, evolutionary games, and long-term evolution.



2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Aloni Cohen ◽  
Kobbi Nissim

We briefly report on a successful linear program reconstruction attack performed on a production statistical queries system using a real dataset. The attack was deployed in a test environment in the course of the Aircloak Challenge bug bounty program and is based on the reconstruction algorithm of Dwork, McSherry, and Talwar. We empirically evaluate the effectiveness of the algorithm and a related algorithm by Dinur and Nissim with various dataset sizes, error rates, and numbers of queries in a Gaussian noise setting.
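The idea behind reconstruction from noisy statistical queries can be sketched on a toy scale. The example below is not the linear program used in the report: to stay within the Python standard library it swaps in an exhaustive consistency search over all candidate databases, and it uses bounded uniform noise rather than Gaussian noise. The function names and the `bound` parameter are hypothetical.

```python
import random


def noisy_subset_sums(secret, bound, rng):
    """Answer every non-empty subset-count query on a secret bit vector.

    Each subset is encoded as a bitmask; each answer is off by at most `bound`.
    """
    n = len(secret)
    secret_mask = sum(bit << i for i, bit in enumerate(secret))
    return {
        q: bin(secret_mask & q).count("1") + rng.uniform(-bound, bound)
        for q in range(1, 1 << n)
    }


def reconstruct(answers, n, bound):
    """Return the first candidate whose subset counts all lie within `bound` of the answers."""
    for cand in range(1 << n):
        if all(
            abs(bin(cand & q).count("1") - a) <= bound + 1e-9
            for q, a in answers.items()
        ):
            return [(cand >> i) & 1 for i in range(n)]
    return None
```

Any candidate that passes every check can differ from the secret in at most about `4 * bound` positions, because the set of positions where the candidate wrongly holds a 1 is itself a query, and so is the set where it wrongly holds a 0. The search takes exponential time, which is why practical attacks such as the one reported above rely on linear programming instead.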



Author(s):  
David Froelicher ◽  
Juan R. Troncoso-Pastoriza ◽  
Joao Sa Sousa ◽  
Jean-Pierre Hubaux


2019 ◽  
Vol 2019 (3) ◽  
pp. 170-190
Author(s):  
Archita Agarwal ◽  
Maurice Herlihy ◽  
Seny Kamara ◽  
Tarik Moataz

The problem of privatizing statistical databases is a well-studied topic that has culminated with the notion of differential privacy. The complementary problem of securing these differentially private databases, however, has, as far as we know, not been considered in the past. While the security of private databases is in theory orthogonal to the problem of private statistical analysis (e.g., in the central model of differential privacy the curator is trusted), the recent real-world deployments of differentially-private systems suggest that it will become a problem of increasing importance. In this work, we consider the problem of designing encrypted databases (EDB) that support differentially-private statistical queries. More precisely, these EDBs should support a set of encrypted operations with which a curator can securely query and manage its data, and a set of private operations with which an analyst can privately analyze the data. Using such an EDB, a curator can securely outsource its database to an untrusted server (e.g., on-premise or in the cloud) while still allowing an analyst to privately query it. We show how to design an EDB that supports private histogram queries. As a building block, we introduce a differentially-private encrypted counter based on the binary mechanism of Chan et al. (ICALP, 2010). We then carefully combine multiple instances of this counter with a standard encrypted database scheme to support differentially-private histogram queries.
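For readers unfamiliar with the binary mechanism mentioned above, the following is a simplified plaintext sketch of the idea: noisy sums are kept for dyadic blocks of the stream, and each running count is assembled from at most one block per level. It omits encryption entirely and processes a stream whose length is known in advance, so it is an assumption-laden illustration rather than the counter from the paper. The helper names are hypothetical.

```python
import math
import random


def laplace(scale, rng):
    """Laplace(0, scale) noise, drawn as the difference of two exponentials."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def binary_mechanism(bits, epsilon, rng):
    """Return noisy running counts of a 0/1 stream of known length."""
    total_len = len(bits)
    levels = total_len.bit_length()   # each bit falls in at most one block per level
    scale = levels / epsilon          # so the per-block Laplace scale is levels / epsilon
    noisy = {}                        # (level, block index) -> cached noisy block sum

    def block(level, index):
        if (level, index) not in noisy:
            lo, hi = index << level, (index + 1) << level
            noisy[(level, index)] = sum(bits[lo:hi]) + laplace(scale, rng)
        return noisy[(level, index)]

    counts = []
    for t in range(1, total_len + 1):
        running, start = 0.0, 0
        # Decompose the prefix [0, t) into dyadic blocks using the binary digits of t.
        for level in reversed(range(levels)):
            if t & (1 << level):
                running += block(level, start >> level)
                start += 1 << level
        counts.append(running)
    return counts
```

Each running count adds noise from at most `levels` blocks, so the error grows polylogarithmically in the stream length instead of linearly, which is what makes the mechanism attractive as a building block for histograms.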



2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Mark Bun ◽  
Thomas Steinke ◽  
Jonathan Ullman

We consider the problem of answering queries about a sensitive dataset subject to differential privacy. The queries may be chosen adversarially from a larger set $Q$ of allowable queries in one of three ways, which we list in order from easiest to hardest to answer:

Offline: The queries are chosen all at once and the differentially private mechanism answers the queries in a single batch.
Online: The queries are chosen all at once, but the mechanism only receives the queries in a streaming fashion and must answer each query before seeing the next query.
Adaptive: The queries are chosen one at a time and the mechanism must answer each query before the next query is chosen. In particular, each query may depend on the answers given to previous queries.

Many differentially private mechanisms are just as efficient in the adaptive model as they are in the offline model. Meanwhile, most lower bounds for differential privacy hold in the offline setting. This suggests that the three models may be equivalent. We prove that these models are all, in fact, distinct. Specifically, we show that there is a family of statistical queries such that exponentially more queries from this family can be answered in the offline model than in the online model. We also exhibit a family of search queries such that exponentially more queries from this family can be answered in the online model than in the adaptive model. We also investigate whether such separations might hold for simple queries like threshold queries over the real line.
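The three interaction models can be sketched as three ways of driving the same mechanism. The code below is only an illustration of the protocols, not of the separations proved in the paper: it assumes a plain Laplace mechanism with the privacy budget split evenly over a fixed number of queries, and all class and function names are hypothetical.

```python
import math
import random


class LaplaceMechanism:
    """Answers up to max_queries statistical queries with values in [0, 1]."""

    def __init__(self, data, epsilon, max_queries, seed=0):
        self.data = data
        self.remaining = max_queries
        # The mean of a [0, 1]-valued query has sensitivity 1/n; basic composition
        # gives each query a budget of epsilon / max_queries.
        self.scale = max_queries / (len(data) * epsilon)
        self.rng = random.Random(seed)

    def answer(self, query):
        if self.remaining <= 0:
            raise RuntimeError("query budget exhausted")
        self.remaining -= 1
        exact = sum(query(x) for x in self.data) / len(self.data)
        if self.scale == 0:
            return exact
        noise = self.rng.expovariate(1 / self.scale) - self.rng.expovariate(1 / self.scale)
        return exact + noise


def run_offline(mech, queries):
    """Offline: the whole batch is fixed and visible before any answer is given."""
    return [mech.answer(q) for q in queries]


def run_online(mech, query_stream):
    """Online: queries are fixed in advance but revealed, and answered, one at a time."""
    for q in query_stream:
        yield mech.answer(q)


def run_adaptive(mech, choose_next, rounds):
    """Adaptive: the analyst picks each query after seeing all earlier answers."""
    history = []
    for _ in range(rounds):
        query = choose_next(list(history))
        history.append(mech.answer(query))
    return history
```

This simple mechanism treats every query independently, so it behaves identically in all three drivers. The separations described above concern mechanisms that can exploit seeing the whole batch, or the whole fixed sequence, before committing to answers.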


