Big data, differential privacy and national statistical organisations

2020 ◽  
Vol 36 (4) ◽  
pp. 1067-1074
Author(s):  
James Bailie

Differential privacy (DP) has emerged in the computer science literature as a measure of the impact on an individual’s privacy resulting from the publication of a statistical output such as a frequency table. This paper provides an introduction to DP for official statisticians and discusses its relevance, benefits and challenges from a National Statistical Organisation (NSO) perspective. We motivate our study by examining how privacy is evolving in the era of big data and how this might prompt a shift from traditional statistical disclosure techniques used in official statistics – which are generally applied on a cell-by-cell or table-by-table basis – to formal privacy methods, like DP, which are applied from a perspective encompassing the totality of the outputs generated from a given dataset. We identify an important interplay between DP’s holistic privacy risk measure and the difficulty for NSOs in implementing DP, showing that DP’s major advantage is also DP’s major challenge. This paper provides new work addressing two key DP research areas for NSOs: DP’s application to survey data and its incorporation within the Five Safes framework.

Author(s):  
Natalie Shlomo ◽  
Chris J. Skinner

Statistical agencies release microdata from social surveys as public-use files after applying statistical disclosure limitation (SDL) techniques. Disclosure risk is typically assessed in terms of identification risk, where it is supposed that small counts on cross-classified identifying key variables, i.e. a key, could be used to make an identification and confidential information may be learnt. In this paper we explore the application of definitions of privacy from the computer science literature to the same problem, with a focus on sampling and a form of perturbation which can be represented as misclassification. We consider two privacy definitions: differential privacy and probabilistic differential privacy. Chaudhuri and Mishra (2006) have shown that sampling does not guarantee differential privacy, but that, under certain conditions, it may ensure probabilistic differential privacy. We discuss these definitions and conditions in the context of survey microdata. We then extend this discussion to the case of perturbation. We show that differential privacy can be ensured if and only if the perturbation employs a misclassification matrix with no zero entries. We also show that probabilistic differential privacy is a viable alternative to differential privacy when there are zeros in the misclassification matrix. We discuss some common examples of SDL methods where in some cases zeros may be prevalent in the misclassification matrix.
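The link between zero entries and differential privacy can be illustrated with a minimal sketch. It assumes a row-stochastic misclassification matrix whose entry `matrix[i][j]` is the probability that original category `i` is released as category `j`, and it ignores the sampling step discussed in the abstract. The function name `dp_epsilon` and the two example matrices are hypothetical and do not come from the paper.

```python
import math

def dp_epsilon(matrix):
    """Smallest epsilon with matrix[i][j] <= exp(epsilon) * matrix[k][j] for all i, k, j.

    Returns math.inf when any entry is zero, mirroring the abstract's
    condition that the misclassification matrix must have no zero entries.
    """
    epsilon = 0.0
    for j in range(len(matrix[0])):
        column = [row[j] for row in matrix]
        if min(column) == 0.0:
            # Some original category can never produce output j, so no
            # finite bound on the probability ratio exists.
            return math.inf
        epsilon = max(epsilon, math.log(max(column) / min(column)))
    return epsilon

# Illustrative matrices (made up for this sketch).
no_zeros = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]
with_zeros = [
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.1],
    [0.0, 0.1, 0.9],
]

print(dp_epsilon(no_zeros))    # finite, equal to log(8)
print(dp_epsilon(with_zeros))  # inf
```

In this sketch a smaller epsilon corresponds to rows of the matrix that are closer to one another, that is, to heavier perturbation.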


Author(s):  
Hye-Chung Kum ◽  
Prannay Jain

Objective: Information privacy theory demonstrates mathematically that privacy is a budget-constrained problem and that privacy-preserving algorithms (e.g., differential privacy) must rely on a budgeting system. Thus, we design a privacy measure as a function of information disclosed to support incremental information disclosure required for safe interactive record linkage. The privacy measure will determine the increase in the privacy risk for any given information disclosed during record linkage. Approach: Mathematically, the identity disclosure risk is inversely proportional to the number of entities in the population that share the information disclosed. If the information refers to one and only one person in the population, then the identity of the person has been fully disclosed by the information revealed. On the other hand, if the information disclosed is identical for multiple people (say n), then the information is less revealing as it could refer to any one of the n people. The larger the n, the lower the privacy risk. Thus, the anonymity-set size is defined as the number of people in the population that share the same identifying information. The privacy risk measure has one prespecified parameter k, which represents the minimum anonymity-set size to guarantee no privacy risk. That is, for any disclosed information, if the anonymity-set size is less than k, then a privacy risk is present and the risk score will be calculated. A commonly accepted threshold for k is 5 or 10. On the one hand, when all entities have anonymity-set size less than k, the privacy risk would be 100%. On the other hand, if all entities have anonymity-set size greater than or equal to k, the privacy risk would be 0%. Results: The budgeting system contributes to the much-needed methods for protecting privacy while still supporting high-quality interactive record linkage by allowing safer manual resolution of uncertain linkages.
The budgeting system supports refining effective visual encoding techniques for incrementally revealing only the required information on an as-needed basis during manual resolution of uncertain linkages, as well as refining the design for a visual interface to facilitate privacy-preserving data standardization, cleaning, and conflict resolution for interactive record linkage. We evaluate the budgeting system with the NC voter registry data. Conclusion: The k-anonymity-based privacy risk budgeting system provides a mechanism where we can concretely reason about the tradeoff between the privacy risks due to information disclosed, accuracy gained, and biases reduced during interactive record linkage.
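The anonymity-set idea described in the Approach can be sketched as follows. The abstract does not give the exact risk-score formula, so this sketch assumes the simplest score consistent with its two endpoints: the fraction of entities whose anonymity-set size is below k. The function names and the toy population are hypothetical.

```python
from collections import Counter

def anonymity_set_sizes(records, disclosed_fields):
    """For each record, count how many records share its disclosed values."""
    keys = [tuple(record[field] for field in disclosed_fields) for record in records]
    counts = Counter(keys)
    return [counts[key] for key in keys]

def privacy_risk(records, disclosed_fields, k=5):
    """Assumed score: share of entities whose anonymity-set size is below k.

    Yields 1.0 when every entity is below k and 0.0 when none is,
    matching the 100% and 0% cases stated in the abstract.
    """
    sizes = anonymity_set_sizes(records, disclosed_fields)
    at_risk = sum(1 for size in sizes if size < k)
    return at_risk / len(sizes)

# Toy population, made up for illustration only.
people = [
    {"last": "Smith", "yob": 1980},
    {"last": "Smith", "yob": 1980},
    {"last": "Smith", "yob": 1975},
    {"last": "Jones", "yob": 1980},
]

print(privacy_risk(people, [], k=3))               # nothing disclosed
print(privacy_risk(people, ["last"], k=3))         # last name disclosed
print(privacy_risk(people, ["last", "yob"], k=3))  # last name and birth year disclosed
```

Disclosing more fields can only shrink each anonymity set, so the score never decreases as information is revealed, which is the property a disclosure budget would rely on.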


2019 ◽  
Vol 10 (4) ◽  
pp. 106
Author(s):  
Bader A. Alyoubi

Big Data is rapidly gaining popularity in the e-commerce sector across the globe. There is a general consensus among experts that Saudi organisations are late in adopting new technologies. It is generally believed that one of the key factors for this delay is the lack of research on the latest technologies that is specific to Saudi Arabia, which is culturally, socially, and economically different from the West. Hence, to fill this gap to a certain extent and create awareness about Big Data technology, the primary goal of this research was to identify the impact of Big Data on e-commerce organisations in Saudi Arabia. The Internet has changed the business environment of Saudi Arabia too. E-commerce is set to reach new heights due to the latest technological advancements. A qualitative research approach was used, conducting interviews with highly experienced professionals to gather primary data. Using multiple sources of evidence, this research found that traditional databases are not capable of handling massive data. Big Data is a promising technology that can be adopted by e-commerce companies in Saudi Arabia. Big Data’s predictive analytics will certainly help e-commerce companies to gain better insight into consumer behaviour and thus offer customised products and services. The key finding of this research is that Big Data has a significant impact on e-commerce organisations in Saudi Arabia across various verticals such as customer retention, inventory management, product customisation, and fraud detection.


Risks ◽  
2021 ◽  
Vol 9 (4) ◽  
pp. 60
Author(s):  
Cláudia Simões ◽  
Luís Oliveira ◽  
Jorge M. Bravo

Protecting against unexpected yield curve, inflation, and longevity shifts is one of the most critical issues institutional and private investors must solve when managing post-retirement income benefits. This paper empirically investigates the performance of alternative immunization strategies for funding targeted multiple liabilities that are fixed in timing but random in size (inflation-linked), i.e., that change stochastically according to consumer price or wage level indexes. The immunization procedure is based on a targeted minimax strategy considering the M-Absolute as the interest rate risk measure. We investigate to what extent the inflation-hedging properties of inflation-linked bonds (ILBs) in asset liability management (ALM) strategies targeted to immunize multiple liabilities of random size are superior to those of nominal bonds. We use two alternative datasets comprising daily closing prices for U.S. Treasuries and U.S. inflation-linked bonds from 2000 to 2018. The immunization performance is tested over 3-year and 5-year investment horizons, uses real rather than simulated bond data, and takes into consideration the impact of transaction costs on the performance of immunization strategies and on the selection of optimal investment strategies. The results show that the multiple liability immunization strategy using inflation-linked bonds outperforms the equivalent strategy using nominal bonds and is robust even in a nearly zero interest rate scenario. These results have important implications for the design and structuring of ALM liability-driven investment strategies, particularly for retirement income providers such as pension schemes or life insurance companies.
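As a rough illustration of the risk measure named above, the sketch below computes M-Absolute in its commonly cited single-horizon form: the present-value-weighted average of the absolute distance between each cash-flow date and the planning horizon. The abstract does not state the multi-liability version used in the paper, so the flat continuously compounded discount rate, the function name `m_absolute`, and the example cash flows are all assumptions made for illustration.

```python
import math

def m_absolute(cash_flows, times, horizon, flat_rate):
    """Present-value-weighted mean of |t - horizon| over the cash flows.

    Assumes a flat, continuously compounded discount rate (an illustrative
    simplification, not the paper's term-structure treatment).
    """
    present_values = [cf * math.exp(-flat_rate * t) for cf, t in zip(cash_flows, times)]
    total = sum(present_values)
    return sum(pv / total * abs(t - horizon) for pv, t in zip(present_values, times))

# A single payment exactly at the horizon has no dispersion around it.
print(m_absolute([100.0], [3.0], horizon=3.0, flat_rate=0.02))

# Two equal payments at years 1 and 5, horizon 3, zero rate: each sits 2 years away.
print(m_absolute([100.0, 100.0], [1.0, 5.0], horizon=3.0, flat_rate=0.0))
```

A portfolio with a smaller value of this measure has cash flows clustered more tightly around the horizon, which is the intuition behind choosing it as the quantity to minimise.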


2021 ◽  
Vol 13 (10) ◽  
pp. 5726
Author(s):  
Aleksandra Wewer ◽  
Pinar Bilge ◽  
Franz Dietrich

Electromobility is a new approach to the reduction of CO2 emissions and the deceleration of global warming. Its environmental impacts are often compared to those of traditional mobility solutions based on gasoline or diesel engines. The comparison pertains mostly to the single life cycle of a battery. The impact of multiple life cycles remains an important yet unanswered question. The aim of this paper is to demonstrate advances of 2nd life applications for lithium-ion batteries from electric vehicles based on their energy demand. To that end, it highlights the limitations of a conventional life cycle analysis (LCA) and presents a supplementary method of analysis by providing the design and results of a meta study on the environmental impact of lithium-ion batteries. The study focuses on energy demand, and investigates its total impact for different cases considering 2nd life applications such as (C1) material recycling, (C2) repurposing and (C3) reuse. Reprocessing methods such as remanufacturing of batteries underlie these 2nd life applications. Batteries are used in their 2nd lives for stationary energy storage (C2, repurpose) and electric vehicles (C3, reuse). The study results confirm that both of these 2nd life applications require less energy than the recycling of batteries at the end of their first life and the production of new batteries. The paper concludes by identifying future research areas in order to generate precise forecasts for 2nd life applications and their industrial dissemination.

