Ensuring Confidentiality of Geocoded Health Data: Assessing Geographic Masking Strategies for Individual-Level Data

2014 ◽  
Vol 2014 ◽  
pp. 1-14 ◽  
Author(s):  
Paul A. Zandbergen

Public health datasets increasingly use geographic identifiers such as an individual’s address. Geocoding these addresses often provides new insights, since it becomes possible to examine spatial patterns and associations. Address information is typically considered confidential and is therefore not released or shared with others. Publishing maps with the locations of individuals, however, may also breach confidentiality, since addresses and associated identities can be discovered through reverse geocoding. One commonly used technique to protect confidentiality when releasing individual-level geocoded data is geographic masking. This typically consists of applying a certain amount of random perturbation in a systematic manner to reduce the risk of reidentification. A number of geographic masking techniques have been developed, as well as methods to quantify the risk of reidentification associated with a particular masking method. This paper presents a review of the current state of the art in geographic masking, summarizing the various methods and their strengths and weaknesses. Despite recent progress, no universally accepted or endorsed geographic masking technique has emerged. Researchers, meanwhile, continue to publish maps that use geographic masking of confidential locations. Any researcher publishing such maps is advised to become familiar with the different masking techniques available and their associated reidentification risks.
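One widely discussed masking family applies a bounded random displacement to each point; the sketch below shows a "donut" variant (minimum displacement to deter reidentification, maximum to bound spatial error). The function name, parameter choices, and metre-to-degree conversion are illustrative, not the paper's specific method.

```python
import math
import random

def donut_mask(lat, lon, r_min_m, r_max_m, rng=random):
    """Displace a point by a random distance between r_min_m and r_max_m metres.
    The minimum radius guards against reidentification by reverse geocoding;
    the maximum radius bounds the analytic error introduced by masking."""
    # Sample the radius uniformly over the annulus area, the angle uniformly.
    u = rng.random()
    r = math.sqrt(u * (r_max_m ** 2 - r_min_m ** 2) + r_min_m ** 2)
    theta = rng.uniform(0.0, 2.0 * math.pi)
    # Approximate metres-to-degrees conversion (adequate at city scale).
    dlat = (r * math.cos(theta)) / 111_320.0
    dlon = (r * math.sin(theta)) / (111_320.0 * math.cos(math.radians(lat)))
    return lat + dlat, lon + dlon
```

Masked points can then be published in place of true addresses; the residual reidentification risk depends on the chosen radii and the local population density.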

Author(s):  
Alexander Diederich ◽  
Christophe Bastien ◽  
Karthikeyan Ekambaram ◽  
Alexis Wilson

The introduction of automated L5 driving technologies will revolutionise the design of vehicle interiors and seating configurations, improving occupant comfort and experience. It is foreseen that pre-crash emergency braking and swerving manoeuvres will affect occupant posture, which could lead to an interaction with a deploying airbag. This research addresses the urgent safety need of defining the occupant’s kinematics envelope during that pre-crash phase, considering rotated seat arrangements and different seatbelt configurations. The research used two different sets of volunteer tests experiencing L5 vehicle manoeuvres, based in the first instance on 22 50th-percentile fit males wearing a lap belt (OM4IS), while the other dataset is based on 87 volunteers with a BMI range of 19 to 67 kg/m² wearing a 3-point belt (UMTRI). Unique biomechanical kinematics corridors were then defined, as a function of belt configuration and vehicle manoeuvre, to calibrate an Active Human Model (AHM) using a multi-objective optimisation coupled with a CORrelation and Analysis (CORA) rating. The research improved the AHM’s omnidirectional kinematics response over the current state of the art in a generic lap-belted environment. The AHM was then tested in a rotated seating arrangement under extreme braking, highlighting that maximum lateral and frontal motions are comparable, independent of the belt system, while the asymmetry of the 3-point belt increased the occupant’s motion towards the seatbelt buckle. It was observed that the frontal occupant kinematics decrease by 200 mm compared to a lap-belted configuration. This improved omnidirectional AHM is the first step towards designing safer future L5 vehicle interiors.


Author(s):  
Ahlam Fuad ◽  
Amany bin Gahman ◽  
Rasha Alenezy ◽  
Wed Ateeq ◽  
Hend Al-Khalifa

The plural of paucity is one type of broken plural used in classical Arabic; it is used when the number of people or objects ranges from three to ten. Based on our evaluation of four current state-of-the-art Arabic morphological analyzers, there is a lack of identification of broken plural words, specifically the plural of paucity. Therefore, this paper presents “[Formula: see text]” Qillah (paucity), a morphological extension that is built on top of other morphological analyzers and uses a hybrid rule-based and lexicon-based approach to enhance the identification of the plural of paucity. Two versions of Qillah were developed: one based on the FARASA morphological analyzer and the other on the CALIMA Star analyzer, as these are among the best-performing morphological analyzers. We designed two experiments to evaluate the effectiveness of our proposed solution based on a collection of 402 different Arabic words. The version based on CALIMA Star achieved a maximum accuracy of 93% in identifying plural-of-paucity words compared to the baselines. It also achieved a maximum accuracy of 98% compared to the baselines in identifying the plurality of the words.


2020 ◽  
Vol 22 (2) ◽  
pp. 218-226
Author(s):  
Jing Guan ◽  
J. D. Tena

Estimating the causal impact of sport or physical activity on health and well-being is an issue of great relevance in the sport and health literature. The increasing availability of individual level data has encouraged this interest. However, this analysis requires dealing with two types of simultaneity problem: (1) between exercise and response variables; and (2) across the different response variables. This note discusses how the previous literature has dealt with these two questions with particular attention paid to the use of seemingly aseptic econometric models proposed by some recent empirical papers. Regardless of the approach, identification necessarily requires the use of untestable hypotheses. We provide some recommendations based on analyzing the robustness of the estimation results to changes in the adopted identification assumptions.


2005 ◽  
Vol 5 (2) ◽  
pp. 168-181 ◽  
Author(s):  
Robert J. Lacey

Do salient ballot initiatives stimulate voting? Recent studies have shown that initiatives increase voter turnout, but some methodological concerns still linger. These studies have either relied solely on aggregate data to make inferences about individual-level behavior or used a flawed measure of initiative salience. Using individual-level data from the National Election Studies, I find that ballot question salience indeed stimulated voting in the midterm elections of 1990 and 1994. In a midterm election with moderately salient ballot questions, a person's likelihood of voting can increase by as much as 30 percent. On the other hand, consistent with most prior research, I find no statistically significant relationship between ballot question salience and voting in presidential elections.


Author(s):  
Alexander Troussov ◽  
František Dařena ◽  
Jan Žižka ◽  
Denis Parra ◽  
Peter Brusilovsky

Spreading Activation is a family of graph-based algorithms widely used in areas such as information retrieval, epidemic models, and recommender systems. In this paper we introduce a novel Spreading Activation (SA) method that we call Vectorised Spreading Activation (VSA). VSA algorithms, like “traditional” SA algorithms, iteratively propagate the activation from the initially activated set of nodes to the other nodes in a network through outward links. The level of a node’s activation can be used as a centrality measurement, in accordance with the dynamic, model-based view of centrality that focuses on the outcomes for nodes in a network where something is flowing from node to node across the edges. Representing the activation by vectors allows the use of information about the various dimensionalities of the flow and the dynamics of the flow. In this capacity, VSA algorithms can model a multitude of complex multidimensional network flows. We present the results of numerical simulations on small synthetic social networks and multidimensional network models of folksonomies, which show that the results of VSA propagation are more sensitive to the positions of the initial seed and to the community structure of the network than the results produced by traditional SA algorithms. We tentatively conclude that VSA methods could be instrumental in developing scalable and computationally efficient algorithms that achieve synergy between the computation of centrality indexes and the detection of community structures in networks. Based on our preliminary results and on improvements made over previous studies, we foresee advances in the current state of the art of this family of algorithms and in their applications to centrality measurement.


Author(s):  
Devesh Bhasin ◽  
Daniel A. McAdams

Abstract The development of multi-functional designs is one of the prime reasons to adopt bio-inspired design in engineering design. However, the development of multi-functional bio-inspired designs is mostly solution-driven, in the sense that an available multi-functional solution drives the search for a problem that can be solved by implementing the available solution. The solution-driven nature of the approach restricts engineering designers to the function combinations found in nature. On the other hand, a problem-driven approach to multi-functional design allows designers to form the combination of functions best suited to the problem at hand. However, few works in the literature focus on the development of multi-functional bio-inspired solutions from a problem-driven perspective. In this work, we analyze the existing works that aid designers in combining multiple biological strategies to develop multi-functional bio-inspired designs. The analysis is carried out by comparing and contrasting the existing frameworks that support multi-functional bio-inspired design generation. The criteria of comparison are derived from the steps involved in the unified problem-driven biomimetic approach. In addition, we qualitatively compare the multi-functional bio-inspired designs developed using existing frameworks to the multi-functional designs existing in biology. Our aim is to explore the capabilities and limitations of current methods to support the generation of multi-functional bio-inspired designs.


2020 ◽  
Vol 34 (05) ◽  
pp. 9354-9361
Author(s):  
Kun Xu ◽  
Linfeng Song ◽  
Yansong Feng ◽  
Yan Song ◽  
Dong Yu

Existing entity alignment methods mainly differ in how they encode the knowledge graph, but they typically use the same decoding method, which independently chooses the locally optimal match for each source entity. This decoding method may not only cause the “many-to-one” problem but also neglect the coordinated nature of this task; that is, each alignment decision may be highly correlated with the other decisions. In this paper, we introduce two coordinated reasoning methods: the Easy-to-Hard decoding strategy and a joint entity alignment algorithm. Specifically, the Easy-to-Hard strategy first retrieves the model-confident alignments from the predicted results and then incorporates them as additional knowledge to resolve the remaining model-uncertain alignments. To achieve this, we further propose an enhanced alignment model that is built on the current state-of-the-art baseline. In addition, to address the many-to-one problem, we propose to jointly predict entity alignments so that the one-to-one constraint can be naturally incorporated into the alignment prediction. Experimental results show that our model achieves state-of-the-art performance and that our reasoning methods can also significantly improve existing baselines.
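The decoding idea can be illustrated with a toy two-pass routine: fix confident alignments first, then resolve the rest under a one-to-one constraint so no two sources claim the same target. The scoring format, threshold, and greedy hard pass below are simplifying assumptions, not the paper's exact algorithm (which jointly predicts alignments with an enhanced model).

```python
def easy_to_hard_align(scores, threshold=0.9):
    """scores: {source: {target: score}}. Easy pass: commit alignments whose
    best score clears the confidence threshold and whose target is unused.
    Hard pass: assign remaining sources greedily to the best still-unused
    target, enforcing a one-to-one constraint (no 'many-to-one' collisions)."""
    aligned, used = {}, set()
    # Easy pass: model-confident, non-conflicting decisions first.
    for src, cands in scores.items():
        tgt, s = max(cands.items(), key=lambda kv: kv[1])
        if s >= threshold and tgt not in used:
            aligned[src] = tgt
            used.add(tgt)
    # Hard pass: model-uncertain sources, most confident first.
    remaining = [s for s in scores if s not in aligned]
    remaining.sort(key=lambda s: -max(scores[s].values()))
    for src in remaining:
        for tgt, _ in sorted(scores[src].items(), key=lambda kv: -kv[1]):
            if tgt not in used:
                aligned[src] = tgt
                used.add(tgt)
                break
    return aligned
```

With independent local decoding, two sources with the same best target would both pick it; here the second source falls back to its next-best unused target.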


1997 ◽  
Author(s):  
J. W. Watts

Abstract Reservoir simulation is a mature technology, and nearly all major reservoir development decisions are based in some way on simulation results. Despite this maturity, the technology is changing rapidly. It is important for both providers and users of reservoir simulation software to understand where this change is leading. This paper takes a long-term view of reservoir simulation, describing where it has been and where it is now. It closes with a prediction of what the reservoir simulation state of the art will be in 2007 and speculation regarding certain aspects of simulation in 2017.

Introduction Today, input from reservoir simulation is used in nearly all major reservoir development decisions. This has come about in part through technology improvements that make it easier to simulate reservoirs on one hand and possible to simulate them more realistically on the other. However, although reservoir simulation has come a long way from its beginnings in the 1950s, substantial further improvement is needed, and this is stimulating continual change in how simulation is performed. Given that this change is occurring, both developers and users of simulation have an interest in understanding where it is leading. Obviously, developers of new simulation capabilities need this understanding in order to keep their products relevant and competitive. However, people who use simulation also need this understanding; how else can they be confident that the organizations that provide their simulators are keeping up with advancing technology and moving in the right direction? In order to understand where we are going, it is helpful to know where we have been. Thus, this paper begins with a discussion of historical developments in reservoir simulation. It then briefly describes the current state of the art in terms of how simulation is performed today. Finally, it closes with some general predictions.


2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Changgee Chang ◽  
Yi Deng ◽  
Xiaoqian Jiang ◽  
Qi Long

Abstract Distributed health data networks (DHDNs) leverage data from multiple sources or sites, such as electronic health records (EHRs) from multiple healthcare systems, and have drawn increasing interest in recent years, as they do not require sharing of subject-level data and hence considerably lower the hurdles for collaboration between institutions. However, DHDNs face a number of challenges in data analysis, particularly in the presence of missing data. The current state-of-the-art methods for handling incomplete data require pooling data into a central repository before analysis, which is not feasible in DHDNs. In this paper, we address the missing data problem in distributed environments such as DHDNs, which has not been investigated previously. We develop communication-efficient distributed multiple imputation methods for incomplete data that are horizontally partitioned. Since subject-level data are not shared or transferred outside of each site in the proposed methods, they enhance protection of patient privacy and have the potential to strengthen public trust in the analysis of sensitive health data. We investigate, through extensive simulation studies, the performance of these methods. Our methods are applied to the analysis of an acute stroke dataset collected from multiple hospitals, mimicking a DHDN where health data are horizontally partitioned across hospitals and subject-level data cannot be shared or sent to a central data repository.
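The communication pattern can be sketched with a deliberately simplified example: each site shares only aggregate summaries (per-column sums and counts over observed values), never subject-level rows, and then imputes locally from the pooled statistics. This sketch uses single mean imputation for brevity; the paper's methods perform multiple imputation, which draws imputed values from predictive distributions rather than filling in one pooled mean.

```python
def site_summaries(rows):
    """Each site computes per-column sums and counts over observed values
    only; no subject-level row leaves the site. None marks a missing value."""
    ncols = len(rows[0])
    sums, counts = [0.0] * ncols, [0] * ncols
    for row in rows:
        for j, v in enumerate(row):
            if v is not None:
                sums[j] += v
                counts[j] += 1
    return sums, counts

def pooled_means(summaries):
    """Combine the site-level summaries into pooled column means."""
    ncols = len(summaries[0][0])
    total, n = [0.0] * ncols, [0] * ncols
    for sums, counts in summaries:
        for j in range(ncols):
            total[j] += sums[j]
            n[j] += counts[j]
    return [total[j] / n[j] for j in range(ncols)]

def impute(rows, means):
    """Each site fills its own missing entries from the pooled means."""
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]
```

Only the small summary vectors cross site boundaries, which is what makes the approach communication-efficient and privacy-preserving in a horizontally partitioned network.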


1985 ◽  
Vol 13 (3) ◽  
pp. 323-339 ◽  
Author(s):  
Bernard Hennessy

Analyses of 1880 Census samples of 21-plus male citizens show a turnout of 50% to 69% for California, but nearly 90% for Ohio. Registration was required in California in 1866. A sample of 690 names from the June 1880 Census was checked against the “Alameda County Great Register,” October 1880, and 51% were found to be registered. Of the 12,359 registered, 80% actually voted. Thus, the Alameda turnout of potentially eligible voters was probably below 50%. On the other hand, a 100% sample of eligible males in Clay Township, Highland County, Ohio (N = 342), June 1880, checked against the 1880 Poll Book (the list of actual voters, compiled at the end of election day) showed a turnout of 87.7%. Burnham's and others' assertion of high turnouts from 1876 to 1896 is supported with respect to Ohio, but unsupported with respect to California; these findings are contrary to Burnham's belief that in 1876–1896 there was “a concentration of participation in the most densely populated and socioeconomically developed parts of the country.” The effects of the first registration laws may have been greater than the 10% currently estimated, but we need to find and use individual-level data to sharpen estimates from aggregate data.
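The Alameda estimate follows from chaining the two sample proportions; a quick check of the arithmetic (the chaining itself is an inference from the abstract, since the registered-voter rate and the sample registration rate come from different bases):

```python
registered_share = 0.51  # share of the 690-name Census sample found on the Great Register
voted_share = 0.80       # share of the 12,359 registered who actually voted
turnout = registered_share * voted_share
# roughly 41% of potentially eligible voters, consistent with "probably below 50%"
```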

