Advanced methods for linking complex historical birth, death, marriage and census data

Author(s):  
Peter Christen

Abstract

Objective: Recent years have seen the development of novel techniques for linking complex types of data that contain records about different types of entities, for example bibliographic databases with records about authors, publications, and venues. Advanced approaches have been devised to link individuals and groups of records. These approaches exploit both the similarities between record attributes and the relationships between entities. Rather than linking records about different types of entities, in this work we study the novel problem of linking records where the same entity can have different roles and where these roles can change over time.

Approach: We develop novel techniques for linking historical birth, death, marriage, and census certificates with the aim of reconstructing the population covered by these certificates over a period of time. Our techniques make use of constraints that consider roles, relationships, and time. Our first technique links certificates based on the specific roles of their individuals, and greedily selects pairs of certificates with the highest overall similarity while also considering 1-to-1 and 1-to-many linkage constraints. Our second, hybrid technique combines graph, group, and temporal linkage, and also considers relationship information between individuals and groups. We compare these techniques with state-of-the-art group, collective, and graph-based linkage approaches.

Results: We evaluate our proposed techniques on real Scottish data from 1861 to 1901 that cover the population of the Isle of Skye. In total, these data sets contain 119,042 certificates for 234,365 individuals. As ground truth we have a set of life-segments of records manually linked by domain experts. Our results indicate that even advanced techniques have difficulty in achieving high linkage quality compared to careful manual linkage. Two reasons for this are the very small name pool in our data and the changing nature of people's personal details over time. Both our proposed techniques, however, significantly outperform traditional pair-wise attribute similarity and group linkage approaches, with the greedy role-based technique achieving better results than the hybrid technique.

Conclusion: Our experiments on real data show that even with advanced linkage techniques that employ group, graph, relationship, and temporal approaches, it is challenging to achieve high-quality links from complex data such as birth, death, marriage, and census certificates that span several decades. As future work we will improve all steps of our techniques with the goal of developing highly accurate, scalable, and automatic techniques for linking large-scale complex population databases.
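The greedy role-based selection described above can be illustrated with a minimal sketch (record IDs, similarity scores, and the threshold are hypothetical, not the authors' data or implementation): candidate certificate pairs are sorted by descending similarity and accepted only while both records are still unmatched, which enforces a 1-to-1 linkage constraint.

```python
def greedy_one_to_one(pair_scores, threshold=0.5):
    """Greedily select record pairs by descending similarity,
    enforcing a 1-to-1 constraint: each record is matched at most once."""
    matched_a, matched_b = set(), set()
    links = []
    for (a, b), score in sorted(pair_scores.items(), key=lambda kv: -kv[1]):
        if score < threshold:
            break  # remaining pairs are below the acceptance threshold
        if a not in matched_a and b not in matched_b:
            links.append((a, b))
            matched_a.add(a)
            matched_b.add(b)
    return links

# Hypothetical similarity scores between birth and death certificates.
scores = {("birth1", "death1"): 0.9, ("birth1", "death2"): 0.8,
          ("birth2", "death2"): 0.7, ("birth2", "death1"): 0.4}
print(greedy_one_to_one(scores))  # [('birth1', 'death1'), ('birth2', 'death2')]
```

Note how the second-best pair ("birth1", "death2") is skipped because "birth1" is already linked, even though its score exceeds the threshold; a 1-to-many constraint would relax one side of this check.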

2019
Vol 7 (4)
pp. T911-T922
Author(s):  
Satyakee Sen
Sribharath Kainkaryam
Cen Ong
Arvind Sharma

Salt model building has long been considered a severe bottleneck for large-scale 3D seismic imaging projects. It is one of the most time-consuming, labor-intensive, and difficult-to-automate processes in the entire depth imaging workflow, requiring significant intervention by domain experts to manually interpret the salt bodies on noisy, low-frequency, and low-resolution seismic images at each iteration of the salt model building process. The difficulty of this task and the need to automate it are well recognized by the imaging community and have propelled the use of deep-learning-based convolutional neural network (CNN) architectures to carry it out. However, significant challenges remain for reliable production-scale deployment of CNN-based methods for salt model building, mainly due to the poor generalization capabilities of these networks: when used on new surveys never seen during training, the interpretation accuracy of these models drops significantly. To remedy this key problem, we have introduced a U-shaped encoder-decoder CNN architecture trained using a specialized regularization strategy aimed at reducing the generalization error of the network. Our regularization scheme perturbs the ground-truth labels in the training set. Two different perturbations are discussed: one that randomly changes the labels of the training set, flipping salt labels to sediment and vice versa, and one that smooths the labels. We have determined that such perturbations act as a strong regularizer, preventing the network from making highly confident predictions on the training set and thus reducing overfitting. An ensemble strategy is also used for test-time augmentation and is shown to further improve accuracy. The robustness of our CNN models, in terms of reduced generalization error and improved interpretation accuracy, is demonstrated with real data examples from the Gulf of Mexico.
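The two label perturbations can be sketched on a binary salt/sediment mask (a toy illustration under assumed flip probability and smoothing strength, not the authors' training code): random flipping swaps a fraction of labels, while smoothing softens hard 0/1 targets so the network cannot become overconfident.

```python
import numpy as np

def flip_labels(mask, flip_prob, rng):
    """Randomly flip binary salt (1) / sediment (0) labels with probability flip_prob."""
    flips = rng.random(mask.shape) < flip_prob
    return np.where(flips, 1 - mask, mask)

def smooth_labels(mask, eps=0.1):
    """Standard label smoothing: map hard targets 0/1 to eps / 1 - eps."""
    return mask * (1 - 2 * eps) + eps

rng = np.random.default_rng(0)
mask = np.array([[0, 1],
                 [1, 0]])
print(smooth_labels(mask))      # [[0.1 0.9] [0.9 0.1]]
print(flip_labels(mask, 0.2, rng))  # same shape, some labels possibly flipped
```

Both transforms are applied to the training targets only; the evaluation labels stay untouched so that the reduction in overfitting can be measured.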


2010
Vol 76 (5)
pp. 709-734
Author(s):  
I. S. DMITRIENKO

Abstract: We describe the spatio-temporal evolution of a one-dimensional Alfven resonance disturbance in the presence of various factors of resonance detuning: dispersion and absorption of the Alfven disturbance, and nonstationarity of the large-scale wave generating the resonant disturbance. Using analytical solutions to the resonance equation, we determine conditions for forming qualitatively different spatial and temporal structures of resonant Alfven disturbances. We also present analytical descriptions of quasi-stationary and non-stationary spatial structures formed in the resonant layer, and of their evolution over time, for drivers of different types corresponding to large-scale waves localized in the direction of inhomogeneity and to nonlocalized large-scale waves.


2019
Vol 34 (Supplement_1)
pp. S89-S97
Author(s):  
Gabriel Pestre
Emmanuel Letouzé
Emilio Zagheni

Abstract: This article contributes to improving our understanding of biases in estimates of demographic indicators for the developing world based on Call Detail Records (CDRs). CDRs represent an important and largely untapped source of data for the developing world. However, they are not representative of the underlying population. We combine CDRs and census data for Senegal in 2013 to evaluate biases in estimates of population density. We show that: (i) there are systematic relationships between cell-phone use and socio-economic and geographic characteristics that can be leveraged to improve estimates of population density; (ii) when no 'ground truth' data are available, a difference-in-difference approach can be used to reduce bias and infer relative changes over time in population size at the subnational level; (iii) indicators of development, including urbanization and internal, circular, and temporary migration, can be monitored by integrating census data and CDRs. The paper offers a methodological contribution and examples of applications of combining new and traditional data sources to improve our ability to monitor development indicators over time and space.
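The difference-in-difference idea invoked in point (ii) can be shown with a toy calculation (all numbers are made up; this is the generic DiD estimator, not the paper's exact specification): the change observed in a region of interest is corrected by subtracting the change observed in a comparison region, so that a bias common to both cancels out.

```python
def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Generic difference-in-difference estimate: the change in the region of
    interest minus the change in the comparison region, cancelling shared bias."""
    return (treated_after - treated_before) - (control_after - control_before)

# Hypothetical CDR-based population proxies for two subnational regions
# at two points in time (arbitrary units).
print(diff_in_diff(100, 130, 80, 90))  # 20
```

The treated region grew by 30 and the control by 10; the DiD estimate of 20 attributes the shared 10-unit change to a common trend (e.g., rising phone penetration) rather than to real relative population growth.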


Metabolites
2021
Vol 11 (1)
pp. 53
Author(s):  
Shin June Kim
Youngjae Oh
Jaesik Jeong

Due to advances in technology, data are becoming more complex and large-scale. To analyze such complex data, more advanced techniques are required. In the case of omics data from two different groups, it is of interest to find significant biomarkers between the two groups while controlling an error rate such as the false discovery rate (FDR). Over the last few decades, many methods that control the local false discovery rate have been developed, ranging from one-dimensional to k-dimensional FDR procedures. For our comparison study, we select three of them, each with unique and significant properties: Efron's approach, Ploner's approach, and Kim's approach, in chronological order. The first is a one-dimensional approach, while the other two are two-dimensional. Furthermore, we consider two more variants of Ploner's approach. We compare the performance of these methods on both simulated and real data.
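For orientation, the classic Benjamini-Hochberg step-up procedure, which controls the tail-area FDR, can be sketched as follows (the p-values are made up; note that the local-fdr methods compared in the paper estimate a per-feature posterior quantity rather than applying this threshold rule):

```python
def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure controlling the (tail-area) FDR
    at level q. Returns the indices of the rejected hypotheses."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:  # step-up criterion
            k = rank                  # largest rank passing the threshold
    return sorted(order[:k])          # reject the k smallest p-values

# Hypothetical p-values for eight biomarkers.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(pvals, q=0.05))  # [0, 1]
```

Only the two smallest p-values clear their rank-dependent thresholds here; a plain per-test cutoff of 0.05 would have rejected five, illustrating why multiplicity correction matters for biomarker discovery.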


2019
Author(s):  
Vince Polito
Amanda Barnier
Erik Woody

Building on Hilgard’s (1965) classic work, the domain of hypnosis has been conceptualised by Barnier, Dienes, and Mitchell (2008) as comprising three levels: (1) classic hypnotic items, (2) responding between and within items, and (3) state and trait. The current experiment investigates sense of agency across each of these three levels. Forty-six high hypnotisable participants completed an ideomotor (arm levitation), a challenge (arm rigidity) and a cognitive (anosmia) item either following a hypnotic induction (hypnosis condition) or without a hypnotic induction (wake condition). In a postexperimental inquiry, participants rated their feelings of control at three time points for each item: during the suggestion, test and cancellation phases. They also completed the Sense of Agency Rating Scale (Polito, Barnier, & Woody, 2013) for each item. Pass rates, control ratings, and agency scores fluctuated across the different types of items and for the three phases of each item; also, control ratings and agency scores often differed across participants who passed versus failed each item. Interestingly, whereas a hypnotic induction influenced the likelihood of passing items, it had no direct effect on agentive experiences. These results suggest that altered sense of agency is not a unidimensional or static quality “switched on” by hypnotic induction, but a dynamic multidimensional construct that varies across items, over time and according to whether individuals pass or fail suggestions.


Author(s):  
A. V. Ponomarev

Introduction: Large-scale human-computer systems involving people of various skills and motivation in the information processing process are currently used in a wide spectrum of applications. An acute problem in such systems is assessing the expected quality of each contributor, for example in order to penalize incompetent or inaccurate contributors and to promote diligent ones.

Purpose: To develop a method of assessing the expected contributor quality in community tagging systems. This method should use only the generally unreliable and incomplete information provided by contributors (with ground-truth tags unknown).

Results: A mathematical model is proposed for community image tagging (including a model of a contributor), along with a method of assessing the expected contributor quality. The method is based on comparing tag sets provided by different contributors for the same images; it is a modification of the pairwise comparison method with the preference relation replaced by a special domination characteristic. Expected contributor quality is evaluated as a positive eigenvector of a pairwise domination characteristic matrix. Community tagging simulation has confirmed that the proposed method adequately estimates the expected quality of community tagging system contributors (provided that the contributors' behavior fits the proposed model).

Practical relevance: The obtained results can be used in the development of systems based on coordinated community efforts (primarily, community tagging systems).
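The eigenvector step can be sketched with power iteration on a small matrix (the matrix entries are hypothetical, and this is the generic leading-eigenvector computation, not the paper's domination characteristic itself): the leading eigenvector of a nonnegative pairwise matrix yields a quality score per contributor.

```python
import numpy as np

def principal_eigenvector(D, iters=200):
    """Power iteration: leading (Perron) eigenvector of a nonnegative
    matrix D, normalized so the scores sum to 1."""
    v = np.ones(D.shape[0]) / D.shape[0]
    for _ in range(iters):
        v = D @ v
        v = v / v.sum()
    return v

# Hypothetical pairwise domination matrix: entry D[i, j] reflects how often
# contributor i's tag sets "dominate" contributor j's on shared images.
D = np.array([[1.0, 2.0, 3.0],
              [0.5, 1.0, 2.0],
              [1/3, 0.5, 1.0]])
scores = principal_eigenvector(D)
print(scores.argmax())  # 0 -- contributor 0 gets the highest expected quality
```

This mirrors the eigenvector scoring used in classic pairwise-comparison methods; the paper's contribution is replacing the preference relation feeding this matrix with a domination characteristic computed from overlapping tag sets.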


Author(s):  
Xu Pei-Zhen
Lu Yong-Geng
Cao Xi-Min

Background: Over the past few years, subsynchronous oscillation (SSO) caused by grid-connected wind farms has had an adverse effect on the stable operation of the power system and has become a bottleneck restricting the efficient utilization of wind power. How to mitigate and suppress SSO in wind farms has become a focus of power system research.

Methods: This paper first analyzes the SSO of different types of wind turbines, including the squirrel-cage induction generator based wind turbine (SCIG-WT), the permanent magnet synchronous generator based wind turbine (PMSG-WT), and the doubly-fed induction generator based wind turbine (DFIG-WT). Then, the mechanisms of the different types of SSO are presented with the aim of better understanding SSO in large-scale wind-integrated power systems, and the main analytical methods suitable for studying the SSO of wind farms are summarized.

Results: On the basis of these analyses, additional damping control is recommended to suppress SSO caused by flexible power transmission devices and the wind turbine converter.

Conclusion: The current development direction of SSO research for large-scale grid-connected wind farm systems is summarized, and current challenges and recommendations for future research and development are discussed.


Author(s):  
Anne Nassauer

This book provides an account of how and why routine interactions break down and how such situational breakdowns lead to protest violence and other types of surprising social outcomes. It takes a close-up look at the dynamic processes of how situations unfold and compares their role to that of motivations, strategies, and other contextual factors. The book discusses factors that can draw us into violent situations and describes how and why we make uncommon individual and collective decisions. Covering different types of surprise outcomes from protest marches and uprisings turning violent to robbers failing to rob a store at gunpoint, it shows how unfolding situations can override our motivations and strategies and how emotions and culture, as well as rational thinking, still play a part in these events. The first chapters study protest violence in Germany and the United States from 1960 until 2010, taking a detailed look at what happens between the start of a protest and the eruption of violence or its peaceful conclusion. They compare the impact of such dynamics to the role of police strategies and culture, protesters’ claims and violent motivations, the black bloc and agents provocateurs. The analysis shows how violence is triggered, what determines its intensity, and which measures can avoid its outbreak. The book explores whether we find similar situational patterns leading to surprising outcomes in other types of small- and large-scale events: uprisings turning violent, such as Ferguson in 2014 and Baltimore in 2015, and failed armed store robberies.


Author(s):  
Konrad Huber

The chapter first surveys different types of figurative speech in Revelation, including simile, metaphor, symbol, and narrative image. Second, it considers the way images are interrelated in the narrative world of the book. Third, it notes how the images draw associations from various backgrounds, including biblical and later Jewish sources, Greco-Roman myths, and the imperial cult, and how this enriches the understanding of the text. Fourth, the chapter looks at the rhetorical impact of the imagery on readers and stresses in particular its evocative, persuasive, and parenetic function together with its emotional effect. And fifth, it looks briefly at the way reception history shows how the imagery has engaged readers over time. Thus, illustrated by numerous examples, it becomes clear how essentially the imagery of the book of Revelation constitutes and determines its theological message.


Author(s):  
Barbara Tempalski
Leslie D. Williams
Brooke S. West
Hannah L. F. Cooper
Stephanie Beane
...  

Abstract

Background: Adequate access to effective treatment and medication-assisted therapies for opioid dependence has led to improved antiretroviral therapy adherence and decreases in morbidity among people who inject drugs (PWID), and can also address a broad range of social and public health problems. However, even with the success of syringe service programs and opioid substitution programs in European and other countries, the US remains historically low in terms of coverage of and access to these programs. This manuscript investigates predictors of historical change in drug treatment coverage for PWID in 90 US metropolitan statistical areas (MSAs) during 1993-2007, a period in which overall coverage did not change.

Methods: Drug treatment coverage was measured as the number of PWID in drug treatment, as calculated from treatment entry and census data, divided by the number of PWID in each MSA. Variables suggested by the Theory of Community Action (i.e., need, resource availability, institutional opposition, organized support, and service symbiosis) were analyzed using mixed-effects multivariate models with dependent variables lagged in time to study predictors of later change in coverage.

Results: Mean coverage was low in 1993 (6.7%; SD 3.7) and did not increase by 2007 (6.4%; SD 4.5). Multivariate results indicate that increases in baseline unemployment rate (β = 0.312; pseudo-p < 0.0002) predict significantly higher treatment coverage; baseline poverty rate (β = −0.486; pseudo-p < 0.0001) and baseline size of the public health and social work workforce (β = 0.425; pseudo-p < 0.0001) were predictors of later mean coverage levels, and baseline HIV prevalence among PWID predicted variation in treatment coverage trajectories over time (baseline HIV × time: β = 0.039; pseudo-p < 0.001). Finally, increases in black/white poverty disparity from baseline predicted significantly higher treatment coverage in MSAs (β = 1.269; pseudo-p < 0.0001).

Conclusions: While harm reduction programs have historically been contested and difficult to implement in many US communities, and despite efforts to increase treatment coverage for PWID, coverage has not increased. Contrary to our hypothesis, epidemiologic need seems not to be associated with change in treatment coverage over time. Resource availability and institutional opposition are important predictors of change in coverage over time. These findings suggest that new ways must be found to increase drug treatment coverage despite economic changes and belt-tightening policy changes that will make this difficult.

