SOEP-RV: Linking German Socio-Economic Panel Data to Pension Records

Abstract The aim of the SOEP-RV project is to link data on participants in the German Socio-Economic Panel (SOEP) survey to their individual records at the Deutsche Rentenversicherung (German Pension Insurance). For all SOEP respondents who explicitly consent to record linkage, SOEP-RV creates a linked dataset that combines the comprehensive multi-topic SOEP data with detailed cross-sectional and longitudinal data from social security pension records covering each individual's entire insurance history. This article provides an overview of the record linkage project, highlights the analytical potential of the linked data, compares key SOEP and pension insurance variables, and proposes a re-weighting procedure that corrects for selectivity. It concludes with details on how to obtain the data for scientific use.
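
The abstract does not spell out the SOEP-RV re-weighting procedure. As a rough illustration of what correcting for consent selectivity can look like, the sketch below uses a generic weighting-class adjustment: within each adjustment cell, the design weights of consenting respondents are inflated by the inverse of the weighted consent rate. The cell labels, the `Respondent` fields and the function name are hypothetical and are not taken from SOEP or SOEP-RV.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Respondent:
    cell: str            # hypothetical adjustment cell, e.g. "female_50-64"
    base_weight: float   # hypothetical survey design weight
    consented: bool      # True if the respondent consented to record linkage


def consent_adjusted_weights(respondents: list[Respondent]) -> dict[int, float]:
    """Return adjusted weights for consenting respondents, keyed by list index.

    Within each cell, consenters' weights are multiplied by
    (total weight in cell) / (weight of consenters in cell), so that
    consenters represent all respondents in that cell.
    """
    total = defaultdict(float)
    consenting = defaultdict(float)
    for r in respondents:
        total[r.cell] += r.base_weight
        if r.consented:
            consenting[r.cell] += r.base_weight

    adjusted = {}
    for i, r in enumerate(respondents):
        if r.consented:
            # Inverse of the weighted consent rate in this respondent's cell.
            factor = total[r.cell] / consenting[r.cell]
            adjusted[i] = r.base_weight * factor
    return adjusted
```

With two respondents in cell "a" (one consenting) and two in cell "b" (both consenting), the consenter in "a" receives twice its base weight, and the adjusted weights still sum to the original total. A propensity model (e.g. logistic regression of consent on covariates) is a common alternative to fixed cells.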


Accuracy of reporting of Aboriginality on administrative health data collections using linked data in NSW, Australia

BMC Medical Research Methodology ◽

10.1186/s12874-020-01152-2 ◽

2020 ◽

Vol 20 (1) ◽

Author(s):

Michael A. Nelson ◽

Kim Lim ◽

Jason Boyd ◽

Damien Cordery ◽

Allan Went ◽

...

Keyword(s):

Record Linkage ◽

Linked Data ◽

Aboriginal People ◽

Patient Survey ◽

Health Data ◽

Weight Of Evidence ◽

Cross Sectional ◽

Predictive Values ◽

Administrative Health Data ◽

Data Collections

Abstract

Background: Aboriginal people are under-reported in administrative health data in Australia. Various approaches have been used or proposed to improve the reporting of Aboriginal people using linked records. This cross-sectional study used self-reported Aboriginality from the NSW Patient Survey Program (PSP) as a reference standard to assess the accuracy of reporting of Aboriginal people on the NSW Admitted Patient (APDC) and Emergency Department Data Collections (EDDC), and to compare the accuracy of selected approaches to enhancing the reporting of Aboriginality using linked data.

Methods: Ten PSP surveys were linked to five administrative health data collections, including APDC, EDDC, perinatal, and birth and death registration records. Accuracy of reporting of Aboriginality was assessed using sensitivity, specificity, positive and negative predictive values (PPVs and NPVs), and F score, both for the EDDC and APDC as a baseline and for four enhancement approaches using linked records: “Most recent linked record”, “Ever reported as Aboriginal”, and two weight-of-evidence approaches, the “Enhanced Reporting of Aboriginality (ERA) algorithm” and “Multi-stage median (MSM)”.

Results: There was substantial under-reporting of Aboriginality on APDC and EDDC records (sensitivities of 84% and 77%, respectively), with PPVs of 95% on both data collections. Overall, specificities and NPVs were above 98%. Of people who were reported as Aboriginal on the PSP, 16% were not reported as Aboriginal on any of their linked records. Record linkage approaches generally increased sensitivity, accompanied by a decrease in PPV, with little change in overall F score for the APDC and an increase in F score for the EDDC. The “ERA algorithm” and “MSM” approaches provided the best overall accuracy.

Conclusions: Weight-of-evidence approaches are preferred when record linkage is used to improve the reporting of Aboriginality on administrative health data collections. However, because a substantial number of Aboriginal people are not reported as Aboriginal on any of their linked records, improvements in reporting are incomplete, and this should be taken into account when interpreting the results of any analyses. Enhancing the reporting of Aboriginality through record linkage should not replace efforts to improve the recording of Aboriginal people at the point of data collection or to address barriers to self-identification for Aboriginal people.
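
The accuracy measures named in the Methods follow directly from a 2x2 table of reference (self-reported) versus recorded Aboriginality. The sketch below computes them and also expresses the two simple linkage rules, "Most recent linked record" and "Ever reported as Aboriginal", under an assumed record format of `(date, flag)` pairs. The record format and function names are hypothetical; the ERA algorithm and MSM approaches are not reproduced because the abstract does not specify them.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Accuracy:
    sensitivity: float
    specificity: float
    ppv: float
    npv: float
    f_score: float


def accuracy(reference: list[bool], reported: list[bool]) -> Accuracy:
    """Compare recorded Aboriginality flags with a self-reported reference.

    True means "reported as Aboriginal". Assumes each 2x2 cell needed by a
    ratio is non-empty (otherwise a ZeroDivisionError is raised).
    """
    pairs = list(zip(reference, reported))
    tp = sum(1 for ref, rep in pairs if ref and rep)
    fn = sum(1 for ref, rep in pairs if ref and not rep)
    fp = sum(1 for ref, rep in pairs if not ref and rep)
    tn = sum(1 for ref, rep in pairs if not ref and not rep)
    sens = tp / (tp + fn)
    ppv = tp / (tp + fp)
    return Accuracy(
        sensitivity=sens,
        specificity=tn / (tn + fp),
        ppv=ppv,
        npv=tn / (tn + fn),
        # F score: harmonic mean of PPV (precision) and sensitivity (recall).
        f_score=2 * ppv * sens / (ppv + sens),
    )


# Hypothetical linked-record format: (record date, reported as Aboriginal).
Record = tuple[date, bool]


def most_recent_linked_record(records: list[Record]) -> bool:
    """Use the Aboriginality flag on the person's latest linked record."""
    return max(records, key=lambda rec: rec[0])[1]


def ever_reported(records: list[Record]) -> bool:
    """Flag the person if any linked record reports them as Aboriginal."""
    return any(flag for _, flag in records)
```

For a person with records `[(date(2015, 1, 1), True), (date(2019, 6, 1), False)]`, the two rules disagree: "Ever reported" flags the person, "Most recent" does not. This illustrates why linkage-based rules tend to raise sensitivity at some cost in PPV.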


Testing for Error Cross-Sectional Independence in a Two-Way Error Components Panel Data Model

SSRN Electronic Journal ◽

10.2139/ssrn.2708185 ◽

2015 ◽

Author(s):

Guangyu Mao

Keyword(s):

Panel Data ◽

Data Model ◽

Panel Data Model ◽

Cross Sectional ◽

Error Components


An Examination of Audit Report Lag for Banks: A Panel Data Approach

Auditing: A Journal of Practice & Theory ◽

10.2308/aud.2000.19.2.159 ◽

2000 ◽

Vol 19 (2) ◽

pp. 159-174 ◽

Cited By ~ 43

Author(s):

B. Charlene Henderson ◽

Steven E. Kaplan

Keyword(s):

Data Analysis ◽

Panel Data ◽

Panel Data Analysis ◽

Explanatory Power ◽

Omitted Variables ◽

Audit Report ◽

Cross Sectional ◽

Cross Sectional Analysis

This study investigates the determinants of audit report lag (ARL) for a sample of banks. Researchers have been interested in the determinants of ARL in part because it affects the timeliness of public disclosures. However, prior ARL research has relied exclusively on regression analysis of cross-sectional samples of companies from many industries. In addition to focusing exclusively on banks, this study introduces panel data analysis and compares it with cross-sectional analysis to demonstrate its power in dynamic settings and its potential to improve estimation. The results reveal important differences between cross-sectional analysis and panel data analysis. First, bank size is negatively related to ARL in the cross-section but positively related to ARL in the panel data analysis. The cross-sectional size estimate is subject to omitted-variables bias, and cross-sectional analysis also fails to capture how variation in size over time relates to ARL. Panel data analysis both accounts for omitted variables and captures the dynamics of the relationship between size and ARL. In addition, the explanatory power of the panel data model far exceeds that of the cross-sectional model. This is primarily due to the panel model's use of firm-specific intercepts, which both capture the role of reporting tradition and eliminate heterogeneity bias. Thus, panel data analysis proves to be a powerful tool in the analysis of ARL.
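
The sign reversal described above can arise when firm-specific intercepts absorb stable differences between firms. The sketch below contrasts a pooled cross-sectional slope with a fixed-effects (within) slope, where demeaning by firm is equivalent to giving each firm its own intercept. The four observations are made up for illustration and are not data from the study.

```python
from collections import defaultdict


def ols_slope(xs: list[float], ys: list[float]) -> float:
    """Slope of a simple regression of ys on xs, with a common intercept."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def within_slope(firms: list[str], xs: list[float], ys: list[float]) -> float:
    """Fixed-effects (within) slope: demean x and y by firm, then regress.

    Subtracting firm means is equivalent to fitting a separate intercept
    for each firm, so stable between-firm differences drop out.
    """
    groups = defaultdict(list)
    for f, x, y in zip(firms, xs, ys):
        groups[f].append((x, y))
    means = {
        f: (sum(x for x, _ in obs) / len(obs), sum(y for _, y in obs) / len(obs))
        for f, obs in groups.items()
    }
    xd = [x - means[f][0] for f, x in zip(firms, xs)]
    yd = [y - means[f][1] for f, y in zip(firms, ys)]
    return sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)


# Made-up example: two banks observed in two years each.
firms = ["A", "A", "B", "B"]
size = [1.0, 2.0, 5.0, 6.0]
arl = [10.0, 11.0, 2.0, 3.0]
print(ols_slope(size, arl))            # about -1.82 (pooled, negative)
print(within_slope(firms, size, arl))  # 1.0 (within each bank, positive)
```

Here the larger bank has a much lower ARL level, so the pooled slope is negative, while within each bank ARL rises with size.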


Just Like Leaves in the Wind? Exploring the Effect of the Interplay of Media Coverage and Personal Characteristics on Issue Salience

10.1093/oso/9780198792130.003.0003 ◽

2017 ◽

Author(s):

Agatha Kratz ◽

Harald Schoen

Keyword(s):

Individual Differences ◽

Panel Data ◽

Financial Crisis ◽

Media Coverage ◽

News Coverage ◽

Personal Characteristics ◽

Election Campaign ◽

Issue Salience ◽

Cross Sectional ◽

The Impact

This chapter explores the effect of the interplay of personal characteristics and news coverage on issue salience during the 2009 to 2015 period and during the 2013 election campaign. We selected four topics that played a considerable role during this period: the labor market, pensions and healthcare, immigration, and the financial crisis. The evidence from pooled cross-sectional data and panel data supports the notion that news coverage affects citizens’ issue salience. For obtrusive issues, news coverage plays a smaller role than for more remote topics such as the financial crisis and immigration. The results also lend credence to the idea that political predilections and other individual differences are related to issue salience and constrain the impact of news coverage on voters’ issue salience. However, the evidence for an interplay of individual differences and media coverage proved weak at best.


Record linkage of routine data with cohorts’ data of infants under European and Portuguese law

European Journal of Public Health ◽

10.1093/eurpub/ckaa166.178 ◽

2020 ◽

Vol 30 (Supplement_5) ◽

Author(s):

J Doetsch ◽

I Lopes ◽

R Redinha ◽

H Barros

Keyword(s):

Big Data ◽

Data Processing ◽

Data Protection ◽

Record Linkage ◽

Data Science ◽

Personal Data ◽

Routine Data ◽

Cohort Data ◽

Education Data ◽

Explicit Consent

Abstract The use and exchange of “big data” is at the forefront of the data science agenda, in which Record Linkage plays a prominent role in biomedical research. In an era of ubiquitous data exchange and big data, Record Linkage is almost inevitable, but it raises ethical and legal problems, namely the protection of personal data and privacy. Record Linkage refers to the general merging of data to consolidate facts about an individual or an event that are not available in any single record. This article provides an overview of ethical challenges and research opportunities in linking routine data on health and education with cohort data from very preterm (VPT) infants in Portugal. Portuguese, European and international law on data processing, protection and privacy was reviewed. A three-stage analysis was carried out: (i) the interplay of the three levels of law (Portuguese, European and international) for Record Linkage; (ii) the impact of data protection and privacy rights on data processing; and (iii) the challenges and opportunities of the data linkage process for research. A framework was created to discuss the process and its implications for data protection and privacy. The GDPR serves as the most substantial legal basis for the protection of personal data in Record Linkage, and explicit written consent is considered the appropriate basis for processing sensitive data. In Portugal, retrospective access to routine data is permitted if the data are anonymised; for health data, if processing meets the data processing requirements declared with explicit consent; and for education data, if the data processing rules are complied with. Routine health and education data can be linked to cohort data if the rights of the data subject and the requirements and duties of processors and controllers are respected.
A strong ethical context, established through the application of the GDPR in all phases of research, is needed to achieve Record Linkage between cohort data and routinely collected health and education records of VPT infants in Portugal.

Key messages: The GDPR is the most important legal framework for the protection of personal data; however, its uniform approach, which grants freedom to its Member States, hampers Record Linkage processes among EU countries. The question remains whether data protection and privacy are adequately balanced across the three legal levels to guarantee freedom of research and the improvement of the health of data subjects.


E-bike user groups and substitution effects: evidence from longitudinal travel data in the Netherlands

Transportation ◽

10.1007/s11116-021-10195-3 ◽

2021 ◽

Author(s):

Mathijs de Haas ◽

Maarten Kroesen ◽

Caspar Chorus ◽

Sascha Hoogendoorn-Lanser ◽

Serge Hoogendoorn

Keyword(s):

The Netherlands ◽

Longitudinal Data ◽

Latent Class ◽

Good Alternative ◽

Substitution Effects ◽

Or Education ◽

Cross Sectional ◽

User Groups ◽

Using Data ◽

Mode Of Transport

Abstract In recent years, the e-bike has become increasingly popular in many European countries. With higher speeds and less effort needed, the e-bike is a promising mode of transport for many, and policy-makers and planners consider it a good alternative for certain car trips. A major limitation of many studies that investigate such substitution effects of the e-bike is their reliance on cross-sectional data, which do not allow an assessment of within-person changes in travel mode. As a consequence, there is currently no consensus about the e-bike’s potential to replace car trips. Furthermore, little research has focused on heterogeneity among e-bike users. In this respect, different groups are likely to use the e-bike for different reasons (e.g. leisure vs. commuting), which will also influence possible substitution patterns. This paper contributes to the literature in two ways: (1) it presents a statistical analysis, based on longitudinal data, of the extent to which e-bike trips substitute for trips by other travel modes; (2) it reveals different user groups among the e-bike population. A Random Intercept Cross-Lagged Panel Model is estimated using five waves of data from the Netherlands Mobility Panel. Furthermore, a Latent Class Analysis is performed using data from the Dutch national travel survey. Results show that, when longitudinal data are used, the substitution effects between the e-bike and the competing travel modes of car and public transport are not as significant as reported in earlier research. In general, e-bike trips significantly reduce only conventional bicycle trips in the Netherlands, which can be regarded as an unwanted effect from a policy viewpoint. For commuting, the e-bike also substitutes for car trips. Furthermore, the results show five different user groups, each with distinct behaviour patterns and socio-demographic characteristics.
They also show that groups that use the e-bike primarily for commuting or education are growing at a much higher rate than groups that mainly use the e-bike for leisure and shopping.


Component analysis in cross-sectional and longitudinal data

Psychometrika ◽

10.1007/bf02294198 ◽

1988 ◽

Vol 53 (1) ◽

pp. 123-134 ◽

Cited By ~ 61

Author(s):

Roger E. Millsap ◽

William Meredith

Keyword(s):

Longitudinal Data ◽

Component Analysis ◽

Cross Sectional


P3-132: Motivational reserve as risk factor in the development of mild cognitive impairment and Alzheimer's disease: Cross-sectional and longitudinal data

Alzheimer's & Dementia ◽

10.1016/j.jalz.2009.04.1107 ◽

2009 ◽

Vol 5 (4S_Part_13) ◽

pp. P383-P383

Author(s):

Simon Forstmeier ◽

Michael Wagner ◽

Wolfgang Maier ◽

Hendrik Van Den Bussche ◽

Birgitt Wiese ◽

...

Keyword(s):

Alzheimer's Disease ◽

Risk Factor ◽

Cognitive Impairment ◽

Mild Cognitive Impairment ◽

Longitudinal Data ◽

Cross Sectional


Nonparametric panel data regression with parametric cross-sectional dependence

Econometrics Journal ◽

10.1093/ectj/utab016 ◽

2021 ◽

Author(s):

Alexandra Soberon ◽

Juan M Rodriguez-Poo ◽

Peter M Robinson

Keyword(s):

Panel Data ◽

Monte Carlo Study ◽

Generalized Least Squares ◽

Linear Estimator ◽

Dependence Structure ◽

Finite Sample ◽

Cross Sectional ◽

Panel Data Regression ◽

Local Linear Estimator ◽

Cross Sectional Dependence

Abstract In this paper, we consider efficiency improvement in a nonparametric panel data model with cross-sectional dependence. A Generalized Least Squares (GLS)-type estimator that takes this dependence structure into account is proposed. When the cross-sectional dependence is parameterized, a local linear estimator is shown to be dominated by this type of GLS estimator. Possible gains in terms of the rate of convergence are also studied, and an asymptotically optimal bandwidth choice is justified. A Monte Carlo study is carried out to assess the finite-sample performance of the proposed estimators. Finally, empirical applications analyze the implications of the European Monetary Union for its member countries.
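
For reference, the baseline that the abstract says is dominated is the ordinary local linear estimator. The sketch below implements it for a single regressor with a Gaussian kernel, solving the kernel-weighted 2x2 normal equations and returning the local intercept. The kernel, the bandwidth and the pooling of all observations are assumptions of this sketch; the paper's GLS-type estimator, which additionally models the parametric cross-sectional dependence, is not reproduced here.

```python
import math


def local_linear(x0: float, xs: list[float], ys: list[float], bandwidth: float) -> float:
    """Local linear estimate of E[y | x = x0] with a Gaussian kernel.

    Fits y ~ a + b * (x - x0) by kernel-weighted least squares and returns
    the intercept a, which estimates the regression function at x0.
    """
    w = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    d = [x - x0 for x in xs]
    s0 = sum(w)
    s1 = sum(wi * di for wi, di in zip(w, d))
    s2 = sum(wi * di * di for wi, di in zip(w, d))
    t0 = sum(wi * yi for wi, yi in zip(w, ys))
    t1 = sum(wi * di * yi for wi, di, yi in zip(w, d, ys))
    # Intercept from the normal equations [[s0, s1], [s1, s2]] @ [a, b] = [t0, t1].
    return (s2 * t0 - s1 * t1) / (s0 * s2 - s1 * s1)
```

A useful sanity check is that a local linear fit reproduces linear functions exactly: with `ys = [2 * x + 1 for x in xs]`, the estimate at `x0 = 0.73` equals 2.46 up to rounding, whatever the bandwidth.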


Detecting common breaks in the means of high dimensional cross-dependent panels

Econometrics Journal ◽

10.1093/ectj/utab028 ◽

2021 ◽

Author(s):

Lajos Horváth ◽

Zhenya Liu ◽

Gregory Rice ◽

Yuqian Zhao

Keyword(s):

Panel Data ◽

Common Factors ◽

Real Data ◽

Change Points ◽

High Dimensional ◽

Asymptotic Results ◽

Cross Sectional ◽

Data Set ◽

Monte Carlo Simulation Study ◽

Cross Sectional Dependence

Abstract The problem of detecting change points in the mean of high-dimensional panel data with potentially strong cross-sectional dependence is considered. Under the assumption that the cross-sectional dependence is captured by an unknown number of common factors, a new CUSUM-type statistic is proposed. We derive its asymptotic properties under three scenarios, depending on the extent to which the common factors are asymptotically dominant. With panel data consisting of N cross-sectional time series of length T, the asymptotic results hold under the mild assumption that min {N, T} → ∞, with an otherwise arbitrary relationship between N and T, allowing the results to apply to most panel data examples. Bootstrap procedures are proposed to approximate the sampling distribution of the test statistics. A Monte Carlo simulation study showed that our test outperforms several existing tests in finite samples in a number of cases, particularly when N is much larger than T. The practical application of the proposed results is demonstrated with real data applications to detecting and estimating change points in the high-dimensional FRED-MD macroeconomic data set.
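
To show the basic CUSUM idea behind such tests, the sketch below estimates a common break date by computing, for each series, the partial sums of deviations from its mean, S_i(k), summing the squares over the N series, and taking the maximizing k. This simple aggregation is an assumption for illustration only. It is not the paper's statistic: it does not remove common factors, does not standardize by the long-run variance (so high-variance series dominate), and includes no bootstrap.

```python
def cusum_break(panel: list[list[float]]) -> tuple[int, float]:
    """Estimate a common break in the means of N series of equal length T.

    For each series i, S_i(k) = sum_{t <= k} (x_it - mean_i). The squared
    CUSUMs are summed over series, and the k (1 <= k < T) that maximizes
    the sum is returned as the number of pre-break observations, together
    with the maximized value.
    """
    T = len(panel[0])
    agg = [0.0] * (T - 1)
    for series in panel:
        mean = sum(series) / T
        s = 0.0
        for k in range(T - 1):
            s += series[k] - mean
            agg[k] += s * s
    k_hat = max(range(T - 1), key=lambda k: agg[k])
    return k_hat + 1, agg[k_hat]
```

For two series that both shift upward after their 50th observation, for example `[0.0] * 50 + [1.0] * 50` and `[3.0] * 50 + [5.0] * 50`, the function returns a break after 50 observations.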
