Social Data Linkage Environment

Author(s):  
Richard Trudeau

ABSTRACT

Objectives: The Social Data Linkage Environment (SDLE) at Statistics Canada promotes the innovative use of existing administrative and survey data to address important research questions and inform socio-economic policy through record linkage. It expands the potential of data integration across multiple domains, such as health, justice, education and income, through the creation of linked analytical data files without the need to collect additional data from Canadians.

Approach: At the core of the SDLE is a Derived Record Depository (DRD), essentially a national dynamic relational database containing only basic personal identifiers. The DRD is created by linking selected Statistics Canada source index files for the purpose of producing a list of unique individuals. These files are brought into the environment, processed and linked only once to the DRD. Each individual in the DRD is assigned an SDLE identifier. The source index files used to build the DRD include tax records, vital statistics registration records (births and deaths), and immigrant data. Updates to these data files are linked to the DRD on an ongoing basis. Examples of the personal identifiers stored in the DRD include surnames, given names, date of birth, sex, insurance numbers, parents' names, marital status, addresses (including postal codes), telephone numbers, immigration date, emigration date and date of death. The paired SDLE identifiers and source index file record IDs resulting from the record linkage are stored in a Key Registry. To limit privacy intrusiveness and minimize the risk of disclosure, source files are separated into source index files and source data files. Employees performing the record linkages in the SDLE have access only to the basic personal identifiers needed for linkage. Employees who build the analytical files for research have access only to the data stripped of personal identifiers.

Results: The SDLE is a highly secure environment that facilitates the creation of linked population data files for social analysis. It is not a large integrated database.

Conclusion: The SDLE program facilitates pan-Canadian social and economic statistical research. It is a record linkage environment that: increases the relevance of existing surveys without collecting new data; substantially increases the use of administrative data; generates new information without additional data collection; maintains the highest privacy and data security standards; and promotes a standardized approach to record linkage processes and methods.
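
The separation of source index files from source data files, with a Key Registry connecting them, can be illustrated with a minimal Python sketch. This is not Statistics Canada's implementation: the field names, the exact-match linkage rule, and the function names `split_source_file`, `link_to_depository` and `build_analytical_file` are all assumptions made for illustration.

```python
# Hypothetical subset of the personal identifiers; the real DRD holds many more.
IDENTIFIER_FIELDS = ("surname", "date_of_birth")


def split_source_file(records):
    """Split a source file into an index file (identifiers only) and a
    data file (analysis variables only). Both keep the source record ID."""
    index_file = [
        {"record_id": r["record_id"], **{f: r[f] for f in IDENTIFIER_FIELDS}}
        for r in records
    ]
    data_file = [
        {k: v for k, v in r.items() if k not in IDENTIFIER_FIELDS}
        for r in records
    ]
    return index_file, data_file


def link_to_depository(index_file, depository):
    """Toy linkage step (exact match on all identifiers, an assumption).
    Returns Key Registry entries as {source record ID: SDLE identifier}."""
    keys = {}
    for row in index_file:
        identifiers = tuple(row[f] for f in IDENTIFIER_FIELDS)
        if identifiers in depository:
            keys[row["record_id"]] = depository[identifiers]
    return keys


def build_analytical_file(data_a, keys_a, data_b, keys_b):
    """Join two de-identified data files through SDLE identifiers only."""
    b_by_sdle = {
        keys_b[r["record_id"]]: r for r in data_b if r["record_id"] in keys_b
    }
    linked = []
    for r in data_a:
        sdle_id = keys_a.get(r["record_id"])
        if sdle_id in b_by_sdle:
            a_vars = {k: v for k, v in r.items() if k != "record_id"}
            b_vars = {k: v for k, v in b_by_sdle[sdle_id].items() if k != "record_id"}
            linked.append({"sdle_id": sdle_id, **a_vars, **b_vars})
    return linked
```

In this sketch the linkage step sees only the index files and the analytical step sees only the data files and the key pairs, which mirrors the division of access described in the abstract.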

Author(s):  
Colin Babyak ◽  
Abdelnasser Saidi

ABSTRACT

Objectives: The objectives of this talk are to introduce Statistics Canada’s Social Data Linkage Environment (SDLE), to explain the methodology behind the creation of the central depository, and to show how both deterministic and probabilistic record linkage techniques are used to maintain and expand the environment.

Approach: We will start with a brief overview of the SDLE and then discuss how deterministic linkages and probabilistic linkages (using Statistics Canada’s generalized record linkage software, G-Link) have been combined to create and maintain a very large central depository, which can in turn be linked to virtually any social data source for the ultimate end goal of analysis.

Results: Although Canada has a population of about 36 million people, the central depository contains some 300 million records to represent them, because individuals may appear with multiple addresses, names, etc. Although this allows for a significant reduction in missed links, it raises the spectre of additional false positive matches and has added computational complexity, which we have had to overcome.

Conclusion: The combination of deterministic and probabilistic record linkage strategies has been effective in creating the central depository for the SDLE. As more and more data are linked to the environment and we continue to refine our methodology, we can now move on to the ultimate goal of the SDLE, which is to analyze this vast wealth of linked data.
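
A deterministic pass followed by a probabilistic pass on the remaining records can be sketched as below. This is a generic Fellegi-Sunter style illustration, not G-Link and not the SDLE methodology: the field names, the m- and u-probabilities in `M_U`, the `id_number` key and the threshold are all hypothetical values chosen for the example, and blocking is omitted for brevity.

```python
import math

# Hypothetical per-field probabilities: m = P(fields agree | same person),
# u = P(fields agree | different people).
M_U = {
    "surname": (0.95, 0.01),
    "birth_year": (0.98, 0.02),
    "postal_code": (0.80, 0.001),
}


def match_weight(rec_a, rec_b):
    """Sum of log2 agreement/disagreement weights over the compared fields."""
    total = 0.0
    for field, (m, u) in M_U.items():
        if rec_a[field] == rec_b[field]:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def link(file_a, file_b, det_key="id_number", threshold=8.0):
    """Return {id in A: (id in B, method)} using two passes."""
    links = {}

    # Pass 1: deterministic, exact match on a unique non-missing key.
    b_by_key = {}
    for b in file_b:
        if b.get(det_key) is not None:
            b_by_key.setdefault(b[det_key], []).append(b)
    for a in file_a:
        candidates = b_by_key.get(a.get(det_key), [])
        if len(candidates) == 1:
            links[a["id"]] = (candidates[0]["id"], "deterministic")

    # Pass 2: probabilistic, best-scoring unlinked candidate above threshold.
    linked_b = {b_id for b_id, _ in links.values()}
    for a in file_a:
        if a["id"] in links:
            continue
        best, best_weight = None, threshold
        for b in file_b:
            if b["id"] in linked_b:
                continue
            weight = match_weight(a, b)
            if weight >= best_weight:
                best, best_weight = b, weight
        if best is not None:
            links[a["id"]] = (best["id"], "probabilistic")
            linked_b.add(best["id"])
    return links
```

Because the second pass compares every unlinked pair, its cost grows with the product of the file sizes. That is one reason a depository of hundreds of millions of records adds computational complexity.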


Author(s):  
Li Xue

There has been an increasing demand for analytics and research related to cross-cutting and horizontal issues in Canada, such as in the domains of housing, aging and immigration. Very often policy makers and stakeholders pose a full spectrum of questions around a specific topic, requiring multidisciplinary evidence and data. Statistics Canada has a long history of record linkage. Over the past decade, the number of record linkage projects has increased exponentially. Several platforms have been established to facilitate linkage: the Canadian Employer-Employee Dynamics Database, which brings together tax and employment records from both employees and employers; the Social Data Linkage Environment, created to support linkages at the individual level across a broad spectrum of social data (health, justice, education, socio-economic); and the Linkable File Environment for business data. The breadth of our data holdings, married with record linkage capabilities, allows the creation of data sets that cross disciplines and areas of research. This presentation will showcase the innovative data integration approaches that Statistics Canada has advanced to meet inter-disciplinary data needs. Statistics Canada is pioneering innovative linkages across various domains to help answer cross-cutting questions. For example, the Longitudinal Administrative Databank, which links longitudinal tax records to numerous other data files including tax records of spouses and children in the household, the Longitudinal Immigration Database linkage key and health records, is used to study the economic impact of hospitalization, as well as to better understand health outcomes of immigrants by various dimensions including socio-economic status. Other examples include pilot projects linking the Canadian Financial Capability Survey to tax records, to gauge the relationship between financial literacy and annual retirement savings behavior, and linking the Intergenerational Income Database to the Census, to understand socio-economic factors affecting intergenerational mobility. Rapid growth in data availability for research also poses new challenges for IM/IT, governance, access, capacity building, etc. As Statistics Canada moves along a path of modernization, data integration is key to the development of new data sources to fill information gaps.


Author(s):  
Richard Trudeau

ABSTRACT

Objectives: In April 2015, a Working Group on Record Linkage was created at Statistics Canada with the objective of achieving a common understanding of the concepts and processes involved in record linkage projects at Statistics Canada.

Approach: A generic record linkage process model was mapped to reflect the general practices and activities involved in record linkage at Statistics Canada. The model was developed with a view to more general use by other statistical agencies involved in record linkage. It was built on the Generic Statistical Business Process Model v5.0 developed by the Joint UNECE/Eurostat/OECD Work Session on Statistical Metadata (METIS) for survey purposes. It also builds on international models of record linkage from Australia and the United States as well as record linkage methodology used at Statistics Canada. In addition, it was informed by the relevant legal and policy frameworks that govern all of Statistics Canada's statistical activities. Over one hundred people involved in all aspects of record linkage at Statistics Canada were consulted during this process.

Results: An activity-oriented Record Linkage Project Process Model was drafted and proposed as a standard for the agency. It breaks down the record linkage process into three meta-phases: project planning, record linkage, and post-linkage activities. Each meta-phase is further divided into phases and sub-phases that describe the activities of the record linkage project from specification of needs to project close-out and evaluation. An additional feature of the model is a description of the outcome of each phase that can be used as a milestone marker or as a gateway to the next phase.

Conclusion: As a descriptive model, the Record Linkage Project Process Model will inform management on the range of activities related to a record linkage project that go well beyond the function of matching records between two data files. It can also be used as a prescriptive model that will provide guidance to individuals engaging in a record linkage project.


1991 ◽  
Vol 30 (02) ◽  
pp. 117-123 ◽  
Author(s):  
L. L. Roos ◽  
A. Wajda

Abstract

Record linkage techniques can help identify the same patient for matching diverse files (hospital discharge abstracts, insurance claims, registries, Vital Statistics data) which contain similar identifiers. Prior knowledge of whether a linkage is feasible is important to prevent wasted effort (additional data collection or data manipulation), which decreases the cost-effectiveness of the linkage. Using examples generated by linking the Manitoba Health Services Commission data with Vital Statistics files, a method of estimating the information in each data set is presented first. Further, the feasibility of several different record linkage strategies is described and tested, given varying amounts of information. At the margin, relatively small amounts of information (having just one more variable to match with) can make a great difference. Probabilistic linkage’s great advantage was found in those situations where only a moderate amount of extra information was available.

By using the above techniques when working with one or both files in a proposed record linkage project, a much more informed judgement can now be made as to whether a linkage will or will not work. In facilitating record linkage, flexibility of both software and the strategy for matching is very important.
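
One common way to gauge how much identifying information a file carries is Shannon entropy, sketched below. The abstract does not give the authors' formula, so this is an illustrative assumption rather than their method. The function names are hypothetical, and summing entropies treats the variables as independent, which overstates the information when they are correlated.

```python
import math
from collections import Counter


def entropy_bits(values):
    """Shannon entropy (in bits) of the observed value distribution."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_summary(columns, population_size):
    """Compare available identifying information with what is needed.

    columns: {variable name: list of values}, one list per matching variable.
    Returns (available_bits, required_bits); roughly log2(N) bits are needed
    to single out one individual among N.
    """
    available = sum(entropy_bits(col) for col in columns.values())
    required = math.log2(population_size)
    return available, required
```

Under this view, one additional matching variable adds its entropy to the available total. This is consistent with the observation that one more variable can make a great difference at the margin.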


2006 ◽  
Vol 9 (6) ◽  
pp. 712-717 ◽  
Author(s):  
Jessica D. Y. Lee ◽  
Lyle J. Palmer

Abstract

The Western Australian Twin Register (WATR) was established in 1997 to study the health of all child multiples born in Western Australia (WA). The Register has until recently consisted of all multiples born in WA between 1980 and 1997. Using unique record linkage capacities available through the WA data linkage system, we have subsequently been able to identify all multiples born in WA since 1974. New affiliations with the Australian Twin Registry and the WA Institute for Medical Research are further enabled by the use of the WA Genetic Epidemiology Resource, a high-end bioinformatics infrastructure that allows efficient management of health datasets and facilitates collaborative research capabilities. In addition to this infrastructure, funding provided by these institutions has allowed the extension of the WATR to include a greater number of WA multiples, including those born between 1974 and 1979, and from 1998 onwards. These resources are in the process of being enabled for national and international access.


2018 ◽  
Vol 97 (4) ◽  
pp. 375-377
Author(s):  
Irina V. Egorysheva

The article is devoted to the participation of the outstanding hygienist F. F. Erisman in the development of the Moscow territorial sanitary organization. Under his leadership, a large-scale study of the impact of working and living conditions on the health of plant workers was carried out, which served as a model for similar sanitary-statistical research in a number of rural provinces. F. F. Erisman actively participated in the work of the sanitary organization of the Moscow gubernia Zemstvo and in the creation of the first district sanitary bureau.


2017 ◽  
Vol 25 (1) ◽  
pp. 149-160 ◽  
Author(s):  
Giovanni Benedetto ◽  
Alessia Di Prima ◽  
Salvatore Sciacca ◽  
Giuseppe Grosso

We described the design of a web-based application (the Software Integrated Cancer Registry, SWInCaRe) used to administer data in a cancer registry and tested its validity and usability. A sample of 11,680 records was used to compare the manual and automatic procedures. Sensitivity and specificity, the Health IT Usability Evaluation Scale, and a cost-efficiency analysis were assessed. Several data sources were used to build data packages through text-mining and record linkage algorithms. The automatic procedure showed small yet measurable improvements in both the data linkage process and the estimation of cancer cases. Users perceived the application as useful in reducing the time needed for coding and the difficulty of the process: both the time and cost analyses favored the automatic procedure. The web-based application proved to be a useful tool for the cancer registry, but some improvements are necessary to overcome the limitations observed and to further automate the process.


Author(s):  
Astrid Guttmann ◽  
Maria Chiu ◽  
Michael Lebenbaum ◽  
Kelvin Lam ◽  
Nelson Chong ◽  
...  

ABSTRACT

Objectives: Ontario, the most populous province in Canada, has a universal healthcare system that routinely collects health administrative data on its 13 million legal residents, and these data are used for health research. Record linkage has become a vital tool for this research by enriching these data with the Immigration, Refugees and Citizenship Canada (IRCC) Permanent Resident database and the Office of the Registrar General’s Vital Statistics-Death (VSD) registry. Our objectives were to estimate linkage rates and compare characteristics of individuals in the linked versus unlinked files.

Approach: We used both deterministic and probabilistic linkage methods to link the IRCC database (1985-2012) and VSD registry (1990-2012) to Ontario’s Registered Persons Database. Linkage rates were estimated and standardized differences were used to assess differences in socio-demographic and other characteristics between the linked and unlinked records.

Results: The overall linkage rates for the IRCC database and VSD registry were 86.4% and 96.2%, respectively. The majority (68.2%) of the record linkages in IRCC were achieved after the three deterministic passes, with the remaining 18.2% being linked probabilistically. Similarly, the majority (79.8%) of the records in the VSD registry were linked using deterministic record linkage and the remaining 16.3% were linked after probabilistic linkage and manual review. Unlinked and linked files were similar for most characteristics, such as age and marital status for IRCC and sex and most causes of death for VSD. However, lower linkage rates were observed among people born in East Asia (78%) in the IRCC database and for certain causes of death in the VSD registry, namely perinatal conditions (61.3%) and congenital anomalies (81.3%).

Conclusion: The linkages of immigration and vital statistics data to existing population-based healthcare data in Ontario, Canada will enable many novel cross-sectional and longitudinal studies to be conducted. Analytic techniques to account for sub-optimal linkage rates may be required in studies of certain ethnic groups or certain causes of death among children and infants.
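
The two summary measures named in the approach, linkage rates and standardized differences, can be computed as in the following sketch. The formulas are the conventional pooled-variance definitions and are assumed here, since the abstract does not state which variant the authors used. The function names and any input values are illustrative only.

```python
import math


def linkage_rate(n_linked, n_total):
    """Percentage of records in a file that were linked."""
    return 100.0 * n_linked / n_total


def std_diff_proportion(p_linked, p_unlinked):
    """Standardized difference for a binary characteristic.

    Uses the average of the two binomial variances as the pooled variance.
    Undefined (division by zero) if both proportions are 0 or both are 1.
    """
    pooled_sd = math.sqrt(
        (p_linked * (1 - p_linked) + p_unlinked * (1 - p_unlinked)) / 2
    )
    return (p_linked - p_unlinked) / pooled_sd


def std_diff_mean(mean_linked, sd_linked, mean_unlinked, sd_unlinked):
    """Standardized difference for a continuous characteristic such as age."""
    pooled_sd = math.sqrt((sd_linked**2 + sd_unlinked**2) / 2)
    return (mean_linked - mean_unlinked) / pooled_sd
```

Unlike a significance test, a standardized difference does not grow with sample size, which makes it a natural choice for comparing linked and unlinked records in very large administrative files.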


2021 ◽  
Author(s):  
Bryan Stierman ◽  
Joseph Afful ◽  
Margaret Carroll ◽  
Te-Ching Chen ◽  
Orlando Davy ◽  
...  

This report explains the creation of the 2017–March 2020 Pre-Pandemic Data Files, provides recommendations for and limitations of the files’ use, and presents prevalence estimates for select health outcomes based on the files.


2021 ◽  
pp. e2020046
Author(s):  
Pierre Brochu

To balance researchers’ need for detailed information with respondents’ confidentiality concerns, statistical agencies such as Statistics Canada commonly offer two versions of the same dataset: a public use file that is readily available and a master file with richer information but to which access is restricted. This article examines the choice of using public use versus master files of the Labour Force Survey (LFS). The article also provides researchers with a unified source of LFS information, including a thorough discussion of the structure of the LFS and its implication for research, such as the creation of mini-panels.

