Population Data Science: The science of data about people

Author(s):  
Kim McGrail ◽  
Kerina Jones

Introduction: Societal and individual benefits of data-intensive science are substantial but raise challenges of balancing individual privacy and public good, while building appropriate governance and socio-technical systems to support data-intensive science. We set out to define a new field of inquiry to move collective interests forward. Objectives and Approach: Our objectives were: 1. To create a concise definition of the emerging field of Population Data Science; 2. To highlight the characteristics and challenges of Population Data Science; 3. To differentiate Population Data Science from existing fields of data science and informatics; and 4. To discuss the implications and future opportunities for Population Data Science. Objectives 1 and 2 were met largely through International Population Data Linkage Network (IPDLN) member engagement, Objective 3 was evaluated via literature review, and Objective 4 was achieved through iterative and collective work on a peer-reviewed position paper. Results: We define Population Data Science succinctly as the science of data about people. It is related to, but distinct from, the fields of data science and informatics. A broader definition includes four characteristics: i) data use for positive impact on individuals and populations; ii) bringing together and analyzing data from multiple sources; iii) identifying population-level insights; and iv) developing safe, privacy-sensitive and ethical infrastructure to support research. One implication of these characteristics is that few individuals or organisations possess all of the requisite knowledge and skills comprising Population Data Science, so this is by nature a multi-disciplinary “team science” field. There is a need to advance various aspects of science, such as data linkage technology, various forms of analytics, and methods of public engagement. Conclusion/Implications: These implications are the beginnings of a research agenda for Population Data Science, which, if approached as a collective field, will catalyze significant advances in our understanding of society, health, and human behavior and increase the impact of our research.

Author(s):  
Kim McGrail ◽  
Kerina Jones ◽  
Ashley Akbari ◽  
Tell Bennett ◽  
Andrew Boyd ◽  
...  

Information is increasingly digital, creating opportunities to respond to pressing issues about human populations in near real time using linked datasets that are large, complex, and diverse. The potential social and individual benefits that can come from data-intensive science are large, but raise challenges of balancing individual privacy and the public good, building appropriate socio-technical systems to support data-intensive science, and determining whether defining a new field of inquiry might help move those collective interests and activities forward. A combination of expert engagement, literature review, and iterative conversations led to our conclusion that defining the field of Population Data Science (challenge 3) will help address the other two challenges as well. We define Population Data Science succinctly as the science of data about people and note that it is related to but distinct from the fields of data science and informatics. A broader definition names four characteristics of: data use for positive impact on citizens and society; bringing together and analyzing data from multiple sources; finding population-level insights; and developing safe, privacy-sensitive and ethical infrastructure to support research. One implication of these characteristics is that few people possess all of the requisite knowledge and skills of Population Data Science, so this is by nature a multi-disciplinary field. Other implications include the need to advance various aspects of science, such as data linkage technology, various forms of analytics, and methods of public engagement. These implications are the beginnings of a research agenda for Population Data Science, which if approached as a collective field, can catalyze significant advances in our understanding of trends in society, health, and human behavior. 


BMJ Open ◽  
2020 ◽  
Vol 10 (10) ◽  
pp. e043010
Author(s):  
Jane Lyons ◽  
Ashley Akbari ◽  
Fatemeh Torabi ◽  
Gareth I Davies ◽  
Laura North ◽  
...  

Introduction: The emergence of the novel respiratory SARS-CoV-2 and subsequent COVID-19 pandemic have required rapid assimilation of population-level data to understand and control the spread of infection in the general and vulnerable populations. Rapid analyses are needed to inform policy development and target interventions to at-risk groups to prevent serious health outcomes. We aim to provide an accessible research platform to determine demographic, socioeconomic and clinical risk factors for infection, morbidity and mortality of COVID-19, to measure the impact of COVID-19 on healthcare utilisation and long-term health, and to enable the evaluation of natural experiments of policy interventions. Methods and analysis: Two privacy-protecting population-level cohorts have been created and derived from multisourced demographic and healthcare data. The C20 cohort consists of 3.2 million people in Wales on 1 January 2020 with follow-up until 31 May 2020. The complete cohort dataset will be updated monthly with some individual datasets available daily. The C16 cohort consists of 3 million people in Wales on 1 January 2016 with follow-up to 31 December 2019. C16 is designed as a counterfactual cohort to provide contextual comparative population data on disease, health service utilisation and mortality. Study outcomes will: (a) characterise the epidemiology of COVID-19, (b) assess socioeconomic and demographic influences on infection and outcomes, (c) measure the impact of COVID-19 on short-term and longer-term population outcomes and (d) undertake studies on the transmission and spatial spread of infection. Ethics and dissemination: The Secure Anonymised Information Linkage-independent Information Governance Review Panel has approved this study. The study findings will be presented to policy groups, public meetings, national and international conferences, and published in peer-reviewed journals.


2021 ◽  
Vol 6 ◽  
pp. 209
Author(s):  
Emily Dema ◽  
Andrew J Copas ◽  
Soazig Clifton ◽  
Anne Conolly ◽  
Margaret Blake ◽  
...  

Background: Britain’s National Surveys of Sexual Attitudes and Lifestyles (Natsal) have been undertaken decennially since 1990 and provide a key data source underpinning sexual and reproductive health (SRH) policy. The COVID-19 pandemic disrupted many aspects of sexual lifestyles, triggering an urgent need for population-level data on sexual behaviour, relationships, and service use at a time when gold-standard in-person, household-based surveys with probability sampling were not feasible. We designed the Natsal-COVID study to understand the impact of COVID-19 on the nation’s SRH and assessed the sample representativeness. Methods: Natsal-COVID Wave 1 data collection was conducted four months (29/7-10/8/2020) after the announcement of Britain’s first national lockdown (23/03/2020). This was an online web-panel survey administered by survey research company, Ipsos MORI. Eligible participants were resident in Britain, aged 18-59 years, and the sample included a boost of those aged 18-29. Questions covered participants’ sexual behaviour, relationships, and SRH service use. Quotas and weighting were used to achieve a quasi-representative sample of the British general population. Participants meeting criteria of interest and agreeing to recontact were selected for qualitative follow-up interviews. Comparisons were made with contemporaneous national probability surveys and Natsal-3 (2010-12) to understand bias. Results: 6,654 participants completed the survey and 45 completed follow-up interviews. The weighted Natsal-COVID sample was similar to the general population in terms of gender, age, ethnicity, rurality, and, among sexually-active participants, numbers of sexual partners in the past year. However, the sample was more educated, contained more sexually-inexperienced people, and included more people in poorer health. 
Conclusions: Natsal-COVID Wave 1 rapidly collected quasi-representative population data to enable evaluation of the early population-level impact of COVID-19 and lockdown measures on SRH in Britain and inform policy. Although sampling was less representative than the decennial Natsals, Natsal-COVID will complement national surveillance data and Natsal-4 (planned for 2022).


Author(s):  
Арслан Константинович Балтыков ◽  
Юлия Сергеевна Гермашева

The authors of the paper attach special importance to the development of information skills in the educational process. They draw on the ideas of pedagogical constructivism and the system of mutual learning. The paper focuses on the problem of presenting learning material as an integral part of the educational process. The authors identify a number of shortcomings in the usual scheme of traditional presentation of material, which prompted them to rework the material being presented by adding levels and a sequence for their presentation. The paper describes the influence of the responsibility factor on the degree of student involvement. Based on the results of these studies, the authors created an innovative scheme for conducting training sessions. When calculating the duration of students' active attention for the scheme, they took into account the modern trend of clip thinking. Under the proposed scheme, the usual one-to-many transfer of information is transformed into collective work. The paper describes the positive impact of the experiment on relationships in the group. In the course of the study, the authors note that step-by-step performance of work in dedicated role-based units encourages students to take responsibility for their own learning. In addition, the paper discusses the impact of group responsibility and the advantages of using this scheme for distance learning.


Author(s):  
Tavinder Kaur Ark ◽  
Sarah Kesselring ◽  
Brent Hills ◽  
Kim McGrail

Background: Population Data BC (PopData) was established as a multi-university data and education resource to support training and education, data linkage, and access to individual level, de-identified data for research in a wide variety of areas including human and community development and well-being. Approach: A combination of deterministic and probabilistic linkage is conducted based on the quality and availability of identifiers for data linkage. PopData utilizes a harmonized data request and approval process for data stewards and researchers to increase efficiency and ease of access to linked data. Researchers access linked data through a secure research environment (SRE) that is equipped with a wide variety of tools for analysis. The SRE also allows for ongoing management and control of data. PopData continues to expand its data holdings and to evolve its services as well as governance and data access process. Discussion: PopData has provided efficient and cost-effective access to linked data sets for research. After two decades of learning, future planned developments for the organization include, but are not limited to, policies to facilitate programs of research, access to reusable datasets, evaluation and use of new data linkage techniques such as privacy preserving record linkage (PPRL). Conclusion: PopData continues to maintain and grow the number and type of data holdings available for research. Its existing models support a number of large-scale research projects and demonstrate the benefits of having a third-party data linkage and provisioning center for research purposes. Building further connections with existing data holders and governing bodies will be important to ensure ongoing access to data and changes in policy exist to facilitate access for researchers.
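The combination of deterministic and probabilistic linkage mentioned in the abstract can be illustrated with a minimal sketch. It is not PopData's method: the field names (`health_id`, `name`, `dob`), the weights, and the threshold are all hypothetical, and a simple string-similarity score stands in for a formal probabilistic model such as Fellegi-Sunter.

```python
from difflib import SequenceMatcher


def deterministic_match(a, b):
    """Exact agreement on a shared unique identifier (hypothetical 'health_id')."""
    return a.get("health_id") is not None and a.get("health_id") == b.get("health_id")


def similarity_score(a, b):
    """Weighted agreement on name and date of birth (illustrative weights only)."""
    name_sim = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    dob_agree = 1.0 if a["dob"] == b["dob"] else 0.0
    return 0.6 * name_sim + 0.4 * dob_agree


def link(a, b, threshold=0.85):
    """Try the exact identifier first; fall back to the similarity score."""
    if deterministic_match(a, b):
        return "deterministic"
    if similarity_score(a, b) >= threshold:
        return "probabilistic"
    return "no link"


# Made-up records for illustration only.
rec_a = {"health_id": "A1", "name": "Jane Smith", "dob": "1980-01-01"}
rec_b = {"health_id": "A1", "name": "J. Smith", "dob": "1980-01-01"}
rec_c = {"health_id": None, "name": "Jane Smyth", "dob": "1980-01-01"}
rec_d = {"health_id": None, "name": "Bob Jones", "dob": "1975-06-30"}

print(link(rec_a, rec_b))
print(link(rec_a, rec_c))
print(link(rec_a, rec_d))
```

Trying the identifier first mirrors the idea that the choice of method depends on the quality and availability of identifiers: when a reliable identifier is present it decides the link, and fuzzy agreement is used only when it is missing.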


2020 ◽  
Author(s):  
Therese Nordberg Hanvold ◽  
Petter Kristensen ◽  
Karina Corbett ◽  
Rachel Louise Hasting ◽  
Ingrid Sivesind Mehlum

Background: The study objective was to evaluate the impact of a population-level intervention (the IA Agreement) on the one-year risk for long-term sickness absence spells (LSAS) among young and middle aged workers in Norway. Methods: Using an observational design, we conducted a quasi-experimental study to analyse registry data on individual LSAS for all employed individuals in 2000 (n=298 690) and 2005 (n=352 618), born in Norway between 1967 and 1976. The intervention of interest was the tripartite agreement for a more inclusive working life (the IA Agreement). We estimated difference in pre-post differences (DID) in LSAS between individuals working in IA companies with the intervention and companies without, in 2000 and 2005. We used logistic regression models and present odds ratios (DID OR) with accompanying 95% CI. We stratified analyses by sex, industry and company size. Results: We found no significant change in the overall risk of long-term sickness absence spells after implementing the intervention among young and middle aged workers. Stratified by sex, the intervention resulted in a slight decrease in LSAS risk among female workers (DID OR 0.93 (0.91-0.96)) while the intervention showed no impact among male workers (DID OR 1.01 (0.97-1.06)). We found that companies signing the IA Agreement were large (≥50 employees) and often within the manufacturing and health and social sectors. In large manufacturing companies, we found a reduction in LSAS, among workers both in companies with and without the intervention, resulting in no statistically significant impact of the IA intervention. In large health and social companies, we found an increase in LSAS among workers both in companies with and without the intervention. The increase was smaller among the workers in companies offering the IA intervention compared with workers in companies without, resulting in a positive impact of the IA intervention in the health and social industry. This impact was statistically significant only among female workers. Conclusions: The results indicate that the impact of the IA Agreement on the risk of long-term sickness absence spells varies considerably depending on sex and industry. These findings suggest that reducing LSAS may warrant industry-specific interventions.
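As a rough illustration of what a DID odds ratio measures, the sketch below computes an unadjusted version directly from event counts: the pre-to-post odds ratio in the intervention group divided by the pre-to-post odds ratio in the comparison group. This is a simplification, not the study's analysis, which used logistic regression models. The function names and all counts are hypothetical.

```python
def odds(events, total):
    """Odds of the outcome given the number of events and the group size."""
    return events / (total - events)


def did_odds_ratio(treated_pre, treated_post, control_pre, control_post):
    """Ratio of pre-to-post odds ratios; each argument is an (events, total) pair.

    A value below 1 means the odds rose less (or fell more) in the treated
    group than in the control group over the same period.
    """
    or_treated = odds(*treated_post) / odds(*treated_pre)
    or_control = odds(*control_post) / odds(*control_pre)
    return or_treated / or_control


# Made-up counts: odds are unchanged in the treated group,
# while they rise in the control group.
example = did_odds_ratio(
    treated_pre=(100, 1000),
    treated_post=(100, 1000),
    control_pre=(100, 1000),
    control_post=(200, 1000),
)
print(round(example, 3))
```

With no covariates, this quantity matches the exponentiated group-by-period interaction coefficient from a logistic regression; a regression additionally allows adjustment and confidence intervals.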


Author(s):  
William A Ghali ◽  
Michael J Schull

We write to you, here in the pages of the International Journal of Population Data Science, for the second time in our capacity of co-directors of the International Population Data Linkage Network (IPDLN – www.ipdln.org). Time has certainly passed quickly since our first communication, where we introduced ourselves, and discussed planned initiatives for our tenure as leads of the IPDLN. Our network’s scientific community is steadily growing and thriving in an era of heightened interest around all things ‘data’. Indeed, there is great enthusiasm for all initiatives that explore ways of harnessing information systems and multisource data to enhance collective knowledge of health matters so that better decisions can be made by governments, system planners, providers, and patients. Never before have such initiatives attracted more attention. It is in this context of heightened interest and relevance around IPDLN and its science that we prepare to convene in Banff, Alberta, Canada for the 5th biennial IPDLN Conference – September 11-14. The conference, to be held at the inspiring Banff Centre (www.banffcentre.ca), is almost sold out, with only limited space remaining for late registrants. A tremendous program has been created through the oversight of Scientific Program co-chairs, Drs. Astrid Guttman and Hude Quan. A compelling roster of plenary lectures from Drs. Diane Watson, Jennifer Walker, and Osmar Zaïane is eagerly anticipated, as are topical panel discussions, an entertaining Science Slam session, and a terrific social program. These sessions will be surrounded by rich scientific oral and poster presentations arising from the more than 450 scientific abstracts submitted for review. We are so pleased to see this vibrant scientific engagement from the IPDLN membership and students, and look forward to hosting all delegates in Banff. 
The Banff conference will also be the venue at which we announce the new Directorship of the IPDLN for the next two years (2019 and 2020). As co-directors, we engaged with a number of individuals and organizations with interest in leading the IPDLN. In the end, two compelling Directorship applications were submitted – one a joint bid from Australia’s Population Health Research Network and the South Australia Northern Territory DataLink, and the other from the US-based Actionable Intelligence for Social Policy. IPDLN members submitted votes on these strong leadership bids through an online voting process, and while the excellence and appeal of both bids was apparent in strong voter support for both, a winning bid has been confirmed, and it will (as mentioned) be announced at the upcoming September conference. As we look forward to the Banff meeting with great anticipation, we are compelled to acknowledge the growing IPDLN legacy created by past directors. We are particularly indebted to our immediate predecessor, Dr. David Ford, and his team at Swansea University. Their work in hosting the 2016 IPDLN conference has been an inspiration to us in the planning of this year’s conference, and their crucial and foundational work in creating an IT platform for the IPDLN website, the membership database, and the new International Journal for Population Data Science has brought the IPDLN to a new level of organizational sophistication. Over the last 18 months, our co-directorship teams from the Institute for Clinical Evaluative Sciences in Ontario and the O’Brien Institute for Public Health at the University of Calgary have built on the foundation established by prior directors to update/enhance the IPDLN website and membership database. 
The IPDLN has more members than ever before representing a greater number of countries, and we have a more formalized governance structure with the creation of an Executive Committee that will include immediate past-Directors in order to better ensure continuity. A new Executive Committee will be elected by the IPDLN membership following the Banff conference. The waiting is almost over and IPDLN 2018 is upon us! Our scientific domain has never had the prominence or level of anticipation that we currently see. And the IPDLN has grown in its size, vibrancy and scientific scope. The opportunities for us are boundless, and the timing of our upcoming conference could not be better. We are honoured, with our respective organizations, to have had this opportunity to serve as co-directors over the past two years, and look forward to seeing many of you very soon. For those of you who are unable to travel to Canada’s Rocky Mountains this year, we look forward to connecting with you at a later time in the IPDLN’s continuing upward journey.


2007 ◽  
Vol 12 (3) ◽  
pp. 255-270 ◽  
Author(s):  
Robert Brulle ◽  
Liesel Turner ◽  
Jason Carmichael ◽  
J. Jenkins

Population-level analyses of SMOs typically have relied on a single source for data, most commonly the Encyclopedia of Associations (EoA). However, the validity of this procedure has been drawn into question by recent organizational studies. To examine the impact of using different sources to estimate SMO populations, we compile a comprehensive population dataset of national and regional U.S. environmental movement organizations (or EMOs) over a 100-year time period using 155 different sources. We use this data to evaluate the accuracy and selection biases in five major compilations of U.S. EMOs. The analysis shows that all single sources are selective, tapping specific sections of the environmental movement. Multiple sources are needed to capture a comprehensive population of EMOs. Researchers should be aware of the limitations of specific sources before drawing conclusions about population parameters.


Author(s):  
Rebecca Ritte ◽  
Jane Freemantle ◽  
Fiona Mensah ◽  
Mary Sullivan

Objectives: An accurate picture of infant mortality informs society of its social progress. It is a key indicator of how effective public health policies and programs are in caring for the most vulnerable in our society. Currently, at the population level, Victorian data on Aboriginal and Torres Strait Islander births and deaths are excluded from Australian vital statistics. The Victorian Aboriginal Mortality Study aimed to provide a more complete and accurate population profile of Aboriginal births in Victoria using population data linkage of Victorian statutory and administrative datasets. Approach: Two population statutory datasets, the Victorian Perinatal Data Collection (VPDC) and Victorian Registry of Births, Deaths and Marriages (RBDM), were linked, using probabilistic matching with mother’s name and surname, child’s date of birth and sex, for all births that occurred in Victoria between 1988 and 2008, inclusive, to more accurately ascertain births to mothers and fathers who identified as Aboriginal and/or Torres Strait Islander (hereafter respectfully ‘Aboriginal’). Results: Over 1.34 million files, reporting births between 1988 and 2008, were linked. However, due to data integrity issues for Indigenous identification prior to 1998, the years between 1999 and 2008 only were used in the development of the birth cohort. Matching the VPDC with the RBDM resulted in identifying an additional 4,333 live births where mother and/or father identified as Aboriginal, representing an 87% increase in the number of births previously recorded as Aboriginal by the VPDC*. The largest increase (186%) in the number of births where mother and/or father identified as Aboriginal was observed within the Victorian metropolitan areas. Conclusion: This is the first time that the VPDC and RBDM birth data were linked in Victoria. The matched birth information established a more complete population profile of Aboriginal and/or Torres Strait Islander births. These data will provide a more accurate baseline to enhance the Victorian and Australian governments’ ability to plan services, allocate resources and evaluate funded activities aimed at eliminating disparity experienced by Aboriginal and/or Torres Strait Islander peoples. Importantly, it has established a more accurate denominator from which to calculate Aboriginal infant mortality rates for Victoria, Australia. *Until 2009, the mother’s Indigenous identification only was recorded in the VPDC.
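The two reported figures (4,333 additional births and an 87% increase) imply an approximate size for the previously recorded count, which the abstract does not state. The sketch below is back-of-envelope arithmetic only: the results are implied approximations, not reported values, and the 87% is itself rounded.

```python
def implied_baseline(additional, pct_increase):
    """Baseline count implied by an absolute increase and a percentage increase."""
    return additional / (pct_increase / 100)


additional_births = 4333   # reported in the abstract
pct_increase = 87          # reported in the abstract (rounded)

baseline = implied_baseline(additional_births, pct_increase)
linked_total = baseline + additional_births

print(round(baseline))       # approximate births previously recorded by the VPDC
print(round(linked_total))   # approximate births after linkage
```

Because the denominator of an infant mortality rate is the number of births, a change of this size in the birth count would alter any rate calculated from it substantially.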


2021 ◽  
Vol 50 (Supplement_1) ◽  
Author(s):  
Angela van der Plas ◽  
Matthew Hankins ◽  
Annie Heremans

Focus of Presentation: Real-world data (RWD) is readily available in Japan through multiple sources. Considering the crucial need to substantiate the beneficial effects of switching from cigarette smoking to heat-not-burn (HNB) products, both at the individual and population levels, the use of RWD seems a viable option. For instance, ecological studies using RWD have assessed the impact of population-level interventions such as smoking bans and their effects on smoking-related diseases and their endpoints. As a proof of concept, ecological analyses were performed to assess the rates of chronic obstructive pulmonary disease (COPD) exacerbations and acute Ischemic Heart Disease (IHD) hospitalizations before and after the introduction of an HNB product in the Japanese market. Findings: Hospital admissions associated with ICD codes for COPD and IHD from 2008 to 2019 (5 years before and 4 years after the introduction of the target HNB product in the Japanese market) were retrieved from the MDV database. Referrals were below those predicted from pre-launch trends. Conclusions: The use of RWD in assessment of HNB products is viable for ecological studies with the well-known caveats. Their use in exposure-specific studies will become feasible once the systematic collection of exposure to tobacco products and use history is guaranteed. This will greatly increase the range and robustness of the evidence base. Key messages: The use of RWD is a practical way of assessing the impact of HNB products in the population as a whole.

