Position Statement on Population Data Science:

Author(s):  
Kim McGrail ◽  
Kerina Jones ◽  
Ashley Akbari ◽  
Tell Bennett ◽  
Andrew Boyd ◽  
...  

Information is increasingly digital, creating opportunities to respond to pressing issues about human populations in near real time using linked datasets that are large, complex, and diverse. The potential social and individual benefits that can come from data-intensive science are large, but raise challenges of balancing individual privacy and the public good, building appropriate socio-technical systems to support data-intensive science, and determining whether defining a new field of inquiry might help move those collective interests and activities forward. A combination of expert engagement, literature review, and iterative conversations led to our conclusion that defining the field of Population Data Science (challenge 3) will help address the other two challenges as well. We define Population Data Science succinctly as the science of data about people and note that it is related to but distinct from the fields of data science and informatics. A broader definition names four characteristics of: data use for positive impact on citizens and society; bringing together and analyzing data from multiple sources; finding population-level insights; and developing safe, privacy-sensitive and ethical infrastructure to support research. One implication of these characteristics is that few people possess all of the requisite knowledge and skills of Population Data Science, so this is by nature a multi-disciplinary field. Other implications include the need to advance various aspects of science, such as data linkage technology, various forms of analytics, and methods of public engagement. These implications are the beginnings of a research agenda for Population Data Science, which if approached as a collective field, can catalyze significant advances in our understanding of trends in society, health, and human behavior. 

Author(s):  
Kim McGrail ◽  
Kerina Jones

Introduction: Societal and individual benefits of data-intensive science are substantial but raise challenges of balancing individual privacy and public good, while building appropriate governance and socio-technical systems to support data-intensive science. We set out to define a new field of inquiry to move collective interests forward. Objectives and Approach: Our objectives were: 1. To create a concise definition of the emerging field of Population Data Science; 2. To highlight the characteristics and challenges of Population Data Science; 3. To differentiate Population Data Science from existing fields of data science and informatics; and 4. To discuss the implications and future opportunities for Population Data Science. Objectives 1 and 2 were met largely through International Population Data Linkage Network (IPDLN) member engagement, Objective 3 was evaluated via literature review, and Objective 4 was achieved through iterative and collective work on a peer-reviewed position paper. Results: We define Population Data Science succinctly as the science of data about people. It is related to, but distinct from, the fields of data science and informatics. A broader definition includes four characteristics of: i) data use for positive impact on individuals and populations; ii) bringing together and analyzing data from multiple sources; iii) identifying population-level insights; and iv) developing safe, privacy-sensitive and ethical infrastructure to support research. One implication of these characteristics is that few individuals or organisations possess all of the requisite knowledge and skills comprising Population Data Science, so this is by nature a multi-disciplinary “team science” field. There is a need to advance various aspects of science, such as data linkage technology, various forms of analytics, and methods of public engagement.
Conclusion/Implications: These implications are the beginnings of a research agenda for Population Data Science, which if approached as a collective field, will catalyze significant advances in our understanding of society, health, and human behavior and increase the impact of our research.


Author(s):  
Robert Lipton ◽  
D. M. Gorman ◽  
William F. Wieczorek ◽  
Aniruddha Banerjee ◽  
Paul Gruenewald

From John Snow’s pioneering work on cholera in the 19th century until the present day, placing illness and disease within the context of a geographic framework has been an integral, if understated, part of the practice of public health. Indeed, geographical/spatial methods are an increasingly important tool in understanding public health issues. Spatial analysis addresses a seemingly obvious yet relatively misunderstood aspect of public health, namely, studying the dynamics of people in places. As advances in computer technology increase almost exponentially, computer intensive spatial methods (including mapping) have become an appealing way to understand the manner in which the individual relates to larger frameworks that compose the human community and the physical nature of human environments (streets with intersections, dense vs. sparse neighborhoods, high or low densities of liquor stores or restaurants, etc.). Spatial methods are extremely data intensive, often pulling together information from disparate sources that have been collected for other purposes such as research, business practice, governmental policy, and law enforcement. Although initially more demanding in regard to data manipulation compared to typical population level methods, the ability to compile and compare data in a spatial framework provides information about human populations that lies beyond typical survey or census research. We will discuss general methods of spatial analysis and mapping that will help to elucidate when and how spatial analysis might be used in a public health setting. This discussion will include a method for transforming arbitrary administrative units, such as zip codes, into a more useable uniform grid structure. In addition, a practical research example will be discussed focusing on the relationship between alcohol and violence. A relatively new Bayesian spatial method will be part of this example.
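The abstract above mentions transforming arbitrary administrative units, such as zip codes, into a uniform grid, but does not say how. One common approach is areal weighting, where each unit's count is split across grid cells in proportion to the overlapping area. The sketch below is only an illustration of that idea, not the authors' method: it assumes counts are spread evenly within each unit and, to stay self-contained, represents units as axis-aligned rectangles (real zip codes are irregular polygons and would need a GIS library). The names `Rect`, `overlap_area`, and `regrid` are hypothetical.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle standing in for an administrative unit (assumption)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def area(r: Rect) -> float:
    return max(0.0, r.xmax - r.xmin) * max(0.0, r.ymax - r.ymin)


def overlap_area(a: Rect, b: Rect) -> float:
    """Area of the intersection of two rectangles (0.0 if they do not overlap)."""
    w = min(a.xmax, b.xmax) - max(a.xmin, b.xmin)
    h = min(a.ymax, b.ymax) - max(a.ymin, b.ymin)
    return w * h if w > 0 and h > 0 else 0.0


def regrid(units: list[tuple[Rect, float]], cell_size: float,
           nx: int, ny: int) -> list[list[float]]:
    """Apportion each unit's count to grid cells by share of overlapping area.

    The grid starts at the origin and has nx columns and ny rows of square cells.
    """
    grid = [[0.0] * nx for _ in range(ny)]
    for rect, count in units:
        total = area(rect)
        if total == 0:
            continue  # degenerate unit: nothing to distribute
        for j in range(ny):
            for i in range(nx):
                cell = Rect(i * cell_size, j * cell_size,
                            (i + 1) * cell_size, (j + 1) * cell_size)
                grid[j][i] += count * overlap_area(rect, cell) / total
    return grid


# Hypothetical data: two units with made-up counts on a 2 x 1 grid of unit cells.
units = [(Rect(0, 0, 2, 1), 100.0), (Rect(0.5, 0, 1.5, 1), 10.0)]
grid = regrid(units, cell_size=1.0, nx=2, ny=1)
```

Counts from units that extend beyond the grid are only partly allocated, so the grid extent should cover the study area.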


Author(s):  
Robert Lipton ◽  
D. M. Gorman ◽  
William F. Wieczorek ◽  
Paul Gruenewald

Spatial methods are an increasingly important tool in understanding public health issues. Spatial analysis addresses an often forgotten or misunderstood aspect of public health, namely, studying the dynamics of people in places. As advances in computer technology have continued apace, spatial methods have become an appealing way to understand the manner in which the individual relates to larger frameworks that compose the human community and the physical nature of human environments (streets with intersections, dense vs. sparse neighborhoods, high or low densities of liquor stores or restaurants, etc.). Spatial methods are extremely data-intensive, often pulling together information from disparate sources that have been collected for other purposes, such as research, business practice, governmental policy, and law enforcement. Although initially more demanding in regard to data manipulation compared to typical population level methods, the ability to compile and compare data in a spatial framework provides much additional information about human populations that lies beyond typical survey or census research. We will discuss general methods of spatial analysis and mapping which will help to elucidate when and how spatial analysis might be used in a public health setting. Further, we will discuss a practical research example focusing on the relationship between alcohol and violence.


BMJ Open ◽  
2020 ◽  
Vol 10 (10) ◽  
pp. e043010
Author(s):  
Jane Lyons ◽  
Ashley Akbari ◽  
Fatemeh Torabi ◽  
Gareth I Davies ◽  
Laura North ◽  
...  

Introduction: The emergence of the novel respiratory SARS-CoV-2 and subsequent COVID-19 pandemic have required rapid assimilation of population-level data to understand and control the spread of infection in the general and vulnerable populations. Rapid analyses are needed to inform policy development and target interventions to at-risk groups to prevent serious health outcomes. We aim to provide an accessible research platform to determine demographic, socioeconomic and clinical risk factors for infection, morbidity and mortality of COVID-19, to measure the impact of COVID-19 on healthcare utilisation and long-term health, and to enable the evaluation of natural experiments of policy interventions. Methods and analysis: Two privacy-protecting population-level cohorts have been created and derived from multisourced demographic and healthcare data. The C20 cohort consists of 3.2 million people in Wales on 1 January 2020 with follow-up until 31 May 2020. The complete cohort dataset will be updated monthly with some individual datasets available daily. The C16 cohort consists of 3 million people in Wales on 1 January 2016 with follow-up to 31 December 2019. C16 is designed as a counterfactual cohort to provide contextual comparative population data on disease, health service utilisation and mortality. Study outcomes will: (a) characterise the epidemiology of COVID-19, (b) assess socioeconomic and demographic influences on infection and outcomes, (c) measure the impact of COVID-19 on short-term and longer-term population outcomes and (d) undertake studies on the transmission and spatial spread of infection. Ethics and dissemination: The Secure Anonymised Information Linkage-independent Information Governance Review Panel has approved this study. The study findings will be presented to policy groups, public meetings, national and international conferences, and published in peer-reviewed journals.


Author(s):  
Adrien Oliva ◽  
Raymond Tobler ◽  
Alan Cooper ◽  
Bastien Llamas ◽  
Yassine Souilmi

The current standard practice for assembling individual genomes involves mapping millions of short DNA sequences (also known as DNA ‘reads’) against a pre-constructed reference genome. Mapping vast amounts of short reads in a timely manner is a computationally challenging task that inevitably produces artefacts, including biases against alleles not found in the reference genome. This reference bias and other mapping artefacts are expected to be exacerbated in ancient DNA (aDNA) studies, which rely on the analysis of low quantities of damaged and very short DNA fragments (~30–80 bp). Nevertheless, the current gold-standard mapping strategies for aDNA studies have effectively remained unchanged for nearly a decade, during which time new software has emerged. In this study, we used simulated aDNA reads from three different human populations to benchmark the performance of 30 distinct mapping strategies implemented across four different read-mapping programs (BWA-aln, BWA-mem, NovoAlign and Bowtie2) and quantified the impact of reference bias in downstream population genetic analyses. We show that specific NovoAlign, BWA-aln and BWA-mem parameterizations achieve high mapping precision with low levels of reference bias, particularly after filtering out reads with low mapping qualities. However, unbiased NovoAlign results required the use of an IUPAC reference genome. While relevant only to aDNA projects where reference population data are available, the benefit of using an IUPAC reference demonstrates the value of incorporating population genetic information into the aDNA mapping process, echoing recent results based on graph genome representations.


Animals ◽  
2021 ◽  
Vol 11 (8) ◽  
pp. 2323
Author(s):  
Lloyd A. Courtenay ◽  
Darío Herranz-Rodrigo ◽  
José Yravedra ◽  
José Mª Vázquez-Rodríguez ◽  
Rosa Huguet ◽  
...  

Human populations have been known to develop complex relationships with large carnivore species throughout time, with evidence of both competition and collaboration to obtain resources throughout the Pleistocene. From this perspective, many archaeological and palaeontological sites present evidence of carnivore modifications to bone. In response to this, specialists in the study of microscopic bone surface modifications have resorted to the use of 3D modeling and data science techniques for the inspection of these elements, reaching novel limits for the discerning of carnivore agencies. The present research analyzes the tooth mark variability produced by multiple Iberian wolf individuals, with the aim of studying how captivity may affect the nature of tooth marks left on bone. In addition to this, four different populations of both wild and captive Iberian wolves are also compared for a more in-depth comparison of intra-species variability. This research statistically shows that large canid tooth pits are the least affected by captivity, while tooth scores appear more superficial when produced by captive wolves. The superficial nature of captive wolf tooth scores is additionally seen to correlate with other metric features, thus influencing overall mark morphologies. In light of this, the present study opens a new dialogue on the reasons behind this, advising caution when using tooth scores for carnivore identification and contemplating how elements such as stress may be affecting the wolves under study.


2010 ◽  
Vol 16 (12) ◽  
pp. 1422-1431 ◽  
Author(s):  
Bruce V Taylor ◽  
John F Pearson ◽  
Glynnis Clarke ◽  
Deborah F Mason ◽  
David A Abernethy ◽  
...  

Background: The prevalence of multiple sclerosis (MS) is not uniform, with a latitudinal gradient of prevalence present in most studies. Understanding the drivers of this gradient may allow a better understanding of the environmental factors involved in MS pathogenesis. Method: The New Zealand national MS prevalence study (NZMSPS) is a cross-sectional study of people with definite MS (DMS) (McDonald criteria 2005) resident in New Zealand on census night, 7 March 2006, utilizing multiple sources of notification. Capture-recapture analysis (CRA) was used to estimate missing cases. Results: Of 2917 people with DMS identified, the crude prevalence was 72.4 per 100,000 population, and 73.1 per 100,000 when age-standardized to the European population. CRA estimated that 96.7% of cases were identified. A latitudinal gradient was seen with MS prevalence increasing three-fold from the North (35°S) to the South (48°S). The gradient was non-uniform; females with relapsing-remitting/secondary-progressive (RRMS/SPMS) disease have a gradient 11 times greater than males with primary-progressive MS (p < 1 × 10⁻⁷). DMS was significantly less common among those of Māori ethnicity. Conclusions: This study confirms the presence of a robust latitudinal gradient of MS prevalence in New Zealand. This gradient is largely driven by European females with the RRMS/SPMS phenotype. These results indicate that the environmental factors that underlie the latitudinal gradient act differentially by gender, ethnicity and MS phenotype. A better understanding of these factors may allow more targeted MS therapies aimed at modifiable environmental triggers at the population level.
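To make the capture-recapture step concrete, the sketch below shows the simplest two-source form, using Chapman's bias-corrected version of the Lincoln-Petersen estimator. This is an illustrative assumption: the abstract says only that multiple notification sources were used and does not specify the estimator, and studies with more than two sources typically rely on more elaborate models. The function names and all counts in the example are hypothetical and are not taken from the study.

```python
def chapman_estimate(n1: int, n2: int, m: int) -> float:
    """Estimate the total number of cases from two overlapping case lists.

    n1, n2: cases found by source 1 and source 2
    m:      cases found by both sources
    Assumes the two sources identify cases independently (a strong assumption).
    """
    if m < 0 or m > min(n1, n2):
        raise ValueError("m must lie between 0 and min(n1, n2)")
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


def prevalence_per_100k(cases: float, population: int) -> float:
    """Crude prevalence expressed per 100,000 population."""
    return cases * 100_000 / population


# Hypothetical counts, for illustration only.
n1, n2, both = 600, 500, 400
observed = n1 + n2 - both                      # distinct cases actually identified
estimated_total = chapman_estimate(n1, n2, both)
ascertainment = observed / estimated_total     # share of estimated cases identified
```

With these made-up counts, 700 distinct cases are observed and the estimator suggests roughly 750 in total, so ascertainment is about 93%. The study's reported 96.7% is the analogous quantity derived from its own data.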


2018 ◽  
Vol 5 (6) ◽  
Author(s):  
Jonathan Colasanti ◽  
Jeri Sumitani ◽  
C Christina Mehta ◽  
Yiran Zhang ◽  
Minh Ly Nguyen ◽  
...  

Background: Rapid entry programs (REPs) improve time to antiretroviral therapy (ART) initiation (TAI) and time to viral suppression (TVS). We assessed the feasibility and effectiveness of a REP in a large HIV clinic in Atlanta, Georgia, serving a predominately un- or underinsured population. Methods: The Rapid Entry and ART in Clinic for HIV (REACH) program was implemented on May 16, 2016. We performed a retrospective cohort study with the main independent variable being period of enrollment: January 1, 2016, through May 15, 2016 (pre-REACH); May 16, 2016, through July 31, 2016 (post-REACH). Included individuals were HIV-infected and new to the clinic with detectable HIV-1 RNA. Six-month follow-up data were collected for each participant. Survival analyses were conducted for TVS. Logistic and linear regression analyses were used to evaluate secondary outcomes: attendance at first clinic visit, viral suppression, TAI, and time to first attended provider visit. Results: There were 117 pre-REACH and 90 post-REACH individuals. Median age (interquartile range [IQR]) was 35 (25–45) years, 80% were male, 91% black, 60% men who have sex with men, 57% uninsured, and 44% active substance users. TVS decreased from 77 (62–96) to 57 (41–70) days (P < .0022). Time to first attended provider visit decreased from 17 to 5 days, and TAI from 21 to 7 days (P < .0001), each remaining significant in adjusted models. Conclusions: This is the largest rapid entry cohort described in the United States and suggests that rapid entry is feasible and could have a positive impact on HIV transmission at the population level.


Author(s):  
Kerina H Jones ◽  
Arron S Lacey ◽  
Brian L Perkins ◽  
Mark I Rees

Objectives: Data safe havens can bring together and combine a rich array of anonymised person-based data for research and policy evaluation within a secure setting. To date, the majority of available datasets have been structured micro-data derived from routine health-related records. Possibilities are opening up for the greater reuse of genomic data such as Genome Wide Association studies (GWAS) and Whole Exome/Genome Sequencing (WES or WGS). However, there are considerable challenges to be addressed if the benefits of using these data in combination with health-related data are to be realized safely. Approach: We explore the benefits and challenges of using genomic datasets with health-related data and, using the Secure Anonymised Information Linkage (SAIL) system as a case study, the implications and way forward for Data Safe Havens in seeking to incorporate genomic data for use with health-related data. Results: The benefits of using GWAS, WES and WGS data in conjunction with health-related data include the potential to explore genetics at a population level and open up novel research areas. These include the ability to increasingly stratify and personalize how medical indications are detected and treated through precision medicine by understanding rare conditions and adding socioeconomic and environmental context to genomic data. Among the challenges are: data availability, computing capacity, technical solutions, legal and regulatory frameworks, public perceptions, individual privacy and organizational risk. Many of the challenges within these areas are common to person-based data in general, and often Data Safe Havens have been designed to address these. But there are also aspects of these challenges, and other challenges, specific to genomic data. These include issues due to the unknown clinical significance of genomic information now or in the future, with corresponding risks for privacy and impact on individuals.
Conclusion: Genomic data sets contain vast amounts of valuable information, some of which is currently undefined, but which may have direct bearing on individual health at some point. The use of these data in combination with health-related data has the potential to bring great benefits, better clinical trial stratification, epidemiology project design and clinical improvements. It is, therefore, essential that such data are surrounded by a properly designed, robust governance framework including technical and procedural access controls that enable the data to be used safely.


2019 ◽  
Author(s):  
Mia Partlow ◽  
Karen Ciccone ◽  
Margaret Peak

Presentation given at TRLN Annual Meeting, Durham, North Carolina, July 1, 2019. The Hunt Library Dataspace was launched in August 2018 to provide students with access to the tools and support they need to develop critical data skills and perform data-intensive tasks. It is outfitted with specialized computing hardware and software and staffed by graduate student Data Science Consultants who provide drop-in support for programming, data analysis, statistical analysis, visualization, and other data-related topics. Prior to launching the Dataspace, the Libraries’ Director of Planning and Research worked with the Data & Visualization Services department to develop a plan for assessing the new Dataspace services. The process began with identifying relevant goals based on NC State University and the NC State University Libraries’ strategic priorities. Next we identified measures that would assess our success in relation to those goals. This talk describes the assessment planning process, the measures and methods employed, outcomes, and how this information will be used to improve our services and inform new service development.

