Privacy-preserving generative deep neural networks support clinical data sharing

2017 ◽  
Author(s):  
Brett K. Beaulieu-Jones ◽  
Zhiwei Steven Wu ◽  
Chris Williams ◽  
Ran Lee ◽  
Sanjeev P. Bhavnani ◽  
...  

Abstract: Background: Data sharing accelerates scientific progress, but sharing individual-level data while preserving patient privacy presents a barrier. Methods and Results: Using pairs of deep neural networks, we generated simulated, synthetic “participants” that closely resemble participants of the SPRINT trial. We showed that such paired networks can be trained with differential privacy, a formal privacy framework that limits the likelihood that queries of the synthetic participants’ data could identify a real participant in the trial. Machine-learning predictors built on the synthetic population generalize to the original dataset. This finding suggests that the synthetic data can be shared with others, enabling them to perform hypothesis-generating analyses as though they had the original trial data. Conclusions: Deep neural networks that generate synthetic participants facilitate secondary analyses and reproducible investigation of clinical datasets by enhancing data sharing while preserving participant privacy.
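As an illustration of the paired-network idea described in this abstract, the sketch below (not the authors' code; the tabular data, architecture, and hyperparameters are assumptions) trains a generator/discriminator pair in which the discriminator update clips gradients and adds Gaussian noise, the core mechanism of differentially private SGD. A full implementation would use per-sample clipping and a privacy accountant to track the (ε, δ) budget.

```python
# Minimal sketch: GAN-style synthetic-participant generator with a noised,
# clipped discriminator update (simplified, batch-level DP-SGD).
import torch
import torch.nn as nn

n_features = 20                       # assumed width of a tabular participant record
noise_dim = 16
clip_norm, noise_mult = 1.0, 1.1      # assumed DP-SGD hyperparameters

G = nn.Sequential(nn.Linear(noise_dim, 64), nn.ReLU(), nn.Linear(64, n_features))
D = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(), nn.Linear(64, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

real = torch.randn(128, n_features)   # stand-in for real trial records

for step in range(200):
    # discriminator step: the only component that touches real data,
    # so its gradients are clipped and noised
    z = torch.randn(real.size(0), noise_dim)
    fake = G(z).detach()
    d_loss = bce(D(real), torch.ones(real.size(0), 1)) + \
             bce(D(fake), torch.zeros(fake.size(0), 1))
    opt_d.zero_grad()
    d_loss.backward()
    torch.nn.utils.clip_grad_norm_(D.parameters(), clip_norm)   # bound sensitivity
    for p in D.parameters():                                    # add Gaussian noise
        p.grad += noise_mult * clip_norm * torch.randn_like(p.grad) / real.size(0)
    opt_d.step()

    # generator step: sees only the privatized discriminator, never the data
    z = torch.randn(real.size(0), noise_dim)
    g_loss = bce(D(G(z)), torch.ones(real.size(0), 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```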


2021 ◽  
Vol 5 (3) ◽  
pp. 1-10
Author(s):  
Melih Öz ◽  
Taner Danışman ◽  
Melih Günay ◽  
Esra Zekiye Şanal ◽  
Özgür Duman ◽  
...  

The human eye contains valuable information about an individual’s identity and health. Therefore, segmenting the eye into distinct regions is an essential step towards gathering this information precisely. The main challenges in segmenting the human eye include low light conditions, reflections on the eye, variations in the eyelid, and head positions that make an eye image hard to segment. For these reasons, deep neural networks are preferred, given their success in segmentation problems. However, deep neural networks need a large amount of manually annotated data to be trained. Manual annotation is a labor-intensive task, and to tackle this problem we applied data augmentation methods to synthetic data. In this paper, we explore whether, with limited data, segmentation performance can be enhanced by adding similar-context data processed with image augmentation methods. Our training and test sets consist of 3D synthetic eye images generated with the UnityEyes application and manually annotated real-life eye images, respectively. We examined the effect of using synthetic eye images with the Deeplabv3+ network under different conditions, applying image augmentation methods to the synthetic data. In our experiments, the network trained with processed synthetic images alongside real-life images produced better mIoU results than the network trained only with the real-life images in the Base dataset. We also observed an mIoU increase on the test set we created from MICHE II competition images.
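The augmentation step described here might look roughly like the following sketch (the transforms, class count, and image size are assumptions, not the authors' configuration): photometric augmentation applied to synthetic UnityEyes-style renders before they are fed to a Deeplabv3+ segmenter alongside real-life images.

```python
# Minimal sketch: augment a synthetic eye render and run it through Deeplabv3+.
import torch
from torchvision import transforms
from torchvision.models.segmentation import deeplabv3_resnet50

# photometric augmentation intended to push synthetic renders toward
# real-life appearance (brightness, blur, sharpness variation)
augment = transforms.Compose([
    transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.3),
    transforms.GaussianBlur(kernel_size=5, sigma=(0.1, 2.0)),
    transforms.RandomAdjustSharpness(sharpness_factor=2.0),
])

# assumed 4 classes, e.g. background / sclera / iris / pupil
model = deeplabv3_resnet50(num_classes=4).eval()

synthetic = torch.rand(1, 3, 240, 320)     # stand-in for a UnityEyes render
x = augment(synthetic)
with torch.no_grad():
    out = model(x)["out"]                  # 1 x num_classes x H x W logits
pred = out.argmax(dim=1)                   # per-pixel class map
```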



2021 ◽  
Vol 40 (10) ◽  
pp. 751-758
Author(s):  
Fabien Allo ◽  
Jean-Philippe Coulon ◽  
Jean-Luc Formento ◽  
Romain Reboul ◽  
Laure Capar ◽  
...  

Deep neural networks (DNNs) have the potential to streamline the integration of seismic data for reservoir characterization by providing estimates of rock properties that are directly interpretable by geologists and reservoir engineers, rather than the elastic attributes produced by most standard seismic inversion methods. However, they have yet to be applied widely in the energy industry because training DNNs requires a large amount of labeled data that is rarely available. Training set augmentation, routinely used in other scientific fields such as image recognition, can address this issue and open the door to DNNs for geophysical applications. Although this approach has been explored in the past, creating realistic synthetic well and seismic data representative of the variable geology of a reservoir remains challenging. Recently introduced theory-guided techniques can help achieve this goal. A key step in these hybrid techniques is the use of theoretical rock-physics models to derive elastic pseudologs from variations of existing petrophysical logs. Rock-physics theories are already commonly relied on to generalize and extrapolate the relationship between rock and elastic properties. Therefore, they are a useful tool to generate a large catalog of alternative pseudologs representing realistic geologic variations away from the existing well locations. While not directly driven by rock physics, neural networks trained on such synthetic catalogs extract the intrinsic rock-physics relationships and are therefore capable of directly estimating rock properties from seismic amplitudes. Neural networks trained on purely synthetic data are applied to a set of 2D poststack seismic lines to characterize a geothermal reservoir located in the Dogger Formation northeast of Paris, France. The goal of the study is to determine the extent of porous and permeable layers encountered at existing geothermal wells and ultimately guide the location and design of future geothermal wells in the area.
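The pseudolog step can be illustrated with a small sketch using textbook rock-physics relations (Wyllie's time-average equation for P-wave velocity and Gardner's relation for density). The fluid and matrix velocities, perturbation scale, and catalog size below are assumptions for illustration, not the paper's actual workflow.

```python
# Minimal sketch: build a catalog of elastic pseudologs by perturbing a porosity
# log and mapping it to (Vp, density) with standard rock-physics relations.
import numpy as np

rng = np.random.default_rng(0)
phi_well = rng.uniform(0.05, 0.25, size=200)      # stand-in porosity log (fraction)

v_fluid, v_matrix = 1500.0, 6400.0                # m/s, assumed brine and carbonate matrix

def elastic_pseudolog(phi):
    """Map a porosity log to (Vp, density) with textbook relations."""
    vp = 1.0 / (phi / v_fluid + (1.0 - phi) / v_matrix)   # Wyllie time-average
    rho = 0.31 * vp ** 0.25                               # Gardner (Vp in m/s, rho in g/cc)
    return vp, rho

catalog = []
for _ in range(500):                              # 500 alternative pseudologs
    # geologic variation away from the well: perturb porosity within plausible bounds
    perturbed = np.clip(phi_well + rng.normal(0, 0.03, phi_well.shape), 0.01, 0.35)
    catalog.append(elastic_pseudolog(perturbed))
```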



Author(s):  
Sebastian Ruder ◽  
Joachim Bingel ◽  
Isabelle Augenstein ◽  
Anders Søgaard

Multi-task learning (MTL) allows deep neural networks to learn from related tasks by sharing parameters with other networks. In practice, however, MTL involves searching an enormous space of possible parameter sharing architectures to find (a) the layers or subspaces that benefit from sharing, (b) the appropriate amount of sharing, and (c) the appropriate relative weights of the different task losses. Recent work has addressed each of the above problems in isolation. In this work we present an approach that learns a latent multi-task architecture that jointly addresses (a)–(c). We present experiments on synthetic data and data from OntoNotes 5.0, including four different tasks and seven different domains. Our extension consistently outperforms previous approaches to learning latent architectures for multi-task problems and achieves up to 15% average error reductions over common approaches to MTL.
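A toy version of the idea, not the authors' architecture, is sketched below: two task-specific columns whose hidden states are mixed through learnable sharing coefficients, with learnable relative task-loss weights, so that (a)-(c) become parameters optimized jointly rather than hand-tuned.

```python
# Minimal sketch: latent parameter sharing between two tasks, with learnable
# mixing coefficients and learnable relative loss weights.
import torch
import torch.nn as nn

class TwoTaskLatentSharing(nn.Module):
    def __init__(self, d_in, d_hidden, d_out_a, d_out_b):
        super().__init__()
        self.enc_a = nn.Linear(d_in, d_hidden)
        self.enc_b = nn.Linear(d_in, d_hidden)
        # learnable 2x2 mixing matrix: how much each task reads from the other
        self.alpha = nn.Parameter(torch.eye(2))
        self.head_a = nn.Linear(d_hidden, d_out_a)
        self.head_b = nn.Linear(d_hidden, d_out_b)
        # learnable relative loss weights (log-space keeps them positive)
        self.log_w = nn.Parameter(torch.zeros(2))

    def forward(self, x):
        h_a = torch.relu(self.enc_a(x))
        h_b = torch.relu(self.enc_b(x))
        mix_a = self.alpha[0, 0] * h_a + self.alpha[0, 1] * h_b
        mix_b = self.alpha[1, 0] * h_a + self.alpha[1, 1] * h_b
        return self.head_a(mix_a), self.head_b(mix_b)

    def weighted_loss(self, loss_a, loss_b):
        # in practice these weights need regularisation (e.g. an uncertainty
        # term) so the optimiser cannot trivially drive them to zero
        w = torch.exp(self.log_w)
        return w[0] * loss_a + w[1] * loss_b
```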



2020 ◽  
Vol 48 (5) ◽  
pp. 876-890
Author(s):  
Sergiu Gherghina ◽  
Aurelian Plopeanu

Abstract: The research focusing on return migration from the perspective of migrants’ relationship with the country of origin has emphasized emotional and economic ties. Quite often, these ties have been examined separately, and there is little indication of which counts more. This article addresses this gap in the literature and analyzes the extent to which the sense of belonging, media consumption, networks of friends, and regular visits to the country of origin affect the intention to return. It controls for remittances, voting in home-country elections, and age. The empirical analysis uses an original dataset including individual-level data, collected through an online survey in January 2018 on a sample of 1,839 first-generation migrants from Romania.



2021 ◽  
Vol 9 ◽  
Author(s):  
Arianna Maever L. Amit ◽  
Veincent Christian F. Pepito ◽  
Bernardo Gutierrez ◽  
Thomas Rawson

Background: When a new pathogen emerges, consistent case reporting is critical for public health surveillance. Tracking cases geographically and over time is key for understanding the spread of an infectious disease and effectively designing interventions to contain and mitigate an epidemic. In this paper we describe the reporting systems on COVID-19 in Southeast Asia during the first wave in 2020, and highlight the impact of specific reporting methods. Methods: We reviewed key epidemiological variables from various sources including a regionally comprehensive dataset, national trackers, dashboards, and case bulletins for 11 countries during the first wave of the epidemic in Southeast Asia. We recorded timelines of shifts in epidemiological reporting systems and described the differences in how epidemiological data are reported across countries and timepoints. Results: Our findings suggest that countries in Southeast Asia generally reported precise and detailed epidemiological data during the first wave of the pandemic. Changes in reporting rarely occurred for demographic data, while reporting shifts for geographic and temporal data were frequent. Most countries provided COVID-19 individual-level data daily using HTML and PDF, necessitating scraping and extraction before data could be used in analyses. Conclusion: Our study highlights the importance of more nuanced analyses of COVID-19 epidemiological data within and across countries because of the frequent shifts in reporting. As governments continue to respond to impacts on health and the economy, data sharing also needs to be prioritised given its foundational role in policymaking, and in the implementation and evaluation of interventions.
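The extraction step the review refers to, turning daily HTML bulletins into analysable tables, could look like the following sketch (the URL and column names are hypothetical; PDF bulletins would need a separate extractor such as a PDF-table parser).

```python
# Minimal sketch: scrape a daily HTML case bulletin into a tidy table.
import pandas as pd

URL = "https://example.gov/covid19/daily-cases.html"   # hypothetical bulletin page

tables = pd.read_html(URL)           # parse all <table> elements on the page
cases = tables[0]

# normalise assumed column names and parse report dates for time-series analysis
cases = cases.rename(columns={"Date of report": "report_date", "New cases": "new_cases"})
cases["report_date"] = pd.to_datetime(cases["report_date"], dayfirst=True)
print(cases.groupby("report_date")["new_cases"].sum().tail())
```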



Author(s):  
Jessica A. F. Thompson

Much of the controversy evoked by the use of deep neural networks as models of biological neural systems amounts to debates over what constitutes scientific progress in neuroscience. In order to discuss what constitutes scientific progress, one must have a goal in mind (progress towards what?). One such long-term goal is to produce scientific explanations of intelligent capacities (e.g., object recognition, relational reasoning). I argue that the most pressing philosophical questions at the intersection of neuroscience and artificial intelligence are ultimately concerned with defining the phenomena to be explained and with what constitute valid explanations of such phenomena. I propose that a foundation in the philosophy of scientific explanation and understanding can scaffold future discussions about how an integrated science of intelligence might progress. Towards this vision, I review relevant theories of scientific explanation and discuss strategies for unifying the scientific goals of neuroscience and AI.



2016 ◽  
Vol 118 (12) ◽  
pp. 1-29
Author(s):  
Kevin C. Bastian ◽  
C. Kevin Fortner ◽  
Alisa Chapman ◽  
M. Jayne Fleener ◽  
Ellen Mcintyre ◽  
...  

Background/Context: Teacher preparation programs (TPPs) face increasing pressure from the federal government, states, and accreditation agencies to improve the quality of their practices and graduates, yet they often do not possess enough data to make evidence-based reforms. Purpose/Objective: This manuscript has four objectives: (a) to present the strengths and shortcomings of accountability-based TPP evaluation systems; (b) to detail the individual-level data being shared with TPPs at public universities in North Carolina; (c) to describe how data sharing can lead to TPP improvement and the challenges that programs will need to overcome; and (d) to detail how three TPPs are using the data for program improvement. Setting: North Carolina public schools and schools of education at public universities in North Carolina. Importantly, this individual-level data sharing system can be instituted among TPPs in other states. Population/Participants/Subjects: Teachers initially prepared by public universities in North Carolina. Research Design: With individual-level data on program graduates, TPPs can conduct a range of analyses—e.g., regression analyses with program data, primary data collection with interviews, and rubric-based observations—designed to aid program improvement efforts. Conclusions/Recommendations: Teacher preparation programs and researchers or state education agencies need to establish partnerships to share individual-level data on program graduates with TPPs. This individual-level data sharing would help TPPs to develop systems of continuous improvement by examining whether their preparation practices align with the types of environments in which their graduates teach and how graduates’ preparation experiences predict their characteristics and performance as Teachers of Record. Unlike other initiatives targeted at TPP improvement, individual-level data sharing, and its focus on within-program variability, can benefit TPPs at all levels of performance.



2020 ◽  
Author(s):  
Arianna Maever L. Amit ◽  
Veincent Christian F. Pepito ◽  
Bernardo Gutierrez ◽  
Thomas Rawson

Summary: Background: When a new pathogen emerges, consistent case reporting is critical for public health surveillance. Tracking cases geographically and over time is key for understanding the spread of an infectious disease and how to effectively design interventions to contain and mitigate an epidemic. In this paper we describe the reporting systems on COVID-19 in Southeast Asia during the first wave in 2020, and highlight the impact of specific reporting methods. Methods: We reviewed key epidemiological variables from various sources including a regionally comprehensive dataset, national trackers, dashboards, and case bulletins for 11 countries during the first wave of the epidemic in Southeast Asia. We recorded timelines of shifts in epidemiological reporting systems. We further described the differences in how epidemiological data are reported across countries and timepoints, and the accessibility of epidemiological data. Findings: Our findings suggest that countries in Southeast Asia generally reported precise and detailed epidemiological data during the first wave of the COVID-19 pandemic. However, changes in reporting were frequent and varied across data and countries. Changes in reporting rarely occurred for demographic data such as age and sex, while reporting shifts for geographic and temporal data were frequent. We also found that most countries provided COVID-19 individual-level data daily using HTML and PDF, necessitating scraping and extraction before data could be used in analyses. Interpretation: Countries have different reporting systems and different capacities for maintaining consistent reporting of epidemiological data. As the pandemic progresses, governments may also change their priorities in data sharing. Our study thus highlights the importance of more nuanced analyses of epidemiological data of COVID-19 within and across countries because of the frequent shifts in reporting. Further, most countries provide data on a daily basis but not always in a readily usable format. As governments continue to respond to the impacts of COVID-19 on health and the economy, data sharing also needs to be prioritised given its foundational role in policymaking, and the implementation and evaluation of interventions. Funding: The work was supported through an Engineering and Physical Sciences Research Council (EPSRC) (https://epsrc.ukri.org/) Systems Biology studentship award (EP/G03706X/1) to TR. This project was also supported in part by the Oxford Martin School. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.



2021 ◽  
Author(s):  
Ali Hatamizadeh ◽  
Hongxu Yin ◽  
Pavlo Molchanov ◽  
Andriy Myronenko ◽  
Wenqi Li ◽  
...  

Abstract Federated learning (FL) allows the collaborative training of AI models without needing to share raw data. This capability makes it especially interesting for healthcare applications where patient and data privacy is of utmost concern. However, recent works on the inversion of deep neural networks from model gradients raised concerns about the security of FL in preventing the leakage of training data. In this work, we show that these attacks presented in the literature are impractical in real FL use-cases and provide a new baseline attack that works for more realistic scenarios where the clients’ training involves updating the Batch Normalization (BN) statistics. Furthermore, we present new ways to measure and visualize potential data leakage in FL. Our work is a step towards establishing reproducible methods of measuring data leakage in FL and could help determine the optimal tradeoffs between privacy-preserving techniques, such as differential privacy, and model accuracy based on quantifiable metrics.
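For context, a minimal federated-averaging loop (not the paper's attack or defence code; the model, data, and round counts are placeholders) shows what is actually shared in FL: model updates rather than raw data, which is precisely the signal that gradient-inversion attacks target.

```python
# Minimal sketch: federated averaging, where clients share weight updates only.
import copy
import torch
import torch.nn as nn

def local_update(global_model, data, targets, lr=0.01, epochs=1):
    """One client's local training; only the resulting weights leave the client."""
    model = copy.deepcopy(global_model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(data), targets).backward()
        opt.step()
    return model.state_dict()

def federated_average(states):
    """Server-side aggregation: element-wise mean of client weights."""
    avg = copy.deepcopy(states[0])
    for key in avg:
        avg[key] = torch.stack([s[key].float() for s in states]).mean(dim=0)
    return avg

global_model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
clients = [(torch.randn(16, 32), torch.randint(0, 10, (16,))) for _ in range(3)]

for rnd in range(5):                                  # communication rounds
    states = [local_update(global_model, x, y) for x, y in clients]
    global_model.load_state_dict(federated_average(states))
```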


