Verification of statistical oncological endpoints on encrypted data: Confirming the feasibility of real-world data sharing without the need to reveal protected patient information.

2021 ◽  
Vol 39 (15_suppl) ◽  
pp. e18725-e18725
Author(s):  
Ravit Geva ◽  
Barliz Waissengrin ◽  
Dan Mirelman ◽  
Felix Bokstein ◽  
Deborah T. Blumenthal ◽  
...  

e18725 Background: Healthcare data sharing is important for creating diverse and large data sets, supporting clinical decision making, and accelerating efficient research to improve patient outcomes. This is especially vital for real-world data analysis. However, stakeholders are reluctant to share their data without assurance of patients’ privacy, proper protection of their data sets, and control over how the data are used. Homomorphic encryption is a cryptographic capability that can address these issues by enabling computation on encrypted data without ever decrypting it, so analytics results are obtained without revealing the raw data. The aim of this study is to demonstrate the accuracy of analytics results and the practical efficiency of the technology. Methods: A real-world data set of colorectal cancer patients’ survival data following two different treatment interventions, including 623 patients and 24 variables and amounting to 14,952 items of data, was encrypted using leveled homomorphic encryption implemented in the PALISADE software library. Statistical analysis of key oncological endpoints was blindly performed on both the raw data and the homomorphically encrypted data using descriptive statistics and survival analysis with Kaplan-Meier curves. Results were then compared with an accuracy goal of two decimal places. Results: For all variables analyzed, the difference between the raw-data results and the homomorphically encrypted results was within the pre-determined accuracy goal; these differences, together with the practical efficiency of the encrypted computation as measured by run time, are presented in the table. Conclusions: This study demonstrates that data encrypted with homomorphic encryption can be statistically analyzed with a precision of at least two decimal places, allowing safe clinical conclusions to be drawn while preserving patients’ privacy and protecting data owners’ data assets. Homomorphic encryption allows efficient computation on encrypted data non-interactively and without requiring decryption during computation. Utilizing the technology will empower large-scale cross-institution and cross-stakeholder collaboration, allowing safe international collaborations. Clinical trial information: 0048-19-TLV. [Table: see text]
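For intuition, here is a minimal, insecure sketch of the core property the abstract relies on: computing on ciphertexts without decrypting them. It uses the Paillier cryptosystem (an additively homomorphic scheme), not the leveled scheme from the PALISADE library used in the study, and the survival values and key sizes are toy placeholders.

```python
# Toy demonstration of computing on encrypted data without decrypting it.
# NOTE: Paillier (additively homomorphic), not the leveled scheme used in the
# study; the primes below are insecure and for illustration only.
from math import gcd
import random

def lcm(a, b):
    return a * b // gcd(a, b)

# -- key generation (tiny, insecure primes for the demo) --
p, q = 1789, 1861                 # real deployments use >= 1024-bit primes
n = p * q
n2 = n * n
g = n + 1                         # standard choice of generator
lam = lcm(p - 1, q - 1)
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)   # inverse of L(g^lam mod n^2)

def encrypt(m):
    r = random.randrange(1, n)
    while gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return ((pow(c, lam, n2) - 1) // n * mu) % n

# -- homomorphic sum: multiply ciphertexts, never decrypt the inputs --
survival_days = [312, 457, 198, 603]          # hypothetical per-patient values
ciphertexts = [encrypt(m) for m in survival_days]

enc_sum = 1
for c in ciphertexts:
    enc_sum = (enc_sum * c) % n2              # E(m1) * E(m2) = E(m1 + m2)

assert decrypt(enc_sum) == sum(survival_days)
print("decrypted sum:", decrypt(enc_sum))     # 1570, computed on ciphertexts
```

A data owner can thus hand out only ciphertexts, let an analyst aggregate them, and decrypt just the final statistic, which is the privacy model the study evaluates at scale.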

2020 ◽  
Vol 267 (S1) ◽  
pp. 185-196
Author(s):  
J. Gerb ◽  
S. A. Ahmadi ◽  
E. Kierig ◽  
B. Ertl-Wagner ◽  
M. Dieterich ◽  
...  

Abstract Background Objective and volumetric quantification is a necessary step in the assessment and comparison of endolymphatic hydrops (ELH) results. Here, we introduce a novel tool for automatic volumetric segmentation of the endolymphatic space (ELS) for ELH detection in delayed intravenous gadolinium-enhanced magnetic resonance imaging of the inner ear (iMRI). Methods The core component is a novel algorithm based on Volumetric Local Thresholding (VOLT). The study included three different data sets: a real-world data set (D1) to develop the novel ELH detection algorithm and two validating data sets, one artificial (D2) and one entirely unseen prospective real-world data set (D3). D1 included 210 inner ears of 105 patients (50 male; mean age 50.4 ± 17.1 years), and D3 included 20 inner ears of 10 patients (5 male; mean age 46.8 ± 14.4 years) with episodic vertigo attacks of different etiology. D1 and D3 did not differ significantly concerning age, gender, the grade of ELH, or data quality. As an artificial data set, D2 provided a known ground truth and consisted of an 8-bit cuboid volume using the same voxel size and grid as the real-world data, with differently sized cylindrical and cuboid-shaped cutouts (signal) whose grayscale values matched those of the real-world data set D1 (mean 68.7 ± 7.8; range 48.9–92.8). The evaluation assessed segmentation accuracy using the Sørensen-Dice overlap coefficient and segmentation precision by comparing the volume of the ELS. Results VOLT resulted in a high level of performance and accuracy in comparison with the respective gold standard. In the case of the artificial data set, VOLT outperformed the gold standard at higher noise levels. Data processing steps are fully automated and run without further user input in less than 60 s. ELS volume measured by automatic segmentation correlated significantly with the clinical grading of the ELS (p < 0.01). Conclusion VOLT enables an open-source, reproducible, reliable, and automatic volumetric quantification of the inner ears’ fluid space using MR volumetric assessment of endolymphatic hydrops. This tool constitutes an important step towards comparable and systematic big data analyses of the ELS in patients with the frequent syndrome of episodic vertigo attacks. A generic version of our three-dimensional thresholding algorithm has been made available to the scientific community via GitHub as an ImageJ plugin.
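The published plugin is the ImageJ implementation on GitHub; as a rough illustration of the general idea of volumetric local thresholding (not the published VOLT algorithm), the NumPy/SciPy sketch below binarizes each voxel against its local neighborhood mean and measures the largest connected component. The window size, offset, and synthetic volume are assumptions for the demo.

```python
# Minimal sketch of 3D local (adaptive) thresholding; NOT the published VOLT
# algorithm, just the underlying idea of thresholding each voxel against its
# own neighborhood rather than a single global cutoff.
import numpy as np
from scipy.ndimage import uniform_filter, label

def local_threshold_3d(volume, window=15, offset=15.0):
    """Voxel is foreground if it exceeds the local mean of a
    (window x window x window) neighborhood by `offset`."""
    local_mean = uniform_filter(volume.astype(np.float32), size=window)
    return volume > (local_mean + offset)

def largest_component_volume(mask, voxel_volume_mm3=1.0):
    """Volume of the largest connected component, a crude stand-in for a
    segmented fluid space."""
    labeled, n = label(mask)
    if n == 0:
        return 0.0
    sizes = np.bincount(labeled.ravel())[1:]   # skip background label 0
    return sizes.max() * voxel_volume_mm3

# Synthetic demo volume: a small bright cube in noisy background, loosely
# mimicking the artificial data set D2 described above. Note that local
# thresholding highlights structures smaller than the smoothing window.
rng = np.random.default_rng(0)
vol = rng.normal(60, 8, size=(64, 64, 64)).astype(np.float32)
vol[30:35, 30:35, 30:35] += 30                 # embedded "signal" cube
mask = local_threshold_3d(vol, window=15, offset=15.0)
print("largest component:", largest_component_volume(mask), "voxels")  # ~125
```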


2012 ◽  
Vol 22 (04) ◽  
pp. 305-325 ◽  
Author(s):  
MRIDUL AANJANEYA ◽  
FREDERIC CHAZAL ◽  
DANIEL CHEN ◽  
MARC GLISSE ◽  
LEONIDAS GUIBAS ◽  
...  

Many real-world data sets can be viewed as noisy samples of special types of metric spaces called metric graphs. Building on the notions of correspondence and Gromov-Hausdorff distance in metric geometry, we describe a model for such data sets as an approximation of an underlying metric graph. We present a novel algorithm that takes such a data set as input and outputs a metric graph that is homeomorphic to the underlying metric graph and has bounded distortion of distances. We also implement the algorithm and evaluate its performance on a variety of real-world data sets.
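One ingredient of such reconstruction is local classification of sample points: a point in the interior of an edge sees two local directions, while a branch point sees three or more. The sketch below illustrates this by counting connected components of the sample in an annulus shell around a point; the radii, linking threshold, and "Y"-shaped demo are ad-hoc choices for illustration, not the paper's calibrated parameters.

```python
# Rough sketch of branch-point detection for metric-graph reconstruction:
# count connected components of neighbors in an annulus shell around a point.
# 2 components -> interior of an edge; 3+ -> near a branch point.
import numpy as np
from scipy.spatial import cKDTree

def annulus_components(points, center_idx, r, eps):
    """Components of the points whose distance from points[center_idx] lies
    in (r, 2r), single-linking points closer than eps (union-find)."""
    d = np.linalg.norm(points - points[center_idx], axis=1)
    shell = points[(d > r) & (d < 2 * r)]
    if len(shell) == 0:
        return 0
    parent = list(range(len(shell)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for i, j in cKDTree(shell).query_pairs(eps):
        parent[find(i)] = find(j)
    return len({find(i) for i in range(len(shell))})

# Demo: noisy samples of a "Y"-shaped metric graph embedded in the plane.
rng = np.random.default_rng(1)
t = rng.uniform(0, 1, 300)[:, None]
arms = [np.array([0, 1.0]), np.array([0.9, -0.5]), np.array([-0.9, -0.5])]
samples = np.vstack([t * a for a in arms]) + rng.normal(0, 0.01, (900, 2))

junction = np.argmin(np.linalg.norm(samples, axis=1))          # near (0, 0)
mid = np.argmin(np.linalg.norm(samples - np.array([0, 0.5]), axis=1))
print("near junction:", annulus_components(samples, junction, 0.15, 0.05))  # expect 3
print("mid-arm:", annulus_components(samples, mid, 0.15, 0.05))             # expect 2
```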


Author(s):  
I. Weber ◽  
J. Bongartz ◽  
R. Roscher

Abstract. Detecting objects in aerial images is an important task in different environmental and infrastructure-related applications. Deep learning object detectors like RetinaNet offer decent detection performance; however, they require a large amount of annotated training data. It is well known that the collection of annotated data is a time-consuming and tedious task, which often cannot be performed sufficiently well for remote sensing tasks, since the required data must cover a wide variety of scenes and objects. In this paper, we analyze the performance of such a network given a limited amount of training data and address the research question of whether artificially generated training data can be used to overcome the challenge of real-world data sets with a small amount of training data. For our experiments, we use the ISPRS 2D Semantic Labeling Contest Potsdam data set for vehicle detection, from which we derive object bounding boxes of vehicles suitable for our task. We generate artificial data based on vehicle blueprints and show that networks trained only on generated data may have lower performance but are still able to detect most of the vehicles found in the real data set. Moreover, we show that adding generated data to real-world data sets with a limited amount of training data increases performance significantly and, in some cases, almost reaches baseline performance levels.
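A minimal sketch of the compositing idea: paste rotated template patches onto background tiles and record bounding boxes as ground truth. The images here are synthetic stand-ins; a real pipeline would use blueprint-derived vehicle patches and aerial background imagery as the paper describes, and sizes and counts below are arbitrary demo values.

```python
# Minimal sketch of generating artificial object-detection training data by
# compositing templates onto background tiles; inputs are placeholders, not
# the paper's exact pipeline.
import random
from PIL import Image

def synthesize_tile(background, templates, n_objects=5):
    """Paste randomly rotated/positioned RGBA templates onto a background and
    return the composite plus axis-aligned boxes (x0, y0, x1, y1). Boxes are
    slightly loose around rotated patches (they bound the expanded canvas)."""
    tile = background.copy()
    boxes = []
    for _ in range(n_objects):
        tmpl = random.choice(templates).rotate(
            random.uniform(0, 360), expand=True)      # transparent corners
        x = random.randint(0, tile.width - tmpl.width)
        y = random.randint(0, tile.height - tmpl.height)
        tile.paste(tmpl, (x, y), mask=tmpl)           # alpha-composite
        boxes.append((x, y, x + tmpl.width, y + tmpl.height))
    return tile, boxes

# Hypothetical inputs: a flat "grass" background and one vehicle-like patch.
background = Image.new("RGB", (512, 512), (88, 112, 74))
car = Image.new("RGBA", (40, 20), (180, 20, 20, 255))
tile, boxes = synthesize_tile(background, [car], n_objects=5)
print(boxes)   # e.g. feed these as ground truth to a RetinaNet-style detector
```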


2021 ◽  
Vol 17 (3) ◽  
pp. 68-80
Author(s):  
Nitesh Sukhwani ◽  
Venkateswara Rao Kagita ◽  
Vikas Kumar ◽  
Sanjaya Kumar Panda

Skyline recommendation with uncertain preferences has drawn AI researchers' attention in recent years due to its wide range of applications. The naive approach to skyline recommendation computes the skyline probability of all objects and ranks them accordingly. However, in many applications the interest is in determining the top-k objects rather than a full ranking. The most efficient algorithm for determining an object's skyline probability employs the concepts of zero-contributing sets and prefix-based k-level absorption. The authors show that the performance of these methods depends heavily on the arrangement of objects in the database. In this paper, the authors propose a method for determining the top-k skyline objects without computing the skyline probability of all objects. They also propose and analyze different methods of ordering the objects in the database. Finally, they empirically show the efficacy of the proposed approaches on several synthetic and real-world data sets.
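For intuition only, a naive Monte Carlo baseline can estimate skyline probabilities by sampling preference directions and counting how often each object survives dominance; the paper's contribution is precisely avoiding this exhaustive computation via zero-contributing sets and prefix-based k-level absorption. The uncertainty model below (independent random per-attribute preference directions) is an illustrative assumption, not the paper's model.

```python
# Naive Monte Carlo baseline for skyline probability under uncertain
# preferences; for intuition only, not the paper's efficient algorithm.
import numpy as np

def dominates(a, b, signs):
    """a dominates b if at least as good on every attribute and strictly
    better on one; signs[j] = +1 means larger is better on attribute j."""
    diff = (a - b) * signs
    return bool(np.all(diff >= 0) and np.any(diff > 0))

def skyline_mask(objects, signs):
    n = len(objects)
    return np.array([not any(dominates(objects[j], objects[i], signs)
                             for j in range(n) if j != i)
                     for i in range(n)])

def top_k_skyline(objects, k, n_samples=300, seed=0):
    rng = np.random.default_rng(seed)
    counts = np.zeros(len(objects))
    for _ in range(n_samples):
        signs = rng.choice([-1, 1], size=objects.shape[1])  # sampled preference
        counts += skyline_mask(objects, signs)
    probs = counts / n_samples          # estimated skyline probabilities
    return np.argsort(-probs)[:k], probs

objects = np.random.default_rng(1).random((30, 3))   # 30 objects, 3 attributes
top, probs = top_k_skyline(objects, k=5)
print("top-5 indices:", top, "estimated probabilities:", np.round(probs[top], 3))
```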


Entropy ◽  
2021 ◽  
Vol 23 (5) ◽  
pp. 507
Author(s):  
Piotr Białczak ◽  
Wojciech Mazurczyk

Malicious software utilizes the HTTP protocol for communication purposes, creating network traffic that is hard to identify as it blends into the traffic generated by benign applications. To address this, fingerprinting tools have been developed to help track and identify such traffic by providing a short representation of malicious HTTP requests. However, existing tools either do not analyze all of the information included in the HTTP message or analyze it insufficiently. To address these issues, we propose Hfinger, a novel malware HTTP request fingerprinting tool. It extracts information from parts of the request such as the URI, protocol information, headers, and payload, providing a concise request representation that preserves the extracted information in a form interpretable by a human analyst. For the developed solution, we have performed an extensive experimental evaluation using real-world data sets and compared Hfinger with the most related and popular existing tools, such as FATT, Mercury, and p0f. The conducted effectiveness analysis reveals that on average only 1.85% of requests fingerprinted by Hfinger collide between malware families, which is 8–34 times lower than for existing tools. Moreover, unlike these tools, in default mode Hfinger does not introduce collisions between malware and benign applications, and achieves this while increasing the number of fingerprints by at most a factor of three. As a result, Hfinger can effectively track and hunt malware by providing more unique fingerprints than other standard tools.
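As a toy illustration of the fingerprinting idea, the sketch below compresses the request method, protocol version, URI shape, ordered header names, and payload length into one short, human-readable string. The field layout is a hypothetical simplification, not Hfinger's actual fingerprint format.

```python
# Toy HTTP-request fingerprint in the spirit described above; the layout is
# an illustrative assumption, much simpler than Hfinger's real format.
from urllib.parse import urlsplit

def fingerprint(method, uri, version, headers, body=b""):
    parts = urlsplit(uri)
    path_depth = parts.path.count("/")
    n_params = len(parts.query.split("&")) if parts.query else 0
    # abbreviate header names by word initials, preserving their order
    header_sig = ",".join("".join(w[0] for w in h.split("-")).lower()
                          for h, _ in headers)
    return "|".join([
        method,                       # request method
        version,                      # protocol version
        f"p{path_depth}q{n_params}",  # URI shape: path depth, query params
        header_sig,                   # ordered header-name signature
        f"b{len(body)}",              # payload length
    ])

req = fingerprint(
    "POST", "/gate.php?id=42&v=1", "HTTP/1.1",
    [("Host", "example.test"), ("User-Agent", "Mozilla/4.0"),
     ("Content-Type", "application/x-www-form-urlencoded")],
    body=b"data=...")
print(req)   # POST|HTTP/1.1|p1q2|h,ua,ct|b8
```

Requests from the same malware family tend to share such structural traits even when payloads differ, which is why a structure-preserving fingerprint can cluster them while keeping collisions with benign traffic low.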


2021 ◽  
Vol 16 (1) ◽  
Author(s):  
Simona D’Amore ◽  
Kathleen Page ◽  
Aimée Donald ◽  
Khadijeh Taiyari ◽  
Brian Tom ◽  
...  

Abstract Background The Gaucher Investigative Therapy Evaluation is a national clinical cohort of 250 patients aged 5–87 years in the United Kingdom with Gaucher disease, an ultra-rare genetic disorder. To inform clinical decision-making and improve pathophysiological understanding, we characterized the course of Gaucher disease and explored the influence of costly innovative medication and other interventions. Retrospective and prospective clinical, laboratory and radiological information, including molecular analysis of the GBA1 gene and comprising > 2500 variables, was collected systematically into a relational database, with banking of collated biological samples in a central bioresource. Data for deep phenotyping and life-quality evaluation, including skeletal, visceral, haematological and neurological manifestations, were recorded for a median of 17.3 years; the skeletal and neurological manifestations are the main focus of this study. Results At baseline, 223 of the 250 patients were classified as type 1 Gaucher disease. Skeletal manifestations occurred in most patients in the cohort (131 of 201 specifically reported bone pain). Symptomatic osteonecrosis and fragility fractures occurred in 76 and 37 of all 250 patients, respectively, and the first osseous events occurred significantly earlier in those with neuronopathic disease. Intensive phenotyping in a subgroup of 40 patients originally considered to have only systemic features revealed neurological involvement in 18: two had Parkinson disease and 16 had clinical signs compatible with neuronopathic Gaucher disease, indicating a greater than expected prevalence of neurological features. Analysis of longitudinal real-world data enabled Gaucher disease to be stratified with respect to advanced therapies and splenectomy. Splenectomy was associated with an increased hazard of fragility fractures, in addition to osteonecrosis and orthopaedic surgery; there were marked gender differences in fracture risk over time since splenectomy. Skeletal disease was a heavy burden of illness, especially where access to specific therapy was delayed and in patients requiring orthopaedic surgery. Conclusion Gaucher disease has been explored using real-world data obtained in an era of therapeutic transformation. The introduction of advanced therapies and repeated longitudinal measures enabled this heterogeneous condition to be stratified into distinct clinical endotypes. The study reveals diverse and changing phenotypic manifestations, with systemic, skeletal and neurological disease as inter-related sources of disability.
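The time-to-first-osseous-event comparisons described above rest on standard survival analysis. Below is a minimal Kaplan-Meier estimator sketched on made-up follow-up times; the groups and values are placeholders, not data from this cohort.

```python
# Minimal Kaplan-Meier estimator for time-to-event comparisons of the kind
# used above; the follow-up values below are invented placeholders.
import numpy as np

def kaplan_meier(times, events):
    """times: follow-up in years; events: 1 = event observed, 0 = censored.
    Returns the distinct event times and the survival estimate after each."""
    order = np.argsort(times)
    times, events = np.asarray(times)[order], np.asarray(events)[order]
    n_at_risk = len(times)
    surv, out_t, out_s = 1.0, [], []
    for t in np.unique(times):
        d = np.sum((times == t) & (events == 1))   # events at time t
        if d > 0:
            surv *= 1 - d / n_at_risk
            out_t.append(t); out_s.append(surv)
        n_at_risk -= np.sum(times == t)            # events + censored leave
    return np.array(out_t), np.array(out_s)

# Hypothetical years to first fragility fracture (or censoring), by group
t_splen = [2.1, 3.4, 5.0, 6.2, 8.8, 10.1];   e_splen = [1, 1, 0, 1, 1, 0]
t_nosp  = [4.5, 7.9, 9.3, 12.0, 14.2, 17.3]; e_nosp  = [0, 1, 0, 0, 1, 0]
for name, t, e in [("splenectomy", t_splen, e_splen),
                   ("no splenectomy", t_nosp, e_nosp)]:
    ts, ss = kaplan_meier(t, e)
    print(name, dict(zip(ts, np.round(ss, 3))))
```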


2021 ◽  
Author(s):  
Gregory M Miller ◽  
Austin J Ellis ◽  
Rangaprasad Sarangarajan ◽  
Amay Parikh ◽  
Leonardo O Rodrigues ◽  
...  

Objective: The COVID-19 pandemic generated a massive amount of clinical data, which potentially holds yet-undiscovered answers related to COVID-19 morbidity, mortality, long-term effects, and therapeutic solutions. The objective of this study was to generate insights on COVID-19 mortality-associated factors and identify potential new therapeutic options for COVID-19 patients by employing artificial intelligence analytics on real-world data. Materials and Methods: A Bayesian statistics-based artificial intelligence data analytics tool (bAIcis®) within the Interrogative Biology® platform was used for network learning, causal inference, and hypothesis generation to analyze 16,277 PCR-positive patients from a database of 279,281 inpatients and outpatients tested for SARS-CoV-2 infection by antigen, antibody, or PCR methods during the first pandemic year in Central Florida. This approach generated causal networks that enabled unbiased identification of significant predictors of mortality for specific COVID-19 patient populations. These findings were validated by logistic regression, least absolute shrinkage and selection operator (LASSO) regression, and bootstrapping. Results: We found that in the SARS-CoV-2 PCR-positive patient cohort, early use of the antiemetic agent ondansetron was associated with increased survival in mechanically ventilated patients. Conclusions: The results demonstrate how real-world, COVID-19-focused data analysis using artificial intelligence can generate valid insights that could support clinical decision-making and minimize the future loss of lives and resources.
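The validation step named above (logistic regression, LASSO, bootstrapping) can be sketched as follows on synthetic data; the covariates, effect sizes, and "treatment" column are invented for illustration and stand in for candidate predictors surfaced by the causal networks.

```python
# Sketch of the validation step: check a candidate mortality predictor with
# L1-penalized (LASSO-style) logistic regression plus bootstrap confidence
# intervals. Data are synthetic placeholders, not patient records.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 5))                    # e.g. age, labs, drug exposure
logit = 0.8 * X[:, 0] - 1.2 * X[:, 3]          # column 3: protective "treatment"
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

def fit_coefs(X, y):
    model = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
    return model.fit(X, y).coef_[0]

# Bootstrap the coefficient of the candidate predictor (column 3)
boot = []
for _ in range(200):
    idx = rng.integers(0, n, size=n)           # resample patients, replacement
    boot.append(fit_coefs(X[idx], y[idx])[3])
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"treatment coefficient 95% CI: [{lo:.2f}, {hi:.2f}]")  # excludes 0
```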

