VOLT: a novel open-source pipeline for automatic segmentation of endolymphatic space in inner ear MRI

2020 ◽  
Vol 267 (S1) ◽  
pp. 185-196
Author(s):  
J. Gerb ◽  
S. A. Ahmadi ◽  
E. Kierig ◽  
B. Ertl-Wagner ◽  
M. Dieterich ◽  
...  

Abstract

Background: Objective and volumetric quantification is a necessary step in the assessment and comparison of endolymphatic hydrops (ELH) results. Here, we introduce a novel tool for automatic volumetric segmentation of the endolymphatic space (ELS) for ELH detection in delayed intravenous gadolinium-enhanced magnetic resonance imaging of the inner ear (iMRI).

Methods: The core component is a novel algorithm based on Volumetric Local Thresholding (VOLT). The study included three data sets: a real-world data set (D1) used to develop the novel ELH detection algorithm and two validation data sets, one artificial (D2) and one entirely unseen prospective real-world data set (D3). D1 included 210 inner ears of 105 patients (50 male; mean age 50.4 ± 17.1 years), and D3 included 20 inner ears of 10 patients (5 male; mean age 46.8 ± 14.4 years) with episodic vertigo attacks of different etiology. D1 and D3 did not differ significantly in age, gender, grade of ELH, or data quality. The artificial data set D2 provided a known ground truth and consisted of an 8-bit cuboid volume with the same voxel size and grid as the real-world data, containing differently sized cylindrical and cuboid-shaped cutouts (signal) whose grayscale values matched those of D1 (mean 68.7 ± 7.8; range 48.9–92.8). The evaluation assessed segmentation accuracy using the Sørensen-Dice overlap coefficient and segmentation precision by comparing ELS volumes.

Results: VOLT achieved a high level of performance and accuracy in comparison with the respective gold standard. On the artificial data set, VOLT outperformed the gold standard at higher noise levels. Data processing steps are fully automated and run without further user input in less than 60 s. ELS volume measured by automatic segmentation correlated significantly with the clinical grading of the ELS (p < 0.01).

Conclusion: VOLT enables open-source, reproducible, reliable, and automatic volumetric quantification of the inner ear's fluid space using MR volumetric assessment of endolymphatic hydrops. This tool constitutes an important step towards comparable and systematic big-data analyses of the ELS in patients with the frequent syndrome of episodic vertigo attacks. A generic version of our three-dimensional thresholding algorithm has been made available to the scientific community via GitHub as an ImageJ plugin.
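
The core idea of such a pipeline can be sketched generically: compare each voxel against a statistic of its local 3D neighborhood, then score the resulting mask against a ground truth. Below is a minimal Python sketch of volumetric local-mean thresholding plus the Sørensen-Dice overlap used for evaluation; it is a generic illustration with assumed parameters (`radius`, `offset`), not the published VOLT implementation, which is distributed as an ImageJ plugin.

```python
# A minimal sketch of volumetric local thresholding on a 3D volume:
# each voxel is compared against the mean of its local neighborhood.
# This is a generic illustration, NOT the published VOLT plugin;
# `radius` and `offset` are assumed parameters. The neighborhood must
# be larger than the structures of interest so that bright cutouts
# stand out against their local mean.
import numpy as np
from scipy.ndimage import uniform_filter

def local_threshold_3d(volume, radius=15, offset=15.0):
    """Mark voxels brighter than their local 3D mean plus an offset."""
    size = 2 * radius + 1                        # cubic neighborhood edge
    local_mean = uniform_filter(volume.astype(np.float32), size=size)
    return volume > local_mean + offset          # boolean 3D mask

def dice(a, b):
    """Sørensen-Dice overlap between two boolean masks."""
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

# Toy volume loosely mimicking the artificial data set D2: noisy
# background with one bright cuboid cutout as "signal".
rng = np.random.default_rng(0)
vol = rng.normal(60, 5, size=(64, 64, 64)).astype(np.float32)
vol[20:40, 20:40, 20:40] += 30
mask = local_threshold_3d(vol)
truth = np.zeros(vol.shape, dtype=bool)
truth[20:40, 20:40, 20:40] = True
print(f"Dice vs. ground truth: {dice(mask, truth):.3f}")
```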

2021 ◽  
Vol 39 (15_suppl) ◽  
pp. e18725-e18725
Author(s):  
Ravit Geva ◽  
Barliz Waissengrin ◽  
Dan Mirelman ◽  
Felix Bokstein ◽  
Deborah T. Blumenthal ◽  
...  

e18725 Background: Healthcare data sharing is important for creating diverse and large data sets, supporting clinical decision making, and accelerating efficient research to improve patient outcomes. This is especially vital for real-world data analysis. However, stakeholders are reluctant to share their data without assurances of patients' privacy, proper protection of their data sets, and control over the ways they are used. Homomorphic encryption is a cryptographic capability that can address these issues by enabling computation on encrypted data without ever decrypting it, so that analytics results are obtained without revealing the raw data. The aim of this study is to demonstrate the accuracy of such analytics results and the practical efficiency of the technology.

Methods: A real-world data set of colorectal cancer patients' survival data following two different treatment interventions, comprising 623 patients and 24 variables (14,952 items of data in total), was encrypted using leveled homomorphic encryption as implemented in the PALISADE software library. Statistical analysis of key oncological endpoints was blindly performed on both the raw data and the homomorphically encrypted data using descriptive statistics and survival analysis with Kaplan-Meier curves. Results were then compared against an accuracy goal of two decimal places.

Results: For all variables analyzed, the difference between the raw-data results and the homomorphically encrypted results was within the pre-determined accuracy goal; these results, together with the practical efficiency of the encrypted computation as measured by run time, are presented in the table.

Conclusions: This study demonstrates that data encrypted with homomorphic encryption can be statistically analyzed with a precision of at least two decimal places, allowing safe clinical conclusions to be drawn while preserving patients' privacy and protecting data owners' data assets. Homomorphic encryption allows efficient computation on encrypted data non-interactively and without requiring decryption during computation. Utilizing the technology will empower large-scale cross-institution and cross-stakeholder collaboration, allowing safe international collaborations. Clinical trial information: 0048-19-TLV. [Table: see text]
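
For readers unfamiliar with computing on ciphertexts, the sketch below illustrates the principle with the much simpler, additively homomorphic Paillier scheme in plain Python with toy key sizes. This is a deliberately simplified stand-in, not the study's method: the study used leveled homomorphic encryption from the C++ PALISADE library, which supports far richer computations such as the survival analyses described above.

```python
# Illustration of computing on encrypted data without decrypting it,
# using the additively homomorphic Paillier scheme as a simple stand-in
# for the leveled scheme (PALISADE) used in the study. Toy-sized primes;
# real deployments use ~2048-bit moduli.
import math, random

p, q = 2003, 2477
n = p * q
n2 = n * n
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)                  # valid because we take g = n + 1

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return (1 + m * n) * pow(r, n, n2) % n2    # (1+n)^m * r^n mod n^2

def decrypt(c):
    return (pow(c, lam, n2) - 1) // n * mu % n

# Homomorphic property: multiplying ciphertexts adds the plaintexts,
# so an untrusted party can aggregate values it cannot read.
survival_months = [14, 22, 9, 31]              # made-up example data
ciphertexts = [encrypt(m) for m in survival_months]
encrypted_sum = math.prod(ciphertexts) % n2
assert decrypt(encrypted_sum) == sum(survival_months)
print("decrypted aggregate:", decrypt(encrypted_sum))
```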


2012 ◽  
Vol 22 (04) ◽  
pp. 305-325 ◽  
Author(s):  
MRIDUL AANJANEYA ◽  
FREDERIC CHAZAL ◽  
DANIEL CHEN ◽  
MARC GLISSE ◽  
LEONIDAS GUIBAS ◽  
...  

Many real-world data sets can be viewed as noisy samples of special types of metric spaces called metric graphs. Building on the notions of correspondence and Gromov-Hausdorff distance in metric geometry, we describe a model for such data sets as approximations of an underlying metric graph. We present a novel algorithm that takes such a data set as input and outputs a metric graph that is homeomorphic to the underlying metric graph and has bounded distortion of distances. We also implement the algorithm and evaluate its performance on a variety of real-world data sets.
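
The key local test behind this style of reconstruction can be sketched as follows: around each sample point, examine the "shell" of samples at distance roughly r; if that shell splits into two connected pieces, the point lies in the interior of an edge, while three or more pieces suggest a branch point. The Python sketch below implements only this labeling step on toy data with assumed radii; it is a loose illustration, not the paper's full algorithm or its theoretical guarantees.

```python
# A loose sketch of the local shell test: classify each sample by how
# many connected pieces the samples at distance (r, r + delta] around
# it break into. All radii below are assumptions tuned to the toy data.
import numpy as np
import networkx as nx
from scipy.spatial.distance import cdist

def label_points(points, r=0.3, delta=0.15, eps=0.12):
    dist = cdist(points, points)
    labels = []
    for i in range(len(points)):
        shell = np.flatnonzero((dist[i] > r) & (dist[i] <= r + delta))
        g = nx.Graph()
        g.add_nodes_from(shell)
        g.add_edges_from((shell[a], shell[b])
                         for a in range(len(shell))
                         for b in range(a + 1, len(shell))
                         if dist[shell[a], shell[b]] <= eps)
        k = nx.number_connected_components(g)
        labels.append("edge" if k == 2 else "branch" if k >= 3 else "end")
    return labels

# Noisy samples of a Y-shaped metric graph: three arms from the origin.
rng = np.random.default_rng(1)
pts = np.vstack([s[:, None] * np.array([np.cos(a), np.sin(a)])
                 for a in np.deg2rad([90, 210, 330])
                 for s in [rng.uniform(0.05, 1.0, 60)]])
pts += rng.normal(0, 0.01, pts.shape)
labels = label_points(pts)
print({lab: labels.count(lab) for lab in set(labels)})
```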


Author(s):  
I. Weber ◽  
J. Bongartz ◽  
R. Roscher

Abstract. Detecting objects in aerial images is an important task in environmental and infrastructure-related applications. Deep-learning object detectors such as RetinaNet offer decent detection performance; however, they require a large amount of annotated training data. Collecting annotated data is a time-consuming and tedious task that often cannot be performed sufficiently well for remote sensing, since the required data must cover a wide variety of scenes and objects. In this paper, we analyze the performance of such a network given a limited amount of training data and address the research question of whether artificially generated training data can be used to overcome the challenge of real-world data sets with little training data. For our experiments, we use the ISPRS 2D Semantic Labeling Contest Potsdam data set for vehicle detection, from which we derive object bounding boxes of vehicles suitable for our task. We generate artificial data based on vehicle blueprints and show that networks trained only on generated data may perform worse, but are still able to detect most of the vehicles found in the real data set. Moreover, we show that by adding generated data to real-world data sets with a limited amount of training data, performance can be increased significantly and, in some cases, almost reaches baseline levels.
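
The data-generation idea can be illustrated with a short sketch: composite vehicle cutouts onto aerial background tiles at random positions, scales, and rotations, and record the resulting bounding boxes as labels. The file names and parameter ranges below are placeholders, and the paper's blueprint-based pipeline is certainly more involved; this only shows the general mechanism.

```python
# A toy sketch of synthetic-data generation: paste a vehicle cutout
# (e.g., rendered from a blueprint, with transparent background) onto an
# aerial background tile at a random position, scale, and rotation, and
# record the resulting bounding box as a label. File names are
# placeholders; the background is assumed larger than the cutout.
import random
from PIL import Image

def synthesize(background_path, cutout_path, n_vehicles=5, seed=None):
    random.seed(seed)
    bg = Image.open(background_path).convert("RGB")
    cutout = Image.open(cutout_path).convert("RGBA")
    boxes = []
    for _ in range(n_vehicles):
        scale = random.uniform(0.5, 1.5)
        car = cutout.resize((int(cutout.width * scale),
                             int(cutout.height * scale)))
        car = car.rotate(random.uniform(0, 360), expand=True)
        x = random.randrange(0, bg.width - car.width)
        y = random.randrange(0, bg.height - car.height)
        bg.paste(car, (x, y), car)        # alpha channel masks the paste
        boxes.append((x, y, x + car.width, y + car.height))
    return bg, boxes                      # image plus axis-aligned boxes

image, labels = synthesize("background_tile.png", "vehicle_cutout.png")
image.save("synthetic_sample.png")
print(labels)
```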


2021 ◽  
Vol 17 (3) ◽  
pp. 68-80
Author(s):  
Nitesh Sukhwani ◽  
Venkateswara Rao Kagita ◽  
Vikas Kumar ◽  
Sanjaya Kumar Panda

Skyline recommendation with uncertain preferences has drawn AI researchers' attention in recent years due to its wide range of applications. The naive approach to skyline recommendation computes the skyline probability of all objects and ranks them accordingly. However, in many applications the interest is in determining the top-k objects rather than a full ranking. The most efficient algorithm for determining an object's skyline probability employs the concepts of zero-contributing sets and prefix-based k-level absorption. The authors show that the performance of these methods depends heavily on the arrangement of objects in the database. In this paper, the authors propose a method for determining the top-k skyline objects without computing the skyline probability of all objects. They also propose and analyze different methods of ordering the objects in the database. Finally, they empirically demonstrate the efficacy of the proposed approaches on several synthetic and real-world data sets.
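
As a point of reference for what the paper improves upon, the sketch below estimates skyline probabilities naively by Monte Carlo under a simple uncertain-preference model (each attribute preference is active independently with some probability) and then takes the top-k. The model, data, and parameters are assumptions for illustration; the paper's exact method (zero-contributing sets, prefix-based k-level absorption) avoids this exhaustive per-object computation.

```python
# Naive Monte Carlo baseline for top-k skyline under uncertain
# preferences: an object's skyline probability is estimated as the
# fraction of sampled preference sets under which no other object
# dominates it. Made-up data; higher attribute values are "better".
import numpy as np

def dominates(a, b, active):
    """a dominates b on the active attributes."""
    return np.all(a[active] >= b[active]) and np.any(a[active] > b[active])

def topk_skyline(objects, pref_prob, k=3, trials=2000, seed=0):
    rng = np.random.default_rng(seed)
    hits = np.zeros(len(objects))
    for _ in range(trials):
        active = rng.random(objects.shape[1]) < pref_prob
        if not active.any():
            continue                      # no preference sampled
        for i, o in enumerate(objects):
            if not any(dominates(p, o, active)
                       for j, p in enumerate(objects) if j != i):
                hits[i] += 1
    probs = hits / trials
    return [(int(i), float(probs[i])) for i in np.argsort(-probs)[:k]]

objects = np.random.default_rng(42).random((20, 4))  # 20 objects, 4 attrs
print(topk_skyline(objects, pref_prob=np.array([0.9, 0.7, 0.5, 0.3])))
```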


Entropy ◽  
2021 ◽  
Vol 23 (5) ◽  
pp. 507
Author(s):  
Piotr Białczak ◽  
Wojciech Mazurczyk

Malicious software utilizes the HTTP protocol for communication, creating network traffic that is hard to identify because it blends into the traffic generated by benign applications. To this end, fingerprinting tools have been developed to help track and identify such traffic by providing a short representation of malicious HTTP requests. However, currently existing tools do not analyze all information included in the HTTP message, or analyze it insufficiently. To address these issues, we propose Hfinger, a novel tool for fingerprinting malware HTTP requests. It extracts information from parts of the request such as the URI, protocol information, headers, and payload, providing a concise request representation that preserves the extracted information in a form interpretable by a human analyst. We performed an extensive experimental evaluation of the developed solution on real-world data sets and compared Hfinger with the most related and popular existing tools, such as FATT, Mercury, and p0f. The effectiveness analysis reveals that, on average, only 1.85% of requests fingerprinted by Hfinger collide between malware families, which is 8–34 times lower than for existing tools. Moreover, unlike these tools, Hfinger in its default mode introduces no collisions between malware and benign applications, and achieves this while increasing the number of fingerprints by at most a factor of three. As a result, Hfinger can effectively track and hunt malware by providing more unique fingerprints than other standard tools.
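
The flavor of such fingerprinting can be shown with a toy sketch that condenses an HTTP request's method, URI shape, header order, and payload into one short string. The chosen fields and encoding here are assumptions for illustration only and do not reproduce Hfinger's actual fingerprint format.

```python
# A toy HTTP-request fingerprint: method, URI shape (path length, path
# depth, number of query parameters), header order, and a payload hash
# digest. Illustrative only; NOT Hfinger's real output format.
import hashlib

def fingerprint(method, uri, headers, payload=b""):
    path, _, query = uri.partition("?")
    uri_part = f"{len(path)}|{path.count('/')}|{query.count('=')}"
    # Header *order* and names are often characteristic of the client.
    header_part = ",".join(name.lower() for name, _ in headers)
    payload_part = hashlib.sha256(payload).hexdigest()[:8] if payload else "-"
    return f"{method}|{uri_part}|{header_part}|{payload_part}"

req_headers = [("Host", "example.com"), ("User-Agent", "Mozilla/5.0"),
               ("Accept", "*/*"), ("Connection", "keep-alive")]
print(fingerprint("GET", "/gate.php?id=42&cmd=ping", req_headers))
# -> GET|9|1|2|host,user-agent,accept,connection|-
```

Requests from the same malware family tend to share method, URI structure, and header order even when hostnames and parameter values vary, which is why a representation like this can cluster them while remaining readable to an analyst.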


2019 ◽  
Vol 10 (03) ◽  
pp. 409-420 ◽  
Author(s):  
Steven Horng ◽  
Nathaniel R. Greenbaum ◽  
Larry A. Nathanson ◽  
James C. McClay ◽  
Foster R. Goss ◽  
...  

Objective: Numerous attempts have been made to create a standardized "presenting problem" or "chief complaint" list to characterize the nature of an emergency department visit. Previous attempts have failed to gain widespread adoption because they were not freely shareable or did not contain the right level of specificity, structure, and clinical relevance to gain acceptance by the larger emergency medicine community. Using real-world data, we constructed a presenting problem list that addresses these challenges.

Materials and Methods: We prospectively captured the presenting problems for 180,424 consecutive emergency department patient visits at an urban, academic, Level I trauma center in the Boston metro area. No patients were excluded. We used a consensus process to iteratively derive our system using real-world data: the first 70% of consecutive visits to derive our ontology, followed by a 6-month washout period, and the remaining 30% for validation. All concepts were mapped to the Systematized Nomenclature of Medicine-Clinical Terms (SNOMED CT).

Results: Our system consists of a polyhierarchical ontology containing 692 unique concepts, 2,118 synonyms, and 30,613 nonvisible descriptions to correct misspellings and nonstandard terminology. Our ontology successfully captured structured data for 95.9% of visits in our validation data set.

Discussion and Conclusion: We present the HierArchical Presenting Problem ontologY (HaPPy). This ontology was empirically derived and then iteratively validated by an expert consensus panel. HaPPy contains 692 presenting problem concepts, each mapped to SNOMED CT. This freely shareable ontology can help facilitate presenting-problem-based quality metrics, research, and patient care.
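
Mechanically, such an ontology acts as a many-to-one mapping from free-text synonyms and misspellings to canonical concepts carrying SNOMED CT codes, as the toy Python sketch below illustrates. The alias entries are invented examples for illustration, not actual HaPPy content.

```python
# Toy sketch: normalize free-text presenting problems to canonical
# concepts and SNOMED CT codes via synonym/misspelling aliases.
# Entries are illustrative examples, not actual HaPPy content.
CONCEPTS = {
    "chest pain": "29857009",        # SNOMED CT: Chest pain (finding)
    "dyspnea": "267036007",          # SNOMED CT: Dyspnea (finding)
}
# Synonyms and "nonvisible" misspelling descriptions map to concepts.
ALIASES = {
    "chest pain": "chest pain",
    "cp": "chest pain",
    "chset pain": "chest pain",      # transposition misspelling
    "shortness of breath": "dyspnea",
    "sob": "dyspnea",
    "dyspnoea": "dyspnea",           # British spelling
}

def normalize(free_text):
    concept = ALIASES.get(free_text.strip().lower())
    if concept is None:
        return None                  # unmapped: fall back to review
    return concept, CONCEPTS[concept]

print(normalize("Chset pain"))       # ('chest pain', '29857009')
print(normalize("SOB"))              # ('dyspnea', '267036007')
```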


2011 ◽  
Vol 2011 ◽  
pp. 1-14 ◽  
Author(s):  
Chunzhong Li ◽  
Zongben Xu

The structure of a data set is of critical importance in identifying clusters, especially density-difference features. In this paper, we present a clustering algorithm based on density consistency: a filtering process that identifies points sharing the same structural features and classifies them into the same cluster. The method is not restricted by cluster shape or high data dimensionality, and it is robust to noise and outliers. Extensive experiments on synthetic and real-world data sets validate the proposed clustering algorithm.
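
One way to make the density-consistency idea concrete: estimate each point's local density from its k-NN radius, then link neighboring points only when their densities are similar, so that connected components become clusters of consistent density. The sketch below follows this reading with assumed parameters (`k`, `ratio`); it is an interpretation for illustration, not the paper's exact formulation.

```python
# A rough density-consistency sketch: connect mutual neighbors whose
# local densities (inverse k-NN radius) are within a tolerance ratio,
# then read clusters off as connected components. Parameters and the
# consistency test are assumptions, not the paper's formulation.
import numpy as np
from sklearn.neighbors import NearestNeighbors
from scipy.sparse import lil_matrix
from scipy.sparse.csgraph import connected_components

def density_consistent_clusters(X, k=8, ratio=1.6):
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dist, idx = nn.kneighbors(X)                 # column 0 is the point itself
    density = 1.0 / (dist[:, -1] + 1e-12)        # inverse k-NN radius
    adj = lil_matrix((len(X), len(X)))
    for i in range(len(X)):
        for j in idx[i, 1:]:
            lo, hi = sorted((density[i], density[j]))
            if hi / lo <= ratio:                 # similar local density
                adj[i, j] = 1
    return connected_components(adj.tocsr(), directed=False)[1]

# Two Gaussian blobs of different density plus sparse outliers; the
# outliers end up as singleton components rather than joining a blob.
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 0.3, (100, 2)),
               rng.normal(4, 1.0, (100, 2)),
               rng.uniform(-3, 8, (20, 2))])
labels = density_consistent_clusters(X)
print("components found (incl. outlier singletons):", len(np.unique(labels)))
```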

