File Structures: Latest Research Papers

Multicopy search structures such as log-structured merge (LSM) trees are optimized for high insert/update/delete (collectively known as upsert) performance. In such data structures, an upsert on key k adds a pair (k, v), where v can be a value or a tombstone, to the root node even if k is already present in other nodes. Thus there may be multiple copies of k in the search structure. A search on k aims to return the value associated with the most recent upsert. We present a general framework for verifying linearizability of concurrent multicopy search structures that abstracts from the underlying representation of the data structure in memory, enabling proof reuse across diverse implementations. Based on our framework, we propose template algorithms for (a) LSM structures forming arbitrary directed acyclic graphs and (b) differential file structures, and formally verify these templates in the concurrent separation logic Iris. We also instantiate the LSM template to obtain the first verified concurrent in-memory LSM tree implementation.
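The upsert and search behaviour described in this abstract can be illustrated with a small sequential sketch. It is not the paper's template algorithm and it ignores concurrency entirely; the class name `MultiCopyStore`, the `root_capacity` parameter, and the flush policy are assumptions made only for illustration.

```python
from typing import Dict, List, Optional

TOMBSTONE = object()  # sentinel value recording that a key was deleted


class MultiCopyStore:
    """Toy, single-threaded multicopy structure (hypothetical, for illustration)."""

    def __init__(self, root_capacity: int = 2) -> None:
        self.root: Dict[str, object] = {}          # newest node; all upserts land here
        self.older: List[Dict[str, object]] = []   # flushed nodes, newest first
        self.root_capacity = root_capacity

    def upsert(self, key: str, value: object) -> None:
        # Flush a full root to the older nodes before accepting a new key.
        if len(self.root) >= self.root_capacity and key not in self.root:
            self.older.insert(0, self.root)
            self.root = {}
        # The pair goes into the root even if older nodes already hold a copy of key.
        self.root[key] = value

    def delete(self, key: str) -> None:
        # A delete is an upsert of a tombstone.
        self.upsert(key, TOMBSTONE)

    def search(self, key: str) -> Optional[object]:
        # Scan from newest to oldest so the most recent upsert wins.
        for node in [self.root, *self.older]:
            if key in node:
                value = node[key]
                return None if value is TOMBSTONE else value
        return None

    def copies(self, key: str) -> int:
        # Number of nodes that currently hold a copy of key.
        return sum(1 for node in [self.root, *self.older] if key in node)
```

After `upsert("a", 1)`, two further upserts that fill and flush the root, and then `upsert("a", 10)`, the key `a` exists in two nodes, and `search("a")` returns the newer value because the root is consulted first.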

Image File Structures in Nuclear Medicine

10.1201/9780429489556-12 ◽

2021 ◽

pp. 237-250

Author(s):

Charles Herbst

Keyword(s):

Nuclear Medicine ◽

Image File ◽

File Structures

File Structures for a Node Capability File (NCF) and LIN Description File (LDF)

10.4271/j2602-3_202110 ◽

2021 ◽

Author(s):

Keyword(s):

Description File ◽

File Structures

Automatic de-identification of data download packages

Data Science ◽

10.3233/ds-210035 ◽

2021 ◽

pp. 1-20

Author(s):

Laura Boeschoten ◽

Roos Voorvaart ◽

Ruben Van Den Goorbergh ◽

Casper Kaandorp ◽

Martine De Vos

Keyword(s):

Private Information ◽

Personal Data ◽

Identification Algorithm ◽

Social Scientists ◽

Public And Private ◽

General Data Protection Regulation ◽

File Formats ◽

File Structures ◽

The Right ◽

Textual Content

The General Data Protection Regulation (GDPR) grants all natural persons the right to access their personal data if these are being processed by data controllers. The data controllers are obliged to share the data in an electronic format and often provide the data in a so-called Data Download Package (DDP). These DDPs contain all data collected by public and private entities during the course of a citizen's digital life and form a treasure trove for social scientists. However, the data can be deeply private. To protect the privacy of research participants while using their DDPs for scientific research, we developed a de-identification algorithm that is able to handle typical characteristics of DDPs. These include regularly changing file structures, visual and textual content, differing file formats, differing file structures and private information like usernames. We investigate the performance of the algorithm and illustrate how the algorithm can be tailored towards specific DDP structures.

A field-wide assessment of differential high throughput sequencing reveals widespread bias

10.1101/2021.01.04.424681 ◽

2021 ◽

Author(s):

Taavi Päll ◽

Hannes Luidalepp ◽

Tanel Tenson ◽

Ülo Maiväli

Keyword(s):

High Throughput ◽

Marked Improvement ◽

High Throughput Sequencing ◽

Experimental Treatment ◽

Data Repository ◽

P Value ◽

P Values ◽

Upper Limit ◽

File Structures ◽

Over Time

Here we assess reproducibility and inferential quality in the field of differential HT-seq, based on analysis of datasets submitted 2008-2019 to the NCBI GEO data repository. Analysis of GEO submission file structures places an overall 59% upper limit on reproducibility. We further show that only 23% of experiments resulted in theoretically expected p value histogram shapes, although both reproducibility and p value distributions show marked improvement over time. Uniform p value histogram shapes, indicative of <100 true effects, were extremely few. Our calculations of π0, the fraction of true nulls, showed that 36% of experiments have π0 < 0.5, meaning that in over a third of experiments most RNAs were estimated to change their expression level upon experimental treatment. Both the fraction of different p value histogram types and π0 values are strongly associated with the software used for calculating these p values by the original authors, indicating widespread bias.
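To make the quantity π0 concrete, the sketch below estimates the fraction of true nulls from a list of p values. The abstract does not say which estimator the authors used; this example assumes a Storey-style estimator with a fixed threshold, and the function name `estimate_pi0` and the parameter `lam` are illustrative choices.

```python
from typing import Sequence


def estimate_pi0(p_values: Sequence[float], lam: float = 0.5) -> float:
    """Storey-style estimate of the fraction of true null hypotheses.

    Assumption: p values above `lam` come almost entirely from true nulls,
    which are uniform on [0, 1], so their count scaled by 1 / (1 - lam)
    approximates the total number of true nulls.
    """
    if not p_values:
        raise ValueError("p_values must not be empty")
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1)")
    above = sum(1 for p in p_values if p > lam)
    # Cap at 1.0 because a proportion cannot exceed one.
    return min(1.0, above / (len(p_values) * (1.0 - lam)))
```

With this estimator, a flat p value histogram gives a value near 1 (almost no true effects), whereas a histogram with a tall peak near zero gives a value well below 1.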

Segmentation and analysis of mother machine data: SAM

10.1101/2020.10.01.322685 ◽

2020 ◽

Author(s):

Deb Sankar Banerjee ◽

Godwin Stephenson ◽

Suman G. Das

Keyword(s):

Bacterial Growth ◽

Image Data ◽

Time Lapse ◽

Cell Dynamics ◽

Temporal Precision ◽

Micro Channels ◽

Typical Experiment ◽

File Structures ◽

Set Up ◽

Time Lapse Imaging

Time-lapse imaging of bacteria growing in micro-channels in a controlled environment has been instrumental in studying the single-cell dynamics of bacterial growth. This kind of microfluidic setup with growth chambers is popularly known as the mother machine [1]. In a typical experiment with such a setup, bacterial growth can be studied for numerous generations with high resolution and temporal precision using image processing. However, as in any other experiment involving imaging, the image data from a typical mother machine experiment show considerable intensity fluctuations, cell intrusion, cell overlapping, filamentation, etc. The large amount of data produced in such experiments makes manual analysis and correction of these unwanted aberrations difficult. We have developed a modular code for segmentation and analysis of mother machine data (SAM) for rod-shaped bacteria, which detects such aberrations and treats them correctly without manual supervision. We track cumulative cell size and use an adaptive segmentation method to avoid faulty detection of cell division. SAM is currently written and compiled using MATLAB. It is fast (∼15 min/GB of image data) and can be efficiently coupled with shell scripting to process large amounts of data with systematic creation of output file structures and graphical results. It has been tested on many different experimental data sets and is publicly available on GitHub.

Index Creation and File Structures

Encyclopedia of Database Systems ◽

10.1007/978-1-4614-8265-9_944 ◽

2018 ◽

pp. 1842-1845

Author(s):

Steven M. Beitzel ◽

Eric C. Jensen ◽

Ophir Frieder

Keyword(s):

File Structures

Index Creation and File Structures

Encyclopedia of Database Systems ◽

10.1007/978-1-4899-7993-3_944-2 ◽

2016 ◽

pp. 1-3

Author(s):

Steven M. Beitzel ◽

Eric C. Jensen ◽

Ophir Frieder

Keyword(s):

File Structures

An Analysis of the File System for Linux

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.380-384.2195 ◽

2013 ◽

Vol 380-384 ◽

pp. 2195-2199

Author(s):

Cheng Jiong Wang

Keyword(s):

File System ◽

Index Point ◽

Disk Layout ◽

File Structures ◽

Directory Structure

This paper discusses the types of file structures in Linux, points out that EXT2 is the most commonly used file system in Linux, analyzes the disk layout, index point and directory structure of EXT2, and studies a method for accessing files in EXT2 by name that makes access faster and more efficient.
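Access by name in an EXT2-style file system can be pictured as a walk through directories, each of which maps entry names to index numbers. The sketch below is a simplified in-memory model, not the paper's method and not the on-disk format; it assumes that "index point" refers to the EXT2 inode, and the `Inode` class and `lookup` function are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

ROOT_INODE = 2  # ext2 reserves inode number 2 for the root directory


@dataclass
class Inode:
    is_dir: bool
    entries: Dict[str, int] = field(default_factory=dict)  # entry name -> inode number
    data: bytes = b""                                      # file contents (toy model)


def lookup(inodes: Dict[int, Inode], path: str) -> Optional[int]:
    """Resolve an absolute path to an inode number, one component at a time."""
    current = ROOT_INODE
    for name in (part for part in path.split("/") if part):
        node = inodes[current]
        # A component can only be resolved inside a directory that lists it.
        if not node.is_dir or name not in node.entries:
            return None
        current = node.entries[name]
    return current


# Usage: a tiny tree holding /etc/hosts.
inodes = {
    2: Inode(is_dir=True, entries={"etc": 11}),
    11: Inode(is_dir=True, entries={"hosts": 12}),
    12: Inode(is_dir=False, data=b"127.0.0.1 localhost\n"),
}
```

Here `lookup(inodes, "/etc/hosts")` visits the root directory, then `etc`, and returns the inode number stored under the name `hosts`; a missing component yields `None`.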
