file structures
Recently Published Documents


TOTAL DOCUMENTS

53
(FIVE YEARS 6)

H-INDEX

5
(FIVE YEARS 0)

2021 ◽  
Vol 5 (OOPSLA) ◽  
pp. 1-32
Author(s):  
Nisarg Patel ◽  
Siddharth Krishna ◽  
Dennis Shasha ◽  
Thomas Wies

Multicopy search structures such as log-structured merge (LSM) trees are optimized for high insert/update/delete (collectively known as upsert) performance. In such data structures, an upsert on key k, which adds (k, v) where v can be a value or a tombstone, is added to the root node even if k is already present in other nodes. Thus there may be multiple copies of k in the search structure. A search on k aims to return the value associated with the most recent upsert. We present a general framework for verifying linearizability of concurrent multicopy search structures that abstracts from the underlying representation of the data structure in memory, enabling proof reuse across diverse implementations. Based on our framework, we propose template algorithms for (a) LSM structures forming arbitrary directed acyclic graphs and (b) differential file structures, and formally verify these templates in the concurrent separation logic Iris. We also instantiate the LSM template to obtain the first verified concurrent in-memory LSM tree implementation.
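The multicopy behavior described in this abstract can be illustrated with a minimal, single-threaded sketch. It is not the verified template from the paper: the class name `MulticopyStore`, the `TOMBSTONE` marker, and the `root_capacity` flush rule are all assumptions made for illustration, and concurrency is ignored entirely.

```python
TOMBSTONE = object()  # hypothetical marker meaning "key was deleted"


class MulticopyStore:
    """Toy multicopy search structure: newest node first, no concurrency."""

    def __init__(self, root_capacity=2):
        self.root_capacity = root_capacity  # assumed flush threshold
        self.nodes = [{}]  # nodes[0] is the root; later nodes are older

    def upsert(self, key, value):
        # Always write to the root, even if the key exists in older nodes.
        self.nodes[0][key] = value
        if len(self.nodes[0]) > self.root_capacity:
            # Push the full root down and start a fresh, empty root.
            self.nodes.insert(0, {})

    def delete(self, key):
        # A delete is just an upsert of a tombstone.
        self.upsert(key, TOMBSTONE)

    def search(self, key):
        # Scan from newest to oldest; the first copy found is the most recent.
        for node in self.nodes:
            if key in node:
                value = node[key]
                return None if value is TOMBSTONE else value
        return None

    def copies(self, key):
        # Number of nodes currently holding a copy of the key.
        return sum(1 for node in self.nodes if key in node)


store = MulticopyStore(root_capacity=2)
for k, v in [("a", 1), ("b", 2), ("c", 3)]:
    store.upsert(k, v)      # the third upsert overflows the root and flushes it
store.upsert("a", 10)       # second copy of "a", now in the new root
store.delete("b")           # tombstone hides the older copy of "b"
```

After these calls, `store.search("a")` returns the newer value while an older copy of `"a"` still sits in a lower node, which is the situation a search must resolve correctly.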


Data Science ◽  
2021 ◽  
pp. 1-20
Author(s):  
Laura Boeschoten ◽  
Roos Voorvaart ◽  
Ruben Van Den Goorbergh ◽  
Casper Kaandorp ◽  
Martine De Vos

The General Data Protection Regulation (GDPR) grants all natural persons the right to access their personal data if this is being processed by data controllers. The data controllers are obliged to share the data in an electronic format and often provide the data in a so-called Data Download Package (DDP). These DDPs contain all data collected by public and private entities during the course of a citizen's digital life and form a treasure trove for social scientists. However, the data can be deeply private. To protect the privacy of research participants while using their DDPs for scientific research, we developed a de-identification algorithm that is able to handle typical characteristics of DDPs. These include regularly changing file structures, visual and textual content, differing file formats, differing file structures and private information like usernames. We investigate the performance of the algorithm and illustrate how the algorithm can be tailored towards specific DDP structures.


2021 ◽  
Author(s):  
Taavi Päll ◽  
Hannes Luidalepp ◽  
Tanel Tenson ◽  
Ülo Maiväli

Here we assess reproducibility and inferential quality in the field of differential HT-seq, based on analysis of datasets submitted 2008-2019 to the NCBI GEO data repository. Analysis of GEO submission file structures places an overall 59% upper limit to reproducibility. We further show that only 23% of experiments resulted in theoretically expected p value histogram shapes, although both reproducibility and p value distributions show marked improvement over time. Uniform p value histogram shapes, indicative of <100 true effects, were extremely few. Our calculations of π0, the fraction of true nulls, showed that 36% of experiments have π0 < 0.5, meaning that in over a third of experiments most RNAs were estimated to change their expression level upon experimental treatment. Both the fraction of different p value histogram types and π0 values are strongly associated with the software used for calculating these p values by the original authors, indicating widespread bias.
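The abstract does not say how π0 was computed. As a rough illustration of the quantity, the sketch below uses a Storey-type estimator with a fixed threshold λ = 0.5, which is an assumption and may differ from the authors' procedure. The function name `estimate_pi0` and the example p values are hypothetical.

```python
def estimate_pi0(p_values, lam=0.5):
    """Storey-type estimate of the fraction of true nulls.

    Under the null, p values are uniform, so the share of p values above
    `lam` should be about pi0 * (1 - lam). Solving for pi0 gives the ratio
    below. This is an assumed estimator, not necessarily the paper's.
    """
    m = len(p_values)
    if m == 0:
        raise ValueError("need at least one p value")
    above = sum(1 for p in p_values if p > lam)
    return min(1.0, above / ((1.0 - lam) * m))


# Hypothetical data: evenly spread p values (no true effects expected).
uniform_like = [(i + 0.5) / 100 for i in range(100)]
pi0_uniform = estimate_pi0(uniform_like)

# Hypothetical data: 80 very small p values plus 20 spread above 0.5.
effect_heavy = [0.001] * 80 + [0.6, 0.7, 0.8, 0.9] * 5
pi0_effects = estimate_pi0(effect_heavy)
```

With the second data set the estimate falls below 0.5, the regime the abstract describes as most RNAs changing expression.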


2020 ◽  
Author(s):  
Deb Sankar Banerjee ◽  
Godwin Stephenson ◽  
Suman G. Das

Time-lapse imaging of bacteria growing in micro-channels in a controlled environment has been instrumental in studying the single-cell dynamics of bacterial growth. This kind of microfluidic setup with growth chambers is popularly known as the mother machine [1]. In a typical experiment with such a setup, bacterial growth can be studied for numerous generations with high resolution and temporal precision using image processing. However, as in any other experiment involving imaging, the image data from a typical mother machine experiment shows considerable intensity fluctuations, cell intrusion, cell overlapping, filamentation, etc. The large amount of data produced in such experiments makes manual analysis and correction of such unwanted aberrations difficult. We have developed a modular code for segmentation and analysis of mother machine data (SAM) for rod-shaped bacteria, in which we detect such aberrations and treat them correctly without manual supervision. We track cumulative cell size and use an adaptive segmentation method to avoid faulty detection of cell division. SAM is currently written and compiled using MATLAB. It is fast (∼15 min/GB of image data) and can be efficiently coupled with shell scripting to process large amounts of data with systematic creation of output file structures and graphical results. It has been tested on many different experimental data sets and is publicly available on GitHub.


2018 ◽  
pp. 1842-1845
Author(s):  
Steven M. Beitzel ◽  
Eric C. Jensen ◽  
Ophir Frieder

2013 ◽  
Vol 380-384 ◽  
pp. 2195-2199
Author(s):  
Cheng Jiong Wang

This paper discusses the types of file structures in Linux, points out that EXT2 is the most commonly used file system in Linux, analyzes the disk layout, index point and directory structure of EXT2, and studies a method for accessing files in EXT2 by name that makes access faster and more efficient.
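To make name-based access concrete, here is a minimal sketch of a linear lookup in a single EXT2 directory block. It assumes the common on-disk entry layout with the file-type field (32-bit inode, 16-bit record length, 8-bit name length, 8-bit file type, then the name, all little-endian). It shows the baseline scan only, not the faster method the paper studies, and the helper names `pack_entry` and `lookup`, the inode numbers, and the 1024-byte block size are illustrative assumptions.

```python
import struct

# inode (u32), rec_len (u16), name_len (u8), file_type (u8), little-endian
DIRENT_HEADER = struct.Struct("<IHBB")
FT_REG_FILE, FT_DIR = 1, 2  # EXT2 file-type codes for regular file, directory


def pack_entry(inode, name, file_type, rec_len=None):
    """Build one directory entry; rec_len defaults to the 4-byte-aligned size."""
    raw = name.encode("utf-8")
    min_len = (DIRENT_HEADER.size + len(raw) + 3) & ~3
    rec_len = rec_len or min_len
    body = DIRENT_HEADER.pack(inode, rec_len, len(raw), file_type) + raw
    return body.ljust(rec_len, b"\0")


def lookup(block, name):
    """Return the inode number for `name` in one directory block, or None."""
    target = name.encode("utf-8")
    offset = 0
    while offset + DIRENT_HEADER.size <= len(block):
        inode, rec_len, name_len, _ftype = DIRENT_HEADER.unpack_from(block, offset)
        if rec_len < DIRENT_HEADER.size:
            break  # malformed entry; stop rather than loop forever
        start = offset + DIRENT_HEADER.size
        # inode 0 marks an unused entry and must be skipped
        if inode != 0 and block[start:start + name_len] == target:
            return inode
        offset += rec_len  # rec_len chains to the next entry
    return None


# Hypothetical 1024-byte directory block; the last entry spans the remainder.
BLOCK_SIZE = 1024
head = pack_entry(2, ".", FT_DIR) + pack_entry(2, "..", FT_DIR)
block = head + pack_entry(14, "notes.txt", FT_REG_FILE,
                          rec_len=BLOCK_SIZE - len(head))
```

Calling `lookup(block, "notes.txt")` walks the chain of `rec_len` offsets until the name matches. The cost grows with the number of entries, which is the kind of overhead a faster name-access method would aim to reduce.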

