Cooler: scalable storage for Hi-C data and other genomically-labeled arrays

Most existing coverage-based (epi)genomic datasets are one-dimensional, but newer technologies probing interactions (physical, genetic, etc.) produce quantitative maps with two-dimensional genomic coordinate systems. Storage and computational costs mount sharply with data resolution when such maps are stored in dense form. Hence, there is a pressing need to develop data storage strategies that handle the full range of useful resolutions in multidimensional genomic datasets by taking advantage of their sparse nature, while supporting efficient compression and providing fast random access to facilitate development of scalable algorithms for data analysis. We developed a file format called cooler, based on a sparse data model, that can support genomically-labeled matrices at any resolution. It has the flexibility to accommodate various descriptions of the data axes (genomic coordinates, tracks and bin annotations), resolutions, data density patterns, and metadata. Cooler is based on HDF5 and is supported by a Python library and command line suite to create, read, inspect and manipulate cooler data collections. The format has been adopted as a standard by the NIH 4D Nucleome Consortium. Cooler is cross-platform, BSD-licensed, and can be installed from the Python Package Index or the bioconda repository. The source code is maintained on GitHub at https://github.com/mirnylab/cooler.

Cooler: scalable storage for Hi-C data and other genomically labeled arrays

Bioinformatics ◽ 10.1093/bioinformatics/btz540 ◽ 2019 ◽ Cited By ~ 22

Author(s): Nezar Abdennur ◽ Leonid A Mirny

Keyword(s): Data Storage ◽ Full Range ◽ Random Access ◽ Supplementary Information ◽ Scalable Algorithms ◽ Data Resolution ◽ Data Density ◽ Cross Platform ◽ Data Collections ◽ Python Package

Abstract

Motivation: Most existing coverage-based (epi)genomic datasets are one-dimensional, but newer technologies probing interactions (physical, genetic, etc.) produce quantitative maps with two-dimensional genomic coordinate systems. Storage and computational costs mount sharply with data resolution when such maps are stored in dense form. Hence, there is a pressing need to develop data storage strategies that handle the full range of useful resolutions in multidimensional genomic datasets by taking advantage of their sparse nature, while supporting efficient compression and providing fast random access to facilitate development of scalable algorithms for data analysis.

Results: We developed a file format called cooler, based on a sparse data model, that can support genomically labeled matrices at any resolution. It has the flexibility to accommodate various descriptions of the data axes (genomic coordinates, tracks and bin annotations), resolutions, data density patterns and metadata. Cooler is based on HDF5 and is supported by a Python library and command line suite to create, read, inspect and manipulate cooler data collections. The format has been adopted as a standard by the NIH 4D Nucleome Consortium.

Availability and implementation: Cooler is cross-platform, BSD-licensed and can be installed from the Python package index or the bioconda repository. The source code is maintained on GitHub at https://github.com/mirnylab/cooler.

Supplementary information: Supplementary data are available at Bioinformatics online.
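The sparse data model mentioned in the abstract can be pictured with a small sketch. This is an illustration only and not the cooler library's API or its HDF5 layout: the function names `make_bins` and `build_pixels`, the toy chromosome sizes and the 0-based contact positions are all assumptions made for the example. The idea shown is to keep one table of genomic bins and one table of nonzero (bin1, bin2, count) pixels for the upper triangle of a symmetric contact map, instead of a dense matrix.

```python
from collections import Counter


def make_bins(chrom_sizes, bin_size):
    """Split each chromosome into fixed-size bins.

    Returns the bin table as (chrom, start, end) tuples and, per chromosome,
    the index of its first bin in that table.
    """
    bins, offsets = [], {}
    for chrom, length in chrom_sizes.items():
        offsets[chrom] = len(bins)
        for start in range(0, length, bin_size):
            bins.append((chrom, start, min(start + bin_size, length)))
    return bins, offsets


def build_pixels(contacts, offsets, bin_size):
    """Aggregate contacts into sorted (bin1_id, bin2_id, count) records.

    Only nonzero entries are kept, and each pair is stored once with
    bin1_id <= bin2_id (upper triangle of a symmetric map).
    Positions are assumed to be 0-based.
    """
    counts = Counter()
    for chrom1, pos1, chrom2, pos2 in contacts:
        i = offsets[chrom1] + pos1 // bin_size
        j = offsets[chrom2] + pos2 // bin_size
        if i > j:
            i, j = j, i
        counts[(i, j)] += 1
    return sorted((i, j, n) for (i, j), n in counts.items())


if __name__ == "__main__":
    # Hypothetical toy genome and contacts, for illustration only.
    chrom_sizes = {"chrA": 1000, "chrB": 500}
    bins, offsets = make_bins(chrom_sizes, bin_size=100)
    contacts = [
        ("chrA", 50, "chrA", 250),
        ("chrA", 260, "chrA", 20),
        ("chrA", 950, "chrB", 10),
        ("chrB", 499, "chrB", 410),
    ]
    pixels = build_pixels(contacts, offsets, bin_size=100)
    print(len(bins) ** 2, "dense cells vs", len(pixels), "stored pixels")
    print(pixels)
```

In this toy case the 15 bins would need 225 cells in dense form, while the sparse table holds 3 records. Halving the bin size roughly quadruples the dense cell count, which is the sharp growth in cost with resolution that the abstract describes.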

Deoxyribonucleic Acid as a Tool for Digital Information Storage: An Overview

THE INDIAN JOURNAL OF VETERINARY SCIENCES AND BIOTECHNOLOGY ◽ 10.21887/ijvsbt.15.1.1 ◽ 2019 ◽ Vol 15 (01) ◽ pp. 1-8

Author(s): Ashish C Patel ◽ C G Joshi

Keyword(s): Data Storage ◽ DNA Sequences ◽ Consensus Sequence ◽ Random Access ◽ Information Storage ◽ Digital Data ◽ Digital Information ◽ Multiple Sequence ◽ Digital World ◽ Digital File

Current data storage technologies can no longer keep pace with the exponentially growing amounts of data generated through the extensive use of social networking, photos, media, etc. The "digital world", which held 4.4 zettabytes in 2013, was predicted to reach 44 zettabytes by 2020. For the past 30 years, scientists and researchers have been trying to develop a robust way of storing data on a medium that is dense and long-lasting, and have found DNA to be the most promising storage medium. Unlike existing storage devices, DNA requires no maintenance other than storage in a cool, dark place. DNA is small and dense; just 1 gram of dry DNA can store about 455 exabytes of data. DNA stores information using four bases, viz., A, T, G, and C, while CDs, hard disks and other devices store information as 0's and 1's on spiral tracks. In a DNA-based storage system, the digital file is first converted into binary code, after which encoding and decoding are the important steps. Once the digital file is encoded, the next step is to synthesize arbitrary single-stranded DNA sequences, which can be stored in a deep freeze until use. When the information needs to be recovered, this can be done using DNA sequencing. Next-generation sequencing (NGS) is capable of producing sequences with very high throughput at a much lower cost (less than 0.1 USD per MB of data) than the first sequencing technologies. Post-sequencing processing includes alignment of all reads using multiple sequence alignment (MSA) algorithms to obtain consensus sequences. The consensus sequence is then decoded by reversing the encoding process. Most prior DNA data storage efforts sequenced and decoded the entire amount of stored digital information with no random access, but it has now become possible to extract selected files (e.g., retrieving only the required image from a collection) from a DNA pool using PCR-based random access.
Various scientists have successfully stored up to 110 zettabytes of data in one gram of DNA. In the future, with efficient encoding, error correction, and cheaper DNA synthesis and sequencing, DNA-based storage will become a practical solution for storing exponentially growing digital data.
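The pipeline described above (binarize, encode to bases, sequence, build a consensus, decode) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the scheme of any particular study: the 2-bits-per-base mapping (A=00, C=01, G=10, T=11) and the function names `encode`, `decode` and `consensus` are hypothetical choices, and the reads are assumed to be already aligned and of equal length. Practical schemes add error correction and constraints on the base sequence that are omitted here.

```python
from collections import Counter

# Assumed mapping for illustration: A=00, C=01, G=10, T=11.
BASES = "ACGT"


def encode(data: bytes) -> str:
    """Encode bytes as a DNA string, two bits per base, most significant bits first."""
    return "".join(
        BASES[(byte >> shift) & 0b11] for byte in data for shift in (6, 4, 2, 0)
    )


def decode(strand: str) -> bytes:
    """Reverse of encode: every four bases become one byte."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for k in range(0, len(strand), 4):
        byte = 0
        for base in strand[k:k + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


def consensus(reads):
    """Majority vote per position over aligned reads of equal length."""
    if len({len(read) for read in reads}) != 1:
        raise ValueError("reads must be non-empty and of equal length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*reads))


if __name__ == "__main__":
    strand = encode(b"Hi")
    # Simulated sequencing reads: the third read carries one substitution error.
    reads = [strand, strand, strand[:2] + "T" + strand[3:]]
    print(strand, decode(consensus(reads)))
```

The majority vote stands in for the multiple-sequence-alignment step: with enough reads, an error in a single read is outvoted at its position and the original bytes are recovered.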

Analysis of 6T SRAM Cell in Different Technologies

Circulation in Computer Science ◽ 10.22632/ccs-2017-mcsp026 ◽ 2017 ◽ Vol MCSP2017 (01) ◽ pp. 7-10 ◽ Cited By ~ 2

Author(s): Subhashree Rath ◽ Siba Kumar Panda

Keyword(s): Low Power ◽ Data Storage ◽ Low Voltage ◽ Random Access ◽ Storage Device ◽ CMOS Process ◽ Process Technology ◽ Digital Devices ◽ Noise Margin ◽ SRAM Cell

Static random access memory (SRAM) is an important component of the embedded cache memory of handheld digital devices. SRAM has become a major data storage device owing to its high storage density and short access time. The exponential growth of low-power digital devices has raised the demand for low-voltage, low-power SRAM. This paper presents the design and implementation of a 6T SRAM cell in 180 nm, 90 nm and 45 nm standard CMOS process technologies. The simulation was carried out in the Cadence Virtuoso environment. The performance of the SRAM cell was evaluated in terms of delay, power and static noise margin (SNM).

Nanostructure Analysis of Sol-gel PZT Thin Films Derived from Different Chemical Routes

Microscopy and Microanalysis ◽ 10.1017/s1431927609990729 ◽ 2009 ◽ Vol 15 (S3) ◽ pp. 53-54

Author(s): Aiying Wu ◽ P. M. Vilarinho

Keyword(s): Thin Films ◽ Data Storage ◽ Chemical Reactivity ◽ Sol Gel ◽ Random Access ◽ Lead Zirconate ◽ Piezoelectric Response ◽ Metal Alkoxides ◽ PZT Thin Films ◽ Wide Range

Abstract: Lead zirconate - lead titanate (PZT) materials are commercially important piezoelectrics and ferroelectrics in a wide range of applications, such as data storage (dynamic access and ferroelectric random access memories) and sensing and actuating devices. PZT with the morphotropic phase boundary composition offers the highest piezoelectric response, and at present there are no fully developed alternative materials to PZT. The importance of PZT, together with the continuous requirements of device miniaturization, calls for the development of high-quality PZT thin films with optimized properties. At the same time, because the final properties of thin films depend on the details of the microstructure, a thorough analysis of their microstructure at the local scale is necessary. The sol-gel method is one of the Chemical Solution Deposition techniques used to prepare oxide thin films, such as PZT. Starting from a solution, a solid network is progressively formed via inorganic polymerisation reactions. Most metal alkoxides used for sol-gel synthesis are highly reactive towards hydrolysis and condensation. Therefore, their chemical reactivity has to be tailored via the chemical modification (or complexation) of metal alkoxides to avoid uncontrolled reactions and precipitation. For PZT sol-gel thin film preparation, two chemical routes are frequently used, depending on the nature of the molecular precursor, namely the methoxyethanol (MOE) route and the diol route.

A Model for Mechanism Data Storage

18th Design Automation Conference: Volume 2 — Geometric Modeling, Mechanisms, and Mechanical Systems Analysis ◽ 10.1115/detc1992-0150 ◽ 1992

Author(s): Wan Wang

Keyword(s): Data Storage ◽ Data Model ◽ Computer Time ◽ Kinematic Structure ◽ Topological Graph ◽ Complete Set ◽ Isomorphism Identification

Abstract: A data model for the kinematic structure of mechanisms and its coding principle are proposed, based on the topological graph and contract graph. In the model, every basic chain is mapped to a code of 5 decimal digits and a mechanism is mapped to the set of codes of its basic chains. The model occupies minimal memory, contains a complete set of useful primary parameters of the structure, and significantly reduces computer time for isomorphism identification.

3D DNA structural barcode copying and random access

10.1101/2020.11.27.401596 ◽ 2020

Author(s): Filip Bošković ◽ Alexander Ohmann ◽ Ulrich F. Keyser ◽ Kaikai Chen

Keyword(s): Data Storage ◽ Single Molecule ◽ Self Assembly ◽ Large Scale ◽ Three Dimensional ◽ Random Access ◽ Scale Production ◽ Digital Information ◽ DNA Nanostructures ◽ Large Scale Production

Abstract: Three-dimensional (3D) DNA nanostructures built via DNA self-assembly have established recent applications in multiplexed biosensing and storing digital information. However, a key challenge is that 3D DNA structures are not easily copied, which is of vital importance for their large-scale production and for access to desired molecules by target-specific amplification. Here, we build 3D DNA structural barcodes and demonstrate the copying and random access of the barcodes from a library of molecules using a modified polymerase chain reaction (PCR). The 3D barcodes were assembled by annealing a single-stranded DNA scaffold with complementary short oligonucleotides containing 3D protrusions at defined locations. DNA nicks in these structures are ligated to facilitate barcode copying using PCR. To randomly access a target from a library of barcodes, we employ a non-complementary end in the DNA construct that serves as a barcode-specific primer template. Readout of the 3D DNA structural barcodes was performed with nanopore measurements. Our study provides a roadmap for convenient production of large quantities of self-assembled 3D DNA nanostructures. In addition, this strategy offers access to specific targets, a crucial capability for multiplexed single-molecule sensing and for DNA data storage.

Storing Hypergraph-Based Data Models in Non-hypergraph Data Storage and Applications for Information Systems

Vietnam Journal of Computer Science ◽ 10.1142/s2196888821500160 ◽ 2021 ◽ pp. 1-21

Author(s): Bálint Molnár ◽ András Béleczki ◽ Bence Sarkadi-Nagy

Keyword(s): Information Systems ◽ Data Storage ◽ Data Model ◽ Relational Data ◽ Complex Data ◽ Graph Representations ◽ Relational Data Model ◽ Rich Information ◽ The Relationship

Data structures, and especially the relationships among data entities, have changed in the last couple of years. Network-like graph representations of data models are becoming more and more common, since they are better suited to depicting these relationships than the well-established relational data model. Graphs can describe large and complex networks, such as social networks, and are also capable of storing rich information about complex data, which was previously mostly a trait of the relational data model. This can also be achieved with the knowledge representation tool called "hypergraphs". To utilize the possibilities of this model, we need a practical way to store and process hypergraphs. In this paper, we propose a way to store hypergraph models in the SAP HANA in-memory database system, which has a "Graph Core" engine besides the relational data model. Graph Core provides many graph algorithms by default; however, it is not capable of storing or working with hypergraphs, nor are any of these algorithms specifically tailored for hypergraphs. Hence, in this paper, besides the case study of the two information systems, we also propose pseudo-code level algorithms that accommodate hypergraph semantics for processing our IS model.
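One common way to hold a hypergraph in non-hypergraph storage is an incidence table that links each hyperedge to all of its member vertices. The sketch below illustrates that general idea with Python's built-in `sqlite3` module as a stand-in relational store. It is not the paper's method and does not use SAP HANA or its Graph Core engine; the table names (`vertex`, `hyperedge`, `incidence`), the helper functions and the sample data are all hypothetical.

```python
import sqlite3


def create_store(conn):
    """Create three tables: vertices, hyperedges, and their incidence relation."""
    conn.executescript(
        """
        CREATE TABLE vertex (id TEXT PRIMARY KEY);
        CREATE TABLE hyperedge (id TEXT PRIMARY KEY, label TEXT);
        CREATE TABLE incidence (
            edge_id   TEXT NOT NULL REFERENCES hyperedge(id),
            vertex_id TEXT NOT NULL REFERENCES vertex(id),
            PRIMARY KEY (edge_id, vertex_id)
        );
        """
    )


def add_hyperedge(conn, edge_id, label, vertices):
    """Insert one hyperedge connecting any number of vertices."""
    conn.execute("INSERT INTO hyperedge VALUES (?, ?)", (edge_id, label))
    for vertex in vertices:
        conn.execute("INSERT OR IGNORE INTO vertex VALUES (?)", (vertex,))
        conn.execute("INSERT OR IGNORE INTO incidence VALUES (?, ?)", (edge_id, vertex))


def vertices_of(conn, edge_id):
    """Return the vertices joined by a hyperedge, sorted by id."""
    rows = conn.execute(
        "SELECT vertex_id FROM incidence WHERE edge_id = ? ORDER BY vertex_id",
        (edge_id,),
    )
    return [row[0] for row in rows]


def neighbours(conn, vertex_id):
    """Return every other vertex that shares at least one hyperedge with vertex_id."""
    rows = conn.execute(
        """
        SELECT DISTINCT b.vertex_id
        FROM incidence AS a
        JOIN incidence AS b ON a.edge_id = b.edge_id
        WHERE a.vertex_id = ? AND b.vertex_id != ?
        ORDER BY b.vertex_id
        """,
        (vertex_id, vertex_id),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_store(conn)
    # Hypothetical information-system entities, for illustration only.
    add_hyperedge(conn, "e1", "order", ["customer", "invoice", "product"])
    add_hyperedge(conn, "e2", "shipment", ["product", "warehouse"])
    print(vertices_of(conn, "e1"))
    print(neighbours(conn, "product"))
```

The incidence table is what lets a single edge span more than two vertices, which an ordinary edge table with source and target columns cannot express directly. Traversals become self-joins on `incidence`.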

Erratum: Random access in large-scale DNA data storage

Nature Biotechnology ◽ 10.1038/nbt0718-660c ◽ 2018 ◽ Vol 36 (7) ◽ pp. 660-660 ◽ Cited By ~ 1

Author(s): Lee Organick ◽ Siena Dumas Ang ◽ Yuan-Jyue Chen ◽ Randolph Lopez ◽ Sergey Yekhanin ◽ ...

Keyword(s): Data Storage ◽ Large Scale ◽ Random Access

THE FIRST PRINCIPLE STUDY OF COMPARISON OF DIVALENT AND TRIVALENT IMPURITY IN RRAM DEVICES USING GGA+U

Surface Review and Letters ◽ 10.1142/s0218625x21500396 ◽ 2021 ◽ pp. 2150039

Author(s): EJAZ AHMAD KHERA ◽ HAFEEZ ULLAH ◽ MUHAMMAD IMRAN ◽ HASSAN ALGADI ◽ FAYYAZ HUSSAIN ◽ ...

Keyword(s): Data Storage ◽ Random Access ◽ Resistive Random Access Memory ◽ Full Potential ◽ Generalized Gradient ◽ Augmented Plane Wave ◽ Generalized Gradient Approximation ◽ Metal Insulator Metal ◽ Structure Density ◽ Linearized Augmented Plane Wave

Resistive switching (RS) performance has attracted considerable attention owing to its promising potential for data storage. Oxide-based devices with a metal-insulator-metal (MIM) structure are more valuable for RS applications. In this study, we have examined the effect of divalent (nickel) as well as trivalent (aluminum) dopants, without and with an oxygen vacancy (Vo), in hafnia (HfO2)-based resistive random-access memory (RRAM) devices. All calculations are carried out within the full potential linearized augmented plane-wave (FP-LAPW) method based on the WIEN2k code, using the generalized gradient approximation (GGA) and the generalized gradient approximation with Hubbard U parameters (GGA+U). The studies of the band structure, density of states and charge density reveal that HfNiO2+Vo is the more appropriate doped system for enhancing the conductivity of RRAM devices.

New developments on EDR (Event Data Recorder) for automated vehicles

Open Engineering ◽ 10.1515/eng-2020-0007 ◽ 2020 ◽ Vol 10 (1) ◽ pp. 140-146

Author(s): Klaus Böhm ◽ Tibor Kubjatko ◽ Daniel Paula ◽ Hans-Georg Schweiger

Keyword(s): Data Base ◽ Data Storage ◽ Data Model ◽ Working Group ◽ Storage System ◽ Data Access ◽ Event Data ◽ Automated Vehicles ◽ Automated Driving ◽ Quantum Leap

Abstract: With new legislative rules on Event Data Recorders coming into force in the EU from 2022, the question is whether the data base under discussion is sufficient for clarifying accidents involving automated vehicles. Based on the reconstruction of real accidents involving vehicles with ADAS, combined with specially designed crash tests, a broader data base than that of the US EDR regulation (NHTSA 49 CFR Part 563.7) is proposed. The working group AHEAD, to which the authors contribute, has already elaborated a data model that fits the needs of automated driving. The structure of this data model is shown. Moreover, the special benefits of storing internal video or photo feeds from the vehicle camera systems combined with object data are illustrated. When a sophisticated 3D measurement method is used at the accident scene, the videos or photos can also serve as a control instance for the stored vehicle data. The AHEAD Data Model, enhanced with the storage of the video and photo feeds, should be considered in the planned roadmap of the Informal Working Group (IWG) on EDR/DSSAD (Data Storage System for Automated Driving) reporting to UNECE WP29. In addition, over-the-air data access using a technology already applied in China for electric vehicles, called Real Time Monitoring, would allow a quantum leap in forensic accident reconstruction.
