Cooler: scalable storage for Hi-C data and other genomically labeled arrays

Abstract. Motivation: Most existing coverage-based (epi)genomic datasets are one-dimensional, but newer technologies probing interactions (physical, genetic, etc.) produce quantitative maps with two-dimensional genomic coordinate systems. Storage and computational costs mount sharply with data resolution when such maps are stored in dense form. Hence, there is a pressing need to develop data storage strategies that handle the full range of useful resolutions in multidimensional genomic datasets by taking advantage of their sparse nature, while supporting efficient compression and providing fast random access to facilitate development of scalable algorithms for data analysis. Results: We developed a file format called cooler, based on a sparse data model, that can support genomically labeled matrices at any resolution. It has the flexibility to accommodate various descriptions of the data axes (genomic coordinates, tracks and bin annotations), resolutions, data density patterns and metadata. Cooler is based on HDF5 and is supported by a Python library and command-line suite to create, read, inspect and manipulate cooler data collections. The format has been adopted as a standard by the NIH 4D Nucleome Consortium. Availability and implementation: Cooler is cross-platform, BSD-licensed and can be installed from the Python Package Index or the bioconda repository. The source code is maintained on GitHub at https://github.com/mirnylab/cooler. Supplementary information: Supplementary data are available at Bioinformatics online.


Cooler: scalable storage for Hi-C data and other genomically-labeled arrays

10.1101/557660 ◽

2019 ◽

Cited By ~ 10

Author(s):

Nezar Abdennur ◽

Leonid Mirny

Keyword(s):

Data Storage ◽

Data Model ◽

Full Range ◽

Random Access ◽

Scalable Algorithms ◽

Data Resolution ◽

Data Density ◽

Cross Platform ◽

Data Collections ◽

Python Package

Most existing coverage-based (epi)genomic datasets are one-dimensional, but newer technologies probing interactions (physical, genetic, etc.) produce quantitative maps with two-dimensional genomic coordinate systems. Storage and computational costs mount sharply with data resolution when such maps are stored in dense form. Hence, there is a pressing need to develop data storage strategies that handle the full range of useful resolutions in multidimensional genomic datasets by taking advantage of their sparse nature, while supporting efficient compression and providing fast random access to facilitate development of scalable algorithms for data analysis. We developed a file format called cooler, based on a sparse data model, that can support genomically-labeled matrices at any resolution. It has the flexibility to accommodate various descriptions of the data axes (genomic coordinates, tracks and bin annotations), resolutions, data density patterns, and metadata. Cooler is based on HDF5 and is supported by a Python library and command-line suite to create, read, inspect and manipulate cooler data collections. The format has been adopted as a standard by the NIH 4D Nucleome Consortium. Cooler is cross-platform, BSD-licensed, and can be installed from the Python Package Index or the bioconda repository. The source code is maintained on GitHub at https://github.com/mirnylab/cooler.
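
The idea of a sparse data model with fast random access can be illustrated with a small, generic sketch. This is not cooler's on-disk schema or its Python API; the function names `dense_to_pixels` and `query_rows` are hypothetical, and the example simply assumes a symmetric contact matrix whose nonzero upper-triangle entries are kept as sorted (row, column, value) triples.

```python
from bisect import bisect_left


def dense_to_pixels(matrix):
    """Keep only nonzero upper-triangle entries of a symmetric matrix.

    Returns a list of (row, col, value) triples sorted by row, then column.
    """
    pixels = []
    for i, row in enumerate(matrix):
        for j in range(i, len(row)):
            if row[j] != 0:
                pixels.append((i, j, row[j]))
    return pixels


def query_rows(pixels, lo, hi):
    """Return the triples whose row index lies in [lo, hi).

    Because the list is sorted by row, two binary searches locate the
    slice without scanning the whole table.
    """
    start = bisect_left(pixels, (lo,))
    stop = bisect_left(pixels, (hi,))
    return pixels[start:stop]


# Toy symmetric matrix: 16 dense cells, only 4 stored triples.
matrix = [
    [4, 1, 0, 0],
    [1, 0, 0, 2],
    [0, 0, 3, 0],
    [0, 2, 0, 0],
]
pixels = dense_to_pixels(matrix)
subset = query_rows(pixels, 1, 3)
```

Here the dense form grows with the square of the number of bins, while the triple list grows only with the number of nonzero entries, which is the cost argument the abstract makes for high-resolution maps.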


Deoxyribonucleic Acid as a Tool for Digital Information Storage: An Overview

THE INDIAN JOURNAL OF VETERINARY SCIENCES AND BIOTECHNOLOGY ◽

10.21887/ijvsbt.15.1.1 ◽

2019 ◽

Vol 15 (01) ◽

pp. 1-8

Author(s):

Ashish C Patel ◽

C G Joshi

Keyword(s):

Data Storage ◽

Dna Sequences ◽

Consensus Sequence ◽

Random Access ◽

Information Storage ◽

Digital Data ◽

Digital Information ◽

Multiple Sequence ◽

Digital World ◽

Digital File

Current data storage technologies can no longer keep pace with the exponentially growing amounts of data generated through the extensive use of social networking, photos, media, etc. The "digital world", which held 4.4 zettabytes in 2013, was predicted to reach 44 zettabytes by 2020. For the past 30 years, scientists and researchers have been trying to develop a robust way of storing data on a medium that is dense and long-lasting, and have found DNA to be the most promising storage medium. Unlike existing storage devices, DNA requires no maintenance beyond storage in a cool, dark place. DNA is small and dense; just 1 gram of dry DNA can store about 455 exabytes of data. DNA stores information using four bases, viz., A, T, G, and C, whereas CDs, hard disks and other devices store information as 0s and 1s on spiral tracks. In DNA-based storage, the digital file is first converted into binary code, after which encoding and decoding are the key steps. Once the digital file is encoded, the next step is to synthesize arbitrary single-stranded DNA sequences, which can be stored in a deep freeze until use. When the information needs to be recovered, this can be done by DNA sequencing. Next-generation sequencing (NGS) can produce sequences at very high throughput and at a much lower cost (less than 0.1 USD per MB of data) than the first sequencing technologies. Post-sequencing processing includes aligning all reads using multiple sequence alignment (MSA) algorithms to obtain consensus sequences. The consensus sequence is then decoded by reversing the encoding process. Most prior DNA data storage efforts sequenced and decoded the entire stored digital information with no random access, but it has now become possible to extract selected files (e.g., retrieving only a required image from a collection) from a DNA pool using PCR-based random access.
Various scientists have successfully stored up to 110 zettabytes of data in one gram of DNA. In the future, with efficient encoding, error correction, and cheaper DNA synthesis and sequencing, DNA-based storage will become a practical solution for storing exponentially growing digital data.
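
The encode, sequence, consensus and decode steps described above can be sketched in a few lines. The abstract does not specify an encoding scheme, so the two-bits-per-base mapping (00→A, 01→C, 10→G, 11→T) is an assumption chosen for simplicity; practical schemes typically add error correction and avoid problematic sequences. The column-wise majority vote stands in for the consensus step and assumes the reads are already aligned and of equal length.

```python
from collections import Counter

# Assumed mapping: index in this string is the 2-bit value of the base.
BASES = "ACGT"


def encode(data: bytes) -> str:
    """Turn each byte into four bases, most significant bit pair first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)


def decode(seq: str) -> bytes:
    """Reverse of encode: every four bases become one byte."""
    if len(seq) % 4 != 0:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


def consensus(reads):
    """Majority base at each position of equal-length, pre-aligned reads."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*reads))


strand = encode(b"Hi")
# Three simulated reads, two of them carrying a single substitution error.
reads = [strand, "CAGTCGGC", "CATACGGC"]
recovered = decode(consensus(reads))
```

Because each read has its error at a different position, the majority vote restores the original strand before decoding.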


Analysis of 6T SRAM Cell in Different Technologies

Circulation in Computer Science ◽

10.22632/ccs-2017-mcsp026 ◽

2017 ◽

Vol MCSP2017 (01) ◽

pp. 7-10 ◽

Cited By ~ 2

Author(s):

Subhashree Rath ◽

Siba Kumar Panda

Keyword(s):

Low Power ◽

Data Storage ◽

Low Voltage ◽

Random Access ◽

Storage Device ◽

Cmos Process ◽

Process Technology ◽

Digital Devices ◽

Noise Margin ◽

Sram Cell

Static random access memory (SRAM) is an important component of the embedded cache memory of handheld digital devices. SRAM has become a major data storage device owing to its high storage density and short access time. The exponential growth of low-power digital devices has raised the demand for low-voltage, low-power SRAM. This paper presents the design and implementation of a 6T SRAM cell in 180 nm, 90 nm and 45 nm standard CMOS process technologies. The simulation was done in the Cadence Virtuoso environment. The performance of the SRAM cell is evaluated in terms of delay, power and static noise margin (SNM).


Nanostructure Analysis of Sol-gel PZT Thin Films Derived from Different Chemical Routes

Microscopy and Microanalysis ◽

10.1017/s1431927609990729 ◽

2009 ◽

Vol 15 (S3) ◽

pp. 53-54

Author(s):

Aiying Wu ◽

P. M. Vilarinho

Keyword(s):

Thin Films ◽

Data Storage ◽

Chemical Reactivity ◽

Sol Gel ◽

Random Access ◽

Lead Zirconate ◽

Piezoelectric Response ◽

Metal Alkoxides ◽

Pzt Thin Films ◽

Wide Range

Abstract. Lead zirconate - lead titanate (PZT) materials are commercially important piezoelectrics and ferroelectrics in a wide range of applications, such as data storage (dynamic access and ferroelectric random access memories) and sensing and actuating devices. PZT with the morphotropic phase boundary composition offers the highest piezoelectric response, and at present there are no fully developed alternative materials to PZT. The importance of PZT, together with the continuous requirements of device miniaturization, demands the development of high-quality PZT thin films with optimized properties. At the same time, because the final properties of thin films depend on the details of the microstructure, a thorough analysis of their microstructure at the local scale is necessary. The sol-gel method is one of the Chemical Solution Deposition techniques used to prepare oxide thin films, such as PZT. Starting from a solution, a solid network is progressively formed via inorganic polymerisation reactions. Most metal alkoxides used for sol-gel synthesis are highly reactive towards hydrolysis and condensation. Therefore, their chemical reactivity has to be tailored via chemical modification (or complexation) of the metal alkoxides to avoid uncontrolled reactions and precipitation. For PZT sol-gel thin film preparation, two chemical routes are frequently used depending on the nature of the molecular precursor, namely the methoxyethanol (MOE) route and the diol route.


MapOptics: a light-weight, cross-platform visualization tool for optical mapping alignment

Bioinformatics ◽

10.1093/bioinformatics/bty1013 ◽

2018 ◽

Vol 35 (15) ◽

pp. 2671-2673 ◽

Cited By ~ 1

Author(s):

Josephine Burgin ◽

Corentin Molitor ◽

Fady Mohareb

Keyword(s):

Large Scale ◽

Optical Mapping ◽

Java Virtual Machine ◽

Supplementary Information ◽

Desktop Computer ◽

Draft Assembly ◽

Simple Alternative ◽

Mapping Analysis ◽

Cross Platform ◽

Genomic Map

Abstract. Summary: Bionano optical mapping is a technology that can assist in the final stages of genome assembly by lengthening and ordering scaffolds in a draft assembly by aligning the assembly to a genomic map. However, currently, tools for visualization are limited to use on a Windows operating system or are developed initially for visualizing large-scale structural variation. MapOptics is a lightweight cross-platform tool that enables the user to visualize and interact with the alignment of Bionano optical mapping data and can be used for in-depth exploration of hybrid scaffolding alignments. It provides a fast, simple alternative to the large optical mapping analysis programs currently available for this area of research. Availability and implementation: MapOptics is implemented in Java 1.8 and released under an MIT licence. MapOptics can be downloaded from https://github.com/FadyMohareb/mapoptics and run on any standard desktop computer equipped with a Java Virtual Machine (JVM). Supplementary information: Supplementary data are available at Bioinformatics online.


3D DNA structural barcode copying and random access

10.1101/2020.11.27.401596 ◽

2020 ◽

Author(s):

Filip Bošković ◽

Alexander Ohmann ◽

Ulrich F. Keyser ◽

Kaikai Chen

Keyword(s):

Data Storage ◽

Single Molecule ◽

Self Assembly ◽

Large Scale ◽

Three Dimensional ◽

Random Access ◽

Scale Production ◽

Digital Information ◽

Dna Nanostructures ◽

Large Scale Production

Abstract. Three-dimensional (3D) DNA nanostructures built via DNA self-assembly have established recent applications in multiplexed biosensing and storing digital information. However, a key challenge is that 3D DNA structures are not easily copied, which is of vital importance for their large-scale production and for access to desired molecules by target-specific amplification. Here, we build 3D DNA structural barcodes and demonstrate the copying and random access of the barcodes from a library of molecules using a modified polymerase chain reaction (PCR). The 3D barcodes were assembled by annealing a single-stranded DNA scaffold with complementary short oligonucleotides containing 3D protrusions at defined locations. DNA nicks in these structures are ligated to facilitate barcode copying using PCR. To randomly access a target from a library of barcodes, we employ a non-complementary end in the DNA construct that serves as a barcode-specific primer template. Readout of the 3D DNA structural barcodes was performed with nanopore measurements. Our study provides a roadmap for convenient production of large quantities of self-assembled 3D DNA nanostructures. In addition, this strategy offers access to specific targets, a crucial capability for multiplexed single-molecule sensing and for DNA data storage.


Erratum: Random access in large-scale DNA data storage

Nature Biotechnology ◽

10.1038/nbt0718-660c ◽

2018 ◽

Vol 36 (7) ◽

pp. 660-660 ◽

Cited By ~ 1

Author(s):

Lee Organick ◽

Siena Dumas Ang ◽

Yuan-Jyue Chen ◽

Randolph Lopez ◽

Sergey Yekhanin ◽

...

Keyword(s):

Data Storage ◽

Large Scale ◽

Random Access


THE FIRST PRINCIPLE STUDY OF COMPARISON OF DIVALENT AND TRIVALENT IMPURITY IN RRAM DEVICES USING GGA+U

Surface Review and Letters ◽

10.1142/s0218625x21500396 ◽

2021 ◽

pp. 2150039

Author(s):

EJAZ AHMAD KHERA ◽

HAFEEZ ULLAH ◽

MUHAMMAD IMRAN ◽

HASSAN ALGADI ◽

FAYYAZ HUSSAIN ◽

...

Keyword(s):

Data Storage ◽

Random Access ◽

Resistive Random Access Memory ◽

Full Potential ◽

Generalized Gradient ◽

Augmented Plane Wave ◽

Generalized Gradient Approximation ◽

Metal Insulator Metal ◽

Structure Density ◽

Linearized Augmented Plane Wave

Resistive switching (RS) performance has attracted considerable attention owing to its promising potential for data storage. Oxide-based devices with a metal-insulator-metal (MIM) structure are particularly valuable for RS applications. In this study, we have studied the effect of a divalent (nickel) and a trivalent (aluminum) dopant, without and with an oxygen vacancy (Vo), in hafnia (HfO2)-based resistive random-access memory (RRAM) devices. All calculations are carried out within the full-potential linearized augmented plane-wave (FP-LAPW) method based on the WIEN2k code, using the generalized gradient approximation (GGA) and the generalized gradient approximation with Hubbard U parameters (GGA+U) approach. The band structure, density of states and charge density reveal that HfNiO2+Vo is the more appropriate doped system for enhancing the conductivity of RRAM devices.


Genetic association testing using the GENESIS R/Bioconductor package

Bioinformatics ◽

10.1093/bioinformatics/btz567 ◽

2019 ◽

Cited By ~ 20

Author(s):

Stephanie M Gogarten ◽

Tamar Sofer ◽

Han Chen ◽

Chaoyu Yu ◽

Jennifer A Brody ◽

...

Keyword(s):

Data Storage ◽

Genomic Analysis ◽

Supplementary Information ◽

Storage And Retrieval ◽

Association Testing ◽

Link Functions ◽

Efficient Storage ◽

Genetic Association Testing ◽

Analysis Workflow ◽

Complete Genomic

Abstract. Summary: The Genomic Data Storage (GDS) format provides efficient storage and retrieval of genotypes measured by microarrays and sequencing. We developed GENESIS to perform various single- and aggregate-variant association tests using genotype data stored in GDS format. GENESIS implements highly flexible mixed models, allowing for different link functions, multiple variance components and phenotypic heteroskedasticity. GENESIS integrates cohesively with other R/Bioconductor packages to build a complete genomic analysis workflow entirely within the R environment. Availability and implementation: https://bioconductor.org/packages/GENESIS; vignettes included. Supplementary information: Supplementary data are available at Bioinformatics online.


aCLImatise: automated generation of tool definitions for bioinformatics workflows

Bioinformatics ◽

10.1093/bioinformatics/btaa1033 ◽

2020 ◽

Author(s):

Michael Milton ◽

Natalie Thorne

Keyword(s):

Source Code ◽

Supplementary Information ◽

Command Line ◽

Supplementary Data ◽

Automated Generation ◽

Base Camp ◽

Python Package ◽

Bioinformatics Workflow ◽

Bioinformatics Workflows

Abstract. Summary: aCLImatise is a utility for automatically generating tool definitions compatible with bioinformatics workflow languages, by parsing command-line help output. aCLImatise also has an associated database called the aCLImatise Base Camp, which provides thousands of pre-computed tool definitions. Availability and implementation: The latest aCLImatise source code is available within a GitHub organisation, under the GPL-3.0 license: https://github.com/aCLImatise. In particular, documentation for the aCLImatise Python package is available at https://aclimatise.github.io/CliHelpParser/, and the aCLImatise Base Camp is available at https://aclimatise.github.io/BaseCamp/. Supplementary information: Supplementary data are available at Bioinformatics online.
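
To make the notion of parsing command-line help output concrete, here is a deliberately small sketch. It is not aCLImatise's parser and does not use its API; the regular expression, the `parse_help` function and the sample help text are all hypothetical, and they assume an argparse-like layout in which each option line is indented and separated from its description by at least two spaces.

```python
import re

# Assumed layout: "  -o, --output FILE   description" (short flag and
# uppercase metavar are both optional).
FLAG_LINE = re.compile(
    r"^\s+(?:(-\w),\s+)?(--[\w-]+)(?:[ =]([A-Z_]+))?\s{2,}(.*)$"
)


def parse_help(text):
    """Extract option records from help text that follows the assumed layout."""
    options = []
    for line in text.splitlines():
        match = FLAG_LINE.match(line)
        if match:
            short, long_name, metavar, description = match.groups()
            options.append({
                "short": short,
                "long": long_name,
                "takes_value": metavar is not None,
                "help": description.strip(),
            })
    return options


# Hypothetical help output for an imaginary tool.
HELP = """usage: tool [options]

options:
  -o, --output FILE   write results to FILE
  -v, --verbose       print progress messages
  --threads N         number of worker threads
"""

opts = parse_help(HELP)
```

Records of this kind are the sort of intermediate structure from which a workflow tool definition could be generated; real help output is far less regular than this sample.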
