Cooler: scalable storage for Hi-C data and other genomically labeled arrays

Author(s):  
Nezar Abdennur ◽  
Leonid A Mirny

Abstract. Motivation: Most existing coverage-based (epi)genomic datasets are one-dimensional, but newer technologies probing interactions (physical, genetic, etc.) produce quantitative maps with two-dimensional genomic coordinate systems. Storage and computational costs mount sharply with data resolution when such maps are stored in dense form. Hence, there is a pressing need to develop data storage strategies that handle the full range of useful resolutions in multidimensional genomic datasets by taking advantage of their sparse nature, while supporting efficient compression and providing fast random access to facilitate development of scalable algorithms for data analysis. Results: We developed a file format called cooler, based on a sparse data model, that can support genomically labeled matrices at any resolution. It has the flexibility to accommodate various descriptions of the data axes (genomic coordinates, tracks and bin annotations), resolutions, data density patterns and metadata. Cooler is based on HDF5 and is supported by a Python library and command line suite to create, read, inspect and manipulate cooler data collections. The format has been adopted as a standard by the NIH 4D Nucleome Consortium. Availability and implementation: Cooler is cross-platform, BSD-licensed and can be installed from the Python Package Index or the bioconda repository. The source code is maintained on GitHub at https://github.com/mirnylab/cooler. Supplementary information: Supplementary data are available at Bioinformatics online.
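The sparse data model described in this abstract can be illustrated with a small sketch. This is not cooler's API or its on-disk HDF5 layout; the names `dense_to_pixels`, `row_offsets` and `query_row` are hypothetical and chosen only for illustration. The sketch assumes a symmetric contact matrix, keeps only the nonzero upper-triangle entries as (bin1, bin2, count) records, and builds a per-row offset index so that one row can be fetched without scanning the whole table.

```python
from typing import List, Tuple

# One sparse record: (first bin index, second bin index, contact count).
Pixel = Tuple[int, int, int]


def dense_to_pixels(matrix: List[List[int]]) -> List[Pixel]:
    """Keep only the nonzero upper-triangle entries of a symmetric matrix."""
    pixels: List[Pixel] = []
    for i, row in enumerate(matrix):
        for j in range(i, len(row)):
            if row[j] != 0:
                pixels.append((i, j, row[j]))
    return pixels


def row_offsets(pixels: List[Pixel], n_bins: int) -> List[int]:
    """Build an index so pixels[offsets[i]:offsets[i + 1]] holds row i."""
    offsets = [0] * (n_bins + 1)
    for bin1, _, _ in pixels:
        offsets[bin1 + 1] += 1          # count the records in each row
    for i in range(n_bins):
        offsets[i + 1] += offsets[i]    # cumulative sum gives slice bounds
    return offsets


def query_row(pixels: List[Pixel], offsets: List[int], i: int) -> List[Pixel]:
    """Random access to a single row through the offset index."""
    return pixels[offsets[i]:offsets[i + 1]]


# Toy 4-bin symmetric matrix (hypothetical data).
dense = [
    [5, 2, 0, 0],
    [2, 7, 1, 0],
    [0, 1, 4, 0],
    [0, 0, 0, 3],
]
pixels = dense_to_pixels(dense)
offsets = row_offsets(pixels, len(dense))
print(query_row(pixels, offsets, 1))  # [(1, 1, 7), (1, 2, 1)]
```

Here 16 dense cells shrink to 6 stored records, and the saving grows with resolution because most bin pairs in a high-resolution map are empty.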

2019 ◽  
Author(s):  
Nezar Abdennur ◽  
Leonid Mirny

Most existing coverage-based (epi)genomic datasets are one-dimensional, but newer technologies probing interactions (physical, genetic, etc.) produce quantitative maps with two-dimensional genomic coordinate systems. Storage and computational costs mount sharply with data resolution when such maps are stored in dense form. Hence, there is a pressing need to develop data storage strategies that handle the full range of useful resolutions in multidimensional genomic datasets by taking advantage of their sparse nature, while supporting efficient compression and providing fast random access to facilitate development of scalable algorithms for data analysis. We developed a file format called cooler, based on a sparse data model, that can support genomically-labeled matrices at any resolution. It has the flexibility to accommodate various descriptions of the data axes (genomic coordinates, tracks and bin annotations), resolutions, data density patterns, and metadata. Cooler is based on HDF5 and is supported by a Python library and command line suite to create, read, inspect and manipulate cooler data collections. The format has been adopted as a standard by the NIH 4D Nucleome Consortium. Cooler is cross-platform, BSD-licensed, and can be installed from the Python Package Index or the bioconda repository. The source code is maintained on GitHub at https://github.com/mirnylab/cooler.


2019 ◽  
Vol 15 (01) ◽  
pp. 1-8
Author(s):  
Ashish C Patel ◽  
C G Joshi

Current data storage technologies can no longer keep pace with the exponentially growing volume of data generated by, among other things, the extensive sharing of photos and media on social networks. The "digital world", which held 4.4 zettabytes in 2013, was predicted to reach 44 zettabytes by 2020. For the past 30 years, scientists and researchers have been trying to develop a robust way of storing data on a dense, long-lasting medium, and DNA has emerged as the most promising candidate. Unlike existing storage devices, DNA requires no maintenance beyond being kept in a cool, dark place. DNA is small and has a high storage density; just 1 gram of dry DNA can store about 455 exabytes of data. DNA stores information using four bases (A, T, G, and C), whereas CDs, hard disks and other devices store information as 0s and 1s on spiral tracks. In a DNA-based storage system, the digital file is first converted into binary code, and encoding and decoding are then the key steps. Once the digital file is encoded, the next step is to synthesize arbitrary single-stranded DNA sequences, which can be stored in a deep freeze until use. When the information needs to be recovered, this can be done by DNA sequencing. Next-generation sequencing (NGS) can produce sequences at very high throughput and at a much lower cost (less than 0.1 USD per MB of data) than the first sequencing technologies. Post-sequencing processing includes aligning all reads with multiple sequence alignment (MSA) algorithms to obtain consensus sequences. Each consensus sequence is then decoded by reversing the encoding process. Most prior DNA data storage efforts sequenced and decoded the entire body of stored digital information, with no random access, but it has now become possible to extract selected files (e.g., retrieving only the required image from a collection) from a DNA pool using PCR-based random access. Various scientists have successfully stored up to 110 zettabytes of data in one gram of DNA. In the future, with efficient encoding, error correction, and cheaper DNA synthesis and sequencing, DNA-based storage will become a practical solution for storing exponentially growing digital data.
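The encode and decode steps mentioned in this abstract can be sketched with the simplest possible mapping, two bits per base. The abstract does not specify an encoding scheme, so the mapping below (00 to A, 01 to C, 10 to G, 11 to T) and the function names `encode` and `decode` are assumptions made purely for illustration; practical schemes add error correction and avoid problematic sequences such as long runs of one base.

```python
# Assumed mapping for illustration only: index in this string = 2-bit value.
BASES = "ACGT"


def encode(data: bytes) -> str:
    """Turn each byte into four bases, most significant bit pair first."""
    strand = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            strand.append(BASES[(byte >> shift) & 0b11])
    return "".join(strand)


def decode(strand: str) -> bytes:
    """Reverse the encoding: every four bases rebuild one byte."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for start in range(0, len(strand), 4):
        byte = 0
        for base in strand[start:start + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


strand = encode(b"Hi")
print(strand)          # CAGACGGC
print(decode(strand))  # b'Hi'
```

Decoding is the exact reversal of encoding, which mirrors the abstract's description of how a consensus sequence is turned back into the original file.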


2017 ◽  
Vol MCSP2017 (01) ◽  
pp. 7-10 ◽  
Author(s):  
Subhashree Rath ◽  
Siba Kumar Panda

Static random access memory (SRAM) is an important component of the embedded cache memory in handheld digital devices. SRAM has become a major data storage device owing to its high storage density and short access time. The exponential growth of low-power digital devices has raised the demand for low-voltage, low-power SRAM. This paper presents the design and implementation of a 6T SRAM cell in 180 nm, 90 nm and 45 nm standard CMOS process technologies. The simulations were carried out in the Cadence Virtuoso environment. The performance of the SRAM cell is evaluated in terms of delay, power and static noise margin (SNM).


2009 ◽  
Vol 15 (S3) ◽  
pp. 53-54
Author(s):  
Aiying Wu ◽  
P. M. Vilarinho

Abstract. Lead zirconate-lead titanate (PZT) materials are commercially important piezoelectrics and ferroelectrics used in a wide range of applications, such as data storage (dynamic access and ferroelectric random access memories) and sensing and actuating devices. PZT with the morphotropic phase boundary composition offers the highest piezoelectric response, and at present there are no fully developed alternative materials to PZT. The importance of PZT, together with the continuing demand for device miniaturization, calls for the development of high-quality PZT thin films with optimized properties. At the same time, because the final properties of thin films depend on the details of their microstructure, a thorough analysis of that microstructure at the local scale is necessary. The sol-gel method is one of the chemical solution deposition techniques used to prepare oxide thin films such as PZT. Starting from a solution, a solid network is progressively formed via inorganic polymerisation reactions. Most metal alkoxides used for sol-gel synthesis are highly reactive towards hydrolysis and condensation. Their chemical reactivity therefore has to be tailored through chemical modification (or complexation) of the metal alkoxides to avoid uncontrolled reactions and precipitation. For PZT sol-gel thin film preparation, two chemical routes are frequently used, depending on the nature of the molecular precursor: the methoxyethanol (MOE) route and the diol route.


2018 ◽  
Vol 35 (15) ◽  
pp. 2671-2673 ◽  
Author(s):  
Josephine Burgin ◽  
Corentin Molitor ◽  
Fady Mohareb

Abstract. Summary: Bionano optical mapping is a technology that can assist in the final stages of genome assembly by lengthening and ordering the scaffolds of a draft assembly through alignment of the assembly to a genomic map. However, current visualization tools are either limited to the Windows operating system or were initially developed for visualizing large-scale structural variation. MapOptics is a lightweight cross-platform tool that enables the user to visualize and interact with the alignment of Bionano optical mapping data and can be used for in-depth exploration of hybrid scaffolding alignments. It provides a fast, simple alternative to the large optical mapping analysis programs currently available for this area of research. Availability and implementation: MapOptics is implemented in Java 1.8 and released under an MIT licence. MapOptics can be downloaded from https://github.com/FadyMohareb/mapoptics and run on any standard desktop computer equipped with a Java Virtual Machine (JVM). Supplementary information: Supplementary data are available at Bioinformatics online.


2020 ◽  
Author(s):  
Filip Bošković ◽  
Alexander Ohmann ◽  
Ulrich F. Keyser ◽  
Kaikai Chen

Abstract. Three-dimensional (3D) DNA nanostructures built via DNA self-assembly have found recent applications in multiplexed biosensing and in storing digital information. However, a key challenge is that 3D DNA structures are not easily copied, which is of vital importance for their large-scale production and for access to desired molecules by target-specific amplification. Here, we build 3D DNA structural barcodes and demonstrate the copying and random access of the barcodes from a library of molecules using a modified polymerase chain reaction (PCR). The 3D barcodes were assembled by annealing a single-stranded DNA scaffold with complementary short oligonucleotides containing 3D protrusions at defined locations. DNA nicks in these structures are ligated to facilitate barcode copying using PCR. To randomly access a target from a library of barcodes, we employ a non-complementary end in the DNA construct that serves as a barcode-specific primer template. Readout of the 3D DNA structural barcodes was performed with nanopore measurements. Our study provides a roadmap for convenient production of large quantities of self-assembled 3D DNA nanostructures. In addition, this strategy offers access to specific targets, a crucial capability for multiplexed single-molecule sensing and for DNA data storage.


2018 ◽  
Vol 36 (7) ◽  
pp. 660-660 ◽  
Author(s):  
Lee Organick ◽  
Siena Dumas Ang ◽  
Yuan-Jyue Chen ◽  
Randolph Lopez ◽  
Sergey Yekhanin ◽  
...  

2021 ◽  
pp. 2150039
Author(s):  
EJAZ AHMAD KHERA ◽  
HAFEEZ ULLAH ◽  
MUHAMMAD IMRAN ◽  
HASSAN ALGADI ◽  
FAYYAZ HUSSAIN ◽  
...  

Resistive switching (RS) has attracted considerable attention owing to its promising potential for data storage. Oxide-based devices with a metal-insulator-metal (MIM) structure are particularly valuable for RS applications. In this study, we investigate the effect of divalent (nickel) and trivalent (aluminum) dopants, without and with an oxygen vacancy (Vo), in hafnia (HfO2)-based resistive random-access memory (RRAM) devices. All calculations are carried out with the full-potential linearized augmented plane-wave (FP-LAPW) method as implemented in the WIEN2k code, using the generalized gradient approximation (GGA) and the generalized gradient approximation with Hubbard U parameters (GGA+U). The band structure, density of states and charge density reveal that HfNiO2+Vo is the more suitable configuration for enhancing the conductivity of RRAM devices.


Author(s):  
Stephanie M Gogarten ◽  
Tamar Sofer ◽  
Han Chen ◽  
Chaoyu Yu ◽  
Jennifer A Brody ◽  
...  

Abstract. Summary: The Genomic Data Storage (GDS) format provides efficient storage and retrieval of genotypes measured by microarrays and sequencing. We developed GENESIS to perform various single- and aggregate-variant association tests using genotype data stored in GDS format. GENESIS implements highly flexible mixed models, allowing for different link functions, multiple variance components and phenotypic heteroskedasticity. GENESIS integrates cohesively with other R/Bioconductor packages to build a complete genomic analysis workflow entirely within the R environment. Availability and implementation: https://bioconductor.org/packages/GENESIS; vignettes included. Supplementary information: Supplementary data are available at Bioinformatics online.


Author(s):  
Michael Milton ◽  
Natalie Thorne

Abstract. Summary: aCLImatise is a utility for automatically generating tool definitions compatible with bioinformatics workflow languages, by parsing command-line help output. aCLImatise also has an associated database called the aCLImatise Base Camp, which provides thousands of pre-computed tool definitions. Availability and implementation: The latest aCLImatise source code is available within a GitHub organisation, under the GPL-3.0 license: https://github.com/aCLImatise. In particular, documentation for the aCLImatise Python package is available at https://aclimatise.github.io/CliHelpParser/, and the aCLImatise Base Camp is available at https://aclimatise.github.io/BaseCamp/. Supplementary information: Supplementary data are available at Bioinformatics online.
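The general idea of deriving a tool definition from help output can be sketched with a regular expression over option lines. This is not aCLImatise's parser or API. The pattern `OPTION_LINE`, the function `parse_help` and the sample help text are all hypothetical, and the sketch assumes a common layout in which each option line is indented, lists its flags, optionally names an upper-case argument, and separates the description by at least two spaces.

```python
import re
from typing import Dict, List

# Assumed layout: "  -o, --output FILE  description text"
OPTION_LINE = re.compile(
    r"^[ \t]+(?P<flags>-{1,2}[\w-]+(?:,[ \t]*-{1,2}[\w-]+)*)"  # one or more flags
    r"(?:[ =](?P<arg>[A-Z_]+))?"                               # optional argument name
    r"[ \t]{2,}(?P<help>.+)$"                                  # description
)


def parse_help(text: str) -> List[Dict[str, object]]:
    """Collect flags, argument name and description from each option line."""
    options: List[Dict[str, object]] = []
    for line in text.splitlines():
        match = OPTION_LINE.match(line)
        if match is None:
            continue  # usage lines, blank lines and free text are skipped
        options.append({
            "flags": re.split(r",[ \t]*", match["flags"]),
            "arg": match["arg"],
            "help": match["help"].strip(),
        })
    return options


# Hypothetical help output used only to exercise the sketch.
HELP_TEXT = """\
usage: tool [options]

  -h, --help         show this help message
  -o, --output FILE  write results to FILE
  --threads N        number of threads
"""

options = parse_help(HELP_TEXT)
for option in options:
    print(option)
```

Real help output varies widely between tools, so a single pattern like this would miss many layouts; it only shows the kind of structured record that such parsing aims to produce.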

