Chamaeleo: a robust library for DNA storage coding schemes

2020 ◽  
Author(s):  
Zhi Ping ◽  
Haoling Zhang ◽  
Shihong Chen ◽  
Qianlong Zhuang ◽  
Sha Joe Zhu ◽  
...  

Abstract: Chamaeleo is currently the only collection library that focuses on adapting multiple well-established coding schemes for DNA storage. It provides a tool for researchers to study various coding schemes and apply them in practice. Chamaeleo adheres to the software design principle of high cohesion and low coupling, which enhances performance and efficiency. Here, we describe the working pipeline of Chamaeleo and demonstrate its advantages over existing implementations of single coding schemes. The source code is available at https://github.com/ntpz870817/Chamaeleo; the library can also be installed via pip ("pip install chamaeleo"). Alternatively, the wheel file can be downloaded from https://pypi.org/project/Chamaeleo/. Detailed documentation is available at https://chamaeleo.readthedocs.io/en/latest/.

Author Summary: DNA is now considered a promising candidate medium for future digital information storage, addressing the global issue of data explosion. Transcoding between binary digital data and quaternary DNA information is one of the most important steps in the whole process of DNA digital storage. Although several coding schemes have been reported, researchers are still investigating better strategies. Moreover, the scripts implementing these coding schemes differ in programming language, software architecture, and optimization focus. We therefore introduce Chamaeleo, a library that collects, reconstructs, and optimizes several classical coding schemes. A key feature of this tool is that its functions are modularized, making customized usage feasible; developers can also incorporate new algorithms into the framework with little effort. In our benchmark tests, Chamaeleo shows better flexibility and expandability than the original packages, and we hope it will support further study and applications in DNA digital storage.
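At their simplest, the coding schemes such a library collects map fixed-width bit groups to bases. As a scheme-agnostic sketch (not Chamaeleo's API; the 00→A, 01→C, 10→G, 11→T mapping is assumed purely for illustration), the following transcodes bytes to a quaternary DNA string and back:

```python
# Minimal illustration of binary-to-quaternary transcoding (not Chamaeleo's API).
# Mapping assumed for illustration: 00->A, 01->C, 10->G, 11->T.

BITS_TO_BASE = {0b00: "A", 0b01: "C", 0b10: "G", 0b11: "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}

def encode(data: bytes) -> str:
    """Encode each byte as four bases, two bits per base, MSB first."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BITS_TO_BASE[(byte >> shift) & 0b11])
    return "".join(bases)

def decode(dna: str) -> bytes:
    """Invert encode(): fold each group of four bases back into one byte."""
    out = bytearray()
    for i in range(0, len(dna), 4):
        byte = 0
        for base in dna[i:i + 4]:
            byte = (byte << 2) | BASE_TO_BITS[base]
        out.append(byte)
    return bytes(out)

assert decode(encode(b"Chamaeleo")) == b"Chamaeleo"
```

Practical schemes layer biochemical constraints (GC balance, homopolymer limits) and error correction on top of such a core mapping.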

2019 ◽  
Vol 15 (01) ◽  
pp. 1-8
Author(s):  
Ashish C Patel ◽  
C G Joshi

Current data storage technologies cannot keep pace with the exponentially growing amount of data generated by the extensive use of social networking, photos, media, etc. The "digital universe" held 4.4 zettabytes in 2013 and was predicted to reach 44 zettabytes by 2020. For the past 30 years, scientists and researchers have been trying to develop a robust way of storing data on a medium that is dense and long-lasting, and have found DNA to be the most promising storage medium. Unlike existing storage devices, DNA requires no maintenance other than storage in a cool, dark place. DNA combines small size with high density: just 1 gram of dry DNA can store about 455 exabytes of data. DNA stores information using four bases, viz. A, T, G, and C, whereas CDs, hard disks, and other devices store information as 0s and 1s on spiral tracks. In DNA-based storage, after a digital file has been binarized, encoding and decoding are the key steps. Once the digital file is encoded, the next step is to synthesize arbitrary single-stranded DNA sequences, which can be kept in a deep freezer until use. When the information needs to be recovered, this can be done by DNA sequencing. Next-generation sequencing (NGS) can produce sequences at very high throughput and at a much lower cost (less than 0.1 USD per MB of data) than the first sequencing technologies. Post-sequencing processing includes alignment of all reads using multiple sequence alignment (MSA) algorithms to obtain consensus sequences; each consensus sequence is then decoded by reversing the encoding process. Most prior DNA data storage efforts sequenced and decoded the entire amount of stored digital information with no random access, but it has now become possible to extract selected files (e.g., retrieving only a required image from a collection) from a DNA pool using PCR-based random access. Scientists have successfully stored up to 110 zettabytes of data in one gram of DNA. In the future, with efficient encoding, error correction, and cheaper DNA synthesis and sequencing, DNA-based storage will become a practical solution for storing exponentially growing digital data.
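The consensus step can be illustrated with a per-position majority vote over already-aligned reads of equal length (a hypothetical stand-in for full MSA, which must also handle insertions and deletions):

```python
# Hypothetical illustration: per-position majority vote over aligned reads.
# Real pipelines run multiple sequence alignment first to handle indels.
from collections import Counter

def consensus(reads: list[str]) -> str:
    """Return the most common base at each position across aligned reads."""
    length = min(len(r) for r in reads)
    return "".join(
        Counter(read[i] for read in reads).most_common(1)[0][0]
        for i in range(length)
    )

reads = ["ACGTAC", "ACGTAC", "ACCTAC", "ACGTGC"]
print(consensus(reads))  # ACGTAC: isolated read errors are voted out
```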


2021 ◽  
Vol 23 (4) ◽  
pp. 796-815
Author(s):  
Yang Wang ◽  
Sun Sun Lim

People today are situated in media ecosystems in which a variety of ICT devices and platforms coexist and complement each other to fulfil users' heterogeneous requirements. These multi-media affordances promote a highly hyperlinked and nomadic habit of digital data management that blurs the long-standing boundaries between information storage, sharing, and exchange. Specifically, as fragmentary digital information (e.g. photos, videos, online diaries, news articles) is pervasively shared and browsed across various platforms, the life experiences and knowledge involved are simultaneously classified and stored for future retrieval and collective memory construction. For international migrants who straddle different geographical and cultural contexts, managing these digital materials is particularly complicated, as they must be familiar with, and appropriately navigate, the technological infrastructures of both home and host countries. Drawing on ethnographic observations of 40 Chinese migrant mothers in Singapore, this article delves into their quotidian routines of acquiring, storing, sharing, and exchanging digital information across a range of ICT devices and platforms, as well as the cultural and emotional implications of these mediated behaviours for their everyday life experiences. A multi-layered and multi-sited repertoire of 'life archiving' was identified among these migrant mothers, in which they leave footprints of everyday life through a tactical combination of interactive sharing, pervasive tagging, and backup storage of diverse digital content.


2018 ◽  
Author(s):  
Henry H. Lee ◽  
Reza Kalhor ◽  
Naveen Goela ◽  
Jean Bolot ◽  
George M. Church

Abstract: DNA is an emerging storage medium for digital data, but its adoption is hampered by the limitations of phosphoramidite chemistry, which was developed for the single-base accuracy required for biological functionality. Here, we establish a de novo enzymatic DNA synthesis strategy designed from the bottom up for information storage. We harness a template-independent DNA polymerase for controlled synthesis of sequences with user-defined information content. We demonstrate retrieval of 144 bits, including addressing, from perfectly synthesized DNA strands using batch-processed Illumina and real-time Oxford Nanopore sequencing. We then develop a codec for data retrieval from populations of diverse but imperfectly synthesized DNA strands, each with a ~30% error tolerance. With this codec, we experimentally validate a kilobyte-scale design which stores 1 bit per nucleotide. Simulations of the codec support reliable and robust storage of information for large-scale systems. This work paves the way for alternative synthesis and sequencing strategies to advance information storage in DNA.
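One way to make stored data independent of the uncontrolled homopolymer run lengths that enzymatic synthesis produces is to encode each bit as a transition between distinct bases and collapse runs before decoding. The sketch below illustrates that idea in simplified form; it is a toy built on assumptions, not the paper's actual codec:

```python
# Simplified sketch (assumptions, not the paper's exact codec): encode each bit
# as a transition to a base different from the current one, so that the
# homopolymer run lengths produced by imperfect enzymatic synthesis carry
# no information and can be collapsed away before decoding.

BASES = "ACGT"

def encode(bits: str, start: str = "A") -> str:
    strand, current = [start], start
    for bit in bits:
        # The two lowest-ordered bases differing from `current` encode 0 and 1.
        candidates = [b for b in BASES if b != current]
        current = candidates[int(bit)]
        strand.append(current)
    return "".join(strand)

def decode(strand: str) -> str:
    # Collapse homopolymer runs: only transitions carry information.
    collapsed = [strand[0]] + [b for i, b in enumerate(strand[1:], 1) if b != strand[i - 1]]
    bits = []
    for prev, cur in zip(collapsed, collapsed[1:]):
        candidates = [b for b in BASES if b != prev]
        bits.append(str(candidates.index(cur)))
    return "".join(bits)

strand = encode("1011")
noisy = "".join(b * 3 for b in strand)  # synthesis adds uncontrolled run lengths
assert decode(noisy) == "1011"
```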


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Peter Michael Schwarz ◽  
Bernd Freisleben

Abstract

Background: DNA is a promising storage medium for high-density, long-term digital data storage. Since DNA synthesis and sequencing are still relatively expensive, the coding methods used to store digital data in DNA should correct errors and avoid unstable or error-prone DNA sequences. Near-optimal rateless erasure codes, also called fountain codes, are particularly interesting codes for realizing high-capacity, low-error DNA storage systems, as shown by Erlich and Zielinski in their approach based on the Luby transform (LT) code. Since LT is the most basic fountain code, there is large untapped potential for improvement in using near-optimal erasure codes for DNA storage.

Results: We present NOREC4DNA, a software framework to use, test, compare, and improve near-optimal rateless erasure codes (NORECs) for DNA storage systems. These codes can effectively be used to store digital information in DNA and cope with the restrictions of the DNA medium. Additionally, they can adapt to possible variable lengths of DNA strands and have nearly zero overhead. We describe the design and implementation of NOREC4DNA. Furthermore, we present experimental results demonstrating that NOREC4DNA can flexibly be used to evaluate the use of NORECs in DNA storage systems. In particular, we show that NORECs that apparently have not yet been used for DNA storage, such as Raptor and Online codes, can achieve significant improvements over the LT codes used in previous work. NOREC4DNA is available at https://github.com/umr-ds/NOREC4DNA.

Conclusion: NOREC4DNA is a flexible and extensible software framework for using, evaluating, and comparing NORECs for DNA storage systems.
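The LT code underlying this family can be sketched compactly: each "droplet" is the XOR of a pseudo-random subset of source chunks identified by a seed, and the decoder peels droplets that have exactly one unresolved chunk. The toy below (with a uniform degree distribution instead of the robust soliton distribution, and no relation to NOREC4DNA's actual implementation) shows the principle:

```python
# Toy LT-style fountain code (an illustration of the principle,
# not NOREC4DNA's implementation).
import random

def droplet_indices(seed: int, n: int) -> set[int]:
    """Derive a pseudo-random subset of chunk indices deterministically from a seed."""
    rng = random.Random(seed)
    degree = rng.randint(1, n)  # toy uniform degree distribution
    return set(rng.sample(range(n), degree))

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def make_droplet(chunks: list[bytes], seed: int) -> tuple[int, bytes]:
    """A droplet is the XOR of its chunk subset, tagged with the generating seed."""
    payload = bytes(len(chunks[0]))
    for i in droplet_indices(seed, len(chunks)):
        payload = xor(payload, chunks[i])
    return seed, payload

def decode(droplets: list[tuple[int, bytes]], n: int) -> dict[int, bytes]:
    """Peeling decoder: resolve any droplet with exactly one unknown chunk."""
    recovered: dict[int, bytes] = {}
    pending = [(droplet_indices(seed, n), payload) for seed, payload in droplets]
    progress = True
    while progress and len(recovered) < n:
        progress = False
        for indices, payload in pending:
            unknown = indices - recovered.keys()
            if len(unknown) == 1:
                for known in indices - unknown:
                    payload = xor(payload, recovered[known])
                recovered[unknown.pop()] = payload
                progress = True
    return recovered

chunks = [b"AB", b"CD", b"EF"]
droplets = [make_droplet(chunks, seed) for seed in range(10)]
print(decode(droplets, n=3))  # typically {0: b'AB', 1: b'CD', 2: b'EF'}; decoding is probabilistic
```

In a DNA storage setting, each droplet (seed plus payload) would be transcoded to one oligonucleotide, which is how rateless codes absorb the loss of individual strands.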


2014 ◽  
Vol 2014 ◽  
pp. 1-12 ◽  
Author(s):  
Jun Jiang ◽  
Lianping Guo ◽  
Kuojun Yang ◽  
Huiqing Pan

Vertical resolution is an essential performance indicator of a digital storage oscilloscope (DSO), and the key to improving resolution is to increase the number of digitizing bits and to lower the noise. Averaging is a typical method of improving the signal-to-noise ratio (SNR) and the effective number of bits (ENOB). However, the existing averaging algorithm is restricted by the requirement that the signal be repetitive and is affected by gross quantization errors, so its effectiveness in suppressing noise and improving resolution is limited. In this paper, an information entropy-based data fusion algorithm and an average-based decimation filtering algorithm, which build on the averaging algorithm and on relevant results from information entropy theory, are proposed to improve the resolution of the oscilloscope. For a single acquired signal, resolution is improved by eliminating gross quantization errors through the maximum entropy of the sample data, followed by further noise filtering via average-based decimation after data fusion of the valid samples, under the premise of oversampling. No subjective assumptions or constraints are imposed on the signal under test in the whole process, and there is no impact on the analog bandwidth of the oscilloscope at the actual sampling rate.
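The baseline effect being improved upon is easy to reproduce: averaging N acquisitions of a repetitive signal suppresses uncorrelated noise by a factor of about sqrt(N). A short numpy sketch with synthetic data:

```python
# Synthetic demonstration: averaging N acquisitions of a repetitive signal
# reduces the standard deviation of uncorrelated noise by roughly sqrt(N).
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 1000)
signal = np.sin(2 * np.pi * 5 * t)

N = 64
acquisitions = signal + rng.normal(0, 0.1, size=(N, t.size))
averaged = acquisitions.mean(axis=0)

print(np.std(acquisitions[0] - signal))  # ~0.1    (single acquisition)
print(np.std(averaged - signal))         # ~0.0125 (0.1 / sqrt(64))
```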


2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Jung Min Lee ◽  
Mo Beom Koo ◽  
Seul Woo Lee ◽  
Heelim Lee ◽  
Junho Kwon ◽  
...  

Abstract: Synthesis of a polymer composed of a large discrete number of chemically distinct monomers in an absolutely defined aperiodic sequence remains a challenge in polymer chemistry. Such syntheses have largely been limited to oligomers with a small number of repeating units because of the difficulties associated with the step-by-step addition of individual monomers needed to achieve high molecular weights. Here we report copolymers of α-hydroxy acids, poly(phenyllactic-co-lactic acid) (PcL), built via the cross-convergent method from four dyads of monomers as constituent units. Our proposed method allows scalable synthesis of sequence-defined PcL in a minimal number of coupling steps from stoichiometric amounts of reagents. Digital information can be stored in an aperiodic sequence of PcL and fully retrieved as binary code by mass spectrometry sequencing. The information storage density (bit/Da) of PcL is 50% higher than that of DNA, and the storage capacity of PcL can be increased further by adjusting the molecular weight (~38 kDa).
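The density comparison can be sanity-checked with back-of-envelope arithmetic; the ~330 Da average nucleotide mass and the 2 bits/nt quaternary maximum below are illustrative assumptions, not values taken from the paper:

```python
# Back-of-envelope density check (assumed masses, not values from the paper).
dna_bits_per_monomer = 2      # quaternary code: at most 2 bits per nucleotide
dna_mass_per_monomer = 330.0  # ~average nucleotide residue mass in Da (assumption)

dna_density = dna_bits_per_monomer / dna_mass_per_monomer  # ~0.0061 bit/Da
pcl_density = 1.5 * dna_density                            # the reported +50%
print(f"DNA: {dna_density:.4f} bit/Da, PcL: {pcl_density:.4f} bit/Da")
```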


2020 ◽  
pp. 19-43
Author(s):  
Henri Schildt

This chapter examines digitalization as a set of new normative ideals for managing and organizing businesses, enabled by new technologies. The data imperative consists of two mutually reinforcing goals: the pursuit of omniscience—the aspiration of management to capture the world relevant to the company through digital data; and the pursuit of omnipotence—the aspiration of managers to control and optimize activities in real time, around the world, through software. The data imperative model captures a self-reinforcing cycle of four sequential steps: (1) the creation and capture of data, (2) the combination and analysis of data, (3) the redesign of business processes around smart algorithms, and (4) the ability to control the world through digital information flows. The logical end-point of the data imperative is a 'programmable world', a conception of society saturated with Internet-connected hardware that is able to capture processes in real time and control them in order to optimize desired outcomes.


2020 ◽  
Vol 117 (46) ◽  
pp. 28589-28595 ◽  
Author(s):  
Chiara Gattinoni ◽  
Nives Strkalj ◽  
Rea Härdi ◽  
Manfred Fiebig ◽  
Morgan Trassin ◽  
...  

Ferroelectric perovskites present a switchable spontaneous polarization and are promising energy-efficient device components for digital information storage. Full control of the ferroelectric polarization in ultrathin films of ferroelectric perovskites needs to be achieved in order to apply this class of materials in modern devices. However, ferroelectricity itself is not well understood in this nanoscale form, where interface and surface effects become particularly relevant and where loss of net polarization is often observed. In this work, we show that the precise control of the structure of the top surface and bottom interface of the thin film is crucial toward this aim. We explore the properties of thin films of the prototypical ferroelectric lead titanate (PbTiO3) on a metallic strontium ruthenate (SrRuO3) buffer using a combination of computational (density functional theory) and experimental (optical second harmonic generation) methods. We find that the polarization direction and strength are influenced by chemical and electronic processes occurring at the epitaxial interface and at the surface. The polarization is particularly sensitive to adsorbates and to surface and interface defects. These results point to the possibility of controlling the polarization direction and magnitude by engineering specific interface and surface chemistries.


Information ◽  
2019 ◽  
Vol 10 (11) ◽  
pp. 332 ◽  
Author(s):  
Kenneth Thibodeau

This paper presents Constructed Past Theory, an epistemological theory about how we come to know things that happened or existed in the past. The theory is expounded both in text and in a formal model comprising UML class diagrams. The ideas presented here have been developed over a half century of experience as a practitioner in the management of information and automated systems in the US government and as a researcher in several collaborations, notably the four international and multidisciplinary InterPARES projects. This work is part of a broader initiative that provides a conceptual framework for reformulating the concepts and theories of archival science in order to enable a new discipline whose assertions are empirically and, wherever possible, quantitatively testable. The new discipline, called archival engineering, is intended to provide an appropriate, coherent foundation for the development of systems and applications for managing, preserving, and providing access to digital information; such development is necessitated by the exponential growth and explosive diversification of data recorded in digital form and by the use of digital data in an ever increasing variety of domains. Both the text and the model are an initial exposition of the theory, which both requires and invites further development.


2013 ◽  
Vol 756-759 ◽  
pp. 1210-1214
Author(s):  
Chuan Jin Wang

Cloud storage is a new concept in digital information storage. In view of its advantages in distributed storage, collaborative storage, dynamic allocation of storage space, smooth migration, and backup, and building on an introduction to the related concepts and unique advantages of cloud storage, this paper applies cloud storage to a data bank of graphic design materials and constructs a cloud storage scheme for it in terms of three aspects: system structure, topological structure, and function modules.

