Damming the genomic data flood using a comprehensive analysis and storage data structure

Database ◽  
2010 ◽  
Vol 2010 ◽  
Author(s):  
Marc Bouffard ◽  
Michael S. Phillips ◽  
Andrew M.K. Brown ◽  
Sharon Marsh ◽  
Jean-Claude Tardif ◽  
...  

2020 ◽  
Vol 23 (4) ◽  
pp. 627-638 ◽  
Author(s):  
Daniel A. Hescheler ◽  
Patrick S. Plum ◽  
Thomas Zander ◽  
Alexander Quaas ◽  
Michael Korenkov ◽  
...  

2020 ◽  
Vol 39 (4) ◽  
pp. 5027-5036 ◽  
Author(s):  
You Lu ◽  
Qiming Fu ◽  
Xuefeng Xi ◽  
Zhenping Chen

Data outsourcing has gradually become a mainstream solution, but once data is outsourced, data owners lose control of the hardware on which their data resides, and the integrity of that data may be compromised. Many current studies achieve low-network-overhead verification of cloud data sets by designing algorithmic structures (e.g., hashing, Merkle verification trees); however, cloud service providers may refuse to acknowledge the incompleteness of cloud data in order to avoid liability or for business reasons. There is therefore a need to build a secure, reliable, tamper-proof, and unforgeable verification system that supports accountability. Blockchain is a chain-like data structure constructed using data signatures, timestamps, hash functions, and proof-of-work mechanisms; using blockchain technology to build an integrity verification system can achieve fault accountability. This paper uses the Hadoop framework to implement data collection and storage on an HBase system within a big data architecture. In summary, building on research into blockchain-based cloud data collection and storage and on existing big data storage middleware, a high-throughput, highly concurrent, and highly available data collection and processing system has been realized.
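To make the chain-like structure concrete, here is a minimal Python sketch, not the system described in the paper, showing how hashes, timestamps, a Merkle root over the stored data segments, and a toy proof-of-work combine so that any later tampering with a segment invalidates every subsequent block (all names and the difficulty parameter are illustrative):

```python
import hashlib
import json
import time

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def merkle_root(leaves):
    """Reduce a list of data segments to a single Merkle root hash."""
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:                      # duplicate the last node on odd levels
            level.append(level[-1])
        level = [sha256((level[i] + level[i + 1]).encode())
                 for i in range(0, len(level), 2)]
    return level[0]

def make_block(prev_hash, segments, difficulty=2):
    """Build a block header committing to the segments, then brute-force
    a nonce until the header hash meets the (toy) difficulty target."""
    header = {
        "prev_hash": prev_hash,
        "merkle_root": merkle_root(segments),
        "timestamp": time.time(),
        "nonce": 0,
    }
    while True:                                 # toy proof-of-work loop
        digest = sha256(json.dumps(header, sort_keys=True).encode())
        if digest.startswith("0" * difficulty):
            return header, digest
        header["nonce"] += 1

# Chain two blocks over outsourced data segments; changing any segment
# later would change the Merkle root and break every following header hash.
b1, h1 = make_block("0" * 64, [b"segment-1", b"segment-2"])
b2, h2 = make_block(h1, [b"segment-3"])
print(h1, h2)
```

Because each header commits to the previous header's hash, a provider that has published such a chain cannot quietly substitute or drop data without verification failing, which is the accountability property the paper relies on.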


2019 ◽  
Vol 35 (23) ◽  
pp. 4907-4911 ◽  
Author(s):  
Jianglin Feng ◽  
Aakrosh Ratan ◽  
Nathan C Sheffield

Abstract
Motivation: Genomic data is frequently stored as segments or intervals. Because this data type is so common, interval-based comparisons are fundamental to genomic analysis. As the volume of available genomic data grows, developing efficient and scalable methods for searching interval data is necessary.
Results: We present a new data structure, the Augmented Interval List (AIList), to enumerate intersections between a query interval q and an interval set R. An AIList is constructed by first sorting R as a list by the interval start coordinate, then decomposing it into a few approximately flattened components (sublists), and then augmenting each sublist with the running maximum interval end. The query time for AIList is O(log₂ N + n + m), where N is the number of intervals in the set R, n is the number of overlaps between R and q, and m is the average number of extra comparisons required to find those n overlaps. Tested on real genomic interval datasets, the AIList code runs 5–18 times faster than standard high-performance code based on augmented interval trees, nested containment lists, or R-trees (BEDTools). For large datasets, the memory usage of AIList is 4–60% that of the other methods. The AIList data structure therefore provides a significantly improved fundamental operation for highly scalable genomic data analysis.
Availability and implementation: An implementation of the AIList data structure with both construction and search algorithms is available at http://ailist.databio.org.
Supplementary information: Supplementary data are available at Bioinformatics online.
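As a rough sketch of the core query idea, assuming half-open intervals and omitting the decomposition into sublists that the full AIList uses to handle long containing intervals (so this is a single-component simplification in Python, not the authors' implementation):

```python
import bisect

class AIListSimple:
    """Single-component sketch of an Augmented Interval List: intervals
    sorted by start, each position augmented with the running maximum end."""

    def __init__(self, intervals):
        self.ivs = sorted(intervals)               # sort by start coordinate
        self.starts = [s for s, _ in self.ivs]
        self.max_end = []                          # running maximum of ends
        m = float("-inf")
        for _, e in self.ivs:
            m = max(m, e)
            self.max_end.append(m)

    def query(self, qs, qe):
        """Enumerate intervals overlapping the half-open query [qs, qe)."""
        hits = []
        i = bisect.bisect_left(self.starts, qe) - 1    # rightmost start < qe
        # Scan backwards; the running maximum end tells us when no earlier
        # interval can possibly reach back to the query start.
        while i >= 0 and self.max_end[i] > qs:
            if self.ivs[i][1] > qs:                    # actual overlap test
                hits.append(self.ivs[i])
            i -= 1
        return hits

r = AIListSimple([(1, 5), (3, 8), (10, 15), (2, 20)])
print(r.query(4, 11))    # all four intervals overlap [4, 11)
```

The running maximum end is what bounds the backward scan: once it drops to or below the query start, no earlier interval can overlap. Intervals inspected but rejected during that scan are the source of the m extra comparisons in the stated query time, and the sublist decomposition exists precisely to keep m small when a few very long intervals would otherwise inflate the running maximum.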


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Mohd Aliff Faiz Jeffry ◽  
Hazinah Kutty Mammi

Digital watermarking is a technique for protecting digital images from malicious attacks. Compression is one of the most common attacks on images uploaded to social media. Social media platforms such as Facebook and Twitter compress all types of media before they are stored on their servers, in order to reduce the network bandwidth and storage each item requires. However, this compression tends to degrade the very image properties that can be used to identify an image, which raises ownership and copyright issues. Digital watermarking has been proposed in numerous studies, this one among them, as a way to prevent this problem; the chosen watermarking techniques must be able to withstand the compression applied by social media. A comprehensive analysis of the watermarking algorithms and the watermarked images was carried out through a series of designed experiments. The results show that neither of the chosen watermarking techniques could withstand the compression applied by JPEG encoding or by social media, indicating that watermarking is not a suitable method for preserving the ownership and copyright of images shared through social media.
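The excerpt does not name the two watermarking techniques that were tested. As a hypothetical illustration of why fragile spatial-domain watermarks fail this kind of test, the following Python sketch (using Pillow and NumPy) embeds a least-significant-bit watermark and measures how much of it survives a single JPEG re-encode of the sort a social media pipeline applies:

```python
import io

import numpy as np
from PIL import Image

def embed_lsb(cover: Image.Image, bits: np.ndarray) -> Image.Image:
    """Hide a bit array in the least significant bit of the red channel."""
    px = np.array(cover.convert("RGB"))
    flat = px[..., 0].flatten()
    flat[: bits.size] = (flat[: bits.size] & 0xFE) | bits
    px[..., 0] = flat.reshape(px[..., 0].shape)
    return Image.fromarray(px)

def extract_lsb(img: Image.Image, n: int) -> np.ndarray:
    px = np.array(img.convert("RGB"))
    return px[..., 0].flatten()[:n] & 1

rng = np.random.default_rng(0)
watermark = rng.integers(0, 2, 1024, dtype=np.uint8)
marked = embed_lsb(Image.new("RGB", (64, 64), "gray"), watermark)

# Simulate a social-media upload: one lossy JPEG re-encode.
buf = io.BytesIO()
marked.save(buf, format="JPEG", quality=85)
buf.seek(0)
recompressed = Image.open(buf)

before = (extract_lsb(marked, 1024) == watermark).mean()
after = (extract_lsb(recompressed, 1024) == watermark).mean()
print(f"bit accuracy before JPEG: {before:.2f}, after: {after:.2f}")
```

JPEG quantizes DCT coefficients and thereby discards exactly the low-amplitude pixel detail an LSB watermark lives in, so the recovered bits collapse toward chance; compression-robust schemes instead embed the mark in transform-domain coefficients that survive quantization.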


1969 ◽  
Vol 9 (3) ◽  
pp. 270-282 ◽  
Author(s):  
P. L. Wodon

2017 ◽  
Vol 60 (2) ◽  
Author(s):  
Izumi C. Mori ◽  
Yoko Ikeda ◽  
Takakazu Matsuura ◽  
Takashi Hirayama ◽  
Koji Mikami

Abstract Emerging studies suggest that seaweeds contain phytohormones; however, their chemical entities, biosynthetic pathways, signal transduction mechanisms, and physiological roles are poorly understood. Until recently, it was difficult to conduct comprehensive analysis of phytohormones in seaweeds because of the interfering effects of cellular constituents on fine quantification. In this review, we discuss the details of the latest method allowing simultaneous profiling of multiple phytohormones in red seaweeds, while avoiding the effects of cellular factors. Recent studies have confirmed the presence of indole-3-acetic acid (IAA),


2018 ◽  
Vol 16 (05) ◽  
pp. 1850018 ◽  
Author(s):  
Sanjeev Kumar ◽  
Suneeta Agarwal ◽  
Ranvijay

Genomic data nowadays plays a vital role in a number of fields such as personalized medicine, forensics, drug discovery, sequence alignment, and agriculture. With the advancements in and falling cost of next-generation sequencing (NGS) technology, these data are growing exponentially; NGS data are being generated more rapidly than they can be meaningfully analyzed. Thus, there is much scope for developing novel data compression algorithms that facilitate data analysis as well as data transfer and storage. An innovative compression technique is proposed here to address the problem of transmitting and storing large NGS data. This paper presents a lossless, non-reference-based FastQ file compression approach that segregates the data into three different streams and then applies an appropriate and efficient compression algorithm to each. Experiments show that the proposed approach (WBFQC) outperforms other state-of-the-art approaches for compressing NGS data in terms of compression ratio (CR) and compression and decompression time, and it also offers random access over the compressed genomic data. An open-source FastQ compression tool is provided at http://www.algorithm-skg.com/wbfqc/home.html.
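The abstract specifies the three-stream segregation but not the per-stream codecs. A minimal Python sketch of the stream-splitting idea, with zlib as a stand-in for whichever algorithm WBFQC actually applies to each stream, might look like this:

```python
import zlib

def split_fastq(text: str):
    """Split FastQ records into the three streams the paper describes:
    read identifiers, nucleotide sequences, and quality strings."""
    lines = text.strip().split("\n")
    ids, seqs, quals = [], [], []
    for i in range(0, len(lines), 4):     # each FastQ record spans 4 lines
        ids.append(lines[i])
        seqs.append(lines[i + 1])
        quals.append(lines[i + 3])        # line i + 2 is the '+' separator
    return "\n".join(ids), "\n".join(seqs), "\n".join(quals)

record = (
    "@read1\nACGTACGTAC\n+\nIIIIIHHHHG\n"
    "@read2\nTTGGCCAATT\n+\nHHHHHIIIII\n"
)
streams = split_fastq(record)

# Compress each stream independently; on realistic inputs, grouping
# similar data together compresses better than the interleaved file.
compressed = [zlib.compress(s.encode(), level=9) for s in streams]
whole = zlib.compress(record.encode(), level=9)
print(sum(len(c) for c in compressed), "vs", len(whole))
```

Keeping per-stream offsets alongside the compressed blocks is also what makes record-level random access feasible, since a reader can decompress one stream without touching the others.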

