Data Deduplication System Based on Content-Defined Chunking Using Bytes Pair Frequency Occurrence

Symmetry
2020
Vol 12 (11)
pp. 1841
Author(s):  
Ahmed Sardar M. Saeed
Loay E. George

Every second, millions of data items are generated through the use of emerging technologies, and storing and handling such a large amount of data is very challenging. Data deduplication is a solution to this problem: it is a technique that eliminates duplicate data and stores only a single copy, reducing storage utilization and the cost of maintaining redundant data. Content-defined chunking (CDC) plays an important role in data deduplication systems due to its ability to detect high redundancy. In this paper, we focus on deduplication system optimization by tuning the relevant factors in CDC to identify chunk cut-points and by introducing an efficient fingerprint using a new hash function. We propose a novel bytes-frequency-based chunking (BFBC) algorithm and a new low-cost hashing function. To evaluate the efficiency of the proposed system, extensive experiments were carried out on two different datasets. In all experiments, the proposed system consistently outperformed the common CDC algorithms, achieving a better storage gain ratio and higher chunking and hashing throughput. In practice, our experiments show that BFBC is 10 times faster than basic sliding window (BSW) and approximately three times faster than two thresholds two divisors (TTTD). The proposed triple hash function algorithm is five times faster than SHA1 and MD5 and achieves a better deduplication elimination ratio (DER) than other CDC algorithms. The symmetry of our work lies in the balance between the proposed system's performance parameters and their reflection in system efficiency compared to other deduplication systems.
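As a rough illustration of the idea (our own simplified Python sketch, not the authors' BFBC implementation), the snippet below scans a byte stream, selects the most frequent byte pairs as divisors, and declares a chunk cut-point wherever a divisor pair occurs, subject to minimum and maximum chunk sizes; the function names and thresholds are hypothetical.

```python
from collections import Counter

def frequent_pairs(data: bytes, top_k: int = 8) -> set:
    """Pick the top_k most frequent byte pairs to act as divisors (cut-point markers)."""
    pairs = Counter(data[i:i + 2] for i in range(len(data) - 1))
    return {pair for pair, _ in pairs.most_common(top_k)}

def chunk(data: bytes, divisors: set, min_size: int = 2048, max_size: int = 16384):
    """Yield chunks whose boundaries fall where a divisor pair occurs,
    constrained by minimum and maximum chunk sizes."""
    start = 0
    i = 0
    while i < len(data) - 1:
        size = i - start
        if (size >= min_size and data[i:i + 2] in divisors) or size >= max_size:
            yield data[start:i]
            start = i
        i += 1
    if start < len(data):
        yield data[start:]

# Toy usage: the split is lossless, and unchanged regions of two similar
# files produce identical chunks, so their fingerprints deduplicate.
blob = b"example payload " * 4096
divisors = frequent_pairs(blob)
assert sum(len(c) for c in chunk(blob, divisors)) == len(blob)
```

Because cut-points depend on content rather than fixed offsets, an insertion near the start of a file shifts only the chunks around the edit, so the remaining regions still hash to the same fingerprints.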

Author(s):  
Vishal Passricha
Ashish Chopra
Shubhanshi Singhal

Cloud storage (CS) is gaining much popularity nowadays because it offers low-cost and convenient network storage services. In this big data era, the explosive growth in digital data pushes users towards CS, but this places a lot of storage pressure on CS systems because a large volume of this data is redundant. Data deduplication is an effective data reduction technique. The dynamic nature of data makes security and ownership of data a very important issue. Proof-of-ownership schemes are a robust way to check the ownership claimed by any owner. However, encrypting data before storing it affects the deduplication process because encryption methods have varying characteristics. The convergent encryption (CE) scheme is widely used for secure data deduplication. The problem with CE-based schemes is that a user can still decrypt the cloud data even after losing ownership of it. This article addresses the problem of ownership revocation by proposing a secure deduplication scheme for encrypted data. The proposed scheme enhances security against unauthorized encryption and poison attacks on the predicted set of data.
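For context, convergent encryption derives the encryption key deterministically from the content itself, so identical plaintexts produce identical ciphertexts and can be deduplicated server-side. The sketch below is a minimal illustration using SHA-256 and AES-GCM with a content-derived nonce (our own simplification, not the scheme proposed in this article); it assumes the `cryptography` package is available.

```python
import hashlib
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def convergent_encrypt(plaintext: bytes):
    """Derive the key from the content so identical plaintexts yield
    identical ciphertexts (and therefore deduplicate)."""
    key = hashlib.sha256(plaintext).digest()        # content-derived key
    nonce = hashlib.sha256(key).digest()[:12]       # deterministic nonce (sketch only)
    ciphertext = AESGCM(key).encrypt(nonce, plaintext, None)
    tag = hashlib.sha256(ciphertext).hexdigest()    # fingerprint used for deduplication
    return ciphertext, tag, key

def convergent_decrypt(ciphertext: bytes, key: bytes) -> bytes:
    nonce = hashlib.sha256(key).digest()[:12]
    return AESGCM(key).decrypt(nonce, ciphertext, None)

# Two owners uploading the same file produce the same ciphertext and tag,
# so the server keeps one copy; each owner holds the key locally. This also
# shows why ownership revocation is hard: a former owner still has the key.
c1, t1, k1 = convergent_encrypt(b"shared report")
c2, t2, _ = convergent_encrypt(b"shared report")
assert c1 == c2 and t1 == t2
assert convergent_decrypt(c1, k1) == b"shared report"
```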


Author(s):  
Sumit Kumar Mahana
Rajesh Kumar Aggarwal

In the present digital scenario, data is of prime significance for individuals and, even more so, for organizations. With the passage of time, the amount of data content being produced increases exponentially, which poses a serious concern because the huge volume of redundant data stored on the cloud imposes an unacceptable load on the cloud storage systems themselves. Therefore, a storage optimization strategy is a fundamental prerequisite for cloud storage systems. Data deduplication is a storage optimization strategy that deletes identical copies of redundant data, optimizing bandwidth, improving utilization of storage space, and hence minimizing storage cost. To guarantee security, the data stored on the cloud must be kept in encrypted form. Consequently, executing deduplication safely over encrypted information in the cloud is a challenging job. This chapter discusses various existing data deduplication techniques with a notion of securing the data on the cloud that addresses this challenge.
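To make the storage-optimization idea concrete, here is a minimal, hypothetical sketch of a fingerprint-indexed deduplicating store: each chunk is keyed by its SHA-256 digest, so a chunk uploaded twice is stored only once. This is our own illustration of the general technique, not a method from the chapter.

```python
import hashlib

class DedupStore:
    """Toy content-addressed store: identical chunks share one stored copy."""
    def __init__(self):
        self.chunks = {}      # fingerprint -> chunk bytes
        self.refcount = {}    # fingerprint -> number of references

    def put(self, chunk: bytes) -> str:
        fp = hashlib.sha256(chunk).hexdigest()
        if fp not in self.chunks:          # only new content consumes space
            self.chunks[fp] = chunk
        self.refcount[fp] = self.refcount.get(fp, 0) + 1
        return fp

    def get(self, fp: str) -> bytes:
        return self.chunks[fp]

store = DedupStore()
a = store.put(b"same payload")
b = store.put(b"same payload")   # duplicate upload: no extra storage used
assert a == b and len(store.chunks) == 1
```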


Author(s):  
Vishal Passricha
Ashish Chopra
Pooja Sharma
Shubhanshi Singhal

Cloud storage (CS) is gaining much popularity nowadays because it offers low-cost and convenient network storage services. In this big data era, the explosive growth in digital data moves users towards CS to store their massive data. This explosive growth causes a lot of storage pressure on CS systems because a large volume of this data is redundant. Data deduplication is a highly effective data reduction technique that identifies and eliminates redundant data. The dynamic nature of data makes security and ownership of data a very important issue. Proof-of-ownership schemes are a robust way to check the ownership claimed by any owner. However, to protect the privacy of their data, many users encrypt it before storing it in CS. This affects the deduplication process because encryption methods have varying characteristics. The convergent encryption (CE) scheme is widely used for secure data deduplication, but it destroys message equality. Although DupLESS provides stronger privacy by enhancing CE, it has also been found insufficient. The problem with CE-based schemes is that a user can still decrypt the cloud data even after losing ownership of it. This paper addresses the problem of ownership revocation by proposing a secure deduplication scheme for encrypted data. The proposed scheme enhances security against unauthorized encryption and poison attacks on the predicted set of data.


2003
Vol 783
Author(s):  
Charles E Free

This paper discusses the techniques that are available for characterising circuit materials at microwave and millimetre wave frequencies. In particular, it focuses on a new technique for measuring the loss tangent of substrates at mm-wave frequencies using a circular resonant cavity. The benefits of the new technique are that it is simple and low cost, offers good accuracy, and has the potential to work at high mm-wave frequencies.
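For background, a standard cavity-resonator relation (not a result quoted from this paper): for a closed cavity fully filled with the dielectric under test, conductor and dielectric losses add reciprocally in the measured unloaded quality factor, so the loss tangent can be extracted as

\[ \frac{1}{Q_u} = \frac{1}{Q_c} + \frac{1}{Q_d}, \qquad \tan\delta \approx \frac{1}{Q_d} = \frac{1}{Q_u} - \frac{1}{Q_c}, \]

where \(Q_u\) is the measured unloaded quality factor, \(Q_c\) accounts for conductor (wall) losses, and \(Q_d\) for dielectric loss.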


Author(s):  
P. Sudheer
T. Lakshmi Surekha

Cloud computing is a revolutionary computing paradigm that enables flexible, on-demand, and low-cost usage of computing resources, but because data is outsourced to cloud servers, various privacy concerns emerge. Various schemes based on attribute-based encryption have been proposed to secure cloud storage and protect data content privacy. A semi-anonymous privilege control scheme, AnonyControl, addresses not only data privacy but also user identity privacy. AnonyControl decentralizes the central authority to limit identity leakage and thus achieves semi-anonymity. AnonyControl-F fully prevents identity leakage and achieves full anonymity.
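As a purely illustrative sketch (not the AnonyControl construction itself, which enforces this cryptographically), attribute-based access control can be thought of as evaluating a boolean policy over the attributes embedded in a user's key; access succeeds only if the policy is satisfied. The policy encoding below is hypothetical.

```python
# Toy model: a ciphertext carries an access policy, a key carries attributes.
def satisfies(policy, attributes: set) -> bool:
    """policy is an attribute string or a nested tuple ('AND'|'OR', sub, sub, ...)."""
    if isinstance(policy, str):
        return policy in attributes
    op, *subs = policy
    results = [satisfies(s, attributes) for s in subs]
    return all(results) if op == "AND" else any(results)

policy = ("AND", "doctor", ("OR", "cardiology", "radiology"))
print(satisfies(policy, {"doctor", "cardiology"}))   # True: policy satisfied
print(satisfies(policy, {"nurse", "cardiology"}))    # False: access denied
```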


2021
Vol 11 (2)
Author(s):  
James G Baldwin-Brown
Scott M Villa
Anna I Vickrey
Kevin P Johnson
Sarah E Bush
...  

The pigeon louse Columbicola columbae is a longstanding and important model for studies of ectoparasitism and host-parasite coevolution. However, a deeper understanding of its evolution and capacity for rapid adaptation is limited by a lack of genomic resources. Here, we present a high-quality draft assembly of the C. columbae genome, produced using a combination of Oxford Nanopore, Illumina, and Hi-C technologies. The final assembly is 208 Mb in length, with 12 chromosome-size scaffolds representing 98.1% of the assembly. For gene model prediction, we used a novel clustering method (wavy_choose) for Oxford Nanopore RNA-seq reads to feed into the MAKER annotation pipeline. High recovery of conserved single-copy orthologs (BUSCOs) suggests that our assembly and annotation are both highly complete and highly accurate. Consistent with the results of the only other assembled louse genome, Pediculus humanus, we find that C. columbae has a relatively low density of repetitive elements, the majority of which are DNA transposons. Also similar to P. humanus, we find a reduced number of genes encoding opsins, G protein-coupled receptors, odorant receptors, insulin signaling pathway components, and detoxification proteins in the C. columbae genome, relative to other insects. We propose that such losses might characterize the genomes of obligate, permanent ectoparasites with predictable habitats, limited foraging complexity, and simple dietary regimes. The sequencing and analysis for this genome were relatively low cost, and took advantage of a new clustering technique for Oxford Nanopore RNA-seq reads that will be useful to future genome projects.


2020
Vol 2020
pp. 1-12
Author(s):  
Jinhua Fu
Sihai Qiao
Yongzhong Huang
Xueming Si
Bin Li
...  

Blockchain is widely used in cryptocurrency, the Internet of Things (IoT), supply chain finance, data sharing, and other fields. However, blockchains suffer from security problems to varying degrees. As an important component of blockchain, the hash function has relatively low computational efficiency. Therefore, this paper proposes a new scheme to optimize the blockchain hashing algorithm based on PRCA (Proactive Reconfigurable Computing Architecture). To improve the performance of the hashing function, the paper implements a pipelined hashing algorithm and optimizes the efficiency of communication facilities and network data transmission by combining blockchains with mimic computers. Meanwhile, to ensure the security of data information, the paper chooses a lightweight hashing algorithm to perform multiple rounds of hashing and transforms the hash algorithm structure as well. The experimental results show that the proposed scheme improves both the security of blockchains and the efficiency of data processing.
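For reference, the sketch below shows the generic idea of applying a hash function multiple times; it uses plain SHA-256 from hashlib purely as an illustration, whereas the paper itself relies on a lightweight algorithm with a modified structure running on PRCA hardware.

```python
import hashlib

def multi_hash(data: bytes, rounds: int = 2) -> bytes:
    """Apply the hash repeatedly; rounds=2 mirrors the double hashing
    some blockchains use to harden block and transaction digests."""
    digest = data
    for _ in range(rounds):
        digest = hashlib.sha256(digest).digest()
    return digest

block_header = b"prev_hash|merkle_root|timestamp|nonce"
print(multi_hash(block_header, rounds=2).hex())
```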


2021
Vol 17 (4)
pp. 1-38
Author(s):  
Takayuki Fukatani
Hieu Hanh Le
Haruo Yokota

With the recent performance improvements in commodity hardware, low-cost commodity-server-based storage has become a practical alternative to dedicated storage appliances. Because of the high failure rate of commodity servers, data redundancy across multiple servers is required in a server-based storage system. However, the extra storage capacity needed for this redundancy significantly increases the system cost. Although erasure coding (EC) is a promising method for reducing the amount of redundant data, it requires distributing and encoding data among servers, and there remains a need to reduce the performance impact of these processes, which involve considerable network traffic and processing overhead. In particular, the performance impact becomes significant for random-access-intensive applications. In this article, we propose a new lightweight redundancy control for server-based storage. Our proposed method uses a new local-filesystem-based approach that avoids distributing data by adding redundancy to locally stored user data. Our method switches the redundancy method for user data between replication and EC according to the workload, improving capacity efficiency while achieving higher performance. Our experiments show up to 230% better online-transaction-processing performance for our method compared with CephFS, a widely used alternative system. We also confirmed that our proposed method prevents unexpected performance degradation while achieving better capacity efficiency.
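To illustrate why switching between replication and EC matters for capacity efficiency, the hypothetical sketch below compares the storage overhead of 3-way replication with a Reed-Solomon-style (k, m) erasure code; the workload-based switching policy is our own toy heuristic, not the paper's algorithm.

```python
def storage_overhead(scheme: str, k: int = 4, m: int = 2, replicas: int = 3) -> float:
    """Bytes stored per byte of user data."""
    if scheme == "replication":
        return float(replicas)       # e.g. 3x for 3-way replication
    if scheme == "erasure":
        return (k + m) / k           # e.g. (4 + 2) / 4 = 1.5x
    raise ValueError(scheme)

def choose_redundancy(random_write_ratio: float) -> str:
    """Toy policy: replication for random-write-heavy (hot) data, where small
    updates are cheap; erasure coding for colder data, where capacity wins."""
    return "replication" if random_write_ratio > 0.5 else "erasure"

for ratio in (0.9, 0.1):
    scheme = choose_redundancy(ratio)
    print(ratio, scheme, storage_overhead(scheme))
```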


2018
Vol 7 (4.5)
pp. 654
Author(s):  
M. S. Satyanarayana
Aruna T.M
Divyaraj G.N

Accidents have become a major issue in developing countries like India nowadays. According to surveys, 60% of accidents happen due to over-speeding. Although the government has taken many initiatives, such as Traffic Awareness and Driving Awareness Week, the percentage of accidents has not been reduced. In this paper, a new technique is introduced to reduce the percentage of accidents. The new technique is implemented using the concept of machine learning [1]. Machine-learning-based systems can be implemented in all vehicles to avoid accidents at low cost [1]. The main objective of this system is to calculate the speed of the vehicle at three locations, chosen according to where the vehicle's speed must be controlled. If the speed is greater than the designated speed for that road, the vehicle automatically detects the problem and the driver is alerted to control the speed of the vehicle; if the speed is less than or equal to the designated speed, the vehicle passes without any disturbance. The system gives a beep sound along with a color indication to the driver in each scenario. Another option implemented in this system is that if the driver is driving at night and feels drowsy, the system detects this immediately and sounds an alarm to wake the driver. Although this system will not avoid 100% of accidents, it will at least reduce their percentage. The system not only helps avoid accidents but also intelligently controls the speed of vehicles and creates awareness among drivers.
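A minimal sketch of the per-checkpoint alerting logic described above (our own illustrative Python with hypothetical speed values, not the authors' implementation):

```python
def check_speed(measured_kmph: float, limit_kmph: float) -> str:
    """Return the alert level for one checkpoint: 'green' if within the
    designated speed, 'red' (beep + warning) if it is exceeded."""
    return "green" if measured_kmph <= limit_kmph else "red"

# Speeds sampled at three checkpoints on a stretch with a 60 km/h limit.
checkpoints = [55.0, 63.5, 71.0]
for i, speed in enumerate(checkpoints, start=1):
    print(f"checkpoint {i}: {speed} km/h -> {check_speed(speed, limit_kmph=60.0)}")
```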

