Optimized Data Replication for Small Files in Cloud Storage Systems

Cloud storage has become an important part of cloud systems. Most current cloud storage systems perform well for large files but handle small files poorly. As cloud services develop, more and more small files are emerging. We therefore propose an optimized data replication approach for small files in cloud storage systems. The approach involves a small file merging algorithm and a block replica placement algorithm. Small files are classified into four types according to their access frequencies, and files are merged into the same block based on the type they belong to. The replica placement algorithm helps improve the access efficiency of small files in a cloud system. Experimental results demonstrate that the proposed approach effectively shortens the time spent reading and writing small files, and that it performs better than two other known data replication algorithms: HAR and SequenceFile.
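
The merging step described in this abstract can be illustrated with a minimal sketch. The abstract does not give the frequency thresholds, the block size, or the packing order, so `THRESHOLDS`, `BLOCK_SIZE`, `classify` and `merge_into_blocks` below are hypothetical names and values chosen only for illustration; this is not the authors' algorithm.

```python
from collections import defaultdict

# Hypothetical cut-offs (accesses per day); the abstract does not state them.
THRESHOLDS = (100, 10, 1)
# Assumed block capacity in bytes.
BLOCK_SIZE = 64 * 1024 * 1024


def classify(access_freq):
    """Return a type index 0..3, where 0 is the most frequently accessed."""
    for idx, limit in enumerate(THRESHOLDS):
        if access_freq >= limit:
            return idx
    return len(THRESHOLDS)


def merge_into_blocks(files, block_size=BLOCK_SIZE):
    """Pack (name, size, access_freq) tuples into blocks, one type per block.

    Returns a list of (type_index, [file names]) pairs.
    """
    by_type = defaultdict(list)
    for name, size, freq in files:
        by_type[classify(freq)].append((name, size))

    blocks = []
    for type_idx in sorted(by_type):
        current, used = [], 0
        for name, size in by_type[type_idx]:
            # Start a new block when the next file would overflow this one.
            if current and used + size > block_size:
                blocks.append((type_idx, current))
                current, used = [], 0
            current.append(name)
            used += size
        if current:
            blocks.append((type_idx, current))
    return blocks
```

Keeping files of one type together means a block's replicas could then be placed according to how often that block is expected to be read, which is the role the abstract assigns to the replica placement algorithm.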

Replica Placement Algorithm for Highly Available Peer-to-Peer Storage Systems

2009 First International Conference on Advances in P2P Systems ◽

10.1109/ap2ps.2009.33 ◽

2009 ◽

Cited By ~ 11

Author(s):

Gyuwon Song ◽

Suhyun Kim ◽

Daeil Seo

Keyword(s):

Storage Systems ◽

Peer To Peer ◽

Replica Placement ◽

Placement Algorithm

CERN Disk Storage Services: Report from last data taking, evolution and future outlook towards Exabyte-scale storage

EPJ Web of Conferences ◽

10.1051/epjconf/202024504038 ◽

2020 ◽

Vol 245 ◽

pp. 04038 ◽

Cited By ~ 1

Author(s):

Luca Mascetti ◽

Maria Arsuaga Rios ◽

Enrico Bocchi ◽

Joao Calado Vicente ◽

Belinda Chan Kwok Cheong ◽

...

Keyword(s):

Data Storage ◽

Cloud Storage ◽

Large Scale ◽

Storage Systems ◽

Distributed Storage ◽

General Purpose ◽

Easy Access ◽

File Storage ◽

Block Storage

The CERN IT Storage group operates multiple distributed storage systems to support all CERN data storage requirements: the physics data generated by LHC and non-LHC experiments; object and file storage for infrastructure services; block storage for the CERN cloud system; filesystems for general use and specialized HPC clusters; a content distribution filesystem for software distribution and condition databases; and sync&share cloud storage for end-user files. The total integrated capacity of these systems exceeds 0.6 Exabyte. Large-scale experiment data taking has been supported by EOS and CASTOR for the last 10+ years. Particular highlights for 2018 include the special Heavy-Ion run, which was the last part of the LHC Run2 Programme: the IT storage systems sustained over 10 GB/s to flawlessly collect and archive more than 13 PB of data in a single month. While tape archival continues to be handled by CASTOR, the effort to migrate the current experiment workflows to the new CERN Tape Archive system (CTA) is underway. The Ceph infrastructure has operated for more than 5 years to provide block storage to the CERN IT private OpenStack cloud, a shared filesystem (CephFS) to HPC clusters and NFS storage to replace commercial Filers. An S3 service was introduced in 2018, following increased user requirements for S3-compatible object storage from physics experiments and IT use-cases. Since its introduction in 2014, CERNBox has become a ubiquitous cloud storage interface for all CERN user groups: physicists, engineers and administration. CERNBox provides easy access to multi-petabyte data stores from a multitude of mobile and desktop devices and all mainstream, modern operating systems (Linux, Windows, macOS, Android, iOS). CERNBox provides synchronized storage for end-users' devices as well as easy sharing for individual users and e-groups.
CERNBox has also become a storage platform to host online applications that process the data, such as SWAN (Service for Web-based Analysis), as well as file editors such as Collabora Online, OnlyOffice, Draw.IO and more. An increasing number of online applications in the Windows infrastructure use CIFS/SMB access to CERNBox files. CVMFS provides software repositories for all experiments across the WLCG infrastructure and has recently been optimized to efficiently handle nightly builds. While AFS continues to provide a general-purpose filesystem for internal CERN users, especially as the $HOME login area on central computing infrastructure, the migration of project and web spaces has significantly advanced. In this paper, we report on the experiences from the last year of LHC Run2 data taking and the evolution of our services in the past year. We will highlight upcoming changes and future improvements and challenges.

Cryptographic Solution for Security Problem in Cloud Computing Storage During Global Pandemics

International Journal of Safety and Security Engineering ◽

10.18280/ijsse.110208 ◽

2021 ◽

Vol 11 (2) ◽

pp. 193-199

Author(s):

Anuj Kumar Yadav ◽

Ritika ◽

Madan Garg

Keyword(s):

Cloud Computing ◽

Data Security ◽

Cloud Storage ◽

Storage Systems ◽

Cloud Services ◽

Work From Home ◽

Computing Services ◽

Cloud Storage Service ◽

User Data

Cloud computing has emerged as a potential substitute for traditional computing systems during the COVID-19 pandemic. Almost all organizations have shifted their work from conventional modes to online modes. Most organizations are planning to permanently move some percentage of their work to online WFH (Work from Home) mode. There are numerous benefits of using cloud services in terms of cost, portability, platform independence, accessibility, elasticity, etc. But security is the biggest barrier when one wants to move towards cloud computing services, especially the cloud storage service. To overcome the problem of security in cloud storage systems, we present an approach for data security in cloud storage. The proposed approach uses cryptographic methods and provides security and monitoring features for the user data stored in cloud storage systems. It continuously monitors users' data for any kind of modification by attackers. Thus, the approach not only provides data security but also improves users' trust in cloud-based storage services.
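
The abstract does not name the cryptographic method used for monitoring. As one possible illustration of detecting modification of stored data, the sketch below uses a keyed hash (HMAC-SHA256) from the Python standard library. The choice of HMAC and the helper names `make_tag` and `is_unmodified` are assumptions made here, not the authors' scheme.

```python
import hashlib
import hmac


def make_tag(key: bytes, data: bytes) -> str:
    """Compute an HMAC-SHA256 tag to store alongside the uploaded data."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def is_unmodified(key: bytes, data: bytes, stored_tag: str) -> bool:
    """Recompute the tag and compare it in constant time with the stored one."""
    return hmac.compare_digest(make_tag(key, data), stored_tag)
```

A monitor holding the key could periodically call `is_unmodified` on each stored object and raise an alert when it returns `False`. Because the tag depends on a secret key, an attacker who alters the data cannot recompute a matching tag without that key.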

Reliability of Clustered vs. Declustered Replica Placement in Data Storage Systems

2011 IEEE 19th Annual International Symposium on Modelling, Analysis, and Simulation of Computer and Telecommunication Systems ◽

10.1109/mascots.2011.53 ◽

2011 ◽

Cited By ~ 8

Author(s):

Vinodh Venkatesan ◽

Ilias Iliadis ◽

Christina Fragouli ◽

Rudiger Urbanke

Keyword(s):

Data Storage ◽

Storage Systems ◽

Replica Placement

SLA-Aware Multi-Criteria Data Placement in Cloud Storage Systems

IEEE Access ◽

10.1109/access.2021.3071325 ◽

2021 ◽

pp. 1-1

Author(s):

M. Maghsoudloo ◽

A. Rahdari ◽

N. Khoshavi

Keyword(s):

Cloud Storage ◽

Storage Systems ◽

Data Placement

Secure Password-Protected Encryption Key for Deduplicated Cloud Storage Systems

IEEE Transactions on Dependable and Secure Computing ◽

10.1109/tdsc.2021.3074146 ◽

2021 ◽

pp. 1-1

Author(s):

Yuan Zhang ◽

Chunxiang Xu ◽

Nan Cheng ◽

Xuemin Sherman Shen

Keyword(s):

Cloud Storage ◽

Storage Systems

State Management for Cloud-Native Applications

Electronics ◽

10.3390/electronics10040423 ◽

2021 ◽

Vol 10 (4) ◽

pp. 423

Author(s):

Márk Szalay ◽

Péter Mátray ◽

László Toka

Keyword(s):

Large Scale ◽

Distributed Databases ◽

Access Time ◽

Replica Placement ◽

State Management ◽

Placement Decisions ◽

Dynamic Replication ◽

Cloud Databases ◽

Placement Algorithm

The stateless cloud-native design improves the elasticity and reliability of applications running in the cloud. The design decouples the life-cycle of application states from that of application instances; states are written to and read from cloud databases, and deployed close to the application code to ensure low latency bounds on state access. However, scaling such applications exposes the well-known limitations of the distributed databases in which the states are stored. In this paper, we propose a full-fledged state layer that supports the stateless cloud application design. In order to minimize the inter-host communication due to state externalization, we propose, on the one hand, a system design together with a data placement algorithm that places functions' states across the hosts of a data center. On the other hand, we design a dynamic replication module that decides the proper number of copies for each state to strike a balance between short state-access time and low network traffic. We evaluate the proposed methods across realistic scenarios. We show that our solution yields state-access delays close to the optimal, and ensures fast replica placement decisions in large-scale settings.
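
The trade-off between access time and network traffic that the replication module addresses can be illustrated with a toy cost model. The model below is an assumption made here, not the paper's method: each read from a host without a local copy costs one network message, and each write costs one extra message per additional replica. The names `choose_replica_hosts` and `network_cost` are hypothetical.

```python
def choose_replica_hosts(reads_per_host, writes):
    """Pick the hosts that should hold a copy of one state.

    reads_per_host maps a host name to its read count; writes is the total
    write count. A host gets a copy only if the remote reads it saves
    outnumber the extra write propagation it causes.
    """
    ranked = sorted(reads_per_host, key=reads_per_host.get, reverse=True)
    chosen = ranked[:1]  # always keep one copy, at the busiest reader
    for host in ranked[1:]:
        if reads_per_host[host] > writes:
            chosen.append(host)
    return chosen


def network_cost(reads_per_host, writes, chosen):
    """Count messages under the toy model: remote reads plus write fan-out."""
    remote_reads = sum(n for h, n in reads_per_host.items() if h not in chosen)
    return remote_reads + writes * (len(chosen) - 1)
```

Under this model a read-heavy state is copied to every host that reads it often, while a write-heavy state stays on a single host, which is the kind of per-state decision on the number of copies that the abstract describes.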
