Optimized Data Replication for Small Files in Cloud Storage Systems

2016 ◽  
Vol 2016 ◽  
pp. 1-8 ◽  
Author(s):  
Xiong Fu ◽  
Wenjie Liu ◽  
Yeliang Cang ◽  
Xiaojie Gong ◽  
Song Deng

Cloud storage has become an important part of cloud systems. Most current cloud storage systems perform well for large files but cannot manage small-file storage appropriately. With the development of cloud services, more and more small files are emerging. Therefore, we propose an optimized data replication approach for small files in cloud storage systems. The approach involves a small file merging algorithm and a block replica placement algorithm. Small files are classified into four types according to their access frequencies, and a number of small files are merged into the same block based on the type they belong to. The replica placement algorithm helps to improve the access efficiency of small files in a cloud system. Experimental results demonstrate that our proposed approach can effectively shorten the time spent reading and writing small files, and that it performs better than two other known data replication algorithms: HAR and SequenceFile.
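
As a rough illustration of the merging idea described in the abstract above, the following Python sketch classifies small files into four access-frequency classes and packs files of the same class into blocks. The thresholds, the block size, the first-fit packing rule, and the names `classify` and `merge_into_blocks` are assumptions made for illustration; the abstract does not specify them, and this is not the authors' algorithm.

```python
from collections import defaultdict

# Hypothetical thresholds (accesses per day); the abstract gives no values.
THRESHOLDS = (100, 10, 1)
BLOCK_SIZE = 64 * 1024 * 1024  # assumed block size in bytes


def classify(access_freq):
    """Return a class index 0..3 (0 = most frequently accessed)."""
    for idx, limit in enumerate(THRESHOLDS):
        if access_freq >= limit:
            return idx
    return len(THRESHOLDS)


def merge_into_blocks(files, block_size=BLOCK_SIZE):
    """Group (name, size, access_freq) tuples by class, then pack each
    class into blocks with a first-fit pass in input order."""
    by_class = defaultdict(list)
    for name, size, freq in files:
        by_class[classify(freq)].append((name, size))

    blocks = []  # each block: {"cls": int, "files": [names], "used": bytes}
    for cls in sorted(by_class):
        open_blocks = []
        for name, size in by_class[cls]:
            for block in open_blocks:
                if block["used"] + size <= block_size:
                    block["files"].append(name)
                    block["used"] += size
                    break
            else:
                # No existing block of this class has room: open a new one.
                open_blocks.append({"cls": cls, "files": [name], "used": size})
        blocks.extend(open_blocks)
    return blocks
```

Because a block only ever holds files of one class, a replica placement step could then treat each block as uniformly "hot" or "cold"; how the paper actually places replicas is not described in the abstract.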

2020 ◽  
Vol 245 ◽  
pp. 04038 ◽  
Author(s):  
Luca Mascetti ◽  
Maria Arsuaga Rios ◽  
Enrico Bocchi ◽  
Joao Calado Vicente ◽  
Belinda Chan Kwok Cheong ◽  
...  

The CERN IT Storage group operates multiple distributed storage systems to support all CERN data storage requirements: the physics data generated by LHC and non-LHC experiments; object and file storage for infrastructure services; block storage for the CERN cloud system; filesystems for general use and specialized HPC clusters; a content distribution filesystem for software distribution and condition databases; and sync&share cloud storage for end-user files. The total integrated capacity of these systems exceeds 0.6 Exabyte. Large-scale experiment data taking has been supported by EOS and CASTOR for the last 10+ years. Particular highlights for 2018 include the special HeavyIon run, which was the last part of the LHC Run2 Programme: the IT storage systems sustained over 10 GB/s to flawlessly collect and archive more than 13 PB of data in a single month. While tape archival continues to be handled by CASTOR, the effort to migrate the current experiment workflows to the new CERN Tape Archive system (CTA) is underway. The Ceph infrastructure has operated for more than 5 years to provide block storage to the CERN IT private OpenStack cloud, a shared filesystem (CephFS) to HPC clusters and NFS storage to replace commercial Filers. An S3 service was introduced in 2018, following increased user requirements for S3-compatible object storage from physics experiments and IT use-cases. Since its introduction in 2014, CERNBox has become a ubiquitous cloud storage interface for all CERN user groups: physicists, engineers and administration. CERNBox provides easy access to multi-petabyte data stores from a multitude of mobile and desktop devices and all mainstream, modern operating systems (Linux, Windows, macOS, Android, iOS). CERNBox provides synchronized storage for end-user devices as well as easy sharing for individual users and e-groups. CERNBox has also become a storage platform to host online applications that process the data, such as SWAN (Service for Web-based Analysis), as well as file editors such as Collabora Online, Only Office, Draw.IO and more. An increasing number of online applications in the Windows infrastructure use CIFS/SMB access to CERNBox files. CVMFS provides software repositories for all experiments across the WLCG infrastructure and has recently been optimized to efficiently handle nightly builds. While AFS continues to provide a general-purpose filesystem for internal CERN users, especially as the $HOME login area on central computing infrastructure, the migration of project and web spaces has significantly advanced. In this paper, we report on the experiences from the last year of LHC Run2 data taking and the evolution of our services in the past year. We will highlight upcoming changes and future improvements and challenges.


2021 ◽  
Vol 11 (2) ◽  
pp. 193-199
Author(s):  
Anuj Kumar Yadav ◽  
Ritika ◽  
Madan Garg

Cloud computing emerged as a potential substitute for traditional computing systems during the COVID-19 pandemic. Almost all organizations have shifted from conventional ways of working to online working, and most are planning to permanently move some percentage of their work to online WFH (Work from Home) mode. There are numerous benefits of using cloud services in terms of cost, portability, platform independence, accessibility, elasticity, etc. But security is the biggest barrier when one wants to move towards cloud computing services, especially the cloud storage service. To overcome the problem of security in cloud storage systems, we present an approach for data security in cloud storage. The proposed approach uses cryptographic methods and provides security and monitoring features for user data stored in cloud storage systems. It continuously monitors users' data for any kind of modification by attackers. Thus, the approach not only provides data security but also improves users' trust in cloud-based storage services.
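
One common way to detect modification of stored data, shown here only as a minimal sketch, is to keep a keyed hash (HMAC) of each object and recompute it during monitoring. The abstract above does not name the cryptographic methods it uses, so the choice of HMAC-SHA256 and the function names `make_tag` and `is_unmodified` are assumptions for illustration.

```python
import hashlib
import hmac


def make_tag(key: bytes, data: bytes) -> str:
    """Compute an HMAC-SHA256 tag for the data under a secret key."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def is_unmodified(key: bytes, data: bytes, stored_tag: str) -> bool:
    """Recompute the tag and compare it with the stored one in constant time."""
    return hmac.compare_digest(make_tag(key, data), stored_tag)


# Usage: the owner keeps the key and the tag; the cloud stores only the data.
key = b"hypothetical-secret-key"
tag = make_tag(key, b"quarterly report")
print(is_unmodified(key, b"quarterly report", tag))   # data intact
print(is_unmodified(key, b"quarterly report!", tag))  # data altered
```

A monitoring job could run the second function periodically over stored objects; an attacker without the key cannot produce a matching tag for altered data.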


IEEE Access ◽  
2021 ◽  
pp. 1-1
Author(s):  
M. Maghsoudloo ◽  
A. Rahdari ◽  
N. Khoshavi

Author(s):  
Yuan Zhang ◽  
Chunxiang Xu ◽  
Nan Cheng ◽  
Xuemin Sherman Shen

Electronics ◽  
2021 ◽  
Vol 10 (4) ◽  
pp. 423
Author(s):  
Márk Szalay ◽  
Péter Mátray ◽  
László Toka

The stateless cloud-native design improves the elasticity and reliability of applications running in the cloud. The design decouples the life-cycle of application states from that of application instances; states are written to and read from cloud databases, and deployed close to the application code to ensure low latency bounds on state access. However, scaling such applications runs into the well-known limitations of the distributed databases in which the states are stored. In this paper, we propose a full-fledged state layer that supports the stateless cloud application design. In order to minimize the inter-host communication due to state externalization, we propose, on the one hand, a system design jointly with a data placement algorithm that places functions’ states across the hosts of a data center. On the other hand, we design a dynamic replication module that decides the proper number of copies for each state to strike a balance between short state-access time and low network traffic. We evaluate the proposed methods across realistic scenarios. We show that our solution yields state-access delays close to the optimal, and ensures fast replica placement decisions in large-scale settings.

