Testing of complex, large-scale distributed storage systems: a CERN disk storage case study

2019, Vol 214, pp. 05008
Author(s):  
Jozsef Makai ◽  
Andreas Joachim Peters ◽  
Georgios Bitzes ◽  
Elvin Alin Sindrilaru ◽  
Michal Kamil Simon ◽  
...  

Complex, large-scale distributed systems are frequently used to solve extraordinary computing, storage and other problems. However, the development of these systems usually requires working with several software components, maintaining and improving a large codebase and providing a collaborative environment for many developers working together. The central role that such complex systems play in mission-critical tasks and in the daily activity of their users means that any software bug affecting the availability of the service has far-reaching effects. Providing an easily extensible testing framework is a prerequisite for building confidence both in the system and among the developers who contribute to its code. The testing framework can address concrete bugs found in the codebase, thus avoiding future regressions, and also provides a high degree of confidence for the people contributing new code. Easily incorporating other people's work into the project greatly helps scale out manpower, so that having more developers contribute to the project actually results in more work being done rather than more bugs being added. In this paper we go through the case study of EOS, the CERN disk storage system, and introduce the methods and mechanisms for achieving fully automatic regression and robustness testing, along with continuous integration, for such a large-scale, complex and critical system using a container-based environment.
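The container-based testing approach described above can be pictured with a minimal sketch like the following, which launches each regression test inside a throw-away container and fails the CI job if any test fails. This is an illustration only, not the actual EOS CI pipeline: the image name, test script paths and timeout are assumptions.

```python
# Minimal sketch of container-based regression testing; the image name,
# test script paths and timeout below are assumptions, not EOS specifics.
import subprocess
import sys

REGRESSION_TESTS = [
    "tests/replication_test.sh",   # hypothetical test scripts
    "tests/namespace_test.sh",
]

def run_in_container(image: str, test_script: str, timeout: int = 600) -> bool:
    """Run one regression test inside a throw-away container."""
    cmd = ["docker", "run", "--rm", image, "bash", test_script]
    try:
        return subprocess.run(cmd, timeout=timeout).returncode == 0
    except subprocess.TimeoutExpired:
        return False

if __name__ == "__main__":
    failed = [t for t in REGRESSION_TESTS
              if not run_in_container("eos-test:latest", t)]
    # A non-zero exit code makes the CI job fail, blocking the merge.
    sys.exit(1 if failed else 0)
```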

2019, Vol 214, pp. 04033
Author(s):  
Hervé Rousseau ◽  
Belinda Chan Kwok Cheong ◽  
Cristian Contescu ◽  
Xavier Espinal Curull ◽  
Jan Iven ◽  
...  

The CERN IT Storage group operates multiple distributed storage systems and is responsible for supporting the infrastructure that accommodates all CERN storage requirements, from the physics data generated by LHC and non-LHC experiments to users' personal files. EOS is now the key component of the CERN storage strategy. It allows operating at high incoming throughput for experiment data-taking while running concurrent, complex production workloads. This high-performance distributed storage system now provides more than 250 PB of raw disk space and is the key component behind the success of CERNBox, the CERN cloud synchronisation service, which allows syncing and sharing files on all major mobile and desktop platforms and provides offline availability for any data stored in the EOS infrastructure. CERNBox has recorded exponential growth over the last couple of years in terms of files and data stored, thanks to its increasing popularity within the CERN user community and its integration with a multitude of other CERN services (Batch, SWAN, Microsoft Office). In parallel, CASTOR is being simplified and is transitioning from an HSM into an archival system, focusing mainly on the long-term recording of the primary data from the detectors and preparing the road to the next-generation tape archival system, CTA. The storage services at CERN also cover the needs of the rest of our community: Ceph as the data back-end for the CERN OpenStack infrastructure, NFS services and S3 functionality; AFS for legacy home-directory filesystem services and its ongoing phase-out; and CVMFS for software distribution. In this paper we summarise our experience in supporting all our distributed storage systems and the ongoing work in evolving our infrastructure, testing very dense storage building blocks (nodes with more than 1 PB of raw space) for the challenges ahead.


2011, Vol 51 (2), pp. 707
Author(s):  
Peter Goode

There is an estimated $200 billion worth of capital expenditure presently planned for Australian gas projects. These projects provide the potential for $20 billion worth of engineering and maintenance opportunities for Australian companies and an estimated 16,000 ongoing positions in the sector. The scale of these projects has drawn international attention and is increasingly drawing global competition. Australian companies are at risk of the misperception that they don’t have the international know-how or the people to compete for these large-scale projects. We need to ensure that our Australian ingenuity and scale continue to position us as the service provider of choice for construction, project management and maintenance opportunities. Working together with industry, we have shown that we have what it takes to compete on a global scale. We also need to work with government and unions to ensure we have a scalable, highly skilled workforce available to support these projects. This presentation considers the following case study: Transfield Services delivers services to companies including Woodside Energy, which operates the A$27 billion North West Shelf project, one of the world’s largest LNG production facilities with an output of 16.4 million tonnes of LNG a year. While expansion continues, ongoing brownfield project and maintenance services demand the ongoing support of a highly skilled workforce of up to 1,000 people. This case study explores: innovative service solutions in a resource-scarce environment through access to global resources; innovative scheduling of work; and the challenges of sourcing and retaining highly skilled people by improving the opportunities for global and domestic employee mobility and investing in training and developing local people.


2013, Vol 5 (1), pp. 53-69
Author(s):  
Jacques Jorda ◽  
Aurélien Ortiz ◽  
Abdelaziz M’zoughi ◽  
Salam Traboulsi

Grid computing is commonly used for large-scale applications requiring huge computation capabilities. In such distributed architectures, data storage on the distributed storage resources must be handled by a dedicated storage system to ensure the required quality of service. In order to simplify data placement on nodes and to increase application performance, a storage virtualization layer can be used. This layer can be a single parallel filesystem (like GPFS) or a more complex middleware. The latter is preferred as it allows data placement on the nodes to be tuned to increase both the reliability and the performance of data access. Thus, in such a middleware, a dedicated monitoring system must be used to ensure optimal performance. In this paper, the authors briefly introduce the Visage middleware, a middleware for storage virtualization. They present the most broadly used grid monitoring systems and explain why they are not adequate for virtualized storage monitoring. The authors then present the architecture of their monitoring system dedicated to storage virtualization. They introduce the workload prediction model used to select the best node for data placement, and demonstrate its accuracy in a simple experiment.
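As a rough illustration of how a workload prediction model might select a placement node, the sketch below keeps an exponentially weighted moving average of each node's observed load and picks the node expected to be least loaded. This is a hypothetical example only; it does not reproduce the actual Visage prediction model, and the node names and smoothing factor are assumptions.

```python
# Illustrative sketch of load-based node selection; not the Visage model.
from collections import defaultdict

class LoadPredictor:
    def __init__(self, alpha: float = 0.5):
        self.alpha = alpha                   # smoothing factor (assumed)
        self.predicted = defaultdict(float)  # node -> predicted load

    def observe(self, node: str, load: float) -> None:
        """Fold a new load measurement into the node's prediction."""
        self.predicted[node] = (
            self.alpha * load + (1 - self.alpha) * self.predicted[node]
        )

    def best_node(self) -> str:
        """Return the node expected to be least loaded for the next placement."""
        return min(self.predicted, key=self.predicted.get)

predictor = LoadPredictor()
for node, load in [("node-a", 0.7), ("node-b", 0.3), ("node-c", 0.5)]:
    predictor.observe(node, load)
print(predictor.best_node())   # -> "node-b"
```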


Author(s):  
Yaxing Wei ◽  
Liping Di ◽  
Guangxuan Liao ◽  
Baohua Zhao ◽  
Aijun Chen ◽  
...  

With the rapid accumulation of geospatial data and the advancement of geoscience, there is a critical requirement for an infrastructure that can integrate large-scale, heterogeneous, and distributed storage systems for the sharing of geospatial data within multiple user communities. This article probes the feasibility of sharing distributed geospatial data through Grid computing technology by introducing several major issues faced by geospatial data sharing (including system heterogeneity, a uniform mechanism to publish and discover geospatial data, performance, and security) and how Grid technology can help to solve these issues. Some recent research efforts, such as ESG and the Data Grid system in GMU CSISS, have proven that Grid technology provides a large-scale infrastructure that can seamlessly integrate dispersed geospatial data and provide uniform and efficient ways to access the data.
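To make the idea of a uniform mechanism to publish and discover geospatial data concrete, the sketch below shows a toy catalog with publish and bounding-box discovery operations. It is a conceptual illustration only; it does not reproduce the ESG or GMU CSISS Data Grid interfaces, and the record fields and URL are assumptions.

```python
# Toy publish/discover catalog for geospatial datasets; fields and URL
# are assumptions, not an actual Grid catalog API.
from dataclasses import dataclass, field

@dataclass
class DatasetRecord:
    identifier: str
    bounding_box: tuple   # (min_lon, min_lat, max_lon, max_lat)
    storage_url: str      # where the distributed replica lives

@dataclass
class Catalog:
    records: dict = field(default_factory=dict)

    def publish(self, record: DatasetRecord) -> None:
        """Register a dataset so any community can discover it."""
        self.records[record.identifier] = record

    def discover(self, min_lon, min_lat, max_lon, max_lat):
        """Return datasets whose bounding boxes intersect the query box."""
        def intersects(b):
            return not (b[2] < min_lon or b[0] > max_lon
                        or b[3] < min_lat or b[1] > max_lat)
        return [r for r in self.records.values() if intersects(r.bounding_box)]

catalog = Catalog()
catalog.publish(DatasetRecord("landsat-scene-42", (-10, 35, 5, 45),
                              "gsiftp://node1.example.org/data/scene42"))
print([r.identifier for r in catalog.discover(0, 40, 10, 50)])
```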


2018, Vol 9 (2), pp. 28-37
Author(s):  
Nur Hidayat Sardini

Morality and politics are two things that cannot be separated. The marriage of a regent in Garut regency, Indonesia, to an underage girl eventually led to community action, with people demanding that the regent resign from his position as a regional head. Not even four days into the marriage, the regent divorced his young wife via a short message sent from his own mobile phone. The people of Garut therefore expressed their wrath through a large-scale demonstration which pushed the Local House of Representatives to immediately process the regent’s removal. This research utilized a qualitative approach with a case-study method; the data relied on in-depth interviews, observations, and documentary sources. This research observed that the general factor underlying the Garut demonstrations demanding the regent’s resignation was the changing political climate of democratization at the national level, which also affected Garut Regency. This national political climate change increased unconventional public participation in Garut and provided a political sphere for non-state actors to establish a balance of political involvement between state actors and non-state actors. On the other hand, the specific underlying factor in this case was the regent’s behavior, which was judged a dishonorable humiliation of women’s dignity, especially his remarks on several national television channels. The strength of this study lies in its novelty and originality, treating moral and ethical behavior as a new object in the study of social movements.


In cloud-based Big Data applications, Hadoop has been widely adopted for distributed processing of large-scale data sets. However, the wasted energy consumption of data centers still constitutes an important axis of research due to overuse of resources and extra overhead costs. Dynamic scaling of resources in a Hadoop YARN cluster is a practical way to overcome this challenge. This paper proposes a dynamic scaling approach in Hadoop YARN (DSHYARN) to add or remove nodes automatically based on workload. It is based on two algorithms (scaling up and scaling down) which are implemented to automate the scaling process in the cluster. The article aims to assure the energy efficiency and performance of Hadoop YARN clusters. To validate the effectiveness of DSHYARN, a case study of sentiment analysis on tweets about the COVID-19 vaccine is provided; the goal is to analyze tweets posted by people on Twitter. The results showed improvements in CPU utilization, RAM utilization and job completion time. In addition, energy consumption was reduced by 16% under an average workload.
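A threshold-based scaling decision in the spirit of DSHYARN could look like the sketch below, which adds a node when cluster utilisation is high and removes one when it is low. The thresholds, node limits and metric values are assumptions, not the paper's actual scaling-up/scaling-down algorithms.

```python
# Illustrative threshold-based scale-up/scale-down decision; thresholds and
# node limits are assumptions, not DSHYARN's published algorithms.
SCALE_UP_THRESHOLD = 0.80    # add a node above this memory utilisation
SCALE_DOWN_THRESHOLD = 0.30  # remove a node below this utilisation
MIN_NODES, MAX_NODES = 3, 20

def autoscale_step(utilisation: float, nodes: int) -> int:
    """Return the new node count after one scaling decision."""
    if utilisation > SCALE_UP_THRESHOLD and nodes < MAX_NODES:
        return nodes + 1            # e.g. provision a VM, start a NodeManager
    if utilisation < SCALE_DOWN_THRESHOLD and nodes > MIN_NODES:
        return nodes - 1            # e.g. decommission the least-loaded node
    return nodes

# Drive the decision with sample utilisation readings (assumed values).
nodes = MIN_NODES
for util in [0.85, 0.90, 0.60, 0.25, 0.20]:
    nodes = autoscale_step(util, nodes)
    print(f"utilisation={util:.2f} -> {nodes} nodes")
```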

