Multi-cloud performance and security driven federated workflow management

2016 ◽  
Author(s):  
◽  
Matthew Dickinson

In recent years, most scientific research in both academia and industry has become increasingly data-driven. According to market estimates, spending related to supporting scientific data-intensive research is expected to increase to $5.8 billion by 2018. Particularly for data-intensive scientific fields such as bioscience or particle physics within academic environments, data storage/processing facilities, expert collaborators, and specialized computing resources do not always reside within campus boundaries. With the growing trend of large collaborative partnerships involving researchers, expensive scientific instruments, and high performance computing centers, experiments and simulations produce petabytes of data, viz., Big Data, that is likely to be shared and analyzed by scientists in multi-disciplinary areas. Federated multi-cloud resource allocation for data-intensive application workflows is generally performed based on performance or quality-of-service (i.e., QSpecs) considerations. At the same time, the end-to-end security requirements of these workflows across multiple domains are treated as an afterthought due to a lack of standardized formalization methods. Consequently, diverse/heterogeneous domain resource and security policies cause conflicts between an application's security and performance requirements, leading to sub-optimal resource allocations, especially when multiple such applications contend for limited resources. In this thesis, a joint performance- and security-driven federated resource allocation scheme for data-intensive scientific applications is presented. In order to aid joint resource brokering among multi-cloud domains with diverse/heterogeneous security postures, a definition and characterization of a data-intensive application's security specifications (i.e., SSpecs) is required. Next, an alignment technique inspired by Portunes Algebra is presented to homogenize the various domain resource policies (i.e., RSpecs) along an application's workflow lifecycle stages. Using such formalization and alignment, a near-optimal, cost-aware, joint QSpecs-SSpecs-driven and RSpecs-compliant resource allocation algorithm for multi-cloud computing domain/location selection as well as network path selection is proposed. We implement our security formalization, alignment, and allocation scheme as a framework, viz., "OnTimeURB", and validate it in a multi-cloud environment with exemplar data-intensive application workflows involving distributed computing and remote instrumentation use cases with different performance and security requirements.
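To make the allocation idea concrete, the following is a minimal Python sketch of how a joint QSpecs/SSpecs-driven, RSpecs-compliant domain selection could be modeled. It is not the OnTimeURB implementation; all class names, fields, policy labels, and example values are illustrative assumptions.

# Minimal illustrative sketch of joint QSpecs/SSpecs-driven, RSpecs-compliant
# domain selection. Names, fields, and values are hypothetical, not OnTimeURB's API.
from dataclasses import dataclass, field

@dataclass
class Domain:
    name: str
    cost_per_hour: float          # monetary cost of compute in this domain
    latency_ms: float             # network path latency to the data source
    security_level: int           # e.g., 1 = baseline, 3 = hardened enclave
    policies: set = field(default_factory=set)  # policies the domain enforces (RSpecs)

@dataclass
class Workflow:
    max_latency_ms: float         # QSpecs: performance requirement
    min_security_level: int       # SSpecs: security requirement
    required_policies: set        # RSpecs the workflow must comply with

def select_domain(workflow, domains):
    """Return the cheapest domain that satisfies QSpecs, SSpecs, and RSpecs."""
    feasible = [
        d for d in domains
        if d.latency_ms <= workflow.max_latency_ms           # QSpecs check
        and d.security_level >= workflow.min_security_level  # SSpecs check
        and workflow.required_policies <= d.policies         # RSpecs compliance
    ]
    return min(feasible, key=lambda d: d.cost_per_hour, default=None)

domains = [
    Domain("campus-hpc", 0.08, 5.0, 2, {"encrypt-at-rest"}),
    Domain("public-cloud", 0.12, 40.0, 3, {"encrypt-at-rest", "audit-log"}),
]
wf = Workflow(max_latency_ms=50.0, min_security_level=3,
              required_policies={"encrypt-at-rest", "audit-log"})
print(select_domain(wf, domains).name)  # -> public-cloud

In this toy model a domain is feasible only if it meets the performance bound (QSpecs), the security level (SSpecs), and the policy set (RSpecs), and the cheapest feasible domain wins, mirroring the cost-aware selection described above.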

Author(s):  
Vijayalakshmi Saravanan ◽  
Anpalagan Alagan ◽  
Isaac Woungang

With the advent of novel wireless technologies and Cloud Computing, large volumes of data are being produced by various heterogeneous devices such as mobile phones, credit cards, and computers. Managing this data has become the de-facto challenge in current Information Systems. Although processor clock speeds are no longer doubling as Moore's law once suggested, processing power continues to grow rapidly, which leads to new data-intensive scientific problems in every field, especially in the Big Data domain. The revolution of Big Data lies in improved statistical analysis and computational power, both of which depend on processing speed. Hence, putting massively multi-core systems on the job is vital in order to overcome the physical limits of complexity and speed. Big Data also brings many challenges, such as difficulties in capturing, storing, and analyzing massive application data. This chapter discusses some of the Big Data architectural challenges from the perspective of multi-core processors.


Author(s):  
K. Kalyana Chakravarthi ◽  
Vaidhehi Vijayakumar

In the modern era, workflows have been adopted as a powerful and attractive paradigm for expressing and solving a variety of applications, including scientific, data-intensive computing, and big data applications such as MapReduce and Hadoop. These complex applications are described using high-level representations in workflow methods. With the emerging model of cloud computing technology, scheduling in the cloud has become an important research topic. Consequently, the workflow scheduling problem has been studied extensively over the past few years, from homogeneous clusters and grids to the most recent paradigm, cloud computing. The challenges that need to be addressed lie in task-resource mapping, QoS requirements, resource provisioning, performance fluctuation, failure handling, resource scheduling, and data storage. This work presents a complete study of resource provisioning and scheduling algorithms in the cloud environment, focusing on Infrastructure as a Service (IaaS). We provide a comprehensive understanding of existing scheduling techniques and offer insight into research challenges that point to possible future directions for researchers.
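As a concrete illustration of the task-resource mapping problem mentioned above, the following minimal Python sketch applies a generic greedy earliest-finish-time heuristic to map workflow tasks onto provisioned VMs. It is not a specific algorithm surveyed in this work, and the task sizes and VM speeds are hypothetical.

# Illustrative sketch of a common IaaS scheduling heuristic: greedy
# earliest-finish-time mapping of independent workflow tasks to VMs.
def schedule(tasks, vms):
    """tasks: list of (task_id, workload) pairs, workload in abstract compute units.
    vms: dict vm_id -> speed (units per second). Returns task_id -> (vm_id, finish time)."""
    ready_at = {vm: 0.0 for vm in vms}      # when each VM next becomes free
    placement = {}
    # Place the largest tasks first so long tasks do not land on slow, busy VMs.
    for task_id, workload in sorted(tasks, key=lambda t: -t[1]):
        # Pick the VM that would finish this task earliest.
        best_vm = min(vms, key=lambda v: ready_at[v] + workload / vms[v])
        finish = ready_at[best_vm] + workload / vms[best_vm]
        ready_at[best_vm] = finish
        placement[task_id] = (best_vm, finish)
    return placement

print(schedule([("t1", 100), ("t2", 40), ("t3", 60)],
               {"vm-small": 1.0, "vm-large": 2.5}))

Real workflow schedulers must additionally handle the task dependencies, QoS constraints, provisioning delays, and failures discussed above; this sketch only shows the core mapping step.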


2020 ◽  
Vol 1690 ◽  
pp. 012166
Author(s):  
A Alekseev ◽  
A Kiryanov ◽  
A Klimentov ◽  
T Korchuganova ◽  
V Mitsyn ◽  
...  

Author(s):  
Magali Roux

E-sciences are data-intensive sciences that make extensive use of the Web to share, collect, and process data. In this context, primary scientific data has become a new and challenging issue, as data must be extensively described (1) to account for the empirical conditions and results that allow interpretation and/or analysis and (2) to be understandable by the computers used for data storage and information retrieval. In this respect, metadata is a focal point, whether considered from the point of view of users who visualize and exploit data or of the search tools that find and retrieve information. Numerous disciplines are concerned with the issues of describing complex observations and addressing pertinent knowledge. In this paper, similarities and differences in data description and exploration strategies across disciplines in e-sciences are examined.
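As a concrete illustration of such machine-readable description, the following minimal Python sketch builds a metadata record covering (1) empirical conditions needed for interpretation and (2) provenance information useful for storage and retrieval. The field names and values are illustrative assumptions and do not follow any particular metadata standard.

# Minimal illustrative metadata record for a primary scientific observation.
# Field names are hypothetical; in practice a standard vocabulary or domain
# ontology would be used.
import json

record = {
    "dataset_id": "obs-2013-0042",
    "discipline": "bioscience",
    "empirical_conditions": {           # (1) context needed to interpret results
        "instrument": "confocal microscope",
        "temperature_c": 37.0,
        "sample_preparation": "fixed, DAPI-stained",
    },
    "results_summary": "nuclear staining intensity per cell",
    "provenance": {                     # (2) machine-readable description for
        "collected_by": "lab-12",       #     storage and information retrieval
        "collected_on": "2013-05-14",
        "processing_pipeline": "v1.3",
    },
    "keywords": ["microscopy", "cell nucleus", "image analysis"],
}

print(json.dumps(record, indent=2))     # serialized form a repository could index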


2020 ◽  
Vol 35 (33) ◽  
pp. 2030022
Author(s):  
Aleksandr Alekseev ◽  
Simone Campana ◽  
Xavier Espinal ◽  
Stephane Jezequel ◽  
Andrey Kirianov ◽  
...  

The experiments at CERN’s Large Hadron Collider use the Worldwide LHC Computing Grid (WLCG) as their distributed computing infrastructure. Through distributed workload and data management systems, the WLCG provides seamless access to hundreds of grid, HPC, and cloud-based computing and storage resources, distributed worldwide, to thousands of physicists. The LHC experiments annually process more than an exabyte of data using an average of 500,000 distributed CPU cores, enabling hundreds of new scientific results from the collider. However, the resources available to the experiments have been insufficient to meet data processing, simulation, and analysis needs over the past five years as the volume of data from the LHC has grown. The problem will be even more severe in the next LHC phases: the High Luminosity LHC will be a multi-exabyte challenge where the envisaged storage and compute needs are a factor of 10 to 100 above the expected technology evolution. The particle physics community needs to evolve its current computing and data organization models in order to change the way it uses and manages the infrastructure, focusing on optimizations that improve performance and efficiency while also simplifying operations. In this paper we highlight a recent R&D project related to the scientific data lake and federated data storage.


Author(s):  
Hyun Jun Kim ◽  
Ye Seul Son ◽  
Joon Tae Kim

An amendment to this paper has been published and can be accessed via the original article.

