Review on failure forecast in cloud for a fault tolerant system

2018 ◽  
Vol 7 (3.3) ◽  
pp. 67
Author(s):  
J M. Nandhini ◽  
T Gnanasekaran

Cloud computing is an increasingly popular computing paradigm built on a large infrastructure of storage, memory, servers and applications accessible over a network. Cloud system design aims to provide on-demand, scalable services across diverse resources while ensuring efficient and effective resource utilization. Because the cloud is a service-oriented infrastructure, the system must be highly reliable to meet the Service Level Agreement (SLA). To achieve reliability, the cloud requires an efficient fault tolerance mechanism, since any failure in the system affects serviceability and reliability. Predicting faults in advance helps to avoid failures. Fault tolerance in the cloud involves ascertaining whether a resource is fit to execute a scheduled task, which means screening resources against the various tasks as part of the scheduling process. The scheduling process relies significantly on the virtualization of resources to maintain high efficiency.

2020 ◽  
Vol 13 (2) ◽  
pp. 16-21
Author(s):  
Ints Meijers

Attribute study and analysis of fault-tolerant data networks. This work aims to introduce SLA constraints into fault tolerance and thereby increase overall network availability. The proposed model evaluates the given constraints and selects the best path that fits the requirements. Fault tolerance is increased by adding multiple constraints and thus reducing the available paths to the best-fitting ones.
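The constraint-then-select step described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Path` fields, the two constraints (minimum availability, maximum latency) and the ranking rule are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Path:
    name: str
    availability: float  # fraction of time the path is up, 0..1 (assumed metric)
    latency_ms: float    # one-way latency in milliseconds (assumed metric)


def select_path(paths: List[Path], min_availability: float,
                max_latency_ms: float) -> Optional[Path]:
    """Keep only the paths that meet every SLA constraint, then pick one."""
    feasible = [p for p in paths
                if p.availability >= min_availability
                and p.latency_ms <= max_latency_ms]
    if not feasible:
        return None  # no path satisfies the SLA
    # Assumed ranking: highest availability first, lowest latency as tie-break.
    return max(feasible, key=lambda p: (p.availability, -p.latency_ms))


candidates = [
    Path("A", availability=0.999, latency_ms=40.0),
    Path("B", availability=0.9995, latency_ms=55.0),
    Path("C", availability=0.99, latency_ms=10.0),
]
best = select_path(candidates, min_availability=0.999, max_latency_ms=60.0)
```

Each added constraint shrinks the `feasible` list, which is the sense in which more constraints narrow the choice to the best-fitting paths.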


Author(s):  
А.А. Гончар ◽  
А.П. Овсянников ◽  
А.А. Сорокин ◽  
Б.М. Шабанов ◽  
А.В. Юрченко

A national research and education network (NREN) is a key factor for modern research and education. The article considers the prospects and directions of NREN development in the Russian Federation. In 2019, the departmental research and education networks RUNNet and RASNet were merged to create a federal-scale telecommunications network, the National Research Computer Network (NRCN), which is meant to act as the NREN in international cooperation projects. The concept of the NRCN was approved at the Meeting of the Council of the Ministry of Science and Higher Education of the Russian Federation on information and telecommunication infrastructure, information security and supercomputer technologies. NRCN development plans include setting up a backbone ring infrastructure and deploying federal network nodes in the Siberian and Far Eastern federal districts, and providing effective interconnection with existing regional networks. The plans are intended to optimize network connectivity, improve the quality of data transmission, and introduce Service Level Agreements (SLAs) and a variety of network services for research and education, including data transmission with specified service level requirements. Implementing them will significantly improve the reliability of the backbone and regional components of the NRCN and expand the potential of the IT infrastructure for research and education, in particular for megascience projects, high-performance and distributed computing, Big Data, deep learning and AI.


Author(s):  
Mohd Farhan Md Fudzee ◽  
Jemal H. Abawajy

It is paramount to provide seamless and ubiquitous access to the rich content available online to interested users via a wide range of devices with varied characteristics. Recently, a service-oriented content adaptation scheme has emerged to address this content-device mismatch problem. In this scheme, content adaptation functions are provided as services by third-party providers. Clients pay for the services they consume and thus demand service quality. As such, negotiating QoS offers and assuring both the negotiated QoS levels and the accuracy of the adapted content version are essential. Any non-compliance should be handled and reported in real time. These issues make the management of service level agreements (SLAs) an important problem. This chapter presents prior work, important challenges, and a framework for managing SLAs for a service-oriented content adaptation platform.


Author(s):  
Suvendu Chandan Nayak ◽  
Sasmita Parida ◽  
Chitaranjan Tripathy ◽  
Prasant Kumar Pattnaik

The basic concept of cloud computing is “Pay per Use”: users access remote computing resources on demand and pay for what they use. These on-demand resources are provided according to a Service Level Agreement (SLA). In real-time settings, tasks carry a time constraint and are therefore called deadline-based tasks. The large number of deadline-based tasks arriving at a cloud datacenter must be scheduled, and scheduling them with an efficient algorithm yields better resource utilization without violating the SLA. This chapter discusses the backfilling algorithm, which was proposed for scheduling tasks in parallel, and its different types. The performance of the backfilling algorithm changes whenever the application environment changes. The chapter aims to implement the different types of backfilling algorithms so that the reader gains an understanding of the backfilling scheduling algorithms used for deadline-based tasks in a cloud computing environment.
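As a rough illustration of what backfilling does, the sketch below shows one scheduling pass of EASY backfilling, a common variant chosen here for illustration (the abstract does not say which variants the chapter covers). The `Job` fields, the function name and the example workload are all hypothetical, and deadlines are not modelled.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Job:
    name: str
    nodes: int    # nodes requested
    runtime: int  # user-estimated runtime


def easy_backfill(now: int, free_nodes: int,
                  running: List[Tuple[int, int]],
                  queue: List[Job]) -> List[str]:
    """One pass: `running` holds (end_time, nodes); returns jobs to start now."""
    started = []
    running = sorted(running)  # work on a copy, ordered by end time
    queue = list(queue)
    # Start jobs in FCFS order for as long as the head of the queue fits.
    while queue and queue[0].nodes <= free_nodes:
        job = queue.pop(0)
        free_nodes -= job.nodes
        running.append((now + job.runtime, job.nodes))
        started.append(job.name)
    if not queue:
        return started
    # The head job is blocked: find the time at which it can start (its
    # reservation, or "shadow time") and the nodes left over at that time.
    head = queue[0]
    avail, shadow_time = free_nodes, None
    for end_time, nodes in sorted(running):
        avail += nodes
        if avail >= head.nodes:
            shadow_time = end_time
            break
    if shadow_time is None:
        raise ValueError("head job requests more nodes than the cluster has")
    extra = avail - head.nodes
    # Backfill later jobs that fit now and do not delay the head job.
    for job in queue[1:]:
        if job.nodes > free_nodes:
            continue
        if now + job.runtime <= shadow_time:
            free_nodes -= job.nodes  # finishes before the reservation
            started.append(job.name)
        elif job.nodes <= extra:
            free_nodes -= job.nodes  # runs past it, but on spare nodes only
            extra -= job.nodes
            started.append(job.name)
    return started


# Hypothetical 10-node cluster: 6 nodes busy until t=10, 4 nodes free.
waiting = [Job("J1", 8, 5), Job("J2", 3, 8), Job("J3", 2, 20), Job("J4", 1, 20)]
to_start = easy_backfill(now=0, free_nodes=4, running=[(10, 6)], queue=waiting)
```

In this example J1 cannot start until t=10, so J2 (which finishes by then) and J4 (which fits in the spare nodes) are moved ahead of it without delaying its reserved start.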


2016 ◽  
Vol 54 ◽  
pp. 247-259 ◽  
Author(s):  
Kuan Lu ◽  
Ramin Yahyapour ◽  
Philipp Wieder ◽  
Edwin Yaqub ◽  
Monir Abdullah ◽  
...  

2021 ◽  
Author(s):  
Paul ChanHyung Park

Docker has been widely adopted as a platform solution for microservices. As the popularity of microservices increases, so does the importance of fine-tuning the efficiency of resource management in the Docker platform. While Docker’s out-of-the-box resource management solution provides some generic management capability, more work is required to improve resource utilization and enforce the Service Level Agreement (SLA) for critical services. In this research, an efficient Docker resource management scheme, called Adaptive SLA Enforcement, is designed and implemented. For the sake of comparison, we also study and implement three simpler schemes: 1) Fixed Number of Containers, 2) Dynamic Resource Management without SLA Enforcement, 3) Strict SLA Enforcement. We found that the Adaptive SLA Enforcement scheme can deliver efficient resource management with SLA enforcement, thus successfully addressing the deficiencies of the other three schemes.


2014 ◽  
Vol 40 (5) ◽  
pp. 1621-1633 ◽  
Author(s):  
Yongqiang Gao ◽  
Haibing Guan ◽  
Zhengwei Qi ◽  
Tao Song ◽  
Fei Huan ◽  
...  

Author(s):  
MALARVIZHI NANDAGOPAL ◽  
S. GAJALAKSHMI ◽  
V. RHYMEND UTHARIARAJ

Computational grids have the potential for solving large-scale scientific applications using heterogeneous and geographically distributed resources. In addition to the challenges of managing and scheduling these applications, reliability challenges arise because of the unreliable nature of grid infrastructure. Two major problems that are critical to the effective utilization of computational resources are efficient scheduling of jobs and providing fault tolerance in a reliable manner. This paper addresses these problems by combining a checkpoint-replication-based fault tolerance mechanism with the minimum total time to release (MTTR) job scheduling algorithm. TTR comprises the job's service time, its waiting time in the queue, and the time to transfer input and output data to and from the resource. The MTTR algorithm minimizes the response time by selecting a computational resource based on job requirements, job characteristics, and hardware features of the resources. The fault tolerance mechanism used here sets the job checkpoints based on the resource failure rate. If a resource failure occurs, the job is restarted from its last successful state on another grid resource using a checkpoint file. Globus Toolkit is used as the grid middleware to set up a grid environment and evaluate the performance of the proposed approach. The monitoring tools Ganglia and Network Weather Service are used to gather hardware and network details, respectively. The experimental results demonstrate that the proposed approach schedules grid jobs effectively in a fault-tolerant way, thereby reducing the TTR of the jobs submitted to the grid. It also increases the percentage of jobs completed within the specified deadline, making the grid more trustworthy.
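The TTR definition above (service time plus queue waiting time plus data transfer time) lends itself to a small sketch of minimum-TTR resource selection. The `Resource` fields, the units and the simple linear cost model are assumptions made for illustration; the paper's actual MTTR algorithm also weighs job requirements and hardware features that are not modelled here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Resource:
    name: str
    speed_mips: float      # processing speed in MI per second (assumed unit)
    queue_wait_s: float    # expected waiting time in the queue, seconds
    bandwidth_mbps: float  # link bandwidth in megabits per second (assumed symmetric)


def time_to_release(job_length_mi: float, input_mb: float, output_mb: float,
                    r: Resource) -> float:
    """TTR = service time + queue waiting time + input/output transfer time."""
    service = job_length_mi / r.speed_mips
    transfer = (input_mb + output_mb) * 8.0 / r.bandwidth_mbps  # MB -> megabits
    return service + r.queue_wait_s + transfer


def pick_resource(job_length_mi: float, input_mb: float, output_mb: float,
                  resources: List[Resource]) -> Resource:
    """Select the resource with the smallest estimated TTR for this job."""
    return min(resources,
               key=lambda r: time_to_release(job_length_mi, input_mb, output_mb, r))


grid = [
    Resource("R1", speed_mips=1000.0, queue_wait_s=30.0, bandwidth_mbps=100.0),
    Resource("R2", speed_mips=500.0, queue_wait_s=0.0, bandwidth_mbps=200.0),
]
chosen = pick_resource(job_length_mi=10000.0, input_mb=100.0, output_mb=50.0,
                       resources=grid)
```

Here the slower but idle and better-connected R2 is chosen, because its shorter waiting and transfer times outweigh the longer service time.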

