Review on failure forecast in cloud for a fault-tolerant system

Abstract: Attribute study and analysis of fault-tolerant data networks. This work aims to introduce SLA constraints into fault tolerance, thereby increasing overall network availability. The proposed model evaluates the given constraints and selects the best path that fits the requirements. Fault tolerance is increased by adding multiple constraints, which narrows the available paths to the best-fitting ones.
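
The constraint-based path selection described above can be illustrated with a minimal sketch. The abstract does not name the constraints or the ranking rule, so the `Path` fields (latency, availability, bandwidth), the `SlaConstraints` thresholds, and the "highest availability, then lowest latency" ordering are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Path:
    # Hypothetical path attributes; the source does not list them.
    name: str
    latency_ms: float
    availability: float  # fraction in [0, 1]
    bandwidth_mbps: float


@dataclass(frozen=True)
class SlaConstraints:
    # Hypothetical SLA thresholds.
    max_latency_ms: float
    min_availability: float
    min_bandwidth_mbps: float


def satisfies(path: Path, sla: SlaConstraints) -> bool:
    """Return True when the path meets every SLA constraint."""
    return (
        path.latency_ms <= sla.max_latency_ms
        and path.availability >= sla.min_availability
        and path.bandwidth_mbps >= sla.min_bandwidth_mbps
    )


def select_best_path(paths: List[Path], sla: SlaConstraints) -> Optional[Path]:
    """Filter out paths that violate the SLA, then rank the remainder.

    Ranking rule (an assumption): prefer higher availability and
    break ties with lower latency.
    """
    feasible = [p for p in paths if satisfies(p, sla)]
    if not feasible:
        return None
    return max(feasible, key=lambda p: (p.availability, -p.latency_ms))


paths = [
    Path("A", latency_ms=20.0, availability=0.999, bandwidth_mbps=100.0),
    Path("B", latency_ms=12.0, availability=0.9995, bandwidth_mbps=50.0),
    Path("C", latency_ms=8.0, availability=0.99, bandwidth_mbps=200.0),
]
sla = SlaConstraints(max_latency_ms=25.0, min_availability=0.999,
                     min_bandwidth_mbps=80.0)
best = select_best_path(paths, sla)
print(best.name if best else "no feasible path")
```

Each added constraint can only shrink the feasible set, which matches the abstract's point that more constraints narrow the candidates to the best-fitting paths.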

Development of Federal telecommunication and computer infrastructure for research and education

Вычислительные технологии ◽

10.25743/ict.2019.24.6.004. ◽

2019 ◽

Author(s):

А.А. Гончар ◽

А.П. Овсянников ◽

А.А. Сорокин ◽

Б.М. Шабанов ◽

А.В. Юрченко

Keyword(s):

Russian Federation ◽

Computer Network ◽

Far East ◽

Service Level Agreement ◽

Network Connectivity ◽

Service Level ◽

Network Nodes ◽

Telecommunications Network ◽

The Russian Federation ◽

Research And Education

A national research and education network (NREN) is a key factor in modern research and education. The article considers the prospects and directions of NREN development in the Russian Federation. In 2019, the merger of the departmental research and education networks RUNNet and RASNet created a federal-scale telecommunications network, the National Research Computer Network (NRCN; Russian abbreviation НИКС), which is intended to act as the NREN in international cooperation projects. The NRCN concept was approved at the Meeting of the Council of the Ministry of Science and Higher Education of the Russian Federation on information and telecommunication infrastructure, information security and supercomputer technologies. NRCN development plans include building a ring backbone infrastructure in the Siberian and Far Eastern federal districts, deploying federal network nodes there, and providing effective interconnection with existing regional network nodes. The plans are intended to optimize network connectivity, improve the quality of data transmission, and introduce Service Level Agreements (SLAs) and a variety of network services for research and education, including data transmission with specified service-level requirements. Implementing these plans will significantly improve the reliability of the backbone and regional components of the NRCN and expand the potential of the IT infrastructure for research and education in megascience, high-performance and distributed computing, Big Data, deep learning and AI.

Management of Service Level Agreement for Service-Oriented Content Adaptation Platform

Network and Traffic Engineering in Emerging Distributed Computing Applications ◽

10.4018/978-1-4666-1888-6.ch002 ◽

2012 ◽

pp. 21-42 ◽

Cited By ~ 2

Author(s):

Mohd Farhan Md Fudzee ◽

Jemal H. Abawajy

Keyword(s):

Service Level Agreement ◽

Service Level ◽

Third Party ◽

Content Adaptation ◽

Device Mismatch ◽

Wide Range ◽

Service Oriented ◽

Ubiquitous Access ◽

Adaptation Scheme ◽

Demand Service

It is paramount to provide interested users with seamless and ubiquitous access to rich online content via a wide range of devices with varied characteristics. Recently, a service-oriented content adaptation scheme has emerged to address this content-device mismatch problem. In this scheme, content adaptation functions are provided as services by third-party providers. Clients pay for the services they consume and therefore demand service quality. As such, negotiating QoS offers and assuring both the negotiated QoS levels and the accuracy of the adapted content version are essential. Any non-compliance should be handled and reported in real time. These issues make the management of service level agreements (SLAs) an important problem. This chapter presents prior work, important challenges, and a framework for managing SLAs for service-oriented content adaptation platforms.

Resource Allocation Policies in Cloud Computing Environment

Advances in Data Mining and Database Management - Advancing Cloud Database Systems and Capacity Planning With Dynamic Applications ◽

10.4018/978-1-5225-2013-9.ch005 ◽

2017 ◽

pp. 115-132 ◽

Cited By ~ 1

Author(s):

Suvendu Chandan Nayak ◽

Sasmita Parida ◽

Chitaranjan Tripathy ◽

Prasant Kumar Pattnaik

Keyword(s):

Resource Allocation ◽

Cloud Computing ◽

Resource Utilization ◽

Service Level Agreement ◽

Service Level ◽

Computing Environment ◽

Cloud Computing Environment ◽

On Demand ◽

Different Types ◽

Cloud Datacenter

The basic concept of cloud computing is “Pay per Use”: users consume remote computing resources on demand and pay for what they use. These on-demand resources are provided according to a Service Level Agreement (SLA). Real-time tasks carry a time constraint and are therefore called deadline-based tasks. The large number of deadline-based tasks arriving at a cloud datacenter must be scheduled, and scheduling them with an efficient algorithm improves resource utilization without violating the SLA. This chapter discusses the backfilling algorithm, which was proposed for scheduling tasks in parallel, and its different types. The performance of the backfilling algorithm changes whenever the application environment changes. The chapter aims to implement different types of backfilling algorithms so that, by the end, the reader has an idea of the backfilling scheduling algorithms used for deadline-based tasks in a cloud computing environment.
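
As a rough illustration of what a backfilling scheduler does, the sketch below implements one scheduling pass of EASY (aggressive) backfilling, a widely used variant. The abstract does not say which variants the chapter covers, so the choice of EASY, the `Job` fields, and the example workload are assumptions. Only the job at the head of the queue receives a reservation; later jobs may start early if they do not delay it.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Job:
    name: str
    nodes: int      # nodes requested
    runtime: float  # estimated runtime


def easy_backfill(now: float, free_nodes: int,
                  running: List[Tuple[float, int]],
                  queue: List[Job]) -> List[str]:
    """One scheduling pass; returns names of jobs to start at `now`.

    running: (end_time, nodes) for each job currently executing.
    queue:   waiting jobs in first-come-first-served order.
    """
    started: List[str] = []
    running = list(running)
    queue = list(queue)

    # 1. Start jobs from the head of the queue while they fit.
    while queue and queue[0].nodes <= free_nodes:
        job = queue.pop(0)
        free_nodes -= job.nodes
        running.append((now + job.runtime, job.nodes))
        started.append(job.name)
    if not queue:
        return started

    # 2. Reserve a start time (the "shadow time") for the blocked head job.
    head = queue[0]
    avail = free_nodes
    shadow_time = None
    for end_time, nodes in sorted(running):
        avail += nodes
        if avail >= head.nodes:
            shadow_time = end_time
            break
    if shadow_time is None:
        # The head job can never fit; this sketch simply stops here.
        return started
    extra = avail - head.nodes  # nodes left over once the head job starts

    # 3. Backfill later jobs that cannot delay the head job's reservation.
    for job in queue[1:]:
        if job.nodes > free_nodes:
            continue
        if now + job.runtime <= shadow_time:
            free_nodes -= job.nodes      # finishes before the reservation
            started.append(job.name)
        elif job.nodes <= extra:
            free_nodes -= job.nodes      # runs on nodes the head job won't need
            extra -= job.nodes
            started.append(job.name)
    return started


# Example: 10-node cluster, 6 nodes busy until t=10, 4 nodes free at t=0.
queue = [Job("J1", 8, 5.0), Job("J2", 3, 8.0), Job("J3", 4, 20.0), Job("J4", 1, 30.0)]
started = easy_backfill(now=0.0, free_nodes=4, running=[(10.0, 6)], queue=queue)
print(started)
```

In the example, J1 is blocked until t=10, J2 is backfilled because it finishes before that reservation, J3 no longer fits in the remaining free nodes, and J4 is backfilled because it only uses a node J1 will not need.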

Fault-tolerant Service Level Agreement lifecycle management in clouds using actor system

Future Generation Computer Systems ◽

10.1016/j.future.2015.03.016 ◽

2016 ◽

Vol 54 ◽

pp. 247-259 ◽

Cited By ~ 25

Author(s):

Kuan Lu ◽

Ramin Yahyapour ◽

Philipp Wieder ◽

Edwin Yaqub ◽

Monir Abdullah ◽

...

Keyword(s):

Fault Tolerant ◽

Service Level Agreement ◽

Service Level ◽

Lifecycle Management

EFFICIENT RESOURCE MANAGEMENT ON CONTAINER AS A SERVICE

10.32920/ryerson.14645304.v1 ◽

2021 ◽

Author(s):

Paul ChanHyung Park

Keyword(s):

Resource Management ◽

Service Level Agreement ◽

Service Level ◽

Fixed Number ◽

Fine Tuning ◽

Management Capability ◽

Management Scheme ◽

Generic Management ◽

Efficient Resource ◽

Dynamic Resource

Docker has been widely adopted as a platform solution for microservices. As the popularity of microservices increases, so does the importance of fine-tuning the efficiency of resource management in the Docker platform. While Docker’s out-of-the-box resource management solution provides some generic management capability, more work is required to improve resource utilization and enforce Service Level Agreements (SLAs) for critical services. In this research, an efficient Docker resource management scheme, called Adaptive SLA Enforcement, is designed and implemented. For the sake of comparison, we also study and implement three simpler schemes: 1) Fixed Number of Containers, 2) Dynamic Resource Management without SLA Enforcement, 3) Strict SLA Enforcement. We found that the Adaptive SLA Enforcement scheme can deliver efficient resource management with SLA enforcement, thus successfully addressing the deficiencies of the other three schemes.
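
The abstract does not describe how Adaptive SLA Enforcement makes its decisions, so the following is only a generic sketch of an SLA-aware scaling rule, not the thesis's scheme. The function name `desired_replicas`, the latency-based SLA, and every threshold are hypothetical. The rule scales out when the SLA is violated and scales in only when there is both latency headroom and low CPU use.

```python
def desired_replicas(current: int,
                     p95_latency_ms: float,
                     sla_latency_ms: float,
                     cpu_utilization: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10,
                     low_cpu: float = 0.3,
                     headroom: float = 0.8) -> int:
    """Return the container count for the next control interval.

    All parameters and thresholds are illustrative assumptions.
    """
    if p95_latency_ms > sla_latency_ms:
        # SLA violated: add a container regardless of utilization.
        target = current + 1
    elif p95_latency_ms < headroom * sla_latency_ms and cpu_utilization < low_cpu:
        # Comfortably inside the SLA and mostly idle: release a container.
        target = current - 1
    else:
        # Close to the SLA limit or busy: hold steady to avoid oscillation.
        target = current
    return max(min_replicas, min(max_replicas, target))


print(desired_replicas(3, p95_latency_ms=250, sla_latency_ms=200, cpu_utilization=0.9))
print(desired_replicas(3, p95_latency_ms=100, sla_latency_ms=200, cpu_utilization=0.1))
```

Holding `target` fixed would correspond to a fixed number of containers, and dropping the latency checks would give utilization-only scaling without SLA enforcement, which is roughly how the comparison schemes differ in spirit.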

Service Level Agreement Based Fault Tolerant Workload Scheduling in Cloud Computing Environment

International Journal of Grid Computing & Applications ◽

10.5121/ijgca.2016.7401 ◽

2016 ◽

Vol 7 (4) ◽

pp. 01-08

Author(s):

Manpreet Singh Gill ◽

R.K. Bawa

Keyword(s):

Cloud Computing ◽

Fault Tolerant ◽

Service Level Agreement ◽

Service Level ◽

Computing Environment ◽

Cloud Computing Environment ◽

Workload Scheduling

Service level agreement based energy-efficient resource management in cloud data centers

Computers & Electrical Engineering ◽

10.1016/j.compeleceng.2013.11.001 ◽

2014 ◽

Vol 40 (5) ◽

pp. 1621-1633 ◽

Cited By ~ 40

Author(s):

Yongqiang Gao ◽

Haibing Guan ◽

Zhengwei Qi ◽

Tao Song ◽

Fei Huan ◽

...

Keyword(s):

Resource Management ◽

Energy Efficient ◽

Data Centers ◽

Service Level Agreement ◽

Service Level ◽

Cloud Data ◽

Efficient Resource ◽

Cloud Data Centers

SCHEDULING WITH JOB CHECKPOINT IN COMPUTATIONAL GRID ENVIRONMENT

International Journal of Modeling Simulation and Scientific Computing ◽

10.1142/s1793962311000517 ◽

2011 ◽

Vol 02 (03) ◽

pp. 299-316

Author(s):

MALARVIZHI NANDAGOPAL ◽

S. GAJALAKSHMI ◽

V. RHYMEND UTHARIARAJ

Keyword(s):

Fault Tolerance ◽

Large Scale ◽

Job Scheduling ◽

Fault Tolerant ◽

Scheduling Algorithm ◽

Computational Grids ◽

Tolerance Mechanism ◽

Grid Resource ◽

Distributed Resources ◽

Grid Environment

Computational grids have the potential to solve large-scale scientific applications using heterogeneous and geographically distributed resources. In addition to the challenges of managing and scheduling these applications, reliability challenges arise because of the unreliable nature of grid infrastructure. Two problems that are critical to the effective utilization of computational resources are scheduling jobs efficiently and providing fault tolerance reliably. This paper addresses these problems by combining a checkpoint-replication-based fault tolerance mechanism with the minimum total time to release (MTTR) job scheduling algorithm. The total time to release (TTR) includes the service time of the job, the waiting time in the queue, and the transfer of input and output data to and from the resource. The MTTR algorithm minimizes the response time by selecting a computational resource based on job requirements, job characteristics, and hardware features of the resources. The fault tolerance mechanism sets the job checkpoints based on the resource failure rate. If a resource fails, the job is restarted from its last successful state using a checkpoint file from another grid resource. Globus Toolkit is used as the grid middleware to set up a grid environment and evaluate the performance of the proposed approach. The monitoring tools Ganglia and Network Weather Service are used to gather hardware and network details, respectively. The experimental results demonstrate that the proposed approach schedules grid jobs effectively and in a fault-tolerant way, thereby reducing the TTR of the jobs submitted to the grid. It also increases the percentage of jobs completed within the specified deadline, making the grid more trustworthy.
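
The TTR definition above translates directly into a small selection routine, sketched here under stated assumptions. The `Resource` fields and the sample numbers are hypothetical. The abstract says only that checkpoints are set from the resource failure rate, without giving a formula, so the sketch substitutes Young's classic approximation, interval = sqrt(2 × checkpoint cost × mean time between failures), purely as a stand-in.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Resource:
    # Hypothetical per-resource estimates for one job, in seconds.
    name: str
    service_time_s: float
    queue_wait_s: float
    input_transfer_s: float
    output_transfer_s: float
    failure_rate_per_s: float  # failures per second (1 / MTBF)


def ttr(r: Resource) -> float:
    """Total time to release: service + queue wait + data transfers."""
    return (r.service_time_s + r.queue_wait_s
            + r.input_transfer_s + r.output_transfer_s)


def select_resource(resources: List[Resource]) -> Resource:
    """Pick the resource with the minimum TTR."""
    return min(resources, key=ttr)


def checkpoint_interval(failure_rate_per_s: float, checkpoint_cost_s: float) -> float:
    """Young's approximation (an assumption, not the paper's stated rule)."""
    return math.sqrt(2.0 * checkpoint_cost_s / failure_rate_per_s)


resources = [
    Resource("R1", 600.0, 120.0, 30.0, 10.0, 1 / 3600),
    Resource("R2", 500.0, 300.0, 20.0, 10.0, 1 / 14400),
    Resource("R3", 700.0, 0.0, 40.0, 15.0, 1 / 7200),
]
best = select_resource(resources)
interval = checkpoint_interval(best.failure_rate_per_s, checkpoint_cost_s=25.0)
print(best.name, ttr(best), round(interval))
```

A higher failure rate shortens the interval, so less reliable resources checkpoint more often, which is consistent with the behavior the abstract describes.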
