Increasing the resiliency of highly loaded systems

Informacionno-technologicheskij vestnik ◽

10.21499/2409-1650-2020-25-3-118-123 ◽

2020 ◽

pp. 118-123

Author(s):

В.А. Рудометкин

Keyword(s):

Fault Tolerance ◽

High Load ◽

High Availability ◽

Distributed Transactions ◽

Microservice Architecture ◽

To Receive ◽

Processor Cores ◽

Highly Loaded

Most services are now moving online, which lets users access them at any time. This high availability increases the number of users and therefore the load on the system, so fault tolerance needs particular attention before development begins. The article considers the main problems of highly loaded systems and a way to optimize an application by parallelizing tasks across processor cores. It describes why a migration to a microservice architecture is needed, the weaknesses of that architecture, and ways to address them. In the course of solving scaling problems, it also addresses distributed transactions and long server response times.


Designing Highly Loaded Systems

Proceedings of the Institute for System Programming of RAS ◽

10.15514/ispras-2020-32(6)-6 ◽

2020 ◽

Vol 32 (6) ◽

pp. 79-86

Author(s):

Vasilii Andreevich Rudometkin

Keyword(s):

System Architecture ◽

Negative Impact ◽

High Load ◽

High Availability ◽

Data Loss ◽

Reading And Writing ◽

System Components ◽

To Receive ◽

Highly Loaded

Most services are now moving online, which lets users access them at any time. This high availability increases the number of users and therefore the load on the system. High load has a negative impact on system components and can lead to malfunctions and data loss. To avoid this, the article discusses several design and monitoring approaches that help prevent system malfunctions. It describes the most popular way to divide areas of responsibility between services, following the DDD pattern, which separates system components logically by use and physically when the system is scaled. This approach also helps when scaling a team: developers can work independently on different components without interfering with each other, and new people can be integrated into the project in the shortest possible time. When designing the system architecture, it is worth paying attention to the scheme of interaction between services. The CQRS pattern separates reading and writing into different components, which lets the user receive a response from the system quickly. The article pays particular attention to monitoring: as a system grows, finding errors takes longer, which can lead to prolonged unavailability and, in turn, the loss of clients. All the methods described in the article have been applied on many projects, for example MTS POISK. Thanks to a properly designed system, the waiting time for a service response was reduced from two minutes to several seconds without losing result quality, and a sophisticated monitoring system makes it possible to observe all processes within the system in real time and prevent incidents.
As a result, special attention should be paid to architecture, monitoring, and testing at the beginning of system design. This upfront investment of time will subsequently reduce the risks of data loss and system unavailability.
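The read/write separation mentioned in the abstract can be illustrated with a minimal in-memory sketch. It is not the article's implementation: the names `CreateOrder`, `OrderWriteModel`, and `OrderReadModel` are hypothetical, and a real system would typically place a message queue or database between the two sides.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CreateOrder:
    """Command: a request to change state (hypothetical example domain)."""
    order_id: str
    customer: str
    amount: float


@dataclass
class OrderWriteModel:
    """Write side: validates commands and records the resulting events."""
    events: list = field(default_factory=list)

    def handle(self, command: CreateOrder) -> dict:
        if command.amount <= 0:
            raise ValueError("amount must be positive")
        event = {
            "type": "order_created",
            "order_id": command.order_id,
            "customer": command.customer,
            "amount": command.amount,
        }
        self.events.append(event)
        return event


@dataclass
class OrderReadModel:
    """Read side: a precomputed view that answers queries without touching the write side."""
    totals_by_customer: dict = field(default_factory=dict)

    def apply(self, event: dict) -> None:
        # Update the denormalized view from an event produced by the write side.
        customer = event["customer"]
        self.totals_by_customer[customer] = (
            self.totals_by_customer.get(customer, 0.0) + event["amount"]
        )

    def total_for(self, customer: str) -> float:
        return self.totals_by_customer.get(customer, 0.0)
```

In this sketch a query such as `total_for("alice")` is a single dictionary lookup, which is the kind of fast read path the separation is meant to allow.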


Cloud-Niagara: A high availability and low overhead fault tolerance middleware for the cloud

16th Int'l Conf. Computer and Information Technology ◽

10.1109/iccitechn.2014.6997344 ◽

2014 ◽

Cited By ~ 3

Author(s):

Asif Imran ◽

Alim Ul Gias ◽

Rayhanur Rahman ◽

Amit Seal ◽

Tajkia Rahman ◽

...

Keyword(s):

Fault Tolerance ◽

High Availability


Research on load balancing technology for microservice architecture

MATEC Web of Conferences ◽

10.1051/matecconf/202133608002 ◽

2021 ◽

Vol 336 ◽

pp. 08002

Author(s):

Hao Wang ◽

Yong Wang ◽

Guanying Liang ◽

Yunfan Gao ◽

Weijian Gao ◽

...

Keyword(s):

Artificial Intelligence ◽

Load Balancing ◽

High Availability ◽

Software Architectures ◽

Delayed Response ◽

Urgent Problem ◽

Service Load ◽

Development Direction ◽

Service Capability ◽

Microservice Architecture

With the emergence and development of new software architectures such as microservices, effectively handling the service load and ensuring the service capability of the system has become an urgent problem. Load balancing technology needs to achieve high availability of microservices without delaying the response to requests. Based on different underlying principles, several mainstream load balancing technologies have emerged, such as polling methods, hash algorithms, and artificial intelligence techniques. This article categorizes and summarizes load balancing technologies for microservice architecture and describes the methods and characteristics of the current mainstream ones. Based on a comparative analysis of existing technologies, it points out the future development direction of load balancing technology.
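Two of the families named in the abstract can be sketched briefly. This is an illustration only, assuming that "polling" refers to round-robin selection; the class names and backend labels are hypothetical and are not taken from the paper.

```python
import hashlib
from itertools import cycle


class RoundRobinBalancer:
    """Hands out backends in a fixed rotating order."""

    def __init__(self, backends):
        backends = list(backends)
        if not backends:
            raise ValueError("at least one backend is required")
        self._rotation = cycle(backends)

    def pick(self):
        return next(self._rotation)


class HashBalancer:
    """Maps a request key (for example a client id) to a stable backend."""

    def __init__(self, backends):
        self._backends = list(backends)
        if not self._backends:
            raise ValueError("at least one backend is required")

    def pick(self, key: str):
        # hashlib gives a hash that is stable across processes,
        # unlike the built-in hash() for strings.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self._backends)
        return self._backends[index]
```

Round-robin spreads requests evenly when they cost about the same, while the hash variant keeps requests with the same key on the same backend as long as the backend list does not change.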


Enhancing Fault Tolerance of Cloud Nodes using Replication Techniques

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.e5607.018520 ◽

2020 ◽

Vol 8 (5) ◽

pp. 2040-2044

Keyword(s):

Fault Tolerance ◽

High Availability ◽

The Other ◽

Cloud Infrastructure ◽

Virtual Node ◽

Adaptive Scheme ◽

Fine Grained ◽

Cloud Technologies ◽

Tolerance Method ◽

Selection Of

Cloud technologies are booming in the field of information technology. At the same time, cloud computing sometimes results in failures. These failures demand more reliable frameworks with high availability of the computers acting as nodes. The request made by the user is replicated and sent to several VMs; if one of the VMs fails, another can respond, which increases reliability. Much research has been done, and is still being carried out, to suggest fault tolerance schemes that increase reliability. Earlier schemes focus on only one way of dealing with faults, whereas the scheme proposed by the author in this paper is adaptive and deals with fault tolerance issues in various cloud infrastructures. The proposed scheme adaptively selects between replication and fine-grained checkpointing methods to attain a reliable cloud infrastructure that can handle different client requirements. In addition, the algorithm determines the best-suited fault tolerance method for every designated virtual node. Zheng, Zhou,. Lyu and I. King (2012).
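The basic replication idea described above, where another replica answers if one fails, can be sketched as follows. This is a simplified illustration and not the paper's adaptive scheme: the function name `replicated_call` is hypothetical, replicas are modelled as plain callables, and they are tried one after another rather than contacted in parallel.

```python
def replicated_call(replicas, request):
    """Send the same request to each replica in turn and return the first successful reply.

    `replicas` is a sequence of callables standing in for VMs (an assumption
    made for this sketch). A replica signals failure by raising an exception.
    """
    errors = []
    for replica in replicas:
        try:
            return replica(request)
        except Exception as exc:  # any failure means "try the next replica"
            errors.append(exc)
    raise RuntimeError(f"all {len(errors)} replicas failed")
```

The request only fails when every replica fails, which is why adding replicas raises reliability at the cost of extra resources.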


Tightly coupled multiprocessor systems with high availability exploiting fault-tolerance features

Microprocessing and Microprogramming ◽

10.1016/0165-6074(87)90112-8 ◽

1987 ◽

Vol 20 (1-3) ◽

pp. 11-13

Author(s):

Riccardo Curti

Keyword(s):

Fault Tolerance ◽

High Availability ◽

Multiprocessor Systems ◽

Tightly Coupled


Increasing the load carrying capacity of highly loaded gears by nitriding

MATEC Web of Conferences ◽

10.1051/matecconf/201928702001 ◽

2019 ◽

Vol 287 ◽

pp. 02001 ◽

Cited By ~ 1

Author(s):

Johannes Koenig ◽

Stefanie Hoja ◽

Thomas Tobie ◽

Franz Hoffmann ◽

Karsten Stahl

Keyword(s):

Carrying Capacity ◽

High Sensitivity ◽

Compound Layer ◽

High Load ◽

Load Carrying Capacity ◽

Stable Compound ◽

Current State ◽

Load Carrying ◽

State Of Research ◽

Highly Loaded

Nitriding is a common heat treatment process for highly loaded gears. A very hard compound layer with a thickness of a few microns is produced at the surface of the gear. In the underlying material, a diffusion layer with nitride precipitations is formed. This publication summarizes the state of knowledge on nitrided gears and gives an overview of current research in the field. It can be concluded that a high load carrying capacity of nitrided gears depends on an adequate nitriding hardness depth (NHD) and a stable compound layer. However, because surface roughness increases after nitriding, the risk of micropitting increases too. It may therefore be favourable to grind the gears after nitriding. Ground gears can also provide a high load carrying capacity, but it must be taken into account that wear performance will decrease significantly, since it is mainly influenced by the compound layer. In addition, nitrided gears usually show a high sensitivity to local load peaks. Beyond creating a stable compound layer, achieving a sufficient nitriding hardness depth in larger gear sizes is a focus of current research.


Multi-SPMD Programming Model with YML and XcalableMP

XcalableMP PGAS Programming Language ◽

10.1007/978-981-15-7683-6_9 ◽

2020 ◽

pp. 219-243

Author(s):

Miwako Tsuji ◽

Hitoshi Murai ◽

Taisuke Boku ◽

Mitsuhisa Sato ◽

Serge G. Petiton ◽

...

Keyword(s):

Fault Tolerance ◽

Programming Model ◽

Parallel Program ◽

Programming Models ◽

Hierarchical Systems ◽

Scalable Applications ◽

Processor Cores

This chapter describes a multi-SPMD (mSPMD) programming model and a set of software and libraries that support it. The mSPMD programming model has been proposed to realize scalable applications on huge, hierarchical systems. It has become evident that simple SPMD programs such as MPI or XMP, or hybrid programs such as OpenMP/MPI, cannot exploit post-peta- or exascale systems efficiently because of the increasing complexity of applications and systems. The mSPMD programming model has been designed to adopt multiple programming models across different architecture levels. Instead of invoking a single parallel program on millions of processor cores, multiple SPMD programs of moderate size can work together in the mSPMD programming model. XMP is supported as a component of the mSPMD programming model. Fault-tolerance features, correctness checks, and implementations of some numerical libraries in the mSPMD programming model are also presented.


Database Sharding

International Journal of Cloud Applications and Computing ◽

10.4018/ijcac.2015040103 ◽

2015 ◽

Vol 5 (2) ◽

pp. 36-52 ◽

Cited By ~ 8

Author(s):

Sikha Bagui ◽

Loi Tang Nguyen

Keyword(s):

Big Data ◽

Fault Tolerance ◽

Distributed Database ◽

Database System ◽

High Availability ◽

Effective Management ◽

Large Databases ◽

Distributed Database System ◽

Horizontal Partitioning

In this paper, the authors present an architecture and implementation of a distributed database system using sharding to provide high availability, fault-tolerance, and scalability of large databases in the cloud. Sharding, or horizontal partitioning, is used to disperse the data among the data nodes located on commodity servers for effective management of big data on the cloud.
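Horizontal partitioning by key can be illustrated with a small in-memory sketch. It is not the authors' architecture: `ShardedStore` is a hypothetical name, each shard is a plain dictionary standing in for a data node, and hash-modulo routing is only one possible placement rule.

```python
import hashlib


class ShardedStore:
    """Spreads rows across a fixed number of shards by hashing the row key."""

    def __init__(self, shard_count: int):
        if shard_count < 1:
            raise ValueError("shard_count must be at least 1")
        # Each dict stands in for one data node.
        self.shards = [{} for _ in range(shard_count)]

    def shard_for(self, key: str) -> int:
        # A stable hash so the same key always routes to the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key: str, row: dict) -> None:
        self.shards[self.shard_for(key)][key] = row

    def get(self, key: str):
        # Only one shard is consulted, so lookups do not scan the whole data set.
        return self.shards[self.shard_for(key)].get(key)
```

With modulo routing, changing the number of shards moves most keys, so a production system would usually need a rebalancing strategy that this sketch leaves out.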
