Increasing the resiliency of highly loaded systems

Author(s):  
В.А. Рудометкин

Most services are now moving online, which lets users access them at any time. High availability attracts more users, which in turn increases the load on the system, so fault tolerance deserves special attention before development begins. The article considers the main problems of high-load systems and a way to optimize an application by parallelizing tasks across processor cores. It describes the need to migrate to a microservice architecture, the weaknesses of that architecture, and ways to address them. In solving the scaling problems, it also addresses distributed transactions and long server response times.
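As an illustration of parallelizing tasks across processor cores, the following is a minimal sketch and not the article's implementation. It assumes a CPU-bound job (counting primes) that can be cut into independent ranges; the function names `split_range`, `count_primes`, and `parallel_count_primes` are hypothetical.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def is_prime(n):
    """Trial-division primality check (deliberately CPU-bound)."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def count_primes(bounds):
    """Count primes in the half-open range [start, stop)."""
    start, stop = bounds
    return sum(1 for n in range(start, stop) if is_prime(n))


def split_range(limit, parts):
    """Split [0, limit) into at most `parts` contiguous chunks."""
    if limit <= 0 or parts < 1:
        return []
    step = -(-limit // parts)  # ceiling division
    return [(lo, min(lo + step, limit)) for lo in range(0, limit, step)]


def parallel_count_primes(limit, workers=None):
    """Hand one chunk to each worker process and sum the partial counts."""
    workers = workers or os.cpu_count() or 1
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_primes, split_range(limit, workers)))
```

When run as a script, `parallel_count_primes` should be called under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. The speed-up depends on the chunks being independent and large enough to outweigh process start-up cost.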

Author(s):  
Vasilii Andreevich Rudometkin

Most services are now moving online, which lets users access them at any time. High availability attracts more users, which increases the load on the system. High load degrades system components and can lead to malfunctions and data loss. To avoid this, the article discusses several design and monitoring approaches that help prevent system failures. It describes the most popular way to divide areas of responsibility among services, following the DDD (domain-driven design) pattern, which separates system components logically by use and physically when the system is scaled. This approach also helps when scaling a team: developers can work independently on different components without interfering with each other, and new people can be brought into the project quickly. When designing the system architecture, the scheme of interaction between services deserves attention. The CQRS pattern separates reading and writing into different components, which lets the user receive a response from the system quickly. The article pays particular attention to monitoring, because as a system grows, finding errors takes longer, which can lead to prolonged unavailability and, in turn, the loss of clients. All the methods described in the article have been applied on many projects, for example MTS POISK. Thanks to a properly designed system, the waiting time for a service response was reduced from two minutes to several seconds without losing quality of the result, and a sophisticated monitoring system makes it possible to observe all processes within the system in real time and prevent accidents.
In conclusion, special attention should be paid at the start of system design to the architecture, monitoring, and testing. This investment of time will later reduce the risks of data loss and system unavailability.
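The separation of reading and writing that CQRS describes can be sketched as follows. This is a minimal in-memory illustration under assumed names (`OrderCommands`, `OrderQueries`, and an order domain that the abstract does not mention), not the design used in the article's projects.

```python
from dataclasses import dataclass, field


@dataclass
class WriteModel:
    """Source of truth, touched only by commands."""
    orders: dict = field(default_factory=dict)


@dataclass
class ReadModel:
    """Denormalized view, shaped for fast queries."""
    totals_by_customer: dict = field(default_factory=dict)


class OrderCommands:
    """Command side: validates input, writes, then updates the read view."""

    def __init__(self, write_model, read_model):
        self._write = write_model
        self._read = read_model

    def place_order(self, order_id, customer, amount):
        if order_id in self._write.orders:
            raise ValueError(f"duplicate order id: {order_id}")
        if amount <= 0:
            raise ValueError("amount must be positive")
        self._write.orders[order_id] = (customer, amount)
        # Projection step: keep the precomputed total in sync.
        totals = self._read.totals_by_customer
        totals[customer] = totals.get(customer, 0) + amount


class OrderQueries:
    """Query side: reads the precomputed view and never writes."""

    def __init__(self, read_model):
        self._read = read_model

    def total_for(self, customer):
        return self._read.totals_by_customer.get(customer, 0)
```

Here the projection runs synchronously inside the command. In a distributed system it would more likely be driven by events, so the read side may briefly lag behind the write side.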


Author(s):  
Asif Imran ◽  
Alim Ul Gias ◽  
Rayhanur Rahman ◽  
Amit Seal ◽  
Tajkia Rahman ◽  
...  

2021 ◽  
Vol 336 ◽  
pp. 08002
Author(s):  
Hao Wang ◽  
Yong Wang ◽  
Guanying Liang ◽  
Yunfan Gao ◽  
Weijian Gao ◽  
...  

With the emergence and development of new software architectures such as microservices, handling service load effectively while preserving the system's service capability has become an urgent problem. Load balancing technology needs to achieve high availability of microservices without increasing request response delay. Mainstream load balancing technologies differ in their underlying principles and include polling methods, hash algorithms, and artificial intelligence techniques. This article categorizes and summarizes load balancing technologies for microservice architecture and elaborates on the methods and characteristics of the current mainstream approaches. Based on a comparative analysis of existing technologies, it points out future development directions for load balancing technology.
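Two of the families named above can be sketched in a few lines. The example assumes that "polling methods" means round-robin rotation over the backends, which is an interpretation and not something the abstract states. The class names and backend labels are hypothetical.

```python
import hashlib
from itertools import cycle


class RoundRobinBalancer:
    """Rotates through the backends in a fixed order."""

    def __init__(self, backends):
        backends = list(backends)
        if not backends:
            raise ValueError("at least one backend is required")
        self._cycle = cycle(backends)

    def pick(self, request_key=None):
        # The key is ignored: every request simply gets the next backend.
        return next(self._cycle)


class HashBalancer:
    """Maps each request key to a fixed backend via a stable hash."""

    def __init__(self, backends):
        self._backends = list(backends)
        if not self._backends:
            raise ValueError("at least one backend is required")

    def pick(self, request_key):
        # hashlib gives the same digest in every process, unlike the
        # built-in hash(), which is salted per interpreter run for strings.
        digest = hashlib.sha256(request_key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self._backends)
        return self._backends[index]
```

Round-robin spreads requests evenly but ignores session affinity. Hashing keeps a given key on the same backend, at the cost of remapping many keys when the backend list changes.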


2020 ◽  
Vol 8 (5) ◽  
pp. 2040-2044

Cloud technologies are booming in the field of information technology. At the same time, cloud computing sometimes results in failures. These failures call for more reliable frameworks with high availability of the computers acting as nodes. The request made by the user is replicated and sent to several VMs, so that if one VM fails, another can respond, which increases reliability. Much research has been done, and more is under way, on fault tolerance schemes that increase reliability. Earlier schemes focus on only one way of dealing with faults, whereas the scheme proposed in this paper is adaptive and deals with fault tolerance issues across various cloud infrastructures. The proposed scheme adaptively selects between replication and fine-grained checkpointing methods to attain a reliable cloud infrastructure that can handle different client requirements. In addition, the algorithm determines the best-suited fault tolerance method for every designated virtual node. Zheng, Zhou, Lyu and I. King (2012).
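The basic replication idea, where another VM answers when one fails, can be sketched as below. This is not the paper's adaptive scheme: it tries replicas one after another instead of sending the request to all of them at once, and the function name `call_with_replicas` is hypothetical.

```python
class AllReplicasFailed(Exception):
    """Raised when no replica produced a response."""


def call_with_replicas(replicas, request):
    """Return the first successful response from a list of replica callables."""
    errors = []
    for replica in replicas:
        try:
            return replica(request)
        except Exception as exc:  # a failed VM is modelled as a raised exception
            errors.append(exc)
    raise AllReplicasFailed(f"{len(errors)} replica(s) failed")
```

Each replica is modelled as a plain callable that stands in for one VM. A real deployment would also need timeouts and a policy for replicas that respond slowly without failing outright.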


2019 ◽  
Vol 287 ◽  
pp. 02001 ◽  
Author(s):  
Johannes Koenig ◽  
Stefanie Hoja ◽  
Thomas Tobie ◽  
Franz Hoffmann ◽  
Karsten Stahl

Nitriding is a common heat treatment process for highly loaded gears. A very hard compound layer with a thickness of a few microns is produced at the surface of the gear. In the underlying material areas, a diffusion layer with nitride precipitations is formed. This publication summarizes the state of knowledge of nitrided gears and gives an overview of the current state of research in this field. It can be concluded that a high load carrying capacity of nitrided gears depends on an adequate nitriding hardness depth (NHD) and a stable compound layer. However, due to the increased surface roughness after nitriding, the risk of micropitting increases as well. Therefore, it may be favourable to grind the gears after nitriding. Ground gears can also provide a high load carrying capacity, but it must be taken into account that the wear performance will decrease significantly, since it is mainly influenced by the compound layer. In addition, nitrided gears usually show a high sensitivity to local load peaks. Beyond creating a stable compound layer, achieving a sufficient nitriding hardness depth for larger gear sizes is a focus of current research.


Author(s):  
Miwako Tsuji ◽  
Hitoshi Murai ◽  
Taisuke Boku ◽  
Mitsuhisa Sato ◽  
Serge G. Petiton ◽  
...  

Abstract: This chapter describes a multi-SPMD (mSPMD) programming model and a set of software and libraries to support it. The mSPMD programming model has been proposed to realize scalable applications on huge and hierarchical systems. It has become evident that simple SPMD programs such as MPI or XMP, or hybrid programs such as OpenMP/MPI, cannot exploit post-petascale or exascale systems efficiently because of the increasing complexity of applications and systems. The mSPMD programming model has been designed to adopt multiple programming models across different architecture levels. Instead of invoking a single parallel program on millions of processor cores, multiple SPMD programs of moderate size can work together in the mSPMD programming model. XMP is supported as a component of the mSPMD programming model. Fault-tolerance features, correctness checks, and implementations of some numerical libraries in the mSPMD programming model are presented.


2015 ◽  
Vol 5 (2) ◽  
pp. 36-52 ◽  
Author(s):  
Sikha Bagui ◽  
Loi Tang Nguyen

In this paper, the authors present an architecture and implementation of a distributed database system using sharding to provide high availability, fault-tolerance, and scalability of large databases in the cloud. Sharding, or horizontal partitioning, is used to disperse the data among the data nodes located on commodity servers for effective management of big data on the cloud.
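Horizontal partitioning can be illustrated with a small hash-based router. This is a sketch under assumptions and not the authors' system: each shard is modelled as an in-memory dictionary standing in for a data node, and the class name `ShardedStore` is hypothetical.

```python
import hashlib


class ShardedStore:
    """Key-value store that spreads keys across a fixed number of shards."""

    def __init__(self, shard_count):
        if shard_count < 1:
            raise ValueError("shard_count must be at least 1")
        # Each dict stands in for one data node.
        self._shards = [{} for _ in range(shard_count)]

    def shard_for(self, key):
        """Return the index of the shard responsible for `key`."""
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self._shards)

    def put(self, key, value):
        self._shards[self.shard_for(key)][key] = value

    def get(self, key, default=None):
        return self._shards[self.shard_for(key)].get(key, default)

    def shard_sizes(self):
        """Number of keys held by each shard, in shard order."""
        return [len(shard) for shard in self._shards]
```

With plain modulo routing, changing the number of shards remaps most keys. Schemes such as consistent hashing or range-based partitioning are common ways to reduce that cost.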

