Dynamic Control of Computation Consistency in the Parallel Dataflow Computing System

2021 ◽  
Vol 27 (12) ◽  
pp. 625-633
Author(s):  
N. N. Levchenko ◽  
D. N. Zmejev ◽  

When developing high-performance multiprocessor computing systems, much attention is paid to ensuring uninterrupted operation, both in terms of hardware and software. In traditional computing systems, software is the main focus in addressing these issues. The article discusses how to ensure uninterrupted operation of the parallel dataflow computing system (PDCS), which implements the dataflow computational model with a dynamically formed context. Given the features of the PDCS, it is proposed to implement this type of control in hardware, which increases its efficiency, since the computational process is then controlled dynamically and not only statically.

Author(s):  
K. I. Volovich ◽  
S. A. Denisov ◽  
S. I. Malkovsky

The article addresses the solution of scientific problems using high-performance computing systems. One approach to a certain class of problems in materials science is mathematical modeling implemented by specialized modeling systems. A modeling system is most efficient when deployed in hybrid high-performance computing systems (HHPC), which allow problems to be solved in an acceptable time with sufficient accuracy. However, a number of limitations affect how a research team works with modeling systems in the HHPC computing environment: the need to access graphics accelerators while developing and debugging algorithms in the modeling system, the need to use several modeling systems in order to obtain the optimal solution, and the need to dynamically change the settings of modeling systems for the problems being solved. These limitations are addressed by an individual modeling environment functioning in the HHPC computing environment. The optimal way to create such an environment is virtual containerization. An algorithm is proposed for forming an individual modeling environment in a hybrid high-performance computing complex based on the «docker» virtual containerization system. An individual modeling environment is created by installing the necessary software in the base container, setting environment variables, and installing custom software and licenses. A feature of the algorithm is the ability to form a library image from a base container with a customized individual modeling environment. In conclusion, the direction for further research is indicated. The algorithm presented in the article is independent of the implementation of the job management system and can be used for any high-performance computing system.
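
The environment-formation steps listed in this abstract (a base container, installed software, environment variables, custom software and licenses) can be sketched as a small Dockerfile generator. This is only an illustration under stated assumptions, not the authors' algorithm: the `ModelingEnv` class, the `render_dockerfile` helper, and the use of `apt-get` (which presumes a Debian-based base image) are all hypothetical choices.

```python
from dataclasses import dataclass, field


@dataclass
class ModelingEnv:
    """Hypothetical description of one individual modeling environment."""
    base_image: str                                # base container image
    packages: list = field(default_factory=list)   # software to install
    env: dict = field(default_factory=dict)        # environment variables
    copies: list = field(default_factory=list)     # (src, dst) file pairs


def render_dockerfile(spec: ModelingEnv) -> str:
    """Render the environment description as Dockerfile text."""
    lines = [f"FROM {spec.base_image}"]
    if spec.packages:
        # Assumption: the base image is Debian-based and provides apt-get.
        pkgs = " ".join(spec.packages)
        lines.append(f"RUN apt-get update && apt-get install -y {pkgs}")
    for name, value in spec.env.items():
        lines.append(f'ENV {name}="{value}"')
    for src, dst in spec.copies:
        # Custom software and license files copied into the image.
        lines.append(f"COPY {src} {dst}")
    return "\n".join(lines) + "\n"
```

The rendered text could be written to a `Dockerfile` and built with `docker build`, and the resulting tagged image would then play the role of the reusable library image the abstract mentions.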


2019 ◽  
Vol 214 ◽  
pp. 01033 ◽  
Author(s):  
Teo Mrnjavac ◽  
Vasco Chibante Barroso

The ALICE Experiment at CERN LHC (Large Hadron Collider) is under preparation for a major upgrade that is scheduled to be deployed during Long Shutdown 2 in 2019-2020 and that includes new computing systems, called O2 (Online-Offline). To ensure the efficient operation of the upgraded experiment along with its newly designed computing system, a reliable, high-performance and automated control system will be developed with the goal of managing the lifetime of all the O2 processes, and of handling the various phases of the data taking activity by interacting with the detectors, the trigger system and the LHC. The ALICE O2 control system will be a distributed system based on state-of-the-art cluster management and microservices which have recently emerged in the distributed computing ecosystem. Such technologies weren’t available during the design and development of the original LHC computing systems, and their use will allow the ALICE collaboration to benefit from a vibrant and innovating open source community. This paper illustrates the O2 control system architecture. It evaluates several solutions that were considered during an initial prototyping phase and provides a rationale for the choices made. It also provides an in-depth overview of the components, features and design elements of the actual system.


SPIN ◽  
2013 ◽  
Vol 03 (04) ◽  
pp. 1340012 ◽  
Author(s):  
HAO MENG ◽  
GUCHANG HAN

High-performance computing system design based on complementary metal oxide semiconductor (CMOS) technology is facing more and more challenges due to volatility, increased leakage current and interconnection delay. Computations utilizing magnetic logic devices have attracted considerable interest as potential alternatives because of their nonvolatility, re-configurability, unlimited endurance and low power consumption. Instead of using electron charges, the magnetic logic device stores and processes data by controlling spins, i.e., the magnetization states in a device. The emerging technologies related to magnetic logic mainly comprise three design schemes, i.e., magnetoresistive logic, magnetic quantum cellular automata and magnetic domain wall logic. This paper will illustrate the principles as well as review the recent developments of these magnetic logic devices. Challenges and prospects of the future development are also discussed.


2019 ◽  
Vol 23 (2) ◽  
pp. 137-152
Author(s):  
S. S. Schevelev

Purpose of research. A reconfigurable computer system consists of a computing system and special-purpose computers that are used to solve the tasks of vector and matrix algebra, pattern recognition. There are distinctions between matrix and associative systems, neural networks. Matrix computing systems comprise a set of processor units connected through a switching device with multi-module memory. They are designed to solve vector, matrix and data array problems. Associative systems contain a large number of operating devices that can simultaneously process multiple data streams. Neural networks and neurocomputers have high performance when solving problems of expert systems, pattern recognition due to parallel processing of a neural network. Methods. An information graph of the computational process of a reconfigurable modular system was plotted. Structural and functional schemes, algorithms that implement the construction of specialized modules for performing arithmetic and logical operations, search operations and functions for replacing occurrences in processed words were developed. Software for modelling the operation of the arithmetic-symbol processor, specialized computing modules, and switching systems was developed. Results. A block diagram of a reconfigurable computing modular system was developed. The system consists of compatible functional modules and is capable of static and dynamic reconfiguration, has a parallel connection structure of the processor and computing modules through the use of interface channels. It consists of an arithmetic-symbol processor, specialized computing modules and switching systems; it performs specific tasks of symbolic information processing, arithmetic and logical operations. Conclusion. Systems with a reconfigurable structure are high-performance and highly reliable computing systems that consist of integrated processors in multi-machine and multiprocessor systems. Reconfigurability of the structure provides high system performance due to its adaptation to computational processes and the composition of the processed tasks.


Author(s):  
Apolinar Velarde Martinez

Increasingly complex algorithms for modeling and solving the problems humanity currently faces have created new data processing requirements and, consequently, the need for high-performance computing systems. Because this type of equipment is expensive and often beyond the means of an educational institution, it is necessary to develop and implement computing architectures that are economical and scalable in their construction, such as heterogeneous distributed computing systems made up of several clusters of multicore processing elements with shared and distributed memory systems. This paper presents the analysis, design and implementation of a high-performance computing system called Liebres InTELigentes, whose purpose is the design and execution of intrinsically parallel algorithms that require large amounts of storage and long processing times. The proposed computer system is built from conventional computing equipment (desktop computers, laptops and servers) linked by a high-speed network. The main objective of this research is to build technology for scientific and educational research.


2012 ◽  
Vol 4 (1) ◽  
pp. 37-51 ◽  
Author(s):  
Hodjat Hamidi ◽  
Abbas Vafaei ◽  
Seyed Amir Hassan Monadjemi

In this paper, the authors present a new approach to algorithm-based fault tolerance (ABFT) for high-performance computing systems. The ABFT approach transforms a system that does not tolerate a specific type of fault, called the fault-intolerant system, into a system that provides a specific level of fault tolerance, namely recovery. ABFT techniques that detect errors rely on comparing parity values computed in two ways: parallel processing of the input parity values produces output parity values that are comparable with parity values regenerated from the original processed outputs. Convolution codes can be applied to provide the redundancy. This method is a new approach to concurrent error correction in fault-tolerant computing systems. This paper proposes a novel computing paradigm to provide fault tolerance for numerical algorithms. The authors also present, implement, and evaluate early detection in ABFT.
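
The idea of comparing parity values computed in two ways can be illustrated with the classic row/column checksum scheme for matrix multiplication. The sketch below substitutes plain sum checksums for the convolution codes mentioned in the abstract, so it is not the authors' method; all function names are hypothetical.

```python
def matmul(a, b):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def with_column_checksum(a):
    """Return a copy of a with an extra row holding each column's sum."""
    return a + [[sum(col) for col in zip(*a)]]


def with_row_checksum(b):
    """Return a copy of b with each row extended by that row's sum."""
    return [row + [sum(row)] for row in b]


def check_product(c_full, tol=1e-9):
    """Compare carried checksums with checksums regenerated from the data.

    c_full is the product of a column-checksummed matrix and a
    row-checksummed matrix. Returns (bad_rows, bad_cols): the indices
    whose regenerated sum disagrees with the carried checksum.
    """
    data = [row[:-1] for row in c_full[:-1]]
    bad_rows = [i for i, row in enumerate(data)
                if abs(sum(row) - c_full[i][-1]) > tol]
    bad_cols = [j for j, col in enumerate(zip(*data))
                if abs(sum(col) - c_full[-1][j]) > tol]
    return bad_rows, bad_cols
```

With this scheme, a single corrupted element of the product appears as exactly one inconsistent row and one inconsistent column, which locates the fault; the size of the checksum discrepancy could then be used to correct it.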


2016 ◽  
Vol 11 (1) ◽  
pp. 72-80
Author(s):  
O.V. Darintsev ◽  
A.B. Migranov

The article considers one possible approach to the synthesis of group control of mobile robots, based on the use of cloud computing. A distinctive feature of the proposed techniques is that the architecture of the control and information systems, the methods of organizing information exchange, etc. adequately reflect the specifics of the application domain and of the tasks solved by the robot group. The approach proposed by the authors makes it possible to increase the reliability and robustness of robot collectives and to lower the requirements for onboard computers while maintaining high overall performance.

