This paper examines how to organize multiuser operation of hybrid computing systems. Using the OpenPOWER-based cluster of the Shared Services Center "Data Center of the Far Eastern Branch of the Russian Academy of Sciences" as an example, it considers the operating features of systems of this class and proposes solutions for organizing their work. Using the virtual nodes mechanism, the PBS Professional job scheduling system was adapted to distribute cluster hardware resources efficiently among user tasks. The implemented cluster software environment, with its integrated job scheduling system, is designed to work with a wide range of computer applications, including programs built with various parallel programming technologies. Virtualization technologies are used for efficient execution of machine learning, deep learning and artificial intelligence solutions in this environment. Using the capabilities of the Singularity container environment, a specialized software stack was built and a special mode of its operation was implemented in the form of a unified digital computing platform.
Purpose. Improving machine learning, deep learning and artificial intelligence technologies plays an important role in acquiring new knowledge, in technological modernization and in the development of the digital economy. An important factor in the development of these areas is the availability of an appropriate high-performance computing infrastructure capable of processing large amounts of data. The creation of coprocessor-based hybrid computing systems, together with new parallel programming technologies and application development tools, partially solves this problem. However, many issues of organizing effective multiuser operation of this class of systems require separate study. This paper presents research in this area. Methodology. Using the OpenPOWER-based cluster of the Shared Services Center "Data Center of the Far Eastern Branch of the Russian Academy of Sciences" as an example, the operating features of hybrid computing systems are considered and solutions are proposed for organizing their work in multiuser mode. Based on the virtual nodes concept, the PBS Professional job scheduling system was adapted to allocate cluster hardware resources efficiently among user tasks. Application virtualization technology was used for effective execution of machine learning and deep learning tasks. Findings. The implemented cluster software environment, with its integrated job scheduling system, is designed to work with a wide range of computer applications, including programs built with various parallel programming technologies. Virtualization technologies were used in this environment for effective execution of software based on machine learning, deep learning and artificial intelligence.
Using the capabilities of the Singularity container environment, a specialized software stack and its operation mode were implemented for executing machine learning, deep learning and artificial intelligence tasks on a unified digital computing platform. Originality. The operating features of hybrid computing platforms are considered, and an approach to organizing their effective multiuser operation is proposed. An effective resource management model based on virtualization technology is developed.