Software for Calculating Deformations of the Earth's Surface using Satellite Radar Data

2021 ◽  
Vol 12 (5) ◽  
pp. 246-259
Author(s):  
S. E. Popov ◽  
◽  
R. Yu. Zamaraev ◽  
N. I. Yukina ◽  
O. L. Giniyatullina ◽  
...  

The article presents a description of a software package for calculating displacement rates and detecting displacements of the earth's surface over areas of intensive coal mining. The package is built on the Docker Swarm microservice architecture integrated with Apache Spark, a system for massively parallel task execution, as a high-level tool for organizing container-based computations with orchestration of hardware resources. In the software package, a container serves as one element in the sequence of computational stages of the mathematical model of interferometric processing, presented as a managed service. The service itself is built on a microkernel of the host operating system, with support for multitasking of process identifiers and network protocols. Containerization of the executor objects makes calculations independent both within a single pool of jobs and between different pools initialized in multi-user mode. The use of a cluster resource management system with YARN job scheduling made it possible to abstract the cluster's computing resources from the launch of specific jobs and to provide dispatching of distributed processing applications. The program code, based on the Sentinel-1 Toolbox, stores the intermediate results of the procedures in the displacement-rate calculation schemes, which makes it possible to repeat calculations with different parameters, while parallelization reduces the calculation time in comparison with commercial software products. Combining Docker Swarm and Apache Spark technologies in one software package made it possible to implement a high-performance computing system based on open-source software and the cross-platform programming languages Java and Python, using low-budget hardware blocks, including those made in Russia.
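A minimal PySpark sketch of such a stage-wise pipeline with persisted intermediate results is shown below. The stage functions (coregister, form_interferogram, unwrap_phase) and the scene identifiers are hypothetical placeholders for the Sentinel-1 Toolbox operators described in the article, and the YARN master setting assumes a configured cluster.

```python
# Sketch only: hypothetical stage functions stand in for Sentinel-1 Toolbox
# operators; assumes a YARN cluster (use "local[*]" for a standalone test).
from pyspark.sql import SparkSession
from pyspark import StorageLevel

spark = (SparkSession.builder
         .appName("insar-displacement-rates")
         .master("yarn")                     # executors dispatched via YARN
         .getOrCreate())
sc = spark.sparkContext

def coregister(pair):           # hypothetical: align the scene pair
    return {"pair": pair, "stack": f"coreg({pair})"}

def form_interferogram(stack):  # hypothetical: interferogram + coherence
    return {**stack, "ifg": f"ifg({stack['stack']})"}

def unwrap_phase(ifg):          # hypothetical: unwrapping -> displacement rate
    return {**ifg, "disp": f"unwrap({ifg['ifg']})"}

scene_pairs = ["S1A_20200101_20200113", "S1A_20200113_20200125"]  # placeholder IDs

# Each stage is an independent transformation; persisting the intermediate
# results lets later runs re-use them with different parameters.
stacks = sc.parallelize(scene_pairs).map(coregister).persist(StorageLevel.MEMORY_AND_DISK)
ifgs = stacks.map(form_interferogram).persist(StorageLevel.MEMORY_AND_DISK)
rates = ifgs.map(unwrap_phase).collect()

for r in rates:
    print(r["pair"], "->", r["disp"])
```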

Author(s):  
Yao Wu ◽  
Long Zheng ◽  
Brian Heilig ◽  
Guang R Gao

As the attention given to big data grows, cluster computing systems for distributed processing of large data sets have become a mainstream and critical requirement in high-performance distributed systems research. One of the most successful systems is Hadoop, which uses MapReduce as its programming/execution model and relies on disks as intermediate storage to process huge volumes of data. Spark, as an in-memory computing engine, can solve iterative and interactive problems more efficiently. However, there is currently a consensus that these are not the final solutions to big data, owing to their MapReduce-like programming model, synchronous execution model, the constraint of supporting only batch processing, and so on. A new solution, in particular a fundamental evolution, is needed to bring big data solutions into a new era. In this paper, we introduce a new cluster computing system called HAMR which supports both batch and streaming processing. To achieve better performance, HAMR integrates high-performance computing approaches, i.e. dataflow fundamentals, into a big data solution. More specifically, HAMR is designed entirely around in-memory computing to reduce unnecessary disk access overhead; task scheduling and memory management are fine-grained to expose more parallelism; and asynchronous execution improves the efficiency of computing resource usage and also improves workload balance across the whole cluster. The experimental results show that HAMR can outperform Hadoop MapReduce and Spark by up to 19x and 7x respectively, in the same cluster environment. Furthermore, HAMR can handle data sizes scaling well beyond the capabilities of Spark.
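To make the disk-versus-memory contrast concrete, the following is a small, purely illustrative PySpark sketch of an iterative computation that benefits from in-memory caching; it does not use HAMR's API (which is not exposed here), and the data and iteration count are illustrative.

```python
# Illustrative only: a disk-backed MapReduce job would rewrite intermediate
# data to disk between iterations; Spark keeps the cached RDD in memory.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iterative-demo").master("local[*]").getOrCreate()
sc = spark.sparkContext

# Toy data: y = 2 * x, cached once and re-read on every iteration.
points = sc.parallelize([(i / 100.0, 2 * i / 100.0) for i in range(100)]).cache()

w = 0.0
for _ in range(10):   # repeated passes over the same cached data
    grad = points.map(lambda p: (w * p[0] - p[1]) * p[0]).mean()
    w -= 0.5 * grad

print("fitted slope:", round(w, 3))   # converges towards 2.0
spark.stop()
```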


Author(s):  
David Lowther ◽  
Vahid Ghorbanian ◽  
Mohammad Hossain Mohammadi ◽  
Issah Ibrahim

Purpose: The design of electromagnetic systems for a variety of applications such as induction heating, electrical machines, actuators and transformers requires the solution of a multi-physics problem, often involving thermal, structural and mechanical coupling to the electromagnetic system. This results in a complex analysis system embedded within an optimization process. The appearance of high-performance computing systems over the past few years has made coupled simulations feasible for the design engineer. When coupled with surrogate modelling techniques, it is possible to significantly reduce the wall-clock time for generating a complete design while including the impact of the multi-physics performance on the device.
Design/methodology/approach: An architecture is proposed for linking multiple single-physics analysis tools through the material models and a controller which schedules the execution of the various software tools. The combination of tools is implemented on a series of computational nodes operating in parallel, creating a “super node” cluster within a collection of interconnected processors.
Findings: The proposed architecture and job scheduling system allow a parallel exploration of the design space for a device.
Originality/value: The originality of the work derives from the organization of the parallel computing system into a series of “super nodes” and the creation of a materials database suitable for multi-physics interactions.
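As a rough illustration of such a controller loop, the following Python sketch dispatches coupled single-physics evaluations over a process pool standing in for a "super node"; the solver functions, material values and coupling are hypothetical stand-ins, not the tool chain or materials database from the paper.

```python
# Sketch of a controller scheduling coupled single-physics runs across a pool
# of workers. Solver functions and material values are hypothetical.
from concurrent.futures import ProcessPoolExecutor

def electromagnetic_solve(design, material):     # hypothetical EM solver
    return {"losses": design["current"] ** 2 * material["resistivity"]}

def thermal_solve(material, em_result):          # hypothetical thermal solver
    return {"temp_rise": em_result["losses"] / material["conductivity"]}

def evaluate_design(design):
    # The shared material model is what links the single-physics tools.
    material = {"resistivity": 1.7e-8, "conductivity": 400.0}  # illustrative values
    em = electromagnetic_solve(design, material)
    th = thermal_solve(material, em)
    return {**design, **em, **th}

if __name__ == "__main__":
    # Parallel exploration of a small design space, one candidate per worker.
    candidates = [{"id": i, "current": 10.0 + i} for i in range(8)]
    with ProcessPoolExecutor(max_workers=4) as pool:
        for result in pool.map(evaluate_design, candidates):
            print(result["id"], "temperature rise:", result["temp_rise"])
```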


Author(s):  
Masnida Hussin ◽  
Raja Azlina Raja Mahmood ◽  
Mas Rina Mustaffa

Energy consumption in distributed computing systems has gained a lot of attention recently, as their processing capacity has become significant for better business and economic operations. A comprehensive analysis of energy efficiency in a high-performance data center for distributed processing requires the ability to monitor the proportion of resource utilization versus energy consumption. In order to achieve a green data center while sustaining computational performance, a model of energy-efficient cyber-physical communication is proposed. Real-time sensor communication is used to monitor the heat emitted by processors and the room temperature. Specifically, our cyber-physical communication model dynamically identifies processing states in the data center while suggesting a suitable air-conditioning temperature level. The information is then used by administrators to fine-tune the room temperature according to the current processing activities. Our automated triggering approach aims to improve edge computing performance with cost-effective energy consumption. Simulation experiments show that our cyber-physical communication achieves better energy consumption and resource utilization compared with other cooling models.
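A minimal sketch of the triggering idea follows; the sensor readings, processing-state thresholds and setpoints are hypothetical and are not taken from the paper, which identifies processing states dynamically.

```python
# Illustrative threshold-based trigger: map current heat readings to an
# air-conditioning setpoint. Sensor sources and thresholds are hypothetical.
from statistics import mean

def read_processor_temps():
    return [61.5, 58.2, 64.0, 59.7]   # placeholder for real-time sensor input (°C)

def read_room_temp():
    return 24.8                        # placeholder (°C)

def choose_setpoint(cpu_temps, room_temp):
    hottest = mean(cpu_temps)
    if hottest > 70.0:                 # heavy processing state -> cool aggressively
        return 18.0
    if hottest > 55.0:                 # moderate processing state
        return 21.0
    return min(25.0, room_temp + 1.0)  # idle state -> relax cooling

setpoint = choose_setpoint(read_processor_temps(), read_room_temp())
print(f"suggested air-conditioning setpoint: {setpoint:.1f} °C")
```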


Author(s):  
O. Dmytriieva ◽  
◽  
D. Nikulin

The work is devoted to distributed transaction processing for the analysis of large volumes of data with the aim of mining association rules. Based on the well-known data mining algorithms for finding frequent itemsets, AIS and Apriori, possible parallelization variants were identified that avoid the need for iterative scanning of the database and high memory consumption. The possibility of porting the computations to different platforms that support parallel data processing was investigated. MapReduce, a powerful framework for processing large, distributed data sets on a Hadoop cluster, and Apache Spark, a software tool for processing extremely large amounts of data, were chosen as the computing platforms. A comparative analysis of the performance of the considered methods was carried out, recommendations for the effective use of parallel computing platforms were obtained, and modifications of the association rule mining algorithms were proposed. The main tasks accomplished in the work are: a study of modern tools for distributed processing of structured and unstructured data; deployment of a test cluster in a cloud service; development of scripts to automate cluster deployment; modification of the distributed algorithms to adapt them to the required distributed computing frameworks; measurement of data processing performance in sequential and distributed modes using Hadoop MapReduce and Apache Spark; a comparative analysis of the performance measurements; derivation and justification of the relationship between the amount of data processed and the processing time; optimization of the distributed association rule mining algorithms for processing large volumes of transactional data; and measurement of the performance of distributed processing with existing software tools. Keywords: distributed processing, transactional data, association rules, computing cluster, Hadoop, MapReduce, Apache Spark
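For orientation, the following is a minimal PySpark example of frequent-itemset and association-rule mining using Spark MLlib's built-in FP-Growth. This is Spark's standard facility, not the modified AIS/Apriori algorithms developed in the work; the transactions and thresholds are illustrative.

```python
# Minimal frequent-itemset / association-rule example with Spark MLlib's
# FP-Growth (not the paper's modified AIS/Apriori); data and thresholds
# are illustrative only.
from pyspark.sql import SparkSession
from pyspark.ml.fpm import FPGrowth

spark = SparkSession.builder.appName("assoc-rules-demo").master("local[*]").getOrCreate()

transactions = spark.createDataFrame(
    [(0, ["bread", "milk"]),
     (1, ["bread", "butter", "milk"]),
     (2, ["beer", "bread"]),
     (3, ["milk", "butter"])],
    ["id", "items"])

fp = FPGrowth(itemsCol="items", minSupport=0.5, minConfidence=0.6)
model = fp.fit(transactions)

model.freqItemsets.show()        # frequent itemsets with their support counts
model.associationRules.show()    # rules with confidence and lift
spark.stop()
```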


2016 ◽  
Vol 11 (1) ◽  
pp. 72-80
Author(s):  
O.V. Darintsev ◽  
A.B. Migranov

The article considers one possible approach to the synthesis of group control for mobile robots based on the use of cloud computing. A distinctive feature of the proposed techniques is that the specifics of the application domain and of the tasks solved by the group of robots are adequately reflected in the architecture of the control and information systems, the methods of organizing information exchange, and so on. The approach proposed by the authors makes it possible to increase the reliability and robustness of robot collectives and to lower the requirements on on-board computers while preserving high overall performance.


2020 ◽  
Vol 15 ◽  
Author(s):  
Weiwen Zhang ◽  
Long Wang ◽  
Theint Theint Aye ◽  
Juniarto Samsudin ◽  
Yongqing Zhu

Background: Genotype imputation as a service has been developed to enable researchers to estimate genotypes on haplotyped data without performing whole-genome sequencing. However, genotype imputation is computationally intensive, and thus it remains a challenge to satisfy the high-performance requirements of genome-wide association studies (GWAS).
Objective: In this paper, we propose a high-performance computing solution for genotype imputation on supercomputers to enhance its execution performance.
Method: We design and implement a multi-level parallelization that includes job-level, process-level and thread-level parallelization, enabled by job scheduling management, the message passing interface (MPI) and OpenMP, respectively. It involves job distribution, chunk partition and execution, parallelized iteration for imputation, and data concatenation. Owing to this multi-level design, we can exploit the multi-machine/multi-core architecture to improve the performance of genotype imputation.
Results: Experimental results show that our proposed method can outperform the Hadoop-based implementation of genotype imputation. Moreover, we conduct experiments on supercomputers to evaluate the performance of the proposed method. The evaluation shows that it can significantly shorten the execution time, thus improving the performance of genotype imputation.
Conclusion: The proposed multi-level parallelization, when deployed as imputation as a service, will facilitate bioinformatics researchers in Singapore in conducting genotype imputation and will enhance association studies.
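As a rough illustration of the process-level layer alone, the following mpi4py sketch scatters chromosome chunks across MPI ranks and gathers the imputed pieces for concatenation; the chunking scheme and the impute_chunk function are hypothetical, and the job-level and OpenMP thread-level layers from the paper are not shown.

```python
# Sketch of the process-level (MPI) layer only: scatter chunks, impute each in
# parallel, gather and concatenate. impute_chunk() is a hypothetical stand-in
# for the real imputation engine. Run with e.g.: mpirun -n 4 python impute_mpi.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

def impute_chunk(chunk):
    # placeholder: a real implementation would invoke the imputation tool here
    return [f"imputed({variant})" for variant in chunk]

if rank == 0:
    variants = [f"chr20:{pos}" for pos in range(1, 17)]   # illustrative positions
    chunks = [variants[i::size] for i in range(size)]     # one chunk per rank
else:
    chunks = None

my_chunk = comm.scatter(chunks, root=0)    # distribute work to ranks
my_result = impute_chunk(my_chunk)         # process-level parallel imputation
pieces = comm.gather(my_result, root=0)    # collect per-rank results

if rank == 0:
    imputed = [v for piece in pieces for v in piece]      # concatenation step
    print(f"imputed {len(imputed)} variants across {size} ranks")
```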


2019 ◽  
Vol 6 (1) ◽  
Author(s):  
Mahdi Torabzadehkashi ◽  
Siavash Rezaei ◽  
Ali HeydariGorji ◽  
Hosein Bobarshad ◽  
Vladimir Alves ◽  
...  

In the era of big data applications, the demand for more sophisticated data centers and high-performance data processing mechanisms is increasing drastically. Data are originally stored in storage systems. To process data, application servers need to fetch them from storage devices, which imposes the cost of moving data through the system. This cost is directly related to the distance of the processing engines from the data. This is the key motivation for the emergence of distributed processing platforms such as Hadoop, which move processing closer to the data. Computational storage devices (CSDs) push the “move processing to data” paradigm to its ultimate boundaries by deploying embedded processing engines inside storage devices to process data. In this paper, we introduce Catalina, an efficient and flexible computational storage platform that provides a seamless environment to process data in-place. Catalina is the first CSD equipped with a dedicated application processor running a full-fledged operating system that provides filesystem-level data access for applications. Thus, a vast spectrum of applications can be ported to run on Catalina CSDs. Due to these unique features, to the best of our knowledge, Catalina CSD is the only in-storage processing platform that can be seamlessly deployed in clusters to run distributed applications such as Hadoop MapReduce and HPC applications in-place without any modifications to the underlying distributed processing framework. As a proof of concept, we build a fully functional Catalina prototype and a CSD-equipped platform using 16 Catalina CSDs to run Intel HiBench Hadoop and HPC benchmarks, in order to investigate the benefits of deploying Catalina CSDs in distributed processing environments. The experimental results show up to a 2.2× improvement in performance and a 4.3× reduction in energy consumption for running Hadoop MapReduce benchmarks. Additionally, thanks to the Neon SIMD engines, the performance and energy efficiency of DFT algorithms are improved by up to 5.4× and 8.9×, respectively.

