w-Stacking w-projection hybrid algorithm for wide-field interferometric imaging: implementation details and improvements

Author(s):  
Luke Pratley ◽  
Melanie Johnston-Hollitt ◽  
Jason D. McEwen

Abstract: We present a detailed discussion of the implementation strategies for a recently developed w-stacking w-projection hybrid algorithm used to reconstruct wide-field interferometric images. In particular, we discuss the methodology used to deploy the algorithm efficiently on a supercomputer via a Message Passing Interface (MPI) k-means clustering technique, achieving efficient construction and application of the non-coplanar corrections. Additionally, we show that the use of conjugate symmetry can increase the w-stacking efficiency and decrease the time required to construct and apply w-projection kernels for large data sets. We then demonstrate this implementation by imaging an interferometric observation of Fornax A from the Murchison Widefield Array (MWA). We perform an exact non-coplanar wide-field correction for 126.6 million visibilities using 50 nodes of a computing cluster. The w-projection kernel construction takes only 15 min prior to reconstruction, demonstrating that the implementation is both fast and efficient.
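
To illustrate the clustering step, the following minimal C/MPI sketch (our illustration, not the authors' code) groups visibility w-values into w-stacks with a distributed one-dimensional k-means: each rank assigns its local visibilities to the nearest stack centre, and the centres are updated globally with MPI_Allreduce.

/* Minimal sketch (not the authors' code): a distributed 1-D k-means
 * over visibility w-values, assigning each visibility to a w-stack
 * whose centre defines a shared w-projection kernel. Assumes each
 * rank already holds its own slice of w-values. */
#include <mpi.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

#define K 8        /* number of w-stacks (clusters)     */
#define ITERS 20   /* fixed iteration count for brevity */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Synthetic local w-values standing in for a visibility slice. */
    int n_local = 100000;
    double *w = malloc(n_local * sizeof(double));
    srand(rank + 1);
    for (int i = 0; i < n_local; i++)
        w[i] = 200.0 * rand() / RAND_MAX - 100.0;

    double centres[K];
    for (int k = 0; k < K; k++)   /* crude evenly spaced initial centres */
        centres[k] = -100.0 + 200.0 * (k + 0.5) / K;

    for (int it = 0; it < ITERS; it++) {
        double sum[K] = {0.0}, cnt[K] = {0.0};
        for (int i = 0; i < n_local; i++) {
            int best = 0;   /* nearest centre in |w - centre| */
            for (int k = 1; k < K; k++)
                if (fabs(w[i] - centres[k]) < fabs(w[i] - centres[best]))
                    best = k;
            sum[best] += w[i];
            cnt[best] += 1.0;
        }
        /* Combine partial sums from all ranks, then update centres. */
        MPI_Allreduce(MPI_IN_PLACE, sum, K, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
        MPI_Allreduce(MPI_IN_PLACE, cnt, K, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
        for (int k = 0; k < K; k++)
            if (cnt[k] > 0.0) centres[k] = sum[k] / cnt[k];
    }

    if (rank == 0)
        for (int k = 0; k < K; k++)
            printf("w-stack %d centre: %.2f\n", k, centres[k]);

    free(w);
    MPI_Finalize();
    return 0;
}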

2019 ◽  
Vol 214 ◽  
pp. 05029 ◽  
Author(s):  
Alexey Rybalchenko ◽  
Dennis Klein ◽  
Mohammad Al-Turany ◽  
Thorsten Kollegger

The high data rates expected for the next generation of particle physics experiments (e.g., new experiments at FAIR/GSI and the upgrade of the CERN experiments) call for dedicated attention to the design of the required computing infrastructure. The common ALICE-FAIR framework ALFA is a modern software layer that serves as a platform for simulation, reconstruction and analysis of particle physics experiments. Besides the standard services needed for simulation and reconstruction, ALFA also provides tools for data transport, configuration and deployment. The FairMQ module in ALFA offers building blocks for creating distributed software components (processes) that communicate with each other via message passing. The abstract message passing interface in FairMQ currently has three implementations: ZeroMQ, nanomsg and shared memory. The newly developed shared memory transport is presented, which provides significant performance benefits for transferring large data chunks between components on the same node. The implementation in FairMQ allows users to switch between the different transports via a trivial configuration change. The design decisions, implementation details and performance numbers of the shared memory transport in FairMQ/ALFA are highlighted.
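
The zero-copy idea behind a same-node shared-memory transport can be sketched in C with POSIX shared memory (a conceptual illustration only; FairMQ's actual API differs): the producer writes a large chunk into a named segment, so only a small descriptor needs to be exchanged between processes. On Linux this compiles with -lrt.

/* Conceptual sketch of a same-node shared-memory transport (not the
 * FairMQ API): the producer fills a large data chunk in a POSIX
 * shared-memory segment; only a small handle would be exchanged
 * between processes, avoiding a copy of the payload itself. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    const size_t chunk = 64 * 1024 * 1024;   /* 64 MiB payload */

    /* Create and size the shared segment (the "message" body). */
    int fd = shm_open("/demo_chunk", O_CREAT | O_RDWR, 0600);
    if (fd < 0 || ftruncate(fd, chunk) != 0) { perror("shm"); return 1; }

    char *buf = mmap(NULL, chunk, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    if (buf == MAP_FAILED) { perror("mmap"); return 1; }

    memset(buf, 0xAB, chunk);   /* producer fills the chunk in place */

    /* In a real transport, only a small descriptor (segment name,
     * offset, length) is sent to the consumer over a control channel;
     * the consumer maps the same segment and reads with zero copies. */
    printf("chunk of %zu bytes published in /demo_chunk\n", chunk);

    munmap(buf, chunk);
    close(fd);
    shm_unlink("/demo_chunk");
    return 0;
}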


2018 ◽  
Vol 12 (3) ◽  
pp. 74
Author(s):  
Faten Hamad ◽  
Abdelsalam Alawamrah

Evaluating the performance of an algorithm, and of the method used to implement it, plays a major role in assessing many applications: it helps researchers decide which algorithm to use and how to implement it, and it also indicates the performance of the hardware on which the algorithm is tested. In this paper we evaluate the performance of a linear-equation solver on a supercomputer, implemented using the Message Passing Interface (MPI) library. Sequential and multithreaded versions of the algorithm were also tested and their results recorded, and the speedup and efficiency of the algorithm were calculated. The results show that the parallel algorithm outperforms the other methods for a large 8192 × 8192 matrix on 64 processors. For large input sizes, the results also show a noticeable decrease in running time as the number of processors increases. In the multithreaded case, however, the running time increases rapidly with matrix size even as the number of threads grows. This indicates that, for large matrix inputs, the parallel MPI implementation performs best and outperforms the other methods.
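
A common MPI pattern for such solvers, shown below as a minimal sketch (assuming a diagonally dominant system and a matrix dimension divisible by the rank count; not necessarily the paper's algorithm), is a row-distributed Jacobi iteration: each rank updates its own block of unknowns, and MPI_Allgather shares the full solution vector. Given a measured serial time T1, speedup is S = T1/Tp and efficiency is E = S/p, the quantities the paper reports.

/* Minimal sketch: row-distributed Jacobi iteration for A x = b.
 * Assumptions: N divisible by the number of ranks; synthetic
 * diagonally dominant system so the iteration converges. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N 1024
#define ITERS 100

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int rows = N / size;                 /* rows owned by this rank */
    int r0   = rank * rows;              /* first global row index  */
    double *A  = malloc(rows * N * sizeof(double));
    double *b  = malloc(rows * sizeof(double));
    double *x  = calloc(N, sizeof(double));
    double *xl = malloc(rows * sizeof(double));

    for (int i = 0; i < rows; i++) {     /* build the local block */
        for (int j = 0; j < N; j++) A[i * N + j] = 1.0;
        A[i * N + (r0 + i)] = 2.0 * N;   /* dominant diagonal */
        b[i] = 1.0;
    }

    double t0 = MPI_Wtime();
    for (int it = 0; it < ITERS; it++) {
        for (int i = 0; i < rows; i++) {
            double s = 0.0;
            for (int j = 0; j < N; j++)
                if (j != r0 + i) s += A[i * N + j] * x[j];
            xl[i] = (b[i] - s) / A[i * N + (r0 + i)];
        }
        /* Every rank needs the full updated x for the next sweep. */
        MPI_Allgather(xl, rows, MPI_DOUBLE, x, rows, MPI_DOUBLE,
                      MPI_COMM_WORLD);
    }
    double tp = MPI_Wtime() - t0;

    if (rank == 0)   /* compare Tp with the serial time T1 for speedup */
        printf("p=%d  Tp=%.3fs\n", size, tp);

    free(A); free(b); free(x); free(xl);
    MPI_Finalize();
    return 0;
}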


2016 ◽  
Vol 2 ◽  
pp. 115
Author(s):  
Astika Ayuningtyas

Parallel processing performs two or more tasks simultaneously by making optimal use of computer system resources; one such model is a network of desktop systems, which allows parallel processing across computers with different specifications. A network-of-workstations model is implemented here using MPI (Message Passing Interface). In this study it is applied to low-pass filtering (LPF), an image-processing operation that retains the low-frequency content of an image. The low-pass filtering program, based on the cosine transform, is implemented with MPI by modifying the algorithm run on each node (computer). The test results show that the processing speed of the parallel system is influenced by the number of nodes/processes and by the number of frequency components processed. For a single process, larger workloads take progressively longer, and the cutoff value affects only the amount of high-frequency data filtered out; in parallel processing, as more computers are involved in the low-pass filter calculation, communication time is added to the computation time.
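
The row-wise distribution described above can be sketched in C/MPI as follows (an illustration under our own assumptions, not the study's code): image rows are scattered across ranks, each rank applies a naive cosine transform per row, zeroes the coefficients above a cutoff, inverts the transform, and the filtered rows are gathered back on the root.

/* Illustrative sketch: MPI low-pass filtering of image rows via a
 * naive DCT-II and its inverse. Assumes H divisible by rank count. */
#include <mpi.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

#define W 256    /* image width                      */
#define H 256    /* image height (divisible by ranks) */
#define CUT 32   /* keep the CUT lowest frequencies   */

static void dct_row(const double *in, double *out)
{
    for (int k = 0; k < W; k++) {
        double s = 0.0;
        for (int n = 0; n < W; n++)
            s += in[n] * cos(M_PI * (n + 0.5) * k / W);
        out[k] = s;
    }
}

static void idct_row(const double *in, double *out)
{
    for (int n = 0; n < W; n++) {
        double s = in[0] / 2.0;
        for (int k = 1; k < W; k++)
            s += in[k] * cos(M_PI * (n + 0.5) * k / W);
        out[n] = 2.0 * s / W;
    }
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int rows = H / size;
    double *img = NULL;
    if (rank == 0) {                      /* synthetic test image */
        img = malloc(W * H * sizeof(double));
        for (int i = 0; i < W * H; i++) img[i] = rand() % 256;
    }
    double *loc = malloc(W * rows * sizeof(double));
    double *tmp = malloc(W * sizeof(double));

    MPI_Scatter(img, W * rows, MPI_DOUBLE,
                loc, W * rows, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    for (int r = 0; r < rows; r++) {
        dct_row(loc + r * W, tmp);
        for (int k = CUT; k < W; k++) tmp[k] = 0.0;  /* low-pass cut */
        idct_row(tmp, loc + r * W);
    }

    MPI_Gather(loc, W * rows, MPI_DOUBLE,
               img, W * rows, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    if (rank == 0) printf("filtered %dx%d image\n", W, H);
    free(loc); free(tmp); free(img);
    MPI_Finalize();
    return 0;
}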


2019 ◽  
Vol 874 (2) ◽  
pp. 174 ◽  
Author(s):  
Luke Pratley ◽  
Melanie Johnston-Hollitt ◽  
Jason D. McEwen

2020 ◽  
Vol 15 ◽  
Author(s):  
Weiwen Zhang ◽  
Long Wang ◽  
Theint Theint Aye ◽  
Juniarto Samsudin ◽  
Yongqing Zhu

Background: Genotype imputation as a service is developed to enable researchers to estimate genotypes on haplotyped data without performing whole genome sequencing. However, genotype imputation is computation intensive, and it therefore remains a challenge to satisfy the high performance requirements of genome-wide association studies (GWAS). Objective: In this paper, we propose a high performance computing solution for genotype imputation on supercomputers to enhance its execution performance. Method: We design and implement a multi-level parallelization that includes job-level, process-level and thread-level parallelization, enabled by job scheduling management, the Message Passing Interface (MPI) and OpenMP, respectively. It involves job distribution, chunk partition and execution, parallelized iteration for imputation, and data concatenation. Owing to this multi-level design, we can exploit multi-machine/multi-core architectures to improve the performance of genotype imputation. Results: Experimental results show that the proposed method outperforms a Hadoop-based implementation of genotype imputation. Moreover, experiments on supercomputers show that it significantly shortens execution time, thus improving the performance of genotype imputation. Conclusion: The proposed multi-level parallelization, when deployed as an imputation-as-a-service, will help bioinformatics researchers in Singapore conduct genotype imputation and enhance association studies.
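
The process-level and thread-level layers can be combined as in the following hybrid MPI/OpenMP sketch (our illustration; impute_position() is a hypothetical stand-in for the per-site work): MPI ranks take chunks round-robin, OpenMP threads parallelize the loop within each chunk, and a final reduction stands in for the concatenation step.

/* Sketch of the multi-level idea under stated assumptions (chunked
 * input, independent per-site work). Compile with MPI and -fopenmp. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

#define CHUNKS 64             /* genome split into this many chunks */
#define SITES_PER_CHUNK 10000

/* Hypothetical stand-in for the per-site imputation computation. */
static double impute_position(int chunk, int site)
{
    return (double)chunk * site;   /* placeholder work */
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double local = 0.0;
    /* Process level: round-robin chunk assignment across MPI ranks. */
    for (int c = rank; c < CHUNKS; c += size) {
        /* Thread level: OpenMP over sites inside the chunk. */
        #pragma omp parallel for reduction(+:local)
        for (int s = 0; s < SITES_PER_CHUNK; s++)
            local += impute_position(c, s);
    }

    /* Aggregation stands in for the data-concatenation step. */
    double total = 0.0;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("aggregate checksum: %.0f\n", total);

    MPI_Finalize();
    return 0;
}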


Energies ◽  
2021 ◽  
Vol 14 (8) ◽  
pp. 2284
Author(s):  
Krzysztof Przystupa ◽  
Mykola Beshley ◽  
Olena Hordiichuk-Bublivska ◽  
Marian Kyryk ◽  
Halyna Beshley ◽  
...  

Analyzing large amounts of user data to determine user preferences and, on that basis, to recommend new products is an important problem. Depending on the correctness and timeliness of the recommendations, significant profits or losses can result. Analysis of data about the users of a company's services is carried out in dedicated recommendation systems. However, with a large number of users the data to be processed become very large, which complicates the work of recommendation systems. For efficient data analysis in commercial systems, the Singular Value Decomposition (SVD) method can be used for intelligent analysis of the information. For large amounts of processed information, we propose using distributed systems, which reduces the time needed to process the data and deliver recommendations to users. For the experimental study, we implemented the distributed SVD method using Message Passing Interface, Hadoop and Spark technologies, and found that distributed systems reduce data-processing time compared to non-distributed ones.
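
A basic building block of distributed SVD, sketched below in C/MPI as our own illustration (not the paper's implementation), is power iteration for the leading singular vector of a row-distributed ratings matrix: each rank computes its partial contribution to A^T(Av), and MPI_Allreduce combines the contributions across ranks.

/* Minimal sketch: distributed power iteration for the leading
 * singular triplet of a row-distributed matrix A. Each rank holds
 * LOCAL_ROWS rows; A^T A = sum over ranks of A_r^T A_r. */
#include <mpi.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

#define NCOLS 512
#define LOCAL_ROWS 1024
#define ITERS 50

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Synthetic local row block of the user-item matrix. */
    double *A = malloc(LOCAL_ROWS * NCOLS * sizeof(double));
    srand(rank + 7);
    for (int i = 0; i < LOCAL_ROWS * NCOLS; i++)
        A[i] = (double)rand() / RAND_MAX;

    double v[NCOLS], z[NCOLS], y[LOCAL_ROWS];
    for (int j = 0; j < NCOLS; j++) v[j] = 1.0 / NCOLS;

    double sigma = 0.0;
    for (int it = 0; it < ITERS; it++) {
        for (int i = 0; i < LOCAL_ROWS; i++) {   /* y = A_local v */
            y[i] = 0.0;
            for (int j = 0; j < NCOLS; j++)
                y[i] += A[i * NCOLS + j] * v[j];
        }
        for (int j = 0; j < NCOLS; j++) {        /* z = A_local^T y */
            z[j] = 0.0;
            for (int i = 0; i < LOCAL_ROWS; i++)
                z[j] += A[i * NCOLS + j] * y[i];
        }
        MPI_Allreduce(MPI_IN_PLACE, z, NCOLS, MPI_DOUBLE, MPI_SUM,
                      MPI_COMM_WORLD);           /* sum over ranks */
        double norm = 0.0;
        for (int j = 0; j < NCOLS; j++) norm += z[j] * z[j];
        norm = sqrt(norm);
        sigma = sqrt(norm);   /* ||A^T A v|| converges to sigma_1^2 */
        for (int j = 0; j < NCOLS; j++) v[j] = z[j] / norm;
    }

    if (rank == 0) printf("leading singular value ~ %.4f\n", sigma);
    free(A);
    MPI_Finalize();
    return 0;
}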


1996 ◽  
Vol 22 (6) ◽  
pp. 789-828 ◽  
Author(s):  
William Gropp ◽  
Ewing Lusk ◽  
Nathan Doss ◽  
Anthony Skjellum

2013 ◽  
Vol 718-720 ◽  
pp. 1645-1650
Author(s):  
Gen Yin Cheng ◽  
Sheng Chen Yu ◽  
Zhi Yong Wei ◽  
Shao Jie Chen ◽  
You Cheng

The commonly used commercial simulation packages SYSNOISE and ANSYS run on a single machine (they cannot run directly on a parallel machine) when the finite element and boundary element methods are used to simulate muffler performance, and because of the large amount of numerical simulation it can take more than ten days, sometimes even twenty, to work out an exact solution. We use a high performance parallel machine built from 32 commodity computers and transform the finite element and boundary element simulation software into a program that can run under the MPI (message passing interface) parallel environment, in order to reduce the cost of numerical simulation. The data obtained from the simulation experiments demonstrate that the numerical results are good, and that the computing speed of the high performance parallel machine is 25 to 30 times that of a single microcomputer.
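
At its core, the parallelization strategy amounts to splitting the element loop across ranks and reducing the partial results, as in this conceptual C/MPI sketch (assemble_element() is a hypothetical stand-in; this is not SYSNOISE or ANSYS code):

/* Conceptual sketch: the element loop of a finite/boundary element
 * computation split across MPI ranks, partial results reduced. */
#include <mpi.h>
#include <stdio.h>

#define NELEM 100000

/* Hypothetical per-element computation (e.g. local contribution). */
static double assemble_element(int e) { return 1.0 / (e + 1.0); }

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each rank handles a contiguous block of elements. */
    int per = (NELEM + size - 1) / size;
    int lo = rank * per;
    int hi = (lo + per < NELEM) ? lo + per : NELEM;

    double part = 0.0;
    for (int e = lo; e < hi; e++)
        part += assemble_element(e);

    double total = 0.0;
    MPI_Reduce(&part, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("assembled contribution: %.6f\n", total);

    MPI_Finalize();
    return 0;
}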


Author(s):  
Alan Gray ◽  
Kevin Stratford

Leading high performance computing systems achieve their status through the use of highly parallel devices such as NVIDIA graphics processing units or Intel Xeon Phi many-core CPUs. The concept of performance portability across such architectures, as well as traditional CPUs, is vital for the application programmer. In this paper we describe targetDP, a lightweight abstraction layer which allows grid-based applications to target data-parallel hardware in a platform-agnostic manner. We demonstrate the effectiveness of our pragmatic approach by presenting performance results for a complex fluid application (with which the model was co-designed), plus a separate lattice quantum chromodynamics particle physics code. For each application, a single source code base is seen to achieve portable performance, as assessed within the context of the Roofline model. TargetDP can be combined with the Message Passing Interface (MPI) to allow use on systems containing multiple nodes: we demonstrate this through scaling results on traditional and graphics-processing-unit-accelerated large-scale supercomputers.
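
The core idea of such an abstraction layer can be sketched in C as follows (the macro names here are our own invention, not targetDP's API): the kernel body is written once against a loop macro, and the build target decides whether the macro expands to a plain loop or an OpenMP parallel loop; a CUDA build would map the same macro onto a device kernel launch instead.

/* Conceptual sketch of a targetDP-style abstraction (macro names are
 * hypothetical): one kernel body, retargeted at compile time.
 * Build with -DUSE_OPENMP -fopenmp for the threaded variant. */
#include <stdio.h>
#include <stdlib.h>

#ifdef USE_OPENMP
  #include <omp.h>
  #define GRID_LOOP(i, n) _Pragma("omp parallel for") \
                          for (int i = 0; i < (n); i++)
#else
  #define GRID_LOOP(i, n) for (int i = 0; i < (n); i++)
#endif

/* Kernel body written once against the abstraction. */
static void scale_field(double *field, int n, double a)
{
    GRID_LOOP(i, n) {
        field[i] *= a;   /* same data-parallel body on any target */
    }
}

int main(void)
{
    int n = 1 << 20;
    double *field = malloc(n * sizeof(double));
    for (int i = 0; i < n; i++) field[i] = 1.0;

    scale_field(field, n, 2.5);

    printf("field[0] = %.2f\n", field[0]);
    free(field);
    return 0;
}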

