Accelerating Large-Scale Data Analysis by Offloading to High-Performance Computing Libraries using Alchemist

Author(s):  
Alex Gittens ◽  
Kai Rothauge ◽  
Shusen Wang ◽  
Michael W. Mahoney ◽  
Lisa Gerhardt ◽  
...  
2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Bingzheng Li ◽  
Jinchen Xu ◽  
Zijing Liu

With the development of high-performance computing and big data applications, the scale of data transmitted, stored, and processed by high-performance computing cluster systems is increasing explosively. Compressing large-scale data efficiently, and thereby reducing the space required for storage and transmission, is one of the keys to improving the performance of these systems. In this paper, we present SW-LZMA, a parallel design and optimization of LZMA for the Sunway 26010 (SW26010) heterogeneous many-core processor. Taking the characteristics of the SW26010 into account, we analyse the storage space requirements, memory access characteristics, and hotspot functions of the LZMA algorithm and implement thread-level parallelism for it using the Athread interface. Furthermore, we lay out the LDM address space at a fine granularity to implement a DMA double-buffered cyclic sliding-window algorithm, which optimizes the performance of SW-LZMA. Experimental results show that, compared with the serial baseline implementation of LZMA, the parallel LZMA algorithm achieves a maximum speedup of 4.1 times on the Silesia corpus benchmark, while on the large-scale data set the speedup is 5.3 times.
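
The double-buffering idea mentioned in this abstract can be illustrated with a small host-side sketch: while one chunk is being compressed, the next chunk is already being loaded into a second buffer. This is only an analogy written in Python, assuming independent fixed-size chunks and a background thread standing in for the DMA engine. It does not model the SW26010, the LDM layout, the Athread interface, or the cyclic sliding window of SW-LZMA, and the names `compress_double_buffered` and `decompress_blocks` are hypothetical.

```python
import io
import lzma
from concurrent.futures import ThreadPoolExecutor


def compress_double_buffered(stream, chunk_size=1 << 16):
    """Compress `stream` chunk by chunk, overlapping each read with a compress.

    Illustrative assumption: every chunk is compressed independently, so the
    result is a list of standalone LZMA blocks rather than one stream.
    """
    blocks = []
    # A single background worker plays the role of the asynchronous transfer.
    with ThreadPoolExecutor(max_workers=1) as loader:
        pending = loader.submit(stream.read, chunk_size)  # fill the first buffer
        while True:
            chunk = pending.result()  # wait until the current buffer is ready
            if not chunk:
                break  # end of input, nothing left in flight
            # Start filling the other buffer before working on this one.
            pending = loader.submit(stream.read, chunk_size)
            blocks.append(lzma.compress(chunk))
    return blocks


def decompress_blocks(blocks):
    """Invert compress_double_buffered by decompressing and joining the blocks."""
    return b"".join(lzma.decompress(block) for block in blocks)


if __name__ == "__main__":
    data = b"high-performance computing " * 20000
    blocks = compress_double_buffered(io.BytesIO(data))
    assert decompress_blocks(blocks) == data
    print(len(data), "bytes in,", sum(len(b) for b in blocks), "bytes out")
```

Compressing chunks independently keeps the sketch short but generally costs some compression ratio, because matches cannot span chunk boundaries.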


2013 ◽  
Vol 2013 ◽  
pp. 1-6 ◽  
Author(s):  
Ying-Chih Lin ◽  
Chin-Sheng Yu ◽  
Yen-Jen Lin

Recent progress in high-throughput instrumentation has led to astonishing growth in both the volume and the complexity of biomedical data collected from various sources. Data at this planetary scale poses serious challenges to storage and computing technologies. Cloud computing is one way to address these challenges because it provides both storage and high-performance computing for large-scale data. This work briefly introduces data-intensive computing systems and summarizes existing cloud-based resources in bioinformatics. These developments and applications should help biomedical researchers make vast amounts of diverse data meaningful and usable.


2020 ◽  
Vol 245 ◽  
pp. 09011
Author(s):  
Michael Hildreth ◽  
Kenyi Paolo Hurtado Anampa ◽  
Cody Kankel ◽  
Scott Hampton ◽  
Paul Brenner ◽  
...  

The NSF-funded Scalable CyberInfrastructure for Artificial Intelligence and Likelihood Free Inference (SCAILFIN) project aims to develop and deploy artificial intelligence (AI) and likelihood-free inference (LFI) techniques and software using scalable cyberinfrastructure (CI) built on top of existing CI elements. Specifically, the project has extended the CERN-based REANA framework, a cloud-based data analysis platform deployed on top of Kubernetes clusters that was originally designed to enable analysis reusability and reproducibility. REANA is capable of orchestrating extremely complicated multi-step workflows, and uses Kubernetes clusters both for scheduling and distributing container-based workloads across a cluster of available machines and for instantiating and monitoring the concrete workloads themselves. This work describes the challenges and development efforts involved in extending REANA and the components that were developed to enable large-scale deployment on High Performance Computing (HPC) resources. Using the Virtual Clusters for Community Computation (VC3) infrastructure as a starting point, we implemented REANA to work with a number of different workload managers, including both high-performance and high-throughput ones, while simultaneously removing REANA’s dependence on Kubernetes support at the worker level.


Author(s):  
Balraj Singh ◽  
Harsh K Verma

Background: The extreme growth of data calls for high-performance computing. MapReduce is among the most sought-after platforms for processing large-scale data. Research and analysis of the existing system have revealed its performance bottlenecks and areas of concern. MapReduce suffers severely from skew and load imbalance on its processing nodes. Objective: This paper proposes a novel technique for MapReduce that lowers the skew on Map tasks and improves the load balance. It reduces the execution time of a job by lowering the completion time of the slowest task. Method: The proposed method performs a one-time settlement of load balancing among the Map tasks by analyzing their expected completion times and redistributing the load. It uses intervals to migrate overloaded or slow tasks and append them to underloaded tasks or free slots. Result: Experiments reveal an improvement of up to 1.3x when the proposed strategy is implemented and compared with relevant techniques on different datasets. Conclusion: A significant improvement in performance is observed as a result of the lower job completion time. The proposed technique exhibits reduced skew and a uniform distribution of load among Map nodes.
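
The general idea of redistributing load from the slowest task to underloaded tasks or free slots can be sketched as a simple greedy pass. This is not the authors' interval-based algorithm, which the abstract does not specify. It is a minimal illustration that assumes each Map slot holds work units with known, positive expected durations. The function name `rebalance` and the slot names are hypothetical.

```python
def rebalance(loads):
    """Greedily move work units from the slowest slot to the least-loaded one.

    `loads` maps a slot name to a list of expected unit durations (assumed > 0).
    A unit is moved only if the receiving slot still finishes strictly earlier
    than the slowest slot did, so no move can raise the overall completion time.
    """
    slots = {name: list(units) for name, units in loads.items()}  # work on a copy
    while True:
        slowest = max(slots, key=lambda name: sum(slots[name]))
        fastest = min(slots, key=lambda name: sum(slots[name]))
        if not slots[slowest]:
            break  # nothing left to migrate
        unit = min(slots[slowest])  # smallest unit on the slowest slot
        if sum(slots[fastest]) + unit >= sum(slots[slowest]):
            break  # moving it would not shorten the slowest task
        slots[slowest].remove(unit)
        slots[fastest].append(unit)
    return slots


def makespan(slots):
    """Completion time of the job, i.e. of its slowest slot."""
    return max(sum(units) for units in slots.values())


if __name__ == "__main__":
    before = {"map0": [4, 3, 3], "map1": [2], "map2": []}  # map2 is a free slot
    after = rebalance(before)
    print(makespan(before), "->", makespan(after))
```

Because job completion time is governed by the slowest slot, the sketch stops as soon as no single move can shorten that slot.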

