Exploring the Future of Out-of-Core Computing with Compute-Local Non-Volatile Memory

Paralleling the rise of general purpose graphical processing units (GPGPUs) as accelerators for specific high-performance computing (HPC) workloads, non-volatile memory (NVM) is increasingly used as an accelerator for I/O-intensive scientific applications. However, existing work has explored the use of NVM only within dedicated I/O nodes, which are distant from the compute nodes that actually need such acceleration. As NVM bandwidth begins to outpace point-to-point network capacity, we argue for the need to break from the archetype of completely separated storage. Therefore, in this work we investigate co-location of NVM and compute by varying I/O interfaces, file systems, types of NVM, and both current and future SSD architectures, uncovering numerous bottlenecks implicit in these various levels of the I/O stack. We present novel hardware and software solutions, including the new Unified File System (UFS), to enable fuller utilization of the new compute-local NVM storage. Our experimental evaluation, which employs a real-world Out-of-Core (OoC) HPC application, demonstrates throughput increases in excess of an order of magnitude over current approaches.

Octopus+: An RDMA-Enabled Distributed Persistent Memory File System

ACM Transactions on Storage ◽

10.1145/3448418 ◽

2021 ◽

Vol 17 (3) ◽

pp. 1-25

Author(s):

Bohong Zhu ◽

Youmin Chen ◽

Qing Wang ◽

Youyou Lu ◽

Jiwu Shu

Keyword(s):

High Speed ◽

High Performance ◽

File System ◽

Direct Memory Access ◽

File Systems ◽

Distributed File Systems ◽

Persistent Memory ◽

Memory Modules ◽

Non Volatile Memory ◽

Volatile Memory

Non-volatile memory and remote direct memory access (RDMA) provide extremely high performance in storage and network hardware. However, existing distributed file systems strictly isolate the file system and network layers, and their heavily layered software designs leave high-speed hardware under-exploited. In this article, we propose an RDMA-enabled distributed persistent memory file system, Octopus+, which redesigns file system internal mechanisms by closely coupling non-volatile memory and RDMA features. For data operations, Octopus+ directly accesses a shared persistent memory pool to reduce memory copying overhead, and actively fetches and pushes data all in clients to rebalance the load between the server and network. For metadata operations, Octopus+ introduces self-identified remote procedure calls for immediate notification between file systems and networking, and an efficient distributed transaction mechanism for consistency. Octopus+ also supports replication to provide better availability. Evaluations on Intel Optane DC Persistent Memory Modules show that Octopus+ achieves nearly the raw bandwidth for large I/Os and orders of magnitude better performance than existing distributed file systems.

Efficient Graph Component Labeling on Hybrid CPU and GPU Platforms

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.596.276 ◽

2014 ◽

Vol 596 ◽

pp. 276-279

Author(s):

Xiao Hui Pan

Keyword(s):

High Performance ◽

General Purpose ◽

Gpu Programming ◽

Data Parallel ◽

Graphical Processing Units ◽

Architectural Features ◽

Graph Coloring Problem ◽

Graphical Processing ◽

And Performance ◽

Performance Results

Graph component labeling, which is a subset of the general graph coloring problem, is a computationally expensive operation in many important applications and simulations. A number of data-parallel algorithmic variations to the component labeling problem are possible and we explore their use with general purpose graphical processing units (GPGPUs) and with the CUDA GPU programming language. We discuss implementation issues and performance results on CPUs and GPUs using CUDA. We evaluated our system with real-world graphs. We show how to consider different architectural features of the GPU and the host CPUs and achieve high performance.

RUMD: A general purpose molecular dynamics package optimized to utilize GPU hardware down to a few thousand particles

SciPost Physics ◽

10.21468/scipostphys.3.6.038 ◽

2017 ◽

Vol 3 (6) ◽

Cited By ~ 31

Author(s):

Nicholas Bailey ◽

Trond Ingebrigtsen ◽

Jesper Schmidt Hansen ◽

Arno Veldhorst ◽

Lasse Bøhling ◽

...

Keyword(s):

Molecular Dynamics ◽

High Performance ◽

General Purpose ◽

Graphical Processing Units ◽

Performance Benchmarks ◽

Graphical Processing ◽

And Performance ◽

Set Up ◽

The Many ◽

Many Core

RUMD is a general purpose, high-performance molecular dynamics (MD) simulation package running on graphical processing units (GPUs). RUMD addresses the challenge of utilizing the many-core nature of modern GPU hardware when simulating small to medium system sizes (roughly from a few thousand up to a hundred thousand particles). Its performance is comparable to other GPU-MD codes at large system sizes and substantially better at smaller sizes. RUMD is open-source and consists of a library written in C++ and the CUDA extension to C, an easy-to-use Python interface, and a set of tools for set-up and post-simulation data analysis. The paper describes RUMD’s main features, optimizations and performance benchmarks.

Exploring Non-Volatility of Non-Volatile Memory for High Performance Computing Under Failures

2020 IEEE International Conference on Cluster Computing (CLUSTER) ◽

10.1109/cluster49012.2020.00034 ◽

2020 ◽

Author(s):

Jie Ren ◽

Kai Wu ◽

Dong Li

Keyword(s):

High Performance Computing ◽

High Performance ◽

Non Volatile Memory ◽

Volatile Memory ◽

Performance Computing

Effectiveness of a microcontroller-based volleyball serve test device (Efektivitas alat tes servis bolavoli berbasis mikrokontroller)

Jurnal SPORTIF Jurnal Penelitian Pembelajaran ◽

10.29407/js_unpgri.v6i2.14492 ◽

2020 ◽

Vol 6 (2) ◽

pp. 499-513

Author(s):

Giartama Giartama ◽

Destriani Destriani ◽

Waluyo Waluyo ◽

Muslimin Muslimin

Keyword(s):

Low Power ◽

High Performance ◽

Non Volatile Memory ◽

Volatile Memory ◽

Microcontroller Unit

Science must adapt quickly to the demands of the times. Many sports have adopted technological advances to support activities in both teaching and training, particularly volleyball. This study aims to test the effectiveness of a microcontroller-based volleyball serve test device, which consists of components such as a high-performance, low-power AVR® 8-bit microcontroller unit, advanced RISC architecture, high-endurance non-volatile memory segments, peripheral features, and special microcontroller features, together with other devices, so that it can be used to measure volleyball serve tests. The study uses a quantitative research method. The test instrument was a volleyball serve skill test. The subjects were divided into three groups: a beginner class of second-semester students who are not volleyball athletes, students who take volleyball as an extracurricular activity, and students who are national or regional athletes, for a total of 60 subjects. The study obtained an effectiveness value of 99.04% when classifying the subjects into three different levels. Based on these results, it can be concluded that the microcontroller-based volleyball serve test device is effective for users ranging from beginners to professional athletes.

Non-volatile memory host controller interface performance analysis in high-performance I/O systems

2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) ◽

10.1109/ispass.2015.7095793 ◽

2015 ◽

Cited By ~ 8

Author(s):

Amro Awad ◽

Brett Kettering ◽

Yan Solihin

Keyword(s):

Performance Analysis ◽

High Performance ◽

Non Volatile Memory ◽

Volatile Memory

Non-volatile memory for fast, reliable file systems

ACM SIGPLAN Notices ◽

10.1145/143371.143380 ◽

1992 ◽

Vol 27 (9) ◽

pp. 10-22 ◽

Cited By ~ 6

Author(s):

Mary Baker ◽

Satoshi Asami ◽

Etienne Deprit ◽

John Ousterhout ◽

Margo Seltzer

Keyword(s):

File Systems ◽

Non Volatile Memory ◽

Volatile Memory

SIMinG-1k: A thousand-core simulator running on general-purpose graphical processing units

Concurrency and Computation Practice and Experience ◽

10.1002/cpe.2940 ◽

2012 ◽

Vol 25 (10) ◽

pp. 1443-1461 ◽

Cited By ~ 2

Author(s):

Shivani Raghav ◽

Andrea Marongiu ◽

Christian Pinto ◽

Martino Ruggiero ◽

David Atienza ◽

...

Keyword(s):

General Purpose ◽

Graphical Processing Units ◽

Graphical Processing

High-Performance Computing for Theoretical Study of Nanoscale and Molecular Interconnects

Nanotechnology ◽

10.4018/978-1-4666-5125-8.ch021 ◽

2014 ◽

pp. 513-532

Author(s):

Rasit O. Topaloglu ◽

Swati R. Manjari ◽

Saroj K. Nayak

Keyword(s):

Integrated Circuits ◽

High Performance Computing ◽

Real World ◽

High Performance ◽

Quantum Effects ◽

Accurate Analysis ◽

Graphical Processing Units ◽

Graphical Processing ◽

Silicon Based ◽

Performance Computing

Interconnects in semiconductor integrated circuits have shrunk to nanoscale sizes. This size reduction requires accurate analysis of the quantum effects. Furthermore, improved low-resistance interconnects need to be discovered that can integrate with biological and nanoelectronic systems. Accurate system-scale simulation of these quantum effects is possible with high-performance computing (HPC), while high cost and poor feasibility of experiments also suggest the application of simulation and HPC. This chapter introduces computational nanoelectronics, presenting real-world applications for the simulation and analysis of nanoscale and molecular interconnects, which may provide the connection between molecules and silicon-based devices. We survey computational nanoelectronics of interconnects and analyze four real-world case studies: 1) using graphical processing units (GPUs) for nanoelectronic simulations; 2) HPC simulations of current flow in nanotubes; 3) resistance analysis of molecular interconnects; and 4) electron transport improvement in graphene interconnects. In conclusion, HPC simulations are promising vehicles to advance interconnects and study their interactions with molecular/biological structures in support of traditional experimentation.

Towards on high performance computing of medical imaging based on graphical processing units

2013 15th International Conference on Advanced Computing Technologies (ICACT) ◽

10.1109/icact.2013.6710504 ◽

2013 ◽

Author(s):

K. Suresh ◽

M. Rajasekhara Babu

Keyword(s):

Medical Imaging ◽

High Performance Computing ◽

High Performance ◽

Graphical Processing Units ◽

Graphical Processing ◽

Performance Computing
