Exploring the Future of Out-of-Core Computing with Compute-Local Non-Volatile Memory

2014 ◽  
Vol 22 (2) ◽  
pp. 125-139 ◽  
Author(s):  
Myoungsoo Jung ◽  
Ellis H. Wilson ◽  
Wonil Choi ◽  
John Shalf ◽  
Hasan Metin Aktulga ◽  
...  

Paralleling the rise of general purpose graphical processing units (GPGPUs) as accelerators for specific high-performance computing (HPC) workloads, non-volatile memory (NVM) is increasingly used as an accelerator for I/O-intensive scientific applications. However, existing work has explored the use of NVM within dedicated I/O nodes, which are distant from the compute nodes that actually need such acceleration. As NVM bandwidth begins to outpace point-to-point network capacity, we argue for the need to break from the archetype of completely separated storage. Therefore, in this work we investigate co-location of NVM and compute by varying I/O interfaces, file systems, types of NVM, and both current and future SSD architectures, uncovering numerous bottlenecks implicit in these various levels of the I/O stack. We present novel hardware and software solutions, including the new Unified File System (UFS), to enable fuller utilization of the new compute-local NVM storage. Our experimental evaluation, which employs a real-world Out-of-Core (OoC) HPC application, demonstrates throughput increases in excess of an order of magnitude over current approaches.
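For readers unfamiliar with the term, the access pattern of an out-of-core computation can be sketched in a few lines. This is a minimal illustration, not the application or the UFS design evaluated in the paper: the helper names `write_doubles` and `out_of_core_dot`, the raw float64 file layout, and the chunk size are all assumptions made for the example. It computes a dot product of two vectors that live on storage while holding only one chunk of each in memory at a time.

```python
import os
import struct
import tempfile

ITEM_SIZE = struct.calcsize("<d")  # bytes per little-endian float64


def write_doubles(path, values):
    """Write floats to `path` as raw little-endian float64 (assumed layout)."""
    with open(path, "wb") as f:
        for v in values:
            f.write(struct.pack("<d", v))


def out_of_core_dot(path_a, path_b, chunk_items=1024):
    """Dot product of two on-disk vectors, reading `chunk_items` values at a time."""
    total = 0.0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            buf_a = fa.read(chunk_items * ITEM_SIZE)
            buf_b = fb.read(chunk_items * ITEM_SIZE)
            if not buf_a or not buf_b:
                break  # one of the vectors is exhausted
            # Only whole items present in both buffers are used.
            n = min(len(buf_a), len(buf_b)) // ITEM_SIZE
            a = struct.unpack(f"<{n}d", buf_a[: n * ITEM_SIZE])
            b = struct.unpack(f"<{n}d", buf_b[: n * ITEM_SIZE])
            total += sum(x * y for x, y in zip(a, b))
    return total


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        pa, pb = os.path.join(d, "a.bin"), os.path.join(d, "b.bin")
        write_doubles(pa, [float(i) for i in range(10)])
        write_doubles(pb, [1.0] * 10)
        print(out_of_core_dot(pa, pb, chunk_items=3))  # 45.0
```

Because every chunk is fetched from storage, the runtime of such a loop is bounded by storage bandwidth and latency, which is why the placement of NVM relative to the compute node matters for this class of workload.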

2021 ◽  
Vol 17 (3) ◽  
pp. 1-25
Author(s):  
Bohong Zhu ◽  
Youmin Chen ◽  
Qing Wang ◽  
Youyou Lu ◽  
Jiwu Shu

Non-volatile memory and remote direct memory access (RDMA) provide extremely high performance in storage and network hardware. However, existing distributed file systems strictly isolate file system and network layers, and the heavy layered software designs leave high-speed hardware under-exploited. In this article, we propose an RDMA-enabled distributed persistent memory file system, Octopus+, to redesign file system internal mechanisms by closely coupling non-volatile memory and RDMA features. For data operations, Octopus+ directly accesses a shared persistent memory pool to reduce memory copying overhead, and actively fetches and pushes data all in clients to rebalance the load between the server and network. For metadata operations, Octopus+ introduces self-identified remote procedure calls for immediate notification between file systems and networking, and an efficient distributed transaction mechanism for consistency. Octopus+ is enabled with a replication feature to provide better availability. Evaluations on Intel Optane DC Persistent Memory Modules show that Octopus+ achieves nearly the raw bandwidth for large I/Os and orders of magnitude better performance than existing distributed file systems.


2014 ◽  
Vol 596 ◽  
pp. 276-279
Author(s):  
Xiao Hui Pan

Graph component labeling, which is a subset of the general graph coloring problem, is a computationally expensive operation in many important applications and simulations. A number of data-parallel algorithmic variations to the component labeling problem are possible and we explore their use with general purpose graphical processing units (GPGPUs) and with the CUDA GPU programming language. We discuss implementation issues and performance results on CPUs and GPUs using CUDA. We evaluated our system with real-world graphs. We show how to consider different architectural features of the GPU and the host CPUs and achieve high performance.
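One common data-parallel variant of component labeling is synchronous minimum-label propagation. The sketch below is a sequential Python simulation of that idea, not the CUDA implementation described in the paper; the function name `label_components` and the edge-list input format are assumptions made for illustration. Each pass reads only the labels from the previous pass (double buffering), which mirrors how a GPU kernel would process every edge independently before the next launch.

```python
def label_components(num_vertices, edges):
    """Label connected components by repeated minimum-label propagation.

    Every vertex starts with its own index as label. In each pass, both
    endpoints of an edge adopt the smaller of their two previous labels.
    The loop stops when a full pass changes nothing, at which point every
    vertex carries the smallest vertex index in its component.
    """
    labels = list(range(num_vertices))
    changed = True
    while changed:
        changed = False
        new_labels = labels[:]  # write buffer; `labels` stays read-only this pass
        for u, v in edges:  # on a GPU, each edge would map to one thread
            low = min(labels[u], labels[v])
            if low < new_labels[u]:
                new_labels[u] = low
                changed = True
            if low < new_labels[v]:
                new_labels[v] = low
                changed = True
        labels = new_labels
    return labels


if __name__ == "__main__":
    # Two components, {0, 1, 2} and {3, 4}, plus the isolated vertex 5.
    print(label_components(6, [(0, 1), (1, 2), (3, 4)]))  # [0, 0, 0, 3, 3, 5]
```

The number of passes grows with the diameter of the largest component, which is one reason several algorithmic variations are worth comparing on real-world graphs.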


2017 ◽  
Vol 3 (6) ◽  
Author(s):  
Nicholas Bailey ◽  
Trond Ingebrigtsen ◽  
Jesper Schmidt Hansen ◽  
Arno Veldhorst ◽  
Lasse Bøhling ◽  
...  

RUMD is a general purpose, high-performance molecular dynamics (MD) simulation package running on graphical processing units (GPUs). RUMD addresses the challenge of utilizing the many-core nature of modern GPU hardware when simulating small to medium system sizes (roughly from a few thousand up to a hundred thousand particles). Its performance is comparable to that of other GPU-MD codes at large system sizes and substantially better at smaller sizes. RUMD is open-source and consists of a library written in C++ and the CUDA extension to C, an easy-to-use Python interface, and a set of tools for set-up and post-simulation data analysis. The paper describes RUMD’s main features, optimizations and performance benchmarks.


2020 ◽  
Vol 6 (2) ◽  
pp. 499-513
Author(s):  
Giartama Giartama ◽  
Destriani Destriani ◽  
Waluyo Waluyo ◽  
Muslimin Muslimin

Science must adapt quickly to the demands of the times. Various sports have adopted technological advances to support both learning and training, particularly volleyball. This study aims to test the effectiveness of a microcontroller-based volleyball serve test device built from components such as a high-performance, low-power AVR® 8-bit microcontroller unit, advanced RISC architecture, high-endurance non-volatile memory segments, peripheral features, and special microcontroller features, together with other devices, so that it can be used to measure volleyball serve tests. The study uses a quantitative research method. The test instrument is a volleyball serve skill test. The 60 subjects fall into three groups: a beginner class of second-semester students who are not volleyball athletes, students who take volleyball as an extracurricular activity, and students who are national and regional athletes. The study obtained an effectiveness value of 99.04% by classifying the subjects into three different levels. Based on these results, it can be concluded that this microcontroller-based volleyball serve test device is effective for users ranging from beginners to professional athletes.


1992 ◽  
Vol 27 (9) ◽  
pp. 10-22 ◽  
Author(s):  
Mary Baker ◽  
Satoshi Asami ◽  
Etienne Deprit ◽  
John Ousterhout ◽  
Margo Seltzer

2012 ◽  
Vol 25 (10) ◽  
pp. 1443-1461 ◽  
Author(s):  
Shivani Raghav ◽  
Andrea Marongiu ◽  
Christian Pinto ◽  
Martino Ruggiero ◽  
David Atienza ◽  
...  

2014 ◽  
pp. 513-532
Author(s):  
Rasit O. Topaloglu ◽  
Swati R. Manjari ◽  
Saroj K. Nayak

Interconnects in semiconductor integrated circuits have shrunk to nanoscale sizes. This size reduction requires accurate analysis of the quantum effects. Furthermore, improved low-resistance interconnects need to be discovered that can integrate with biological and nanoelectronic systems. Accurate system-scale simulation of these quantum effects is possible with high-performance computing (HPC), while high cost and poor feasibility of experiments also suggest the application of simulation and HPC. This chapter introduces computational nanoelectronics, presenting real-world applications for the simulation and analysis of nanoscale and molecular interconnects, which may provide the connection between molecules and silicon-based devices. We survey computational nanoelectronics of interconnects and analyze four real-world case studies: 1) using graphical processing units (GPUs) for nanoelectronic simulations; 2) HPC simulations of current flow in nanotubes; 3) resistance analysis of molecular interconnects; and 4) electron transport improvement in graphene interconnects. In conclusion, HPC simulations are promising vehicles to advance interconnects and study their interactions with molecular/biological structures in support of traditional experimentation.

