An Evaluation of an Integrated On-Chip/Off-Chip Network for High-Performance Reconfigurable Computing

As the number of cores per discrete integrated circuit (IC) device grows, the importance of the network on chip (NoC) increases. However, the body of research in this area has focused on discrete IC devices alone, which may or may not serve the high-performance computing community, which needs to assemble many of these devices into very large-scale parallel computing machines. This paper describes an integrated on-chip/off-chip network that has been implemented on an all-FPGA computing cluster. The system supports MPI-style point-to-point messages, collectives, and other novel communication. Results include the resource utilization and performance (in latency and bandwidth).

User Steering Support in Large-scale Workflows

10.5753/sbbd_estendido.2021.18185 ◽

2021 ◽

Author(s):

Renan Souza ◽

Marta Mattoso ◽

Patrick Valduriez

Keyword(s):

Performance Indicators ◽

High Performance ◽

Large Scale ◽

Provenance Data ◽

Management Concepts ◽

Data Files ◽

The Impact ◽

Computing Machines ◽

Performance Computing ◽

Fine Tune

Large-scale workflows that execute on High-Performance Computing machines need to be dynamically steered by users. This means that users analyze big data files, assess key performance indicators, fine-tune parameters, and evaluate the tuning impacts while the workflows generate multiple files, which is challenging. If one does not keep track of such interactions (called user steering actions), it may be impossible to understand the consequences of steering actions and to reproduce the results. This thesis proposes a generic approach to enable tracking user steering actions by characterizing, capturing, relating, and analyzing them by leveraging provenance data management concepts. Experiments with real users show that the approach enabled the understanding of the impact of steering actions while incurring negligible overhead.

Revisiting the High-Performance Reconfigurable Computing for Future Datacenters

Future Internet ◽

10.3390/fi12040064 ◽

2020 ◽

Vol 12 (4) ◽

pp. 64 ◽

Cited By ~ 2

Author(s):

Qaiser Ijaz ◽

El-Bay Bourennane ◽

Ali Kashif Bashir ◽

Hira Asghar

Keyword(s):

Reconfigurable Computing ◽

High Performance ◽

Large Scale ◽

Open Problems ◽

Communication Architecture ◽

Field Programmable ◽

Large Scale Integration ◽

On Chip ◽

Scale Integration ◽

Standard Terms

Modern datacenters are reinforcing their computational power and energy efficiency by assimilating field programmable gate arrays (FPGAs). The sustainability of this large-scale integration depends on enabling multi-tenant FPGAs. This requisite amplifies the importance of a communication architecture and virtualization method with the features required to meet the high-end objective. Consequently, in the last decade, academia and industry proposed several virtualization techniques and hardware architectures for addressing resource management, scheduling, adoptability, segregation, scalability, performance-overhead, availability, programmability, time-to-market, security, and mainly, multitenancy. This paper provides an extensive survey covering three important aspects: discussion of non-standard terms used in existing literature, network-on-chip evaluation choices as a means to explore the communication architecture, and virtualization methods under the latest classification. The purpose is to emphasize the importance of choosing an appropriate communication architecture, virtualization technique, and standard language to evolve multi-tenant FPGAs in datacenters. None of the previous surveys encapsulated these aspects in one writing. Open problems are indicated for the scientific community as well.

Adverse Drug Reaction Prediction Using Scores Produced by Large-Scale Drug-Protein Target Docking on High-Performance Computing Machines

PLoS ONE ◽

10.1371/journal.pone.0106298 ◽

2014 ◽

Vol 9 (9) ◽

pp. e106298 ◽

Cited By ~ 40

Author(s):

Montiago X. LaBute ◽

Xiaohua Zhang ◽

Jason Lenderman ◽

Brian J. Bennion ◽

Sergio E. Wong ◽

...

Keyword(s):

Adverse Drug Reaction ◽

Drug Reaction ◽

High Performance Computing ◽

High Performance ◽

Large Scale ◽

Reaction Prediction ◽

Protein Target ◽

Computing Machines ◽

Performance Computing

PCJ Java library as a solution to integrate HPC, Big Data and Artificial Intelligence workloads

Journal Of Big Data ◽

10.1186/s40537-021-00454-6 ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Marek Nowicki ◽

Łukasz Górski ◽

Piotr Bała

Keyword(s):

Artificial Intelligence ◽

Big Data ◽

High Performance Computing ◽

High Performance ◽

Large Scale ◽

And Performance ◽

Performance Results ◽

Computational Systems ◽

Performance Computing ◽

Java Library

With the development of peta- and exascale size computational systems there is growing interest in running Big Data and Artificial Intelligence (AI) applications on them. Big Data and AI applications are implemented in Java, Scala, Python and other languages that are not widely used in High-Performance Computing (HPC), which is still dominated by C and Fortran. Moreover, they are based on dedicated environments such as Hadoop or Spark, which are difficult to integrate with traditional HPC management systems. We have developed the Parallel Computing in Java (PCJ) library, a tool for scalable high-performance computing and Big Data processing in Java. In this paper, we present the basic functionality of the PCJ library with examples of highly scalable applications running on large resources. The performance results are presented for different classes of applications including traditional computationally intensive (HPC) workloads (e.g. stencil), as well as communication-intensive algorithms such as Fast Fourier Transform (FFT). We present implementation details and performance results for Big Data type processing running on petascale size systems. Examples of large-scale AI workloads parallelized using PCJ are presented.

Low-Process–Voltage–Temperature-Sensitivity Multi-Stage Timing Monitor for System-on-Chip Applications

Electronics ◽

10.3390/electronics10131587 ◽

2021 ◽

Vol 10 (13) ◽

pp. 1587

Author(s):

Duo Sheng ◽

Hsueh-Ru Lin ◽

Li Tai

Keyword(s):

High Performance ◽

Power Reduction ◽

System On Chip ◽

Timing Information ◽

Multi Stage ◽

Dynamic Voltage ◽

And Performance ◽

On Chip ◽

Maximum Measurement ◽

Maximum Measurement Error

High performance and complex system-on-chip (SoC) design require a throughput and stable timing monitor to reduce the impacts of uncertain timing and implement the dynamic voltage and frequency scaling (DVFS) scheme for overall power reduction. This paper presents a multi-stage timing monitor, combining three timing-monitoring stages to achieve a high timing-monitoring resolution and a wide timing-monitoring range simultaneously. Additionally, because the proposed timing monitor has high immunity to process–voltage–temperature (PVT) variation, it provides more stable time-monitoring results. The time-monitoring resolution and range of the proposed timing monitor are 47 ps and 2.2 µs, respectively, and the maximum measurement error is 0.06%. Therefore, the proposed multi-stage timing monitor not only provides the timing information of the specified signals to maintain the functionality and performance of the SoC, but also makes the operation of the DVFS scheme more efficient and accurate in SoC design.
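To put the quoted resolution and range in perspective, the following sketch estimates how many resolution-sized steps fit in the full range and how many bits a single flat code would need to index them. The flat, uniform encoding is an assumption made only for illustration; the abstract states that the design combines three stages and does not describe its output format. All names are hypothetical.

```python
import math

# Figures quoted in the abstract.
RESOLUTION_S = 47e-12   # 47 ps timing-monitoring resolution
RANGE_S = 2.2e-6        # 2.2 µs timing-monitoring range

# Assumption (not from the paper): the whole range is covered by uniform
# steps of one resolution unit, indexed by a single binary code.
steps = RANGE_S / RESOLUTION_S
bits = math.ceil(math.log2(steps))

print(f"steps across the range: {steps:,.0f}")
print(f"bits for a flat code:   {bits}")
```

Under this assumption the range spans roughly 46,800 steps, which gives a sense of why a multi-stage structure is attractive for covering a wide range at fine resolution.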

Towards A Multi-FPGA Infrared Simulator

The Journal of Defense Modeling and Simulation Applications Methodology Technology ◽

10.1177/154851290700400404 ◽

2007 ◽

Vol 4 (4) ◽

pp. 343-355 ◽

Cited By ~ 1

Author(s):

Vinay Sriram ◽

David Kearney

Keyword(s):

Homeland Security ◽

Reconfigurable Computing ◽

High Speed ◽

High Performance ◽

Large Scale ◽

Computation Time ◽

Ccd Camera ◽

Hardware Acceleration ◽

Limiting Factor ◽

Scene Simulation

High-speed infrared (IR) scene simulation is used extensively in defense and homeland security to test the sensitivity of IR cameras and the accuracy of IR threat detection and tracking algorithms commonly used in IR missile approach warning systems (MAWS). A typical MAWS requires an input scene rate of over 100 scenes/second. Infrared scene simulations typically take 32 minutes to simulate a single IR scene that accounts for effects of atmospheric turbulence, refraction, optical blurring and charge-coupled device (CCD) camera electronic noise on a Pentium 4 (2.8 GHz) dual core processor [7]. Thus, in IR scene simulation, the processing power of modern computers is a limiting factor. In this paper, we report our research to accelerate IR scene simulation using high-performance reconfigurable computing. We constructed a multi-Field Programmable Gate Array (FPGA) hardware acceleration platform and accelerated a key computationally intensive IR algorithm on that platform. We were successful in reducing the computation time of IR scene simulation by over 36%. This research acts as a unique case study for accelerating large-scale defense simulations using a high-performance multi-FPGA reconfigurable computer.
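As a rough worked example of the figures quoted in this abstract, the sketch below computes the per-scene time after the reported reduction and compares it with the 100 scenes/second input rate. It treats "over 36%" as exactly 36% and assumes the reduction applies to the full 32-minute baseline; both are simplifying assumptions, and the variable names are illustrative rather than taken from the paper.

```python
# Figures quoted in the abstract.
BASELINE_MINUTES_PER_SCENE = 32.0
REQUIRED_SCENES_PER_SECOND = 100.0

# Assumption: "over 36%" is taken as exactly 36% for this estimate.
REDUCTION = 0.36

accelerated_minutes = BASELINE_MINUTES_PER_SCENE * (1.0 - REDUCTION)
accelerated_seconds = accelerated_minutes * 60.0

# Time available per scene at the required input rate.
budget_seconds = 1.0 / REQUIRED_SCENES_PER_SECOND

# Further speedup still needed to reach the required rate.
remaining_speedup = accelerated_seconds / budget_seconds

print(f"accelerated time per scene: {accelerated_minutes:.2f} min")
print(f"remaining speedup needed:   {remaining_speedup:,.0f}x")
```

Under these assumptions a scene would take about 20.5 minutes, still roughly five orders of magnitude away from the stated input rate, which fits the title's framing of the work as a step towards a simulator.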

Merging Plasmonics and Silicon Photonics Towards Greener and Faster “Network-on-Chip” Solutions for Data Centers and High-Performance Computing Systems

Plasmonics - Principles and Applications ◽

10.5772/51853 ◽

2012 ◽

Cited By ~ 3

Author(s):

Sotirios Papaioannou ◽

Konstantinos Vyrsokinos ◽

Dimitrios Kalavrouziotis ◽

Giannis Giannoulis ◽

Dimitrios Apostolopoulos ◽

...

Keyword(s):

High Performance Computing ◽

Silicon Photonics ◽

High Performance ◽

Data Centers ◽

Network On Chip ◽

Computing Systems ◽

On Chip ◽

Performance Computing

Accelerating Large-Scale Data Analysis by Offloading to High-Performance Computing Libraries using Alchemist

Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining ◽

10.1145/3219819.3219927 ◽

2018 ◽

Cited By ~ 2

Author(s):

Alex Gittens ◽

Kai Rothauge ◽

Shusen Wang ◽

Michael W. Mahoney ◽

Lisa Gerhardt ◽

...

Keyword(s):

Data Analysis ◽

High Performance Computing ◽

High Performance ◽

Large Scale ◽

Large Scale Data ◽

Performance Computing ◽

Scale Data

Fast Playback Framework for Analysis of Ground-Based Doppler Radar Observations Using MapReduce Technology

Journal of Atmospheric and Oceanic Technology ◽

10.1175/jtech-d-15-0118.1 ◽

2016 ◽

Vol 33 (4) ◽

pp. 621-634 ◽

Cited By ~ 4

Author(s):

Jingyin Tang ◽

Corene J. Matyas

Keyword(s):

Spatial Analysis ◽

High Performance ◽

Large Scale ◽

Doppler Radar ◽

Weather Events ◽

Data Architecture ◽

Research Grade ◽

High Performance Computing Cluster ◽

Time Systems ◽

Performance Computing

The creation of a 3D mosaic is often the first step when using the high-spatial- and temporal-resolution data produced by ground-based radars. Efficient yet accurate methods are needed to mosaic data from dozens of radars to better understand the precipitation processes in synoptic-scale systems such as tropical cyclones. Research-grade radar mosaic methods of analyzing historical weather events should utilize data from both sides of a moving temporal window and process them in a flexible data architecture that is not available in most stand-alone software tools or real-time systems. Thus, these historical analyses require a different strategy for optimizing flexibility and scalability by removing time constraints from the design. This paper presents a MapReduce-based playback framework using Apache Spark’s computational engine to interpolate large volumes of radar reflectivity and velocity data onto 3D grids. Designed to be easy to use on a high-performance computing cluster, these methods may also be executed on a low-end configured machine. A protocol is designed to enable interoperability with GIS and spatial analysis functions in this framework. Open-source software is utilized to enhance radar usability in the nonspecialist community. Case studies during a tropical cyclone landfall show this framework’s capability of efficiently creating a large-scale high-resolution 3D radar mosaic with the integration of GIS functions for spatial analysis.

High-Performance Computing Framework Based on Distributed Systems for Large-Scale Neurophysiological Data

10.21203/rs.3.rs-136986/v1 ◽

2021 ◽

Author(s):

Mohsen Hadianpour ◽

Ehsan Rezayat ◽

Mohammad-Reza Dehaqani

Keyword(s):

High Performance Computing ◽

High Performance ◽

Large Scale ◽

Electrophysiological Recording ◽

Neural Data ◽

Data Framework ◽

Neurophysiological Data ◽

Computing Framework ◽

Performance Computing ◽

Neuroscience Community

Due to the drastic progress and improvement in neurophysiological recording technologies, neuroscientists have faced various complexities dealing with unstructured large-scale neural data. In the neuroscience community, these complexities could create serious bottlenecks in storing, sharing, and processing neural datasets. In this article, we developed a distributed high-performance computing (HPC) framework called 'Big neuronal data framework' (BNDF) to overcome these complexities. BNDF is based on the open-source big data frameworks Hadoop and Spark, providing a flexible and scalable structure. We examined BNDF on three different large-scale electrophysiological recording datasets from nonhuman primates' brains. Our results exhibited faster runtimes with scalability due to the distributed nature of BNDF. We compared BNDF results to a widely used platform, MATLAB, with equivalent computational resources. Compared with other similar methods, using BNDF provides more than five times faster performance in spike sorting as a usual neuroscience application.
