On combining GUI desktop GIS with computer clusters & cloud resources, the role of programming skills and the state of the art in GUI driven GIS HPC applications

Author(s):  
Sebastian M. Ernst

The Free and Open Source Software (FOSS) ecosystem around Geographic Information Systems (GIS) is currently seeing rapid growth, similar to FOSS ecosystems in other scientific disciplines. At the same time, the need for broad programming and software development skills appears to be becoming a common theme for potential (scientific) users. There is a rather clear boundary between what can be done with Graphical User Interface (GUI) applications such as QGIS alone on the one hand and contemporary software libraries on the other, if one actually has the required skill set to use the latter. Practical experience shows that more and more types of research require far more than rudimentary software development skills. Those can be hard to acquire and distract from the actual scientific work at hand. For instance, the installation, integration and deployment of much-desired software libraries from the field of high-performance computing (HPC), e.g. for general-purpose computing on graphics processing units (GPGPU) or for computations on clusters or cloud resources, very often becomes an obstacle of its own. Recent advances in packaging and deployment systems around popular programming language ecosystems such as Python, however, enable a new kind of thinking. Desktop GUI applications can now be combined much more easily with the mentioned types of libraries, which drastically lowers the entry barrier to HPC applications and to the handling of large quantities of data. This work aims at providing an overview of the state of the art in this field and at showcasing possible techniques.
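The combination the abstract describes can be sketched in a few lines: a per-tile raster computation, written as an ordinary Python function, fanned out to parallel workers from within a GUI application's Python console. This is an illustrative sketch only, not code from the work itself; the `ndvi` function and the tile layout are hypothetical, and a cluster scheduler (e.g. `dask.distributed`) could stand in for the local process pool.

```python
# Hedged sketch: parallel raster-tile processing from a GIS Python console.
# Only numpy and the standard library are assumed.
from concurrent.futures import ProcessPoolExecutor
import numpy as np

def ndvi(tile):
    """Hypothetical per-tile computation: NDVI from a (red, nir) band pair."""
    red, nir = tile
    return (nir - red) / (nir + red + 1e-9)

def process_raster(tiles, workers=2):
    """Fan tiles out to a local process pool; on a cluster, a distributed
    executor could replace ProcessPoolExecutor with the same interface."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(ndvi, tiles))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tiles = [(rng.random((64, 64)), rng.random((64, 64))) for _ in range(4)]
    results = process_raster(tiles)
    print(len(results), results[0].shape)
```

The point of the sketch is that nothing in it is HPC-specific at the source level: the same `ndvi` function runs unchanged whether the executor is local, a cluster, or a cloud backend, which is exactly the kind of decoupling that modern Python packaging makes deployable alongside a GUI application.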

1992 ◽  
Vol 36 (5) ◽  
pp. 821-828 ◽  
Author(s):  
K. H. Brown ◽  
D. A. Grose ◽  
R. C. Lange ◽  
T. H. Ning ◽  
P. A. Totta

2021 ◽  
Vol 14 (4) ◽  
pp. 1-28
Author(s):  
Tao Yang ◽  
Zhezhi He ◽  
Tengchuan Kou ◽  
Qingzheng Li ◽  
Qi Han ◽  
...  

Field-programmable Gate Arrays (FPGAs) are a high-performance computing platform for Convolutional Neural Network (CNN) inference. The Winograd algorithm, weight pruning, and quantization are widely adopted to reduce the storage and arithmetic overhead of CNNs on FPGAs. Recent studies strive to prune the weights in the Winograd domain; however, this results in irregular sparse patterns, leading to low parallelism and reduced utilization of resources. Moreover, few works discuss a suitable quantization scheme for Winograd. In this article, we propose a regular sparse pruning pattern for Winograd-based CNNs, namely Sub-row-balanced Sparsity (SRBS), to overcome the challenge of irregular sparse patterns. We then develop a two-step hardware co-optimization approach to improve the model accuracy under the SRBS pattern. Based on the pruned model, we implement mixed-precision quantization to further reduce the computational complexity of bit operations. Finally, we design an FPGA accelerator that takes advantage both of the SRBS pattern, to eliminate low-parallelism computation and irregular memory accesses, and of the mixed-precision quantization, to obtain a layer-wise bit width. Experimental results on VGG16/VGG-nagadomi with CIFAR-10 and ResNet-18/34/50 with ImageNet show up to 11.8×/8.67× and 8.17×/8.31×/10.6× speedup, and 12.74×/9.19× and 8.75×/8.81×/11.1× energy efficiency improvement, respectively, compared with the state-of-the-art dense Winograd accelerator [20], with negligible loss of model accuracy. We also show that our design has a 4.11× speedup compared with the state-of-the-art sparse Winograd accelerator [19] on VGG16.
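The core idea of a sub-row-balanced sparsity pattern can be illustrated in software: split every weight row into fixed-length sub-rows and keep the same number of largest-magnitude weights in each, so all sub-rows end up with identical sparsity and the hardware can schedule them uniformly. This is a minimal magnitude-based sketch under assumed parameters, not the paper's actual pruning/retraining flow or its Winograd-domain transform.

```python
import numpy as np

def prune_sub_row_balanced(w, sub_len=4, keep=2):
    """Zero all but the `keep` largest-magnitude weights in every
    length-`sub_len` sub-row, so each sub-row has identical sparsity
    (a regular pattern, unlike unstructured pruning)."""
    rows, cols = w.shape
    assert cols % sub_len == 0, "row length must divide into sub-rows"
    out = np.zeros_like(w)
    for r in range(rows):
        for s in range(0, cols, sub_len):
            block = w[r, s:s + sub_len]
            idx = np.argsort(np.abs(block))[-keep:]  # indices kept in this sub-row
            out[r, s + idx] = block[idx]
    return out
```

Because every sub-row carries exactly `keep` nonzeros, the accelerator can allocate a fixed number of multipliers per sub-row and never stalls on an empty or overfull block, which is the parallelism benefit the abstract claims over irregular sparsity.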


2010 ◽  
Vol 18 (1) ◽  
pp. 1-33 ◽  
Author(s):  
Andre R. Brodtkorb ◽  
Christopher Dyken ◽  
Trond R. Hagen ◽  
Jon M. Hjelmervik ◽  
Olaf O. Storaasli

Node-level heterogeneous architectures have become attractive during the last decade for several reasons: compared to traditional symmetric CPUs, they offer high peak performance and are energy- and/or cost-efficient. With the increase of fine-grained parallelism in high-performance computing, as well as the introduction of parallelism in workstations, there is an acute need for a good overview and understanding of these architectures. We give an overview of the state of the art in heterogeneous computing, focusing on three commonly found architectures: the Cell Broadband Engine Architecture, graphics processing units (GPUs), and field-programmable gate arrays (FPGAs). We present a review of hardware and available software tools, together with an overview of state-of-the-art techniques and algorithms. Furthermore, we present a qualitative and quantitative comparison of the architectures and give our view on the future of heterogeneous computing.


Author(s):  
Marc Casas ◽  
Wilfried N Gansterer ◽  
Elias Wimmer

We investigate the usefulness of gossip-based reduction algorithms in a high-performance computing (HPC) context. We compare them to state-of-the-art deterministic parallel reduction algorithms in terms of fault tolerance and resilience against silent data corruption (SDC), as well as in terms of performance and scalability. New gossip-based reduction algorithms are proposed, which significantly improve on the state of the art in terms of resilience against SDC. Moreover, a new gossip-inspired reduction algorithm is proposed, which promises much more competitive runtime performance in an HPC context than classical gossip-based algorithms, in particular for low accuracy requirements.
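For readers unfamiliar with the gossip family, the classical push-sum protocol gives the flavor of how a reduction (here, an average) can emerge from randomized pairwise exchanges rather than a fixed reduction tree. This is a textbook sketch of push-sum, not one of the paper's proposed algorithms.

```python
import random

def push_sum(values, rounds=200, seed=0):
    """Synchronous push-sum gossip: each round, every node halves its
    (sum, weight) pair, keeps one half, and pushes the other half to a
    uniformly random node. The ratio sum/weight at every node converges
    to the global average; the total sum and weight are conserved."""
    rng = random.Random(seed)
    n = len(values)
    s = list(values)      # running sums, one per node
    w = [1.0] * n         # running weights, one per node
    for _ in range(rounds):
        inbox_s = [0.0] * n
        inbox_w = [0.0] * n
        for i in range(n):
            target = rng.randrange(n)
            inbox_s[i] += s[i] / 2          # keep half locally
            inbox_w[i] += w[i] / 2
            inbox_s[target] += s[i] / 2     # push half to a random node
            inbox_w[target] += w[i] / 2
        s, w = inbox_s, inbox_w
    return [si / wi for si, wi in zip(s, w)]
```

Because no single node or tree edge is structurally critical, a lost message perturbs the estimate rather than corrupting the reduction outright, which is the intuition behind the resilience-versus-performance trade-off the abstract studies.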


2021 ◽  
Author(s):  
Paul F. Baumeister ◽  
Lars Hoffmann

Remote sensing observations in the mid-infrared spectral region (4–15 μm) play a key role in monitoring the composition of the Earth's atmosphere. Mid-infrared spectral measurements from satellite, aircraft, balloon and ground-based instruments provide information on pressure and temperature, trace gases, as well as aerosols and clouds. As state-of-the-art instruments deliver a vast amount of data on a global scale, their analysis, however, may require advanced methods and high-performance computing capacities for data processing. A large amount of computing time is usually spent on evaluating the radiative transfer equation. Line-by-line calculations of infrared radiative transfer are considered to be most accurate, but they are also most time-consuming. Here, we discuss the emissivity growth approximation (EGA), which can accelerate infrared radiative transfer calculations by several orders of magnitude compared with line-by-line calculations. As future satellite missions will likely depend on exascale computing systems to process their observational data in due time, we think that the utilization of graphics processing units (GPUs) for the radiative transfer calculations and satellite retrievals is a logical next step in further accelerating and improving the efficiency of data processing. Focusing on the EGA method, we first discuss the implementation of infrared radiative transfer calculations on GPU-based computing systems in detail. Second, we discuss distinct features of our implementation of the EGA method, in particular regarding the memory needs, performance, and scalability on state-of-the-art GPU systems. As we found our implementation to be about an order of magnitude more energy-efficient on GPU-accelerated architectures than on CPUs, we conclude that our approach provides various future opportunities for this high-throughput problem.
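The structure of an emissivity-growth integration can be shown schematically: the emissivity of the growing optical path increases monotonically along the line of sight, and each layer contributes its Planck radiance weighted by the emissivity increment. This sketch computes the emissivity analytically from layer optical depths; the actual EGA method replaces that step with precomputed emissivity look-up tables, which is where its speed over line-by-line calculations comes from.

```python
import math

def radiance_ega(layer_tau, layer_planck):
    """Schematic emissivity-growth integration along a line of sight.
    eps_k = 1 - exp(-(tau_1 + ... + tau_k)) grows monotonically, and
    layer k contributes its Planck radiance B_k times (eps_k - eps_{k-1})."""
    rad, tau_sum, eps_prev = 0.0, 0.0, 0.0
    for tau, b in zip(layer_tau, layer_planck):
        tau_sum += tau
        eps = 1.0 - math.exp(-tau_sum)       # path emissivity so far
        rad += b * (eps - eps_prev)          # this layer's contribution
        eps_prev = eps
    return rad
```

The per-layer loop is embarrassingly parallel across spectral channels and observation geometries, which is why the authors target GPUs: each thread can integrate one independent ray.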


2007 ◽  
Author(s):  
Thomas Sure ◽  
Lambert Danner ◽  
Peter Euteneuer ◽  
Gerhard Hoppen ◽  
Armin Pausch ◽  
...  

Author(s):  
Yang Song ◽  
Haoliang Wang ◽  
Tolga Soyata

To allow mobile devices to support resource-intensive applications beyond their capabilities, mobile-cloud offloading is introduced to extend the resources of mobile devices by leveraging cloud resources. In this chapter, we will survey the state of the art in VM-based mobile-cloud offloading techniques, including their software and architectural aspects in detail. For the software aspects, we will cover current improvements to different layers of various virtualization systems, particularly focusing on mobile-cloud offloading. Approaches at different offloading granularities will be reviewed and their advantages and disadvantages discussed. For the architectural support aspects of virtualization, three platforms, namely Intel x86, ARM and NVIDIA GPUs, will be reviewed in terms of their special architectural designs to accommodate virtualization and VM-based offloading.
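Underneath all offloading granularities sits the same basic cost model: offload when remote execution time plus data transfer time beats local execution time. The sketch below is the classical textbook formulation, not a decision engine from any specific system in the chapter; the parameter names are illustrative.

```python
def should_offload(local_cycles, cpu_speed, cloud_speed, data_bytes, bandwidth):
    """Classical offloading decision: offload iff
    local_cycles/cloud_speed + data_bytes/bandwidth < local_cycles/cpu_speed.
    Speeds are in cycles/s, bandwidth in bytes/s."""
    t_local = local_cycles / cpu_speed
    t_remote = local_cycles / cloud_speed + data_bytes / bandwidth
    return t_remote < t_local
```

The model makes the key trade-off visible: a faster cloud only helps if the network is fast enough (or the transferred state small enough) that the transfer term does not dominate, which is precisely why VM-based techniques work hard to minimize the state that must cross the network.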


2014 ◽  
Vol 24 (02) ◽  
pp. 1550019
Author(s):  
Osama Al-Khaleel ◽  
Zakaria Al-Qudah ◽  
Mohammad Al-Khaleel ◽  
Raed Bani-Hani ◽  
Christos Papachristou ◽  
...  

This paper proposes two high-performance binary to binary-coded decimal (BCD) conversion algorithms for use in BCD multiplication. These algorithms are based on splitting the 7-bit binary partial product of two BCD digits into two groups, computing the contribution of each group to the equivalent BCD partial product, and adding these contributions to compute the final BCD partial product. Designs for the proposed architectures and their implementations targeting both ASIC and FPGA are compared with others. Implementations of BCD array multipliers using both our conversion circuits and existing conversion circuits have been performed. The synthesis results for both ASIC and FPGA show that the proposed designs are faster and occupy less area than the state-of-the-art conversion circuits. Furthermore, the results obtained from comparing BCD multipliers of various sizes show that the improvement in the area of the conversion circuit grows into a sizable area improvement in the multiplier circuit.
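The split-based conversion the abstract describes can be mirrored in software: the 7-bit partial product of two BCD digits (at most 9 × 9 = 81) is split into a high group (bits 6..4, weight 16) and a low group (bits 3..0), each group's two-digit BCD contribution is formed, and the contributions are BCD-added. This is a software analogue under those assumptions, not the paper's circuit.

```python
def to_bcd_digits(n):
    """Two decimal digits of n (< 100), least-significant first."""
    return [n % 10, n // 10]

def bcd_add(a, b):
    """Digit-wise BCD addition with the usual carry on digit overflow.
    (No final carry can occur here, since the product is at most 81.)"""
    out, carry = [], 0
    for da, db in zip(a, b):
        s = da + db + carry
        carry, s = (1, s - 10) if s > 9 else (0, s)
        out.append(s)
    return out

def binary_to_bcd_split(v):
    """Split-based conversion for the 7-bit partial product v of two BCD
    digits: high group = bits 6..4 (weight 16), low group = bits 3..0."""
    hi, lo = v >> 4, v & 0xF
    return bcd_add(to_bcd_digits(hi * 16), to_bcd_digits(lo))
```

The hardware appeal of the split is that each group's BCD contribution is a small function of at most four bits, so it can be realized as shallow combinational logic, leaving only one BCD addition on the critical path.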

