specialized hardware Latest Research Papers

Joint Program and Layout Transformations to Enable Convolutional Operators on Specialized Hardware Based on Constraint Programming

ACM Transactions on Architecture and Code Optimization ◽

10.1145/3487922 ◽

2022 ◽

Vol 19 (1) ◽

pp. 1-26

Author(s):

Dennis Rieber ◽

Axel Acosta ◽

Holger Fröning

Keyword(s):

Constraint Satisfaction Problem ◽

Search Space ◽

Program Transformations ◽

Data Layout ◽

New Approach ◽

Deployment Strategy ◽

Joint Program ◽

Reference Implementation ◽

And Performance ◽

Specialized Hardware

The success of Deep Artificial Neural Networks (DNNs) in many domains created a rich body of research concerned with hardware accelerators for compute-intensive DNN operators. However, implementing such operators efficiently with complex hardware intrinsics such as matrix multiply is a task not yet automated gracefully. Solving this task often requires joint program and data layout transformations. First solutions to this problem have been proposed, such as TVM, UNIT, or ISAMIR, which work on a loop-level representation of operators and specify data layout and possible program transformations before the embedding into the operator is performed. This top-down approach creates a tension between exploration range and search space complexity, especially when also exploring data layout transformations such as im2col, channel packing, or padding. In this work, we propose a new approach to this problem. We created a bottom-up method that allows the joint transformation of both computation and data layout based on the found embedding. By formulating the embedding as a constraint satisfaction problem over the scalar dataflow, every possible embedding solution is contained in the search space. Adding additional constraints and optimization targets to the solver generates the subset of preferable solutions. An evaluation using the VTA hardware accelerator with the Baidu DeepBench inference benchmark shows that our approach can automatically generate code competitive to reference implementations. Further, we show that dynamically determining the data layout based on intrinsic and workload is beneficial for hardware utilization and performance. In cases where the reference implementation has low hardware utilization due to its fixed deployment strategy, we achieve a geomean speedup of up to × 2.813, while individual operators can improve as much as × 170.
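As a small illustration of im2col, one of the data layout transformations named in the abstract above, the following sketch (not taken from the paper) unfolds image patches into rows so that a 2D convolution becomes a series of dot products, the shape of computation a matrix-multiply intrinsic expects. The function names, the stride of 1, and the absence of padding are assumptions made for this example.

```python
def im2col(image, kh, kw):
    """Unfold every kh x kw patch of a 2D image into one row.

    Assumes stride 1 and no padding (illustrative choices only).
    """
    h, w = len(image), len(image[0])
    rows = []
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            rows.append([image[i + di][j + dj]
                         for di in range(kh) for dj in range(kw)])
    return rows


def conv2d_direct(image, kernel):
    """Reference sliding-window convolution (cross-correlation form)."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(w - kw + 1)]
            for i in range(h - kh + 1)]


def conv2d_via_im2col(image, kernel):
    """Same result, computed as (patch matrix) x (flattened kernel)."""
    kh, kw = len(kernel), len(kernel[0])
    flat_kernel = [v for row in kernel for v in row]
    out_w = len(image[0]) - kw + 1
    # Each patch row dotted with the flattened kernel gives one output pixel.
    flat = [sum(a * b for a, b in zip(patch, flat_kernel))
            for patch in im2col(image, kh, kw)]
    # Reshape the flat result back into output rows.
    return [flat[r * out_w:(r + 1) * out_w] for r in range(len(flat) // out_w)]
```

The trade-off is that the patch matrix duplicates overlapping input elements, so the layout change costs memory in exchange for a computation that maps directly onto a matrix-multiply unit.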

Using Your Beam Efficiently: Reducing Electron Dose in the STEM via Flyback Compensation

Microscopy and Microanalysis ◽

10.1017/s1431927621013908 ◽

2022 ◽

pp. 1-9

Author(s):

Tiarnan Mullarkey ◽

Jonathan J. P. Peters ◽

Clive Downing ◽

Lewys Jones

Keyword(s):

Electron Beam ◽

Strain Measurement ◽

Scanning Transmission Electron Microscope ◽

Absolute Minimum ◽

Electron Dose ◽

Transmission Electron ◽

Scanning Transmission ◽

Dose Efficiency ◽

Specialized Hardware ◽

Beam Damage

In the scanning transmission electron microscope, fast-scanning and frame-averaging are two widely used approaches for reducing electron-beam damage and increasing image signal-to-noise ratio, and neither requires additional specialized hardware. Unfortunately, for scans with short pixel dwell-times (less than 5 μs), line flyback time represents an increasingly wasteful overhead. Although beam exposure during flyback causes damage while yielding no useful information, scan coil hysteresis means that eliminating it entirely leads to unacceptably distorted images. In this work, we reduce this flyback to an absolute minimum by calibrating and correcting for this hysteresis in postprocessing. Substantial improvements in dose efficiency can be realized (up to 20%), while crystallographic and spatial fidelity is maintained for displacement/strain measurement.
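To make the overhead argument in the abstract above concrete, here is a minimal sketch of the arithmetic, assuming a simple model in which the beam delivers dose at a constant rate during both pixel dwell and line flyback. The function names and all numeric values are hypothetical and are not taken from the paper.

```python
def dose_efficiency(pixels_per_line, dwell_us, flyback_us):
    """Fraction of per-line beam exposure spent on recorded pixels.

    Assumes constant dose rate during dwell and flyback (a simplification).
    """
    useful_us = pixels_per_line * dwell_us
    return useful_us / (useful_us + flyback_us)


def dose_saving(pixels_per_line, dwell_us, old_flyback_us, new_flyback_us):
    """Relative reduction in total per-line dose when flyback is shortened."""
    useful_us = pixels_per_line * dwell_us
    old_total = useful_us + old_flyback_us
    new_total = useful_us + new_flyback_us
    return 1.0 - new_total / old_total


# Hypothetical scan: 1000 pixels per line at 1 us dwell.
before = dose_efficiency(1000, 1.0, 250.0)   # flyback of 250 us
after = dose_efficiency(1000, 1.0, 50.0)     # flyback shortened to 50 us
saving = dose_saving(1000, 1.0, 250.0, 50.0)
```

Under this model the flyback term is fixed per line, so its share of the total exposure grows as the dwell time shrinks, which is why fast scans benefit most from shortening it.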

Searching for memory-lighter architectures for OCR-augmented image captioning

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-219230 ◽

2021 ◽

pp. 1-12

Author(s):

Rafael Gallardo García ◽

Beatriz Beltrán Martínez ◽

Carlos Hernández Gracidas ◽

Darnes Vilariño Ayala

Keyword(s):

State Of The Art ◽

Image Captioning ◽

Baseline Model ◽

Test Set ◽

Current State ◽

Processing Power ◽

Test Sets ◽

The One ◽

Specialized Hardware ◽

High Processing

Current state-of-the-art image captioning systems that can read text and integrate it into the generated descriptions need high processing power and memory, which limits the sustainability and usability of the models (as they require expensive and very specialized hardware). The present work introduces two alternative versions (L-M4C and L-CNMT) of top architectures on the TextCaps challenge, adapted to achieve near-state-of-the-art performance while being memory-lighter than the original architectures. This is mainly achieved by using distilled or smaller pre-trained models in the text- and OCR-embedding modules. On the one hand, a distilled version of BERT was used to reduce the size of the text-embedding module (the distilled model has 59% fewer parameters). On the other hand, the OCR context processor in both architectures was replaced by Global Vectors (GloVe) instead of FastText pre-trained vectors, which can reduce the memory used by the OCR-embedding module by up to 94%. Two of the three models presented in this work surpassed the baseline (M4C-Captioner) of the challenge on the evaluation and test sets. Our best lighter architecture reached a CIDEr score of 88.24 on the test set, which is 7.25 points above the baseline model.

Deuterium labeling enables non-invasive 3D proton MR imaging of glucose and neurotransmitter metabolism in the human brain

10.21203/rs.3.rs-1027370/v1 ◽

2021 ◽

Author(s):

Petr Bednarik ◽

Dario Goranovic ◽

Alena Svátková ◽

Fabian Niess ◽

Lukas Hingerl ◽

...

Keyword(s):

Glucose Metabolism ◽

Brain Diseases ◽

Gamma Aminobutyric Acid ◽

Deuterium Labeling ◽

Non Invasive ◽

Positron Emission ◽

Novel Method ◽

Specialized Hardware ◽

First Time ◽

Higher Sensitivity

Impaired brain glucose metabolism characterizes most severe brain diseases. Recent studies have proposed deuterium (2H)-Magnetic Resonance Spectroscopic Imaging (MRSI) as a reliable, non-invasive, and safe method to quantify the human metabolism of 2H-labeled substrates such as glucose and their downstream metabolism (e.g., aerobic/anaerobic glucose utilization and neurotransmitter synthesis) and address the major drawbacks of positron emission tomography (PET) or carbon (13C)-MRS. Here, for the first time, we show an indirect dynamic proton (1H)-MRSI technique in humans, which overcomes four critical 2H-MRSI limitations. Our innovative approach provides higher sensitivity with improved spatial/temporal resolution and higher chemical specificity to differentiate glutamate (Glu4), glutamine (Gln4), and gamma-aminobutyric acid (GABA2) deuterated at specific molecular positions while allowing simultaneous mapping of both labeled and unlabeled metabolites without the need for specialized hardware. Our novel method demonstrated significant Glu4, Gln4, and GABA2 decreases, with 18% faster Glu4 reduction in the gray matter than in the white matter after ingestion of deuterated glucose. Thus, we robustly detected downstream glucose metabolism using clinically available MR hardware, without the need for radioactive tracers and PET.

Robust, fiducial-free drift correction for super-resolution imaging

Scientific Reports ◽

10.1038/s41598-021-02850-7 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Michael J. Wester ◽

David J. Schodt ◽

Hanieh Mazloom-Farsibaf ◽

Mohamadreza Fazel ◽

Sandeep Pallikkuth ◽

...

Keyword(s):

Single Molecule ◽

Super Resolution ◽

Processing Algorithm ◽

Post Processing ◽

3D Registration ◽

Drift Correction ◽

Registration Errors ◽

Time Period ◽

Resolution Imaging ◽

Specialized Hardware

We describe a robust, fiducial-free method of drift correction for use in single molecule localization-based super-resolution methods. The method combines periodic 3D registration of the sample using brightfield images with a fast post-processing algorithm that corrects residual registration errors and drift between registration events. The method is robust to low numbers of collected localizations, requires no specialized hardware, and provides stability and drift correction for an indefinite time period.

Composite Enclaves: Towards Disaggregated Trusted Execution

IACR Transactions on Cryptographic Hardware and Embedded Systems ◽

10.46586/tches.v2022.i1.630-656 ◽

2021 ◽

pp. 630-656

Author(s):

Moritz Schneider ◽

Aritra Dhar ◽

Ivan Puddu ◽

Kari Kostiainen ◽

Srdjan Čapkun

Keyword(s):

Case Studies ◽

Large Scale ◽

Trusted Computing ◽

Design Time ◽

Wide Range ◽

Trusted Computing Base ◽

Recent Developments ◽

Configurable Hardware ◽

Low Performance ◽

Specialized Hardware

The ever-rising computation demand is forcing the move from the CPU to heterogeneous specialized hardware, which is readily available across modern datacenters through disaggregated infrastructure. On the other hand, trusted execution environments (TEEs), one of the most promising recent developments in hardware security, can only protect code confined in the CPU, limiting TEEs’ potential and applicability to a handful of applications. We observe that the TEEs’ hardware trusted computing base (TCB) is fixed at design time, which in practice leads to using untrusted software to employ peripherals in TEEs. Based on this observation, we propose composite enclaves with a configurable hardware and software TCB, allowing enclaves access to multiple computing and IO resources. Finally, we present two case studies of composite enclaves: i) an FPGA platform based on RISC-V Keystone connected to emulated peripherals and sensors, and ii) a large-scale accelerator. These case studies showcase a flexible but small TCB (2.5 KLoC for IO peripherals and drivers), with a low-performance overhead (only around 220 additional cycles for a context switch), thus demonstrating the feasibility of our approach and showing that it can work with a wide range of specialized hardware.

A High Granularity Approach to Network Packet Processing for Latency-Tolerant Applications with CUDA (Corvyd)

Avances en Ciencias e Ingeniería ◽

10.18272/aci.v13i2.2142 ◽

2021 ◽

Vol 13 (2) ◽

pp. 7

Author(s):

Maria Pantoja

Keyword(s):

Graphics Processing Units ◽

General Purpose ◽

Packet Processing ◽

Maximum Throughput ◽

Intrusion Prevention ◽

Detection Systems ◽

Enterprise Level ◽

Specialized Hardware ◽

The Cost ◽

Graphics Processing

Currently, practical network packet processing used for Intrusion Detection Systems/Intrusion Prevention Systems (IDS/IPS) tends to belong to one of two disjoint categories: software-only implementations running on general-purpose CPUs, or highly specialized network hardware implementations using ASICs or FPGAs for the most common functions and general-purpose CPUs for the rest. These approaches try to maximize performance and minimize cost, but neither system, when implemented effectively, is affordable to any clients except those at the well-funded enterprise level. In this paper, we aim to improve the performance of affordable network packet processing in heterogeneous systems with consumer Graphics Processing Unit (GPU) hardware by optimizing latency-tolerant packet processing operations, notably IDS, to obtain the maximum throughput required by such systems in networks sophisticated enough to demand a dedicated IDS/IPS system, but not enough to justify the high cost of cutting-edge specialized hardware. In particular, this project investigated increasing the granularity of OSI layer-based packet batching over that of previous batching approaches. We demonstrate that highly granular GPU-enabled packet processing is generally impractical, compared with existing methods, by implementing our own solution that we call Corvyd, a heterogeneous real-time packet processing engine.

On the Optimization of Self-Organization and Self-Management Hardware Resource Allocation for Heterogeneous Clouds

Computers ◽

10.3390/computers10110147 ◽

2021 ◽

Vol 10 (11) ◽

pp. 147

Author(s):

Konstantinos M. Giannoutakis ◽

Christos K. Filelis-Papadopoulos ◽

George A. Gravvanis ◽

Dimitrios Tzovaras

Keyword(s):

High Performance ◽

Evaluation Criteria ◽

End User ◽

Suitability Index ◽

Hardware Resource ◽

Heterogeneous Clouds ◽

Specialized Hardware ◽

Heterogeneous Cloud ◽

Performance Computing ◽

Selection Of

In recent years, there has been a tendency to migrate from traditional homogeneous clouds and centralized provisioning of resources to heterogeneous clouds with specialized hardware governed in a distributed and autonomous manner. The recently proposed CloudLightning architecture introduced a dynamic way to provision heterogeneous cloud resources by efficiently shifting the selection of underlying resources from the end-user to the system. In this work, an optimized Suitability Index and assessment function are proposed, along with their theoretical analysis, for improving the computational efficiency, energy consumption, service delivery and scalability of the distributed orchestration. The effectiveness of the proposed scheme is evaluated through simulation, by comparing the optimized methods with the original approach and the traditional centralized resource management, on real and synthetic High Performance Computing applications. Finally, numerical results are presented and discussed regarding the improvements over the defined evaluation criteria.

Principles of Multitrading

Russian Digital Libraries Journal ◽

10.26907/1562-5419-2021-24-5-808-869 ◽

2021 ◽

Vol 24 (5) ◽

pp. 808-869

Author(s):

Феликс Освальдович Каспаринский

Keyword(s):

Information Environment ◽

Software Systems ◽

Financial Instrument ◽

Price Changes ◽

Social Sphere ◽

The Social ◽

Specialized Hardware ◽

Complex Indicators ◽

Kinetics Of ◽

Self Organizing

Modern software and hardware tools provide unprecedented freedom for a variety of activities in the forex markets, from trading to analyzing the feasibility of models of nonlinear processes in self-organizing systems. To reduce risks and increase the efficiency of interaction with stock market instruments, it is proposed to provide variable adaptability of trading by combining trading strategies using several trading accounts of different brokers, multiple financial instruments, and Complex Indicators Tendencies of price changes. As a result of three years of experimental work, the basic principles of multitrading have been formulated and tested, and an information environment has been compiled, contributing to the development of an individualized trading system. The basic concept of organizing a multitrading information environment: the use of specialized hardware and software systems for strategic analysis and forecasting of price changes for an individual financial instrument, tactical selection of a promising financial instrument from the available set, and effective operating activities with orders of trading accounts. It can be expected that the evolution of the principles of multitrading will lead to the creation of analytical systems for predicting the kinetics of non-equilibrium changes in the characteristic parameters of self-organizing cooperative systems for wide application in biology, cybernetics, economics, and the social sphere.

Development of an Immersive Cultural Game using Mixed Reality

10.29117/quarfe.2021.0170 ◽

2021 ◽

Author(s):

Yahia Boray ◽

Hesham Zaky ◽

Omar Osman ◽

Noora Fetais

Keyword(s):

Real World ◽

Mixed Reality ◽

Cultural Practices ◽

Natural Habitat ◽

User Interaction ◽

Virtual Museum ◽

Small Segment ◽

The Real ◽

Current Implementation ◽

Specialized Hardware

This game aims to preserve and spread cultural practices. It introduces new gaming mechanics, which allow user interaction with virtual game objects using hand gestures. The user’s objective is to hunt prey in their natural habitat, which means that the player physically changes location to hunt a specific prey using a falcon, mimicking how the falcon hunts for its prey in the real world. This interaction with the real world, along with the incorporation of realistic graphics and mixed reality features, enhances the user’s experience and helps in preserving cultural practices. Previous work tried to achieve the same goal by different approaches that led to different user segments and different usability cases. One major limitation in that work was accessibility, due to the use of specialized hardware. The hardware is accessible to a small segment of users; however, given the new limitations imposed by the COVID-19 situation, reusing the hardware is prohibited, and as a result, not many will have access to the developed solution. The current implementation was designed to work on both Android and iOS to enable social interaction between the largest possible number of players. Other features that could also contribute to the goal of the project include building a virtual museum and displaying real falcons using the capabilities mixed reality has to offer.
