dedicated hardware Latest Research Papers

Accelerating Whole-Cell Simulations of mRNA Translation Using a Dedicated Hardware

ACS Synthetic Biology ◽

10.1021/acssynbio.1c00415 ◽

2021 ◽

Author(s):

David Shallom ◽

Danny Naiger ◽

Shlomo Weiss ◽

Tamir Tuller

Keyword(s):

Mrna Translation ◽

Whole Cell ◽

Dedicated Hardware

Real-time Noise-suppressed Wide-Dynamic-Range Compression in Ultrahigh-Resolution Neuronal Imaging

10.1101/2021.09.29.462090 ◽

2021 ◽

Author(s):

Bhaskar Jyoti Borah ◽

Chi-Kuang Sun

Keyword(s):

Real Time ◽

Dynamic Range ◽

Contrast Ratio ◽

Nerve Fibers ◽

Wide Dynamic Range ◽

Data Set ◽

Dynamic Range Compression ◽

Weak Intensity ◽

Neuronal Imaging ◽

Dedicated Hardware

Summary: With a limited dynamic range of an imaging system, there are always regions with signal intensities comparable to the noise level if the signal intensity distribution is close to or even wider than the available dynamic range. Optical brain/neuronal imaging is such a case, where weak-intensity ultrafine structures, such as nerve fibers, dendrites and dendritic spines, often coexist with ultrabright structures, such as somas. A high fluorescence-protein concentration makes the soma an order of magnitude brighter than the adjacent ultrafine structures, resulting in an ultra-wide dynamic range. A straightforward enhancement of the weak-intensity structures often leads to saturation of the brighter ones, and might further amplify high-frequency background noise. An adaptive illumination strategy to compress the dynamic range in real time demands dedicated hardware to operate and, owing to electronic limitations, might suffer from a poor effective bandwidth, especially when each digitized pixel is required to be illumination optimized. Furthermore, such a method is often not immune to noise amplification while locally enhancing a weak-intensity structure. We report a dedicated-hardware-free method for rapid noise-suppressed wide-dynamic-range compression so as to enhance visibility of such weak-intensity structures in terms of both contrast ratio and signal-to-noise ratio while minimizing saturation of the brightest ones. With large-FOV aliasing-free two-photon fluorescence neuronal imaging, we validate its effectiveness by retrieving weak-intensity ultrafine structures amidst a strong noisy background. With compute-unified-device-architecture (CUDA) acceleration, a processing time of <3 ms for a 1000×1000-sized 16-bit data set is secured, enabling real-time applicability.
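The saturation problem described in this summary can be illustrated with a generic sketch. This is not the authors' method, which the summary does not specify, and it performs no noise suppression. It only contrasts a naive linear gain, which clips bright pixels, with a simple logarithmic mapping from 16-bit to 8-bit intensities. The function names and the choice of a logarithmic curve are assumptions made for illustration.

```python
import math

def linear_gain(pixels, gain, max_out=255, max_in=65535):
    """Naive enhancement: scale every pixel by the same gain, then clip.

    Weak pixels become visible, but bright pixels saturate at max_out.
    """
    return [min(max_out, round(p * gain * max_out / max_in)) for p in pixels]

def log_compress(pixels, max_out=255, max_in=65535):
    """Map 16-bit intensities to 8 bits on a logarithmic scale.

    Illustrative only: weak intensities receive more of the output range,
    while the brightest input still maps to max_out without clipping.
    """
    scale = max_out / math.log1p(max_in)
    return [round(scale * math.log1p(p)) for p in pixels]

# Hypothetical intensities: background, a weak fiber, a bright soma.
sample = [0, 200, 40000, 65535]
boosted = linear_gain(sample, gain=50)
compressed = log_compress(sample)
```

With the linear gain, the two brightest sample values both end up at the maximum output level, whereas the logarithmic mapping keeps them distinguishable.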

Oil Spill Identification from SAR Images for Low Power Embedded Systems Using CNN

Remote Sensing ◽

10.3390/rs13183606 ◽

2021 ◽

Vol 13 (18) ◽

pp. 3606

Author(s):

Lorenzo Diana ◽

Jia Xu ◽

Luca Fanucci

Keyword(s):

Embedded Systems ◽

Power Consumption ◽

Low Power ◽

Oil Spills ◽

Weather Conditions ◽

Hardware Accelerators ◽

Sar Images ◽

Satellite Systems ◽

High Resolution Images ◽

Dedicated Hardware

Oil spills represent one of the major threats to marine ecosystems. Satellite synthetic-aperture radar (SAR) sensors have been widely used to identify oil spills due to their ability to provide high resolution images during day and night under all weather conditions. In recent years, the use of artificial intelligence (AI) systems, especially convolutional neural networks (CNNs), has led to many important improvements in performing this task. However, most of the previous solutions to this problem have focused on obtaining the best performance under the assumption that there are no constraints on the amount of hardware resources being used. For this reason, the amounts of hardware resources such as memory and power consumption required by previous solutions make them unsuitable for remote embedded systems such as nano and micro-satellites, which usually have very limited hardware capability and very strict limits on power consumption. In this paper, we present a CNN architecture for semantically segmenting SAR images into multiple classes. The proposed CNN is specifically designed to run on remote embedded systems, which have very limited hardware capability and strict limits on power consumption. Even if its accuracy does not represent a step forward compared with previous solutions, the presented CNN has the important advantage of being able to run on remote embedded systems with limited hardware resources while achieving good performance. The presented CNN is compatible with dedicated hardware accelerators available on the market due to its low memory footprint and small size. It also provides many additional significant advantages, such as shorter inference times, shorter training times, and avoiding transmission of irrelevant data.
Our goal is to allow embedded low power remote devices such as satellite systems for remote sensing to be able to directly run CNNs on board, so that the amount of data that needs to be transmitted to ground and processed on ground can be substantially reduced, which will be greatly beneficial in significantly reducing the amount of time needed for identification of oil spills from SAR images.

Modern Video Coding: Methods, Challenges and Systems

Journal of Integrated Circuits and Systems ◽

10.29292/jics.v16i2.503 ◽

2021 ◽

Vol 16 (2) ◽

pp. 1-12

Author(s):

Roberta De Carvalho Nobre Palau ◽

Bianca Santos da Cunha Silveira ◽

Robson André Domanski ◽

Marta Breunig Loose ◽

Arthur Alves Cerveira ◽

...

Keyword(s):

Video Coding ◽

High Throughput ◽

High Efficiency ◽

High Efficiency Video Coding ◽

Ultrahigh Resolution ◽

Daily Lives ◽

Open Research ◽

Hardware Architectures ◽

Increasing Demand ◽

Dedicated Hardware

With the increasing demand for digital video applications in our daily lives, video coding and decoding have become critical tasks that must be supported by several types of devices and systems. This paper presents a discussion of the main challenges in designing dedicated hardware architectures based on modern hybrid video coding formats, such as High Efficiency Video Coding (HEVC), AOMedia Video 1 (AV1) and Versatile Video Coding (VVC). The paper discusses each step of the hybrid video coding process, highlighting the main challenges for each codec and discussing the main hardware solutions published in the literature. The discussions presented in the paper show that there are still many challenges to be overcome and open research opportunities, especially for the AV1 and VVC codecs. Most of these challenges are related to the high throughput required for processing high and ultrahigh resolution videos in real time and to energy constraints of multimedia-capable devices.

Taylor-Series-Based Reconfigurability of Gamma Correction in Hardware Designs

Electronics ◽

10.3390/electronics10161959 ◽

2021 ◽

Vol 10 (16) ◽

pp. 1959

Author(s):

Dat Ngo ◽

Bongsoon Kang

Keyword(s):

Taylor Series ◽

Hardware Implementation ◽

Processing Technique ◽

Image Processing Technique ◽

Gamma Correction ◽

Challenging Problem ◽

Still Image ◽

System On A Chip ◽

Hardware Designs ◽

Dedicated Hardware

Gamma correction is a common image processing technique in video and still image systems. However, this simple and efficient method is typically expressed using the power law, which gives rise to practical difficulties in designing a reconfigurable hardware implementation. For example, the conventional approach calculates all possible outputs for a pre-determined gamma value, and this information is hardwired into memory components. As a result, reconfigurability is unattainable after deployment. This study proposes using the Taylor series to approximate gamma correction to overcome this problem, hence facilitating the post-deployment reconfigurability of the hardware implementation. In other words, the gamma value is freely adjustable, which makes the design well suited for offloading gamma correction onto dedicated hardware in system-on-a-chip applications. Finally, the proposed hardware implementation is verified on the Zynq UltraScale+ MPSoC ZCU106 Evaluation Kit, and the results demonstrate its superiority over benchmark designs.
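To make the reconfigurability argument concrete, here is a minimal software sketch under stated assumptions. It is not the hardware design from the paper, whose exact series formulation the abstract does not give. It assumes the identity x^gamma = exp(gamma * ln x), stores ln x for each 8-bit level in a table that does not depend on gamma, and evaluates a truncated Taylor series of exp at run time. The names `LN_TABLE`, `gamma_taylor`, and `correct_pixel`, and the default number of terms, are hypothetical.

```python
import math

# ln(level / 255) for every 8-bit level. The table is independent of gamma,
# so it never has to be rebuilt when gamma changes. Entry 0 is a placeholder
# because ln(0) is undefined; level 0 is handled separately below.
LN_TABLE = [0.0] + [math.log(v / 255.0) for v in range(1, 256)]

def gamma_taylor(level, gamma, terms=48):
    """Approximate (level / 255) ** gamma with a truncated Taylor series.

    Uses exp(t) = sum over k of t**k / k!, with t = gamma * ln(level / 255).
    Assumes gamma > 0. The number of terms is chosen generously for
    illustration; a hardware design would use far fewer.
    """
    if level == 0:
        return 0.0
    t = gamma * LN_TABLE[level]
    total = 1.0
    term = 1.0
    for k in range(1, terms):
        term *= t / k  # builds t**k / k! incrementally
        total += term
    return total

def correct_pixel(level, gamma, terms=48):
    """Return the gamma-corrected 8-bit level for a runtime-chosen gamma."""
    return round(255.0 * gamma_taylor(level, gamma, terms))
```

In contrast to a lookup table of precomputed outputs for one fixed gamma, `correct_pixel` accepts any positive gamma on each call, which is the property the abstract refers to as post-deployment reconfigurability.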

Trends in human activity recognition using smartphones

Journal of Reliable Intelligent Environments ◽

10.1007/s40860-021-00147-0 ◽

2021 ◽

Author(s):

Anna Ferrari ◽

Daniela Micucci ◽

Marco Mobilio ◽

Paolo Napoletano

Keyword(s):

Activity Recognition ◽

Human Activity ◽

Human Activities ◽

Human Activity Recognition ◽

Activity Classification ◽

Evaluation Phase ◽

Behavior Tracking ◽

And Behavior ◽

Crowd Surveillance ◽

Dedicated Hardware

Abstract: Recognizing human activities and monitoring population behavior are fundamental needs of our society. Population security, crowd surveillance, healthcare support and living assistance, and lifestyle and behavior tracking are some of the main applications that require the recognition of human activities. Over the past few decades, researchers have investigated techniques that can automatically recognize human activities. This line of research is commonly known as Human Activity Recognition (HAR). HAR involves many tasks, from signal acquisition to activity classification. The tasks involved are not simple and often require dedicated hardware, sophisticated engineering, and computational and statistical techniques for data preprocessing and analysis. Over the years, different techniques have been tested and different solutions have been proposed to achieve a classification process that provides reliable results. This survey presents the most recent solutions proposed for each task in the human activity classification process, that is, acquisition, preprocessing, data segmentation, feature extraction, and classification. Solutions are analyzed by emphasizing their strengths and weaknesses. For completeness, the survey also presents the metrics commonly used to evaluate the goodness of a classifier and the datasets of inertial signals from smartphones that are most commonly used in the evaluation phase.
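Two of the pipeline stages named in this abstract, data segmentation and feature extraction, can be sketched as follows. This is a generic illustration, not a technique taken from the survey. The window size, step, feature choice (mean and standard deviation), and the function names are all assumptions.

```python
from statistics import mean, pstdev

def sliding_windows(samples, size, step):
    """Split a 1-D signal into fixed-size windows.

    A step smaller than size produces overlapping windows. Trailing
    samples that do not fill a whole window are dropped.
    """
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, step)]

def window_features(window):
    """Return simple per-window features: (mean, population std deviation)."""
    return (mean(window), pstdev(window))

# Hypothetical accelerometer magnitudes, 50% overlap between windows.
signal = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
windows = sliding_windows(signal, size=4, step=2)
features = [window_features(w) for w in windows]
```

Each feature tuple would then be passed to a classifier, which is the final stage of the process the abstract lists.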

Decimal Multiplication in FPGA with a Novel Decimal Adder/Subtractor

Algorithms ◽

10.3390/a14070198 ◽

2021 ◽

Vol 14 (7) ◽

pp. 198

Author(s):

Mário P. Véstias ◽

Horácio C. Neto

Keyword(s):

Arithmetic Functions ◽

Optimized Design ◽

Decimal Arithmetic ◽

Binary Arithmetic ◽

Decimal Adder ◽

Decimal Fractions ◽

Decimal Multiplication ◽

And Performance ◽

Improve State ◽

Dedicated Hardware

Financial and commercial data are mostly represented in decimal format. To avoid errors introduced when converting some decimal fractions to binary, these data are processed with decimal arithmetic. Most processors only have hardwired binary arithmetic units, so decimal operations are executed with slow software-based decimal arithmetic functions. For the fast execution of decimal operations, dedicated hardware units have been proposed and designed in FPGA. Decimal multiplication is found in most decimal-based applications, so its optimized design is very important for fast execution. In this paper, two new parallel decimal multipliers in FPGA are proposed. These are based on a new decimal adder/subtractor also proposed in this paper. The new decimal multipliers improve on state-of-the-art parallel decimal multipliers. Compared to previous architectures, implementation results show that the proposed multipliers achieve 26% better area and 12% better performance. Also, the new decimal multipliers reduce the area and performance gap to binary multipliers and are smaller for 32-digit operands.
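The conversion error mentioned in this abstract, and the basic idea behind decimal addition on binary logic, can be shown in a short sketch. The adder below is the textbook digit-wise BCD scheme with a +6 correction, assumed here for illustration. It is not the new adder/subtractor proposed in the paper, and the name `bcd_add` is hypothetical.

```python
from decimal import Decimal

# Binary floating point cannot represent 0.1, 0.2, or 0.3 exactly,
# so the binary sum differs from the decimal one.
binary_sum_is_exact = (0.1 + 0.2 == 0.3)
decimal_sum_is_exact = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))

def bcd_add(a_digits, b_digits):
    """Add two decimal numbers given as equal-length digit lists.

    Digits are ordered least significant first. Each digit position is
    added as a 4-bit binary value; a sum above 9 is corrected by adding 6,
    which skips the six unused 4-bit codes (10 to 15) and yields a carry.
    """
    result = []
    carry = 0
    for a, b in zip(a_digits, b_digits):
        s = a + b + carry  # binary sum in the range 0..19
        if s > 9:
            s = (s + 6) & 0xF
            carry = 1
        else:
            carry = 0
        result.append(s)
    if carry:
        result.append(1)
    return result

# 99 + 01, least significant digit first.
total_digits = bcd_add([9, 9], [1, 0])
```

The per-digit correction step is what a hardwired binary adder lacks, which is why decimal operations otherwise fall back to software routines.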

A Performance Analysis of Virtual Mail Server on Type-2 Hypervisors

Walailak Journal of Science and Technology (WJST) ◽

10.48048/wjst.2021.9845 ◽

2021 ◽

Vol 18 (13) ◽

Author(s):

Faizan Ali KHAJI ◽

Sri Varun POTLURI ◽

Anil Kumar KAKELLI

Keyword(s):

Load Balancing ◽

Virtual Machine ◽

Single Machine ◽

Disaster Recovery ◽

Web Servers ◽

Mail Server ◽

Comparative Results ◽

Dedicated Hardware ◽

The Web

Virtualization is now the most efficient and successful way to use server resources. Virtualization is a technology that efficiently runs instructions or programs based on the available hardware. With the help of virtualization, we can achieve benefits such as reduced capital and operating costs, minimized downtime, increased productivity, and good disaster recovery. This work compares virtual mail server performance on three different type-2 hypervisors: Kernel-based Virtual Machine (KVM), Oracle VirtualBox, and VMware. A computer system that sends and receives email over a network, either on a LAN or the internet, is called a mail server. Although large ISPs and public email services have their own dedicated hardware, in most cases the web servers and mail servers are hosted on a single machine by virtualization. When a mail server is implemented on a hypervisor, it is considered a virtual mail server. A virtual mail server is good at load balancing and has many more benefits. This paper presents how to implement a virtual mail server and shows the comparative results of the virtual mail server on different type-2 hypervisor software.

HIGHLIGHTS

Virtualization will efficiently run the instructions or programs based on the available hardware

The virtual mail server has more significant advantages than the standard mail server. It helps in achieving faster deployment of mail servers with fewer hardware requirements

Comparison of the performance of the virtual mail server on three different Type-2 Hypervisor software: Kernel-based Virtual Machine (KVM), Oracle VirtualBox, and VMware

The web servers and mail servers can be hosted on a single machine by virtualization

A virtual mail server is best for load balancing and disaster recovery as well

Improving Performance of Hardware Adaptive Filter

Journal of University of Shanghai for Science and Technology ◽

10.51201/jusst/21/05325 ◽

2021 ◽

Vol 23 (06) ◽

pp. 742-745

Author(s):

Ankur Ankur ◽

Dr. Veena Devi

Keyword(s):

Neural Network ◽

Transfer Function ◽

Adaptive Filter ◽

Digital Signal Processor ◽

Adaptive Filters ◽

Digital Signal ◽

Current Data ◽

Current Design ◽

Multiple Threads ◽

Dedicated Hardware

An adaptive filter (AF) is a digital filter whose transfer function changes in response to changes in its surroundings. Adaptive filters can adjust their weights using cost functions, similar to a neural network. Implementing the adaptive filter in hardware gives it higher speed (it consumes fewer clock cycles) and hence also saves power. A regular Digital Signal Processor (DSP) may also be employed to do the same, but it will never come close to the performance of dedicated hardware. An improvement in this hardware will directly boost the performance of all use cases. Simulation of the existing design gives an idea of the current data flow and architecture. Different potential design improvements are explored, and the gain of each is weighed against the effort needed to add the functionality. An improvement is chosen and implemented. Once it performs the intended functionality, it is profiled to measure the improvement in performance. A large filter task is divided into multiple threads, which are executed sequentially. In the current design, a thread has 3 status registers to indicate whether it is in progress, pending, or next. When a certain thread needs to be canceled, this can only be done if the thread is not already in progress, which leads to wasted clock cycles. Hence, the ability to stop a thread midway through execution will save those clock cycles.
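The weight adjustment described at the start of this abstract can be sketched with the least-mean-squares (LMS) update, a common adaptive filtering algorithm. The abstract does not say which algorithm the hardware uses, so LMS is an assumption here, as are the tap count, step size, test signal, and the name `lms_filter`.

```python
import random

def lms_filter(inputs, desired, num_taps=4, step=0.05):
    """Adapt FIR filter weights with the LMS rule.

    For each sample, the filter output is compared with the desired
    value and every weight moves along the error gradient:
    w[i] += step * error * x[n - i].
    Returns the final weights and the per-sample errors.
    """
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # most recent input first
    errors = []
    for x, d in zip(inputs, desired):
        history = [x] + history[:-1]
        y = sum(w * h for w, h in zip(weights, history))
        e = d - y
        weights = [w + step * e * h for w, h in zip(weights, history)]
        errors.append(e)
    return weights, errors

# Hypothetical system identification task: learn an unknown 2-tap
# filter d[n] = 0.5 * x[n] - 0.3 * x[n - 1] from its input and output.
rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
ds = [0.5 * x - 0.3 * prev for x, prev in zip(xs, [0.0] + xs[:-1])]
weights, errors = lms_filter(xs, ds)
```

The inner sum and the weight update are the multiply-accumulate operations that a hardware implementation would parallelize or pipeline.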

Accelerating Population Count with a Hardware Co-Processor for MicroBlaze

Journal of Low Power Electronics and Applications ◽

10.3390/jlpea11020020 ◽

2021 ◽

Vol 11 (2) ◽

pp. 20

Author(s):

Iouliia Skliarova

Keyword(s):

Low Cost ◽

Direct Memory Access ◽

Large Data ◽

Large Data Sets ◽

Data Sets ◽

Hardware Accelerator ◽

Instruction Set ◽

Population Count ◽

Field Programmable ◽

Dedicated Hardware

This paper proposes a Field-Programmable Gate Array (FPGA)-based hardware accelerator for assisting the embedded MicroBlaze soft-core processor in calculating population count. The population count is frequently required in cyber-physical systems and can be applied to large data sets, such as in molecular similarity search in cheminformatics or in computations performed by binarized neural networks. The MicroBlaze instruction set architecture (ISA) does not support this operation natively, so the count has to be realized either as a sequence of native instructions (in software) or in parallel in a dedicated hardware accelerator. Different hardware accelerator architectures are analyzed and compared to one another and to implementing the population count operation in MicroBlaze. The experimental results with large vector lengths (up to 2^17) demonstrate that the best hardware accelerator with DMA (Direct Memory Access) is ~31 times faster than the best software version running on MicroBlaze. The proposed architectures are scalable and can easily be adjusted to both smaller and bigger input vector lengths. The entire system was implemented and tested on a Nexys-4 prototyping board containing a low-cost/low-power Artix-7 FPGA.
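For readers unfamiliar with the operation, the following sketch shows what computing a population count as a sequence of ordinary instructions can look like. These are two well-known software techniques, not the software versions or accelerator architectures evaluated in the paper, and the function names are hypothetical.

```python
def popcount_loop(word):
    """Count set bits by repeatedly clearing the lowest set bit.

    Runs one iteration per set bit (Kernighan's method).
    """
    count = 0
    while word:
        word &= word - 1
        count += 1
    return count

def popcount_swar32(word):
    """Count set bits in a 32-bit word with masks, shifts, and adds.

    Partial counts are summed in 2-bit, 4-bit, and 8-bit fields, then
    the four byte counts are added with a single multiplication.
    """
    word &= 0xFFFFFFFF
    word = word - ((word >> 1) & 0x55555555)
    word = (word & 0x33333333) + ((word >> 2) & 0x33333333)
    word = (word + (word >> 4)) & 0x0F0F0F0F
    return ((word * 0x01010101) & 0xFFFFFFFF) >> 24

def popcount_vector(words):
    """Population count of a bit vector stored as 32-bit words."""
    return sum(popcount_swar32(w) for w in words)
```

Either routine needs several instructions per word, whereas a hardware accelerator can count the bits of many words in parallel, which is the trade-off the abstract describes.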
