memory scheduling
Recently Published Documents


TOTAL DOCUMENTS

44
(FIVE YEARS 1)

H-INDEX

8
(FIVE YEARS 0)

Electronics ◽  
2021 ◽  
Vol 10 (8) ◽  
pp. 972
Author(s):  
Mehdi Pirahandeh ◽  
Shan Ullah ◽  
Deok-Hwan Kim

The gradual increase in latency-sensitive, real-time applications for embedded systems encourages users to share sensor data simultaneously, yet streamed sensor data suffer from poor performance. In this paper, we propose a new high-bandwidth, edge-based scheduling method for decreasing driver-profiling latency. The proposed multi-level memory scheduling method places data in key-value storage, flushes sensor data when the edge memory is full, and reduces the number of I/O operations, the network latency, and the number of REST API calls in the edge cloud. As a result, the proposed method provides a significant read/write performance enhancement for real-time embedded systems: it improves the number of requests per second by 3.5, 5, and 4 times, and the bandwidth by 5.89, 5.58, and 4.16 times, compared with the existing light-weight FCN-LSTM, FCN-LSTM, and DeepConvRNN Attention solutions, respectively.
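The flush-on-full buffering the abstract describes can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name, the capacity policy, and the `flush_fn` sink are all assumptions. The point it shows is that staging readings in an in-memory key-value store and flushing them in batches replaces many small I/O operations or REST calls with one bulk write.

```python
from collections import OrderedDict

class EdgeKVBuffer:
    """Hypothetical sketch: stage sensor readings in an in-memory
    key-value store and flush them as one batch when the edge memory
    budget is exhausted, trading many small writes for one bulk call."""

    def __init__(self, capacity, flush_fn):
        self.capacity = capacity   # max entries held in edge memory
        self.flush_fn = flush_fn   # bulk sink, e.g. one REST call
        self.store = OrderedDict()

    def put(self, key, value):
        self.store[key] = value
        if len(self.store) >= self.capacity:   # edge memory full
            self.flush()

    def flush(self):
        if self.store:
            self.flush_fn(dict(self.store))    # one batched write
            self.store.clear()

# usage: 10 readings with a 4-entry budget reach the sink in 3 calls
batches = []
buf = EdgeKVBuffer(capacity=4, flush_fn=batches.append)
for i in range(10):
    buf.put(f"sensor-{i}", i * 0.1)
buf.flush()   # drain the remainder
```

Batching is what drives down the I/O and REST-call counts the abstract reports; the key-value layout additionally lets later writes to the same key overwrite stale readings before they ever hit the network.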



2020 ◽  
Vol 76 (4) ◽  
pp. 3129-3154
Author(s):  
Juan Fang ◽  
Mengxuan Wang ◽  
Zelin Wei

Multiple CPUs and GPUs are integrated on the same chip to share memory, and access requests from different cores interfere with one another. Memory requests from the GPU seriously degrade CPU memory access performance, requests from multiple CPUs become intertwined when accessing memory, and the difference in access latency between GPU cores increases the average memory access latency. To solve these problems in the shared memory of heterogeneous multi-core systems, we propose a step-by-step memory scheduling strategy that improves system performance. First, when the memory controller receives a memory request, the strategy creates a new request queue based on the request source and isolates CPU requests from GPU requests, preventing GPU requests from interfering with CPU requests. Second, for the CPU request queue, a dynamic bank partitioning strategy maps applications to different bank sets according to their memory access characteristics, eliminating memory request interference between CPU applications without sacrificing bank-level parallelism. Finally, for the GPU request queue, criticality is introduced to measure the difference in memory access latency between cores; on top of the first-ready, first-come first-served (FR-FCFS) policy, we implement criticality-aware memory scheduling to balance the locality and criticality of application accesses.
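The three steps above can be sketched in a toy scheduler. This is an illustrative sketch only: the class name, the round-robin bank assignment, and the request fields are assumptions, not the paper's design. It shows (1) CPU/GPU traffic isolated into separate queues, (2) CPU applications mapped to disjoint bank sets, and (3) GPU requests ordered row-buffer-hit first (FR-FCFS), with criticality breaking ties.

```python
from collections import deque

class StepwiseScheduler:
    """Hypothetical sketch of a step-by-step memory scheduler:
    source-split queues, per-application bank partitioning for CPUs,
    and criticality-aware FR-FCFS ordering for GPU requests."""

    def __init__(self, n_banks, cpu_apps):
        self.cpu_q, self.gpu_q = deque(), deque()
        # step 2: partition banks across CPU applications; a simple
        # round-robin assignment stands in for dynamic partitioning
        self.bank_of = {
            app: [b for b in range(n_banks) if b % len(cpu_apps) == i]
            for i, app in enumerate(cpu_apps)
        }

    def enqueue(self, req):
        # step 1: isolate CPU traffic from GPU traffic at the controller
        (self.cpu_q if req["src"] == "cpu" else self.gpu_q).append(req)

    def next_gpu(self, open_row):
        # step 3: prefer row-buffer hits (first-ready), then higher
        # criticality; FCFS order breaks any remaining ties
        best = max(self.gpu_q,
                   key=lambda r: (r["row"] == open_row, r["criticality"]))
        self.gpu_q.remove(best)
        return best
```

A real controller would re-evaluate the bank partition periodically as application behavior changes; the fixed round-robin split here only illustrates the isolation property (no two applications share a bank set).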





Electronics ◽  
2019 ◽  
Vol 8 (4) ◽  
pp. 371 ◽  
Author(s):  
Qinyu Chen ◽  
Yuxiang Fu ◽  
Wenqing Song ◽  
Kaifeng Cheng ◽  
Zhonghai Lu ◽  
...  

Convolutional Neural Networks (CNNs) have been widely applied in various fields, such as image recognition and speech processing, as well as in many big-data analysis tasks. However, their large size and intensive computation hinder their deployment in hardware, especially on embedded systems with stringent latency, power, and area requirements. To address this issue, low bit-width CNNs are proposed as a highly competitive candidate. In this paper, we propose an efficient, scalable accelerator for low bit-width CNNs based on a parallel streaming architecture. With a novel coarse-grained task partitioning (CGTP) strategy, the proposed accelerator with heterogeneous computing units, supporting multi-pattern dataflows, can nearly double the throughput for various CNN models on average. Besides, a hardware-friendly algorithm is proposed to simplify the activation and quantification process, which reduces the power dissipation and area overhead. Based on the optimized algorithm, an efficient reconfigurable three-stage activation-quantification-pooling (AQP) unit with a low-power staged blocking strategy is developed, which can process activation, quantification, and max-pooling operations simultaneously. Moreover, an interleaving memory scheduling scheme is proposed to support the streaming architecture well. The accelerator is implemented in TSMC 40 nm technology with a core size of 0.17 mm². It achieves 7.03 TOPS/W energy efficiency and 4.14 TOPS/mm² area efficiency at 100.1 mW, which makes it a promising design for embedded devices.
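The interleaving idea behind the memory scheduling scheme can be illustrated with a low-order address interleaving function. This is a generic sketch of bank interleaving, not the paper's specific scheme: mapping consecutive addresses to consecutive banks lets a streaming unit reading a contiguous feature map cycle through all banks instead of serializing on one.

```python
def interleave(addr, n_banks):
    """Low-order bank interleaving: consecutive addresses land on
    consecutive banks, spreading a contiguous stream across the
    whole memory so accesses proceed without bank conflicts."""
    return addr % n_banks, addr // n_banks   # (bank, offset in bank)

# a contiguous 8-word stream spread over 4 banks:
banks = [interleave(a, 4)[0] for a in range(8)]
# each bank serves every 4th word, so 4 accesses can overlap
```

With the stream spread this way, the producer and consumer stages of a streaming pipeline can access different banks in the same cycle, which is the property an interleaving scheduler exploits.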





2018 ◽  
Vol 17 (4) ◽  
pp. 1-25
Author(s):  
Guan Wang ◽  
Chuanqi Zang ◽  
Lei Ju ◽  
Mengying Zhao ◽  
Xiaojun Cai ◽  
...  


2017 ◽  
Vol 11 (4) ◽  
pp. 2839-2851 ◽  
Author(s):  
Gangyong Jia ◽  
Guangjie Han ◽  
Aohan Li ◽  
Jaime Lloret




Author(s):  
Gustavo A. Chaparro-Baquero ◽  
Shi Sha ◽  
Soamar Homsi ◽  
Wujie Wen ◽  
Gang Quan

