Memory Overhead: Recently Published Documents

Total documents: 70 (five years: 31)
H-index: 6 (five years: 1)

2021 · Vol. 17 (4) · pp. 1-23
Author(s): Datong Zhang, Yuhui Deng, Yi Zhou, Yifeng Zhu, Xiao Qin

Data deduplication techniques construct an index of fingerprint entries to identify and eliminate duplicated copies of repeating data. Two challenging issues in data deduplication are the bottleneck of disk-based index lookup and the data fragmentation caused by eliminating duplicated chunks. Deduplication-based backup systems generally employ containers that store contiguous chunks together with their fingerprints to preserve data locality and alleviate both issues, but this remains inadequate. To address these two issues, we propose a container-utilization-based hot fingerprint entry distilling strategy to improve the performance of deduplication-based backup systems. We divide the index into three parts: hot fingerprint entries, fragmented fingerprint entries, and useless fingerprint entries. A container whose utilization is smaller than a given threshold is called a sparse container. Fingerprint entries that point to non-sparse containers are hot fingerprint entries. Among the remaining entries, a fingerprint entry that matches any fingerprint of forthcoming backup chunks is classified as fragmented; otherwise, it is classified as useless. We observe that hot fingerprint entries account for a small part of the index, whereas the remaining entries account for the majority. This observation inspires us to develop a hot fingerprint entry distilling approach named HID. HID segregates useless fingerprint entries from the index to improve memory utilization and bypass disk accesses. In addition, HID separates fragmented fingerprint entries so that a deduplication-based backup system directly rewrites fragmented chunks, thereby alleviating adverse fragmentation. Moreover, HID treats fragmented chunks as unique chunks, compensating for the shortcoming that a Bloom filter cannot directly identify certain duplicated chunks (i.e., the fragmented chunks). To take full advantage of this feature, we propose an evolved HID strategy called EHID. EHID incorporates a Bloom filter to which only hot fingerprints are mapped. In doing so, EHID exhibits two salient features: (i) it avoids disk accesses when identifying unique and fragmented chunks, and (ii) it slashes the false-positive rate of the integrated Bloom filter, pushing EHID into a high-efficiency mode. Our experimental results show that our approach reduces the average memory overhead of the index by 34.11% and 25.13% on the Linux and FSL datasets, respectively. Furthermore, compared with the state-of-the-art method HAR, EHID boosts average backup throughput by up to a factor of 2.25 on the Linux dataset and reduces average disk I/O traffic by up to 66.21% on the FSL dataset. EHID also marginally improves the system's restore performance.
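As a concrete illustration of the classification the abstract describes, here is a minimal Python sketch. The dict-based index mapping fingerprints to container IDs, the utilization table, and the precomputed set of forthcoming fingerprints are all assumptions for illustration, not the paper's implementation:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter with k hash positions derived from SHA-1."""
    def __init__(self, size=1 << 20, k=4):
        self.size, self.k = size, k
        self.bits = bytearray(size // 8 + 1)

    def _positions(self, item: bytes):
        for i in range(self.k):
            digest = hashlib.sha1(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: bytes):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: bytes):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

def classify(index, utilization, threshold, upcoming):
    """Split fingerprint entries into hot / fragmented / useless sets,
    following the paper's definitions (the data layout is assumed)."""
    hot, fragmented, useless = {}, {}, {}
    for fp, cid in index.items():
        if utilization[cid] >= threshold:   # points to a non-sparse container
            hot[fp] = cid
        elif fp in upcoming:                # matches a forthcoming backup chunk
            fragmented[fp] = cid
        else:
            useless[fp] = cid
    return hot, fragmented, useless

index = {b"fp1": 0, b"fp2": 1, b"fp3": 1}
hot, frag, useless = classify(index, utilization={0: 0.9, 1: 0.2},
                              threshold=0.5, upcoming={b"fp2"})
# EHID maps only hot fingerprints into the Bloom filter: a miss then means
# "unique or fragmented chunk", so both are written without a disk lookup.
hot_filter = BloomFilter()
for fp in hot:
    hot_filter.add(fp)
```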


2021 · Vol. 15
Author(s): Junyun Zhao, Siyuan Huang, Osama Yousuf, Yutong Gao, Brian D. Hoskins, ...

While promising for high-capacity machine learning accelerators, memristor devices have non-idealities that prevent software-equivalent accuracies when used for online training. This work uses a combination of Mini-Batch Gradient Descent (MBGD) to average gradients, stochastic rounding to avoid vanishing weight updates, and decomposition methods to keep the memory overhead low during mini-batch training. Since the weight update has to be transferred to the memristor matrices efficiently, we also investigate the impact of reconstructing the gradient matrices both internally (rank-seq) and externally (rank-sum) to the memristor array. Our results show that streaming batch principal component analysis (streaming batch PCA) and non-negative matrix factorization (NMF) decomposition algorithms can achieve near-MBGD accuracy in a memristor-based multi-layer perceptron trained on the MNIST (Modified National Institute of Standards and Technology) database with only 3 to 10 ranks, at significant memory savings. Moreover, NMF rank-seq outperforms streaming batch PCA rank-seq at low ranks, making it more suitable for hardware implementation in future memristor-based accelerators.
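The two ingredients most amenable to a quick sketch are stochastic rounding and the low-rank gradient reconstruction. Below is a hedged numpy illustration; truncated SVD stands in for the paper's streaming batch PCA and NMF, and the matrix sizes, rank, and quantization step are assumptions:

```python
import numpy as np

def stochastic_round(x, step):
    """Round each entry of x to a multiple of `step`, rounding up with
    probability equal to the fractional remainder, so tiny gradient
    updates survive quantization in expectation instead of vanishing."""
    scaled = x / step
    floor = np.floor(scaled)
    return (floor + (np.random.random(x.shape) < scaled - floor)) * step

def low_rank_gradient(grad_sum, rank):
    """Rank-r approximation of an accumulated mini-batch gradient; the
    reconstruction can happen outside the array (rank-sum) or be applied
    rank by rank inside it (rank-seq) in the paper's terminology."""
    u, s, vt = np.linalg.svd(grad_sum, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank]

grad = np.random.randn(128, 64)   # toy accumulated mini-batch gradient
update = stochastic_round(low_rank_gradient(grad, rank=3), step=0.01)
```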


2021 · Vol. 15
Author(s): Gopalakrishnan Srinivasan, Kaushik Roy

Spiking neural networks (SNNs), with their inherent capability to learn sparse spike-based input representations over time, offer a promising solution for enabling the next generation of intelligent autonomous systems. Nevertheless, end-to-end training of deep SNNs is both compute- and memory-intensive because of the need to backpropagate error gradients through time. We propose BlocTrain, a scalable, complexity-aware incremental algorithm for memory-efficient training of deep SNNs. We divide a deep SNN into blocks, where each block consists of a few convolutional layers followed by a classifier, and train the blocks sequentially using local errors from the classifier. Once a given block is trained, our algorithm dynamically distinguishes easy from hard classes using the class-wise accuracy and trains the deeper block only on the hard-class inputs. In addition, we incorporate a hard-class detector (HCD) per block that is used during inference to exit early for easy-class inputs and activate the deeper blocks only for hard-class inputs. Using BlocTrain, we trained a ResNet-9 SNN divided into three blocks on CIFAR-10 and obtained 86.4% accuracy, achieved with up to 2.95× lower memory requirement during training and 1.89× compute efficiency per inference (due to the early-exit strategy), at a 1.45× memory overhead (primarily due to classifier weights) compared to the end-to-end network. We also trained a ResNet-11, divided into four blocks, on CIFAR-100 and obtained 58.21% accuracy, one of the first reported accuracies for an SNN trained entirely with spike-based backpropagation on CIFAR-100.
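The inference-time early-exit logic lends itself to a short sketch. The following PyTorch toy shows how per-block HCDs route easy inputs out early; the linear modules, sigmoid gate, and 0.5 threshold are assumptions (the real model is a spiking ResNet):

```python
import torch
import torch.nn as nn

def early_exit_inference(x, blocks, classifiers, hcds, threshold=0.5):
    """Each block's hard-class detector (HCD) decides whether to exit
    with the local classifier or activate the next, deeper block."""
    for i, (block, clf, hcd) in enumerate(zip(blocks, classifiers, hcds)):
        x = block(x)
        is_last = i == len(blocks) - 1
        if is_last or torch.sigmoid(hcd(x)).max() < threshold:
            return clf(x)   # easy input (or deepest block): stop here

# Toy three-block pipeline standing in for the ResNet-9 split.
blocks = nn.ModuleList([nn.Linear(8, 8) for _ in range(3)])
classifiers = nn.ModuleList([nn.Linear(8, 10) for _ in range(3)])
hcds = nn.ModuleList([nn.Linear(8, 1) for _ in range(3)])
logits = early_exit_inference(torch.randn(1, 8), blocks, classifiers, hcds)
```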


Author(s): Md. Aaqeel Hasan*, Jaypal Medida, N. Laxmi Prasanna, ...

The Internet of Things (IoT) refers to the concept of connecting non-traditional computers and related devices with the help of the internet, incorporating basic computing and communication technologies into physical things for daily use. Security and confidentiality are two major challenges in IoT. In current IoT security mechanisms, the limited memory, energy resources, and CPU of IoT devices compromise critical security specifications. Centralized security architectures are also inappropriate for IoT because they present a single point of attack, and defending centralized infrastructure against targeted attacks is costly. It is therefore important to decentralize the IoT security architecture to meet these resource constraints. Blockchain is a decentralized, cryptographically secured system with a large number of uses. However, because of its high computational complexity and poor scalability, the traditional blockchain environment is not suitable for IoT applications. We therefore introduce a sliding window protocol into the traditional blockchain so that it better suits IoT environments: the modified chain uses a window of previous blocks, rather than only the latest block, in the proof of work that shapes the next block's hash. The resulting sliding-window blockchain (SWBC) is analyzed on a data stream generated in real time from an IoT testbed (a smart home). The results show that the proposed sliding window protocol improves security, reduces memory overhead, and consumes fewer resources.
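A minimal sketch of the idea, assuming a SHA-256 leading-zeros difficulty rule and a simple string layout for block contents (both are assumptions; the paper's exact block format is not specified here):

```python
import hashlib
from collections import deque

def mine_block(window, payload, difficulty=4):
    """Sliding-window proof of work: the hashes of the previous w blocks,
    not just the immediate predecessor, are folded into the new hash."""
    prefix = "".join(window)   # hashes of the last w blocks
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{prefix}|{payload}|{nonce}".encode()).hexdigest()
        if digest.startswith("0" * difficulty):
            return digest, nonce
        nonce += 1

window = deque(["0" * 64], maxlen=5)   # w = 5 recent block hashes
block_hash, _ = mine_block(window, "smart-home sensor reading")
window.append(block_hash)              # slide the window forward
```

A deque with `maxlen` gives the sliding behavior directly: once w blocks have been mined, appending a new hash silently evicts the oldest one.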


2021 · Vol. 2021 · pp. 1-16
Author(s): Lifu Zhang, Tarek S. Abdelrahman

The growth in size and complexity of convolutional neural networks (CNNs) is forcing the partitioning of a network across multiple accelerators during training and the pipelining of backpropagation computations over these accelerators. Pipelining results in the use of stale weights. Existing approaches to pipelined training avoid or limit the use of stale weights with techniques that either underutilize accelerators or increase the training memory footprint. This paper contributes a pipelined backpropagation scheme that uses stale weights to maximize accelerator utilization while keeping memory overhead modest. It explores the impact of stale weights on statistical efficiency and performance using four CNNs (LeNet-5, AlexNet, VGG, and ResNet) and shows that when pipelining is introduced in early layers, training with stale weights converges and yields models with inference accuracies comparable to those of nonpipelined training (accuracy drops of 0.4%, 4%, 0.83%, and 1.45% for the four networks, respectively). However, when pipelining is deeper in the network, inference accuracies drop significantly (by up to 12% for VGG and 8.5% for ResNet-20). The paper also contributes a hybrid scheme that combines pipelined with nonpipelined training to address this drop. The potential for performance improvement of the proposed scheme is demonstrated with a proof-of-concept pipelined backpropagation implementation in PyTorch on two GPUs using ResNet-56/110/224/362, achieving speedups of up to 1.8× over a one-GPU baseline.
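To make the staleness concrete: in a pipeline with delay d, the gradient for the batch that entered at step t is applied against weights that have since advanced d updates. A toy Python illustration; the Stage class and its version counter are purely illustrative, not the paper's implementation:

```python
import numpy as np

class Stage:
    """Toy pipeline stage; `version` stands in for a weight-update step."""
    def __init__(self, n):
        self.w, self.version = np.eye(n), 0
    def forward(self, x):
        return self.w @ x, self.version
    def update(self):
        self.version += 1   # a real gradient step would modify self.w here

stage0, delay, inflight = Stage(4), 2, []
for t in range(6):
    inflight.append(stage0.forward(np.random.randn(4)))   # stage 0 runs ahead
    if len(inflight) > delay:
        _, used = inflight.pop(0)   # this activation only now reaches backprop
        stage0.update()             # meanwhile the weights have moved on
        print(f"step {t}: gradient uses weights v{used}, current v{stage0.version}")
```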


Sensors · 2021 · Vol. 21 (13) · pp. 4597
Author(s): Yulan Deng, Shaohua Teng, Lunke Fei, Wei Zhang, Imad Rida

Age estimation from face images has attracted much attention due to its many favorable real-world applications, such as video surveillance and social networking. However, most existing studies learn a single kind of age feature and ignore other appearance attributes, such as gender and race, that greatly influence the age pattern. In this paper, we propose a compact multifeature learning and fusion method for age estimation. Specifically, we first use three subnetworks to learn gender, race, and age information. We then fuse these complementary features to form more robust features for age estimation. Finally, we engineer a regression-ranking age-feature estimator to convert the fused features into exact age values. Experimental results on three benchmark databases demonstrate the effectiveness and efficiency of the proposed method for facial age estimation in comparison with previous state-of-the-art methods. Moreover, compared with those methods, our model is more compact, with only a 20 MB memory overhead, and is suitable for deployment on mobile or embedded devices for age estimation.
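A hedged PyTorch sketch of the three-branch fusion topology; the branch architecture, feature sizes, and the plain regression head are assumptions (the paper's estimator combines regression and ranking):

```python
import torch
import torch.nn as nn

def make_branch(feat_dim=64):
    """One lightweight subnetwork; layer sizes are illustrative."""
    return nn.Sequential(
        nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, feat_dim))

class MultiFeatureAgeNet(nn.Module):
    """Three branches learn gender, race, and age features; concatenated
    (fused) features feed a single age head."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.gender, self.race, self.age = (make_branch(feat_dim) for _ in range(3))
        self.head = nn.Linear(3 * feat_dim, 1)

    def forward(self, x):
        fused = torch.cat([self.gender(x), self.race(x), self.age(x)], dim=1)
        return self.head(fused)   # predicted age

pred_age = MultiFeatureAgeNet()(torch.randn(2, 3, 64, 64))   # shape (2, 1)
```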


2021 · Vol. 15 (3) · pp. 1-21
Author(s): Guanhao Wu, Xiaofeng Gao, Ge Yan, Guihai Chen

The Influence Maximization (IM) problem is to select influential users so as to maximize the influence spread, which plays an important role in many real-world applications such as product recommendation, epidemic control, and network monitoring. Nowadays, multiple kinds of information can propagate in online social networks simultaneously, but the current literature seldom discusses this phenomenon. Accordingly, in this article we propose the Multiple Influence Maximization (MIM) problem, in which multiple kinds of information propagate in a single network with different propagation probabilities. The goal of MIM is to maximize the overall accumulative influence spread of the different information within a limited seed budget. To solve MIM, we first propose a greedy framework that maintains a (1 − 1/e)-approximation ratio. We further propose parallel algorithms based on semaphores, an inter-thread communication mechanism, which significantly improve our algorithms' efficiency. We then conduct experiments on complex social network datasets with 12k, 154k, 317k, and 1.1m nodes; the results show that our greedy framework greatly outperforms other heuristic algorithms in influence spread, and that parallelization reduces running time noticeably with acceptable memory overhead.
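A compact sketch of such a greedy framework under an independent cascade model; the uniform per-information propagation probability, Monte Carlo spread estimation, and adjacency-list graph are assumptions for illustration:

```python
import random

def spread(graph, seeds, p, trials=200):
    """Monte Carlo estimate of influence spread under an independent
    cascade model with uniform propagation probability p."""
    total = 0
    for _ in range(trials):
        active, frontier = set(seeds), list(seeds)
        while frontier:
            u = frontier.pop()
            for v in graph.get(u, []):
                if v not in active and random.random() < p:
                    active.add(v)
                    frontier.append(v)
        total += len(active)
    return total / trials

def greedy_mim(graph, probs, budget):
    """Each round adds the (node, information) pair with the largest
    marginal gain in accumulative spread until the seed budget is spent."""
    seeds = {i: set() for i in range(len(probs))}  # one seed set per information
    for _ in range(budget):
        base = {i: spread(graph, seeds[i], p) for i, p in enumerate(probs)}
        _, v, i = max((spread(graph, seeds[i] | {v}, p) - base[i], v, i)
                      for i, p in enumerate(probs) for v in graph if v not in seeds[i])
        seeds[i].add(v)
    return seeds

graph = {1: [2, 3], 2: [3], 3: [1], 4: [1]}
print(greedy_mim(graph, probs=[0.3, 0.1], budget=2))
```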


Author(s): Siddharth Gupta, Abhinav Rana, Lakhsmi M.

Rural areas in India do not necessarily have an internet connection, yet residents may need to locate the nearest hospital or fire station, or a message may need to be broadcast over a large section of the area. This paper tries to solve these problems by implementing a pin-code-based approach to map such important locations. Nowadays, more advanced technologies such as GPS solve these problems in real time, but this paper solves them statically, for settings where such technology cannot be part of daily life. It discusses a smart, fast algorithm with minimal memory overhead for these specific issues. A lack of datasets is a concern in this approach, so to automate it we used image processing to detect the boundaries of pin codes in various sub-regions and form a universal graph, which can then be subjected to algorithms such as A*, DFS, BFS, and Dijkstra, each suited to a particular type of query. This leaves scope for further research on models that read map images more efficiently to create datasets that are sufficiently accurate yet devoid of longitudinal and latitudinal details.
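For instance, finding the nearest facility over such a pin-code graph is a single-source shortest-path query. A minimal Python sketch using Dijkstra's algorithm; the pin codes, distances, and adjacency format below are made up for illustration:

```python
import heapq

def nearest_facility(graph, facilities, start):
    """Dijkstra over a pin-code adjacency graph: returns the closest
    pin code hosting the requested facility, with its distance."""
    dist, heap = {start: 0}, [(0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in facilities:
            return u, d
        if d > dist[u]:
            continue   # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return None, float("inf")

# Adjacent pin codes with road distances in km (made-up numbers).
graph = {"110001": [("110002", 3), ("110003", 5)],
         "110002": [("110001", 3), ("110003", 2)],
         "110003": [("110001", 5), ("110002", 2)]}
print(nearest_facility(graph, facilities={"110003"}, start="110001"))
```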


2021 · Vol. 51 (1) · pp. 73-89
Author(s): Varsha Mittal, Durgaprasad Gangodkar, Bhaskar Pant

Graph databases are applied in many domains, including science and business, owing to their low complexity, low overheads, and low time complexity. Graph-based storage offers the advantage of capturing semantic and structural information rather than simply using the bag-of-words technique. An approach called Knowledgeable Graphs (K-Graph) is proposed to capture semantic knowledge, with documents stored as graph nodes. Using weighted subgraphs, frequent subgraphs are extracted and stored in a Fast Embedding Referral Table (FERT), which is maintained at different levels according to the headings and subheadings of the documents. This reduces the memory overhead and the retrieval and access time for the subgraphs needed, and the approach substantially reduces data redundancy. On real-world datasets, K-Graph's performance and power usage improve threefold over current methods, and ninety-nine percent accuracy demonstrates the robustness of the proposed algorithm.
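A very rough sketch of what a FERT-like structure could look like, with frequent subgraphs simplified to single edges and heading levels as keys; every name and the minimum-support rule here are assumptions, and the paper's table is more elaborate:

```python
from collections import Counter, defaultdict

def build_fert(doc_graphs, min_support=2):
    """Count (heading level, edge) occurrences across documents and keep
    only the frequent ones as referral entries, indexed by level."""
    counts = Counter()
    for doc in doc_graphs:
        for level, edges in doc.items():        # level: heading/subheading
            counts.update((level, e) for e in set(edges))
    fert = defaultdict(dict)
    for (level, edge), support in counts.items():
        if support >= min_support:              # frequent subgraph only
            fert[level][edge] = support         # referral entry
    return fert

docs = [{"h1": [("graph", "database")], "h2": [("semantic", "info")]},
        {"h1": [("graph", "database")]}]
print(build_fert(docs))   # {'h1': {('graph', 'database'): 2}}
```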

