Beyond micro-kernel design: decoupling modularity and protection in Lipto

Author(s):  
P. Druschel ◽  
L.L. Peterson ◽  
N.C. Hutchinson
Author(s):  
Michael D. Schroeder ◽  
David D. Clark ◽  
Jerome H. Saltzer

2021 ◽  
Vol 15 (1) ◽  
pp. 1-10
Author(s):  
Kang Zhao ◽  
Liuyihan Song ◽  
Yingya Zhang ◽  
Pan Pan ◽  
Yinghui Xu ◽  
...  

Thanks to the popularity of GPUs and the growth of their computational power, more and more deep learning tasks, such as face recognition, image retrieval and word embedding, can take advantage of extreme classification to improve accuracy. However, it remains a big challenge to train a deep model with millions of classes efficiently, due to the huge memory and computation consumption of the last layer. By sampling a small set of classes to avoid computing over the full class set, sampling-based approaches have proved to be an effective solution. But most of them suffer from two issues: i) important classes are ignored or only partly sampled, as in methods based on a random sampling scheme or low-recall retrieval techniques (e.g., locality-sensitive hashing), which degrades accuracy; ii) inefficient implementation owing to incompatibility with GPUs, as in selective softmax, which uses a hashing forest to select classes but runs the search on the CPU. To address these issues, we propose a new sampling-based softmax called ANN Softmax in this paper. Specifically, we employ binary quantization with an inverted file system to improve the recall of important classes. With a dedicated kernel design, the method can be fully parallelized in mainstream training frameworks. We also find that the number of important classes recalled for each training sample has a great impact on the final accuracy, so we introduce a sample grouping optimization to closely approximate full-class training. Experimental evaluations on two tasks (Embedding Learning and Classification) and ten datasets (e.g., MegaFace, ImageNet, SKU datasets) demonstrate that our proposed method maintains the same precision as Full Softmax for different loss objectives, including cross-entropy loss, ArcFace, CosFace and D-Softmax loss, with only 1/10 of the classes sampled, outperforming state-of-the-art techniques. Moreover, we implement ANN Softmax in a complete GPU pipeline that accelerates training by more than 4.3X. Equipped with a cluster of 256 GPUs, our method reduces the time to train a classifier with 300 million classes on our SKU-300M dataset to ten days.
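
As a rough illustration of the sampling-based softmax idea described in this abstract, the minimal NumPy sketch below scores each example only against its ground-truth class plus the classes whose weight vectors are most similar to the feature vector. This is not the authors' implementation: the brute-force top-k lookup, toy dimensions and function name are assumptions for illustration, whereas the actual method replaces the lookup with binary quantization plus an inverted file system and dedicated GPU kernels.

```python
import numpy as np

def sampled_softmax_loss(x, W, target, num_sampled=100):
    """Cross-entropy over a sampled class set instead of all C classes.

    x      : (d,)   feature vector of one example
    W      : (C, d) class weight matrix (C may be in the millions)
    target : int    ground-truth class id
    Only the target plus the num_sampled classes whose weight vectors score
    highest against x are kept.  The top-k here is brute force for clarity;
    the paper makes this lookup sublinear and GPU-friendly.
    """
    scores = W @ x                                   # stand-in for the ANN lookup
    nearest = np.argpartition(-scores, num_sampled)[:num_sampled]
    sampled = np.union1d(nearest, [target])          # always keep the true class
    logits = W[sampled] @ x
    logits -= logits.max()                           # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum())
    return -log_prob[np.searchsorted(sampled, target)]

# toy usage: 100k classes, 128-d features, 1/100 of the classes sampled
rng = np.random.default_rng(0)
W = rng.standard_normal((100_000, 128)).astype(np.float32) * 0.01
x = rng.standard_normal(128).astype(np.float32)
print(sampled_softmax_loss(x, W, target=42, num_sampled=1_000))
```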


Entropy ◽  
2018 ◽  
Vol 20 (12) ◽  
pp. 984 ◽  
Author(s):  
Yi Zhang ◽  
Lulu Wang ◽  
Liandong Wang

Graph kernels are of vital importance in the field of graph comparison and classification. However, how to compare and evaluate graph kernels, and how to choose an optimal kernel for a practical classification problem, remain open problems. In this paper, a comprehensive evaluation framework for graph kernels is proposed for unattributed graph classification. According to their design methods, the whole graph kernel family can be categorized along five different dimensions, and several representative graph kernels are chosen from these categories for the evaluation. Using a large collection of real-world and synthetic datasets, the kernels are compared on criteria such as classification accuracy, F1 score, runtime cost, scalability and applicability. Finally, quantitative conclusions are drawn from the analysis of the extensive experimental results. The main contribution of this paper is a comprehensive evaluation framework for graph kernels, which is significant for graph-classification applications and future kernel research.
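
To make the evaluation setup concrete, here is a minimal sketch (assuming NumPy and scikit-learn) of how an unattributed graph kernel can be plugged into an SVM via a precomputed Gram matrix and scored by cross-validated classification accuracy. The degree-histogram kernel and toy random graphs are illustrative stand-ins for the representative kernels and benchmark datasets evaluated in the paper.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def degree_histogram_kernel(adj_mats, max_degree=20):
    """A deliberately simple kernel for unattributed graphs: map each graph to
    a histogram of node degrees and take dot products of those histograms.
    A real evaluation would swap in Weisfeiler-Lehman, shortest-path,
    random-walk kernels, etc."""
    feats = []
    for A in adj_mats:
        degrees = A.sum(axis=1).astype(int)
        hist = np.bincount(degrees, minlength=max_degree + 1)[: max_degree + 1]
        feats.append(hist)
    X = np.array(feats, dtype=float)
    return X @ X.T                        # Gram matrix: one entry per pair of graphs

# toy usage: random graphs from two "classes" with different edge densities
rng = np.random.default_rng(0)
def random_graph(n, p):
    A = (rng.random((n, n)) < p).astype(int)
    A = np.triu(A, 1)
    return A + A.T
graphs = [random_graph(30, 0.1) for _ in range(40)] + [random_graph(30, 0.3) for _ in range(40)]
y = np.array([0] * 40 + [1] * 40)

K = degree_histogram_kernel(graphs)
clf = SVC(kernel="precomputed")           # SVM consumes the precomputed Gram matrix
print("CV accuracy:", cross_val_score(clf, K, y, cv=5).mean())
```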


1992 ◽  
Vol 40 (2) ◽  
pp. 402-412 ◽  
Author(s):  
J. Jeong ◽  
W.J. Williams

Robotica ◽  
2018 ◽  
Vol 36 (7) ◽  
pp. 1077-1097 ◽  
Author(s):  
Levi DeVries ◽  
Aaron Sims ◽  
Michael D. M. Kutzer

SUMMARY: Autonomous multi-agent systems show promise in countless applications, but can be hindered in environments where inter-agent communication is limited. In such cases, this paper considers a scenario where agents communicate intermittently through a cloud server. We derive a graph transformation mapping the kernel of a graph's Laplacian to a desired configuration vector while retaining the graph's topological characteristics. The transformation facilitates derivation of a self-triggered controller that drives agents to prescribed configurations while regulating instances of inter-agent communication. Experimental validation of the theoretical results shows that the self-triggered approach drives agents to a desired configuration using fewer control updates than traditional periodic implementations.
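
The sketch below conveys the general flavor of Laplacian-based formation control with self-triggered updates; it is not the paper's controller or its graph transformation. Single-integrator agents hold a piecewise-constant input between update instants, and each next instant is scheduled from the state at the current one, so far fewer updates (and hence fewer communication instances) are needed than with a periodic implementation. The graph, gains and trigger rule are assumptions for illustration.

```python
import numpy as np

A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)      # 4-agent cycle graph (assumed)
L = np.diag(A.sum(axis=1)) - A                  # graph Laplacian

r = np.array([0.0, 1.0, 2.0, 3.0])              # desired configuration offsets
x = np.array([5.0, -2.0, 7.0, 1.0])             # initial agent positions

lam_max = np.linalg.eigvalsh(L).max()
tau_cap = 1.5 / lam_max                         # cap keeps the held-input update contractive
dt, T, sigma = 0.01, 10.0, 0.5
t, t_next, u, updates = 0.0, 0.0, np.zeros_like(x), 0

while t < T:
    if t >= t_next:                             # self-triggered update instant
        e = x - r
        u = -L @ e                              # control held constant until t_next
        # schedule the next instant from quantities known now (no monitoring)
        t_next = t + min(tau_cap, sigma * np.linalg.norm(e)
                                  / (np.linalg.norm(L @ e) + 1e-9))
        updates += 1
    x = x + dt * u                              # integrate single-integrator dynamics
    t += dt

disagreement = (x - r) - np.mean(x - r)         # deviation from the desired shape
print("formation error:", np.linalg.norm(disagreement))
print("control updates:", updates, "vs", int(T / dt), "for a periodic controller")
```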


Author(s):  
G. J. Popek ◽  
C. S. Kline

Author(s):  
Gautam Ramakrishnan ◽  
Mohit Bhasi ◽  
V. Saicharan ◽  
Leslie Monis ◽  
Sachin D. Patil ◽  
...  
