Beyond micro-kernel design: decoupling modularity and protection in Lipto

Author(s):  
P. Druschel ◽  
L.L. Peterson ◽  
N.C. Hutchinson
Author(s):  
Michael D. Schroeder ◽  
David D. Clark ◽  
Jerome H. Saltzer

2021 ◽  
Vol 15 (1) ◽  
pp. 1-10
Author(s):  
Kang Zhao ◽  
Liuyihan Song ◽  
Yingya Zhang ◽  
Pan Pan ◽  
Yinghui Xu ◽  
...  

Thanks to the popularity of GPUs and the growth of their computational power, more and more deep learning tasks, such as face recognition, image retrieval and word embedding, can take advantage of extreme classification to improve accuracy. However, it remains a big challenge to train a deep model with millions of classes efficiently, due to the huge memory and computation consumption of the last layer. By sampling a small set of classes to avoid computing over the full class set, sampling-based approaches have proved to be an effective solution. But most of them suffer from two issues: i) important classes are ignored or only partly sampled, as in methods based on a random sampling scheme or low-recall retrieval techniques (e.g., locality-sensitive hashing), which degrades accuracy; ii) inefficient implementation owing to incompatibility with GPUs, as in selective softmax, which uses a hashing forest to select classes but runs the search on the CPU. To address these issues, we propose a new sampling-based softmax called ANN Softmax in this paper. Specifically, we employ binary quantization with an inverted file system to improve the recall of important classes. With a dedicated kernel design, the method can be fully parallelized in mainstream training frameworks. We also find that the number of important classes recalled for each training sample has a great impact on the final accuracy, so we introduce a sample grouping optimization to closely approximate full-class training. Experimental evaluations on two tasks (Embedding Learning and Classification) and ten datasets (e.g., MegaFace, ImageNet, SKU datasets) demonstrate that our proposed method maintains the same precision as Full Softmax for different loss objectives, including cross-entropy loss, ArcFace, CosFace and D-Softmax loss, with only 1/10 of the classes sampled, outperforming state-of-the-art techniques. Moreover, we implement ANN Softmax in a complete GPU pipeline that accelerates training by more than 4.3X. Equipped with a cluster of 256 GPUs, our method reduces the time to train a classifier with 300 million classes on our SKU-300M dataset to ten days.
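
As a rough illustration of the sampling-based softmax idea described in this abstract, the minimal NumPy sketch below scores each example only against its ground-truth class plus the classes whose weight vectors are most similar to the feature vector. This is not the authors' implementation: the brute-force top-k lookup, toy dimensions and function name are assumptions for illustration, whereas the actual method replaces the lookup with binary quantization plus an inverted file system and dedicated GPU kernels.

```python
import numpy as np

def sampled_softmax_loss(x, W, target, num_sampled=100):
    """Cross-entropy over a sampled class set instead of all C classes.

    x      : (d,)   feature vector of one example
    W      : (C, d) class weight matrix (C may be in the millions)
    target : int    ground-truth class id
    Only the target plus the num_sampled classes whose weight vectors score
    highest against x are kept.  The top-k here is brute force for clarity;
    the paper makes this lookup sublinear and GPU-friendly.
    """
    scores = W @ x                                   # stand-in for the ANN lookup
    nearest = np.argpartition(-scores, num_sampled)[:num_sampled]
    sampled = np.union1d(nearest, [target])          # always keep the true class
    logits = W[sampled] @ x
    logits -= logits.max()                           # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum())
    return -log_prob[np.searchsorted(sampled, target)]

# toy usage: 100k classes, 128-d features, 1/100 of the classes sampled
rng = np.random.default_rng(0)
W = rng.standard_normal((100_000, 128)).astype(np.float32) * 0.01
x = rng.standard_normal(128).astype(np.float32)
print(sampled_softmax_loss(x, W, target=42, num_sampled=1_000))
```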


Entropy ◽  
2018 ◽  
Vol 20 (12) ◽  
pp. 984 ◽  
Author(s):  
Yi Zhang ◽  
Lulu Wang ◽  
Liandong Wang

Graph kernels are of vital importance in the field of graph comparison and classification. However, how to compare and evaluate graph kernels, and how to choose an optimal kernel for a practical classification problem, remain open problems. In this paper, a comprehensive evaluation framework for graph kernels is proposed for unattributed graph classification. According to their design methods, the whole graph kernel family can be categorized along five different dimensions, and several representative graph kernels are chosen from these categories for the evaluation. Using a large collection of real-world and synthetic datasets, the kernels are compared on criteria such as classification accuracy, F1 score, runtime cost, scalability and applicability. Finally, quantitative conclusions are drawn from the analysis of the extensive experimental results. The main contribution of this paper is a comprehensive evaluation framework for graph kernels, which is significant for graph-classification applications and future kernel research.
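
To make the evaluation setup concrete, here is a minimal sketch (assuming NumPy and scikit-learn) of how an unattributed graph kernel can be plugged into an SVM via a precomputed Gram matrix and scored by cross-validated classification accuracy. The degree-histogram kernel and toy random graphs are illustrative stand-ins for the representative kernels and benchmark datasets evaluated in the paper.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def degree_histogram_kernel(adj_mats, max_degree=20):
    """A deliberately simple kernel for unattributed graphs: map each graph to
    a histogram of node degrees and take dot products of those histograms.
    A real evaluation would swap in Weisfeiler-Lehman, shortest-path,
    random-walk kernels, etc."""
    feats = []
    for A in adj_mats:
        degrees = A.sum(axis=1).astype(int)
        hist = np.bincount(degrees, minlength=max_degree + 1)[: max_degree + 1]
        feats.append(hist)
    X = np.array(feats, dtype=float)
    return X @ X.T                        # Gram matrix: one entry per pair of graphs

# toy usage: random graphs from two "classes" with different edge densities
rng = np.random.default_rng(0)
def random_graph(n, p):
    A = (rng.random((n, n)) < p).astype(int)
    A = np.triu(A, 1)
    return A + A.T
graphs = [random_graph(30, 0.1) for _ in range(40)] + [random_graph(30, 0.3) for _ in range(40)]
y = np.array([0] * 40 + [1] * 40)

K = degree_histogram_kernel(graphs)
clf = SVC(kernel="precomputed")           # SVM consumes the precomputed Gram matrix
print("CV accuracy:", cross_val_score(clf, K, y, cv=5).mean())
```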


1992 ◽  
Vol 40 (2) ◽  
pp. 402-412 ◽  
Author(s):  
J. Jeong ◽  
W.J. Williams

Robotica ◽  
2018 ◽  
Vol 36 (7) ◽  
pp. 1077-1097 ◽  
Author(s):  
Levi DeVries ◽  
Aaron Sims ◽  
Michael D. M. Kutzer

SUMMARY: Autonomous multi-agent systems show promise in countless applications, but can be hindered in environments where inter-agent communication is limited. In such cases, this paper considers a scenario where agents communicate intermittently through a cloud server. We derive a graph transformation mapping the kernel of a graph's Laplacian to a desired configuration vector while retaining the graph's topological characteristics. The transformation facilitates derivation of a self-triggered controller that drives agents to prescribed configurations while regulating instances of inter-agent communication. Experimental validation of the theoretical results shows that the self-triggered approach drives agents to a desired configuration using fewer control updates than traditional periodic implementations.
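
The sketch below conveys the general flavor of Laplacian-based formation control with self-triggered updates; it is not the paper's controller or its graph transformation. Single-integrator agents hold a piecewise-constant input between update instants, and each next instant is scheduled from the state at the current one, so far fewer updates (and hence fewer communication instances) are needed than with a periodic implementation. The graph, gains and trigger rule are assumptions for illustration.

```python
import numpy as np

A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)      # 4-agent cycle graph (assumed)
L = np.diag(A.sum(axis=1)) - A                  # graph Laplacian

r = np.array([0.0, 1.0, 2.0, 3.0])              # desired configuration offsets
x = np.array([5.0, -2.0, 7.0, 1.0])             # initial agent positions

lam_max = np.linalg.eigvalsh(L).max()
tau_cap = 1.5 / lam_max                         # cap keeps the held-input update contractive
dt, T, sigma = 0.01, 10.0, 0.5
t, t_next, u, updates = 0.0, 0.0, np.zeros_like(x), 0

while t < T:
    if t >= t_next:                             # self-triggered update instant
        e = x - r
        u = -L @ e                              # control held constant until t_next
        # schedule the next instant from quantities known now (no monitoring)
        t_next = t + min(tau_cap, sigma * np.linalg.norm(e)
                                  / (np.linalg.norm(L @ e) + 1e-9))
        updates += 1
    x = x + dt * u                              # integrate single-integrator dynamics
    t += dt

disagreement = (x - r) - np.mean(x - r)         # deviation from the desired shape
print("formation error:", np.linalg.norm(disagreement))
print("control updates:", updates, "vs", int(T / dt), "for a periodic controller")
```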


Author(s):  
G. J. Popek ◽  
C. S. Kline

Author(s):  
Gautam Ramakrishnan ◽  
Mohit Bhasi ◽  
V. Saicharan ◽  
Leslie Monis ◽  
Sachin D. Patil ◽  
...  
