Online evolutionary batch size orchestration for scheduling deep learning workloads in GPU clusters

Solving image classification problems with deep learning (DL) networks requires a large amount of training image data. In this study, we aimed to train DL networks with synthetic images generated using a game engine and to determine how well the trained networks perform on real-image classification problems. The study presents the results of using corner detection and nearest three-point selection (CDNTS) layers to classify bird and rotary-wing unmanned aerial vehicle (RW-UAV) images, provides a comprehensive comparison of two different experimental setups, and emphasizes the significant improvements in the performance of deep learning-based networks due to the inclusion of a CDNTS layer. Experiment 1 corresponds to training the commonly used deep learning-based networks with synthetic data and an image classification test on real data. Experiment 2 corresponds to training the CDNTS layer and commonly used deep learning-based networks with synthetic data and an image classification test on real data. In experiment 1, the best area under the curve (AUC) value for the image classification test accuracy was measured as 72%. In experiment 2, using the CDNTS layer, the AUC value for the image classification test accuracy was measured as 88.9%. A total of 432 different training combinations were investigated in the experimental setups. Various DL networks were trained using four different optimizers, considering all combinations of batch size, learning rate, and dropout hyperparameters. The test accuracy AUC values for networks in experiment 1 ranged from 55% to 74%, whereas the test accuracy AUC values in experiment 2 networks with a CDNTS layer ranged from 76% to 89.9%. It was observed that the CDNTS layer has a considerable effect on the image classification accuracy of deep learning-based networks. AUC, F-score, and test accuracy measures were used to validate the success of the networks.


Simba

Communications of the ACM ◽

10.1145/3460227 ◽

2021 ◽

Vol 64 (6) ◽

pp. 107-116

Author(s):

Yakun Sophia Shao ◽

Jason Clemons ◽

Rangharajan Venkatesan ◽

Brian Zimmer ◽

Matthew Fojtik ◽

...

Keyword(s):

Deep Learning ◽

Large Scale ◽

Data Locality ◽

Coarse Grained ◽

Batch Size ◽

Peak Performance ◽

Large Scale Systems ◽

High Area ◽

On Chip ◽

And Storage

Package-level integration using multi-chip-modules (MCMs) is a promising approach for building large-scale systems. Compared to a large monolithic die, an MCM combines many smaller chiplets into a larger system, substantially reducing fabrication and design costs. Current MCMs typically only contain a handful of coarse-grained large chiplets due to the high area, performance, and energy overheads associated with inter-chiplet communication. This work investigates and quantifies the costs and benefits of using MCMs with fine-grained chiplets for deep learning inference, an application domain with large compute and on-chip storage requirements. To evaluate the approach, we architected, implemented, fabricated, and tested Simba, a 36-chiplet prototype MCM system for deep-learning inference. Each chiplet achieves 4 TOPS peak performance, and the 36-chiplet MCM package achieves up to 128 TOPS and up to 6.1 TOPS/W. The MCM is configurable to support a flexible mapping of DNN layers to the distributed compute and storage units. To mitigate inter-chiplet communication overheads, we introduce three tiling optimizations that improve data locality. These optimizations achieve up to 16% speedup compared to the baseline layer mapping. Our evaluation shows that Simba can process 1988 images/s running ResNet-50 with a batch size of one, delivering an inference latency of 0.50 ms.
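The throughput and latency figures in the last sentence are related by simple arithmetic. The sketch below is only an illustration of that relationship, assuming batches are processed one after another with no overlap; the function name `latency_ms` is hypothetical and not taken from the paper.

```python
def latency_ms(images_per_second: float, batch_size: int = 1) -> float:
    """Time to finish one batch in milliseconds, assuming batches run back to back."""
    return 1000.0 * batch_size / images_per_second


# Figures quoted in the abstract: 1988 images/s at a batch size of one.
simba_latency = latency_ms(1988.0, batch_size=1)
print(f"{simba_latency:.2f} ms")  # prints "0.50 ms"
```

With a batch size of one, latency is just the inverse of throughput, which is why the two numbers in the abstract agree.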


OpSeF: Open source Python framework for collaborative instance segmentation of bioimages

10.1101/2020.04.29.068023 ◽

2020 ◽

Cited By ~ 2

Author(s):

Tobias M. Rasse ◽

Réka Hollandi ◽

Péter Horváth

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Complex Analysis ◽

Ease Of Use ◽

Problem Definition ◽

Training Data ◽

Post Processing ◽

Gpu Clusters ◽

User Tasks ◽

Instance Segmentation

Various pre-trained deep learning models for the segmentation of bioimages have been made available as ‘developer-to-end-user’ solutions. They usually require neither knowledge of machine learning nor coding skills, and are optimized for ease of use and deployability on laptops. However, testing these tools individually is tedious and success is uncertain. Here, we present the ‘Op’en ‘Se’gmentation ‘F’ramework (OpSeF), a Python framework for deep learning-based instance segmentation. OpSeF aims at facilitating the collaboration of biomedical users with experienced image analysts. It builds on the analysts’ knowledge in Python, machine learning, and workflow design to solve complex analysis tasks at any scale in a reproducible, well-documented way. OpSeF defines standard inputs and outputs, thereby facilitating modular workflow design and interoperability with other software. Users play an important role in problem definition, quality control, and manual refinement of results. All analyst tasks are optimized for deployment on Linux workstations or GPU clusters; all user tasks may be performed on any laptop in ImageJ. OpSeF semi-automates preprocessing, convolutional neural network (CNN)-based segmentation in 2D or 3D, and post-processing. It facilitates benchmarking of multiple models in parallel. OpSeF streamlines the optimization of parameters for pre- and post-processing such that an available model may frequently be used without retraining. Even if sufficiently good results are not achievable with this approach, intermediate results can inform the analysts in the selection of the most promising CNN architecture in which the biomedical user might invest the effort of manually labeling training data. We provide Jupyter notebooks that document sample workflows based on various image collections. Analysts may find these notebooks useful to illustrate common segmentation challenges, as they prepare the advanced user for gradually taking over some of their tasks and completing their projects independently. The notebooks may also be used to explore the analysis options available within OpSeF in an interactive way and to document and share final workflows. Currently, three mechanistically distinct CNN-based segmentation methods, the U-Net implementation used in Cellprofiler 3.0, StarDist, and Cellpose, have been integrated within OpSeF. The addition of new networks requires little coding skill; the addition of new models requires none. Thus, OpSeF might soon become an interactive model repository, in which pre-trained models might be shared, evaluated, and reused with ease.


Uncertainty Quantification and Optimization of Deep Learning for Fracture Recognition

10.2118/204863-ms ◽

2021 ◽

Author(s):

Ryan Santoso ◽

Xupeng He ◽

Marwa Alsinan ◽

Hyung Kwak ◽

Hussein Hoteit

Keyword(s):

Deep Learning ◽

Uncertainty Quantification ◽

Learning Rate ◽

Batch Size ◽

Global Maximum ◽

Fractured Reservoir ◽

Training Set ◽

Fast Learning ◽

Learning Rates ◽

Aleatoric Uncertainty

Automatic fracture recognition from borehole images or outcrops is applicable for the construction of fractured reservoir models. Deep learning for fracture recognition is subject to uncertainty due to sparse and imbalanced training sets and random initialization. We present a new workflow to optimize a deep learning model under uncertainty using U-Net. We consider both epistemic and aleatoric uncertainty of the model. We propose a U-Net architecture in which a dropout layer is inserted after every "weighting" layer. We vary the dropout probability to investigate its impact on the uncertainty response. We build the training set and assign a uniform distribution to each training parameter, such as the number of epochs, batch size, and learning rate. We then perform uncertainty quantification by running the model multiple times for each realization, where we capture the aleatoric response. In this approach, which is based on Monte Carlo Dropout, the variance map and F1-scores are utilized to evaluate the need to craft additional augmentations or stop the process. This work demonstrates the existence of uncertainty within deep learning caused by sparse and imbalanced training sets. This issue leads to unstable predictions. The overall responses are accommodated in the form of aleatoric uncertainty. Our workflow utilizes the uncertainty response (variance map) as a measure to craft additional augmentations in the training set. High variance in certain features denotes the need to add new augmented images containing the features, either through affine transformation (rotation, translation, and scaling) or utilizing similar images. The augmentation improves the accuracy of the prediction, reduces the prediction variance, and stabilizes the output. Architecture, number of epochs, batch size, and learning rate are optimized under a fixed-uncertain training set. We perform the optimization by searching the global maximum of accuracy after running multiple realizations. Besides the quality of the training set, the learning rate is the heavy-hitter in the optimization process. The selected learning rate controls the diffusion of information in the model. Under the imbalanced condition, fast learning rates cause the model to miss the main features. The other challenge in fracture recognition on a real outcrop is to optimally pick the parental images to generate the initial training set. We suggest picking images from multiple sides of the outcrop, which show significant variations of the features. This technique is needed to avoid long iterations within the workflow. We introduce a new approach to address the uncertainties associated with the training process and with the physical problem. The proposed approach is general in concept and can be applied to various deep-learning problems in geoscience.
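The Monte Carlo Dropout step described in this abstract (run the model several times with dropout active, then inspect the variance) can be sketched in a few lines. This is a minimal illustration on a toy linear model, not the authors' U-Net; the names `stochastic_forward` and `mc_dropout`, the inputs, and the weights are all hypothetical.

```python
import random
from statistics import mean, pvariance


def stochastic_forward(x, weights, drop_p, rng):
    """One forward pass of a toy linear model with dropout kept active."""
    keep = 1.0 - drop_p
    total = 0.0
    for xi, wi in zip(x, weights):
        if rng.random() < keep:        # this unit survives dropout
            total += xi * wi / keep    # inverted-dropout rescaling
    return total


def mc_dropout(x, weights, drop_p=0.5, passes=200, seed=0):
    """Return (mean, variance) of the prediction over repeated stochastic passes."""
    rng = random.Random(seed)
    samples = [stochastic_forward(x, weights, drop_p, rng) for _ in range(passes)]
    return mean(samples), pvariance(samples)


pixel_features = [0.2, 0.9, 0.4, 0.7]   # hypothetical inputs for one pixel
toy_weights = [1.0, -0.5, 0.8, 0.3]     # hypothetical trained weights
mu, var = mc_dropout(pixel_features, toy_weights)
print(f"mean={mu:.3f} variance={var:.3f}")
```

Repeating this per pixel would give a variance map in the sense used above: regions with high variance are candidates for additional augmented training images.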


Deep Learning Research and Development Platform: Characterizing and Scheduling with QoS Guarantees on GPU Clusters

IEEE Transactions on Parallel and Distributed Systems ◽

10.1109/tpds.2019.2931558 ◽

2020 ◽

Vol 31 (1) ◽

pp. 34-50 ◽

Cited By ~ 4

Author(s):

Zhaoyun Chen ◽

Wei Quan ◽

Mei Wen ◽

Jianbin Fang ◽

Jie Yu ◽

...

Keyword(s):

Deep Learning ◽

Research And Development ◽

Development Platform ◽

Qos Guarantees ◽

Gpu Clusters ◽

Learning Research


Poster Abstract: Deep Learning Workloads Scheduling with Reinforcement Learning on GPU Clusters

IEEE INFOCOM 2019 - IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS) ◽

10.1109/infcomw.2019.8845276 ◽

2019 ◽

Cited By ~ 2

Author(s):

Zhaoyun Chen ◽

Lei Luo ◽

Wei Quan ◽

Mei Wen ◽

Chunyuan Zhang

Keyword(s):

Deep Learning ◽

Reinforcement Learning ◽

Poster Abstract ◽

Gpu Clusters


Deep Learning Approach for Building Detection Using LiDAR–Orthophoto Fusion

Journal of Sensors ◽

10.1155/2018/7212307 ◽

2018 ◽

Vol 2018 ◽

pp. 1-12 ◽

Cited By ~ 14

Author(s):

Faten Hamed Nahhas ◽

Helmi Z. M. Shafri ◽

Maher Ibrahim Sameen ◽

Biswajeet Pradhan ◽

Shattri Mansor

Keyword(s):

Deep Learning ◽

Dimensionality Reduction ◽

Batch Size ◽

Support Vector ◽

Detection Accuracy ◽

Building Detection ◽

Testing Area ◽

Building Recognition ◽

Proposed Model ◽

High Level

This paper reports on a building detection approach based on deep learning (DL) using the fusion of Light Detection and Ranging (LiDAR) data and orthophotos. The proposed method utilized object-based analysis to create objects, a feature-level fusion, an autoencoder-based dimensionality reduction to transform low-level features into compressed features, and a convolutional neural network (CNN) to transform compressed features into high-level features, which were used to classify objects into buildings and background. The proposed architecture was optimized using the grid search method, and its sensitivity to hyperparameters was analyzed and discussed. The proposed model was evaluated on two datasets selected from an urban area with different building types. Results show that the dimensionality reduction by the autoencoder approach from 21 features to 10 features can improve detection accuracy from 86.06% to 86.19% in the working area and from 77.92% to 78.26% in the testing area. The sensitivity analysis also shows that the selection of the hyperparameter values of the model significantly affects detection accuracy. The best hyperparameters of the model are 128 filters in the CNN model, the Adamax optimizer, 10 units in the fully connected layer of the CNN model, a batch size of 8, and a dropout of 0.2. These hyperparameters are critical to improving the generalization capacity of the model. Furthermore, comparison experiments with the support vector machine (SVM) show that the proposed model with or without dimensionality reduction outperforms the SVM models in the working area. However, the SVM model achieves better accuracy in the testing area than the proposed model without dimensionality reduction. This study generally shows that the use of an autoencoder in DL models can improve the accuracy of building recognition in fused LiDAR–orthophoto data.
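The grid search mentioned in this abstract amounts to scoring every combination of candidate hyperparameter values and keeping the best. The sketch below illustrates the idea under loud assumptions: the search space and `toy_score` are hypothetical placeholders (only 128 filters, Adamax, a batch size of 8, and a dropout of 0.2 come from the abstract), and a real run would train and validate the CNN inside the scoring function.

```python
from itertools import product


def grid_search(grid, score_fn):
    """Evaluate every combination in `grid` and return (best_params, best_score)."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical search space; the alternative values are placeholders.
grid = {
    "filters": [64, 128],
    "optimizer": ["adam", "adamax"],
    "batch_size": [8, 32],
    "dropout": [0.2, 0.5],
}


def toy_score(p):
    """Stand-in for validation accuracy, rigged to peak at the reported settings."""
    penalty = abs(p["filters"] - 128) + abs(p["batch_size"] - 8) + abs(p["dropout"] - 0.2)
    bonus = 1.0 if p["optimizer"] == "adamax" else 0.0
    return bonus - penalty


best, score = grid_search(grid, toy_score)
print(best)
```

The cost grows multiplicatively with each added hyperparameter (here 2 x 2 x 2 x 2 = 16 evaluations), which is why sensitivity analysis is often used to narrow the grid first.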


Indirect Stochastic Gradient Quantization and Its Application in Distributed Deep Learning

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5707 ◽

2020 ◽

Vol 34 (04) ◽

pp. 3113-3120 ◽

Cited By ~ 1

Author(s):

Afshin Abdi ◽

Faramarz Fekri

Keyword(s):

Deep Learning ◽

Compression Ratio ◽

Reconstruction Error ◽

Stochastic Gradient ◽

Batch Size ◽

Model Parameters ◽

Direct Compression ◽

Bit Rate ◽

Backpropagation Algorithm ◽

Distributed Training

Transmitting the gradients or model parameters is a critical bottleneck in distributed training of large models. To mitigate this issue, we propose an indirect quantization and compression of stochastic gradients (SG) via factorization. The gist of the idea is that, in contrast to the direct compression methods, we focus on the factors in SGs, i.e., the forward and backward signals in the backpropagation algorithm. We observe that these factors are correlated and generally sparse in most deep models. This gives rise to rethinking of the approaches for quantization and compression of gradients with the ultimate goal of minimizing the error in the final computed gradients subject to the desired communication constraints. We have proposed and theoretically analyzed different indirect SG quantization (ISGQ) methods. The proposed ISGQ reduces the reconstruction error in SGs compared to the direct quantization methods with the same number of quantization bits. Moreover, it can achieve compression gains of more than 100, while the existing traditional quantization schemes can achieve compression ratio of at most 32 (quantizing to 1 bit). Further, for a fixed total batch-size, the required transmission bit-rate per worker decreases in ISGQ as the number of workers increases.
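The ceiling of 32 quoted for direct quantization follows from replacing each 32-bit float with a single bit. The sketch below illustrates that baseline only, assuming 32-bit gradients and a simple sign quantizer with one shared scale; it is not the ISGQ method, and the function names are hypothetical.

```python
def sign_quantize(grad):
    """1-bit direct quantization: keep each sign plus one shared scale (mean magnitude)."""
    scale = sum(abs(g) for g in grad) / len(grad)
    bits = [1 if g >= 0 else 0 for g in grad]
    return bits, scale


def dequantize(bits, scale):
    """Rebuild an approximate gradient from the sign bits and the shared scale."""
    return [scale if b else -scale for b in bits]


def compression_ratio(n_values, bits_per_value, float_bits=32):
    """Uncompressed size divided by compressed size, ignoring the shared scale."""
    return (n_values * float_bits) / (n_values * bits_per_value)


grad = [0.5, -1.5, 0.25, -0.75]          # hypothetical gradient values
bits, scale = sign_quantize(grad)
print(bits, scale, dequantize(bits, scale))
print(compression_ratio(len(grad), bits_per_value=1))  # prints 32.0
```

Since a direct scheme cannot spend less than one bit per gradient entry, exceeding this ratio requires compressing something other than the entries themselves, which is the motivation given above for working with the factors.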


Object Detection and Tracking using Faster R-CNN

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.c5580.098319 ◽

2019 ◽

Vol 8 (3) ◽

pp. 4894-4900

Keyword(s):

Deep Learning ◽

Object Detection ◽

Batch Size ◽

Multiple Objects ◽

Backbone Networks ◽

Object Detection And Tracking ◽

Detection And Tracking ◽

Size Number ◽

Computer Vision Applications ◽

Deep Learning Model

This paper uses a deep learning model called Faster R-CNN to detect and track objects in images. Two backbone networks, ResNet-101 and VGG-16, are tested on a self-created dataset and the PASCAL VOC dataset. The intersection over union (IoU) technique is used for object tracking. The impacts of batch size, number of iterations, and learning rate are analysed. The paper finds that ResNet-101 outperforms VGG-16 significantly, by 13% on test data. This finding reinforces that a deeper network is better at feature extraction and generalization. IoU is able to track multiple objects and can identify the loss of track. The processing speed is found to be 5 frames per second (fps). The study has implications for many computer vision applications. For example, deep learning-based object detection and tracking can either augment the capability of LiDARs and sensors or become an alternative to them in self-driving vehicles.
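IoU-based tracking compares a box from the previous frame with each detection in the current frame and treats low overlap as a lost track. The sketch below shows the standard IoU formula for axis-aligned boxes plus a simple matching rule; the `(x1, y1, x2, y2)` box format, the `match_track` helper, and the 0.5 threshold are assumptions for illustration, not details from the paper.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def match_track(prev_box, detections, threshold=0.5):
    """Index of the detection that best overlaps prev_box, or None if the track is lost."""
    if not detections:
        return None
    best = max(range(len(detections)), key=lambda i: iou(prev_box, detections[i]))
    return best if iou(prev_box, detections[best]) >= threshold else None


previous = (0, 0, 10, 10)
current = [(20, 20, 30, 30), (1, 1, 11, 11)]
print(match_track(previous, current))           # prints 1
print(match_track(previous, [(20, 20, 30, 30)]))  # prints None
```

Running the matching once per tracked object extends this to multiple objects, and a `None` result corresponds to the loss of track mentioned above.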


Deep Learning-Based In Vitro Detection Method for Cellular Impurities in Human Cell-Processed Therapeutic Products

Applied Sciences ◽

10.3390/app11209755 ◽

2021 ◽

Vol 11 (20) ◽

pp. 9755

Author(s):

Yasunari Matsuzaka ◽

Shinji Kusakawa ◽

Yoshihiro Uesawa ◽

Yoji Sato ◽

Mitsutoshi Satoh

Keyword(s):

Deep Learning ◽

Human Cell ◽

Late Stage ◽

Cell Biology ◽

Prediction Models ◽

Batch Size ◽

Life Threatening ◽

Induced Pluripotent

Automated detection of impurities is in demand for evaluating the quality and safety of human cell-processed therapeutic products in regenerative medicine. Deep learning (DL) is a powerful method for classifying and recognizing images in cell biology, diagnostic medicine, and other fields because it automatically extracts the features from complex cell morphologies. In the present study, we construct prediction models that recognize cancer-cell contamination in continuous long-term (four-day) cell cultures. After dividing the whole dataset into Early- and Late-stage cell images, we found that Late-stage images improved the DL performance. The performance was further improved by optimizing the DL hyperparameters (batch size and learning rate). These findings are the first report of the implementation of DL-based systems for disease cell-type classification of human cell-processed therapeutic products (hCTPs), which are expected to enable the rapid, automatic classification of induced pluripotent stem cells and other cell treatments for life-threatening or chronic diseases.
