Addressing the issue of processing element under-utilization in general-purpose systolic deep learning accelerators

Deep learning techniques are increasingly used in the scientific community as a consequence of the high computational capacity of current systems and the growing amount of data available as a result of the digitalisation of society in general and the industrial world in particular. In addition, the emergence of edge computing, which focuses on integrating artificial intelligence as close as possible to the client, makes it possible to implement systems that act in real time without transferring all of the data to centralised servers. The combination of these two concepts can lead to systems with the capacity to make correct decisions and act on them immediately and in situ. However, the low capacity of embedded systems greatly hinders this integration, so being able to integrate deep learning into a wide range of micro-controllers can be a great advantage. This paper contributes an environment based on Mbed OS and TensorFlow Lite that can be embedded in any general-purpose embedded system, allowing the introduction of deep learning architectures. The experiments show that the proposed system is competitive with other commercial systems.

Region-of-Interest-Based Cardiac Image Segmentation with Deep Learning

Applied Sciences ◽

10.3390/app11041965 ◽

2021 ◽

Vol 11 (4) ◽

pp. 1965

Author(s):

Raul-Ronald Galea ◽

Laura Diosan ◽

Anca Andreica ◽

Loredana Popa ◽

Simona Manole ◽

...

Keyword(s):

Image Segmentation ◽

Deep Learning ◽

Diagnostic System ◽

Region Of Interest ◽

General Purpose ◽

Medical Image Segmentation ◽

Dice Similarity Coefficient ◽

Learning Methods ◽

Study Results ◽

Whole Heart

Despite the promising results obtained by deep learning methods in the field of medical image segmentation, a lack of sufficient data always hinders performance to a certain degree. In this work, we explore the feasibility of applying deep learning methods on a pilot dataset. We present a simple and practical approach to perform segmentation in a 2D, slice-by-slice manner, based on region of interest (ROI) localization, applying an optimized training regime to improve segmentation performance on regions of interest. We start from two popular segmentation networks: U-Net, the preferred model for medical segmentation, and DeepLabV3+, a general-purpose model. Furthermore, we show that ensembling these two fundamentally different architectures brings consistent benefits by testing our approach on two different datasets: the publicly available ACDC challenge and the imATFIB dataset from our in-house clinical study. Results on the imATFIB dataset show that the proposed approach performs well with the provided training volumes, achieving an average Dice Similarity Coefficient for the whole heart of 89.89% on the validation set. Moreover, our algorithm achieved a mean Dice value of 91.87% on the ACDC validation set, comparable to the second best-performing approach in the challenge. Our approach could serve as a building block of a computer-aided diagnostic system in a clinical setting.
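For readers unfamiliar with the metric reported above, the following is a minimal sketch of the Dice Similarity Coefficient on binary masks, together with one simple way two models' outputs could be ensembled. The abstract does not say how the U-Net and DeepLabV3+ predictions are combined, so the probability averaging shown here is an assumption, and the function names and toy values are hypothetical.

```python
def dice_coefficient(pred, target):
    """Dice = 2|A n B| / (|A| + |B|) for two binary masks given as flat 0/1 lists."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


def ensemble_mask(prob_a, prob_b, threshold=0.5):
    """Average two per-pixel foreground probabilities, then threshold.

    Assumption: simple averaging; the paper's actual ensembling rule is not given.
    """
    return [1 if (a + b) / 2.0 >= threshold else 0 for a, b in zip(prob_a, prob_b)]


# Toy example: four pixels, two hypothetical model outputs, one ground-truth mask.
probs_model_a = [0.9, 0.8, 0.3, 0.1]
probs_model_b = [0.7, 0.4, 0.6, 0.2]
ground_truth = [1, 1, 0, 0]

combined = ensemble_mask(probs_model_a, probs_model_b)
score = dice_coefficient(combined, ground_truth)
```

A Dice value of 1.0 means the predicted and reference masks overlap exactly, and 0.0 means they share no foreground pixels; the percentages quoted in the abstract are this value multiplied by 100.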

Unsupervised Deep Learning via Affinity Diffusion

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6757 ◽

2020 ◽

Vol 34 (07) ◽

pp. 11029-11036

Author(s):

Jiabo Huang ◽

Qi Dong ◽

Shaogang Gong ◽

Xiatian Zhu

Keyword(s):

Deep Learning ◽

State Of The Art ◽

General Purpose ◽

Training Data ◽

Learning Approach ◽

Model Learning ◽

Feature Representations ◽

Discriminative Feature ◽

Training Samples ◽

Unsupervised Deep Learning

Convolutional neural networks (CNNs) have achieved unprecedented success in a variety of computer vision tasks. However, they usually rely on supervised model learning that needs massive labelled training data, which dramatically limits their usability and deployability in real-world scenarios without any labelling budget. In this work, we introduce a general-purpose unsupervised deep learning approach to deriving discriminative feature representations. It is based on self-discovering semantically consistent groups of unlabelled training samples with the same class concepts through a progressive affinity diffusion process. Extensive experiments on object image classification and clustering show the superior performance of the proposed method over state-of-the-art unsupervised learning models on six common image recognition benchmarks: MNIST, SVHN, STL10, CIFAR10, CIFAR100 and ImageNet.

Revisiting Resource Management for Deep Learning Framework

Electronics ◽

10.3390/electronics8030327 ◽

2019 ◽

Vol 8 (3) ◽

pp. 327

Author(s):

Erci Xu ◽

Shanshan Li

Keyword(s):

Deep Learning ◽

Resource Management ◽

General Purpose ◽

Resource Assignment ◽

Data Intensive ◽

Learning Framework ◽

Network Layers ◽

Resource Managers ◽

Input Dataset ◽

Efficient Resource

The recent adoption of deep learning for diverse applications has required infrastructures to be scaled horizontally and hybrid-configured vertically. As a result, efficient resource management for distributed deep learning (DDL) frameworks is becoming increasingly important. However, existing techniques for scaling DDL applications rely on general-purpose resource managers originally designed for data-intensive applications. In contrast, DDL applications present unique challenges for resource management compared to traditional big data frameworks, such as a different master–slave communication paradigm, deeper ML models that are bound more by computation and network than by I/O, the use of heterogeneous resources (e.g., GPUs, TPUs) and variable memory requirements. In addition, most DDL frameworks require data scientists to manually configure the task placement and resource assignment to execute DDL models. In this paper, we present Dike, an automatic resource management framework that transparently makes scheduling decisions for placement and resource assignment to DDL workers and parameter servers, based on the unique characteristics of the DDL model (number and type of parameters and neural network layers), node heterogeneity (CPU/GPU ratios), and input dataset. We implemented Dike as a resource manager for DDL jobs in TensorFlow on top of Apache Mesos. We show that Dike significantly outperformed both manual and static assignment of resource offers to TensorFlow tasks, and achieved at least 95% of the optimal throughput for different DDL models such as ResNet and Inception.

A deep learning approach to automatic gingivitis screening based on classification and localization in RGB photos

Scientific Reports ◽

10.1038/s41598-021-96091-3 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Wen Li ◽

Yuan Liang ◽

Xuan Zhang ◽

Chao Liu ◽

Lei He ◽

...

Keyword(s):

Deep Learning ◽

Low Income ◽

Dental Health ◽

Area Under The Curve ◽

Cost Effective ◽

General Purpose ◽

Dental Calculus ◽

Task Learning ◽

Large Populations ◽

Dental Diseases

Routine dental visits are the most common way to detect gingivitis. However, such diagnosis can be unavailable in areas with limited medical resources and costly for low-income populations. This study proposes to screen for gingivitis and its irritants, i.e., dental calculus and soft deposits, from oral photos with a novel Multi-Task Learning convolutional neural network (CNN) model. The study can be meaningful for promoting public dental health, since it sheds light on a cost-effective and ubiquitous solution for the early detection of dental issues. With 625 patients included in this study, the classification Area Under the Curve (AUC) values for detecting gingivitis, dental calculus and soft deposits were 87.11%, 80.11%, and 78.57%, respectively. According to our experiments, the model can also localize the three types of findings on oral photos with moderate accuracy, which enables the model to explain the screening results. Compared with general-purpose CNNs, our model performed significantly better on both classification and localization tasks, which indicates the effectiveness of Multi-Task Learning for dental disease detection. In all, the study shows the potential of deep learning for enabling the screening of dental diseases among large populations.
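The AUC figures quoted above can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. A minimal sketch of that pairwise definition follows; the function name and the toy labels and scores are hypothetical and are not data from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as half a win)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: two positive and two negative cases with made-up classifier scores.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration; a rank-based computation gives the same value more efficiently on large datasets.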

Development of General-Purpose Processing Element and Network-Based Dataflow Processing System: Part Two

2009 Fourth International Conference on Frontier of Computer Science and Technology ◽

10.1109/fcst.2009.80 ◽

2009 ◽

Author(s):

Tatsuya Sakuma ◽

Masato Honda ◽

Tomohiro Sugahara ◽

Arata Shinozaki ◽

Takayuki Nakatomi ◽

...

Keyword(s):

Processing System ◽

General Purpose ◽

Processing Element ◽

System Part

C++ based General-purpose Open Source Deep Learning Framework, WICWIU

Journal of KIISE ◽

10.5626/jok.2019.46.3.253 ◽

2019 ◽

Vol 46 (3) ◽

pp. 253-259

Author(s):

Chunmyong Park ◽

Jeewoong Kim ◽

Yunho Kee ◽

Jihyeon Kim ◽

Seonggyeol Yoon ◽

...

Keyword(s):

Deep Learning ◽

Open Source ◽

General Purpose ◽

Learning Framework

A high performance general purpose processing element for avionic applications

Proceedings of the IEEE 1991 National Aerospace and Electronics Conference NAECON 1991 ◽

10.1109/naecon.1991.165738 ◽

2002 ◽

Author(s):

M.S. Russell ◽

J.C. Hansen ◽

L.J. Merboth

Keyword(s):

High Performance ◽

General Purpose ◽

Processing Element

A general purpose intelligent surveillance system for mobile devices using Deep Learning

2016 International Joint Conference on Neural Networks (IJCNN) ◽

10.1109/ijcnn.2016.7727563 ◽

2016 ◽

Cited By ~ 3

Author(s):

Antreas Antoniou ◽

Plamen Angelov

Keyword(s):

Deep Learning ◽

Mobile Devices ◽

Surveillance System ◽

General Purpose ◽

Intelligent Surveillance

Fast activation maximization for molecular sequence design

BMC Bioinformatics ◽

10.1186/s12859-021-04437-5 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Johannes Linder ◽

Georg Seelig

Keyword(s):

Deep Learning ◽

Molecular Design ◽

Protein Sequences ◽

Input Sequence ◽

General Purpose ◽

Design Tool ◽

Learning Models ◽

Molecular Sequence ◽

Regularization Techniques ◽

Fast Activation

Background: Optimization of DNA and protein sequences based on Machine Learning models is becoming a powerful tool for molecular design. Activation maximization offers a simple design strategy for differentiable models: one-hot coded sequences are first approximated by a continuous representation, which is then iteratively optimized with respect to the predictor oracle by gradient ascent. While elegant, the current version of the method suffers from vanishing gradients and may cause predictor pathologies leading to poor convergence. Results: Here, we introduce Fast SeqProp, an improved activation maximization method that combines straight-through approximation with normalization across the parameters of the input sequence distribution. Fast SeqProp overcomes bottlenecks in earlier methods arising from input parameters becoming skewed during optimization. Compared to prior methods, Fast SeqProp results in up to 100-fold faster convergence while also finding improved fitness optima for many applications. We demonstrate Fast SeqProp’s capabilities by designing DNA and protein sequences for six deep learning predictors, including a protein structure predictor. Conclusions: Fast SeqProp offers a reliable and efficient method for general-purpose sequence optimization through a differentiable fitness predictor. As demonstrated on a variety of deep learning models, the method is widely applicable, and can incorporate various regularization techniques to maintain confidence in the sequence designs. As a design tool, Fast SeqProp may aid in the development of novel molecules, drug therapies and vaccines.
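The basic activation maximization idea described in the background (relax one-hot letters to a continuous representation, then follow the predictor's gradient) can be sketched as below. This illustrates only the plain softmax relaxation, not Fast SeqProp itself: there is no straight-through approximation or normalization. The linear "predictor", its weights, and all names are hypothetical stand-ins for a real differentiable model.

```python
import math

ALPHABET = "ACGT"


def softmax(row):
    """Numerically stable softmax over one position's logits."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    z = sum(exps)
    return [e / z for e in exps]


def design_sequence(weights, steps=200, lr=1.0):
    """Gradient ascent on per-position logits for a toy linear predictor.

    weights[i][j] is the assumed fitness contribution of letter j at position i,
    so the relaxed score is sum_i sum_j softmax(logits[i])[j] * weights[i][j].
    """
    logits = [[0.0] * len(ALPHABET) for _ in weights]
    for _ in range(steps):
        for i, w_row in enumerate(weights):
            probs = softmax(logits[i])
            expected = sum(p * w for p, w in zip(probs, w_row))
            # d(score)/d(logit_ik) = p_ik * (w_ik - E_p[w_i]) for a softmax relaxation.
            for k in range(len(ALPHABET)):
                logits[i][k] += lr * probs[k] * (w_row[k] - expected)
    # Discretise by taking the most probable letter at each position.
    return "".join(
        ALPHABET[max(range(len(ALPHABET)), key=lambda k: row[k])] for row in logits
    )


# Toy predictor that prefers C, then A, then T at positions 0, 1 and 2.
toy_weights = [
    [0.1, 0.9, 0.2, 0.0],
    [1.0, 0.0, 0.3, 0.2],
    [0.0, 0.2, 0.1, 0.8],
]
designed = design_sequence(toy_weights)
```

Note that each gradient term is scaled by the softmax probability, so updates shrink as the distribution saturates; this is one way to see the vanishing-gradient issue the abstract attributes to the earlier method.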
