Review of Deep Learning Methods in Robotic Grasp Detection

2018 · Vol 2 (3) · pp. 57
Author(s): Shehan Caldera, Alexander Rassau, Douglas Chai

For robots to attain more general-purpose utility, grasping is a necessary skill to master. Such general-purpose robots may use their perception abilities to visually identify grasps for a given object. A grasp describes how a robotic end-effector can be arranged to securely grab an object and successfully lift it without slippage. Traditionally, grasp detection requires expert human knowledge to analytically form the task-specific algorithm, but this is an arduous and time-consuming approach. During the last five years, deep learning methods have enabled significant advancements in robotic vision, natural language processing, and automated driving applications. The successful results of these methods have driven robotics researchers to explore the use of deep learning methods in task-generalised robotic applications. This paper reviews the current state-of-the-art with regard to the application of deep learning methods to generalised robotic grasping and discusses how each element of the deep learning approach has improved the overall performance of robotic grasp detection. Several of the most promising approaches are evaluated, and the one-shot detection method is identified as the most suitable for real-time grasp detection. The availability of suitable volumes of appropriate training data is identified as a major obstacle to effective utilisation of deep learning approaches, and the use of transfer learning techniques is proposed as a potential mechanism to address this. Finally, current trends in the field and potential future research directions are discussed.
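
As a rough illustration of the one-shot approach highlighted above, the sketch below maps an RGB image to a single five-dimensional grasp rectangle (x, y, θ, w, h) in one forward pass. The backbone and output parameterisation are illustrative assumptions, not a specific model from the reviewed literature.

```python
# Hypothetical sketch of a one-shot grasp regressor: one forward pass
# maps an RGB image to a 5-D grasp rectangle (x, y, theta, w, h).
import torch
import torch.nn as nn
import torchvision.models as models

class OneShotGraspNet(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = models.resnet18(weights=None)  # any CNN backbone works
        backbone.fc = nn.Linear(backbone.fc.in_features, 5)
        self.net = backbone

    def forward(self, x):
        # Output: (x, y) grasp centre, theta orientation, and (w, h)
        # gripper opening and plate size, regressed directly from pixels.
        return self.net(x)

model = OneShotGraspNet()
image = torch.randn(1, 3, 224, 224)   # dummy RGB input
grasp = model(image)                  # one pass -> one grasp rectangle
print(grasp.shape)                    # torch.Size([1, 5])
```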


Electronics · 2021 · Vol 10 (12) · pp. 1372
Author(s): Sanjanasri JP, Vijay Krishna Menon, Soman KP, Rajendran S, Agnieszka Wolk

Linguists have long focused on qualitative comparisons of semantics across languages. Evaluating semantic interpretation between disparate language pairs such as English and Tamil is an even more formidable task than for Slavic languages. The concept of word embedding in Natural Language Processing (NLP) has opened up an opportunity to quantify linguistic semantics. Multi-lingual tasks can be performed by projecting the word embeddings of one language onto the semantic space of the other. This research presents a suite of data-efficient deep learning approaches to deduce the transfer function from the embedding space of English to that of Tamil, deploying three popular embedding algorithms: Word2Vec, GloVe, and FastText. A novel evaluation paradigm was devised to assess the effectiveness of the generated embeddings, using the original embeddings as ground truth. The transferability of the proposed model to other target languages was assessed via pre-trained Word2Vec embeddings for Hindi and Chinese. We empirically show that, with a bilingual dictionary of a thousand words and a correspondingly small monolingual target (Tamil) corpus, useful embeddings can be generated by transfer learning from a well-trained source (English) embedding. Furthermore, we demonstrate the usability of the generated target embeddings in several NLP use-case tasks, such as text summarization, part-of-speech (POS) tagging, and bilingual dictionary induction (BDI), bearing in mind that these are not the only possible applications.
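
The paper itself learns the transfer function with deep networks; as a minimal sketch of the underlying idea, the closed-form linear baseline below maps source-language embeddings onto the target space using a small bilingual dictionary. All data here are random stand-ins.

```python
# Minimal sketch (not the authors' exact model): learn a linear transfer
# function W mapping English embeddings onto the Tamil embedding space,
# fit on a small bilingual dictionary by least squares.
import numpy as np

rng = np.random.default_rng(0)
d_src, d_tgt, n_pairs = 300, 300, 1000           # dims and dictionary size
X = rng.normal(size=(n_pairs, d_src))            # English vectors (stand-in)
Y = rng.normal(size=(n_pairs, d_tgt))            # aligned Tamil vectors

# Solve min_W ||X W - Y||_F^2 in closed form.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

def transfer(src_vec):
    """Project a source-language embedding into the target space."""
    return src_vec @ W

print(transfer(X[0]).shape)  # (300,)
```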


2021 · Vol 11 (4) · pp. 1965
Author(s): Raul-Ronald Galea, Laura Diosan, Anca Andreica, Loredana Popa, Simona Manole, ...

Despite the promising results obtained by deep learning methods in the field of medical image segmentation, a lack of sufficient data always hinders performance to a certain degree. In this work, we explore the feasibility of applying deep learning methods to a pilot dataset. We present a simple and practical approach that performs segmentation in a 2D, slice-by-slice manner based on region of interest (ROI) localization, applying an optimized training regime to improve segmentation performance within the regions of interest. We start from two popular segmentation networks: U-Net, the preferred model for medical segmentation, and DeepLabV3+, a general-purpose model. Furthermore, we show that ensembling these two fundamentally different architectures brings consistent benefits, testing our approach on two different datasets: the publicly available ACDC challenge and the imATFIB dataset from our in-house clinical study. Results on the imATFIB dataset show that the proposed approach performs well with the provided training volumes, achieving an average whole-heart Dice Similarity Coefficient of 89.89% on the validation set. Moreover, our algorithm achieved a mean Dice value of 91.87% on the ACDC validation set, comparable to the second-best-performing approach in the challenge. Our approach could serve as a building block of a computer-aided diagnostic system in a clinical setting.
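
A minimal sketch of the ensembling step, assuming the common recipe of averaging per-pixel class probabilities; `unet` and `deeplab` stand in for the two trained networks and are illustrative names only.

```python
# Hedged sketch: ensemble two segmentation networks by averaging their
# per-pixel class probabilities, then take the argmax as the final mask.
import torch

@torch.no_grad()
def ensemble_predict(unet, deeplab, image):
    p1 = torch.softmax(unet(image), dim=1)       # (B, C, H, W)
    p2 = torch.softmax(deeplab(image), dim=1)    # (B, C, H, W)
    probs = (p1 + p2) / 2                        # average the two experts
    return probs.argmax(dim=1)                   # per-pixel class labels

def dice(pred, target, eps=1e-6):
    """Dice Similarity Coefficient between two binary masks."""
    inter = (pred & target).sum()
    return (2 * inter + eps) / (pred.sum() + target.sum() + eps)
```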


Cancers · 2021 · Vol 13 (11) · pp. 2764
Author(s): Xin Yu Liew, Nazia Hameed, Jeremie Clos

A computer-aided diagnosis (CAD) expert system is a powerful tool for efficiently assisting a pathologist in achieving an early diagnosis of breast cancer. This process identifies the presence of cancer in breast tissue samples and the distinct stage of the cancer. In a standard CAD system, the main pipeline involves image pre-processing, segmentation, feature extraction, feature selection, classification, and performance evaluation. In this review paper, we survey the existing state-of-the-art machine learning approaches applied at each stage, covering both conventional and deep learning methods, compare the methods against one another, and provide technical details together with their advantages and disadvantages. The aims are to investigate the impact of CAD systems using histopathology images, to identify deep learning methods that outperform conventional ones, and to provide a summary that future researchers can use to analyse and improve the existing techniques. Lastly, we discuss the research gaps in existing machine learning approaches and propose guidelines on future directions for upcoming researchers.
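
To make the standard pipeline concrete, here is a hedged sketch of the conventional stages (feature extraction, feature selection, classification); the specific features (HOG) and classifier (SVM) are illustrative choices, not prescriptions from the reviewed papers.

```python
# Illustrative sketch of a conventional CAD pipeline for histopathology
# images: hand-crafted features -> feature selection -> classification.
import numpy as np
from skimage.feature import hog
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

def extract_features(images):
    # Pre-processing + hand-crafted HOG features per grayscale image.
    return np.array([hog(img, pixels_per_cell=(16, 16)) for img in images])

clf = Pipeline([
    ("select", SelectKBest(f_classif, k=200)),   # feature selection
    ("svm", SVC(kernel="rbf")),                  # classification
])

# Usage (with real data): X = extract_features(tissue_images)
#                         clf.fit(X, labels); clf.predict(X_test)
```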


Entropy · 2021 · Vol 23 (6) · pp. 667
Author(s): Wei Chen, Qiang Sun, Xiaomin Chen, Gangcai Xie, Huiqun Wu, ...

The automated classification of heart sounds plays a significant role in the diagnosis of cardiovascular diseases (CVDs). With the recent introduction of medical big data and artificial intelligence technology, there has been an increased focus on developing deep learning approaches for heart sound classification. However, despite significant achievements in this field, limitations remain due to insufficient data, inefficient training, and the unavailability of effective models. With the aim of improving the accuracy of heart sound classification, an in-depth systematic review and analysis of existing deep learning methods were performed in the present study, with an emphasis on the convolutional neural network (CNN) and recurrent neural network (RNN) methods developed over the last five years. This paper also discusses the challenges and expected future trends in the application of deep learning to heart sound classification, with the objective of providing an essential reference for further study.
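
A common CNN recipe in this literature is to classify log-mel spectrograms of heart sound recordings; the sketch below is a minimal example of that pattern, with an architecture chosen for illustration rather than taken from any surveyed paper.

```python
# Sketch under assumptions: classify log-mel spectrograms of heart sound
# recordings with a small CNN (normal vs abnormal).
import librosa
import torch
import torch.nn as nn

def to_logmel(path, sr=2000, n_mels=64):
    """Load a recording and convert it to a log-mel spectrogram."""
    y, _ = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel)

cnn = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
    nn.Flatten(), nn.Linear(32, 2),   # two classes: normal / abnormal
)

spec = torch.randn(1, 1, 64, 128)     # dummy log-mel batch
print(cnn(spec).shape)                # torch.Size([1, 2])
```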


Electronics · 2020 · Vol 9 (11) · pp. 1757
Author(s): María J. Gómez-Silva, Arturo de la Escalera, José M. Armingol

Recognizing the identity of a query individual in a surveillance sequence is the core of Multi-Object Tracking (MOT) and Re-Identification (Re-Id) algorithms. Both tasks can be addressed by measuring the appearance affinity between observations of people with a deep neural model. Nevertheless, the differences in their specifications and, consequently, in the characteristics and constraints of the training data available for each task make it necessary to employ a different learning approach for each of them. This article offers a comparative view of the Double-Margin-Contrastive and Triplet loss functions and analyzes the benefits and drawbacks of applying each of them to learn an Appearance Affinity model for Tracking and Re-Identification. A series of experiments was conducted, and the results support the conclusion of the presented study: the Triplet loss function is more effective than the Contrastive one when a Re-Id model is learnt; conversely, in the MOT domain, the Contrastive loss better discriminates between pairs of images that do or do not render the same person.
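
For reference, a hedged sketch of the two losses under comparison. The double-margin formulation below (separate margins for positive and negative pairs) is one common reading of the Double-Margin-Contrastive loss; the exact margins and distance metric are assumptions.

```python
# Illustrative implementations of the two compared losses.
import torch
import torch.nn.functional as F

def double_margin_contrastive(d, same, m_pos=0.5, m_neg=1.5):
    # d: pairwise embedding distances; same: 1 if same identity, else 0.
    # Positive pairs are pushed inside m_pos, negatives outside m_neg.
    pos = same * F.relu(d - m_pos) ** 2
    neg = (1 - same) * F.relu(m_neg - d) ** 2
    return (pos + neg).mean()

def triplet(anchor, positive, negative, margin=0.3):
    # Enforce a relative margin: anchor-positive closer than
    # anchor-negative by at least `margin`.
    d_ap = F.pairwise_distance(anchor, positive)
    d_an = F.pairwise_distance(anchor, negative)
    return F.relu(d_ap - d_an + margin).mean()
```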


2021 · Vol 2021 · pp. 1-16
Author(s): Sunil Kumar Prabhakar, Dong-Ok Won

To unlock the information present in clinical descriptions, automatic medical text classification is highly useful in the arena of natural language processing (NLP). For medical text classification tasks, machine learning techniques can be quite effective; however, they require extensive human effort to create the labeled training data. For clinical and translational research, a huge quantity of detailed patient information, such as disease status, lab tests, medication history, side effects, and treatment outcomes, has been collected in electronic format, and it serves as a valuable data source for further analysis; processing this volume of text efficiently is a considerable challenge. In this work, a medical text classification paradigm using two novel deep learning architectures is proposed to mitigate the human effort. The first is a quad-channel hybrid long short-term memory (QC-LSTM) deep learning model built on four channels; the second is a hybrid bidirectional gated recurrent unit (BiGRU) deep learning model with multihead attention. The proposed methodology is validated on two medical text datasets, and a comprehensive analysis is conducted. The best classification accuracy, 96.72%, is obtained with the proposed QC-LSTM model, while the proposed hybrid BiGRU model achieves 95.76%.
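
A minimal sketch of the second architecture's general shape, assuming a standard BiGRU encoder refined by multi-head self-attention; layer sizes and pooling are illustrative, not the authors' exact configuration.

```python
# Hedged sketch: bidirectional GRU encoder + multi-head self-attention,
# mean-pooled into a document-level classification.
import torch
import torch.nn as nn

class BiGRUAttnClassifier(nn.Module):
    def __init__(self, vocab=30000, emb=128, hidden=128, classes=2, heads=4):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.gru = nn.GRU(emb, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.MultiheadAttention(2 * hidden, heads, batch_first=True)
        self.fc = nn.Linear(2 * hidden, classes)

    def forward(self, tokens):
        h, _ = self.gru(self.emb(tokens))        # (B, T, 2*hidden)
        a, _ = self.attn(h, h, h)                # multi-head self-attention
        return self.fc(a.mean(dim=1))            # mean-pool then classify

model = BiGRUAttnClassifier()
print(model(torch.randint(0, 30000, (2, 64))).shape)  # torch.Size([2, 2])
```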


2020 · Vol 34 (07) · pp. 11029-11036
Author(s): Jiabo Huang, Qi Dong, Shaogang Gong, Xiatian Zhu

Convolutional neural networks (CNNs) have achieved unprecedented success in a variety of computer vision tasks. However, they usually rely on supervised model learning and its need for massive labelled training data, dramatically limiting their usability and deployability in real-world scenarios without any labelling budget. In this work, we introduce a general-purpose unsupervised deep learning approach to deriving discriminative feature representations. It is based on self-discovering semantically consistent groups of unlabelled training samples that share the same class concepts through a progressive affinity diffusion process. Extensive experiments on object image classification and clustering show the performance superiority of the proposed method over state-of-the-art unsupervised learning models on six common image recognition benchmarks, including MNIST, SVHN, STL10, CIFAR10, CIFAR100, and ImageNet.
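
As a loose sketch of the core idea (not the paper's exact algorithm), the snippet below propagates pairwise affinities through the sample graph, so that samples linked by chains of strong neighbours accumulate high mutual affinity and can be grouped together.

```python
# Toy affinity-diffusion sketch on random unlabelled features.
import numpy as np

def diffuse_affinity(A, alpha=0.5, steps=3):
    """Progressively diffuse a row-normalised affinity matrix A."""
    P = A / A.sum(axis=1, keepdims=True)         # transition probabilities
    D = P.copy()
    for _ in range(steps):
        D = alpha * (D @ P) + (1 - alpha) * P    # spread affinity one hop
    return D

feats = np.random.randn(100, 32)                 # unlabelled features
feats /= np.linalg.norm(feats, axis=1, keepdims=True)
A = np.exp(feats @ feats.T)                      # pairwise affinities
D = diffuse_affinity(A)                          # diffused affinities
```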


2021 · Vol 13 (19) · pp. 3859
Author(s): Joby M. Prince Czarnecki, Sathishkumar Samiappan, Meilun Zhou, Cary Daniel McCraine, Louis L. Wasson

The radiometric quality of remotely sensed imagery is crucial for precision agriculture applications because estimates of plant health rely on the underlying quality. Sky conditions, and specifically shadowing from clouds, are critical determinants of the quality of images that can be obtained from low-altitude sensing platforms. In this work, we first compare common deep learning approaches for classifying sky conditions with regard to cloud shadows in agricultural fields using a visible-spectrum camera. We then develop an artificial-intelligence-based edge computing system to fully automate the classification process. Training data consisting of 100 oblique-angle images of the sky were provided to a convolutional neural network and two deep residual neural networks (ResNet18 and ResNet34) to facilitate learning two classes, namely (1) good image quality expected, and (2) degraded image quality expected. The expectation of quality stemmed from the sky condition (i.e., density, coverage, and thickness of clouds) present at the time of image capture. These networks were tested on a set of 13,000 images. Our results demonstrate that the ResNet18 and ResNet34 classifiers produced better classification accuracy than the convolutional neural network classifier. The best overall accuracy, 92%, was obtained by ResNet34, with a Kappa statistic of 0.77. These results demonstrate a low-cost solution to quality control for future autonomous farming systems that will operate without human intervention and supervision.
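
A hedged sketch of the reported setup: fine-tuning a torchvision ResNet34 for the two sky-condition classes. The freezing strategy and pretrained weights are assumptions about the training recipe, which the abstract does not fully specify.

```python
# Fine-tune a ResNet34 for two classes: good vs degraded expected quality.
import torch.nn as nn
import torchvision.models as models

model = models.resnet34(weights=models.ResNet34_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 2)   # two sky-condition classes

# One plausible edge-friendly recipe: freeze the backbone, train the head.
for p in model.parameters():
    p.requires_grad = False
for p in model.fc.parameters():
    p.requires_grad = True
```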

