LiDAR DNN based self-attitude estimation with learning landscape regularities

Abstract: This paper presents an EKF (extended Kalman filter) based self-attitude estimation method that uses a LiDAR DNN (deep neural network) trained to learn landscape regularities. The proposed DNN infers the gravity direction from LiDAR data. The point cloud obtained with the LiDAR is transformed into a depth image, which is input to the network. The network is pre-trained with large synthetic datasets. These are collected in a flight simulator because various gravity vectors can be obtained easily there, although this study is not limited to UAVs. After pre-training, the network is fine-tuned with datasets collected with real sensors. Data augmentation is applied during training to improve generalization. The proposed method integrates angular rates from a gyroscope and the DNN outputs in an EKF. Static validations are performed to show that the DNN can infer the gravity direction. Dynamic validations are performed to show that the DNN can be used in real-time estimation. Some conventional methods are implemented for comparison.
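
The point-cloud-to-depth-image step mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name, the 4 x 8 image resolution, and the vertical field of view are all assumptions chosen for readability, and the abstract does not specify the projection that was actually used.

```python
import math


def point_cloud_to_depth_image(points, width=8, height=4,
                               v_fov=(-math.pi / 6, math.pi / 6)):
    """Project (x, y, z) LiDAR points onto a height x width range image.

    Each pixel stores the distance of the nearest point that falls into it;
    pixels with no return stay at 0.0. Resolution and vertical field of view
    are illustrative assumptions, not values from the paper.
    """
    v_min, v_max = v_fov
    image = [[0.0] * width for _ in range(height)]
    for x, y, z in points:
        depth = math.sqrt(x * x + y * y + z * z)
        if depth == 0.0:
            continue  # skip degenerate points at the sensor origin
        azimuth = math.atan2(y, x)        # horizontal angle in [-pi, pi]
        elevation = math.asin(z / depth)  # vertical angle in [-pi/2, pi/2]
        if not (v_min <= elevation <= v_max):
            continue  # outside the assumed vertical field of view
        col = int((azimuth + math.pi) / (2 * math.pi) * width) % width
        row = int((v_max - elevation) / (v_max - v_min) * height)
        row = min(row, height - 1)
        # Keep the closest return when several points share a pixel.
        if image[row][col] == 0.0 or depth < image[row][col]:
            image[row][col] = depth
    return image
```

A spherical projection like this keeps the image size fixed regardless of how many points the sensor returns, which is one reason a depth image is a convenient network input.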


Deep Learning-Based Differentiation between Mucinous Cystic Neoplasm and Serous Cystic Neoplasm in the Pancreas Using Endoscopic Ultrasonography

Diagnostics ◽

10.3390/diagnostics11061052 ◽

2021 ◽

Vol 11 (6) ◽

pp. 1052

Author(s):

Leang Sim Nguon ◽

Kangwon Seo ◽

Jung-Hyun Lim ◽

Tae-Jun Song ◽

Sung-Hyun Cho ◽

...

Keyword(s):

Decision Making ◽

Deep Learning ◽

Network Model ◽

Endoscopic Ultrasonography ◽

Data Augmentation ◽

Clinical Information ◽

Training Data ◽

Fine Tuning ◽

Cystic Neoplasm ◽

Cystic Neoplasms

Mucinous cystic neoplasms (MCN) and serous cystic neoplasms (SCN) account for a large portion of solitary pancreatic cystic neoplasms (PCN). In this study we implemented a convolutional neural network (CNN) model using ResNet50 to differentiate between MCN and SCN. The training data were collected retrospectively from 59 MCN and 49 SCN patients from two different hospitals. Data augmentation was used to enhance the size and quality of training datasets. Fine-tuning training approaches were utilized by adopting the pre-trained model from transfer learning while training selected layers. Testing of the network was conducted by varying the endoscopic ultrasonography (EUS) image sizes and positions to evaluate the network performance for differentiation. The proposed network model achieved up to 82.75% accuracy and a 0.88 (95% CI: 0.817–0.930) area under curve (AUC) score. The performance of the implemented deep learning networks in decision-making using only EUS images is comparable to that of traditional manual decision-making using EUS images along with supporting clinical information. Gradient-weighted class activation mapping (Grad-CAM) confirmed that the network model learned the features from the cyst region accurately. This study proves the feasibility of diagnosing MCN and SCN using a deep learning network model. Further improvement using more datasets is needed.
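
For readers unfamiliar with the two reported metrics, accuracy and AUC can be computed from classifier scores roughly as below. This is a generic sketch with hypothetical function names; it is not the evaluation code of the study, and the 0.5 decision threshold is an assumption.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive sample outranks a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples whose thresholded score matches the label."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for l, p in zip(labels, predictions) if l == p)
    return correct / len(labels)
```

Unlike accuracy, the AUC does not depend on a particular threshold, which is why the two numbers are usually reported together.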


OmniPD: One-Step Person Detection in Top-View Omnidirectional Indoor Scenes

Current Directions in Biomedical Engineering ◽

10.1515/cdbme-2019-0061 ◽

2019 ◽

Vol 5 (1) ◽

pp. 239-244

Author(s):

Jingrui Yu ◽

Roman Seidel ◽

Gangolf Hirtz

Keyword(s):

Data Augmentation ◽

Training Data ◽

Fine Tuning ◽

Single Shot ◽

Augmentation Techniques ◽

Indoor Scenes ◽

One Step ◽

Bounding Boxes ◽

Omnidirectional Images ◽

Fine Tune

Abstract: We propose a one-step person detector for top-view omnidirectional indoor scenes based on convolutional neural networks (CNNs). While state-of-the-art person detectors reach competitive results on perspective images, the lack of CNN architectures and of training data that follow the distortion of omnidirectional images makes current approaches inapplicable to our data. The method predicts bounding boxes of multiple persons directly in omnidirectional images without perspective transformation, which reduces the overhead of pre- and post-processing and enables real-time performance. The basic idea is to use transfer learning to fine-tune CNNs trained on perspective images, with data augmentation techniques, for detection in omnidirectional images. We fine-tune two variants of Single Shot MultiBox detectors (SSDs). The first uses MobileNet v1 FPN as the feature extractor (moSSD). The second uses ResNet50 v1 FPN (resSSD). Both models are pre-trained on the Microsoft Common Objects in Context (COCO) dataset. We fine-tune both models on the PASCAL VOC07 and VOC12 datasets, specifically on the class person. Random 90-degree rotation and random vertical flipping are used for data augmentation in addition to the methods proposed by the original SSD. We reach an average precision (AP) of 67.3% with moSSD and 74.9% with resSSD on the evaluation dataset. To enhance the fine-tuning process, we add a subset of the HDA Person dataset and a subset of the PIROPO database and reduce the number of perspective images to PASCAL VOC07. The AP rises to 83.2% for moSSD and 86.3% for resSSD. The average inference speed is 28 ms per image for moSSD and 38 ms per image for resSSD using an Nvidia Quadro P6000. Our method is applicable to other CNN-based object detectors and can potentially generalize to detecting other objects in omnidirectional images.
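
The random 90-degree rotation and vertical flipping described above must transform the bounding boxes together with the image. The sketch below shows one way to do that in plain Python; the function names, the list-of-rows image representation, and the (x_min, y_min, x_max, y_max) box convention are assumptions made for illustration, not details taken from the paper.

```python
import random


def rotate90_cw(image, boxes):
    """Rotate an image (list of rows) 90 degrees clockwise with its boxes.

    Boxes are (x_min, y_min, x_max, y_max) in pixel-edge coordinates.
    """
    height = len(image)
    rotated = [list(row) for row in zip(*image[::-1])]
    new_boxes = [(height - y1, x0, height - y0, x1) for x0, y0, x1, y1 in boxes]
    return rotated, new_boxes


def flip_vertical(image, boxes):
    """Flip an image top to bottom and mirror the boxes accordingly."""
    height = len(image)
    flipped = [list(row) for row in image[::-1]]
    new_boxes = [(x0, height - y1, x1, height - y0) for x0, y0, x1, y1 in boxes]
    return flipped, new_boxes


def augment(image, boxes, rng=random):
    """Apply 0 to 3 clockwise quarter turns, then a vertical flip half the time."""
    for _ in range(rng.randrange(4)):
        image, boxes = rotate90_cw(image, boxes)
    if rng.random() < 0.5:
        image, boxes = flip_vertical(image, boxes)
    return image, boxes
```

Quarter turns and flips are a natural fit for top-view omnidirectional images because a person can appear at any orientation around the image center.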


Transfer learning using AlexNet Convolutional Neural Network for Face Recognition

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.k7776.0991120 ◽

2020 ◽

Vol 9 (11) ◽

pp. 285-294

Keyword(s):

Neural Network ◽

Face Recognition ◽

Transfer Learning ◽

Data Augmentation ◽

Recognition System ◽

Training Data ◽

Fine Tuning ◽

Data Sets ◽

Learning Method ◽

Face Recognition System

This research aims to achieve high accuracy in a face recognition system. The convolutional neural network (CNN) is one of the deep learning approaches and has demonstrated excellent performance in many fields, including image recognition with large amounts of training data (such as ImageNet). In practice, hardware limitations and insufficient training datasets are obstacles to achieving high performance. Therefore, in this work a deep transfer learning method using the pre-trained AlexNet CNN is proposed to improve the performance of the face recognition system even with a small number of images. The transfer learning method is used to fine-tune the last layer of the AlexNet CNN model for new classification tasks. A data augmentation (DA) technique is also proposed to reduce over-fitting during deep transfer learning training and to improve accuracy. The results showed reduced over-fitting and improved performance after applying the data augmentation technique. All experiments were conducted on the small UTeMFD, GTFD, and CASIA-Face V5 datasets. The proposed system achieved accuracy as high as 100% on UTeMFD, 96.67% on GTFD, and 95.60% on CASIA-Face V5, with a recognition time of less than 0.05 seconds.


On Improving an Already Competitive Segmentation Algorithm for the Cell Tracking Challenge - Lessons Learned

10.1101/2021.06.26.450019 ◽

2021 ◽

Author(s):

Tim Scherr ◽

Katharina Loeffler ◽

Oliver Neumann ◽

Ralf Mikut

Keyword(s):

Cell Tracking ◽

Data Augmentation ◽

Signal To Noise Ratio ◽

Data Representation ◽

Lessons Learned ◽

Training Data ◽

Fine Tuning ◽

Data Sets ◽

Cell Nuclei ◽

Distance Map

The virtually error-free segmentation and tracking of densely packed cells and cell nuclei is still a challenging task. Especially in low-resolution and low signal-to-noise-ratio microscopy images, erroneously merged and missing cells are common segmentation errors that make the subsequent cell tracking even more difficult. In 2020, we successfully participated as team KIT-Sch-GE (1) in the 5th edition of the ISBI Cell Tracking Challenge. With our deep learning-based distance map regression segmentation and our graph-based cell tracking, we achieved multiple top 3 rankings on the diverse data sets. In this manuscript, we show how our approach can be further improved by using another optimizer and by fine-tuning training data augmentation parameters, learning rate schedules, and the training data representation. The fine-tuned segmentation, in combination with an improved tracking, enabled us to further improve our performance in the 6th edition of the Cell Tracking Challenge 2021 as team KIT-Sch-GE (2).


Robust Wireless Communication for Small Exploration Rovers Equipped with Multiple Antennas by Estimating Attitudes of Rovers in Several Experimental Environments

Journal of Robotics and Mechatronics ◽

10.20965/jrm.2017.p0864 ◽

2017 ◽

Vol 29 (5) ◽

pp. 864-876 ◽

Cited By ~ 1

Author(s):

Masahiko Mikawa ◽

Keyword(s):

Estimation Method ◽

Attitude Estimation ◽

The Body ◽

Training Data ◽

Large Area ◽

Data Set ◽

Wireless Mesh ◽

Surface Exploration ◽

The Face ◽

Micro Gravity

We are developing a robotic system for asteroid surface exploration. The system consists of multiple small rovers that communicate with each other over a wireless network. Since the rovers form a wireless mesh sensor network on an asteroid, it is possible to explore a large area of the asteroid effectively. The rovers will be equipped with a hopping mechanism for transportation, which is suitable for exploration in a micro-gravity environment such as the surface of a small asteroid. However, it is difficult to control the rover's attitude during landing. Therefore, a cube-shaped rover was designed. As every face has two antennas, the rover has a total of twelve antennas. Furthermore, as the body shape and the antenna arrangement are symmetric, a reliable communication state among the rovers can be established, irrespective of which face is on top, by selecting the proper antennas on the top face. Therefore, it is important to estimate which face of the rover is on top. This paper presents an attitude estimation method based on the received signal strength indicators (RSSIs) obtained when the twelve antennas communicate with each other. Since the RSSI values change depending on the attitude of the rover and the surrounding environment, a large number of RSSIs were collected as a training data set in different kinds of environments similar to an asteroid; a classifier for estimating the rover attitude was then trained from this data set. Experimental results establish the validity and effectiveness of the proposed exploration system and attitude estimation method.
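
To make the idea of classifying the top face from RSSI measurements concrete, here is a minimal nearest-centroid sketch. The abstract does not say which classifier was used, so this choice, the function names, and the three-element toy vectors in the usage note are all assumptions for illustration only.

```python
def train_centroids(samples):
    """Average the RSSI vectors of each face label.

    samples: list of (rssi_vector, face_label) pairs.
    Returns a dict mapping each label to its mean RSSI vector.
    """
    sums, counts = {}, {}
    for vec, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def predict_face(centroids, vec):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], vec))
```

With made-up training vectors such as `([-40.0, -70.0, -70.0], "face_1")` and `([-70.0, -40.0, -70.0], "face_2")`, a new measurement is assigned to whichever face has the most similar average signal pattern. A real system would use one RSSI value per antenna pair and many samples per environment.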


Effect of data-augmentation on fine-tuned CNN model performance

IAES International Journal of Artificial Intelligence (IJ-AI) ◽

10.11591/ijai.v10.i1.pp84-92 ◽

2021 ◽

Vol 10 (1) ◽

pp. 84

Author(s):

Ramaprasad Poojary ◽

Roma Raina ◽

Amit Kumar Mondal

Keyword(s):

Neural Network ◽

Computer Vision ◽

Deep Learning ◽

High Performance ◽

Data Augmentation ◽

Model Performance ◽

Training Data ◽

Fine Tuning ◽

Test Accuracy ◽

Training Time

During the last few years, deep learning has achieved remarkable results in the field of machine learning when used for computer vision tasks. Among its many architectures, convolutional neural networks are now widely used for image detection and classification. Although they are a great tool for computer vision tasks, they demand a large amount of training data to yield high performance. In this paper, a data augmentation method is proposed to overcome the challenges caused by a lack of sufficient training data. To analyze the effect of data augmentation, the proposed method uses two convolutional neural network architectures. To minimize the training time without compromising accuracy, the models are built by fine-tuning the pre-trained networks VGG16 and ResNet50. Loss functions and accuracies are used to evaluate the performance of the models. The proposed models are constructed using the Keras deep learning framework and are trained on a custom dataset created from the Kaggle CAT vs DOG database. Experimental results showed that both models achieved better test accuracy when data augmentation was employed, and that the model constructed using ResNet50 outperformed the VGG16-based model, with a test accuracy of 90% with data augmentation and 82% without data augmentation.


Named Entity Recognition in Chinese Medical Literature Using Pretraining Models

Scientific Programming ◽

10.1155/2020/8812754 ◽

2020 ◽

Vol 2020 ◽

pp. 1-9

Author(s):

Yu Wang ◽

Yining Sun ◽

Zuchang Ma ◽

Lisheng Gao ◽

Yang Xu

Keyword(s):

Large Scale ◽

Data Augmentation ◽

Medical Literature ◽

Named Entity Recognition ◽

Semantic Knowledge ◽

Training Data ◽

Fine Tuning ◽

Entity Recognition ◽

Small Scale ◽

Named Entity

The medical literature contains valuable knowledge, such as the clinical symptoms, diagnosis, and treatments of a particular disease. Named Entity Recognition (NER) is the initial step in extracting this knowledge from unstructured text and presenting it as a Knowledge Graph (KG). However, previous NER approaches have often suffered from small-scale human-labelled training data. Furthermore, extracting knowledge from Chinese medical literature is a more complex task because there is no segmentation between Chinese characters. Recently, pretraining models, which obtain representations with prior semantic knowledge from large-scale unlabelled corpora, have achieved state-of-the-art results for a wide variety of Natural Language Processing (NLP) tasks. However, the capabilities of pretraining models have not been fully exploited, and applications of pretraining models other than BERT in specific domains, such as NER in Chinese medical literature, are also of interest. In this paper, we enhance the performance of NER in Chinese medical literature using pretraining models. First, we propose a method of data augmentation that replaces words in the training set with synonyms through the Mask Language Model (MLM), which is a pretraining task. Then, we consider NER as the downstream task of the pretraining model and transfer the prior semantic knowledge obtained during pretraining to it. Finally, we conduct experiments to compare the performance of six pretraining models (BERT, BERT-WWM, BERT-WWM-EXT, ERNIE, ERNIE-tiny, and RoBERTa) in recognizing named entities from Chinese medical literature. The effects of feature extraction and fine-tuning, as well as different downstream model structures, are also explored. Experimental results demonstrate that the proposed data augmentation method yields meaningful improvements in recognition performance. In addition, RoBERTa-CRF achieves the highest F1-score compared with the previous methods and other pretraining models.


Large-Scale Printed Chinese Character Recognition for ID Cards Using Deep Learning and Few Samples Transfer Learning

10.20944/preprints202112.0376.v1 ◽

2021 ◽

Author(s):

Yi-Quan Li ◽

Hao-Sen Chang ◽

Daw-Tung Lin

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Transfer Learning ◽

Character Recognition ◽

Large Scale ◽

Data Augmentation ◽

Training Data ◽

Fine Tuning ◽

Training Dataset ◽

Model Parameters

In the field of computer vision, large-scale image classification tasks are both important and highly challenging. With the ongoing advances in deep learning and optical character recognition (OCR) technologies, neural networks designed to perform large-scale classification play an essential role in facilitating OCR systems. In this study, we developed an automatic OCR system designed to identify up to 13,070 printed Chinese characters by using deep learning neural networks and fine-tuning techniques. The proposed framework comprises four components: training dataset synthesis and background simulation, image preprocessing and data augmentation, model training, and transfer learning. The training data synthesis procedure is composed of a character font generation step and a background simulation process. Three background models are proposed to simulate the background noise and anti-counterfeiting patterns on ID cards. To expand the diversity of the synthesized training dataset, rotation and zooming data augmentation are applied. A massive dataset comprising more than 19.6 million images was thus created to accommodate the variations in the input images and improve the learning capacity of the CNN model. Subsequently, we modified the GoogLeNet neural architecture by replacing the FC layer with a global average pooling layer to avoid overfitting caused by a massive amount of training data. Consequently, the number of model parameters was reduced. Finally, we employed the transfer learning technique to further refine the CNN model using a small number of real data samples. Experimental results show that the overall recognition performance of the proposed approach is significantly better than that of prior methods and thus demonstrate the effectiveness of the proposed framework, which exhibited a recognition accuracy as high as 99.39% on the constructed real ID card dataset.
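
The link between global average pooling and a smaller parameter count can be illustrated with a short sketch. The code below is a generic example under stated assumptions: the function names are hypothetical, and the exact layer sizes of the modified GoogLeNet are not given in the abstract.

```python
def global_average_pool(feature_maps):
    """Reduce each channel's H x W map to its mean value.

    feature_maps: list of channels, each a list of rows.
    The operation itself has no trainable weights.
    """
    pooled = []
    for channel in feature_maps:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def dense_param_count(n_inputs, n_outputs):
    """Number of weights plus biases in a fully connected layer."""
    return n_inputs * n_outputs + n_outputs
```

As a purely hypothetical comparison, classifying 13,070 characters from a flattened 1024 x 7 x 7 feature volume would need `dense_param_count(1024 * 7 * 7, 13070)` parameters, whereas pooling each channel to a single value first leaves only `dense_param_count(1024, 13070)`, about 49 times fewer weights. The 1024 x 7 x 7 shape is an assumed example, not a figure from the study.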


An Improved Travel Time Estimation Method without Signal Timing Data Based on an Individual Probe Car

CICTP 2016 ◽

10.1061/9780784479896.020 ◽

2016 ◽

Author(s):

Huibing Li ◽

Zhijie Bao ◽

Geng Yang ◽

Chao Peng ◽

Lihua Luo

Keyword(s):

Travel Time ◽

Estimation Method ◽

Time Estimation ◽

Signal Timing ◽

Travel Time Estimation ◽

Individual Probe ◽

Timing Data ◽

Probe Car


Building Damage Detection from Post-Event Aerial Imagery Using Single Shot Multibox Detector

Applied Sciences ◽

10.3390/app9061128 ◽

2019 ◽

Vol 9 (6) ◽

pp. 1128 ◽

Cited By ~ 12

Author(s):

Yundong Li ◽

Wei Hu ◽

Han Dong ◽

Xueyan Zhang

Keyword(s):

Machine Learning ◽

Data Augmentation ◽

Hurricane Sandy ◽

Training Data ◽

Aerial Images ◽

Detection Methods ◽

Single Shot ◽

Data Set ◽

Augmentation Strategies ◽

Post Disaster

Aerial cameras, satellite remote sensing, or unmanned aerial vehicles (UAVs) equipped with cameras can facilitate search and rescue tasks after disasters. The traditional manual interpretation of huge aerial images is inefficient and could be replaced by machine learning-based methods combined with image processing techniques. Given the development of machine learning, researchers find that convolutional neural networks can effectively extract features from images. Some target detection methods based on deep learning, such as the single-shot multibox detector (SSD) algorithm, can achieve better results than traditional methods. However, the impressive performance of machine learning-based methods depends on large numbers of labeled samples. Given the complexity of post-disaster scenarios, obtaining many samples in the aftermath of disasters is difficult. To address this issue, the current study proposes a damaged building assessment method using SSD with pretraining and data augmentation, and highlights the following aspects. (1) Objects can be detected and classified into undamaged buildings, damaged buildings, and ruins. (2) A convolutional auto-encoder (CAE) based on VGG16 is constructed and trained using unlabeled post-disaster images. As a transfer learning strategy, the weights of the SSD model are initialized using the weights of the CAE counterpart. (3) Data augmentation strategies, such as image mirroring, rotation, Gaussian blur, and Gaussian noise processing, are used to augment the training data set. As a case study, aerial images of Hurricane Sandy in 2012 were used to validate the proposed method's effectiveness. Experiments show that the pretraining strategy yields an improvement of 10% in overall accuracy compared with the SSD trained from scratch. These experiments also demonstrate that using data augmentation strategies can improve mAP and mF1 by 72% and 20%, respectively. Finally, the experiment is further verified on another dataset from Hurricane Irma, and it is concluded that the proposed method is feasible.
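
Two of the augmentation strategies listed above, image mirroring and Gaussian noise, can be sketched in a few lines. This is an illustrative example only: the function names, the grayscale list-of-rows image format, the 0 to 255 clipping range, and the noise level are assumptions, not settings reported in the study.

```python
import random


def mirror(image):
    """Mirror a grayscale image (list of rows) left to right."""
    return [row[::-1] for row in image]


def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise to every pixel and clip to [0, 255].

    rng is a random.Random instance so that results are reproducible.
    """
    return [[min(255.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
            for row in image]
```

For example, `add_gaussian_noise(mirror(image), sigma=10.0, rng=random.Random(0))` produces one extra training sample from an existing image. For a detection task, mirroring would also require mirroring the bounding-box coordinates.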
