Improving real-time CNN-based pupil detection through domain-specific data augmentation

In this paper, we propose a deep-learning approach to detecting oilseed rape pests that raises the mean average precision (mAP) to 77.14%, an increase of 9.7% over the original model. We deploy the model on a mobile platform so that any farmer can use the application, which diagnoses pests in real time and suggests pest control measures. We built an oilseed rape pest image database covering 12 typical oilseed rape pests and compared the performance of five models; SSD w/Inception was chosen as the optimal model. To achieve a high mAP, we used data augmentation (DA) and added a dropout layer. The experiments were performed on the Android application we developed, and the results show that our approach clearly surpasses the original model and is helpful for integrated pest management. Compared with previous work, the application improves environmental adaptability, response speed, and accuracy, and its low cost and simple operation make it suitable for pest monitoring with drones and the Internet of Things (IoT).
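The abstract does not say which augmentations were used. As a minimal sketch of one common choice for detection data, assuming images stored as nested lists and boxes given as `(x_min, y_min, x_max, y_max)` with exclusive maxima, a horizontal flip has to mirror the bounding boxes together with the pixels. The function name `hflip_with_boxes` is hypothetical and not taken from the paper.

```python
def hflip_with_boxes(image, boxes):
    """Mirror an image left-to-right and mirror its boxes to match.

    image: list of rows, each row a list of pixel values.
    boxes: list of (x_min, y_min, x_max, y_max) in pixel coordinates,
           with x_max and y_max exclusive (an assumed convention).
    """
    width = len(image[0])
    flipped_image = [row[::-1] for row in image]
    # A box spanning [x_min, x_max) becomes [width - x_max, width - x_min).
    flipped_boxes = [
        (width - x_max, y_min, width - x_min, y_max)
        for (x_min, y_min, x_max, y_max) in boxes
    ]
    return flipped_image, flipped_boxes


# Toy 2x4 image whose right-most column holds the "object".
image = [
    [0, 0, 0, 9],
    [0, 0, 0, 9],
]
boxes = [(3, 0, 4, 2)]
flipped_image, flipped_boxes = hflip_with_boxes(image, boxes)
print(flipped_image)  # the column of 9s is now on the left
print(flipped_boxes)  # the box follows it
```

Keeping the labels in sync with the pixels matters for detectors such as SSD, because a flipped image paired with unflipped boxes would act as a mislabeled training example.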

Robust Approach to Supervised Deep Neural Network Training for Real-Time Object Classification in Cluttered Indoor Environment

Applied Sciences ◽ 10.3390/app11157148 ◽ 2021 ◽ Vol 11 (15) ◽ pp. 7148

Author(s): Bedada Endale ◽ Abera Tullu ◽ Hayoung Shi ◽ Beom-Soo Kang

Keyword(s): Neural Network ◽ Deep Learning ◽ Real Time ◽ Network Architecture ◽ Input Data ◽ Deep Neural Network ◽ Data Augmentation ◽ Object Classification ◽ Training Data ◽ Gradient Descent Algorithm

Unmanned aerial vehicles (UAVs) are widely used for various missions in both the civilian and military sectors. Many of these missions require UAVs to perceive the environments they navigate. This perception can be realized by training a machine to classify objects in the environment. One well-known training approach is supervised deep learning. However, supervised deep learning is costly in time and computational resources: collecting large input datasets, pre-training steps such as labeling the training data, and the need for a high-performance computer for training are some of the challenges it poses. To address these challenges, this study proposes mission-specific input data augmentation techniques and a lightweight deep neural network architecture capable of real-time object classification. Semi-direct visual odometry (SVO) data of augmented images are used to train the network for object classification. Ten classes of 10,000 different images each were used as input data, of which 80% were used for training the network and the remaining 20% for validation. The designed deep neural network was optimized with a sequential gradient descent algorithm, which has the advantage of handling redundancy in the data more efficiently than other algorithms.
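The 80/20 split described above works out to 80,000 training and 20,000 validation images across the ten classes. The sketch below shows one way to produce such a split. It assumes the split is done per class so that every class keeps the same ratio, which the abstract does not state, and the file names and the helper `split_per_class` are hypothetical.

```python
import random


def split_per_class(samples_by_class, train_fraction=0.8, seed=0):
    """Shuffle each class separately and cut it into train/validation parts."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, val = [], []
    for label, samples in samples_by_class.items():
        shuffled = list(samples)
        rng.shuffle(shuffled)
        cut = round(len(shuffled) * train_fraction)
        train += [(sample, label) for sample in shuffled[:cut]]
        val += [(sample, label) for sample in shuffled[cut:]]
    return train, val


# Placeholder file names standing in for ten classes of 10,000 images each.
data = {
    f"class_{c}": [f"class_{c}/img_{i}.png" for i in range(10_000)]
    for c in range(10)
}
train, val = split_per_class(data)
print(len(train), len(val))  # 80000 20000
```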

Enabling Real-time Sign Language Translation on Mobile Platforms with On-board Depth Cameras

Proceedings of the ACM on Interactive Mobile Wearable and Ubiquitous Technologies ◽ 10.1145/3463498 ◽ 2021 ◽ Vol 5 (2) ◽ pp. 1-30

Author(s): HyeonJung Park ◽ Youngki Lee ◽ JeongGil Ko

Keyword(s): Real Time ◽ Sign Language ◽ Data Augmentation ◽ Language Translation ◽ Mobile Platforms ◽ Depth Cameras ◽ Language Data ◽ In The Wild ◽ Environmental Robustness ◽ Cloud Servers

In this work we present SUGO, a depth video-based system for translating sign language to text using a smartphone's front camera. While exploiting depth-only videos offers benefits such as being less privacy-invasive than RGB videos, it introduces new challenges, which include dealing with low video resolutions and the sensors' sensitivity to user motion. We overcome these challenges by diversifying our sign language video dataset through data augmentation so that it is robust to various usage scenarios, and by designing a set of schemes that emphasize human gestures in the input images for effective sign detection. The inference engine of SUGO is based on a 3-dimensional convolutional neural network (3DCNN) that classifies a sequence of video frames as a pre-trained word. Furthermore, the overall operations are designed to be lightweight so that sign language translation takes place in real time using only the resources available on a smartphone, with no help from cloud servers or external sensing components. Specifically, to train and test SUGO, we collect sign language data from 20 individuals for 50 Korean Sign Language words, for a total of ~5,000 sign gestures, and collect additional in-the-wild data to evaluate the performance of SUGO in real-world usage scenarios with different lighting conditions and daily activities. Our extensive evaluations show that SUGO can classify sign words with an accuracy of up to 91% and suggest that the system is suitable (in terms of resource usage, latency, and environmental robustness) for a fully mobile sign language translation solution.
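A 3D CNN generally expects clips of a fixed number of frames, while recorded gestures vary in length. The abstract does not describe how SUGO handles this, so the following is only a sketch of one common option: sampling frame indices evenly across the video. The helper `sample_frame_indices` and the clip length of 4 are illustrative assumptions.

```python
def sample_frame_indices(num_frames, clip_length):
    """Pick clip_length indices spread evenly over a video of num_frames frames.

    Videos shorter than clip_length get some frames repeated.
    """
    if num_frames < 1 or clip_length < 1:
        raise ValueError("num_frames and clip_length must be positive")
    if clip_length == 1:
        return [num_frames // 2]  # a single frame: take the middle one
    last = num_frames - 1
    # Integer arithmetic keeps the first and last frames and avoids rounding surprises.
    return [i * last // (clip_length - 1) for i in range(clip_length)]


frames = [f"frame_{i:03d}" for i in range(10)]  # stand-ins for depth frames
indices = sample_frame_indices(len(frames), clip_length=4)
clip = [frames[i] for i in indices]
print(indices)  # [0, 3, 6, 9]
```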

Development of a real-time machine vision system for functional textile fabric defect detection using a deep YOLOv4 model

Textile Research Journal ◽ 10.1177/00405175211034241 ◽ 2021 ◽ pp. 004051752110342

Author(s): Sifundvolesihle Dlamini ◽ Chih-Yuan Kao ◽ Shun-Lian Su ◽ Chung-Feng Jeffrey Kuo

Keyword(s): Machine Vision ◽ Real Time ◽ Defect Detection ◽ Data Augmentation ◽ Detection System ◽ Vision System ◽ Good Precision ◽ Machine Vision System ◽ Time Machine ◽ Textile Fabric

We introduce a real-time machine vision system that we developed to detect defects in functional textile fabrics with good precision at relatively fast detection speeds, in order to assist quality control in the textile industry. The system consists of image acquisition hardware and image processing software. The software uses data preprocessing techniques to split raw images into smaller, suitably sized pieces. Filtering is employed to denoise the images and enhance some features. To enlarge the dataset and improve robustness, we use data augmentation, followed by labeling, in which the defects in the images are labeled and tagged. Lastly, we use YOLOv4 for localization, training the system from the weights of a pretrained model. Our software is deployed on the hardware that we designed to implement the detection system. The designed system shows strong performance in defect detection with precision of [Formula: see text], and recall and [Formula: see text] scores of [Formula: see text] and [Formula: see text], respectively. The detection speed is relatively fast at [Formula: see text] fps with a prediction speed of [Formula: see text] ms. Our system can automatically locate functional textile fabric defects with high confidence in real time.
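The preprocessing step that splits raw images into smaller pieces can be pictured as computing a grid of crop windows. The sketch below is an assumption about how such tiling might be done, not the authors' implementation: the tile size of 416 pixels, the image size, and the helper `tile_boxes` are all hypothetical, and the last row and column are shifted back so every tile stays inside the image.

```python
def tile_boxes(width, height, tile):
    """Return (left, top, right, bottom) crop windows that cover the image.

    Tiles are square with side `tile`. Where the image size is not a multiple
    of `tile`, the final tile is shifted back and overlaps its neighbour.
    """
    if tile < 1 or tile > width or tile > height:
        raise ValueError("tile must fit inside the image")

    def starts(size):
        positions = list(range(0, size - tile + 1, tile))
        if positions[-1] + tile < size:  # uncovered strip at the edge
            positions.append(size - tile)
        return positions

    return [
        (x, y, x + tile, y + tile)
        for y in starts(height)
        for x in starts(width)
    ]


boxes = tile_boxes(width=1000, height=600, tile=416)
print(len(boxes))  # 6
print(boxes[-1])   # (584, 184, 1000, 600)
```

Overlapping the edge tiles rather than padding them keeps every crop at full resolution, at the cost of a defect near the edge possibly appearing in two tiles.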

DISSECT: DISentangle SharablE ConTent for Multimodal Integration and Crosswise-mapping

10.1101/2020.09.04.283234 ◽ 2020

Author(s): Geoffrey Schau ◽ Erik Burlingame ◽ Young Hwan Chang

Keyword(s): Deep Learning ◽ Complete Information ◽ Specific Information ◽ Multimodal Integration ◽ Specific Data ◽ Domain Specific ◽ Cross Domain ◽ Input Feature ◽ Novel Approach ◽ Latent Representations

Deep learning systems have emerged as powerful mechanisms for learning domain translation models. However, in many cases, complete information in one domain is assumed to be necessary for sufficient cross-domain prediction. In this work, we motivate a formal justification for domain-specific information separation in a simple linear case and illustrate that a self-supervised approach enables domain translation between data domains while filtering out domain-specific data features. We introduce a novel approach to identify domain-specific information from sets of unpaired measurements in complementary data domains by considering a deep learning cross-domain autoencoder architecture designed to learn shared latent representations of data while enabling domain translation. We introduce an orthogonal gate block designed to enforce orthogonality of input feature sets by explicitly removing non-sharable information specific to each domain and illustrate separability of domain-specific information on a toy dataset.
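The abstract does not specify how the orthogonal gate block is built. As a rough intuition for what "removing" a component to obtain orthogonal feature sets means in the simple linear case, the sketch below subtracts from one vector its projection onto another, which is a single Gram-Schmidt step. This is an illustration of orthogonality only, not the authors' block, and the vectors and the helper `remove_component` are made up for the example.

```python
def remove_component(v, u):
    """Return v minus its projection onto u, so the result is orthogonal to u."""
    dot_uu = sum(a * a for a in u)
    if dot_uu == 0:
        return list(v)  # nothing to remove along a zero vector
    scale = sum(a * b for a, b in zip(v, u)) / dot_uu
    return [a - scale * b for a, b in zip(v, u)]


features = [3.0, 4.0, 0.0]         # toy feature vector
domain_specific = [1.0, 0.0, 0.0]  # toy direction to filter out
shared = remove_component(features, domain_specific)
print(shared)  # [0.0, 4.0, 0.0]
```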

Comparative Analysis on Deep Learning Approaches for Heavy-Vehicle Detection based on Data Augmentation and Transfer-Learning techniques

Journal of Scientific Research ◽ 10.3329/jsr.v13i3.52332 ◽ 2021 ◽ Vol 13 (3) ◽ pp. 809-820

Author(s): V. Sowmya ◽ R. Radha

Keyword(s): Neural Networks ◽ Deep Learning ◽ Real Time ◽ Transfer Learning ◽ Convolutional Neural Networks ◽ Traffic Management ◽ Data Augmentation ◽ Vehicle Detection ◽ Heavy Vehicles ◽ Detection And Recognition

Vehicle detection and recognition in a real-time traffic surveillance system demand advanced computational intelligence and resources for effective traffic management under all possible contingencies. One focus area of deep intelligent systems is vehicle detection and recognition for robust traffic management of heavy vehicles. Techniques used for this purpose include the Support Vector Machine (SVM), Convolutional Neural Networks (CNN), Region-based Convolutional Neural Networks (R-CNN), and the You Only Look Once (YOLO) model. It is therefore pivotal to choose an appropriate algorithm for vehicle detection and recognition that also suits the real-time environment. This study compares the deep learning algorithms Faster R-CNN, YOLOv2, YOLOv3, and YOLOv4 across diverse aspects of their features. Two classes of heavy transport vehicles, buses and trucks, are the detection and recognition targets in this work. Data augmentation and transfer learning are applied when building, training, and testing the models to avoid over-fitting and to improve speed and accuracy. Extensive empirical evaluation is conducted on two standard datasets, COCO and PASCAL VOC 2007. Finally, comparative results and analyses of real-time performance are presented.
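Comparisons of detectors on datasets such as PASCAL VOC typically rest on intersection over union (IoU) between a predicted box and a ground-truth box, with a prediction counted as correct when the IoU exceeds a threshold (0.5 in the VOC protocol). The abstract does not describe the evaluation code, so the following is only a sketch of the standard IoU computation, with boxes assumed to be `(x_min, y_min, x_max, y_max)` tuples and example coordinates invented for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


predicted_truck = (0, 0, 10, 10)
labeled_truck = (5, 0, 15, 10)
score = iou(predicted_truck, labeled_truck)
print(score)         # 50 / 150, about 0.333
print(score >= 0.5)  # False: this prediction would not count as a match
```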
