Synthetic dataset generation for object-to-model deep learning in industrial applications

2019 ◽  
Vol 5 ◽  
pp. e222
Author(s):  
Matthew Z. Wong ◽  
Kiyohito Kunii ◽  
Max Baylis ◽  
Wai Hong Ong ◽  
Pavel Kroupa ◽  
...  

The availability of large image data sets has been a crucial factor in the success of deep learning-based classification and detection methods. Yet, while data sets for everyday objects are widely available, data for specific industrial use-cases (e.g., identifying packaged products in a warehouse) remains scarce. In such cases, the data sets have to be created from scratch, creating a critical bottleneck for the deployment of deep learning techniques in industrial applications. We present work carried out in collaboration with a leading UK online supermarket, with the aim of creating a computer vision system capable of detecting and identifying unique supermarket products in a warehouse setting. To this end, we demonstrate a framework for using data synthesis to create an end-to-end deep learning pipeline, beginning with real-world objects and culminating in a trained model. Our method is based on the generation of a synthetic dataset from 3D models obtained by applying photogrammetry techniques to real-world objects. Using 100K synthetic images for 10 classes, an InceptionV3 convolutional neural network was trained, achieving an accuracy of 96% on a separately acquired test set of real supermarket product images. The image generation process supports automatic pixel-level annotation, eliminating the prohibitively expensive manual annotation typically required for detection tasks. Based on this readily available data, a one-stage RetinaNet detector was trained on the synthetic, annotated images, producing a detector that can accurately localize and classify the specimen products in real time.
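
A minimal sketch of the classifier-training stage described above: fine-tuning an ImageNet-pretrained InceptionV3 on a directory of rendered synthetic images. The directory layout, image size, and hyperparameters are assumptions; the abstract does not give the exact training configuration.

```python
import tensorflow as tf

NUM_CLASSES = 10  # 10 supermarket product classes, as in the abstract

# ImageNet-pretrained backbone with a fresh classification head.
base = tf.keras.applications.InceptionV3(
    weights="imagenet", include_top=False, input_shape=(299, 299, 3), pooling="avg"
)
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# Synthetic renders on disk, one sub-directory per product class (hypothetical path).
train_ds = tf.keras.utils.image_dataset_from_directory(
    "synthetic_renders/", image_size=(299, 299), label_mode="categorical", batch_size=32
)
train_ds = train_ds.map(
    lambda x, y: (tf.keras.applications.inception_v3.preprocess_input(x), y)
)
model.fit(train_ds, epochs=5)
```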

2019 ◽  
Author(s):  
Sven Festag ◽  
Cord Spreckelsen

BACKGROUND: Collaborative privacy-preserving training methods allow locally stored private data sets to be integrated into machine learning approaches while ensuring confidentiality and nondisclosure.
OBJECTIVE: In this work, we assess the performance of a state-of-the-art neural network approach for the detection of protected health information in texts when trained in a collaborative privacy-preserving way.
METHODS: The training adopts distributed selective stochastic gradient descent (i.e., it works by exchanging local learning results achieved on private data sets). Five networks were trained on separate real-world clinical data sets using the privacy-protecting protocol. In total, the data sets contain 1304 real longitudinal patient records for 296 patients.
RESULTS: These networks reached a mean F1 value of 0.955. The gold-standard centralized training, which is based on the union of all sets and does not take data security into consideration, reached a final value of 0.962.
CONCLUSIONS: Using real-world clinical data, our study shows that the detection of protected health information can be secured by collaborative privacy-preserving training. More generally, the approach demonstrates the feasibility of deep learning on distributed, confidential clinical data while ensuring data protection.
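
A toy sketch of distributed selective stochastic gradient descent as described above: each site computes updates on its private data and publishes only the fraction of parameter updates with the largest magnitude to the shared parameters. The plain logistic-regression model, synthetic site data, and 10% sharing rate are illustrative assumptions, not the paper's NER network.

```python
import numpy as np

rng = np.random.default_rng(0)
D, SHARE_FRACTION, LR = 20, 0.1, 0.1
global_w = np.zeros(D)

def local_gradient(w, X, y):
    # Logistic-regression gradient computed on one site's private data.
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return X.T @ (p - y) / len(y)

# Five sites with private data sets (synthetic stand-ins for the clinical records).
sites = [(rng.normal(size=(100, D)), rng.integers(0, 2, 100)) for _ in range(5)]

for _ in range(50):  # communication rounds
    for X, y in sites:
        w = global_w.copy()           # download current shared parameters
        g = local_gradient(w, X, y)   # learn on private data only
        k = int(SHARE_FRACTION * D)   # selectively share the k largest updates
        top = np.argsort(np.abs(g))[-k:]
        global_w[top] -= LR * g[top]  # upload only the selected updates
```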


Author(s):  
Yoji Kiyota

This article describes frontier efforts to apply deep learning technologies, one of the greatest innovations in research on artificial intelligence and computer vision, to image data such as real estate property photographs and floorplans. Specifically, it covers attempts to detect property photographs that violate regulations or were misclassified, and to extract information from property photographs that can be used as new recommendation features. In addition, the article introduces an innovation created by providing data sets to academic communities.


2015 ◽  
Vol 11 (1) ◽  
pp. 45-65 ◽  
Author(s):  
Heli Sun ◽  
Jianbin Huang ◽  
Xinwei She ◽  
Zhou Yang ◽  
Jiao Liu ◽  
...  

The problem of trip planning with time constraints aims to find the optimal routes that satisfy a maximum-time requirement while possessing the highest attraction score. In this paper, a more efficient algorithm, TripRec, is proposed to solve this problem. Based on the principle of the Apriori algorithm for mining frequent item sets, our method constructs candidate attraction sets containing k attractions by applying the join rule to valid sets consisting of k-1 attractions. After all valid routes for the valid (k-1)-attraction sets have been obtained, all candidate routes for the candidate k-sets can be acquired through a route-extension approach, which markedly improves the efficiency of valid-route generation. Then, by determining whether at least one valid route exists, the algorithm prunes candidate attraction sets to obtain all valid sets. The process continues until no more valid attraction sets can be obtained. In addition, several optimization strategies are employed to further enhance the performance of the algorithm. Experimental results on both real-world and synthetic data sets show that our algorithm achieves a better pruning rate and higher efficiency than the state-of-the-art method.
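
A simplified sketch of the Apriori-style candidate generation described above: candidate k-attraction sets are formed by joining valid (k-1)-sets, and a set is kept only if at least one route through it fits the time budget. The visit times, uniform travel times, and brute-force route check are illustrative assumptions; TripRec's route-extension step avoids this brute force.

```python
from itertools import permutations

visit_time = {"A": 1.0, "B": 1.5, "C": 0.5, "D": 2.0}  # hypothetical attractions
travel = lambda u, v: 1.0  # hypothetical uniform travel time between attractions
TIME_BUDGET = 5.0

def has_valid_route(attrs):
    # Brute force: does any visiting order fit the time budget?
    for order in permutations(attrs):
        t = sum(visit_time[a] for a in order)
        t += sum(travel(u, v) for u, v in zip(order, order[1:]))
        if t <= TIME_BUDGET:
            return True
    return False

valid = [frozenset([a]) for a in visit_time if has_valid_route([a])]
k = 1
while valid:
    print(k, [sorted(s) for s in valid])
    # Apriori-style join: unite two valid k-sets that differ in one attraction.
    candidates = {s | t for s in valid for t in valid if len(s | t) == k + 1}
    # Prune: keep only candidate sets admitting at least one valid route.
    valid = [c for c in candidates if has_valid_route(c)]
    k += 1
```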


2015 ◽  
Vol 24 (04) ◽  
pp. 1540016 ◽  
Author(s):  
Muhammad Hussain ◽  
Sahar Qasem ◽  
George Bebis ◽  
Ghulam Muhammad ◽  
Hatim Aboalsamh ◽  
...  

With the maturing of digital image processing techniques, many tools can forge an image easily without leaving visible traces, raising the problem of authenticating digital images. Based on the assumption that forgery alters the texture micro-patterns in a digital image and that texture descriptors can model this change, we employed two state-of-the-art local texture descriptors, the multi-scale Weber's law descriptor (multi-WLD) and the multi-scale local binary pattern (multi-LBP), for splicing and copy-move forgery detection. Because tamper traces are not visible to the naked eye, the chrominance components of an image, which encode these traces, were used for modeling them with the texture descriptors. To reduce the dimension of the feature space and remove redundant features, we employed a locally learning based (LLB) algorithm. A support vector machine (SVM) was used to identify an image as authentic or tampered. This paper presents a thorough investigation validating this forgery detection method. Experiments were conducted on three benchmark image data sets, namely, CASIA v1.0, CASIA v2.0, and Columbia color. The experimental results showed that the multi-WLD based method achieved accuracy rates of 94.19% on CASIA v1.0, 96.52% on CASIA v2.0, and 94.17% on the Columbia data set. It is not only significantly better than the multi-LBP based method, but it also outperforms other state-of-the-art forgery detection methods.
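
A minimal sketch of the multi-LBP branch of this pipeline: LBP histograms extracted at several radii from the chrominance (Cb, Cr) channels, concatenated, and classified with an SVM. The radii, bin counts, and random demo data are assumptions; the paper's multi-WLD descriptor and LLB feature-selection step are omitted here.

```python
import numpy as np
from skimage.color import rgb2ycbcr
from skimage.feature import local_binary_pattern
from sklearn.svm import SVC

def multi_lbp_features(rgb, radii=(1, 2, 3)):
    ycbcr = rgb2ycbcr(rgb)
    feats = []
    for chroma in (ycbcr[..., 1], ycbcr[..., 2]):  # Cb and Cr channels
        for r in radii:  # multiple scales
            p = 8 * r
            lbp = local_binary_pattern(chroma, P=p, R=r, method="uniform")
            hist, _ = np.histogram(lbp, bins=p + 2, range=(0, p + 2), density=True)
            feats.append(hist)
    return np.concatenate(feats)

# Hypothetical training data: labeled images (1 = tampered, 0 = authentic).
rng = np.random.default_rng(0)
images = rng.random((20, 64, 64, 3))
labels = rng.integers(0, 2, 20)
X = np.stack([multi_lbp_features(im) for im in images])
clf = SVC(kernel="rbf").fit(X, labels)
```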


2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Douwe van der Wal ◽  
Iny Jhun ◽  
Israa Laklouk ◽  
Jeff Nirschl ◽  
Lara Richer ◽  
...  

Biology has become a prime area for the deployment of deep learning and artificial intelligence (AI), enabled largely by the massive data sets that the field can generate. Key to most AI tasks is the availability of a sufficiently large, labeled data set with which to train AI models. In the context of microscopy, it is easy to generate image data sets containing millions of cells and structures. However, it is challenging to obtain large-scale, high-quality annotations for AI models. Here, we present HALS (Human-Augmenting Labeling System), a human-in-the-loop data-labeling AI that begins uninitialized and learns annotations from a human in real time. Using a multi-part AI composed of three deep learning models, HALS learns from just a few examples and immediately decreases the workload of the annotator while increasing the quality of their annotations. Using a highly repetitive use-case (annotating cell types) and running experiments with seven pathologists, experts at the microscopic analysis of biological specimens, we demonstrate a manual work reduction of 90.60% and an average data-quality boost of 4.34%, measured across four use-cases and two tissue stain types.
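
A toy sketch of the human-in-the-loop idea behind HALS: a lightweight model starts uninitialized, proposes a label for each new cell, the annotator accepts or corrects it, and the model is updated online from every confirmed label. The single SGD classifier stands in for HALS's three deep models, and `human_label` is a hypothetical stand-in for the pathologist.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

CLASSES = np.array([0, 1])  # e.g., two cell types
model = SGDClassifier(loss="log_loss")
initialized = False

def human_label(x):
    # Hypothetical oracle: in HALS this is the pathologist's decision.
    return int(x.sum() > 0)

rng = np.random.default_rng(0)
for _ in range(200):  # stream of cells to annotate
    x = rng.normal(size=(1, 16))
    if initialized:
        suggestion = model.predict(x)[0]  # model pre-fills the annotation
        label = human_label(x[0])         # annotator accepts or corrects it
    else:
        label = human_label(x[0])         # cold start: human labels everything
    model.partial_fit(x, [label], classes=CLASSES)  # learn from the confirmed label
    initialized = True
```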


2013 ◽  
Vol 423-426 ◽  
pp. 2547-2554 ◽  
Author(s):  
Deng Lu Wu ◽  
Bing Feng Li ◽  
Wen Tao Li ◽  
Yong Xia ◽  
Yan Dong Tang

In this paper, we present the design and implementation of a vision system for detecting power transmission facilities in UAV videos. The vision system consists of two main parts, a client and a server. The aim of the system is to detect power transmission facilities against complex backgrounds. To achieve this, several novel methods are proposed for detecting the power transmission facilities, including power lines, power towers, and insulators. Experimental results on real image data demonstrate that the proposed methods are accurate and effective, and the complete system demonstrates their performance in practice.
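
The abstract does not name the detection methods, so the following is only a generic sketch of one classical approach to the power-line part of the task: Canny edges followed by a probabilistic Hough transform to find long straight segments in a UAV frame. The file paths and all thresholds are illustrative assumptions.

```python
import cv2
import numpy as np

frame = cv2.imread("uav_frame.jpg")  # hypothetical UAV video frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150)

# Power lines appear as long, nearly unbroken straight segments.
lines = cv2.HoughLinesP(
    edges, rho=1, theta=np.pi / 180, threshold=80,
    minLineLength=120, maxLineGap=10,
)
if lines is not None:
    for x1, y1, x2, y2 in lines[:, 0]:
        cv2.line(frame, (x1, y1), (x2, y2), (0, 0, 255), 2)
cv2.imwrite("detected_lines.jpg", frame)
```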


Author(s):  
Yanchen Wang ◽  
Lisa Singh

Algorithmic decision making is becoming more prevalent, increasingly impacting people's daily lives. Recently, discussions have emerged about the fairness of decisions made by machines. Researchers have proposed different approaches for improving the fairness of these algorithms. While these approaches can help machines make fairer decisions, they have been developed and validated on fairly clean data sets. Unfortunately, most real-world data have complexities that make them far messier. This work considers two of these complexities by analyzing the impact of two real-world data issues on fairness, missing values and selection bias, for categorical data. After formulating this problem and showing its existence, we propose fixing algorithms for data sets containing missing values and/or selection bias that use different forms of reweighting and resampling based on the missing-value generation process. We conduct an extensive empirical evaluation on both real-world and synthetic data using various fairness metrics, and demonstrate how missing values generated by different mechanisms, as well as selection bias, impact prediction fairness even when prediction accuracy remains fairly constant.
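
A minimal sketch of the reweighting idea described above: when some groups are under-selected into the training data, each kept record is weighted by the inverse of its estimated selection probability before fitting the classifier. The synthetic data, known selection probabilities, and logistic model are assumptions; the paper's fixing algorithms also cover several missing-value mechanisms.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
group = rng.integers(0, 2, n)  # protected attribute
X = rng.normal(size=(n, 5))
y = (X[:, 0] + 0.5 * group + rng.normal(size=n) > 0).astype(int)

# Selection bias: records from group 0 enter the training set less often.
p_select = np.where(group == 0, 0.3, 0.9)
kept = rng.random(n) < p_select

# Inverse-probability weights restore the influence of under-selected records.
weights = 1.0 / p_select[kept]
clf = LogisticRegression().fit(X[kept], y[kept], sample_weight=weights)
```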


2020 ◽  
Vol 34 (04) ◽  
pp. 6639-6647 ◽  
Author(s):  
Puyudi Yang ◽  
Jianbo Chen ◽  
Cho-Jui Hsieh ◽  
Jane-Ling Wang ◽  
Michael Jordan

Deep neural networks obtain state-of-the-art performance on a range of tasks. However, they are easily fooled by adding a small adversarial perturbation to the input, one that is often imperceptible to humans on image data. We observe a significant difference in feature attributions between adversarially crafted examples and original examples. Based on this observation, we introduce a new framework to detect adversarial examples by thresholding a scale estimate of the feature attribution scores. Furthermore, we extend our method to include multi-layer feature attributions in order to tackle attacks with mixed confidence levels. As demonstrated in extensive experiments, our method achieves superior performance in distinguishing adversarial examples from popular attack methods on a variety of real data sets compared to state-of-the-art detection methods. In particular, our method can detect adversarial examples of mixed confidence levels and transfers between different attack methods. We also show that our method achieves competitive performance even when the attacker has complete access to the detector.
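
A toy sketch of the detection rule described above: compute a feature attribution for the predicted class (here, gradient-times-input for a linear model), take a robust scale estimate of the attribution scores (here, the interquartile range), and flag the input as adversarial when the scale exceeds a threshold calibrated on clean data. The model, attribution choice, and calibration quantile are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(10, 784))  # weights of a hypothetical linear classifier

def attribution_scale(x):
    cls = int(np.argmax(W @ x))
    attr = W[cls] * x  # gradient-times-input attribution for the predicted class
    q75, q25 = np.percentile(attr, [75, 25])
    return q75 - q25   # robust scale estimate of the attribution scores

# Calibrate the threshold as a high quantile of scales over clean inputs.
clean = rng.random((500, 784))
threshold = np.quantile([attribution_scale(x) for x in clean], 0.95)

def is_adversarial(x):
    return attribution_scale(x) > threshold
```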


Crop diseases are a serious hazard to food security, yet their rapid identification remains difficult in many regions of the globe owing to a shortage of the necessary infrastructure. The combination of rapidly growing global smartphone penetration and recent advances in deep learning paves the way for smartphone-assisted disease diagnosis. Using public data sets of images of diseased and healthy plant leaves gathered under controlled conditions, a deep CNN is developed to identify crop species and their diseases. To verify the feasibility of this method, the trained model must reach high accuracy on a held-out test set; the model is then tested against a collection of images gathered from trusted online sources. While this accuracy remains well above random selection, the overall accuracy can be boosted by a more diverse set of training records. Overall, training deep learning models on increasingly large and publicly available image data sets provides a clear pathway toward smartphone-assisted crop disease diagnosis on a massive global scale.
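
A minimal sketch of the approach described above: transfer-learning a compact CNN on a public leaf-image data set laid out one folder per (crop, disease) class. MobileNetV2 is chosen here because the abstract targets smartphone deployment; the path, class count, and hyperparameters are assumptions.

```python
import tensorflow as tf

NUM_CLASSES = 38  # hypothetical count of crop-disease class pairs

base = tf.keras.applications.MobileNetV2(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3), pooling="avg"
)
base.trainable = False  # freeze the backbone; train only the new head
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Public leaf images on disk, one sub-directory per class (hypothetical path).
train_ds = tf.keras.utils.image_dataset_from_directory(
    "plant_leaves/", image_size=(224, 224), batch_size=32
)
train_ds = train_ds.map(
    lambda x, y: (tf.keras.applications.mobilenet_v2.preprocess_input(x), y)
)
model.fit(train_ds, epochs=5)
```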

