Towards Synthetic AI Training Data for Image Classification in Intralogistic Settings

Author(s):  
Daniel Schoepflin ◽  
Karthik Iyer ◽  
Martin Gomse ◽  
Thorsten Schüppstuhl

Abstract Obtaining annotated data for proper training of AI image classifiers remains a challenge for successful deployment in industrial settings. As a promising alternative to handcrafted annotations, synthetic training data generation has grown in popularity. However, in most cases the pipelines used to generate this data are not universal and have to be redesigned for each application domain. This requires a detailed formulation of the domain through a semantic scene grammar. We aim to present such a grammar, based on domain knowledge, for the production-supplying transport of components in intralogistic settings. We present a use-case analysis for the domain of production-supplying logistics and derive a scene grammar that can be used to formulate similar problem statements in the domain for the purpose of data generation. We demonstrate the use of this grammar to feed a scene generation pipeline and obtain training data for an AI-based image classifier.
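To make the idea of a scene grammar feeding a generation pipeline more concrete, the following is a minimal sketch only. The abstract does not give the actual grammar, so every symbol name below (`scene`, `carrier`, `small_load_carrier`, and so on) and the `expand` helper are hypothetical placeholders, not the authors' formulation. It shows how production rules could be sampled to produce one scene description that a renderer might consume.

```python
import random

# Hypothetical production rules (assumption, not taken from the paper).
# Keys are non-terminals; each maps to a list of alternative productions.
GRAMMAR = {
    "scene": [["environment", "carrier", "lighting"]],
    "environment": [["warehouse_aisle"], ["assembly_station"]],
    "carrier": [["pallet", "load"], ["trolley", "load"]],
    "load": [["small_load_carrier"], ["small_load_carrier", "load"]],
    "lighting": [["ceiling_lamps"], ["daylight"]],
}


def expand(symbol, rng, depth=0, max_depth=5):
    """Recursively expand a symbol into a flat list of terminal symbols."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal: emit as-is
    options = GRAMMAR[symbol]
    # Past max_depth, take the shortest production so recursion terminates.
    if depth >= max_depth:
        production = min(options, key=len)
    else:
        production = rng.choice(options)
    terminals = []
    for child in production:
        terminals.extend(expand(child, rng, depth + 1, max_depth))
    return terminals


if __name__ == "__main__":
    rng = random.Random(0)  # seeded for repeatable scene descriptions
    print(expand("scene", rng))
```

Each sampled list is one scene description; a real pipeline would attach poses, materials and annotation labels to these symbols, which this sketch leaves out.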

Procedia CIRP ◽  
2021 ◽  
Vol 104 ◽  
pp. 1257-1262
Author(s):  
Daniel Schoepflin ◽  
Dirk Holst ◽  
Martin Gomse ◽  
Thorsten Schüppstuhl

Author(s):  
Summaya Mumtaz ◽  
Martin Giese

Abstract In low-resource domains, it is challenging to achieve good performance using existing machine learning methods due to a lack of training data and mixed data types (numeric and categorical). In particular, categorical variables with high cardinality pose a challenge to machine learning tasks such as classification and regression because training requires sufficiently many data points for the possible values of each variable. Since interpolation is not possible, nothing can be learned for values not seen in the training set. This paper presents a method that uses prior knowledge of the application domain to support machine learning in cases with insufficient data. We propose to address this challenge by using embeddings for categorical variables that are based on an explicit representation of domain knowledge (KR), namely a hierarchy of concepts. Our approach is to (1) define a semantic similarity measure between categories, based on the hierarchy (we propose a purely hierarchy-based measure, but other similarity measures from the literature can be used), and (2) use that similarity measure to define a modified one-hot encoding. We propose two embedding schemes for single-valued and multi-valued categorical data. We perform experiments on three different use cases. We first compare existing similarity approaches with our approach on a word pair similarity use case. This is followed by creating word embeddings using different similarity approaches. A comparison with existing methods such as Google, Word2Vec and GloVe embeddings on several benchmarks shows better performance on concept categorisation tasks when using knowledge-based embeddings. The third use case uses a medical dataset to compare the performance of semantic-based embeddings and standard binary encodings. Significant improvement in performance of the downstream classification tasks is achieved by using semantic information.
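The two steps described in the abstract (a hierarchy-based similarity, then a modified one-hot encoding built from it) can be illustrated with a small sketch. The abstract does not state the authors' similarity formula, so this example substitutes a Wu-Palmer-style measure as a stand-in. The toy hierarchy, the `CATEGORIES` list, and the element-wise-maximum rule for multi-valued data are likewise assumptions made for illustration.

```python
# Hypothetical toy concept hierarchy (child -> parent); not from the paper.
PARENT = {
    "animal": None,
    "mammal": "animal",
    "bird": "animal",
    "dog": "mammal",
    "cat": "mammal",
    "sparrow": "bird",
}

# Categories that get a position in the encoding (assumption: leaves only).
CATEGORIES = ["dog", "cat", "sparrow"]


def path_to_root(node):
    """Return the node followed by all of its ancestors up to the root."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(node):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(node))


def lowest_common_ancestor(a, b):
    ancestors_a = set(path_to_root(a))
    for node in path_to_root(b):
        if node in ancestors_a:
            return node
    raise ValueError("nodes do not share a root")


def similarity(a, b):
    """Wu-Palmer-style similarity: a stand-in, not the paper's measure."""
    lca = lowest_common_ancestor(a, b)
    return 2 * depth(lca) / (depth(a) + depth(b))


def encode_single(value):
    """Modified one-hot: 1.0 at the value's own slot, similarity elsewhere."""
    return [similarity(value, c) for c in CATEGORIES]


def encode_multi(values):
    """Multi-valued variant (assumption): element-wise max over the values."""
    rows = [encode_single(v) for v in values]
    return [max(column) for column in zip(*rows)]


if __name__ == "__main__":
    print(encode_single("dog"))
    print(encode_multi(["dog", "sparrow"]))
```

Unlike a plain one-hot vector, the encoding for "dog" assigns a larger weight to "cat" than to "sparrow", so a downstream model can share information between related categories even when one of them is rare in the training set.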


2021 ◽  
Vol 18 (4) ◽  
pp. 378-381 ◽  
Author(s):  
Luis A. Bolaños ◽  
Dongsheng Xiao ◽  
Nancy L. Ford ◽  
Jeff M. LeDue ◽  
Pankaj K. Gupta ◽  
...  

IEEE Access ◽  
2021 ◽  
pp. 1-1
Author(s):  
Christine Dewi ◽  
Rung-Ching Chen ◽  
Yan-Ting Liu ◽  
Xiaoyi Jiang ◽  
Kristoko Dwi Hartomo
