An Engineering Domain Knowledge-Based Framework for Modelling Highly Incomplete Industrial Data

Missing values in industrial data restrict its applications. Although such incomplete data contains enough information for engineers to support subsequent development, it still has too many missing values for algorithms to establish precise models. This is because engineering domain knowledge is not considered, so valuable information is not fully captured. This article therefore proposes an engineering domain knowledge-based framework for modelling incomplete industrial data. The raw datasets are partitioned and processed at different scales. First, hierarchical features are combined to decrease the missing ratio. Next, to fill the missing values in special data, which is identified for classifying the samples, samples with only some of the features present are used to establish a local imputation model instead of being removed. The samples are then divided into different groups to transfer the information. A series of industrial datasets is analyzed to verify the feasibility of the proposed method.
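
The abstract does not say how hierarchical features are combined, so the following is only a minimal sketch of the general idea. It assumes that child features under one parent can be collapsed into the mean of whichever children are present; the data, the `groups` layout, and both function names are hypothetical.

```python
from statistics import mean

def missing_ratio(rows):
    """Fraction of cells that are None across all rows."""
    cells = [v for row in rows for v in row]
    return sum(v is None for v in cells) / len(cells)

def combine_children(rows, groups):
    """Collapse each group of child columns into one parent column.

    Assumption: the parent value is the mean of the children that are
    present, and is None only when every child in the group is missing.
    """
    combined = []
    for row in rows:
        new_row = []
        for cols in groups:
            present = [row[c] for c in cols if row[c] is not None]
            new_row.append(mean(present) if present else None)
        combined.append(new_row)
    return combined

# Hypothetical readings: columns 0-1 belong to one unit, columns 2-3 to another.
raw = [
    [1.0, None, 4.0, None],
    [None, 2.0, None, None],
    [3.0, 3.0, None, 6.0],
]
groups = [(0, 1), (2, 3)]
merged = combine_children(raw, groups)

print(missing_ratio(raw), missing_ratio(merged))
```

In this toy case the combined table has a lower missing ratio than the raw one, because a parent cell is missing only when all of its children are.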

A General Framework for Mixed and Incomplete Data Clustering Based on Swarm Intelligence Algorithms

Mathematics ◽

10.3390/math9070786 ◽

2021 ◽

Vol 9 (7) ◽

pp. 786

Author(s):

Yenny Villuendas-Rey ◽

Eley Barroso-Cubas ◽

Oscar Camacho-Nieto ◽

Cornelio Yáñez-Márquez

Keyword(s):

Swarm Intelligence ◽

Data Clustering ◽

Incomplete Data ◽

Missing Values ◽

Clustering Algorithms ◽

Bat Algorithm ◽

Hybrid Features ◽

Bee Colony ◽

Learning Tasks ◽

Clustering Data

Swarm intelligence has emerged as an active field for solving numerous machine-learning tasks. In this paper, we address the problem of clustering data with missing values, where the patterns are described by mixed (or hybrid) features. We introduce a generic modification to three swarm intelligence algorithms (Artificial Bee Colony, Firefly Algorithm, and Novel Bat Algorithm). We experimentally determine suitable parameter values for these three modified algorithms in order to apply them to the clustering task. We also provide an unbiased comparison among several metaheuristic-based clustering algorithms, concluding that the clusters obtained by our proposals are highly representative of the “natural structure” of data.
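
Clustering mixed, incomplete patterns needs a dissimilarity that tolerates both categorical attributes and missing values. The abstract does not name the function the authors use; as an illustration only, the sketch below uses the Heterogeneous Euclidean-Overlap Metric (HEOM), a standard choice for this setting. The `ranges` argument and the sample records are hypothetical.

```python
import math

def heom(a, b, ranges):
    """Heterogeneous Euclidean-Overlap Metric between two mixed records.

    ranges[i] is (max - min) for a numeric attribute, or None when
    attribute i is categorical. Missing values are encoded as None.
    """
    total = 0.0
    for x, y, r in zip(a, b, ranges):
        if x is None or y is None:
            d = 1.0                          # missing: maximal per-attribute distance
        elif r is None:
            d = 0.0 if x == y else 1.0       # categorical: overlap distance
        else:
            d = abs(x - y) / r if r else 0.0  # numeric: range-normalised difference
        total += d * d
    return math.sqrt(total)

# Hypothetical records: (numeric, categorical, numeric with one value missing).
p = (1.0, "red", None)
q = (3.0, "red", 5.0)
print(heom(p, q, ranges=(4.0, None, 10.0)))
```

Any swarm-based clusterer that only needs pairwise dissimilarities could plug in a function of this shape.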

Symmetry Breaking and Training from Incomplete Data with Radial Basis Boltzmann Machines

International Journal of Neural Systems ◽

10.1142/s0129065797000318 ◽

1997 ◽

Vol 08 (03) ◽

pp. 301-315 ◽

Cited By ~ 8

Author(s):

Marcel J. Nijman ◽

Hilbert J. Kappen

Keyword(s):

Symmetry Breaking ◽

Incomplete Data ◽

Missing Values ◽

Nearest Neighbor ◽

Boltzmann Machine ◽

K Nearest Neighbor ◽

Data Set ◽

Input Space ◽

Learning Rules ◽

Radial Basis

A Radial Basis Boltzmann Machine (RBBM) is a specialized Boltzmann Machine architecture that combines feed-forward mapping with probability estimation in the input space, and for which very efficient learning rules exist. The hidden representation of the network displays symmetry breaking as a function of the noise in the dynamics. Thus, generalization can be studied as a function of the noise in the neuron dynamics instead of as a function of the number of hidden units. We show that the RBBM can be seen as an elegant alternative to k-nearest neighbor, leading to comparable performance without the need to store all data. We also show that the RBBM has good classification performance compared to the MLP. The main advantage of the RBBM is that, simultaneously with the input-output mapping, a model of the input space is obtained which can be used for learning with missing values. We derive learning rules for the case of incomplete data, and show that they perform better on incomplete data than the traditional learning rules on a 'repaired' data set.

Configuration Tree Solver: A Technology for Automated Design and Configuration

16th Design Automation Conference: Volume 1 — Computer Aided and Computational Design ◽

10.1115/detc1990-0039 ◽

1990 ◽

Author(s):

Alexander Kott ◽

Gerald Agin ◽

Dave Fawcett

Keyword(s):

Problem Solving ◽

General Model ◽

General Class ◽

Domain Knowledge ◽

Automated Design ◽

Automotive Component ◽

Business Operation ◽

Knowledge Based ◽

Specific Configuration

Configuration is a process of generating a definitive description of a product or an order that satisfies a set of specified requirements and known constraints. Knowledge-based technology is an enabling factor in the automation of configuration tasks found in business operations. In this paper, we describe a configuration technique that is well suited for configuring “decomposable” artifacts with reasonably well-defined structure and constraints. This technique may be classified as a member of a general class of decompositional approaches to configuration. The domain knowledge is structured as a general model of the artifact: an and-or hierarchy of the artifact’s elements, features, and characteristics. The model includes constraints and local specialists, which are attached to the elements of the and-or tree. Given the specific configuration requirements, the problem-solving engine searches for a solution, a subtree that satisfies the requirements and the applicable constraints. We describe an application of this approach that performs configuration and design of an automotive component.
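
The search over an and-or tree described above can be sketched as follows. This is a simplified illustration, not the paper's solver: it enumerates candidate subtrees exhaustively, omits the "local specialists", and uses a made-up tuple encoding, part names, and costs.

```python
from itertools import product

def configurations(node):
    """Yield every leaf selection that forms a valid subtree of an and-or tree.

    Nodes are tuples: ("leaf", payload), ("or", children), ("and", children).
    An "or" node picks exactly one child; an "and" node needs all children.
    """
    kind = node[0]
    if kind == "leaf":
        yield [node[1]]
    elif kind == "or":
        for child in node[1]:
            yield from configurations(child)
    elif kind == "and":
        options = [list(configurations(child)) for child in node[1]]
        for combo in product(*options):
            yield [leaf for part in combo for leaf in part]

def solve(tree, constraint):
    """Return the first configuration that satisfies the constraint, or None."""
    for config in configurations(tree):
        if constraint(config):
            return config
    return None

# Hypothetical artifact: a brake assembly needs one disc AND one caliper.
tree = ("and", [
    ("or", [("leaf", ("disc_steel", 40)), ("leaf", ("disc_ceramic", 90))]),
    ("or", [("leaf", ("caliper_2p", 30)), ("leaf", ("caliper_4p", 70))]),
])

def requirement(config):
    """Hypothetical requirement: ceramic disc, total cost at most 130."""
    names = [name for name, _ in config]
    return "disc_ceramic" in names and sum(cost for _, cost in config) <= 130

print(solve(tree, requirement))
```

A practical solver would prune with the constraints while descending the tree rather than filtering complete configurations afterwards.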

Predicting readers’ domain knowledge based on eye-tracking measures

The Electronic Library ◽

10.1108/el-05-2017-0108 ◽

2018 ◽

Vol 36 (6) ◽

pp. 1027-1042 ◽

Cited By ~ 1

Author(s):

Quan Lu ◽

Jiyue Zhang ◽

Jing Chen ◽

Ji Li

Keyword(s):

Logistic Regression ◽

Eye Tracking ◽

Information Search ◽

Domain Knowledge ◽

Fixation Duration ◽

Baseline Model ◽

Content Type ◽

Knowledge Based ◽

Total Fixation ◽

Negative Predictor

Purpose: This paper aims to examine the effect of domain knowledge on eye-tracking measures and to predict readers’ domain knowledge from these measures in a navigational table of contents (N-TOC) system.

Design/methodology/approach: A controlled experiment with three reading tasks was conducted in an N-TOC system with 24 postgraduates of Wuhan University. Data including fixation duration, fixation count and inter-scanning transitions were collected and calculated. Participants’ domain knowledge was measured by pre-experiment questionnaires. Logistic regression analysis was used to build the prediction model, and the model’s performance was evaluated against a baseline model.

Findings: The results showed that novices spent significantly more time fixating on the text area than experts, because of the difficulty of understanding the information in the text area. Total fixation duration on the text area (TFD_T) was a significantly negative predictor of domain knowledge. The logistic regression model using eye-tracking measures outperformed the baseline model, with accuracy, precision and F(β = 1) scores of 0.71, 0.86 and 0.79, respectively.

Originality/value: Little research has been reported in the literature on the effect of domain knowledge on eye-tracking measures during reading or on the prediction of domain knowledge from eye-tracking measures. Most studies focus on multimedia learning. With respect to the prediction of domain knowledge, only a few studies are found in the field of information search. This paper contributes to the literature on the effect of domain knowledge on eye-tracking measures during N-TOC reading and on predicting domain knowledge.
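
The reported scores also constrain recall, which the abstract does not state. As a rough check, assuming F(β = 1) is the usual harmonic mean of precision P and recall R, and treating the rounded figures as exact:

```latex
F_1 = \frac{2PR}{P + R}
\quad\Longrightarrow\quad
R = \frac{F_1 P}{2P - F_1}
  = \frac{0.79 \times 0.86}{2 \times 0.86 - 0.79}
  = \frac{0.6794}{0.93}
  \approx 0.73
```

Because the inputs are rounded to two decimals, this is an estimate (roughly 0.72 to 0.74), not a figure reported by the authors.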

Histopathological Image Recognition with Domain Knowledge Based Deep Features

Intelligent Computing Methodologies - Lecture Notes in Computer Science ◽

10.1007/978-3-319-95957-3_38 ◽

2018 ◽

pp. 349-359 ◽

Cited By ~ 5

Author(s):

Gang Zhang ◽

Ming Xiao ◽

Yong-hui Huang

Keyword(s):

Image Recognition ◽

Domain Knowledge ◽

Knowledge Based ◽

Histopathological Image

Domain Knowledge-Based Compaction

Compression Schemes for Mining Large Datasets ◽

10.1007/978-1-4471-5607-9_6 ◽

2013 ◽

pp. 125-145

Author(s):

T. Ravindra Babu ◽

M. Narasimha Murty ◽

S. V. Subrahmanya

Keyword(s):

Domain Knowledge ◽

Knowledge Based

Modeling domain knowledge based on the central laws of integrative brain activity

Technology audit and production reserves ◽

10.15587/2312-8372.2016.66520 ◽

2016 ◽

Vol 2 (2(28)) ◽

pp. 33

Author(s):

Сергій Ілліч Доценко

Keyword(s):

Domain Knowledge ◽

Brain Activity ◽

Knowledge Based

Evolutionary Machine Learning for Classification with Incomplete Data

10.26686/wgtn.17072123 ◽

2021 ◽

Author(s):

Cao Truong Tran

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Genetic Programming ◽

Incomplete Data ◽

Missing Values ◽

Machine Learning Techniques ◽

Feature Construction ◽

Classification Algorithms ◽

Learning Techniques ◽

Effectiveness And Efficiency

Classification is a major task in machine learning and data mining. Many real-world datasets suffer from the unavoidable issue of missing values. Classification with incomplete data has to be handled carefully because inadequate treatment of missing values causes large classification errors. Most existing research on classification with incomplete data has focused on improving effectiveness, but has not adequately addressed the efficiency of applying the classifiers to unseen instances, which is much more important than the act of creating classifiers. A common approach to classification with incomplete data is to use imputation methods to replace missing values with plausible values before building classifiers and classifying unseen instances. This approach provides complete data which can then be used by any classification algorithm, but sophisticated imputation methods are usually computationally intensive, especially in the application process of classification. Another approach is to build a classifier that can work directly with missing values. This approach does not require time for estimating missing values, but it often generates inaccurate and complex classifiers when faced with numerous missing values. A recent approach, which also avoids estimating missing values, is to build a set of classifiers from which applicable classifiers are then selected for classifying unseen instances. However, this approach is also often inaccurate and takes a long time to find applicable classifiers when faced with numerous missing values. The overall goal of the thesis is to simultaneously improve the effectiveness and efficiency of classification with incomplete data by using evolutionary machine learning techniques for feature selection, clustering, ensemble learning, feature construction and constructing classifiers.
The thesis develops approaches for improving imputation for classification with incomplete data by integrating clustering and feature selection with imputation. The approaches improve both the effectiveness and the efficiency of using imputation for classification with incomplete data. The thesis develops wrapper-based feature selection methods to improve the input space for classification algorithms that are able to work directly with incomplete data. The methods not only improve the classification accuracy, but also reduce the complexity of classifiers able to work directly with incomplete data. The thesis develops a feature construction method to improve the input space for classification algorithms with incomplete data by proposing interval genetic programming, that is, genetic programming with a set of interval functions. The method improves the classification accuracy and reduces the complexity of classifiers. The thesis develops an ensemble approach to classification with incomplete data by integrating imputation, feature selection, and ensemble learning. The results show that the approach is more accurate and faster than previous common methods for classification with incomplete data. The thesis develops interval genetic programming to directly evolve classifiers for incomplete data. The results show that classifiers generated by interval genetic programming can be more effective and efficient than classifiers generated by the combination of imputation and traditional genetic programming. Interval genetic programming is also more effective than common classification algorithms able to work directly with incomplete data. In summary, the thesis develops a range of approaches for simultaneously improving the effectiveness and efficiency of classification with incomplete data by using a range of evolutionary machine learning techniques.
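
The abstract describes interval genetic programming only as genetic programming with a set of interval functions. The sketch below illustrates one way interval functions can evaluate an expression on an incomplete instance without imputation. It assumes that a missing value is replaced by the feature's observed range and a known value by a degenerate interval; the expression, feature ranges, and midpoint decision rule are all hypothetical.

```python
def to_interval(value, lo, hi):
    """Known value -> degenerate interval; missing (None) -> the feature's range."""
    return (lo, hi) if value is None else (value, value)

def add(a, b):
    """Interval addition."""
    return (a[0] + b[0], a[1] + b[1])

def sub(a, b):
    """Interval subtraction: smallest minus largest, largest minus smallest."""
    return (a[0] - b[1], a[1] - b[0])

def mul(a, b):
    """Interval multiplication: extremes of the four endpoint products."""
    products = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(products), max(products))

def classify(interval, threshold=0.0):
    """Hypothetical decision rule: compare the interval midpoint with a threshold."""
    mid = (interval[0] + interval[1]) / 2
    return "positive" if mid > threshold else "negative"

# Hypothetical evolved expression x0 * x1 - x2, evaluated with x1 missing.
ranges = [(0.0, 10.0), (-1.0, 3.0), (0.0, 5.0)]
instance = [2.0, None, 1.0]
x = [to_interval(v, lo, hi) for v, (lo, hi) in zip(instance, ranges)]
out = sub(mul(x[0], x[1]), x[2])

print(out, classify(out))
```

The output interval bounds every value the expression could take over the missing feature's range, which is why no separate imputation step is needed at classification time.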

A Model of Domain Knowledge Content Updating Based on Management Information Interactions

Informacijos mokslai ◽

10.15388/im.2019.85.17 ◽

2019 ◽

Vol 85 ◽

pp. 69-97

Author(s):

Jurij Tekutov ◽

Saulius Gudas ◽

Vitalijus Denisovas ◽

Julija Smirnova

Keyword(s):

Value Chain ◽

Domain Knowledge ◽

Chain Model ◽

Management Information ◽

Process Measures ◽

Knowledge Database ◽

Knowledge Based ◽

Educational Domain ◽

Case System ◽

Knowledge Content

This paper formally describes the hierarchical Detailed Value Chain Model (DVCM) and the Elementary Management Cycle (EMC) model of educational domain knowledge content updating, and proposes computerized process measures. The paper provides a method for updating the knowledge of the analyzed domain, referred to as the “enterprise domain,” based on enterprise modelling in terms of management information interactions. The formal DVCM and EMC descriptions of the method are provided in BPMN notation, which makes it possible to develop a two-level (granular) model for describing the knowledge of educational domain management information interactions. To implement this model and its algorithms in technological terms, a subsystem of enterprise knowledge has been created in a knowledge-based CASE system (computerized knowledge-based IS engineering), which performs the function of a domain knowledge database.

Knowledge-Based Configuration of Operating Systems — Problems in Modeling the Domain Knowledge

Wissensbasierte Systeme - Informatik-Fachberichte ◽

10.1007/978-3-642-70840-4_10 ◽

1985 ◽

pp. 121-134 ◽

Cited By ~ 7

Author(s):

Hans Haugeneder ◽

Egbert Lehmann ◽

Peter Struß

Keyword(s):

Operating Systems ◽

Domain Knowledge ◽

Knowledge Based
