An Engineering Domain Knowledge-Based Framework for Modelling Highly Incomplete Industrial Data

2021 ◽  
Vol 17 (4) ◽  
pp. 48-66
Author(s):  
Han Li ◽  
Zhao Liu ◽  
Ping Zhu

Missing values in industrial data restrict its applications. Although such incomplete data contains enough information for engineers to support subsequent development, it still has too many missing values for algorithms to establish precise models. This is because engineering domain knowledge is not considered, so valuable information is not fully captured. Therefore, this article proposes an engineering domain knowledge-based framework for modelling incomplete industrial data. The raw datasets are partitioned and processed at different scales. First, hierarchical features are combined to decrease the missing ratio. To fill the missing values in special data, which is identified for classifying the samples, samples with only part of the features present are fully utilized, rather than removed, to establish a local imputation model. The samples are then divided into different groups to transfer the information. A series of industrial datasets is analyzed to verify the feasibility of the proposed method.
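As a rough illustration of how combining hierarchical features can lower the missing ratio, the sketch below merges two child features into one parent feature. The abstract does not state the aggregation rule, so the mean of the observed children is an assumption, and the feature names and values are hypothetical.

```python
from typing import Optional

Row = dict[str, Optional[float]]


def missing_ratio(rows: list[Row], features: list[str]) -> float:
    """Fraction of (row, feature) cells that are missing (None)."""
    total = len(rows) * len(features)
    missing = sum(1 for r in rows for f in features if r.get(f) is None)
    return missing / total if total else 0.0


def combine_children(rows: list[Row], parent: str, children: list[str]) -> list[Row]:
    """Replace child features with one parent feature.

    Assumption: the parent value is the mean of the observed children;
    it stays missing only when every child is missing.
    """
    combined = []
    for r in rows:
        observed = [r[c] for c in children if r.get(c) is not None]
        new_row = {k: v for k, v in r.items() if k not in children}
        new_row[parent] = sum(observed) / len(observed) if observed else None
        combined.append(new_row)
    return combined


# Hypothetical samples: two child thickness features and one mass feature.
rows = [
    {"panel_a_thickness": 1.2, "panel_b_thickness": None, "mass": 10.0},
    {"panel_a_thickness": None, "panel_b_thickness": 1.4, "mass": None},
    {"panel_a_thickness": None, "panel_b_thickness": None, "mass": 12.0},
]
before = missing_ratio(rows, ["panel_a_thickness", "panel_b_thickness", "mass"])
merged = combine_children(rows, "panel_thickness",
                          ["panel_a_thickness", "panel_b_thickness"])
after = missing_ratio(merged, ["panel_thickness", "mass"])
print(f"missing ratio before: {before:.3f}, after: {after:.3f}")
```

In this toy data the merged table has fewer missing cells per feature because a parent value is available whenever at least one child was observed.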

Mathematics ◽  
2021 ◽  
Vol 9 (7) ◽  
pp. 786
Author(s):  
Yenny Villuendas-Rey ◽  
Eley Barroso-Cubas ◽  
Oscar Camacho-Nieto ◽  
Cornelio Yáñez-Márquez

Swarm intelligence has emerged as an active field for solving numerous machine-learning tasks. In this paper, we address the problem of clustering data with missing values, where the patterns are described by mixed (or hybrid) features. We introduce a generic modification to three swarm intelligence algorithms (Artificial Bee Colony, Firefly Algorithm, and Novel Bat Algorithm). We experimentally obtain adequate parameter values for these three modified algorithms, with the purpose of applying them to the clustering task. We also provide an unbiased comparison among several metaheuristic-based clustering algorithms, concluding that the clusters obtained by our proposals are highly representative of the “natural structure” of data.
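Clustering mixed data with missing values needs a dissimilarity that tolerates both. The abstract does not name the one used, so the sketch below assumes the Heterogeneous Euclidean-Overlap Metric (HEOM) as a stand-in: overlap distance for categorical features, range-normalised difference for numeric ones, and the maximum distance of 1 whenever a value is missing. The sample values are hypothetical.

```python
import math
from typing import Optional, Sequence, Union

Value = Optional[Union[float, str]]


def heom_distance(a: Sequence[Value], b: Sequence[Value],
                  numeric_ranges: Sequence[Optional[float]]) -> float:
    """HEOM-style distance between two patterns.

    numeric_ranges[i] is (max - min) of feature i over the dataset when the
    feature is numeric, or None when the feature is categorical.
    """
    total = 0.0
    for x, y, rng in zip(a, b, numeric_ranges):
        if x is None or y is None:
            d = 1.0                      # missing value: maximal per-feature distance
        elif rng is None:
            d = 0.0 if x == y else 1.0   # categorical: overlap distance
        else:
            d = abs(x - y) / rng if rng > 0 else 0.0  # numeric: normalised difference
        total += d * d
    return math.sqrt(total)


# Hypothetical patterns: (numeric, categorical, numeric with a missing value).
p = (1.0, "red", None)
q = (3.0, "red", 5.0)
ranges = (4.0, None, 10.0)
dist = heom_distance(p, q, ranges)
print(round(dist, 4))
```

Any of the swarm-based optimisers could use such a function to score candidate cluster centres, although how the authors actually do so is not described in the abstract.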


1997 ◽  
Vol 08 (03) ◽  
pp. 301-315 ◽  
Author(s):  
Marcel J. Nijman ◽  
Hilbert J. Kappen

A Radial Basis Boltzmann Machine (RBBM) is a specialized Boltzmann Machine architecture that combines feed-forward mapping with probability estimation in the input space, and for which very efficient learning rules exist. The hidden representation of the network displays symmetry breaking as a function of the noise in the dynamics. Thus, generalization can be studied as a function of the noise in the neuron dynamics instead of as a function of the number of hidden units. We show that the RBBM can be seen as an elegant alternative to k-nearest neighbor, leading to comparable performance without the need to store all data. We show that the RBBM has good classification performance compared to the MLP. The main advantage of the RBBM is that, simultaneously with the input-output mapping, a model of the input space is obtained which can be used for learning with missing values. We derive learning rules for the case of incomplete data, and show that they perform better on incomplete data than the traditional learning rules on a 'repaired' data set.


Author(s):  
Alexander Kott ◽  
Gerald Agin ◽  
Dave Fawcett

Configuration is a process of generating a definitive description of a product or an order that satisfies a set of specified requirements and known constraints. Knowledge-based technology is an enabling factor in automation of configuration tasks found in the business operation. In this paper, we describe a configuration technique that is well suited for configuring “decomposable” artifacts with reasonably well defined structure and constraints. This technique may be classified as a member of a general class of decompositional approaches to configuration. The domain knowledge is structured as a general model of the artifact, an and-or hierarchy of the artifact’s elements, features, and characteristics. The model includes constraints and local specialists which are attached to the elements of the and-or-tree. Given the specific configuration requirements, the problem solving engine searches for a solution, a subtree, that satisfies the requirements and the applicable constraints. We describe an application of this approach that performs configuration and design of an automotive component.
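The search over an and-or hierarchy can be sketched as follows. This is a minimal illustration, not the authors' engine: it enumerates solution subtrees exhaustively instead of using local specialists, and the `Node` class, the `configure` function, and the seat example are all hypothetical.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Callable, Iterator, Optional


@dataclass
class Node:
    name: str
    kind: str = "leaf"                      # "and", "or", or "leaf"
    children: list["Node"] = field(default_factory=list)


def solutions(node: Node) -> Iterator[list[str]]:
    """Yield every solution subtree as a list of selected node names."""
    if node.kind == "leaf":
        yield [node.name]
    elif node.kind == "or":                 # choose exactly one child
        for child in node.children:
            for sub in solutions(child):
                yield [node.name] + sub
    else:                                   # "and": every child is required
        per_child = [list(solutions(c)) for c in node.children]
        for combo in product(*per_child):
            yield [node.name] + [n for sub in combo for n in sub]


def configure(root: Node,
              constraints: list[Callable[[list[str]], bool]]) -> Optional[list[str]]:
    """Return the first solution subtree satisfying all constraints, else None."""
    for candidate in solutions(root):
        if all(check(candidate) for check in constraints):
            return candidate
    return None


# Hypothetical artifact model: a seat needs a frame and a cover.
seat = Node("seat", "and", [
    Node("frame", "or", [Node("steel_frame"), Node("aluminium_frame")]),
    Node("cover", "or", [Node("cloth_cover"), Node("leather_cover")]),
])
constraints = [
    lambda s: "aluminium_frame" in s,                              # requirement
    lambda s: not ("aluminium_frame" in s and "cloth_cover" in s), # compatibility
]
result = configure(seat, constraints)
print(result)
```

Exhaustive enumeration is only practical for small trees; attaching constraints to individual nodes, as the abstract describes, would let a real engine prune branches early.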


2018 ◽  
Vol 36 (6) ◽  
pp. 1027-1042 ◽  
Author(s):  
Quan Lu ◽  
Jiyue Zhang ◽  
Jing Chen ◽  
Ji Li

Purpose: This paper aims to examine the effect of domain knowledge on eye-tracking measures and to predict readers’ domain knowledge from these measures in a navigational table of contents (N-TOC) system.
Design/methodology/approach: A controlled experiment with three reading tasks was conducted in an N-TOC system with 24 postgraduates of Wuhan University. Data including fixation duration, fixation count and inter-scanning transitions were collected and calculated. Participants’ domain knowledge was measured by pre-experiment questionnaires. Logistic regression analysis was used to build the prediction model, and the model’s performance was evaluated against a baseline model.
Findings: The results showed that novices spent significantly more time fixating on the text area than experts, because of the difficulty of understanding the information in the text area. Total fixation duration on text area (TFD_T) was a significantly negative predictor of domain knowledge. The logistic regression model using eye-tracking measures performed better than the baseline model, with accuracy, precision and F(β = 1) scores of 0.71, 0.86 and 0.79, respectively.
Originality/value: Little research has been reported in the literature on the effect of domain knowledge on eye-tracking measures during reading or on the prediction of domain knowledge from eye-tracking measures. Most studies focus on multimedia learning. With respect to the prediction of domain knowledge, only some studies are found in the field of information search. This paper contributes to the literature on the effect of domain knowledge on eye-tracking measures during N-TOC reading and on predicting domain knowledge.
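The reported precision and F(β = 1) score together imply a recall value, which the abstract does not state. The short sketch below works it out from the standard F-score definition; because the published figures are rounded, the result is only approximate. The function names are illustrative.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Standard F-beta score: (1 + b^2) * P * R / (b^2 * P + R)."""
    b2 = beta * beta
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0


def implied_recall(precision: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for R, giving R = F1 * P / (2P - F1)."""
    return f1 * precision / (2 * precision - f1)


# Figures reported in the abstract (rounded to two decimals).
precision, f1 = 0.86, 0.79
recall = implied_recall(precision, f1)
print(f"implied recall is roughly {recall:.2f}")
```

A recall below the precision suggests the model misses more true positives than it raises false alarms, though which class the authors treat as positive is not stated in the abstract.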


Author(s):  
T. Ravindra Babu ◽  
M. Narasimha Murty ◽  
S. V. Subrahmanya

2021 ◽  
Author(s):  
Cao Truong Tran

Classification is a major task in machine learning and data mining. Many real-world datasets suffer from the unavoidable issue of missing values. Classification with incomplete data has to be carefully handled because inadequate treatment of missing values will cause large classification errors.
Most existing research on classification with incomplete data has focused on improving effectiveness, but has not adequately addressed the efficiency of applying the classifiers to classify unseen instances, which is much more important than the act of creating classifiers. A common approach to classification with incomplete data is to use imputation methods to replace missing values with plausible values before building classifiers and classifying unseen instances. This approach provides complete data which can then be used by any classification algorithm, but sophisticated imputation methods are usually computationally intensive, especially for the application process of classification. Another approach to classification with incomplete data is to build a classifier that can directly work with missing values. This approach does not require time for estimating missing values, but it often generates inaccurate and complex classifiers when faced with numerous missing values. A recent approach to classification with incomplete data which also avoids estimating missing values is to build a set of classifiers, from which applicable classifiers are then selected for classifying unseen instances. However, this approach is also often inaccurate and takes a long time to find applicable classifiers when faced with numerous missing values.
The overall goal of the thesis is to simultaneously improve the effectiveness and efficiency of classification with incomplete data by using evolutionary machine learning techniques for feature selection, clustering, ensemble learning, feature construction and constructing classifiers.
The thesis develops approaches for improving imputation for classification with incomplete data by integrating clustering and feature selection with imputation. The approaches improve both the effectiveness and the efficiency of using imputation for classification with incomplete data.
The thesis develops wrapper-based feature selection methods to improve the input space for classification algorithms that are able to work directly with incomplete data. The methods not only improve the classification accuracy, but also reduce the complexity of classifiers able to work directly with incomplete data.
The thesis develops a feature construction method to improve the input space for classification algorithms with incomplete data by proposing interval genetic programming, that is, genetic programming with a set of interval functions. The method improves the classification accuracy and reduces the complexity of classifiers.
The thesis develops an ensemble approach to classification with incomplete data by integrating imputation, feature selection, and ensemble learning. The results show that the approach is more accurate and faster than previous common methods for classification with incomplete data.
The thesis develops interval genetic programming to directly evolve classifiers for incomplete data. The results show that classifiers generated by interval genetic programming can be more effective and efficient than classifiers generated by the combination of imputation and traditional genetic programming. Interval genetic programming is also more effective than common classification algorithms able to work directly with incomplete data.
In summary, the thesis develops a range of approaches for simultaneously improving the effectiveness and efficiency of classification with incomplete data by using a range of evolutionary machine learning techniques.
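To give a feel for what interval functions over incomplete data might look like, the sketch below evaluates a small arithmetic expression with interval arithmetic. It assumes that a missing value is represented by the interval spanning the feature's observed minimum and maximum, while a known value becomes a degenerate interval. That representation, the helper names, and the example expression are assumptions made here for illustration and are not taken from the abstract.

```python
from typing import Optional

Interval = tuple[float, float]


def to_interval(value: Optional[float], lo: float, hi: float) -> Interval:
    """Known value -> degenerate interval; missing value -> feature range [lo, hi]."""
    return (lo, hi) if value is None else (value, value)


def add(a: Interval, b: Interval) -> Interval:
    return (a[0] + b[0], a[1] + b[1])


def sub(a: Interval, b: Interval) -> Interval:
    return (a[0] - b[1], a[1] - b[0])


def mul(a: Interval, b: Interval) -> Interval:
    products = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(products), max(products))


# Hypothetical instance with three features; x1 is missing.
x0 = to_interval(2.0, 0.0, 5.0)
x1 = to_interval(None, 1.0, 3.0)
x2 = to_interval(1.0, 0.0, 4.0)

# Hypothetical evolved expression: x0 * x1 - x2
out = sub(mul(x0, x1), x2)
print(out)
```

The output is itself an interval, so no imputation step is needed before evaluation; a classifier built this way would still need a rule for turning the interval into a class label, which is not shown here.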


2019 ◽  
Vol 85 ◽  
pp. 69-97
Author(s):  
Jurij Tekutov ◽  
Saulius Gudas ◽  
Vitalijus Denisovas ◽  
Julija Smirnova

This paper formally describes the hierarchical Detailed Value Chain Model (DVCM) and the Elementary Management Cycle (EMC) model for updating educational domain knowledge content, and proposes computerized process measures. It provides a method for updating the knowledge of the analyzed domain, referred to as the “enterprise domain,” based on enterprise modelling in terms of management information interactions. The formal DVCM and EMC descriptions of the method are provided in BPMN notation, which allows a two-level (granular) model to be developed for describing the knowledge of educational domain management information interactions. To implement this model and its algorithms, a subsystem of enterprise knowledge has been created in a knowledge-based CASE system (computerized knowledge-based IS engineering), which serves as a domain knowledge database.

