Model Bias Characterization Considering Discrete and Continuous Design Variables

Author(s):  
Xiangxue Zhao ◽  
Zhimin Xi ◽  
Hongyi Xu ◽  
Ren-Jye Yang

Model bias can normally be modeled as a regression model that predicts potential model errors in the design space, provided sufficient training data sets are available. Typically, only continuous design variables are considered, since the regression model is mainly designed for response approximation in a continuous space. In reality, many engineering problems have discrete design variables mixed with continuous ones. Although a regression model of the model bias can still approximate the model errors under various design/operation conditions, the accuracy of the bias model degrades quickly as the number of discrete design variables increases. This paper proposes an effective model bias modeling strategy to better approximate the potential model errors in the design/operation space. The essential idea is to first determine an optimal base model from all combination models derived from the discrete design variables, then allocate the majority of the bias training samples to this base model, and finally build relationships between the base model and the other combination models. Two engineering examples demonstrate that the proposed approach achieves better bias modeling accuracy than the traditional regression modeling approach. Furthermore, bias modeling combined with the baseline simulation model is shown to achieve higher model accuracy than the direct meta-modeling approach using the same amount of training data.
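The base-model idea can be illustrated with a minimal sketch. Its assumptions are not taken from the paper. The bias (experiment minus simulation) of the base discrete combination is fit as a quadratic in one continuous variable from most of the samples. Each other combination k is then assumed to follow a linear relation b_k(x) = a·b_base(x) + c, fit from only a few samples. The quadratic form, the linear relation, all function names, and the numbers are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the small square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def least_squares(rows: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares via the normal equations (adequate for tiny, well-conditioned problems)."""
    p = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve_linear(ata, aty)


def fit_base_bias(x: Sequence[float], bias: Sequence[float]) -> Callable[[float], float]:
    """Hypothetical base-model bias: a quadratic in one continuous variable, fit from the bulk of the samples."""
    c0, c1, c2 = least_squares([[1.0, xi, xi * xi] for xi in x], bias)
    return lambda t: c0 + c1 * t + c2 * t * t


def fit_relation(base: Callable[[float], float], x: Sequence[float],
                 bias: Sequence[float]) -> Tuple[float, float]:
    """Assumed relation b_k(x) = a * b_base(x) + c for another discrete combination k, fit from a few samples."""
    a, c = least_squares([[base(xi), 1.0] for xi in x], bias)
    return a, c


# Usage with made-up numbers: five bias samples for the base combination, two for combination k.
base = fit_base_bias([0.0, 1.0, 2.0, 3.0, 4.0], [0.5, 0.8, 1.3, 2.0, 2.9])
a_k, c_k = fit_relation(base, [1.0, 3.0], [1.9, 4.3])
simulated = 10.0  # hypothetical baseline simulation output at x = 2.5
corrected = simulated + a_k * base(2.5) + c_k  # simulation plus predicted bias for combination k
```

The final line mirrors the abstract's closing claim: the bias model corrects the baseline simulation rather than replacing it. The cheap combination k borrows the shape of its bias from the well-sampled base model.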

2021 ◽  
pp. 1-12
Author(s):  
Eamon Whalen ◽  
Caitlin Mueller

Surrogate models are often employed to speed up engineering design optimization; however, they typically require that all training data conform to the same parametrization (e.g., design variables), limiting design freedom and prohibiting the reuse of historical data. In response, this paper proposes Graph-based Surrogate Models (GSMs) for space frame structures. Given the structure's geometry as input, the GSM can accurately predict displacement fields under static loads, enabling training across multiple parametrizations. GSMs build upon recent advances in geometric deep learning that have made it possible to learn on undirected graphs, a natural representation for space frames. To further promote flexible surrogate models, the paper explores transfer learning in the context of engineering design and demonstrates positive knowledge transfer across data sets of different topologies, complexities, loads, and applications. The result is more flexible and data-efficient surrogate models for space frame structures.
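The sketch below makes "undirected graphs: a natural representation for space frames" concrete. It is not the paper's GSM architecture. Joints become nodes that carry their coordinates, and bars become undirected edges. A single round of mean-aggregation message passing then blends each node's features with those of its neighbours. A real GSM would use learned weights and several layers. The fixed self-weight of 0.5 and the function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vec = Tuple[float, ...]


def build_adjacency(num_nodes: int, bars: Sequence[Tuple[int, int]]) -> Dict[int, List[int]]:
    """Undirected adjacency list: each bar (i, j) links joint i and joint j in both directions."""
    adj: Dict[int, List[int]] = {i: [] for i in range(num_nodes)}
    for i, j in bars:
        adj[i].append(j)
        adj[j].append(i)
    return adj


def message_pass(features: Sequence[Vec], adj: Dict[int, List[int]],
                 self_weight: float = 0.5) -> List[Vec]:
    """One mean-aggregation step: h_i' = w * h_i + (1 - w) * mean of neighbour features."""
    out: List[Vec] = []
    for i, h in enumerate(features):
        nbrs = adj[i]
        if not nbrs:
            out.append(tuple(h))  # an isolated joint keeps its own features
            continue
        dim = len(h)
        mean = [sum(features[j][d] for j in nbrs) / len(nbrs) for d in range(dim)]
        out.append(tuple(self_weight * h[d] + (1 - self_weight) * mean[d] for d in range(dim)))
    return out


# A three-bar triangular frame; node features are joint coordinates.
adj = build_adjacency(3, [(0, 1), (1, 2), (0, 2)])
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
h = message_pass(coords, adj)
```

The mean over neighbours does not depend on how neighbours are ordered or how many there are. For that reason, the same update applies to frames of any topology, which is what allows training across parametrizations.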


2021 ◽  
Vol 16 (1) ◽  
pp. 1-24
Author(s):  
Yaojin Lin ◽  
Qinghua Hu ◽  
Jinghua Liu ◽  
Xingquan Zhu ◽  
Xindong Wu

In multi-label learning, label correlations commonly exist in the data. Such correlations not only provide useful information but also pose significant challenges for multi-label learning. Recently, label-specific feature embedding has been proposed to extract label-specific features from the training data and to use features highly customized to the multi-label set for learning. While such feature embedding methods have demonstrated good performance, the feature embedding space is created from a single label only, without considering label correlations in the data. In this article, we propose to combine multiple label-specific feature spaces, using label correlation, for multi-label learning. The proposed algorithm, multi-label-specific feature space ensemble (MULFE), takes into consideration label-specific features, label correlation, and the weighted ensemble principle to form a learning framework. By conducting clustering analysis on each label's negative and positive instances, MULFE first creates features customized to each label. MULFE then uses the label correlation to optimize the margin distribution of the base classifiers induced by the related label-specific feature spaces. By combining multiple label-specific features, label-correlation-based weighting, and ensemble learning, MULFE achieves the goal of maximum-margin multi-label classification through the underlying optimization framework. Empirical studies on 10 public data sets demonstrate the effectiveness of MULFE.
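The clustering step can be sketched in the style of LIFT (Zhang and Wu), a common way to build label-specific features. The assumption that MULFE follows exactly this construction is ours, not the paper's. For one label, k-means is run separately on the label's positive and negative instances. Each instance is then re-represented by its Euclidean distances to all resulting centroids. The deterministic initialization (first k points), the fixed k, and the function names are simplifications for illustration.

```python
import math
from typing import List, Sequence

Point = List[float]


def kmeans(points: Sequence[Point], k: int, iters: int = 20) -> List[Point]:
    """Plain Lloyd's k-means; initial centroids are the first k points (deterministic, for illustration)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups: List[List[Point]] = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            groups[nearest].append(list(p))
        for c, members in enumerate(groups):
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def label_specific_features(x: Sequence[Point], y: Sequence[int], k: int) -> List[Point]:
    """Map each instance to its distances from the centroids of one label's positive and negative clusters."""
    pos = [p for p, lab in zip(x, y) if lab == 1]
    neg = [p for p, lab in zip(x, y) if lab == 0]
    centroids = kmeans(pos, min(k, len(pos))) + kmeans(neg, min(k, len(neg)))
    return [[math.dist(p, c) for c in centroids] for p in x]


X = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
Y = [1, 1, 0, 0]  # membership of each instance in one label
feats = label_specific_features(X, Y, k=1)
```

Repeating this for every label yields one feature space per label. An ensemble method can then weight the classifiers trained in those spaces, for example using label correlation.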


Sensors ◽  
2021 ◽  
Vol 21 (5) ◽  
pp. 1573
Author(s):  
Loris Nanni ◽  
Giovanni Minchio ◽  
Sheryl Brahnam ◽  
Gianluca Maguolo ◽  
Alessandra Lumini

Traditionally, classifiers are trained to predict patterns within a feature space. The image classification system presented here instead trains classifiers to predict patterns within a vector space by combining the dissimilarity spaces generated by a large set of Siamese Neural Networks (SNNs). A set of centroids is calculated from the patterns in the training data sets with supervised k-means clustering. The centroids are used to generate the dissimilarity space via the Siamese networks. The vector space descriptors are extracted by projecting patterns onto the similarity spaces, and SVMs classify an image by its dissimilarity vector. The versatility of the proposed approach in image classification is demonstrated by evaluating the system on different types of images across two domains: two medical data sets and two animal audio data sets with vocalizations represented as images (spectrograms). Results show that the proposed system competes with the best-performing methods in the literature, obtaining state-of-the-art performance on one of the medical data sets, and does so without ad hoc optimization of the clustering methods on the tested data sets.
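The dissimilarity-space projection can be sketched with fixed distance functions standing in for trained Siamese networks. This is an assumption: in the paper, the SNNs supply the dissimilarities. "Supervised" clustering is read here as computing centroids separately for each class. For brevity, the sketch uses one centroid per class (the class mean) instead of k-means with k > 1 and omits the SVM stage. Euclidean and Manhattan distances play the role of two different networks, showing how several dissimilarity spaces are concatenated into one descriptor.

```python
import math
from typing import Callable, Dict, List, Sequence

Point = Sequence[float]
Dissimilarity = Callable[[Point, Point], float]


def class_centroids(x: Sequence[Point], y: Sequence[str]) -> List[List[float]]:
    """One centroid per class (the class mean), in sorted class order; a stand-in for supervised k-means."""
    groups: Dict[str, List[Point]] = {}
    for p, label in zip(x, y):
        groups.setdefault(label, []).append(p)
    return [[sum(col) / len(ps) for col in zip(*ps)] for _, ps in sorted(groups.items())]


def euclidean(a: Point, b: Point) -> float:
    return math.dist(a, b)


def manhattan(a: Point, b: Point) -> float:
    return sum(abs(u - v) for u, v in zip(a, b))


def dissimilarity_vector(p: Point, centroids: Sequence[Point],
                         measures: Sequence[Dissimilarity]) -> List[float]:
    """Concatenate, over all dissimilarity measures, the distances from p to every centroid."""
    return [d(p, c) for d in measures for c in centroids]


centroids = class_centroids([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0]], ["a", "a", "b"])
vec = dissimilarity_vector([1.0, 0.0], centroids, [euclidean, manhattan])
```

The resulting vector (one entry per centroid per measure) is what a downstream classifier such as an SVM would receive in place of the raw image features.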


2005 ◽  
Vol 01 (01) ◽  
pp. 129-145 ◽  
Author(s):  
Xiaobo Zhou ◽  
Xiaodong Wang ◽  
Edward R. Dougherty

In microarray-based cancer classification, gene selection is an important issue owing to the large number of variables (gene expressions) and the small number of experimental conditions. Many gene-selection and classification methods have been proposed; however, most of these treat gene selection and classification separately rather than under the same model. We propose a Bayesian approach to gene selection using the logistic regression model. The Akaike information criterion (AIC), the Bayesian information criterion (BIC), and the minimum description length (MDL) principle are used in constructing the posterior distribution of the chosen genes. The same logistic regression model is then used for cancer classification. Fast implementation issues for these methods are discussed. The proposed methods are tested on several data sets, including those arising from hereditary breast cancer, small round blue-cell tumors, lymphoma, and acute leukemia. The experimental results indicate that the proposed methods achieve high classification accuracy on these data sets. Some robustness and sensitivity properties of the proposed methods are also discussed. Finally, combining logistic-regression-based gene selection with other classification methods, and logistic-regression-based classification with other gene-selection methods, is considered.
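The sketch below shows how an information criterion can score candidate gene subsets under a logistic regression model. Several details are assumptions rather than the paper's method. The model is fit by plain batch gradient ascent. Only BIC = k ln n − 2 ln L is shown; AIC differs only in using the penalty 2k. Subset weights are approximated as proportional to exp(−BIC/2), a standard large-sample approximation to posterior probability. The paper's own posterior construction and fast implementation are not reproduced, and the toy data are invented.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def _linear(w: List[float], xi: Sequence[float]) -> float:
    """Intercept w[0] plus the dot product of the remaining weights with the features."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))


def fit_logistic(x: Matrix, y: Sequence[int], lr: float = 0.5,
                 steps: int = 2000) -> Tuple[List[float], float]:
    """Fit by batch gradient ascent on the mean log-likelihood; return (weights, log-likelihood)."""
    n, p = len(y), len(x[0])
    w = [0.0] * (p + 1)
    for _ in range(steps):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(x, y):
            err = yi - 1.0 / (1.0 + math.exp(-_linear(w, xi)))
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    loglik = 0.0
    for xi, yi in zip(x, y):
        prob = 1.0 / (1.0 + math.exp(-_linear(w, xi)))
        loglik += math.log(prob) if yi == 1 else math.log(1.0 - prob)
    return w, loglik


def bic(loglik: float, num_params: int, n: int) -> float:
    """Bayesian information criterion, k ln n - 2 ln L (lower is better)."""
    return num_params * math.log(n) - 2.0 * loglik


def subset_weights(x: Matrix, y: Sequence[int], max_size: int) -> Dict[Tuple[int, ...], float]:
    """Weight each gene subset by exp(-BIC / 2), normalized over all subsets tried."""
    bics: Dict[Tuple[int, ...], float] = {}
    for size in range(1, max_size + 1):
        for genes in combinations(range(len(x[0])), size):
            sub = [[row[g] for g in genes] for row in x]
            _, loglik = fit_logistic(sub, y)
            bics[genes] = bic(loglik, size + 1, len(y))  # +1 for the intercept
    best = min(bics.values())
    raw = {g: math.exp(-(b - best) / 2.0) for g, b in bics.items()}
    total = sum(raw.values())
    return {g: r / total for g, r in raw.items()}


# Invented toy data: gene 0 is informative, gene 1 is pure noise.
X = [[2.0, 1.0], [1.5, -1.0], [1.0, 1.0], [-0.5, -1.0],
     [-2.0, 1.0], [-1.5, -1.0], [-1.0, 1.0], [0.5, -1.0]]
Y = [1, 1, 1, 1, 0, 0, 0, 0]
weights = subset_weights(X, Y, max_size=2)
```

In this toy data, adding the noise gene leaves the fitted likelihood unchanged while raising the penalty by ln n. The ln n penalty is what lets BIC-style criteria keep selected gene sets small when genes far outnumber samples.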


2006 ◽  
Vol 36 (5) ◽  
pp. 1129-1138 ◽  
Author(s):  
Jennifer L. Rooker Jensen ◽  
Karen S. Humes ◽  
Tamara Conner ◽  
Christopher J. Williams ◽  
John DeGroot

Although lidar data are widely available from commercial contractors, operational use in North America is still limited both by cost and by uncertainty about large-scale application and the associated model accuracy. We analyzed whether small-footprint lidar data obtained from five noncontiguous geographic areas with varying species and structural composition, silvicultural practices, and topography could be used in a single regression model to produce accurate estimates of commonly obtained forest inventory attributes on the Nez Perce Reservation in northern Idaho, USA. Lidar-derived height metrics were used as predictor variables in a best-subset multiple linear regression procedure to determine whether a suite of stand inventory variables could be accurately estimated. Empirical relationships between lidar-derived height metrics and field-measured dependent variables were developed with training data, and acceptable models were validated with an independent subset. Models were then fit with all data. For seven biophysical characteristics, this yielded the following coefficients of determination and root mean square errors, respectively: maximum canopy height (0.91, 3.03 m), mean canopy height (0.79, 2.64 m), quadratic mean DBH (0.61, 6.31 cm), total basal area (0.91, 2.99 m²/ha), ellipsoidal crown closure (0.80, 0.08%), total wood volume (0.93, 24.65 m³/ha), and large saw-wood volume (0.75, 28.76 m³/ha). Although these regression models cannot be generalized to other sites without additional testing, the results suggest that for these types of mixed-conifer forests, some biophysical characteristics can be adequately estimated using a single regression model over stands with highly variable structural characteristics and topography.
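Below is a minimal sketch of best-subset multiple linear regression scored by R², with RMSE reported alongside, matching the (R², RMSE) pairs above. The exhaustive search within one subset size, the use of R² as the selection score, the RMSE denominator of n, and the small normal-equation solver are all illustrative assumptions. The study's actual selection criteria and software are not stated here. The height metrics and response values are invented.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small square system a @ x = b."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(x: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Least-squares coefficients [intercept, b1, ..., bp] from the normal equations."""
    rows = [[1.0] + list(r) for r in x]
    p = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(ata, aty)


def r2_rmse(y: Sequence[float], yhat: Sequence[float]) -> Tuple[float, float]:
    """Coefficient of determination and root mean square error (RMSE divides by n here)."""
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / len(y))


def best_subset(x: Sequence[Sequence[float]], y: Sequence[float],
                size: int) -> Tuple[Tuple[int, ...], float, float]:
    """Try every predictor subset of the given size; keep the one with the highest R^2."""
    best: Tuple[Tuple[int, ...], float, float] = ((), -math.inf, math.inf)
    for cols in combinations(range(len(x[0])), size):
        sub = [[row[c] for c in cols] for row in x]
        coef = ols(sub, y)
        yhat = [coef[0] + sum(bj * v for bj, v in zip(coef[1:], row)) for row in sub]
        r2, rmse = r2_rmse(y, yhat)
        if r2 > best[1]:
            best = (cols, r2, rmse)
    return best


# Invented example: two hypothetical lidar height metrics; the response is exactly 3 * metric0 + 1.
metrics = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]]
response = [4.0, 7.0, 10.0, 13.0, 16.0]
chosen, r2, rmse = best_subset(metrics, response, size=1)
```

Raw R² never decreases when predictors are added. Comparing subsets of different sizes would therefore need a penalized score such as adjusted R² or an information criterion, which is why this sketch compares only within one size.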


2021 ◽  
Vol 10 (1) ◽  
pp. 105
Author(s):  
I Gusti Ayu Purnami Indryaswari ◽  
Ida Bagus Made Mahendra

Many people in Indonesia, especially in Bali, raise pigs as livestock. Pigs are susceptible to various diseases, and many pig deaths caused by disease have resulted in losses for breeders. This study therefore aims to create an Android-based application that predicts the type of disease in pigs by applying the C4.5 algorithm. C4.5 is a decision-tree classification algorithm that derives rules from data, which are then used for prediction. In this study, 50 training records covering 8 types of pig disease and 31 disease symptoms were entered into the system, which processes the data so that the Android application can predict the type of disease in pigs. Testing on 15 test records produced an accuracy of 86.7%. Feature testing of the application, built with the Kotlin programming language and an SQLite database, showed that it runs as expected.
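C4.5 chooses the attribute to split on by gain ratio, which is the information gain of a split divided by the attribute's split information. A minimal sketch for categorical symptoms follows. The symptom and disease names are invented for illustration and are not the study's data. Recursive tree construction, pruning, and continuous attributes are omitted.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows: Sequence[Dict[str, str]], attr: str, target: str) -> float:
    """C4.5 split criterion: information gain of splitting on attr, divided by its split information."""
    n = len(rows)
    partitions: Dict[str, List[str]] = {}
    for r in rows:
        partitions.setdefault(r[attr], []).append(r[target])
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    gain = entropy([r[target] for r in rows]) - remainder
    split_info = -sum(len(p) / n * math.log2(len(p) / n) for p in partitions.values())
    return gain / split_info if split_info > 0 else 0.0


def best_attribute(rows: Sequence[Dict[str, str]], attrs: Sequence[str], target: str) -> str:
    """Attribute that C4.5 would place at the current node."""
    return max(attrs, key=lambda a: gain_ratio(rows, a, target))


# Invented toy records: "fever" separates the two diseases, "diarrhea" does not.
rows = [
    {"fever": "yes", "diarrhea": "no", "disease": "A"},
    {"fever": "yes", "diarrhea": "yes", "disease": "A"},
    {"fever": "no", "diarrhea": "no", "disease": "B"},
    {"fever": "no", "diarrhea": "yes", "disease": "B"},
]
root = best_attribute(rows, ["fever", "diarrhea"], "disease")
```

Dividing by split information is what distinguishes C4.5 from ID3's plain information gain. It penalizes attributes with many distinct values that would otherwise look artificially informative.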


Author(s):  
Yasuhisa Hattori ◽  
Hiromu Hashimoto ◽  
Masayuki Ochiai

The aim of this paper is to develop a general methodology for the optimum design of magnetic head sliders that improves the spacing characteristics between the head slider and disk surfaces under both static and dynamic operating conditions of a hard disk drive. A further aim is to present an application of the methodology to IBM 3380-type slider design. In the optimum design, the objective function is defined as the weighted sum of three terms: the minimum spacing, the maximum difference of spacing due to variation of the radial location of the head, and the maximum amplitude ratio of slider motion. Slider rail width, taper length, taper angle, suspension position, and preload are selected as the design variables. Before optimizing the magnetic head slider, the effects of these five design variables on the objective function are examined through a parametric study. The optimum design variables are then determined by applying a hybrid optimization technique that combines the direct search method and successive quadratic programming (SQP). The results demonstrate the effectiveness of the optimum design in improving the spacing characteristics of the magnetic head slider.
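Written out, the objective described above is a weighted sum of the three spacing measures, each a function of the five design variables. The symbols below are introduced here for illustration and are not taken from the paper. The abstract gives neither the values nor the signs of the weights, nor any normalization of the terms.

```latex
F(\mathbf{x}) = w_1\, h_{\min}(\mathbf{x}) + w_2\, \Delta h_{\max}(\mathbf{x}) + w_3\, A_{\max}(\mathbf{x}),
\qquad \mathbf{x} = (b,\; l_t,\; \theta_t,\; x_s,\; P)
```

Here h_min is the minimum spacing, Δh_max is the maximum difference of spacing due to variation of the radial location of the head, and A_max is the maximum amplitude ratio of slider motion. The design vector collects rail width b, taper length l_t, taper angle θ_t, suspension position x_s, and preload P.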


Author(s):  
Hilal Bahlawan ◽  
Mirko Morini ◽  
Michele Pinelli ◽  
Pier Ruggero Spina ◽  
Mauro Venturini

This paper documents the set-up and validation of nonlinear autoregressive exogenous (NARX) models of a heavy-duty single-shaft gas turbine. The gas turbine considered is a General Electric PG 9351FA located in Italy. The model training data are time series data sets of several different maneuvers recorded experimentally during the start-up procedure, covering cold, warm, and hot start-ups. The trained NARX models are used to predict other experimental data sets, and the model outputs are compared with the corresponding measured data. In particular, this paper addresses the challenge of setting up robust and reliable NARX models through a sound selection of training data sets and a sensitivity analysis on the number of neurons. Moreover, a new performance function for the training process is defined to give more weight to the most rapid transients. The final aim of this paper is to set up a powerful, easy-to-build, and highly accurate simulation tool with good generalization capability that can be used for both control logic tuning and gas turbine diagnostics.
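In one common form, a NARX model predicts the current output from past outputs and past exogenous inputs, y(t) = f(y(t−1), …, y(t−n_y), u(t−1), …, u(t−n_u)). The minimal sketch below assembles those lagged regressors. It also defines one possible transient-weighted error, in which each sample's squared error is scaled by 1 + α·|y(t) − y(t−1)|. The weighting formula, α, and the function names are assumptions. The paper's own performance function and neural network are not reproduced.

```python
from typing import List, Sequence, Tuple


def narx_regressors(y: Sequence[float], u: Sequence[float],
                    ny: int, nu: int) -> Tuple[List[List[float]], List[float]]:
    """Rows [y(t-1)..y(t-ny), u(t-1)..u(t-nu)] with target y(t), for every t with full history."""
    start = max(ny, nu)
    rows: List[List[float]] = []
    targets: List[float] = []
    for t in range(start, len(y)):
        past_y = [y[t - k] for k in range(1, ny + 1)]
        past_u = [u[t - k] for k in range(1, nu + 1)]
        rows.append(past_y + past_u)
        targets.append(y[t])
    return rows, targets


def transient_weighted_mse(y_true: Sequence[float], y_pred: Sequence[float],
                           alpha: float = 1.0) -> float:
    """Squared errors weighted by 1 + alpha * |y(t) - y(t-1)|; the first sample gets weight 1."""
    weights = [1.0] + [1.0 + alpha * abs(y_true[i] - y_true[i - 1]) for i in range(1, len(y_true))]
    total = sum(w * (a - b) ** 2 for w, a, b in zip(weights, y_true, y_pred))
    return total / sum(weights)


# Toy series: output y and exogenous input u (invented values).
rows, targets = narx_regressors([0.0, 1.0, 2.0, 3.0], [10.0, 11.0, 12.0, 13.0], ny=2, nu=1)
loss = transient_weighted_mse([0.0, 0.0, 4.0], [0.0, 0.0, 3.0], alpha=1.0)
```

In the toy loss, the only error falls on the step change, so it is weighted five times as heavily as a steady-state sample. The result is 5/7 instead of the unweighted 1/3, which pushes training toward fitting fast start-up transients.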

