A Maximum Entropy Approach for Uncertainty Quantification and Analysis of Multifunctional Materials

Author(s):  
Wei Gao ◽  
William S. Oates ◽  
Ralph C. Smith

The Maximum Entropy (ME) method is shown to provide a new approach for quantifying model uncertainty in the presence of complex, heterogeneous data. This is important in model validation of a variety of multifunctional constitutive relations. For example, multifunctional materials contain field-coupled material parameters that should be self-consistent regardless of the measurement. A classical example is piezoelectricity, which may be quantified either from the charge induced by stress or from the strain induced by an electric field. The proposed tools provide new statistical information to address measurement discrepancies, guide model development, and catalyze materials discovery for data fusion problems. The error between the model outputs and the heterogeneous data is quantified and used to formulate a second-moment constraint within the entropy functional. This leads to an augmented likelihood function that weights each individual data set by its respective variance and by the covariance between data sets. As a first step, the method is evaluated on a piezoelectric ceramic to illustrate how the covariance matrix influences piezoelectric parameter estimation from heterogeneous electric displacement and strain data.
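As a concrete illustration of the kind of covariance-weighted likelihood described above, the sketch below assembles one for a toy linear piezoelectric relation with a single coefficient d33 and two heterogeneous measurements (electric displacement under stress, strain under electric field). The model form, the variable names, and the assumption that both data sets share the same number of observations are illustrative only, not the authors' implementation.

```python
import numpy as np

def piezo_model(d33, stress, efield):
    """Toy linear piezoelectric relations (illustrative only):
    electric displacement D = d33 * sigma, strain S = d33 * E."""
    return d33 * stress, d33 * efield

def augmented_log_likelihood(d33, stress, efield, D_meas, S_meas):
    """Gaussian log-likelihood in which the two heterogeneous data sets are
    weighted by the inverse of their joint error covariance matrix."""
    D_pred, S_pred = piezo_model(d33, stress, efield)
    resid = np.column_stack([D_meas - D_pred, S_meas - S_pred])  # shape (n, 2)
    cov = np.cov(resid, rowvar=False)            # 2x2 variance/covariance of the errors
    cov_inv = np.linalg.inv(cov)
    quad = np.einsum('ni,ij,nj->', resid, cov_inv, resid)  # sum of r^T C^{-1} r
    n = resid.shape[0]
    return -0.5 * (quad + n * np.log(np.linalg.det(cov)) + 2 * n * np.log(2 * np.pi))
```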

Author(s):  
Wei Gao ◽  
William S. Oates ◽  
Paul R. Miles ◽  
Ralph C. Smith

Bayesian statistics is a quintessential tool for model validation in many applications, including smart materials, adaptive structures, and intelligent systems. It typically uses either experimental data or high-fidelity simulations to infer the parameter uncertainty of reduced-order models arising from experimental noise and from the homogenization of quantum or atomistic behavior. When heterogeneous data are available for Bayesian inference, open questions remain on appropriate methods to fuse the data and avoid inappropriate weighting of individual data sets. To address this issue, we implement a Bayesian statistical method that begins by maximizing entropy. We show how this method can automatically weight heterogeneous data during the inference process through the error covariance. This Maximum Entropy (ME) method is demonstrated by quantifying uncertainty in 1) a ferroelectric domain structure model and 2) a finite-deformation electrostrictive membrane model. The ferroelectric phase field model identifies continuum parameters from multiple density functional theory calculations. In the case of the electrostrictive membrane, parameters are estimated from both mechanical and electric displacement experimental measurements.
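A covariance-weighted likelihood of the kind sketched earlier can then drive a standard Bayesian sampler. The minimal random-walk Metropolis sketch below (flat prior, fixed step size, all numerical values hypothetical) only indicates where such a likelihood plugs into the inference; it is not the sampler used in the paper.

```python
import numpy as np

def metropolis(log_like, theta0, n_steps=5000, step=0.01, seed=0):
    """Random-walk Metropolis sampler for a scalar parameter (illustrative)."""
    rng = np.random.default_rng(seed)
    chain = np.empty(n_steps)
    theta, ll = theta0, log_like(theta0)
    for i in range(n_steps):
        prop = theta + step * rng.standard_normal()
        ll_prop = log_like(prop)
        if np.log(rng.uniform()) < ll_prop - ll:   # flat prior assumed
            theta, ll = prop, ll_prop
        chain[i] = theta
    return chain

# Hypothetical usage with the augmented likelihood sketched above:
# chain = metropolis(lambda d33: augmented_log_likelihood(d33, stress, efield,
#                                                         D_meas, S_meas),
#                    theta0=4e-10)
```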


Author(s):  
M. R. W. Brake ◽  
P. L. Reu ◽  
D. S. Aragon

The results of two sets of impact experiments are reported within. To assist with model development using the impact data reported, the materials are mechanically characterized using a series of standard experiments. The first set of impact data comes from a series of coefficient of restitution (COR) experiments, in which a 2 m long pendulum is used to study “in-context” measurements of the coefficient of restitution for eight different materials (6061-T6 aluminum, phosphor bronze alloy 510, Hiperco, Nitronic 60A, stainless steel 304, titanium, copper, and annealed copper). The coefficient of restitution is measured via two different techniques: digital image correlation (DIC) and laser Doppler vibrometry (LDV). Due to the strong agreement between the two methods, only the DIC results are reported. The coefficient of restitution experiments are “in context” in that the geometric scales and impact velocities are representative of common features in the motivating application for this research. Finally, a series of compliance measurements are detailed for the same set of materials. The compliance measurements are conducted using both nano-indentation and micro-indentation machines, providing sub-nm displacement resolution and μN force resolution. Good agreement is seen for load levels spanned by both machines. As the transition from elastic to plastic behavior occurs at contact displacements on the order of 30 nm, this data set provides unique insight into the transition region.
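For reference, the coefficient of restitution itself reduces to the ratio of rebound to approach speed at the contact. A minimal sketch of that calculation from a displacement time series (such as one obtained from DIC tracking) is shown below; the five-sample windows on either side of the impact time are an arbitrary illustrative choice, not the averaging used in the study.

```python
import numpy as np

def coefficient_of_restitution(time, displacement, t_impact):
    """Estimate COR as the ratio of rebound to approach speed, computed from a
    displacement time series (e.g., DIC tracking of the pendulum tip)."""
    velocity = np.gradient(displacement, time)
    approach = velocity[time < t_impact][-5:].mean()   # last samples before impact
    rebound = velocity[time > t_impact][:5].mean()     # first samples after impact
    return abs(rebound / approach)
```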


2018 ◽  
Vol 14 (4) ◽  
Author(s):  
Omkar Singh ◽  
Ramesh Kumar Sunkaria

Abstract Background This article proposes an extension of the empirical wavelet transform (EWT) algorithm to multivariate signals, specifically applied to cardiovascular physiological signals. Materials and methods EWT is a newly proposed algorithm for extracting the modes in a signal and is based on the design of an adaptive wavelet filter bank. The proposed algorithm finds an optimum signal in the multivariate data set based on a mode estimation strategy; its corresponding spectrum is then segmented and used to extract the modes across all channels of the data set. Results The proposed algorithm is able to find the common oscillatory modes within the multivariate data and can be applied to multichannel heterogeneous data with an unequal number of samples in different channels. The proposed algorithm was tested on different synthetic multivariate data and on a real trivariate physiological data series of electrocardiogram, respiration, and blood pressure to demonstrate its validity. Conclusions In this article, the EWT is extended to multivariate signals, and it is demonstrated that component-wise processing of multivariate data leads to the alignment of common oscillating modes across the components.
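The core idea, namely segmenting the spectrum once and applying the same band boundaries to every channel, can be sketched as follows. This toy version uses ideal FFT band-pass filters instead of the Meyer-type wavelet filter bank of the actual EWT, and assumes the band boundaries have already been obtained from the mode estimation step on the reference channel.

```python
import numpy as np

def multivariate_ewt_sketch(signals, boundaries, fs):
    """Extract the same spectral modes from every channel by sharing one set of
    frequency-band boundaries.

    signals    : array of shape (n_channels, n_samples)
    boundaries : shared band edges in Hz, e.g. [0.5, 4.0, 15.0]
    fs         : sampling frequency in Hz
    """
    n_channels, n_samples = signals.shape
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    edges = np.concatenate(([0.0], boundaries, [fs / 2]))
    modes = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (freqs >= lo) & (freqs < hi)      # ideal band-pass (EWT proper uses Meyer wavelets)
        band = np.zeros((n_channels, n_samples))
        for ch in range(n_channels):
            spectrum = np.fft.rfft(signals[ch])
            band[ch] = np.fft.irfft(spectrum * mask, n=n_samples)
        modes.append(band)
    return modes   # one (n_channels, n_samples) array per common mode
```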


Entropy ◽  
2018 ◽  
Vol 20 (8) ◽  
pp. 601 ◽  
Author(s):  
Paul Darscheid ◽  
Anneli Guthke ◽  
Uwe Ehret

When constructing discrete (binned) distributions from samples of a data set, applications exist where it is desirable to ensure that all bins of the sample distribution have nonzero probability. For example, the sample distribution may be part of a predictive model that must return a response over the entire codomain, or the Kullback–Leibler divergence may be used to measure the (dis-)agreement of the sample distribution and the original distribution of the variable, which, in the described case, is inconveniently infinite. Several sample-based distribution estimators exist which assure nonzero bin probability, such as adding one counter to each zero-probability bin of the sample histogram, adding a small probability to the sample pdf, smoothing methods such as kernel-density smoothing, or Bayesian approaches based on the Dirichlet and multinomial distributions. Here, we suggest and test an approach based on the Clopper–Pearson method, which makes use of the binomial distribution. Based on the sample distribution, confidence intervals for the bin-occupation probability are calculated. The mean of each confidence interval is a strictly positive estimator of the true bin-occupation probability and is convergent with increasing sample size. For small samples, it converges towards a uniform distribution, i.e., the method effectively applies a maximum entropy approach. We apply this nonzero method and four alternative sample-based distribution estimators to a range of typical distributions (uniform, Dirac, normal, multimodal, and irregular) and measure the effect with the Kullback–Leibler divergence. While the performance of each method strongly depends on the distribution type it is applied to, on average, and especially for small sample sizes, the nonzero method, the simple “add one counter” method, and the Bayesian Dirichlet-multinomial model show very similar behavior and perform best. We conclude that, when estimating distributions without an a priori idea of their shape, applying one of these methods is favorable.
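A minimal sketch of the estimator as described above: for each bin with count k out of n samples, compute the Clopper–Pearson interval from beta quantiles, take the interval mean (midpoint) as a strictly positive estimate, and renormalise. The 95% confidence level is an illustrative default, not necessarily the one used in the paper.

```python
import numpy as np
from scipy.stats import beta

def nonzero_bin_probabilities(counts, alpha=0.05):
    """Strictly positive bin-probability estimates from the midpoints of
    Clopper-Pearson confidence intervals, renormalised to sum to one."""
    counts = np.asarray(counts)
    n = counts.sum()
    lower = np.where(counts == 0, 0.0,
                     beta.ppf(alpha / 2, counts, n - counts + 1))
    upper = np.where(counts == n, 1.0,
                     beta.ppf(1 - alpha / 2, counts + 1, n - counts))
    midpoints = 0.5 * (lower + upper)   # > 0 even for empty bins, since upper > 0
    return midpoints / midpoints.sum()

# e.g. nonzero_bin_probabilities([0, 3, 7, 0]) returns four probabilities, all > 0
```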


Risks ◽  
2021 ◽  
Vol 9 (11) ◽  
pp. 204
Author(s):  
Chamay Kruger ◽  
Willem Daniel Schutte ◽  
Tanja Verster

This paper proposes a methodology that utilises model performance as a metric to assess the representativeness of external or pooled data when it is used by banks in regulatory model development and calibration. There is currently no formal methodology to assess representativeness. The paper provides a review of existing regulatory literature on the requirements of assessing representativeness and emphasises that both qualitative and quantitative aspects need to be considered. We present a novel methodology, apply it to two case studies, and compare it with the Multivariate Prediction Accuracy Index. The first case study investigates whether a pooled data source from Global Credit Data (GCD) is representative when considering the enrichment of internal data with pooled data in the development of a regulatory loss given default (LGD) model. The second case study differs from the first by illustrating which other countries in the pooled data set could be representative when enriching internal data during the development of an LGD model. Using these case studies as examples, our proposed methodology provides users with a generalised framework to identify subsets of the external data that are representative of their country's or bank's data, making the results general and universally applicable.
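Purely as an illustration of using model performance as a representativeness metric (the paper's actual procedure and performance measure are not reproduced here), one could fit a simple LGD-style regression on the pooled data and compare its error on a pooled hold-out with its error on the internal portfolio; a small gap would suggest the pooled data are representative of the bank's own data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

def representativeness_gap(X_pool, y_pool, X_internal, y_internal):
    """Illustrative check: fit a regression on pooled data and compare its
    error on a pooled hold-out with its error on internal data. The choice of
    model, error measure, and acceptance threshold are left to the user."""
    rng = np.random.default_rng(1)
    idx = rng.permutation(len(y_pool))
    split = int(0.8 * len(idx))
    train, holdout = idx[:split], idx[split:]
    model = LinearRegression().fit(X_pool[train], y_pool[train])
    err_pool = mean_absolute_error(y_pool[holdout], model.predict(X_pool[holdout]))
    err_internal = mean_absolute_error(y_internal, model.predict(X_internal))
    return err_internal - err_pool
```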


2020 ◽  
Author(s):  
Alexander E. Zarebski ◽  
Louis du Plessis ◽  
Kris V. Parag ◽  
Oliver G. Pybus

Inferring the dynamics of pathogen transmission during an outbreak is an important problem in both infectious disease epidemiology and phylodynamics. In mathematical epidemiology, estimates are often informed by time series of infected cases, while in phylodynamics genetic sequences sampled through time are the primary data source. Each data type provides different, and potentially complementary, insights into transmission. However, inference methods are typically highly specialised and field-specific. Recent studies have recognised the benefits of combining data sources, which include improved estimates of the transmission rate and the number of infected individuals. However, the methods they employ are either computationally prohibitive or require intensive simulation, limiting their real-time utility. We present a novel birth-death phylogenetic model, called TimTam, which can be informed by both phylogenetic and epidemiological data. Moreover, we derive a tractable analytic approximation of the TimTam likelihood, whose computational complexity is linear in the size of the data set. Using TimTam, we show how key parameters of transmission dynamics and the number of unreported infections can be estimated accurately from these heterogeneous data sources. The approximate likelihood facilitates inference on large data sets, an important consideration as such data become increasingly common due to improving sequencing capability.


2010 ◽  
Vol 2 (2) ◽  
pp. 38-51 ◽  
Author(s):  
Marc Halbrügge

Keep it simple - A case study of model development in the context of the Dynamic Stocks and Flows (DSF) task

This paper describes the creation of a cognitive model submitted to the ‘Dynamic Stocks and Flows’ (DSF) modeling challenge. This challenge aims at comparing computational cognitive models of human behavior during an open-ended control task. Participants in the modeling competition were provided with a simulation environment and training data for benchmarking their models, while the actual specification of the competition task was withheld. To meet this challenge, the cognitive model described here was designed and optimized for generalizability. Only two simple assumptions about human problem solving were used to explain the empirical findings in the training data. In-depth analysis of the data set prior to the development of the model led to the dismissal of correlations and other parametric statistics as goodness-of-fit indicators. A new statistical measure based on rank orders and sequence matching techniques is proposed instead. This measure, when applied to the human sample, also identifies clusters of subjects that use different strategies for the task. The acceptability of the fits achieved by the model is verified using permutation tests.
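One simple way to combine rank orders with sequence matching, shown here only as an illustration and not as the specific measure proposed in the paper, is to pair Kendall's tau on the raw series with a similarity ratio on the discretised sequence of moves.

```python
import numpy as np
from difflib import SequenceMatcher
from scipy.stats import kendalltau

def _moves(series):
    """Discretise a series into a string of moves: up (+), down (-), flat (0)."""
    return ''.join({1: '+', -1: '-', 0: '0'}[int(s)] for s in np.sign(np.diff(series)))

def rank_sequence_fit(model_series, human_series):
    """Illustrative goodness-of-fit combining rank order and sequence matching."""
    tau, _ = kendalltau(model_series, human_series)                       # rank-order agreement
    seq = SequenceMatcher(None, _moves(model_series), _moves(human_series)).ratio()
    return tau, seq
```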


Cancers ◽  
2021 ◽  
Vol 14 (1) ◽  
pp. 12
Author(s):  
Jose M. Castillo T. ◽  
Muhammad Arif ◽  
Martijn P. A. Starmans ◽  
Wiro J. Niessen ◽  
Chris H. Bangma ◽  
...  

The computer-aided analysis of prostate multiparametric MRI (mpMRI) could improve significant-prostate-cancer (PCa) detection. Various deep-learning- and radiomics-based methods for significant-PCa segmentation or classification have been reported in the literature. To assess the generalizability of these methods, evaluation on various external data sets is crucial. While deep-learning and radiomics approaches have been compared on the same single-center data set, a comparison of the performance of both approaches on data sets from different centers and different scanners is lacking. The goal of this study was to compare the performance of a deep-learning model with that of a radiomics model for significant-PCa diagnosis across patient cohorts. We included data from two consecutive patient cohorts from our own center (n = 371 patients) and two external sets, of which one was a publicly available patient cohort (n = 195 patients) and the other contained data from patients from two hospitals (n = 79 patients). For all patients, the mpMRI scans, radiologist tumor delineations, and pathology reports were collected. During training, one of our patient cohorts (n = 271 patients) was used for both deep-learning- and radiomics-model development, and the three remaining cohorts (n = 374 patients) were kept as unseen test sets. The performances of the models were assessed in terms of the area under the receiver-operating-characteristic curve (AUC). Whereas internal cross-validation showed a higher AUC for the deep-learning approach, the radiomics model obtained AUCs of 0.88, 0.91 and 0.65 on the independent test sets, compared to AUCs of 0.70, 0.73 and 0.44 for the deep-learning model. Our radiomics model, based on delineated regions, proved to be the more accurate tool for significant-PCa classification in the three unseen test sets when compared to a fully automated deep-learning model.


2015 ◽  
Vol 137 (4) ◽  
Author(s):  
Gholamhossein Yari ◽  
Zahra Amini Farsani

In the field of wind energy conversion, a precise determination of the probability distribution of wind speed guarantees an efficient use of the wind energy and enhances the position of wind energy against other forms of energy. The present study thus proposes an accurate numerical-probabilistic algorithm, a combination of Newton's method and the maximum entropy (ME) method, to determine an important distribution in renewable energy systems, namely the hyper Rayleigh distribution (HRD), which belongs to the Weibull family of distributions. The HRD is mainly used to model the wind speed and the variations of the solar irradiance level with negligible error. The purpose of this research is to find the unique solution to the optimization problem that arises when maximizing Shannon's entropy. To confirm the accuracy and efficiency of our algorithm, we used 12 years of long-term data for the average daily wind speed in Toyokawa, obtained from the National Climatic Data Center (NCDC) in Japan, to examine the Rayleigh distribution (RD). The RD appears to fit these data closely. In addition, we present several simulation studies to check the reliability of the proposed algorithm.
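The algorithmic core, Newton iterations on the Lagrange multipliers of a moment-constrained maximum-entropy density, can be sketched as follows. The constraint functions, target moments, grid, and starting guess in the commented usage are placeholders chosen to mimic a Rayleigh-type fit; they are not values from the paper.

```python
import numpy as np

def max_entropy_density(g_funcs, moments, x, lam0=None, tol=1e-10, max_iter=50):
    """Maximum-entropy density on a uniform grid x subject to E[g_k(X)] = moments[k],
    with the Lagrange multipliers found by Newton's method."""
    dx = x[1] - x[0]                                   # assumes a uniform grid
    G = np.array([g(x) for g in g_funcs])              # (K, n) constraint functions
    m = np.asarray(moments, dtype=float)
    lam = np.zeros(len(g_funcs)) if lam0 is None else np.array(lam0, dtype=float)
    for _ in range(max_iter):
        w = np.exp(-lam @ G)                           # unnormalised density values
        p = w / (w.sum() * dx)                         # normalised density on the grid
        Eg = (G * p).sum(axis=1) * dx                  # current moments E[g_k]
        r = Eg - m                                     # constraint residuals
        if np.max(np.abs(r)) < tol:
            break
        cov = (G[:, None, :] * G[None, :, :] * p).sum(axis=2) * dx - np.outer(Eg, Eg)
        lam += np.linalg.solve(cov, r)                 # Newton step: d(residual)/d(lam) = -Cov
    return p, lam

# Illustrative Rayleigh-type usage: constrain E[ln X] and E[X^2] (placeholder values)
# x = np.linspace(1e-6, 30.0, 4000)
# p, lam = max_entropy_density([np.log, lambda t: t**2], [1.1, 18.0], x,
#                              lam0=[-1.0, 0.05])      # rough starting guess
```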

