Does Model Complexity Add Value to Asset Allocation? Evidence from Machine Learning Forecasting Models

Recent advances in artificial intelligence (AI) have led to its widespread industrial adoption, with machine learning systems demonstrating superhuman performance in a significant number of tasks. However, this surge in performance has often been achieved through increased model complexity, turning such systems into “black box” approaches and creating uncertainty about how they operate and, ultimately, how they reach decisions. This ambiguity has made it problematic to adopt machine learning systems in sensitive yet critical domains where their value could be immense, such as healthcare. As a result, scientific interest in Explainable Artificial Intelligence (XAI), the field concerned with developing new methods that explain and interpret machine learning models, has been strongly reignited in recent years. This study focuses on machine learning interpretability methods: it presents a literature review and taxonomy of these methods, together with links to their programming implementations, in the hope that the survey will serve as a reference point for both theorists and practitioners.
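Permutation feature importance is one widely used model-agnostic interpretability method of the kind such surveys catalog. The following is a minimal pure-Python sketch. It assumes a regression model exposed as a `predict(row)` callable and uses mean squared error as the score. The function names are illustrative and are not taken from the survey or from any library.

```python
import random
from typing import Callable, List, Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(
    predict: Callable[[List[float]], float],
    X: List[List[float]],
    y: Sequence[float],
    n_repeats: int = 5,
    seed: int = 0,
) -> List[float]:
    """Mean increase in MSE when each feature column is shuffled.

    The model is treated as a black box: only its inputs and outputs are used.
    """
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Copy each row, replacing feature j with its shuffled value.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(y, [predict(row) for row in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


if __name__ == "__main__":
    # Toy black box that uses only feature 0; feature 1 is ignored.
    def model(row):
        return 3.0 * row[0]

    data_rng = random.Random(42)
    X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
    y = [model(row) for row in X]
    print(permutation_importance(model, X, y))
```

A feature the model ignores scores exactly zero. A feature it relies on scores positive, because breaking that feature's link to the target degrades predictions.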


Machine Learning for Multiple Petrophysical Properties Regression Based on Core Images and Well Logs in a Heterogenous Reservoir

10.2118/206089-ms · 2021

Author(s): Tao Lin, Mokhles Mezghani, Chicheng Xu, Weichang Li

Keyword(s): Machine Learning, Model Building, Image Features, Well Logs, Model Complexity, Rock Properties, Core Analysis, Petrophysical Properties, Blind Test, Porosity And Permeability

Abstract Reservoir characterization requires accurate prediction of multiple petrophysical properties such as bulk density (or acoustic impedance), porosity, and permeability. However, this remains a major challenge in heterogeneous reservoirs because of significant diagenetic effects, including dissolution, dolomitization, cementation, and fracturing. Most well logs lack the resolution to capture rock properties in detail in a heterogeneous formation, so it is pertinent to integrate core images into the prediction workflow. This study presents a new approach to obtaining multiple high-resolution petrophysical properties by combining machine learning (ML) algorithms and computer vision (CV) techniques. The methodology can automate core data analysis with a minimum number of plugs, reducing human effort and cost while improving accuracy. The workflow consists of conditioning and extracting features from core images, correlating well logs and core analysis with those features to build ML models, and applying the models to new cores to predict petrophysical properties. The core images are preprocessed and analyzed using color models and texture recognition to extract image characteristics and core textures. The image features are then aggregated into a depth profile, resampled, and aligned with the well logs and core analysis. The ML regression models, including classification and regression trees (CART) and a deep neural network (DNN), are trained and validated on filtered training samples of the relevant features and target petrophysical properties. The models are then evaluated on a blind test dataset for three target properties: grain density, porosity, and permeability. Histograms of each target property are computed to analyze the data distribution. The feature vectors are extracted from CV analysis of core images and from gamma ray logs.
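The step of resampling the image-feature depth profile and aligning it with the well logs can be illustrated with linear interpolation onto the log depth grid. This minimal sketch rests on assumptions the abstract does not state:

- Features are sampled at strictly increasing core depths.
- Values outside the cored interval are clamped to the nearest end value.
- `depth_shift` is a hypothetical constant core-to-log depth correction.

```python
from bisect import bisect_left
from typing import List, Sequence


def interp_at(depth: float, depths: Sequence[float], values: Sequence[float]) -> float:
    """Linearly interpolate `values` (sampled at increasing `depths`) at `depth`.

    Depths outside the sampled range are clamped to the nearest end value.
    """
    if depth <= depths[0]:
        return values[0]
    if depth >= depths[-1]:
        return values[-1]
    i = bisect_left(depths, depth)
    d0, d1 = depths[i - 1], depths[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (depth - d0) / (d1 - d0)


def align_to_logs(
    feature_depths: Sequence[float],
    feature_values: Sequence[float],
    log_depths: Sequence[float],
    depth_shift: float = 0.0,
) -> List[float]:
    """Resample an image-feature profile onto the well-log depth grid.

    `depth_shift` is a hypothetical core-to-log correction added to core depths.
    """
    shifted = [d + depth_shift for d in feature_depths]
    return [interp_at(d, shifted, feature_values) for d in log_depths]


if __name__ == "__main__":
    # Hypothetical feature profile at 0.5 ft spacing, logs sampled in between.
    print(align_to_logs([100.0, 100.5, 101.0], [0.2, 0.4, 0.8], [100.25, 100.75]))
```

Averaging features within each log sample window is an equally reasonable alternative when the image resolution is much finer than the log spacing.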
The CART model yields the importance of each feature for each target, which may be used to reduce the complexity of future models. Model performance is evaluated and compared for each target. We achieved reasonably good correlation and accuracy, for example porosity R² = 49.7% and RMSE = 2.4 p.u., and logarithmic permeability R² = 57.8% and RMSE = 0.53. The field case demonstrates that including core image attributes can improve petrophysical regression in heterogeneous reservoirs. The approach can be extended to a multi-well setting to generate vertical distributions of petrophysical properties that can be integrated into reservoir modeling and characterization. Machine learning algorithms can help automate the workflow and can be flexibly adjusted to take various inputs for prediction.
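The R² and RMSE figures quoted above follow standard definitions, sketched here in pure Python. An R² of 49.7% corresponds to 0.497 on the 0-1 scale. RMSE carries the target's units, for example porosity units (p.u.). The abstract does not state the logarithm base used for permeability, so base 10 is an assumption here, and the helper names are illustrative.

```python
import math
from typing import Sequence, Tuple


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error, in the units of the target."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def log_perm_metrics(k_true_md: Sequence[float], k_pred_md: Sequence[float]) -> Tuple[float, float]:
    """R^2 and RMSE on log10 permeability (the log base is an assumption)."""
    lt = [math.log10(k) for k in k_true_md]
    lp = [math.log10(k) for k in k_pred_md]
    return r2(lt, lp), rmse(lt, lp)
```

Working in log space keeps the few very high-permeability samples from dominating the error. On that scale, an RMSE of 0.53 means a typical miss of roughly half an order of magnitude (under the base-10 assumption).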


Machine learning approaches to calibrate individual-based infectious disease models

10.1101/2021.01.27.21250484 · 2021

Author(s): Theresa Reiker, Monica Golumbeanu, Andrew Shattock, Lydia Burgert, Thomas A. Smith, ...

Keyword(s): Machine Learning, Disease Transmission, Goodness Of Fit, Epidemiological Data, Bayesian Optimization, Model Complexity, Learning Approaches, Dimensional Parameter, Novel Approach, Transmission Models

Abstract Individual-based models have become important tools in the global battle against infectious diseases, yet model complexity can make calibration to biological and epidemiological data challenging. We propose a novel approach to calibrate disease transmission models via a Bayesian optimization framework employing machine learning emulator functions to guide a global search over a multi-objective landscape. We demonstrate our approach by application to an established individual-based model of malaria, optimizing over a high-dimensional parameter space with respect to a portfolio of multiple fitting objectives built from datasets capturing the natural history of malaria transmission and disease progression. Outperforming other calibration methodologies, the new approach quickly reaches an improved final goodness of fit. Per-objective parameter importance and sensitivity diagnostics provided by our approach offer epidemiological insights and enhance trust in predictions through greater interpretability.

One Sentence Summary: We propose a novel, fast, machine learning-based approach to calibrate disease transmission models that outperforms other methodologies.
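To make the general idea concrete, the following sketch runs Bayesian optimization with a Gaussian-process emulator and an expected-improvement acquisition over a single parameter in [0, 1]. It is not the authors' framework, and several simplifications are assumptions:

- The individual-based malaria model is replaced by a hypothetical `toy_loss` that sums squared errors over two fitting objectives.
- The kernel length-scale is fixed rather than fitted.
- The search runs over a grid instead of a continuous, high-dimensional space.

```python
import math

LENGTH = 0.2   # assumed fixed kernel length-scale (not fitted in this sketch)
NOISE = 1e-6   # jitter on the kernel diagonal for numerical stability


def rbf(a, b):
    """Squared-exponential kernel with unit signal variance."""
    return math.exp(-0.5 * ((a - b) / LENGTH) ** 2)


def cholesky(K):
    """Lower-triangular L with L L^T = K."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(max(s, 1e-12)) if i == j else s / L[j][j]
    return L


def solve_lower(L, b):
    """Solve L x = b by forward substitution."""
    x = []
    for i, bi in enumerate(b):
        x.append((bi - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def solve_upper_t(L, b):
    """Solve L^T x = b by back substitution."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def fit_emulator(xs, ys):
    """GP emulator: returns x -> (posterior mean, posterior std) of the loss."""
    mu = sum(ys) / len(ys)
    sd = math.sqrt(sum((y - mu) ** 2 for y in ys) / len(ys)) or 1.0
    z = [(y - mu) / sd for y in ys]  # standardized losses
    K = [[rbf(a, b) + (NOISE if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, z))  # K^{-1} z

    def predict(x):
        k = [rbf(x, a) for a in xs]
        v = solve_lower(L, k)
        mean = sum(ki * ai for ki, ai in zip(k, alpha))
        var = max(1.0 - sum(vi * vi for vi in v), 1e-12)
        return mu + sd * mean, sd * math.sqrt(var)

    return predict


def expected_improvement(mean, std, best):
    """Expected improvement below the best loss seen so far (minimization)."""
    z = (best - mean) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + std * pdf


def calibrate(loss, n_iter=15):
    """Minimize `loss` over one parameter in [0, 1] on a 201-point grid."""
    grid = [i / 200 for i in range(201)]
    xs = [0.0, 0.5, 1.0]
    ys = [loss(x) for x in xs]
    for _ in range(n_iter):
        predict = fit_emulator(xs, ys)
        best = min(ys)
        untried = [g for g in grid if g not in xs]
        x_next = max(untried, key=lambda g: expected_improvement(*predict(g), best))
        xs.append(x_next)
        ys.append(loss(x_next))
    i_best = min(range(len(ys)), key=ys.__getitem__)
    return xs[i_best], ys[i_best]


def toy_loss(theta):
    """Hypothetical two-objective fit: simulator outputs vs. target data."""
    outputs = (theta, 2.0 * theta)
    targets = (0.3, 0.6)
    return sum((o - t) ** 2 for o, t in zip(outputs, targets))


if __name__ == "__main__":
    print(calibrate(toy_loss))
```

The emulator is cheap to query, so the expensive simulator is only run at the single point the acquisition function selects on each iteration. That trade-off is what makes emulator-guided search attractive when each model run is costly.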


Prediction of Electropulse-Induced Nonlinear Temperature Variation of Mg Alloy Based on Machine Learning

Korean Journal of Metals and Materials · 10.3365/kjmm.2020.58.6.413 · 2020 · Vol 58 (6) · pp. 413-422

Author(s): Jinyeong Yu, Myoungjae Lee, Young Hoon Moon, Yoojeong Noh, Taekyung Lee

Keyword(s): Neural Network, Machine Learning, Temperature Variation, High Energy, Mg Alloy, Model Complexity, Gradient Boosting, Learning Technology, Extreme Gradient Boosting, Nonlinear Temperature

Electropulse-induced heating has attracted attention because of its high energy efficiency. However, the process gives rise to a nonlinear temperature variation that is difficult to predict with a traditional physics model. As an alternative, this study employed machine learning to predict such temperature variation for the first time. For this purpose, an Mg alloy was exposed to a single electropulse with a variety of pulse magnitudes and durations. Nine machine-learning models were built using artificial neural network (ANN), deep neural network (DNN), and extreme gradient boosting (XGBoost) algorithms. The ANN models showed insufficient predictive capability in the region of peak temperature, where the temperature varied most significantly. The DNN models were built by increasing model complexity, enhancing architectures, and tuning hyperparameters. They exhibited a remarkable improvement in predictive capability at the heating-cooling boundary as well as in overall estimation. As a result, the DNN-2 model in this group gave the best prediction of nonlinear temperature variation among the machine-learning models built in this study. The XGBoost model exhibited poor predictive performance with default hyperparameters. However, tuning the learning rate and maximum depth yielded decent predictive capability with this algorithm. Furthermore, the XGBoost models required far less training time than the ANN and DNN models. This advantage is expected to be useful for predicting more complicated cases, including various materials, multi-step electropulses, and electrically assisted forming.
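The learning-rate and maximum-depth tuning described above can be sketched with a small gradient-boosted regression tree ensemble and a validation-set grid search. This is a plain gradient-boosting stand-in written in pure Python, not the XGBoost library, which adds regularization, second-order updates, and much faster tree construction. The grids, round count, and synthetic data are assumptions chosen for illustration.

```python
import itertools
import math


def fit_tree(X, r, depth):
    """Greedy squared-error regression tree fitted to residuals `r`.

    Nodes are ("leaf", value) or ("split", feature, threshold, left, right).
    """
    mean = sum(r) / len(r)
    if depth == 0 or len(r) < 2:
        return ("leaf", mean)
    best = None
    for f in range(len(X[0])):
        # Skip the largest value so both children are always non-empty.
        for thr in sorted({row[f] for row in X})[:-1]:
            left = [i for i, row in enumerate(X) if row[f] <= thr]
            right = [i for i, row in enumerate(X) if row[f] > thr]
            sse = 0.0
            for idx in (left, right):
                m = sum(r[i] for i in idx) / len(idx)
                sse += sum((r[i] - m) ** 2 for i in idx)
            if best is None or sse < best[0]:
                best = (sse, f, thr, left, right)
    if best is None:  # all rows identical: nothing left to split on
        return ("leaf", mean)
    _, f, thr, left, right = best
    return ("split", f, thr,
            fit_tree([X[i] for i in left], [r[i] for i in left], depth - 1),
            fit_tree([X[i] for i in right], [r[i] for i in right], depth - 1))


def predict_tree(node, row):
    while node[0] == "split":
        _, f, thr, left, right = node
        node = left if row[f] <= thr else right
    return node[1]


def fit_boosting(X, y, learning_rate, max_depth, n_rounds=30):
    """Gradient boosting for squared loss: each tree fits the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    trees = []
    for _ in range(n_rounds):
        residuals = [t - p for t, p in zip(y, pred)]
        tree = fit_tree(X, residuals, max_depth)
        trees.append(tree)
        pred = [p + learning_rate * predict_tree(tree, row) for p, row in zip(pred, X)]
    return lambda row: base + learning_rate * sum(predict_tree(t, row) for t in trees)


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def grid_search(X_tr, y_tr, X_val, y_val, rates=(0.05, 0.1, 0.3), depths=(1, 2, 3)):
    """Return (validation RMSE, learning_rate, max_depth) for the best pair."""
    results = []
    for lr, depth in itertools.product(rates, depths):
        model = fit_boosting(X_tr, y_tr, lr, depth)
        val_pred = [model(row) for row in X_val]
        results.append((rmse(y_val, val_pred), lr, depth))
    return min(results)


if __name__ == "__main__":
    # Synthetic nonlinear response standing in for a temperature curve.
    X_tr = [[i / 29] for i in range(30)]
    y_tr = [math.sin(6 * row[0]) for row in X_tr]
    X_val = [[(i + 0.5) / 30] for i in range(30)]
    y_val = [math.sin(6 * row[0]) for row in X_val]
    print(grid_search(X_tr, y_tr, X_val, y_val))
```

Small learning rates with few rounds underfit, which mirrors the weak default-hyperparameter results the abstract reports. Deeper trees with a larger step fit a sharply varying response much more closely.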


A Framework for Effective Application of Machine Learning to Microbiome-Based Classification Problems

mBio · 10.1128/mbio.00434-20 · 2020 · Vol 11 (3) · Cited By ~ 9

Author(s): Begüm D. Topçuoğlu, Nicholas A. Lesniak, Mack T. Ruffin, Jenna Wiens, Patrick D. Schloss

Keyword(s): Machine Learning, Logistic Regression, Random Forest, Sequence Data, Characteristic Curve, Predictive Performance, Model Complexity, Support Vector, Classification Problems, Microbial Biomarkers

ABSTRACT Machine learning (ML) modeling of the human microbiome has the potential to identify microbial biomarkers and aid in the diagnosis of many diseases such as inflammatory bowel disease, diabetes, and colorectal cancer. Progress has been made toward developing ML models that predict health outcomes using bacterial abundances, but inconsistent adoption of training and evaluation methods calls the validity of these models into question. Furthermore, many researchers appear to favor increased model complexity over interpretability. To overcome these challenges, we trained seven models that used fecal 16S rRNA sequence data to predict the presence of colonic screen-relevant neoplasias (SRNs) (n = 490 patients: 261 controls and 229 cases). We developed a reusable open-source pipeline to train, validate, and interpret ML models. To show the effect of model selection, we assessed the predictive performance, interpretability, and training time of L2-regularized logistic regression, L1- and L2-regularized support vector machines (SVM) with linear and radial basis function kernels, a decision tree, random forest, and gradient boosted trees (XGBoost). The random forest model performed best at detecting SRNs, with an area under the receiver operating characteristic curve (AUROC) of 0.695 (interquartile range [IQR], 0.651 to 0.739), but it was slow to train (83.2 h) and not inherently interpretable. Despite its simplicity, L2-regularized logistic regression was second only to random forest in predictive performance, with an AUROC of 0.680 (IQR, 0.625 to 0.735); it also trained faster (12 min) and was inherently interpretable. Our analysis highlights the importance of choosing an ML approach based on the goal of the study, as the choice will inform expectations of performance and interpretability.

IMPORTANCE Diagnosing diseases using machine learning (ML) is rapidly being adopted in microbiome studies. However, the estimated performance of these models is likely overoptimistic. Moreover, there is a trend toward using black box models without discussing the difficulty of interpreting such models when trying to identify microbial biomarkers of disease. This work represents a step toward more reproducible practices for applying ML to microbiome research. We implement a rigorous pipeline and emphasize the importance of selecting ML models that reflect the goal of the study. These concepts are not specific to the study of human health but can also be applied to environmental microbiology studies.
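The AUROC values reported for the SRN models have a simple probabilistic reading: the chance that a randomly chosen case receives a higher score than a randomly chosen control. The following sketch computes AUROC that way and summarizes a list of AUROCs by median and IQR. Treating the reported IQRs as coming from repeated train/test splits is an assumption here, and the function names are illustrative.

```python
import statistics
from typing import Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random case outscores a random control.

    Ties count as half, matching the Mann-Whitney U formulation.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def median_iqr(values: Sequence[float]) -> Tuple[float, Tuple[float, float]]:
    """Median and interquartile range (Q1, Q3) of, e.g., per-split AUROCs."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return statistics.median(values), (q1, q3)


if __name__ == "__main__":
    print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise loop is quadratic in sample size. That cost is negligible at the scale of a 490-patient cohort, and a rank-based formula gives the same value for larger data.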
