A Global Flood Risk Modeling Framework Built With Climate Models and Machine Learning

The success of Customer Relationship Management (CRM) programs ultimately depends on the firm's ability to identify and leverage differences across customers, a very difficult task when firms attempt to manage new customers, for whom only the first purchase has been observed. For those customers, the lack of repeated observations poses a structural challenge to inferring unobserved differences across them. This is what we call the “cold start” problem of CRM, whereby companies have difficulties leveraging existing data when they attempt to make inferences about customers at the beginning of their relationship. We propose a solution to the cold start problem by developing a probabilistic machine learning modeling framework that leverages the information collected at the moment of acquisition. The main aspect of the model is that it flexibly captures latent dimensions that govern the behaviors observed at acquisition as well as future propensities to buy and to respond to marketing actions using deep exponential families. The model can be integrated with a variety of demand specifications and is flexible enough to capture a wide range of heterogeneity structures. We validate our approach in a retail context and empirically demonstrate the model's ability to identify high-value customers as well as those most sensitive to marketing actions, right after their first purchase.

An Assessment of Machine Learning Techniques for Replicating Physical Forcing Mechanisms in Climate Models

10.1002/essoar.10501799.1 ◽

2020 ◽

Author(s):

Garrett Limon ◽

Christiane Jablonowski

Keyword(s):

Machine Learning ◽

Climate Models ◽

Machine Learning Techniques ◽

Physical Forcing ◽

Learning Techniques ◽

Forcing Mechanisms

Nature-based solutions for flood risk reduction: A probabilistic modeling framework

One Earth ◽

10.1016/j.oneear.2021.08.010 ◽

2021 ◽

Vol 4 (9) ◽

pp. 1310-1321

Author(s):

David Lallemant ◽

Perrine Hamel ◽

Mariano Balbi ◽

Tian Ning Lim ◽

Rafael Schmitt ◽

...

Keyword(s):

Risk Reduction ◽

Flood Risk ◽

Probabilistic Modeling ◽

Modeling Framework ◽

Flood Risk Reduction

IMPROVING THE NORTH AMERICAN MULTI-MODEL ENSEMBLE (NMME) PRECIPITATION FORECASTS AT SEASONAL SCALE OVER THE HIMALAYAN REGION USING MACHINE LEARNING

International Journal of Big Data Mining for Global Warming ◽

10.1142/s263053482150008x ◽

2021 ◽

Author(s):

SOURABH SHRIVASTAVA ◽

RAM AVTAR ◽

PRASANTA KUMAR BAL

Keyword(s):

Machine Learning ◽

North American ◽

Climate Models ◽

Global Climate ◽

Horizontal Resolution ◽

Summer Monsoon Rainfall ◽

Global Climate Models ◽

Himalayan Region ◽

Model Ensemble ◽

The North

Global climate models (GCMs) with coarse horizontal resolution are limited by large biases over mountainous regions. Outputs from a single model or from a simple multi-model ensemble (SMME) are also associated with large biases. To predict extreme rainfall events, this study takes an alternative modeling approach, using five machine learning (ML) algorithms to improve the skill of the North American Multi-Model Ensemble (NMME) GCMs for Indian summer monsoon rainfall from 1982 to 2009 by reducing model biases. Random forest (RF), AdaBoost (Ada), gradient boosting (Grad), bagging (Bag) and extra trees (Extra) regression models are used, and the results from each model are compared against observations. In the SMME, a wet bias of 20 mm/day and an RMSE of up to 15 mm/day are found over the Himalayan region. However, all the ML models can reduce the mean bias to at most [Formula: see text] mm/day and the RMSE to at most 2 mm/day. The interannual variability in the ML outputs is closer to observations than that of the SMME. Correlations are also higher in all ML models, from 0.5 to 0.8, than in the SMME. Moreover, RF and Grad perform best of the five ML models, showing high correlation over the Himalayan region. In conclusion, by taking full advantage of different models, the proposed ML-based multi-model ensemble method is shown to be accurate and effective.
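The two skill measures quoted in this abstract, mean bias and RMSE of a simple multi-model ensemble, can be illustrated with a minimal sketch. The rainfall numbers and function names below are hypothetical and are not taken from the study; the sketch only shows how an equal-weight ensemble mean is scored against observations.

```python
import math


def simple_ensemble_mean(member_forecasts):
    """Equal-weight mean across ensemble members for each time step."""
    n_members = len(member_forecasts)
    return [sum(values) / n_members for values in zip(*member_forecasts)]


def mean_bias(forecast, observed):
    """Average of forecast minus observation; positive means a wet bias."""
    return sum(f - o for f, o in zip(forecast, observed)) / len(observed)


def rmse(forecast, observed):
    """Root-mean-square error between forecast and observation."""
    squared = sum((f - o) ** 2 for f, o in zip(forecast, observed))
    return math.sqrt(squared / len(observed))


# Hypothetical seasonal rainfall in mm/day (illustrative values only).
observed = [5.0, 7.0, 6.0, 8.0]
members = [
    [9.0, 11.0, 10.0, 12.0],  # hypothetical model A
    [7.0, 9.0, 8.0, 10.0],    # hypothetical model B
]

smme = simple_ensemble_mean(members)
smme_bias = mean_bias(smme, observed)
smme_rmse = rmse(smme, observed)
```

A bias-correcting regression model would be scored the same way, by passing its predictions to `mean_bias` and `rmse` in place of `smme`.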

New interpretable machine learning method for single-cell data reveals correlates of clinical response to cancer immunotherapy

10.1101/702118 ◽

2019 ◽

Cited By ~ 3

Author(s):

Evan Greene ◽

Greg Finak ◽

Leonard A. D’Amico ◽

Nina Bhardwaj ◽

Candice D. Church ◽

...

Keyword(s):

Machine Learning ◽

Flow Cytometry ◽

T Cell ◽

Single Cell ◽

Cancer Immunotherapy ◽

Effector Memory ◽

Machine Learning Method ◽

Learning Method ◽

Modeling Framework ◽

Interpretable Machine Learning

Abstract. High-dimensional single-cell cytometry is routinely used to characterize patient responses to cancer immunotherapy and other treatments. This has produced a wealth of datasets ripe for exploration but whose biological and technical heterogeneity make them difficult to analyze with current tools. We introduce a new interpretable machine learning method for single-cell mass and flow cytometry studies, FAUST, that robustly performs unbiased cell population discovery and annotation. FAUST processes data on a per-sample basis and returns biologically interpretable cell phenotypes that can be compared across studies, making it well-suited for the analysis and integration of complex datasets. We demonstrate how FAUST can be used for candidate biomarker discovery and validation by applying it to a flow cytometry dataset from a Merkel cell carcinoma anti-PD-1 trial and discover new CD4+ and CD8+ effector-memory T cell correlates of outcome co-expressing PD-1, HLA-DR, and CD28. We then use FAUST to validate these correlates in an independent CyTOF dataset from a published metastatic melanoma trial. Importantly, existing state-of-the-art computational discovery approaches as well as prior manual analysis did not detect these or any other statistically significant T cell sub-populations associated with anti-PD-1 treatment in either data set. We further validate our methodology by using FAUST to replicate the discovery of a previously reported myeloid correlate in a different published melanoma trial, and validate the correlate by identifying it de novo in two additional independent trials. FAUST’s phenotypic annotations can be used to perform cross-study data integration in the presence of heterogeneous data and diverse immunophenotyping staining panels, enabling hypothesis-driven inference about cell sub-population abundance through a multivariate modeling framework we call Phenotypic and Functional Differential Abundance (PFDA). We demonstrate this approach on data from myeloid and T cell panels across multiple trials. Together, these results establish FAUST as a powerful and versatile new approach for unbiased discovery in single-cell cytometry.

Application of all relevant feature selection for failure analysis of parameter-induced simulation crashes in climate models

Geoscientific Model Development Discussions ◽

10.5194/gmdd-8-5419-2015 ◽

2015 ◽

Vol 8 (7) ◽

pp. 5419-5435 ◽

Cited By ~ 1

Author(s):

W. Paja ◽

M. Wrzesień ◽

R. Niemiec ◽

W. R. Rudnicki

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Climate Models ◽

Original Study ◽

Relative Importance ◽

Relevant Feature ◽

Machine Learning Methods ◽

Selection For ◽

Robust Prediction ◽

Physical Components

Abstract. Climate models are extremely complex pieces of software. They reflect the best available knowledge of the physical components of the climate; nevertheless, they contain several parameters that are too weakly constrained by observations and can potentially cause a simulation to crash. A recent study by Lucas et al. (2013) showed that machine learning methods can be used to predict which combinations of parameters can lead to a simulation crash, and hence which processes described by these parameters need refined analyses. In the current study we reanalyse the dataset used in that research using a different methodology. We confirm the main conclusion of the original study concerning the suitability of machine learning for predicting crashes. We show that only three of the eight parameters indicated in the original study as relevant for predicting a crash are indeed strongly relevant, three others are relevant but redundant, and two are not relevant at all. We also show that the variance due to the split of data between training and validation sets has a large influence on both the accuracy of predictions and the relative importance of variables; hence only a cross-validated approach can deliver a robust estimate of predictive performance and of the relevance of variables.
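The point about split-induced variance can be illustrated with a minimal k-fold cross-validation sketch. Everything here is an assumption made for illustration: the data are synthetic, and a one-parameter threshold rule stands in for the machine learning methods used in the study. The spread of per-fold accuracies shows why a single train/validation split can mislead.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and deal them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(features, labels, fit, predict, k=5, seed=0):
    """Return the accuracy obtained on each held-out fold."""
    scores = []
    for held_out in k_fold_indices(len(labels), k, seed):
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        model = fit([features[i] for i in train], [labels[i] for i in train])
        hits = sum(predict(model, features[i]) == labels[i] for i in held_out)
        scores.append(hits / len(held_out))
    return scores


# Hypothetical stand-in classifier: threshold a single parameter at the
# midpoint between the mean of successful runs (0) and crashed runs (1).
def fit_threshold(xs, ys):
    ok = [x for x, y in zip(xs, ys) if y == 0]
    crashed = [x for x, y in zip(xs, ys) if y == 1]
    return (sum(ok) / len(ok) + sum(crashed) / len(crashed)) / 2


def predict_threshold(threshold, x):
    return 1 if x > threshold else 0


# Synthetic data: one parameter value per run plus a noisy crash label.
rng = random.Random(42)
xs = [rng.random() for _ in range(200)]
ys = [1 if x + rng.gauss(0, 0.15) > 0.7 else 0 for x in xs]

scores = cross_validate(xs, ys, fit_threshold, predict_threshold, k=5)
mean_accuracy = sum(scores) / len(scores)
spread = max(scores) - min(scores)  # variation caused only by the split
```

Reporting `mean_accuracy` together with `spread` gives a more robust picture than the score from any single fold.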

Seasonal Prediction of Summer Precipitation in the Middle and Lower Reaches of the Yangtze River Valley: Comparison of Machine Learning and Climate Model Predictions

Water ◽

10.3390/w13223294 ◽

2021 ◽

Vol 13 (22) ◽

pp. 3294

Author(s):

Chentao He ◽

Jiangfeng Wei ◽

Yuanyuan Song ◽

Jing-Jia Luo

Keyword(s):

Neural Network ◽

Machine Learning ◽

Yangtze River ◽

Climate Models ◽

Summer Precipitation ◽

Yangtze River Valley ◽

River Valley ◽

Training Data ◽

The Yangtze River ◽

Learning Methods

The middle and lower reaches of the Yangtze River valley (YRV), which are among the most densely populated regions in China, are subject to frequent flooding. In this study, the predictor importance analysis model was used to sort and select predictors, and five methods (multiple linear regression (MLR), decision tree (DT), random forest (RF), backpropagation neural network (BPNN), and convolutional neural network (CNN)) were used to predict the interannual variation of summer precipitation over the middle and lower reaches of the YRV. Predictions from eight climate models were used for comparison. Of the five tested methods, RF demonstrated the best predictive skill. Starting the RF prediction in December, when its prediction skill was highest, the 70-year correlation coefficient from cross validation of average predictions was 0.473. Using the same five predictors in December 2019, the RF model successfully predicted the YRV wet anomaly in summer 2020, although it had weaker amplitude. It was found that the enhanced warm pool area in the Indian Ocean was the most important causal factor. The BPNN and CNN methods demonstrated the poorest performance. The RF, DT, and climate models all showed higher prediction skills when the predictions start in winter than in early spring, and the RF, DT, and MLR methods all showed better prediction skills than the numerical climate models. Lack of training data was a factor that limited the performance of the machine learning methods. Future studies should use deep learning methods to take full advantage of the potential of ocean, land, sea ice, and other factors for more accurate climate predictions.
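The "correlation coefficient from cross validation" reported in this abstract can be sketched as follows. This is a simplified illustration and not the study's procedure: ordinary least squares on a single predictor stands in for the random forest, leave-one-out is assumed as the cross-validation scheme, and the predictor and rainfall values are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def leave_one_out_predictions(xs, ys):
    """Predict each year with a model trained on all the other years."""
    predictions = []
    for i in range(len(xs)):
        intercept, slope = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        predictions.append(intercept + slope * xs[i])
    return predictions


# Hypothetical predictor index and summer rainfall anomaly, one pair per year.
predictor = [0.1, 0.4, -0.3, 0.8, -0.6, 0.2, 0.5, -0.1]
rainfall_anomaly = [0.3, 0.5, -0.4, 1.0, -0.5, 0.0, 0.9, -0.2]

cv_predictions = leave_one_out_predictions(predictor, rainfall_anomaly)
cv_correlation = pearson(cv_predictions, rainfall_anomaly)
```

Because every prediction comes from a model that never saw that year, `cv_correlation` estimates out-of-sample skill rather than goodness of fit.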

A New Modeling Framework for Geothermal Operational Optimization with Machine Learning (GOOML)

Energies ◽

10.3390/en14206852 ◽

2021 ◽

Vol 14 (20) ◽

pp. 6852

Author(s):

Grant Buster ◽

Paul Siratovich ◽

Nicole Taverna ◽

Michael Rossol ◽

Jon Weers ◽

...

Keyword(s):

Machine Learning ◽

Power Plants ◽

High Reliability ◽

Geothermal Systems ◽

Low Carbon ◽

Modeling Framework ◽

Operational Optimization ◽

Modeling Software ◽

Geothermal Power ◽

Nominal Performance

Geothermal power plants are excellent resources for providing low carbon electricity generation with high reliability. However, many geothermal power plants could realize significant improvements in operational efficiency from the application of improved modeling software. Increased integration of digital twins into geothermal operations will not only enable engineers to better understand the complex interplay of components in larger systems but will also enable enhanced exploration of the operational space with the recent advances in artificial intelligence (AI) and machine learning (ML) tools. Such innovations in geothermal operational analysis have been deterred by several challenges, most notably, the challenge in applying idealized thermodynamic models to imperfect as-built systems with constant degradation of nominal performance. This paper presents GOOML: a new framework for Geothermal Operational Optimization with Machine Learning. By taking a hybrid data-driven thermodynamics approach, GOOML is able to accurately model the real-world performance characteristics of as-built geothermal systems. Further, GOOML can be readily integrated into the larger AI and ML ecosystem for true state-of-the-art optimization. This modeling framework has already been applied to several geothermal power plants and has provided reasonably accurate results in all cases. Therefore, we expect that the GOOML framework can be applied to any geothermal power plant around the world.

Improving climate model coupling through a complete mesh representation: a case study with E3SM (v1) and MOAB (v5.x)

Geoscientific Model Development ◽

10.5194/gmd-13-2355-2020 ◽

2020 ◽

Vol 13 (5) ◽

pp. 2355-2377

Author(s):

Vijay S. Mahadevan ◽

Iulian Grindeanu ◽

Robert Jacob ◽

Jason Sarich

Keyword(s):

Large Scale ◽

Climate Models ◽

Climate Model ◽

Climate Modeling ◽

System Modeling ◽

Spectral Element ◽

Scientific Workflow ◽

Model Coupling ◽

Earth System ◽

Modeling Framework

Abstract. One of the fundamental factors contributing to the spatiotemporal inaccuracy in climate modeling is the mapping of solution field data between different discretizations and numerical grids used in the coupled component models. The typical climate computational workflow involves evaluation and serialization of the remapping weights during the preprocessing step, which are then consumed by the coupled driver infrastructure during simulation to compute field projections. Tools like Earth System Modeling Framework (ESMF) (Hill et al., 2004) and TempestRemap (Ullrich et al., 2013) offer the capability to generate conservative remapping weights, while the Model Coupling Toolkit (MCT) (Larson et al., 2001) that is utilized in many production climate models exposes functionality to make use of the operators to solve the coupled problem. However, such multistep processes present several hurdles in terms of the scientific workflow and impede research productivity. In order to overcome these limitations, we present a fully integrated infrastructure based on the Mesh Oriented datABase (MOAB) (Tautges et al., 2004; Mahadevan et al., 2015) library, which allows for a complete description of the numerical grids and solution data used in each submodel. Through a scalable advancing-front intersection algorithm, the supermesh of the source and target grids is computed, which is then used to assemble the high-order, conservative, and monotonicity-preserving remapping weights between discretization specifications. The Fortran-compatible interfaces in MOAB are utilized to directly link the submodels in the Energy Exascale Earth System Model (E3SM) to enable online remapping strategies in order to simplify the coupled workflow process. We demonstrate the superior computational efficiency of the remapping algorithms in comparison with other state-of-the-science tools and present strong scaling results on large-scale machines for computing remapping weights between the spectral element atmosphere and finite volume discretizations on the polygonal ocean grids.
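The idea of conservative remapping weights can be shown with a deliberately small sketch. It is a first-order, one-dimensional illustration on hypothetical grids, not the high-order advancing-front method described in the abstract and not the MOAB, ESMF, or TempestRemap API: each weight is the fraction of a target cell covered by a source cell, so the remapped field keeps the same integral.

```python
def overlap(a_lo, a_hi, b_lo, b_hi):
    """Length of the intersection of intervals [a_lo, a_hi] and [b_lo, b_hi]."""
    return max(0.0, min(a_hi, b_hi) - max(a_lo, b_lo))


def conservative_weights(source_edges, target_edges):
    """weights[j][i] is the fraction of target cell j covered by source cell i."""
    weights = []
    for j in range(len(target_edges) - 1):
        t_lo, t_hi = target_edges[j], target_edges[j + 1]
        row = [
            overlap(source_edges[i], source_edges[i + 1], t_lo, t_hi) / (t_hi - t_lo)
            for i in range(len(source_edges) - 1)
        ]
        weights.append(row)
    return weights


def remap(weights, source_values):
    """Apply the weight matrix to cell-averaged source values."""
    return [sum(w * v for w, v in zip(row, source_values)) for row in weights]


def integral(edges, values):
    """Integral of a piecewise-constant field over its grid."""
    return sum(v * (edges[i + 1] - edges[i]) for i, v in enumerate(values))


# Hypothetical grids covering the same domain [0, 4].
source_edges = [0.0, 1.0, 2.0, 3.0, 4.0]
target_edges = [0.0, 1.5, 4.0]
source_values = [1.0, 3.0, 2.0, 6.0]

weights = conservative_weights(source_edges, target_edges)
target_values = remap(weights, source_values)
```

In this sketch the weights are non-negative and each row sums to one, so the remapped values stay within the range of the source values while `integral(target_edges, target_values)` matches `integral(source_edges, source_values)`.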
