Training machine learning models on climate model output yields skillful interpretable seasonal precipitation forecasts

A barrier to utilizing machine learning in seasonal forecasting applications is the limited sample size of observational data for model training. To circumvent this issue, here we explore the feasibility of training various machine learning approaches on a large climate model ensemble, providing a long training set with physically consistent model realizations. After training on thousands of seasons of climate model simulations, the machine learning models are tested for producing seasonal forecasts across the historical observational period (1980-2020). For forecasting large-scale spatial patterns of precipitation across the western United States, here we show that these machine learning-based models are capable of competing with or outperforming existing dynamical models from the North American Multi Model Ensemble. We further show that this approach need not be considered a ‘black box’ by utilizing machine learning interpretability methods to identify the relevant physical processes that lead to prediction skill.

Week 3–4 Prediction of Wintertime CONUS Temperature Using Machine Learning Techniques

Frontiers in Climate ◽

10.3389/fclim.2021.697423 ◽

2021 ◽

Vol 3 ◽

Author(s):

Paul Buchmann ◽

Timothy DelSole

Keyword(s):

Machine Learning ◽

Regression Models ◽

Large Scale ◽

Climate Model ◽

Dynamical Model ◽

Learning Models ◽

Machine Learning Model ◽

Climate Model Output ◽

Machine Learning Models ◽

Better Than

This paper shows that skillful week 3–4 predictions of a large-scale pattern of 2 m temperature over the US can be made based on the Nino3.4 index alone, where skillful is defined to be better than climatology. To find more skillful regression models, this paper explores various machine learning strategies (e.g., ridge regression and lasso), including those trained on observations and on climate model output. It is found that regression models trained on climate model output yield more skillful predictions than regression models trained on observations, presumably because of the larger training sample. Nevertheless, the skill of the best machine learning models is only modestly better than that of ordinary least squares based on the Nino3.4 index. Importantly, this fact is difficult to infer from the parameters of the machine learning model because very different parameter sets can produce virtually identical predictions. For this reason, attempts to interpret the source of predictability from the machine learning model can be very misleading. The skill of the machine learning models is also compared to that of a fully coupled dynamical model, CFSv2. The results depend on the skill measure: for mean square error, the dynamical model is slightly worse than the machine learning models; for correlation skill, the dynamical model is only modestly better than machine learning models or the Nino3.4 index. In summary, the best predictions of the large-scale pattern come from machine learning models trained on long climate simulations, but the skill is only modestly better than predictions based on the Nino3.4 index alone.
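The abstract's notion of skill (better than climatology, measured by mean square error) can be illustrated with a small skill score. This is a minimal sketch under stated assumptions, not the paper's code: the function names `mse` and `mse_skill_score` and the toy numbers are hypothetical, and the climatological forecast is assumed to be a constant (zero anomaly).

```python
def mse(pred, obs):
    """Mean square error between two equal-length sequences."""
    if len(pred) != len(obs):
        raise ValueError("pred and obs must have the same length")
    return sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)


def mse_skill_score(forecast, obs, climatology=0.0):
    """Return 1 - MSE(forecast) / MSE(climatology).

    A positive value means the forecast beats the constant
    climatological forecast (an assumption made for this sketch).
    """
    reference = [climatology] * len(obs)
    mse_ref = mse(reference, obs)
    if mse_ref == 0:
        raise ValueError("climatology matches obs exactly; score undefined")
    return 1.0 - mse(forecast, obs) / mse_ref


# Hypothetical temperature anomalies, for illustration only.
obs = [1.0, -1.0, 2.0, -2.0]
forecast = [0.5, -0.5, 1.0, -1.0]
score = mse_skill_score(forecast, obs)
print(score)  # positive, so this toy forecast is "skillful" in the above sense
```

A score of zero corresponds to a forecast no better than climatology, and a negative score to one that is worse.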

Development of Rainfall Prediction Models Using Machine Learning Approaches for Different Agro-Climatic Zones

Advances in Data Mining and Database Management - Handbook of Research on Automated Feature Engineering and Advanced Applications in Data Science ◽

10.4018/978-1-7998-6659-6.ch005 ◽

2021 ◽

pp. 72-94

Author(s):

Diwakar Naidu ◽

Babita Majhi ◽

Surendra Kumar Chandniha

Keyword(s):

Neural Network ◽

Machine Learning ◽

Large Scale ◽

Prediction Models ◽

Machine Learning Algorithms ◽

Learning Approaches ◽

Learning Models ◽

Climatic Zones ◽

Environmental Prediction ◽

Machine Learning Models

This study focuses on modelling the changes in rainfall patterns in different agro-climatic zones (ACZs) due to climate change through statistical downscaling of large-scale climate variables using machine learning approaches. The potential of three machine learning algorithms, multilayer artificial neural network (MLANN), radial basis function neural network (RBFNN), and least squares support vector machine (LS-SVM), has been investigated. The large-scale climate variables are obtained from the National Centre for Environmental Prediction (NCEP) reanalysis product and used as predictors for model development. The proposed machine learning models are applied to generate projected time series of rainfall for the period 2021-2050 using the Hadley Centre coupled model (HadCM3) B2 emission scenario data as predictors. An increasing trend in anticipated rainfall is observed during 2021-2050 in all the ACZs of Chhattisgarh State. Among the machine learning models, RBFNN was found to be the most feasible technique for modelling monthly rainfall in this region.

Epigenetic Target Prediction with Accurate Machine Learning Models

10.26434/chemrxiv.13522313 ◽

2021 ◽

Author(s):

Norberto Sánchez-Cruz ◽

Jose L. Medina-Franco

Keyword(s):

Machine Learning ◽

Small Molecules ◽

Predictive Models ◽

Large Scale ◽

Target Prediction ◽

Quantitative Measure ◽

Learning Models ◽

Discovery Research ◽

Drug Discovery Research ◽

Machine Learning Models

Epigenetic targets are a significant focus for drug discovery research, as demonstrated by the eight approved epigenetic drugs for treatment of cancer and the increasing availability of chemogenomic data related to epigenetics. These data represent a large number of structure-activity relationships that have not been exploited thus far for the development of predictive models to support medicinal chemistry efforts. Herein, we report the first large-scale study of 26318 compounds with a quantitative measure of biological activity for 55 protein targets with epigenetic activity. Through a systematic comparison of machine learning models trained on molecular fingerprints of different design, we built predictive models with high accuracy for the epigenetic target profiling of small molecules. The models were thoroughly validated, showing mean precisions up to 0.952 for the epigenetic target prediction task. Our results indicate that the models reported herein have considerable potential to identify small molecules with epigenetic activity. Therefore, our results were implemented as a freely accessible and easy-to-use web application.

Statistical and machine learning models for optimizing energy in parallel applications

The International Journal of High Performance Computing Applications ◽

10.1177/1094342019842915 ◽

2019 ◽

Vol 33 (6) ◽

pp. 1079-1097 ◽

Cited By ~ 2

Author(s):

Mark Endrei ◽

Chao Jin ◽

Minh Ngoc Dinh ◽

David Abramson ◽

Heidi Poxon ◽

...

Keyword(s):

Machine Learning ◽

Energy Efficiency ◽

High Performance ◽

Large Scale ◽

Energy Use ◽

Parallel Applications ◽

Learning Models ◽

Trade Off ◽

Time Required ◽

Machine Learning Models

Rising power costs and constraints are driving a growing focus on the energy efficiency of high performance computing systems. The unique characteristics of a particular system and workload and their effect on performance and energy efficiency are typically difficult for application users to assess and to control. Settings for optimum performance and energy efficiency can also diverge, so we need to identify trade-off options that guide a suitable balance between energy use and performance. We present statistical and machine learning models that only require a small number of runs to make accurate Pareto-optimal trade-off predictions using parameters that users can control. We study model training and validation using several parallel kernels and more complex workloads, including Algebraic Multigrid (AMG), Large-scale Atomic Molecular Massively Parallel Simulator, and Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics. We demonstrate that we can train the models using as few as 12 runs, with prediction error of less than 10%. Our AMG results identify trade-off options that provide up to 45% improvement in energy efficiency for around 10% performance loss. We reduce the sample measurement time required for AMG by 90%, from 13 h to 74 min.
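The Pareto-optimal trade-off between runtime and energy mentioned above can be illustrated by filtering measured configurations down to the non-dominated ones. This is a minimal sketch, not the authors' models: the function name `pareto_front` and the (runtime, energy) pairs are hypothetical, and both objectives are assumed to be minimized.

```python
def pareto_front(points):
    """Return the points not dominated by any other point.

    Each point is a (runtime, energy) pair and both values are minimized.
    A point is dominated if another point is no worse in both objectives
    and strictly better in at least one.
    """
    front = []
    for i, (t_i, e_i) in enumerate(points):
        dominated = any(
            (t_j <= t_i and e_j <= e_i) and (t_j < t_i or e_j < e_i)
            for j, (t_j, e_j) in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append((t_i, e_i))
    return sorted(front)


# Hypothetical (runtime in s, energy in kJ) measurements for five settings.
configs = [(100, 50), (110, 30), (120, 45), (105, 50), (130, 28)]
front = pareto_front(configs)
print(front)
```

Each point on the returned front is a candidate trade-off: moving along it gives up some performance in exchange for lower energy use, which is the kind of option the abstract describes presenting to users.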

Machine learning approaches to understand and predict rate constants for organic processes in mixtures containing ionic liquids

Physical Chemistry Chemical Physics ◽

10.1039/d0cp04227g ◽

2021 ◽

Vol 23 (4) ◽

pp. 2742-2752

Author(s):

Tamar L. Greaves ◽

Karin S. Schaffarczyk McHale ◽

Raphael F. Burkart-Radke ◽

Jason B. Harper ◽

Tu C. Le

Keyword(s):

Machine Learning ◽

Ionic Liquids ◽

Rate Constants ◽

Learning Approaches ◽

Learning Models ◽

Organic Reaction ◽

Machine Learning Models ◽

Selection Of

Machine learning models were developed for an organic reaction in ionic liquids and validated on a selection of ionic liquids.

Machine Learning Models for Predicting Attributes of Large-Scale Systems

46th AIAA Aerospace Sciences Meeting and Exhibit ◽

10.2514/6.2008-886 ◽

2008 ◽

Author(s):

Richard Selby

Keyword(s):

Machine Learning ◽

Large Scale ◽

Learning Models ◽

Large Scale Systems ◽

Machine Learning Models

QUBO formulations for training machine learning models

Scientific Reports ◽

10.1038/s41598-021-89461-4 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Prasanna Date ◽

Davis Arthur ◽

Lauren Pusey-Nazzaro

Keyword(s):

Machine Learning ◽

Linear Regression ◽

Large Scale ◽

Support Vector ◽

Quantum Computers ◽

Np Hard ◽

Learning Models ◽

Moore's Law ◽

Machine Learning Models

Training machine learning models on classical computers is usually a time- and compute-intensive process. With Moore’s law nearing its inevitable end and an ever-increasing demand for large-scale data analysis using machine learning, we must leverage non-conventional computing paradigms like quantum computing to train machine learning models efficiently. Adiabatic quantum computers can approximately solve NP-hard problems, such as quadratic unconstrained binary optimization (QUBO), faster than classical computers. Since many machine learning problems are also NP-hard, we believe adiabatic quantum computers might be instrumental in training machine learning models efficiently in the post-Moore’s law era. In order to solve problems on adiabatic quantum computers, they must be formulated as QUBO problems, which is very challenging. In this paper, we formulate the training problems of three machine learning models—linear regression, support vector machine (SVM) and balanced k-means clustering—as QUBO problems, making them amenable to training on adiabatic quantum computers. We also analyze the computational complexities of our formulations and compare them to corresponding state-of-the-art classical approaches. We show that the time and space complexities of our formulations are better than (in the case of SVM and balanced k-means clustering) or equivalent to (in the case of linear regression) those of their classical counterparts.
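To make the target problem class concrete, a QUBO asks for the binary vector x that minimizes x^T Q x for a given matrix Q. The sketch below solves a tiny instance by classical brute force; it is an illustration only and is neither an adiabatic quantum method nor one of the paper's formulations. The function names and the matrix `Q` are hypothetical.

```python
from itertools import product


def qubo_energy(Q, x):
    """Evaluate the QUBO objective x^T Q x for a binary vector x."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def solve_qubo_brute_force(Q):
    """Enumerate all 2^n binary vectors and return (best_x, best_energy).

    Exponential in n, so only usable for very small problems.
    """
    n = len(Q)
    best = min(product((0, 1), repeat=n), key=lambda x: qubo_energy(Q, x))
    return list(best), qubo_energy(Q, best)


# Hypothetical 2-variable instance: the diagonal rewards setting each bit,
# the off-diagonal term penalizes setting both at once.
Q = [[-1, 2],
     [0, -2]]
best_x, best_energy = solve_qubo_brute_force(Q)
print(best_x, best_energy)
```

The exhaustive search makes the cost of the classical baseline visible: the number of candidates doubles with each added binary variable.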

A Physics-Infused Deep Learning Model for the Prediction of Refractive Indices and Its Use for the Large-Scale Screening of Organic Compound Space

10.26434/chemrxiv.8796950 ◽

2019 ◽

Author(s):

Mojtaba Haghighatlari ◽

Gaurav Vishwakarma ◽

Mohammad Atif Faiz Afzal ◽

Johannes Hachmann

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Large Scale ◽

Organic Molecules ◽

Learning Model ◽

Training Data ◽

Refractive Indices ◽

Learning Models ◽

Deep Learning Model ◽

Machine Learning Models

We present a multitask, physics-infused deep learning model to accurately and efficiently predict refractive indices (RIs) of organic molecules, and we apply it to a library of 1.5 million compounds. We show that it outperforms earlier machine learning models by a significant margin, and that incorporating known physics into data-derived models provides valuable guardrails. Using a transfer learning approach, we augment the model to reproduce results consistent with higher-level computational chemistry training data, but with a considerably reduced number of corresponding calculations. Prediction errors of machine learning models are typically smallest for commonly observed target property values, consistent with the distribution of the training data. However, since our goal is to identify candidates with unusually large RI values, we propose a strategy to boost the performance of our model in the remoter areas of the RI distribution: We bias the model with respect to the under-represented classes of molecules that have values in the high-RI regime. By adopting a metric popular in web search engines, we evaluate our effectiveness in ranking top candidates. We confirm that the models developed in this study can reliably predict the RIs of the top 1,000 compounds, and are thus able to capture their ranking. We believe that this is the first study to develop a data-derived model that ensures the reliability of RI predictions by model augmentation in the extrapolation region on such a large scale. These results underscore the tremendous potential of machine learning in facilitating molecular (hyper)screening approaches on a massive scale and in accelerating the discovery of new compounds and materials, such as organic molecules with high-RI for applications in opto-electronics.

Machine Learning Models for GPU Error Prediction in a Large Scale HPC System

2018 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN) ◽

10.1109/dsn.2018.00022 ◽

2018 ◽

Cited By ~ 14

Author(s):

Bin Nie ◽

Ji Xue ◽

Saurabh Gupta ◽

Tirthak Patel ◽

Christian Engelmann ◽

...

Keyword(s):

Machine Learning ◽

Large Scale ◽

Error Prediction ◽

Learning Models ◽

Machine Learning Models

Impact of extreme weather conditions on European crop production in 2018

Philosophical Transactions of the Royal Society B Biological Sciences ◽

10.1098/rstb.2019.0510 ◽

2020 ◽

Vol 375 (1810) ◽

pp. 20190510 ◽

Cited By ~ 3

Author(s):

Damien Beillouin ◽

Bernhard Schauberger ◽

Ana Bastos ◽

Philippe Ciais ◽

David Makowski

Keyword(s):

Machine Learning ◽

Crop Production ◽

Large Scale ◽

Extreme Weather ◽

Severe Drought ◽

Yield Losses ◽

Learning Models ◽

Yield Data ◽

Continental Scale ◽

Machine Learning Models

Extreme weather increases the risk of large-scale crop failure. The mechanisms involved are complex and intertwined, hence undermining the identification of simple adaptation levers to help improve the resilience of agricultural production. Based on more than 82 000 yield data reported at the regional level in 17 European countries, we assess how climate affected the yields of nine crop species. Using machine learning models, we analyse historical yield data since 1901 and then focus on 2018, which experienced a multiplicity and a diversity of atypical extreme climatic conditions. Machine learning models explain up to 65% of historical yield anomalies. We find that both extremes in temperature and precipitation are associated with negative yield anomalies, but with varying impacts in different parts of Europe. In 2018, Northern and Eastern Europe experienced multiple and simultaneous crop failures—among the highest observed in recent decades. These yield losses were associated with extremely low rainfall in combination with high temperatures between March and August 2018. However, the higher than usual yields recorded in Southern Europe—caused by favourable spring rainfall conditions—nearly offset the large decrease in Northern European crop production. Our results outline the importance of considering single and compound climate extremes to analyse the causes of yield losses in Europe. We found no clear upward or downward trend in the frequency of extreme yield losses for any of the considered crops between 1990 and 2018. This article is part of the theme issue ‘Impacts of the 2018 severe drought and heatwave in Europe: from site to continental scale’.
