Statistical and machine learning models for optimizing energy in parallel applications

Author(s):  
Mark Endrei ◽  
Chao Jin ◽  
Minh Ngoc Dinh ◽  
David Abramson ◽  
Heidi Poxon ◽  
...  

Rising power costs and constraints are driving a growing focus on the energy efficiency of high performance computing systems. The unique characteristics of a particular system and workload and their effect on performance and energy efficiency are typically difficult for application users to assess and to control. Settings for optimum performance and energy efficiency can also diverge, so we need to identify trade-off options that guide a suitable balance between energy use and performance. We present statistical and machine learning models that only require a small number of runs to make accurate Pareto-optimal trade-off predictions using parameters that users can control. We study model training and validation using several parallel kernels and more complex workloads, including Algebraic Multigrid (AMG), the Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS), and Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics (LULESH). We demonstrate that we can train the models using as few as 12 runs, with prediction error of less than 10%. Our AMG results identify trade-off options that provide up to 45% improvement in energy efficiency for around 10% performance loss. We reduce the sample measurement time required for AMG by 90%, from 13 h to 74 min.
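
A minimal sketch of the kind of workflow the abstract describes, under stated assumptions: fit small regression models for runtime and energy from a handful of instrumented runs over user-controllable parameters (thread count and CPU frequency here are illustrative choices), then enumerate the predicted Pareto-optimal trade-offs. This is not the authors' code, and all numbers are hypothetical.

```python
# Sketch: few-shot runtime/energy models + Pareto-front extraction.
# Parameters, sample values, and the quadratic model form are assumptions.
import numpy as np
from itertools import product
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data: (threads, cpu_freq_GHz) -> runtime_s, energy_J
X = np.array([[4, 2.0], [4, 3.0], [8, 2.0], [8, 3.0],
              [16, 2.0], [16, 3.0], [32, 2.0], [32, 3.0],
              [4, 2.5], [8, 2.5], [16, 2.5], [32, 2.5]])  # 12 runs
runtime = np.array([420, 300, 230, 170, 140, 110, 100, 85, 350, 195, 122, 90])
energy = np.array([50e3, 62e3, 48e3, 58e3, 45e3, 55e3, 47e3, 60e3,
                   55e3, 52e3, 49e3, 52e3])

model_t = make_pipeline(PolynomialFeatures(2), LinearRegression()).fit(X, runtime)
model_e = make_pipeline(PolynomialFeatures(2), LinearRegression()).fit(X, energy)

# Predict over the full configuration grid, then keep Pareto-optimal points.
grid = np.array(list(product([4, 8, 16, 32, 64], np.linspace(2.0, 3.0, 11))))
preds = np.column_stack([model_t.predict(grid), model_e.predict(grid)])

def pareto_front(points):
    """Indices of points not dominated in both objectives (lower is better)."""
    return [i for i, p in enumerate(points)
            if not any(np.all(q <= p) and np.any(q < p) for q in points)]

for i in pareto_front(preds):
    print(f"threads={grid[i,0]:.0f} freq={grid[i,1]:.1f}GHz "
          f"runtime~{preds[i,0]:.0f}s energy~{preds[i,1]:.0f}J")
```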

2021 ◽  
Author(s):  
Norberto Sánchez-Cruz ◽  
Jose L. Medina-Franco

Epigenetic targets are a significant focus for drug discovery research, as demonstrated by the eight approved epigenetic drugs for treatment of cancer and the increasing availability of chemogenomic data related to epigenetics. These data encode a large number of structure-activity relationships that have not been exploited thus far for the development of predictive models to support medicinal chemistry efforts. Herein, we report the first large-scale study of 26,318 compounds with a quantitative measure of biological activity for 55 protein targets with epigenetic activity. Through a systematic comparison of machine learning models trained on molecular fingerprints of different design, we built predictive models with high accuracy for the epigenetic target profiling of small molecules. The models were thoroughly validated, showing mean precisions up to 0.952 for the epigenetic target prediction task. Our results indicate that the models reported herein have considerable potential to identify small molecules with epigenetic activity. We have therefore implemented them as a freely accessible and easy-to-use web application.
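
A minimal sketch of a fingerprint-based activity model of the kind compared in the study, assuming RDKit Morgan fingerprints and a random-forest classifier; the SMILES strings and labels are placeholders, and one such classifier would be trained per epigenetic target.

```python
# Sketch: Morgan-fingerprint featurization + per-target activity classifier.
# SMILES and labels below are hypothetical placeholders.
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestClassifier

def morgan_fp(smiles, radius=2, n_bits=2048):
    """Encode a molecule as a fixed-length Morgan fingerprint bit vector."""
    mol = Chem.MolFromSmiles(smiles)
    return np.array(AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits))

smiles = ["CCO", "c1ccccc1O", "CC(=O)Nc1ccc(O)cc1", "CCN(CC)CC"]  # placeholders
labels = [0, 1, 1, 0]  # hypothetical active/inactive calls for one target

X = np.vstack([morgan_fp(s) for s in smiles])
clf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, labels)
print(clf.predict_proba(X)[:, 1])  # predicted probability of activity
```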


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Prasanna Date ◽  
Davis Arthur ◽  
Lauren Pusey-Nazzaro

Abstract Training machine learning models on classical computers is usually a time- and compute-intensive process. With Moore's law nearing its inevitable end and an ever-increasing demand for large-scale data analysis using machine learning, we must leverage non-conventional computing paradigms like quantum computing to train machine learning models efficiently. Adiabatic quantum computers can approximately solve NP-hard problems, such as quadratic unconstrained binary optimization (QUBO), faster than classical computers. Since many machine learning problems are also NP-hard, we believe adiabatic quantum computers might be instrumental in training machine learning models efficiently in the post-Moore's-law era. To solve problems on adiabatic quantum computers, they must be formulated as QUBO problems, which is very challenging. In this paper, we formulate the training problems of three machine learning models, namely linear regression, support vector machine (SVM), and balanced k-means clustering, as QUBO problems, making them amenable to training on adiabatic quantum computers. We also analyze the computational complexities of our formulations and compare them to corresponding state-of-the-art classical approaches. We show that the time and space complexities of our formulations are better than those of their classical counterparts in the case of SVM and balanced k-means clustering, and equivalent in the case of linear regression.
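
As a concrete illustration of the QUBO idea for the simplest of the three models, here is a minimal sketch, not the paper's exact formulation: least-squares linear regression over a fixed-point binary encoding of the weights becomes a QUBO, which is brute-forced below in place of an annealer. The precision vector is an illustrative choice.

```python
# Sketch: linear regression as a QUBO via fixed-point binary weights.
# The precision vector p and the brute-force solver stand in for the
# encoding choices and the adiabatic quantum annealer, respectively.
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))          # design matrix
w_true = np.array([1.5, -0.5])
y = X @ w_true                        # noiseless targets, for clarity

# Fixed-point encoding: each weight w_j = sum_k p_k * b_{jk}, b in {0,1}.
p = np.array([2.0, 1.0, 0.5, -4.0])   # includes a negative "sign" bit
d, K = X.shape[1], len(p)
P = np.kron(np.eye(d), p.reshape(1, -1))   # (d, d*K) so that w = P @ b

# ||X P b - y||^2 = b^T (P^T X^T X P) b - 2 (X^T y)^T P b + const.
# Since b_i^2 = b_i for binaries, the linear term folds onto the diagonal.
Q = P.T @ X.T @ X @ P
Q[np.diag_indices_from(Q)] += -2.0 * (P.T @ X.T @ y)

# Brute-force the QUBO (an annealer would sample this energy landscape).
best = min(product([0, 1], repeat=d * K),
           key=lambda b: np.array(b) @ Q @ np.array(b))
print("recovered weights:", P @ np.array(best))   # ~ [1.5, -0.5]
```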


2019 ◽  
Author(s):  
Mojtaba Haghighatlari ◽  
Gaurav Vishwakarma ◽  
Mohammad Atif Faiz Afzal ◽  
Johannes Hachmann

We present a multitask, physics-infused deep learning model to accurately and efficiently predict refractive indices (RIs) of organic molecules, and we apply it to a library of 1.5 million compounds. We show that it outperforms earlier machine learning models by a significant margin, and that incorporating known physics into data-derived models provides valuable guardrails. Using a transfer learning approach, we augment the model to reproduce results consistent with higher-level computational chemistry training data, but with a considerably reduced number of corresponding calculations. Prediction errors of machine learning models are typically smallest for commonly observed target property values, consistent with the distribution of the training data. However, since our goal is to identify candidates with unusually large RI values, we propose a strategy to boost the performance of our model in the sparsely populated regions of the RI distribution: we bias the model towards the under-represented classes of molecules that have values in the high-RI regime. By adopting a metric popular in web search engines, we evaluate our effectiveness in ranking top candidates. We confirm that the models developed in this study can reliably predict the RIs of the top 1,000 compounds, and are thus able to capture their ranking. We believe that this is the first study to develop a data-derived model that ensures the reliability of RI predictions by model augmentation in the extrapolation region on such a large scale. These results underscore the tremendous potential of machine learning in facilitating molecular (hyper)screening approaches on a massive scale and in accelerating the discovery of new compounds and materials, such as high-RI organic molecules for applications in optoelectronics.
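
One plausible reading of the physics-infused design, offered as an assumption rather than the authors' exact architecture: let the model predict physically meaningful intermediates such as molar refraction R and molar volume V, and recover the refractive index n through the Lorentz-Lorenz relation (n^2 - 1)/(n^2 + 2) = R/V, so the final prediction is constrained by known physics rather than free-form.

```python
# Sketch: a Lorentz-Lorenz "guardrail" layer. The model outputs below
# (molar refraction R, molar volume V) are hypothetical intermediates.
import numpy as np

def refractive_index(R, V):
    """Invert Lorentz-Lorenz: n = sqrt((1 + 2R/V) / (1 - R/V))."""
    ratio = np.asarray(R) / np.asarray(V)
    return np.sqrt((1.0 + 2.0 * ratio) / (1.0 - ratio))

# Hypothetical model outputs for three molecules (cm^3/mol):
R_pred = np.array([30.0, 45.0, 60.0])     # molar refraction
V_pred = np.array([100.0, 120.0, 130.0])  # molar volume
print(refractive_index(R_pred, V_pred))   # ~1.51, ~1.67, ~1.89
```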


2020 ◽  
Vol 375 (1810) ◽  
pp. 20190510 ◽  
Author(s):  
Damien Beillouin ◽  
Bernhard Schauberger ◽  
Ana Bastos ◽  
Philippe Ciais ◽  
David Makowski

Extreme weather increases the risk of large-scale crop failure. The mechanisms involved are complex and intertwined, hence undermining the identification of simple adaptation levers to help improve the resilience of agricultural production. Based on more than 82 000 yield records reported at the regional level in 17 European countries, we assess how climate affected the yields of nine crop species. Using machine learning models, we analyse historical yield data since 1901 and then focus on 2018, a year that experienced a multiplicity and diversity of atypical extreme climatic conditions. Machine learning models explain up to 65% of historical yield anomalies. We find that both temperature and precipitation extremes are associated with negative yield anomalies, but with varying impacts in different parts of Europe. In 2018, Northern and Eastern Europe experienced multiple and simultaneous crop failures, among the highest observed in recent decades. These yield losses were associated with extremely low rainfall in combination with high temperatures between March and August 2018. However, the higher than usual yields recorded in Southern Europe, caused by favourable spring rainfall conditions, nearly offset the large decrease in Northern European crop production. Our results highlight the importance of considering single and compound climate extremes when analysing the causes of yield losses in Europe. We found no clear upward or downward trend in the frequency of extreme yield losses for any of the considered crops between 1990 and 2018. This article is part of the theme issue 'Impacts of the 2018 severe drought and heatwave in Europe: from site to continental scale'.
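
A minimal sketch of the modelling pattern described, on synthetic data rather than the study's: regress regional yield anomalies on seasonal climate indicators, including an interaction that mimics a compound hot-and-dry extreme, and report the variance explained (the study reaches up to 65% on historical data). Feature names are illustrative.

```python
# Sketch: yield-anomaly regression on climate indicators (synthetic data).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 500
temp_extreme = rng.normal(size=n)     # e.g. heat-day count, standardized
precip_deficit = rng.normal(size=n)   # e.g. Mar-Aug rainfall deficit, standardized

# Compound hot-and-dry years (both positive) hurt yields beyond either alone.
yield_anom = (-0.5 * temp_extreme - 0.4 * precip_deficit
              - 0.3 * temp_extreme * precip_deficit
              + rng.normal(scale=0.5, size=n))

X = np.column_stack([temp_extreme, precip_deficit])
model = RandomForestRegressor(n_estimators=300, random_state=0)
print("R^2:", cross_val_score(model, X, yield_anom, cv=5, scoring="r2").mean())
```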


2020 ◽  
Vol 34 (7) ◽  
pp. 717-730 ◽  
Author(s):  
Matthew C. Robinson ◽  
Robert C. Glen ◽  
Alpha A. Lee

Abstract Machine learning methods may have the potential to significantly accelerate drug discovery. However, the increasing rate of new methodological approaches being published in the literature raises the fundamental question of how models should be benchmarked and validated. We reanalyze the data generated by a recently published large-scale comparison of machine learning models for bioactivity prediction and arrive at a somewhat different conclusion. We show that the performance of support vector machines is competitive with that of deep learning methods. Additionally, using a series of numerical experiments, we question the relevance of area under the receiver operating characteristic curve as a metric in virtual screening. We further suggest that area under the precision–recall curve should be used in conjunction with the receiver operating characteristic curve. Our numerical experiments also highlight challenges in estimating the uncertainty in model performance via scaffold-split nested cross validation.
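
A minimal sketch of the metric argument, with synthetic scores: on a class-imbalanced virtual screen, the same model can look strong under AUROC while the precision-recall view exposes how few of its top-ranked calls are actually active.

```python
# Sketch: AUROC vs. AUPRC under screening-like class imbalance.
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(0)
n_actives, n_decoys = 50, 5000               # 1% actives, screening-like
y = np.r_[np.ones(n_actives), np.zeros(n_decoys)]
scores = np.r_[rng.normal(1.0, 1.0, n_actives),   # actives score higher on average
               rng.normal(0.0, 1.0, n_decoys)]

print("AUROC:", roc_auc_score(y, scores))           # looks strong, ~0.76
print("AUPRC:", average_precision_score(y, scores)) # far lower under imbalance
```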


2019 ◽  
Vol 111 ◽  
pp. 05019
Author(s):  
Brian de Keijzer ◽  
Pol de Visser ◽  
Víctor García Romillo ◽  
Víctor Gómez Muñoz ◽  
Daan Boesten ◽  
...  

Machine learning models have proven to be reliable methods for forecasting energy use in commercial and office buildings. However, little research has been done on energy forecasting in dwellings, mainly due to the difficulty of obtaining household-level data while protecting the privacy of inhabitants. Insight into near-future energy consumption can help balance the grid, and such models can also reveal how consumption might be reduced. In collaboration with OPSCHALER, a measurement campaign on the influence of housing characteristics on energy costs and comfort, several machine learning models were compared on forecasting performance and the computational time required. Nine months of data containing the mean gas consumption of 52 dwellings at a one-hour resolution were used for this research. The first six months were used for training, and the last three months were used to evaluate the models. The results showed that the Deep Neural Network (DNN) performed best at the one-hour resolution, with a Mean Absolute Percentage Error (MAPE) of 50.1%. At daily and weekly resolutions, the Multivariate Linear Regression (MVLR) outperformed the other models, with MAPEs of 20.1% and 17.0%, respectively. The models were programmed in Python.
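
A minimal sketch of this evaluation setup on a synthetic hourly series (the lag-feature design and all numbers are assumptions, not the study's data): train a linear model on the first six months and score the last three with MAPE.

```python
# Sketch: train/test split over a synthetic hourly gas series, scored by MAPE.
import numpy as np
from sklearn.linear_model import LinearRegression

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent."""
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

# Synthetic hourly gas consumption over ~9 months with a daily cycle.
rng = np.random.default_rng(0)
hours = np.arange(9 * 30 * 24)
gas = 1.0 + 0.5 * np.sin(2 * np.pi * hours / 24) + rng.normal(scale=0.2, size=hours.size)
gas = np.clip(gas, 0.05, None)   # keep strictly positive so MAPE is defined

# MVLR-style lag features: the previous 24 hourly readings predict the next hour.
lags = 24
X = np.column_stack([gas[i:len(gas) - lags + i] for i in range(lags)])
y = gas[lags:]

split = 6 * 30 * 24              # first six months for training
model = LinearRegression().fit(X[:split], y[:split])
print("MAPE on last three months: %.1f%%" % mape(y[split:], model.predict(X[split:])))
```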


2020 ◽  
Vol 29 (03n04) ◽  
pp. 2060009
Author(s):  
Tao Ding ◽  
Fatema Hasan ◽  
Warren K. Bickel ◽  
Shimei Pan

Social media contain rich information that can help us understand the human mind and behavior. Social media data, however, are mostly unstructured (e.g., text and images), and a large number of features may be needed to represent them (e.g., we may need millions of unigrams to represent social media texts). Moreover, accurately assessing human behavior is often difficult (e.g., assessing addiction may require medical diagnosis). As a result, the ground truth data needed to train a supervised human behavior model are often difficult to obtain at a large scale. To avoid overfitting, many state-of-the-art behavior models employ sophisticated unsupervised or self-supervised machine learning methods to leverage a large amount of unsupervised data for both feature learning and dimension reduction. Unfortunately, despite their high performance, these advanced machine learning models often rely on latent features that are hard to explain. Since understanding the knowledge captured in these models is important to behavior scientists and public health providers, we explore new methods to build machine learning models that are not only accurate but also interpretable. We evaluate the effectiveness of the proposed methods in predicting Substance Use Disorders (SUD). We believe the proposed methods are general and applicable to a wide range of data-driven human trait and behavior analysis applications.
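
A minimal sketch of the accuracy-versus-interpretability theme on toy data (the posts, labels, and vocabulary are all hypothetical, not the paper's method): an L1-regularized logistic regression over unigram counts produces per-word weights that a behavior scientist can inspect directly, in contrast to opaque latent features.

```python
# Sketch: an interpretable unigram model with inspectable per-word weights.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

posts = ["craving again today", "great run this morning",
         "relapse fear craving", "coffee and morning code"]
labels = [1, 0, 1, 0]   # hypothetical SUD-risk annotations

vec = CountVectorizer()
X = vec.fit_transform(posts)
clf = LogisticRegression(penalty="l1", solver="liblinear", C=10.0).fit(X, labels)

# Inspect which unigrams drive the prediction -- the interpretable part.
for word, w in zip(vec.get_feature_names_out(), clf.coef_[0]):
    if abs(w) > 1e-6:
        print(f"{word}: {w:+.2f}")
```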

