QUBO formulations for training machine learning models

AbstractTraining machine learning models on classical computers is usually a time and compute intensive process. With Moore’s law nearing its inevitable end and an ever-increasing demand for large-scale data analysis using machine learning, we must leverage non-conventional computing paradigms like quantum computing to train machine learning models efficiently. Adiabatic quantum computers can approximately solve NP-hard problems, such as the quadratic unconstrained binary optimization (QUBO), faster than classical computers. Since many machine learning problems are also NP-hard, we believe adiabatic quantum computers might be instrumental in training machine learning models efficiently in the post Moore’s law era. In order to solve problems on adiabatic quantum computers, they must be formulated as QUBO problems, which is very challenging. In this paper, we formulate the training problems of three machine learning models—linear regression, support vector machine (SVM) and balanced k-means clustering—as QUBO problems, making them conducive to be trained on adiabatic quantum computers. We also analyze the computational complexities of our formulations and compare them to corresponding state-of-the-art classical approaches. We show that the time and space complexities of our formulations are better (in case of SVM and balanced k-means clustering) or equivalent (in case of linear regression) to their classical counterparts.

Download Full-text

Machine learning predictive models of LDL-C in the population of eastern India and its comparison with directly measured and calculated LDL-C

Annals of Clinical Biochemistry International Journal of Laboratory Medicine ◽

10.1177/00045632211046805 ◽

2021 ◽

pp. 000456322110468

Author(s):

Anudeep P P ◽

Suchitra Kumari ◽

Aishvarya S Rajasimman ◽

Saurav Nayak ◽

Pooja Priyadarsini

Keyword(s):

Machine Learning ◽

Linear Regression ◽

Random Forests ◽

Predictive Performance ◽

Support Vector ◽

Learning Models ◽

Complex Interactions ◽

Clinical Biochemistry Laboratory ◽

Study Laboratory ◽

Machine Learning Models

Background LDL-C is a strong risk factor for cardiovascular disorders. The formulas used to calculate LDL-C showed varying performance in different populations. Machine learning models can study complex interactions between the variables and can be used to predict outcomes more accurately. The current study evaluated the predictive performance of three machine learning models—random forests, XGBoost, and support vector Rregression (SVR) to predict LDL-C from total cholesterol, triglyceride, and HDL-C in comparison to linear regression model and some existing formulas for LDL-C calculation, in eastern Indian population. Methods The lipid profiles performed in the clinical biochemistry laboratory of AIIMS Bhubaneswar during 2019–2021, a total of 13,391 samples were included in the study. Laboratory results were collected from the laboratory database. 70% of data were classified as train set and used to develop the three machine learning models and linear regression formula. These models were tested in the rest 30% of the data (test set) for validation. Performance of models was evaluated in comparison to best six existing LDL-C calculating formulas. Results LDL-C predicted by XGBoost and random forests models showed a strong correlation with directly estimated LDL-C (r = 0.98). Two machine learning models performed superior to the six existing and commonly used LDL-C calculating formulas like Friedewald in the study population. When compared in different triglycerides strata also, these two models outperformed the other methods used. Conclusion Machine learning models like XGBoost and random forests can be used to predict LDL-C with more accuracy comparing to conventional linear regression LDL-C formulas.

Download Full-text

Classification Models for Bank Marketing Campaign: Towards Smart Bank Marketing

American Journal of Business and Operations Research ◽

10.54216/ajbor.050102 ◽

2021 ◽

pp. 21-30

Author(s):

Ahmad Freij ◽

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Linear Regression ◽

Support Vector ◽

Classification Models ◽

Learning Models ◽

Marketing Campaign ◽

Bank Marketing ◽

Machine Learning Models

In this paper, we have proposed two models of marketing classification which are Support Vector Machine (SVM) and Linear regression, these two models are the most popular and useful models of classification. In this paper, we represent how these two models are used for a case study of a bank marketing campaign, the dataset is related to a bank marketing campaign, and for Applying the machine learning models of classification, the RapidMiner software was used.

Download Full-text

Prediction of the Temperature of Liquid Aluminum and the Dissolved Hydrogen Content in Liquid Aluminum with a Machine Learning Approach

Metals ◽

10.3390/met10030330 ◽

2020 ◽

Vol 10 (3) ◽

pp. 330 ◽

Cited By ~ 1

Author(s):

Moon-Jo Kim ◽

Jong Pil Yun ◽

Ji-Ba-Reum Yang ◽

Seung-Jun Choi ◽

DongEung Kim

Keyword(s):

Machine Learning ◽

Linear Regression ◽

Hydrogen Content ◽

Liquid Aluminum ◽

Support Vector ◽

Learning Models ◽

Data Set ◽

Window Method ◽

Dissolved Hydrogen ◽

Machine Learning Models

In aluminum casting, the temperature of liquid aluminum and the dissolved hydrogen density are crucial factors to be controlled for the purpose of both quality control of molten metal and cost efficiency. However, the empirical and numerical approaches to predict these parameters are quite complex and time consuming, and it is necessary to develop an alternative method for rapid prediction with a small number of experiments. In this study, the machine learning models were developed to predict the temperature of liquid aluminum and the dissolved hydrogen content in liquid aluminum. The obtained experimental data was preprocessed to be used for constructing the machine learning models by the sliding time window method. The machine learning models of linear regression, regression tree, Gaussian process regression (GPR), Support vector machine (SVM), and ensembles of regression trees were compared to find the model with the highest performance to predict the target properties. For the prediction of the temperature of liquid aluminum and the dissolved hydrogen content in liquid aluminum, the linear regression and GPR models were selected with the high accuracy of prediction, respectively. In comparison to the numerical modeling, the machine learning modeling had better performance, and was more effective for predicting the target property even with the limited data set when the characteristics of the data were properly considered in data preprocessing.

Download Full-text

Prediction of cancer incidence rates for the European continent using machine learning models

Health Informatics Journal ◽

10.1177/1460458220983878 ◽

2021 ◽

Vol 27 (1) ◽

pp. 146045822098387

Author(s):

Boran Sekeroglu ◽

Kubra Tuncal

Keyword(s):

Neural Network ◽

Colorectal Cancer ◽

Machine Learning ◽

Linear Regression ◽

Support Vector Regression ◽

Incidence Rates ◽

Support Vector ◽

Learning Models ◽

European Continent ◽

Machine Learning Models

Cancer is one of the most important and common public health problems on Earth that can occur in many different types. Treatments and precautions are aimed at minimizing the deaths caused by cancer; however, incidence rates continue to rise. Thus, it is important to analyze and estimate incidence rates to support the determination of more effective precautions. In this research, 2018 Cancer Datasheet of World Health Organization (WHO), is used and all countries on the European Continent are considered to analyze and predict the incidence rates until 2020, for Lung cancer, Breast cancer, Colorectal cancer, Prostate cancer and All types of cancer, which have highest incidence and mortality rates. Each cancer type is trained by six machine learning models namely, Linear Regression, Support Vector Regression, Decision Tree, Long-Short Term Memory neural network, Backpropagation neural network, and Radial Basis Function neural network according to gender types separately. Linear regression and support vector regression outperformed the other models with the [Formula: see text] scores 0.99 and 0.98, respectively, in initial experiments, and then used for prediction of incidence rates of the considered cancer types. The ML models estimated that the maximum rise of incidence rates would be in colorectal cancer for females by 6%.

Download Full-text

Predicting S&P 500 Market Price by Deep Neural Network and Enemble Model

E3S Web of Conferences ◽

10.1051/e3sconf/202021402040 ◽

2020 ◽

Vol 214 ◽

pp. 02040

Author(s):

Feiyu Wang

Keyword(s):

Neural Network ◽

Machine Learning ◽

Support Vector Machine ◽

Linear Regression ◽

Deep Neural Network ◽

Market Price ◽

Support Vector ◽

Learning Models ◽

Conventional Machine ◽

Machine Learning Models

The method to predict the movement of stock market has appealed to scientists for decades. In this article, we use three different models to tackle that problem. In particular, we propose a Deep Neural Network (DNN) to predict the intraday direction of SP500 index and compare the DNN with two conventional machine learning models, i.e. linear regression, support vector machine. We demonstrate that DNN is able to predict SP500 index with relatively highest accuracy.

Download Full-text

Short-Term Electricity Generation Forecasting Using Machine Learning Algorithms: A Case Study of the Benin Electricity Community (C.E.B)

TH Wildau Engineering and Natural Sciences Proceedings ◽

10.52825/thwildauensp.v1i.25 ◽

2021 ◽

Vol 1 ◽

Author(s):

Agbassou Guenoupkati ◽

Adekunlé Akim Salami ◽

Mawugno Koffi Kodjo ◽

Kossi Napo

Keyword(s):

Machine Learning ◽

Time Series ◽

Linear Regression ◽

Performance Metrics ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Support Vector ◽

Learning Models ◽

Short Term ◽

Machine Learning Models

Time series forecasting in the energy sector is important to power utilities for decision making to ensure the sustainability and quality of electricity supply, and the stability of the power grid. Unfortunately, the presence of certain exogenous factors such as weather conditions, electricity price complicate the task using linear regression models that are becoming unsuitable. The search for a robust predictor would be an invaluable asset for electricity companies. To overcome this difficulty, Artificial Intelligence differs from these prediction methods through the Machine Learning algorithms which have been performing over the last decades in predicting time series on several levels. This work proposes the deployment of three univariate Machine Learning models: Support Vector Regression, Multi-Layer Perceptron, and the Long Short-Term Memory Recurrent Neural Network to predict the electricity production of Benin Electricity Community. In order to validate the performance of these different methods, against the Autoregressive Integrated Mobile Average and Multiple Regression model, performance metrics were used. Overall, the results show that the Machine Learning models outperform the linear regression methods. Consequently, Machine Learning methods offer a perspective for short-term electric power generation forecasting of Benin Electricity Community sources.

Download Full-text

Epigenetic Target Prediction with Accurate Machine Learning Models

10.26434/chemrxiv.13522313 ◽

2021 ◽

Author(s):

Norberto Sánchez-Cruz ◽

Jose L. Medina-Franco

Keyword(s):

Machine Learning ◽

Small Molecules ◽

Predictive Models ◽

Large Scale ◽

Target Prediction ◽

Quantitative Measure ◽

Learning Models ◽

Discovery Research ◽

Drug Discovery Research ◽

Machine Learning Models

<p>Epigenetic targets are a significant focus for drug discovery research, as demonstrated by the eight approved epigenetic drugs for treatment of cancer and the increasing availability of chemogenomic data related to epigenetics. This data represents a large amount of structure-activity relationships that has not been exploited thus far for the development of predictive models to support medicinal chemistry efforts. Herein, we report the first large-scale study of 26318 compounds with a quantitative measure of biological activity for 55 protein targets with epigenetic activity. Through a systematic comparison of machine learning models trained on molecular fingerprints of different design, we built predictive models with high accuracy for the epigenetic target profiling of small molecules. The models were thoroughly validated showing mean precisions up to 0.952 for the epigenetic target prediction task. Our results indicate that the herein reported models have considerable potential to identify small molecules with epigenetic activity. Therefore, our results were implemented as freely accessible and easy-to-use web application.</p>

Download Full-text

Statistical and machine learning models for optimizing energy in parallel applications

The International Journal of High Performance Computing Applications ◽

10.1177/1094342019842915 ◽

2019 ◽

Vol 33 (6) ◽

pp. 1079-1097 ◽

Cited By ~ 2

Author(s):

Mark Endrei ◽

Chao Jin ◽

Minh Ngoc Dinh ◽

David Abramson ◽

Heidi Poxon ◽

...

Keyword(s):

Machine Learning ◽

Energy Efficiency ◽

High Performance ◽

Large Scale ◽

Energy Use ◽

Parallel Applications ◽

Learning Models ◽

Trade Off ◽

Time Required ◽

Machine Learning Models

Rising power costs and constraints are driving a growing focus on the energy efficiency of high performance computing systems. The unique characteristics of a particular system and workload and their effect on performance and energy efficiency are typically difficult for application users to assess and to control. Settings for optimum performance and energy efficiency can also diverge, so we need to identify trade-off options that guide a suitable balance between energy use and performance. We present statistical and machine learning models that only require a small number of runs to make accurate Pareto-optimal trade-off predictions using parameters that users can control. We study model training and validation using several parallel kernels and more complex workloads, including Algebraic Multigrid (AMG), Large-scale Atomic Molecular Massively Parallel Simulator, and Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics. We demonstrate that we can train the models using as few as 12 runs, with prediction error of less than 10%. Our AMG results identify trade-off options that provide up to 45% improvement in energy efficiency for around 10% performance loss. We reduce the sample measurement time required for AMG by 90%, from 13 h to 74 min.

Download Full-text

Monitoring the Foliar Nutrients Status of Mango Using Spectroscopy-Based Spectral Indices and PLSR-Combined Machine Learning Models

Remote Sensing ◽

10.3390/rs13040641 ◽

2021 ◽

Vol 13 (4) ◽

pp. 641

Author(s):

Gopal Ramdas Mahajan ◽

Bappa Das ◽

Dayesh Murgaokar ◽

Ittai Herrmann ◽

Katja Berger ◽

...

Keyword(s):

Machine Learning ◽

Remote Sensing ◽

Partial Least Square ◽

Least Square ◽

Partial Least Square Regression ◽

Support Vector ◽

Spectral Indices ◽

Learning Models ◽

Leaf Nutrients ◽

Machine Learning Models

Conventional methods of plant nutrient estimation for nutrient management need a huge number of leaf or tissue samples and extensive chemical analysis, which is time-consuming and expensive. Remote sensing is a viable tool to estimate the plant’s nutritional status to determine the appropriate amounts of fertilizer inputs. The aim of the study was to use remote sensing to characterize the foliar nutrient status of mango through the development of spectral indices, multivariate analysis, chemometrics, and machine learning modeling of the spectral data. A spectral database within the 350–1050 nm wavelength range of the leaf samples and leaf nutrients were analyzed for the development of spectral indices and multivariate model development. The normalized difference and ratio spectral indices and multivariate models–partial least square regression (PLSR), principal component regression, and support vector regression (SVR) were ineffective in predicting any of the leaf nutrients. An approach of using PLSR-combined machine learning models was found to be the best to predict most of the nutrients. Based on the independent validation performance and summed ranks, the best performing models were cubist (R2 ≥ 0.91, the ratio of performance to deviation (RPD) ≥ 3.3, and the ratio of performance to interquartile distance (RPIQ) ≥ 3.71) for nitrogen, phosphorus, potassium, and zinc, SVR (R2 ≥ 0.88, RPD ≥ 2.73, RPIQ ≥ 3.31) for calcium, iron, copper, boron, and elastic net (R2 ≥ 0.95, RPD ≥ 4.47, RPIQ ≥ 6.11) for magnesium and sulfur. The results of the study revealed the potential of using hyperspectral remote sensing data for non-destructive estimation of mango leaf macro- and micro-nutrients. The developed approach is suggested to be employed within operational retrieval workflows for precision management of mango orchard nutrients.

Download Full-text

Machine learning models to identify low adherence to influenza vaccination among Korean adults with cardiovascular disease

BMC Cardiovascular Disorders ◽

10.1186/s12872-021-01925-7 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Moojung Kim ◽

Young Jae Kim ◽

Sung Jin Park ◽

Kwang Gi Kim ◽

Pyung Chun Oh ◽

...

Keyword(s):

Machine Learning ◽

Cardiovascular Disease ◽

Influenza Vaccination ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Support Vector ◽

Age Group ◽

Learning Models ◽

Extreme Gradient Boosting ◽

Machine Learning Models

Abstract Background Annual influenza vaccination is an important public health measure to prevent influenza infections and is strongly recommended for cardiovascular disease (CVD) patients, especially in the current coronavirus disease 2019 (COVID-19) pandemic. The aim of this study is to develop a machine learning model to identify Korean adult CVD patients with low adherence to influenza vaccination Methods Adults with CVD (n = 815) from a nationally representative dataset of the Fifth Korea National Health and Nutrition Examination Survey (KNHANES V) were analyzed. Among these adults, 500 (61.4%) had answered "yes" to whether they had received seasonal influenza vaccinations in the past 12 months. The classification process was performed using the logistic regression (LR), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGB) machine learning techniques. Because the Ministry of Health and Welfare in Korea offers free influenza immunization for the elderly, separate models were developed for the < 65 and ≥ 65 age groups. Results The accuracy of machine learning models using 16 variables as predictors of low influenza vaccination adherence was compared; for the ≥ 65 age group, XGB (84.7%) and RF (84.7%) have the best accuracies, followed by LR (82.7%) and SVM (77.6%). For the < 65 age group, SVM has the best accuracy (68.4%), followed by RF (64.9%), LR (63.2%), and XGB (61.4%). Conclusions The machine leaning models show comparable performance in classifying adult CVD patients with low adherence to influenza vaccination.

Download Full-text