Machine learning predictive models of LDL-C in the population of eastern India and its comparison with directly measured and calculated LDL-C

Background: LDL-C is a strong risk factor for cardiovascular disorders. The formulas used to calculate LDL-C show varying performance in different populations. Machine learning models can capture complex interactions between variables and may predict outcomes more accurately. The current study evaluated the predictive performance of three machine learning models (random forests, XGBoost, and support vector regression (SVR)) in predicting LDL-C from total cholesterol, triglycerides, and HDL-C, in comparison with a linear regression model and several existing formulas for LDL-C calculation, in an eastern Indian population. Methods: A total of 13,391 lipid profiles performed in the clinical biochemistry laboratory of AIIMS Bhubaneswar during 2019–2021 were included in the study. Laboratory results were collected from the laboratory database. Seventy percent of the data were assigned to the training set and used to develop the three machine learning models and the linear regression formula. These models were validated on the remaining 30% of the data (test set). Model performance was evaluated in comparison with the six best existing LDL-C formulas. Results: LDL-C predicted by the XGBoost and random forests models showed a strong correlation with directly estimated LDL-C (r = 0.98). These two machine learning models outperformed the six existing and commonly used LDL-C formulas, such as Friedewald, in the study population. When compared across triglyceride strata, these two models also outperformed the other methods. Conclusion: Machine learning models such as XGBoost and random forests can predict LDL-C more accurately than conventional linear regression LDL-C formulas.
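
For reference, the Friedewald formula mentioned in the abstract estimates LDL-C (in mg/dL) as total cholesterol minus HDL-C minus triglycerides divided by 5. The sketch below is only an illustration: the function name and sample values are assumptions, not data from the study, and the 400 mg/dL triglyceride cutoff is the commonly cited validity limit rather than something the abstract states.

```python
def friedewald_ldl(total_chol, hdl, triglycerides):
    """Estimate LDL-C in mg/dL with the Friedewald formula.

    LDL-C = TC - HDL-C - TG / 5, where TG / 5 approximates VLDL-C.
    The estimate is generally considered unreliable for TG >= 400 mg/dL.
    """
    if triglycerides >= 400:
        raise ValueError("Friedewald estimate is not valid for TG >= 400 mg/dL")
    return total_chol - hdl - triglycerides / 5.0


# Illustrative values only (mg/dL), not taken from the study.
print(friedewald_ldl(200, 50, 150))  # 120.0
```

A machine learning model of the kind the study describes would take the same three inputs and replace this fixed arithmetic with a fitted function.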

QUBO formulations for training machine learning models

Scientific Reports ◽

10.1038/s41598-021-89461-4 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Prasanna Date ◽

Davis Arthur ◽

Lauren Pusey-Nazzaro

Keyword(s):

Machine Learning ◽

Linear Regression ◽

Large Scale ◽

Support Vector ◽

Quantum Computers ◽

Np Hard ◽

Learning Models ◽

Moore's Law ◽

Machine Learning Models

Abstract: Training machine learning models on classical computers is usually a time- and compute-intensive process. With Moore’s law nearing its inevitable end and an ever-increasing demand for large-scale data analysis using machine learning, we must leverage non-conventional computing paradigms like quantum computing to train machine learning models efficiently. Adiabatic quantum computers can approximately solve NP-hard problems, such as quadratic unconstrained binary optimization (QUBO), faster than classical computers. Since many machine learning problems are also NP-hard, we believe adiabatic quantum computers might be instrumental in training machine learning models efficiently in the post-Moore’s law era. To be solved on adiabatic quantum computers, problems must be formulated as QUBO problems, which is very challenging. In this paper, we formulate the training problems of three machine learning models (linear regression, support vector machine (SVM), and balanced k-means clustering) as QUBO problems, making them amenable to training on adiabatic quantum computers. We also analyze the computational complexities of our formulations and compare them to the corresponding state-of-the-art classical approaches. We show that the time and space complexities of our formulations are better than (for SVM and balanced k-means clustering) or equivalent to (for linear regression) those of their classical counterparts.
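
To make the QUBO form concrete: a QUBO problem asks for the binary vector x that minimizes x^T Q x for a given square matrix Q. The sketch below solves a toy instance by exhaustive search on a classical machine. It is not the paper's formulation of any of the three models; the function names and the matrix are hypothetical, chosen only to show the shape of the objective.

```python
from itertools import product


def qubo_energy(Q, x):
    """Return x^T Q x for a binary vector x and a square matrix Q."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def solve_qubo_brute_force(Q):
    """Exhaustively minimize x^T Q x over x in {0, 1}^n (tiny n only)."""
    n = len(Q)
    return min(product((0, 1), repeat=n), key=lambda x: qubo_energy(Q, x))


# Toy matrix (hypothetical): negative diagonal entries reward selecting a
# variable, the positive off-diagonal entry penalizes selecting both.
Q = [[-1, 2],
     [0, -2]]
best = solve_qubo_brute_force(Q)
print(best, qubo_energy(Q, best))  # (0, 1) -2
```

Exhaustive search costs 2^n evaluations, which is why approximate solvers such as adiabatic quantum hardware are of interest for larger instances.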

Development of Combined Heavy Rain Damage Prediction Models with Machine Learning

Water ◽

10.3390/w11122516 ◽

2019 ◽

Vol 11 (12) ◽

pp. 2516 ◽

Cited By ~ 1

Author(s):

Changhyun Choi ◽

Jeonghwan Kim ◽

Jungwook Kim ◽

Hung Soo Kim

Keyword(s):

Machine Learning ◽

Linear Regression ◽

Prediction Model ◽

Prediction Models ◽

Predictive Performance ◽

Heavy Rain ◽

Learning Models ◽

Damage Prediction ◽

Natural Disaster Management ◽

Machine Learning Models

Adequate forecasting and preparation for heavy rain can minimize loss of life and property damage. Some studies have been conducted on the heavy rain damage prediction model (HDPM); however, most of these models are limited to linear regression, which explains only the linear relation between rainfall data and damage. This study develops the combined heavy rain damage prediction model (CHDPM), in which a residual prediction model (RPM) is added to the HDPM. The predictive performance of the CHDPM is found to be 4–14% higher than that of the HDPM. This confirms that combining the HDPM with a machine learning RPM compensates for the linearity of the HDPM and improves predictive performance. The results of this study can be used as basic data for natural disaster management.
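
The combination described above (a linear base model plus a second model trained on its residuals) can be sketched as follows. This is an assumption-laden illustration, not the study's method: the data are synthetic, the function names are hypothetical, and a 1-nearest-neighbour lookup stands in for whatever machine learning model the authors used as the RPM.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


def nearest_residual(xs, residuals, x):
    """Stand-in residual model: residual of the closest training point."""
    idx = min(range(len(xs)), key=lambda i: abs(xs[i] - x))
    return residuals[idx]


# Synthetic, nonlinear data (illustrative only).
xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 25]

intercept, slope = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def base_predict(x):
    return intercept + slope * x


def combined_predict(x):
    # Base linear prediction corrected by the predicted residual.
    return base_predict(x) + nearest_residual(xs, residuals, x)


x_new, y_true = 3.4, 3.4 ** 2
print(abs(base_predict(x_new) - y_true), abs(combined_predict(x_new) - y_true))
```

On this toy data the corrected prediction is closer to the true value than the linear prediction alone, which is the effect the residual model is meant to provide.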

News Article Text Classification and Summary for Authors and Topics

10.5121/csit.2020.101401 ◽

2020 ◽

Author(s):

Aviel J. Stein ◽

Janith Weerasinghe ◽

Spiros Mancoridis ◽

Rachel Greenstadt

Keyword(s):

Machine Learning ◽

Random Forests ◽

Text Classification ◽

Authorship Attribution ◽

News Article ◽

Support Vector ◽

The Internet ◽

Original Text ◽

Learning Models ◽

Machine Learning Models

News articles are important for providing timely, historic information. However, the Internet is replete with text that may contain irrelevant or unhelpful information, so means of processing it and distilling its content are important and useful to human readers as well as to information extraction tools. Some common questions we may want to answer are “what is this article about?” and “who wrote it?”. In this work we compare machine learning models on two common NLP tasks, topic and authorship attribution, using the 2017 Vox Media dataset. Additionally, we use the models to classify on a subsection, about 20%, of the original text, which proves to be better for classification than the provided blurbs. Because of the large number of topics, we take topic overlap into account and address it via top-n accuracy and hierarchical groupings of topics. We also consider edge cases in authorship by classifying on inter-topic and intra-topic author distributions. Our results show that both topics and authors are readily identifiable, and that neural networks consistently perform best, ahead of support vector, random forest, and naive Bayes classifiers, although the latter methods perform acceptably.
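
Top-n accuracy, mentioned above as a way to handle overlapping topics, counts a prediction as correct when the true label is among the n highest-scoring labels. A minimal sketch, with hypothetical topic names and scores that are not taken from the Vox Media dataset:

```python
def top_n_accuracy(scores, true_labels, n):
    """Fraction of samples whose true label is among the n top-scored labels.

    `scores` is a list of dicts mapping label -> model score.
    """
    hits = 0
    for label_scores, truth in zip(scores, true_labels):
        top = sorted(label_scores, key=label_scores.get, reverse=True)[:n]
        hits += truth in top
    return hits / len(true_labels)


# Hypothetical classifier outputs for three articles.
scores = [
    {"politics": 0.6, "tech": 0.3, "sports": 0.1},
    {"politics": 0.5, "tech": 0.4, "sports": 0.1},
    {"politics": 0.2, "tech": 0.3, "sports": 0.5},
]
truth = ["politics", "tech", "tech"]

print(top_n_accuracy(scores, truth, 1))
print(top_n_accuracy(scores, truth, 2))  # 1.0
```

With n = 1 only the first article counts as correct; with n = 2 all three do, because the true topic is always one of the two best-scored labels.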

Empirical asset pricing via machine learning: evidence from the European stock market

Journal of Asset Management ◽

10.1057/s41260-021-00237-x ◽

2021 ◽

Author(s):

Wolfgang Drobetz ◽

Tizian Otto

Keyword(s):

Machine Learning ◽

Stock Returns ◽

Network Architecture ◽

Risk Measures ◽

Predictive Performance ◽

Support Vector ◽

Learning Models ◽

Learning Methods ◽

Machine Learning Methods ◽

Machine Learning Models

Abstract: This paper evaluates the predictive performance of machine learning methods in forecasting European stock returns. Compared to a linear benchmark model, interactions and nonlinear effects help improve the predictive performance. But machine learning models must be adequately trained and tuned to overcome the high dimensionality problem and to avoid overfitting. Across all machine learning methods, the most important predictors are based on price trends and fundamental signals from valuation ratios. However, the models exhibit substantial variation in statistical predictive performance that translates into pronounced differences in economic profitability. The return and risk measures of long-only trading strategies indicate that machine learning models produce sizeable gains relative to our benchmark. Neural networks perform best, even after accounting for transaction costs. A classification-based portfolio formation, utilizing a support vector machine that avoids estimating stock-level expected returns, performs even better than the neural network architecture.

Classification Models for Bank Marketing Campaign: Towards Smart Bank Marketing

American Journal of Business and Operations Research ◽

10.54216/ajbor.050102 ◽

2021 ◽

pp. 21-30

Author(s):

Ahmad Freij ◽

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Linear Regression ◽

Support Vector ◽

Classification Models ◽

Learning Models ◽

Marketing Campaign ◽

Bank Marketing ◽

Machine Learning Models

In this paper, we propose two classification models for marketing: Support Vector Machine (SVM) and linear regression, which are the most popular and useful classification models. We show how these two models are applied in a case study of a bank marketing campaign, using a dataset from that campaign. The RapidMiner software was used to apply the machine learning classification models.

Prediction of the Temperature of Liquid Aluminum and the Dissolved Hydrogen Content in Liquid Aluminum with a Machine Learning Approach

Metals ◽

10.3390/met10030330 ◽

2020 ◽

Vol 10 (3) ◽

pp. 330 ◽

Cited By ~ 1

Author(s):

Moon-Jo Kim ◽

Jong Pil Yun ◽

Ji-Ba-Reum Yang ◽

Seung-Jun Choi ◽

DongEung Kim

Keyword(s):

Machine Learning ◽

Linear Regression ◽

Hydrogen Content ◽

Liquid Aluminum ◽

Support Vector ◽

Learning Models ◽

Data Set ◽

Window Method ◽

Dissolved Hydrogen ◽

Machine Learning Models

In aluminum casting, the temperature of liquid aluminum and the dissolved hydrogen density are crucial factors to control, both for quality control of the molten metal and for cost efficiency. However, the empirical and numerical approaches to predicting these parameters are quite complex and time consuming, so an alternative method is needed for rapid prediction with a small number of experiments. In this study, machine learning models were developed to predict the temperature of liquid aluminum and the dissolved hydrogen content in liquid aluminum. The experimental data were preprocessed with the sliding time window method before being used to construct the machine learning models. Linear regression, regression tree, Gaussian process regression (GPR), support vector machine (SVM), and ensembles of regression trees were compared to find the model with the highest performance in predicting the target properties. For predicting the temperature of liquid aluminum and the dissolved hydrogen content in liquid aluminum, the linear regression and GPR models, respectively, were selected for their high prediction accuracy. Compared with numerical modeling, machine learning modeling performed better and was more effective at predicting the target property, even with a limited data set, when the characteristics of the data were properly considered in data preprocessing.
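
A sliding time window turns a measured sequence into fixed-length input windows, each paired with a later value as the prediction target. The abstract does not give the window length or horizon the authors used, so the sketch below is a generic version with hypothetical parameter names and made-up temperature readings.

```python
def sliding_windows(series, window, horizon=1):
    """Turn a 1-D sequence into (input window, target) pairs.

    Each input holds `window` consecutive values; the target is the value
    `horizon` steps after the end of that window.
    """
    pairs = []
    for start in range(len(series) - window - horizon + 1):
        inputs = series[start:start + window]
        target = series[start + window + horizon - 1]
        pairs.append((inputs, target))
    return pairs


# Made-up temperature readings (degrees C), for illustration only.
readings = [700, 702, 705, 709, 714]
for inputs, target in sliding_windows(readings, window=3):
    print(inputs, "->", target)
# [700, 702, 705] -> 709
# [702, 705, 709] -> 714
```

The resulting pairs can be passed to any regression model, which is how windowing lets a small experimental record yield multiple training samples.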

Prediction of cancer incidence rates for the European continent using machine learning models

Health Informatics Journal ◽

10.1177/1460458220983878 ◽

2021 ◽

Vol 27 (1) ◽

pp. 146045822098387

Author(s):

Boran Sekeroglu ◽

Kubra Tuncal

Keyword(s):

Neural Network ◽

Colorectal Cancer ◽

Machine Learning ◽

Linear Regression ◽

Support Vector Regression ◽

Incidence Rates ◽

Support Vector ◽

Learning Models ◽

European Continent ◽

Machine Learning Models

Cancer is one of the most important and common public health problems on Earth, and it occurs in many different types. Treatments and precautions are aimed at minimizing the deaths caused by cancer; however, incidence rates continue to rise. Thus, it is important to analyze and estimate incidence rates to support the determination of more effective precautions. In this research, the 2018 Cancer Datasheet of the World Health Organization (WHO) is used, and all countries on the European continent are considered, to analyze and predict incidence rates until 2020 for lung cancer, breast cancer, colorectal cancer, prostate cancer, and all types of cancer combined, which have the highest incidence and mortality rates. For each cancer type, six machine learning models were trained separately by gender: Linear Regression, Support Vector Regression, Decision Tree, Long-Short Term Memory neural network, Backpropagation neural network, and Radial Basis Function neural network. Linear regression and support vector regression outperformed the other models, with [Formula: see text] scores of 0.99 and 0.98, respectively, in initial experiments, and were then used to predict the incidence rates of the considered cancer types. The ML models estimated that the maximum rise in incidence rates would be 6%, in colorectal cancer for females.

Evaluation of machine learning algorithms for prediction of 5-year survivability for seven types of cancers

10.32469/10355/86544 ◽

2020 ◽

Author(s):

Teja Venkat Pavan Kotapati

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Logistic Regression ◽

Decision Trees ◽

Random Forests ◽

Naive Bayes ◽

Support Vector ◽

Multi Layer Perceptron ◽

Learning Models ◽

Machine Learning Models

Much research on predicting cancer survivability has been done using various machine learning models, and it has always been a challenging task. The main focus of this project is a comprehensive evaluation of machine learning models across multiple cancer cohorts to find the models with the best prediction capability. Class balancing techniques such as oversampling and undersampling were applied to improve the performance of cancer survival prediction. The SEER cancer dataset (1973-2015) was used for this project. After preprocessing, we included a total of 21 independent variables and one dependent variable. Multiple machine learning models were implemented: Decision Trees, Logistic Regression, Naive Bayes, Support Vector Machine, Random Forests, and Multi-Layer Perceptron. Bias between training and testing data was eliminated by stratified 10-fold cross-validation. The experimental design applied all the machine learning models across seven cancer cohorts, using all eligible records in each cohort as well as two sampling techniques for class balancing. The performance of the models was compared using Sensitivity, Accuracy, Specificity, Precision, F1 score, and AUC. A total of 168 experimental models were designed and implemented. Comparison of the predictive models showed that Random Forests predicted cancer survivability best, followed by Support Vector Machine, Logistic Regression, Decision Trees, and Multi-Layer Perceptron, with Naive Bayes performing worst. The results clearly indicated that class balancing techniques also improved the performance of the models significantly.
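
All of the comparison metrics listed above except AUC can be computed from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the function name and the counts are hypothetical and are not results from the project.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard threshold metrics from binary confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # recall on the positive class
    specificity = tn / (tn + fp)          # recall on the negative class
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts for one model on one cohort.
metrics = classification_metrics(tp=40, fp=5, tn=45, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

On an imbalanced cohort, accuracy can look high while sensitivity stays low, which is one reason class balancing and several metrics are reported together.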

Predicting S&P 500 Market Price by Deep Neural Network and Enemble Model

E3S Web of Conferences ◽

10.1051/e3sconf/202021402040 ◽

2020 ◽

Vol 214 ◽

pp. 02040

Author(s):

Feiyu Wang

Keyword(s):

Neural Network ◽

Machine Learning ◽

Support Vector Machine ◽

Linear Regression ◽

Deep Neural Network ◽

Market Price ◽

Support Vector ◽

Learning Models ◽

Conventional Machine ◽

Machine Learning Models

Methods for predicting the movement of the stock market have appealed to scientists for decades. In this article, we use three different models to tackle that problem. In particular, we propose a Deep Neural Network (DNN) to predict the intraday direction of the SP500 index and compare the DNN with two conventional machine learning models, namely linear regression and a support vector machine. We demonstrate that the DNN predicts the SP500 index with the highest accuracy of the three.

Short-Term Electricity Generation Forecasting Using Machine Learning Algorithms: A Case Study of the Benin Electricity Community (C.E.B)

TH Wildau Engineering and Natural Sciences Proceedings ◽

10.52825/thwildauensp.v1i.25 ◽

2021 ◽

Vol 1 ◽

Author(s):

Agbassou Guenoupkati ◽

Adekunlé Akim Salami ◽

Mawugno Koffi Kodjo ◽

Kossi Napo

Keyword(s):

Machine Learning ◽

Time Series ◽

Linear Regression ◽

Performance Metrics ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Support Vector ◽

Learning Models ◽

Short Term ◽

Machine Learning Models

Time series forecasting in the energy sector is important to power utilities for decision making that ensures the sustainability and quality of the electricity supply and the stability of the power grid. Unfortunately, exogenous factors such as weather conditions and electricity prices complicate the task for linear regression models, which are becoming unsuitable. A robust predictor would be an invaluable asset for electricity companies. To overcome this difficulty, Artificial Intelligence departs from these prediction methods through Machine Learning algorithms, which have performed well over the last decades in predicting time series at several levels. This work proposes the deployment of three univariate Machine Learning models, Support Vector Regression, Multi-Layer Perceptron, and the Long Short-Term Memory Recurrent Neural Network, to predict the electricity production of the Benin Electricity Community. Performance metrics were used to validate these methods against the Autoregressive Integrated Moving Average (ARIMA) and Multiple Regression models. Overall, the results show that the Machine Learning models outperform the linear regression methods. Consequently, Machine Learning methods offer a promising approach to short-term electric power generation forecasting for the Benin Electricity Community's sources.
