Comparison of support vector regression and extreme gradient boosting for decomposition-based data-driven 10-day streamflow forecasting

2020 ◽  
Vol 582 ◽  
pp. 124293 ◽  
Author(s):  
Xiang Yu ◽  
Yuhao Wang ◽  
Lifeng Wu ◽  
Genhua Chen ◽  
Lei Wang ◽  
...  
Author(s):  
Simon Wenninger ◽  
Christian Wiethe

AbstractTo achieve ambitious climate goals, it is necessary to increase the rate of purposeful retrofit measures in the building sector. As a result, Energy Performance Certificates have been designed as important evaluation and rating criterion to increase the retrofit rate in the EU and Germany. Yet, today’s most frequently used and legally required methods to quantify building energy performance show low prediction accuracy, as recent research reveals. To enhance prediction accuracy, the research community introduced data-driven methods which obtained promising results. However, there are no insights in how far Energy Quantification Methods are particularly suited for energy performance prediction. In this research article the data-driven methods Artificial Neural Network, D-vine copula quantile regression, Extreme Gradient Boosting, Random Forest, and Support Vector Regression are compared with and validated by real-world Energy Performance Certificates of German residential buildings issued by qualified auditors using the engineering method required by law. The results, tested for robustness and systematic bias, show that all data-driven methods exceed the engineering method by almost 50% in terms of prediction accuracy. In contrast to existing literature favoring Artificial Neural Networks and Support Vector Regression, all tested methods show similar prediction accuracy with marginal advantages for Extreme Gradient Boosting and Support Vector Regression in terms of prediction accuracy. Given the higher prediction accuracy of data-driven methods, it seems appropriate to revise the current legislation prescribing engineering methods. In addition, data-driven methods could support different organizations, e.g., asset management, in decision-making in order to reduce financial risk and to cut expenses.


Algorithms ◽  
2020 ◽  
Vol 13 (11) ◽  
pp. 300
Author(s):  
Eslam A. Hussein ◽  
Christopher Thron ◽  
Mehrdad Ghaziasgar ◽  
Antoine Bagula ◽  
Mattia Vaccari

Predicting groundwater availability is important to water sustainability and drought mitigation. Machine-learning tools have the potential to improve groundwater prediction, thus enabling resource planners to: (1) anticipate water quality in unsampled areas or depth zones; (2) design targeted monitoring programs; (3) inform groundwater protection strategies; and (4) evaluate the sustainability of groundwater sources of drinking water. This paper proposes a machine-learning approach to groundwater prediction with the following characteristics: (i) the use of a regression-based approach to predict full groundwater images based on sequences of monthly groundwater maps; (ii) strategic automatic feature selection (both local and global features) using extreme gradient boosting; and (iii) the use of a multiplicity of machine-learning techniques (extreme gradient boosting, multivariate linear regression, random forests, multilayer perceptron and support vector regression). Of these techniques, support vector regression consistently performed best in terms of minimizing root mean square error and mean absolute error. Furthermore, including a global feature obtained from a Gaussian Mixture Model produced models with lower error than the best which could be obtained with local geographical features.


2020 ◽  
Author(s):  
Joaquin Salas ◽  
Dagoberto Pulido ◽  
Omar Montoya ◽  
Isaac Ruiz

Knowing the most likely clinical prognosis for a patient infected with SARS-Cov-2 could offer guidelines for tracking their medical evolution, improving attention, and assigning resources. Aiming to assess a patient's status quantitatively, we explore the analysis of existing clinical information using data-driven methods. Our goal is to extract the characteristics distinguishing between those COVID-19 patients that improve and those who die. In our approach, we select the relevant features using the algorithm of Boruta, a wrapper framework that takes input from classifiers generating relevance assessment of the predictors. Using the extracted features, we train machine learning classifiers, including Random Forests, Support Vector Machine, Extreme Gradient Boosting, and Neural Networks. We assess the performance of the classifiers using Precision-Recall and ROC analysis, establishing the ranges at which risk assessment permits effective decision-making. Our research highlights that local regions present unique sets of essential features, that it is possible to construct effective classifiers based on clinical data, and that an ensemble of classifiers results in the best performing discriminant.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Moojung Kim ◽  
Young Jae Kim ◽  
Sung Jin Park ◽  
Kwang Gi Kim ◽  
Pyung Chun Oh ◽  
...  

Abstract Background Annual influenza vaccination is an important public health measure to prevent influenza infections and is strongly recommended for cardiovascular disease (CVD) patients, especially in the current coronavirus disease 2019 (COVID-19) pandemic. The aim of this study is to develop a machine learning model to identify Korean adult CVD patients with low adherence to influenza vaccination Methods Adults with CVD (n = 815) from a nationally representative dataset of the Fifth Korea National Health and Nutrition Examination Survey (KNHANES V) were analyzed. Among these adults, 500 (61.4%) had answered "yes" to whether they had received seasonal influenza vaccinations in the past 12 months. The classification process was performed using the logistic regression (LR), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGB) machine learning techniques. Because the Ministry of Health and Welfare in Korea offers free influenza immunization for the elderly, separate models were developed for the < 65 and ≥ 65 age groups. Results The accuracy of machine learning models using 16 variables as predictors of low influenza vaccination adherence was compared; for the ≥ 65 age group, XGB (84.7%) and RF (84.7%) have the best accuracies, followed by LR (82.7%) and SVM (77.6%). For the < 65 age group, SVM has the best accuracy (68.4%), followed by RF (64.9%), LR (63.2%), and XGB (61.4%). Conclusions The machine leaning models show comparable performance in classifying adult CVD patients with low adherence to influenza vaccination.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Arturo Moncada-Torres ◽  
Marissa C. van Maaren ◽  
Mathijs P. Hendriks ◽  
Sabine Siesling ◽  
Gijs Geleijnse

AbstractCox Proportional Hazards (CPH) analysis is the standard for survival analysis in oncology. Recently, several machine learning (ML) techniques have been adapted for this task. Although they have shown to yield results at least as good as classical methods, they are often disregarded because of their lack of transparency and little to no explainability, which are key for their adoption in clinical settings. In this paper, we used data from the Netherlands Cancer Registry of 36,658 non-metastatic breast cancer patients to compare the performance of CPH with ML techniques (Random Survival Forests, Survival Support Vector Machines, and Extreme Gradient Boosting [XGB]) in predicting survival using the $$c$$ c -index. We demonstrated that in our dataset, ML-based models can perform at least as good as the classical CPH regression ($$c$$ c -index $$\sim \,0.63$$ ∼ 0.63 ), and in the case of XGB even better ($$c$$ c -index $$\sim 0.73$$ ∼ 0.73 ). Furthermore, we used Shapley Additive Explanation (SHAP) values to explain the models’ predictions. We concluded that the difference in performance can be attributed to XGB’s ability to model nonlinearities and complex interactions. We also investigated the impact of specific features on the models’ predictions as well as their corresponding insights. Lastly, we showed that explainable ML can generate explicit knowledge of how models make their predictions, which is crucial in increasing the trust and adoption of innovative ML techniques in oncology and healthcare overall.


Risks ◽  
2021 ◽  
Vol 9 (11) ◽  
pp. 202
Author(s):  
Ge Gao ◽  
Hongxin Wang ◽  
Pengbin Gao

In China, SMEs are facing financing difficulties, and commercial banks and financial institutions are the main financing channels for SMEs. Thus, a reasonable and efficient credit risk assessment system is important for credit markets. Based on traditional statistical methods and AI technology, a soft voting fusion model, which incorporates logistic regression, support vector machine (SVM), random forest (RF), eXtreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM), is constructed to improve the predictive accuracy of SMEs’ credit risk. To verify the feasibility and effectiveness of the proposed model, we use data from 123 SMEs nationwide that worked with a Chinese bank from 2016 to 2020, including financial information and default records. The results show that the accuracy of the soft voting fusion model is higher than that of a single machine learning (ML) algorithm, which provides a theoretical basis for the government to control credit risk in the future and offers important references for banks to make credit decisions.


Protein-Protein Interactions referred as PPIs perform significant role in biological functions like cell metabolism, immune response, signal transduction etc. Hot spots are small fractions of residues in interfaces and provide substantial binding energy in PPIs. Therefore, identification of hot spots is important to discover and analyze molecular medicines and diseases. The current strategy, alanine scanning isn't pertinent to enormous scope applications since the technique is very costly and tedious. The existing computational methods are poor in classification performance as well as accuracy in prediction. They are concerned with the topological structure and gene expression of hub proteins. The proposed system focuses on hot spots of hub proteins by eliminating redundant as well as highly correlated features using Pearson Correlation Coefficient and Support Vector Machine based feature elimination. Extreme Gradient boosting and LightGBM algorithms are used to ensemble a set of weak classifiers to form a strong classifier. The proposed system shows better accuracy than the existing computational methods. The model can also be used to predict accurate molecular inhibitors for specific PPIs


2021 ◽  
Author(s):  
Leila Zahedi ◽  
Farid Ghareh Mohammadi ◽  
M. Hadi Amini

Machine learning techniques lend themselves as promising decision-making and analytic tools in a wide range of applications. Different ML algorithms have various hyper-parameters. In order to tailor an ML model towards a specific application, a large number of hyper-parameters should be tuned. Tuning the hyper-parameters directly affects the performance (accuracy and run-time). However, for large-scale search spaces, efficiently exploring the ample number of combinations of hyper-parameters is computationally challenging. Existing automated hyper-parameter tuning techniques suffer from high time complexity. In this paper, we propose HyP-ABC, an automatic innovative hybrid hyper-parameter optimization algorithm using the modified artificial bee colony approach, to measure the classification accuracy of three ML algorithms, namely random forest, extreme gradient boosting, and support vector machine. Compared to the state-of-the-art techniques, HyP-ABC is more efficient and has a limited number of parameters to be tuned, making it worthwhile for real-world hyper-parameter optimization problems. We further compare our proposed HyP-ABC algorithm with state-of-the-art techniques. In order to ensure the robustness of the proposed method, the algorithm takes a wide range of feasible hyper-parameter values, and is tested using a real-world educational dataset.


2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Hengrui Chen ◽  
Hong Chen ◽  
Ruiyu Zhou ◽  
Zhizhen Liu ◽  
Xiaoke Sun

The safety issue has become a critical obstacle that cannot be ignored in the marketization of autonomous vehicles (AVs). The objective of this study is to explore the mechanism of AV-involved crashes and analyze the impact of each feature on crash severity. We use the Apriori algorithm to explore the causal relationship between multiple factors to explore the mechanism of crashes. We use various machine learning models, including support vector machine (SVM), classification and regression tree (CART), and eXtreme Gradient Boosting (XGBoost), to analyze the crash severity. Besides, we apply the Shapley Additive Explanations (SHAP) to interpret the importance of each factor. The results indicate that XGBoost obtains the best result (recall = 75%; G-mean = 67.82%). Both XGBoost and Apriori algorithm effectively provided meaningful insights about AV-involved crash characteristics and their relationship. Among all these features, vehicle damage, weather conditions, accident location, and driving mode are the most critical features. We found that most rear-end crashes are conventional vehicles bumping into the rear of AVs. Drivers should be extremely cautious when driving in fog, snow, and insufficient light. Besides, drivers should be careful when driving near intersections, especially in the autonomous driving mode.


Sign in / Sign up

Export Citation Format

Share Document