Boosted Decision Trees
Recently Published Documents

Total documents: 117 (five years: 69)
H-index: 14 (five years: 4)

2022, pp. 270-292
Author(s): Luca Di Persio, Alberto Borelli

The chapter develops a tree-based method for credit scoring, a task that helps lenders decide whether to grant or reject credit to applicants. In particular, it proposes a credit scoring model based on boosted decision trees, a technique that combines an ensemble of decision trees into a single classifier. The analysis uses three publicly available datasets, and the prediction accuracy of boosted decision trees is compared with that of the support vector machine method.
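A minimal sketch of this kind of comparison, using scikit-learn; the synthetic data, features, and hyperparameters are illustrative assumptions, not the chapter's actual datasets or setup:

```python
# Boosted-decision-tree credit scorer vs. an SVM on a stand-in credit dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Stand-in for a public credit-scoring dataset (applicant features + default label).
X, y = make_classification(n_samples=1000, n_features=20, weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Ensemble of shallow decision trees fit by gradient boosting -> a single classifier.
bdt = GradientBoostingClassifier(n_estimators=300, max_depth=3, learning_rate=0.05)
bdt.fit(X_train, y_train)

svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_train, y_train)

print("BDT accuracy:", bdt.score(X_test, y_test))
print("SVM accuracy:", svm.score(X_test, y_test))
```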


2021, Vol. 16 (12), pp. C12007
Author(s): K. Leonard DeHolton

Abstract The DeepCore sub-array within the IceCube Neutrino Observatory is a densely instrumented region of Antarctic ice designed to observe atmospheric neutrino interactions above 5 GeV via Cherenkov radiation. An essential aspect of any neutrino oscillation analysis is the ability to accurately identify the flavor of neutrino events in the detector. This task is particularly difficult at low energies when very little light is deposited in the detector. Here we discuss the use of machine learning to perform event classification at low energies in IceCube using a boosted decision tree (BDT). A BDT is trained using reconstructed quantities to identify track-like events, which result from muon neutrino charged current interactions. This new method improves the accuracy of particle identification compared to traditional classification methods which rely on univariate straight cuts.
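A hedged illustration of the approach described above: a BDT trained on a few reconstructed quantities to tag track-like events, compared against a single straight cut. The feature names and toy data are assumptions, not IceCube's reconstruction outputs or training sample:

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 5000
# Hypothetical reconstructed quantities: track length, deposited energy, zenith angle.
track_len = np.concatenate([rng.normal(80, 30, n), rng.normal(20, 15, n)])
energy = np.concatenate([rng.normal(15, 8, n), rng.normal(12, 8, n)])
zenith = rng.uniform(0, np.pi, 2 * n)
X = np.column_stack([track_len, energy, zenith])
y = np.concatenate([np.ones(n), np.zeros(n)])  # 1 = track-like, 0 = cascade-like

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
bdt = HistGradientBoostingClassifier(max_iter=200).fit(X_tr, y_tr)

# Baseline: a univariate straight cut on the reconstructed track length.
cut_pred = (X_te[:, 0] > 50).astype(int)
print("straight-cut accuracy:", accuracy_score(y_te, cut_pred))
print("BDT accuracy:         ", accuracy_score(y_te, bdt.predict(X_te)))
```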


2021
Author(s): Katarzyna Filus, Sławomir Nowak, Joanna Domańska, Jakub Duda

Abstract Indoor environments are a major challenge in the domain of location-based services due to the inability to use GPS. Currently, Bluetooth Low Energy is the most commonly used technology for such services due to its low cost, low power consumption, ubiquitous availability in smartphones, and the dependence of its signal strength on the distance between devices. The article proposes a system that detects the proximity of a moving object with respect to static points (anchors), evaluates the quality of this prediction, and filters out unreliable results based on custom metrics. We define three metrics: two based on RSSI and Inertial Measurement Unit (IMU) readings, and one joint metric. This way the filtering is based on both external information (RSSI) and internal information (IMU). To process the IMU data, we use machine learning activity recognition models (we apply feature selection, compare three models, and choose the best one, Gradient Boosted Decision Trees). The proposed system is flexible and can be easily customized. The great majority of operations can be conducted directly on smartphones. The solution is easy to implement, cost-efficient, and can be deployed in real-life applications (the MICE industry, museums, industry).
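A rough sketch of the IMU-side classifier: feature selection followed by Gradient Boosted Decision Trees, here with scikit-learn. The feature matrix and labels are hypothetical placeholders, not the paper's extracted IMU features:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
# Assume each row holds window statistics (mean, std, ...) of accelerometer/gyroscope signals.
X = rng.normal(size=(2000, 30))
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=2000) > 0).astype(int)  # e.g. moving vs. still

# Keep the k most discriminative features, then fit the boosted-tree activity model.
clf = make_pipeline(SelectKBest(f_classif, k=10),
                    GradientBoostingClassifier(n_estimators=200, max_depth=3))
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
```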


2021, Vol. 11 (22), pp. 11076
Author(s): Xabier Cid Vidal, Lorena Dieste Maroñas, Álvaro Dosil Suárez

The popularity of Machine Learning (ML) has been increasing in recent decades in almost every area, with the commercial and scientific fields being the most notable. In particle physics, ML has proven a useful resource for making the most of projects such as the Large Hadron Collider (LHC). The main advantages provided by ML are a reduction in the time and effort required for the measurements carried out by experiments, and improvements in performance. With this work we aim to encourage scientists working with particle colliders to use ML and to try the different alternatives that are available, focusing on the separation of signal and background. We assess some of the most-used libraries in the field, such as the Toolkit for Multivariate Data Analysis with ROOT (TMVA), as well as newer and more sophisticated options such as PyTorch and Keras. We also assess the suitability of some of the most common algorithms for signal-background discrimination, such as Boosted Decision Trees, and propose the use of others, namely Neural Networks. We compare the overall performance of different algorithms and libraries on simulated LHC data and produce guidelines to help analysts deal with different situations, such as the use of low- or high-level features from particle detectors or the amount of statistics available for training the algorithms. Our main conclusion is that the algorithms and libraries most frequently used in LHC collaborations might not always be those that provide the best results for the classification of signal candidates, and that fully connected Neural Networks trained with Keras can improve the performance scores in most of the cases we consider.
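An illustrative comparison along these lines, using scikit-learn's gradient boosting in place of a TMVA BDT and a small fully connected Keras network; the toy features, network size, and training settings are assumptions rather than the paper's configuration:

```python
import numpy as np
import tensorflow as tf
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Stand-in for simulated signal/background events with a dozen detector-level features.
X, y = make_classification(n_samples=20000, n_features=12, n_informative=8, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)

bdt = HistGradientBoostingClassifier(max_iter=300).fit(X_tr, y_tr)
print("BDT accuracy:", bdt.score(X_te, y_te))

# Small fully connected network trained with Keras.
nn = tf.keras.Sequential([
    tf.keras.Input(shape=(X.shape[1],)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
nn.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
nn.fit(X_tr, y_tr, epochs=10, batch_size=256, verbose=0)
print("NN accuracy: ", nn.evaluate(X_te, y_te, verbose=0)[1])
```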


Geophysics, 2021, pp. 1-55
Author(s): Ian Gottschalk, Rosemary Knight

The ability to relate geophysical measurements to the material properties of the subsurface is fundamental to the successful application of geophysical methods. Estimating the electrical resistivity from material properties can be challenging at many hydrogeologic field sites, which typically lack the spatial density and resolution of the measurements needed to develop an accurate rock physics relationship. We developed rock physics transforms using the machine learning method of gradient-boosted decision trees (GBDT). We adopted as our study area the coastal Salinas Valley, where saltwater intrusion results in changes in resistivity. We used measurements available in boreholes, including salinity and sediment type, to predict the resistivity. In some transforms, we included as predictors in the GBDT algorithm the location of each measurement and the aquifer corresponding to each measurement. We also explored incorporating the predictions of a baseline rock physics transform as a prior term within the objective function of the GBDT algorithm to guide the predictions made by the machine learning algorithm. The use of location and aquifer information improved the predictions of the GBDT transform by 28% compared to when location and aquifer information were not included. After the salinity, the easting of each measurement was the most important predictor, due to the spatial pattern of salinity changes in the area. The next most important predictor was the aquifer corresponding to each measurement. The benefit of including the baseline transform in the objective function was greatest for small datasets and when the accuracy of the baseline transform was already high. Finally, using the resistivity predicted by the GBDT, we generated 1-D resistivity models, which we used to simulate the acquisition of airborne electromagnetic (AEM) data. In most cases, the 1-D resistivity models and corresponding AEM data matched well with the models and data corresponding to the resistivity measured in boreholes.
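A loose sketch of a GBDT rock-physics transform in scikit-learn: predict resistivity from salinity, sediment type, location, and aquifer label, then inspect feature importances. The synthetic relationship is an assumption; it is not the Salinas Valley data, and it omits the authors' baseline prior term in the objective function:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(7)
n = 3000
salinity = rng.uniform(0.1, 30.0, n)   # borehole salinity, arbitrary units
sediment = rng.integers(0, 3, n)       # coded sediment type
easting = rng.uniform(0.0, 20.0, n)    # measurement location, km
aquifer = rng.integers(0, 2, n)        # coded aquifer
# Toy trend (an assumption): resistivity falls with salinity, modulated by sediment type.
resistivity = 50.0 / (1.0 + salinity) * (1.0 + 0.3 * sediment) + rng.normal(0, 0.5, n)

X = np.column_stack([salinity, sediment, easting, aquifer])
gbdt = GradientBoostingRegressor(n_estimators=300, max_depth=3).fit(X, resistivity)

# Which predictors the boosted trees lean on most.
for name, imp in zip(["salinity", "sediment", "easting", "aquifer"], gbdt.feature_importances_):
    print(f"{name:9s} importance: {imp:.3f}")
```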


2021, Vol. 2069 (1), pp. 012230
Author(s): Mikael Salonvaara, Seungjae Lee, Emishaw Iffa, Philip Boudreaux, Simon Pallin, ...

Abstract Hygrothermal simulations provide insight into the energy performance and moisture durability of building envelope components under dynamic conditions. The inputs required for hygrothermal simulations are extensive, and carrying out simulations and analyses requires expert knowledge. An expert system, the Building Science Advisor (BSA), has been developed to predict the performance of building envelope systems and to select energy-efficient and durable ones for different climates. The BSA consists of decision rules based on expert opinions and thousands of parametric simulation results for selected wall systems. The number of potential wall systems runs into the millions, far too many to simulate exhaustively. We present how machine learning can help predict durability data, such as mold growth, while minimizing the number of simulations that need to be run. The simulation results are used for training and validation of machine learning tools that predict wall durability. We tested Artificial Neural Networks (ANN) and Gradient Boosted Decision Trees (GBDT) for their applicability and model accuracy. Models developed with both methods showed adequate prediction performance (root mean square errors of 0.195 and 0.209, respectively). Finally, we introduce how this information supports guidance for envelope design via an easy-to-use web-based tool that does not require the end user to run hygrothermal simulations.
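A minimal sketch of the surrogate-modelling idea: fit a GBDT and a small ANN to parametric simulation outputs and compare RMSE on held-out cases. The input parameters and the toy mold-index response are assumptions for illustration only:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
X = rng.uniform(0, 1, size=(4000, 5))  # normalized wall/climate parameters (assumed)
mold_index = 3 * X[:, 0] - 2 * X[:, 1] * X[:, 2] + rng.normal(0, 0.2, 4000)  # toy response

X_tr, X_te, y_tr, y_te = train_test_split(X, mold_index, random_state=0)

gbdt = GradientBoostingRegressor(n_estimators=300).fit(X_tr, y_tr)
ann = make_pipeline(StandardScaler(),
                    MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0))
ann.fit(X_tr, y_tr)

for name, model in [("GBDT", gbdt), ("ANN", ann)]:
    rmse = np.sqrt(mean_squared_error(y_te, model.predict(X_te)))
    print(f"{name} RMSE: {rmse:.3f}")
```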


2021, Vol. 2069 (1), pp. 012107
Author(s): B Delcroix, S Sansregret, G Larochelle Martin, A Daoud

Abstract The building sector is responsible for approximately one-third of total energy consumption worldwide. This sector is undergoing a major digital transformation, with buildings increasingly equipped with connected devices such as smart meters and IoT devices. This transformation offers the opportunity to better monitor and optimize building operations. In the province of Quebec (Canada), most buildings are equipped with smart meters providing electricity usage data every 15 minutes. A major current challenge is to disaggregate the different energy uses from smart meter data, a discipline known in the literature as non-intrusive load monitoring. In this work, the aim is to develop and validate a potentially generalizable model, applicable to all houses, that identifies the daily share of each energy use based on building information, weather data, and smart meter data. Input features are selected and ordered using an aggregated score composed of the correlation coefficient, the feature importance given by a decision tree, and the predictive power score. Two modelling methods based on quantile regression are tested: linear regression (LR) and gradient boosted decision trees (GBDT). Compared to ordinary least squares regression, quantile methods inherently provide more robustness and confidence intervals. Both models are trained and validated using separate datasets collected in 8 houses in Canada where metering and sub-metering were performed during a whole year. Results on the test dataset indicate a better performance of the GBDT model compared to the LR model, with a coefficient of determination of 0.88 (vs. 0.78), a mean absolute error of 6.34% (vs. 8.89%), and an absolute error between actual and predicted values below 17.2% (vs. 23.1%) in 95% of cases.
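A hedged sketch of the two quantile models compared above, using scikit-learn: a linear quantile regression and a GBDT with the quantile (pinball) loss, each fit for the median and a 95% quantile of a daily energy-use share. The toy features stand in for building information, weather, and smart meter aggregates:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import QuantileRegressor

rng = np.random.default_rng(11)
X = rng.normal(size=(2000, 6))  # e.g., outdoor temperature, daily kWh, floor area, ...
share = 30 + 5 * X[:, 0] - 3 * X[:, 1] + rng.normal(0, 4, 2000)  # % of daily consumption

for q in (0.5, 0.95):
    lr_q = QuantileRegressor(quantile=q, alpha=0.0, solver="highs").fit(X, share)
    gbdt_q = GradientBoostingRegressor(loss="quantile", alpha=q, n_estimators=200).fit(X, share)
    print(f"q={q}: LR prediction {lr_q.predict(X[:1])[0]:.1f}%, "
          f"GBDT prediction {gbdt_q.predict(X[:1])[0]:.1f}%")
```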


2021, Vol. 11 (20), pp. 9769
Author(s): Huilin Zheng, Syed Waseem Abbas Sherazi, Sang Hyeok Son, Jong Yun Lee

Wafer maps provide engineers with important information about the root causes of failures during the semiconductor manufacturing process. Through efficient recognition of the wafer map failure pattern type, the semiconductor manufacturing process and its product performance can be improved and product cost reduced. Therefore, this paper proposes an accurate model for the automatic recognition of wafer map failure types using a deep learning-based convolutional neural network (DCNN). For this experiment, we use WM811K, an open-source real-world wafer map dataset containing wafer map images of nine failure classes. Our work can be briefly summarized as follows. First, we use random sampling to extract 500 images from each class of the original image dataset. Then we propose a deep convolutional neural network to build a multi-class classification model. Lastly, we evaluate the performance of the proposed prediction model and compare it with three other popular machine learning-based models (logistic regression, random forest, and gradient boosted decision trees) and several well-known deep learning models (VGGNet, ResNet, and EfficientNet). The comprehensive analysis showed that the proposed DCNN model outperformed the other popular machine learning and deep learning-based prediction models.
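A rough sketch of a small CNN for multi-class wafer map classification, with a GBDT on flattened pixels as one classical baseline. The image size, class count, and random stand-in data are assumptions; this is not the paper's DCNN architecture or a WM811K loader:

```python
import numpy as np
import tensorflow as tf
from sklearn.ensemble import HistGradientBoostingClassifier

num_classes, img = 9, 32
X = np.random.rand(4500, img, img, 1).astype("float32")  # stand-in wafer map images
y = np.random.randint(0, num_classes, 4500)               # stand-in failure-class labels

# Small convolutional classifier with a softmax over the nine failure classes.
cnn = tf.keras.Sequential([
    tf.keras.Input(shape=(img, img, 1)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])
cnn.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
cnn.fit(X, y, epochs=2, batch_size=128, verbose=0)

# Classical baseline: gradient boosted trees on flattened pixel intensities.
gbdt = HistGradientBoostingClassifier(max_iter=100).fit(X.reshape(len(X), -1), y)
print("GBDT baseline (train) accuracy:", gbdt.score(X.reshape(len(X), -1), y))
```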

