extremely randomized trees
Recently Published Documents


TOTAL DOCUMENTS

61
(FIVE YEARS 40)

H-INDEX

10
(FIVE YEARS 6)

IEEE Access ◽  
2022 ◽  
pp. 1-1
Author(s):  
Amin Aminifar ◽  
Matin Shokri ◽  
Fazle Rabbi ◽  
Violet Ka I Pun ◽  
Yngve Lamo

2021 ◽  
Vol 3 (1) ◽  
Author(s):  
Roman Hornung

Abstract: The diversity forest algorithm is an alternative candidate node split sampling scheme that makes innovative complex split procedures in random forests possible. While conventional univariable, binary splitting suffices for obtaining strong predictive performance, new complex split procedures can help tackle practically important issues. For example, interactions between features can be exploited effectively by bivariable splitting. With diversity forests, each split is selected from a candidate split set that is sampled in the following way: for $l = 1, \dots, \mathit{nsplits}$: (1) sample one split problem; (2) sample a single split or a few splits from the split problem sampled in (1) and add them to the candidate split set. The split problems are specifically structured collections of splits that depend on the respective split procedure considered. This sampling scheme makes innovative complex split procedures computationally tangible while avoiding overfitting. Important general properties of the diversity forest algorithm are evaluated empirically using univariable, binary splitting. Based on 220 data sets with binary outcomes, diversity forests are compared with conventional random forests and random forests using extremely randomized trees. The results show that the split sampling scheme of diversity forests does not impair the predictive performance of random forests and that the performance is quite robust with regard to the specified nsplits value. The recently developed interaction forests are the first diversity forest method that uses a complex split procedure. Interaction forests allow modeling and detecting interactions between features effectively. Further potential complex split procedures are discussed as an outlook.
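
The candidate split sampling described in the abstract above can be sketched for the univariable, binary case. This is a minimal illustration, not the author's implementation. It assumes that a "split problem" is a single feature, that one split per problem is drawn as a midpoint between adjacent distinct values, and that candidates are ranked by Gini impurity decrease. The function names `sample_candidate_splits` and `best_split` are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gain(X, y, feature, threshold):
    """Decrease in Gini impurity for the split X[feature] <= threshold."""
    left = [label for row, label in zip(X, y) if row[feature] <= threshold]
    right = [label for row, label in zip(X, y) if row[feature] > threshold]
    n = len(y)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(y) - weighted


def sample_candidate_splits(X, nsplits, rng):
    """Repeat nsplits times: (1) sample a split problem (assumed here to be
    one feature), (2) sample one split from it (assumed here to be a midpoint
    between two adjacent distinct values) and add it to the candidate set."""
    candidates = set()
    n_features = len(X[0])
    for _ in range(nsplits):
        feature = rng.randrange(n_features)
        values = sorted({row[feature] for row in X})
        if len(values) < 2:
            continue  # constant feature: this split problem is empty
        i = rng.randrange(len(values) - 1)
        candidates.add((feature, (values[i] + values[i + 1]) / 2))
    return candidates


def best_split(X, y, nsplits=10, seed=0):
    """Pick the sampled candidate with the largest impurity decrease."""
    rng = random.Random(seed)
    candidates = sorted(sample_candidate_splits(X, nsplits, rng))
    if not candidates:
        return None
    return max(candidates, key=lambda c: split_gain(X, y, c[0], c[1]))
```

In this sketch, a small nsplits restricts the search to a few sampled candidates, while a large value approaches an exhaustive search over all split points.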


2021 ◽  
Author(s):  
Aya Hasan Alkhereibi ◽  
Tadesse Wakjira ◽  
Murat Kucukvar ◽  
Uvais Qidwai ◽  
Deepti Muley ◽  
...  

Predicting metro ridership is an essential requirement for efficient metro operation and management. The dependence of metro ridership on land use densities calls for an accurate predictive model. To this end, the current study aims to develop a novel machine learning (ML) based model to predict metro station ridership from the land use densities near metro stations. The ridership data were obtained from Qatar Rail, and the land use data were obtained from the Ministry of Municipality and Environment in Qatar. The land use densities in the catchment area of 800 m around the metro stations were considered in this study. The non-linear relationship between metro ridership and land use densities was captured through different ensemble ML models, including random forests, extremely randomized trees, and gradient tree boosting. Results showed that the ML models, once meticulously optimized and trained, are capable of producing an accurate prediction of metro ridership. Among the ML models, gradient tree boosting showed the highest prediction capability. The authors concluded that the proposed prediction model can be used by both urban and transport planners to plan the land use around metro stations, predict the transit demand from those plans, and ultimately achieve the optimal use of the transit system, i.e., Transit-Oriented Developments.


2021 ◽  
Vol 11 (14) ◽  
pp. 6435
Author(s):  
Dimitrios Stefas ◽  
Nikolaos Gyftokostas ◽  
Panagiotis Kourelias ◽  
Eleni Nanou ◽  
Vasileios Kokkinos ◽  
...  

In the present work, laser-induced breakdown spectroscopy (LIBS), aided by machine learning algorithms (i.e., linear discriminant analysis (LDA) and extremely randomized trees (ERT)), is used for the detection of honey adulteration with glucose syrup. In addition, it is shown that instead of the entire LIBS spectrum, the spectral lines of inorganic ingredients of honey (i.e., calcium, sodium, and potassium) can also be used for the detection of adulteration, providing efficient discrimination. The constructed predictive models attained classification accuracies exceeding 90%.


2021 ◽  
Vol 9 ◽  
Author(s):  
Mingzhu Tang ◽  
Yutao Chen ◽  
Huawei Wu ◽  
Qi Zhao ◽  
Wen Long ◽  
...  

The number of normal samples of wind turbine generators is much larger than the number of fault samples. To solve the problem of imbalanced classification in wind turbine generator fault detection, a cost-sensitive extremely randomized trees (CS-ERT) algorithm is proposed in this paper, in which the cost-sensitive learning method is introduced into the extremely randomized trees (ERT) algorithm. Based on the misclassification cost and class distribution, the misclassification cost gain (MCG) is proposed as the score measure of the CS-ERT model growth process to improve the classification accuracy of minority classes. The Hilbert-Schmidt independence criterion lasso (HSICLasso) feature selection method is used to select strongly correlated, non-redundant features of doubly-fed wind turbine generators. The effectiveness of the method was verified by experiments on four different failure datasets of wind turbine generators. The experimental results show that the average missing detection rate, average misclassification cost, and gMean of the improved algorithm are better than those of the ERT algorithm. In addition, compared with the CSForest, AdaCost, and MetaCost methods, the proposed method has better real-time fault detection performance.
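
The three evaluation measures named in the abstract above can be illustrated with a short sketch. The abstract does not give the paper's exact definitions, so this assumes the common ones for a binary problem: missing detection rate as the false negative rate, gMean as the geometric mean of sensitivity and specificity, and average misclassification cost as the cost-weighted error count per sample. The function name and the cost values `cost_fn` and `cost_fp` are hypothetical.

```python
import math


def fault_detection_metrics(y_true, y_pred, cost_fn=5.0, cost_fp=1.0):
    """Imbalanced-classification metrics with 1 = fault and 0 = normal.

    cost_fn and cost_fp are hypothetical costs for a missed fault and a
    false alarm; they are not taken from the paper.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    # Sensitivity: share of faults detected; specificity: share of normal
    # samples correctly left unflagged.
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0

    return {
        # Share of faults that were missed (false negative rate).
        "missing_detection_rate": fn / (tp + fn) if tp + fn else 0.0,
        # Cost-weighted errors averaged over all samples.
        "avg_misclassification_cost": (cost_fn * fn + cost_fp * fp) / len(pairs),
        # Geometric mean of the two class-wise recalls.
        "g_mean": math.sqrt(sensitivity * specificity),
    }
```

Unlike plain accuracy, gMean drops to zero when a classifier labels every sample as normal, which is why it is often reported for imbalanced fault data.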

