Ranger Random Forest-Based Efficient Ensemble Learning Approach for Detecting Malicious URLs

Author(s):  
G. Madhukar Rao ◽  
Dharavath Ramesh
Agronomy ◽  
2021 ◽  
Vol 11 (5) ◽  
pp. 915
Author(s):  
Farrah Melissa Muharam ◽  
Khairudin Nurulhuda ◽  
Zed Zulkafli ◽  
Mohamad Arif Tarmizi ◽  
Asniyani Nur Haidar Abdullah ◽  
...  

Rapid, accurate and inexpensive methods are required to analyze plant traits throughout all crop growth stages for plant phenotyping. Few studies have comprehensively evaluated plant traits from multispectral cameras onboard UAV platforms. Additionally, machine learning algorithms tend to over- or underfit data and limited attention has been paid to optimizing their performance through an ensemble learning approach. This study aims to (1) comprehensively evaluate twelve rice plant traits estimated from aerial unmanned vehicle (UAV)-based multispectral images and (2) introduce Random Forest AdaBoost (RFA) algorithms as an optimization approach for estimating plant traits. The approach was tested based on a farmer’s field in Terengganu, Malaysia, for the off-season from February to June 2018, involving five rice cultivars and three nitrogen (N) rates. Four bands, thirteen indices and Random Forest-AdaBoost (RFA) regression models were evaluated against the twelve plant traits according to the growth stages. Among the plant traits, plant height, green leaf and storage organ biomass, and foliar nitrogen (N) content were estimated well, with a coefficient of determination (R2) above 0.80. In comparing the bands and indices, red, Normalized Difference Vegetation Index (NDVI), Ratio Vegetation Index (RVI), Red-Edge Wide Dynamic Range Vegetation Index (REWDRVI) and Red-Edge Soil Adjusted Vegetation Index (RESAVI) were remarkable in estimating all plant traits at tillering, booting and milking stages with R2 values ranging from 0.80–0.99 and root mean square error (RMSE) values ranging from 0.04–0.22. Milking was found to be the best growth stage to conduct estimations of plant traits. In summary, our findings demonstrate that an ensemble learning approach can improve the accuracy as well as reduce under/overfitting in plant phenotyping algorithms.


F1000Research ◽  
2018 ◽  
Vol 7 ◽  
pp. 1603 ◽  
Author(s):  
Fatemeh Behjati Ardakani ◽  
Florian Schmidt ◽  
Marcel H. Schulz

Background: Understanding the location and cell-type specific binding of Transcription Factors (TFs) is important in the study of gene regulation. Computational prediction of TF binding sites is challenging, because TFs often bind only to short DNA motifs and cell-type specific co-factors may work together with the same TF to determine binding. Here, we consider the problem of learning a general model for the prediction of TF binding using DNase1-seq data and TF motif description in form of position specific energy matrices (PSEMs). Methods: We use TF ChIP-seq data as a gold-standard for model training and evaluation. Our contribution is a novel ensemble learning approach using random forest classifiers. In the context of the ENCODE-DREAM in vivo TF binding site prediction challenge we consider different learning setups. Results: Our results indicate that the ensemble learning approach is able to better generalize across tissues and cell-types compared to individual tissue-specific classifiers or a classifier applied to the data aggregated across tissues. Furthermore, we show that incorporating DNase1-seq peaks is essential to reduce the false positive rate of TF binding predictions compared to considering the raw DNase1 signal. Conclusions: Analysis of important features reveals that the models preferentially select motifs of other TFs that are close interaction partners in existing protein protein-interaction networks. Code generated in the scope of this project is available on GitHub: https://github.com/SchulzLab/TFAnalysis (DOI: 10.5281/zenodo.1409697).


F1000Research ◽  
2019 ◽  
Vol 7 ◽  
pp. 1603 ◽  
Author(s):  
Fatemeh Behjati Ardakani ◽  
Florian Schmidt ◽  
Marcel H. Schulz

Background: Understanding the location and cell-type specific binding of Transcription Factors (TFs) is important in the study of gene regulation. Computational prediction of TF binding sites is challenging, because TFs often bind only to short DNA motifs and cell-type specific co-factors may work together with the same TF to determine binding. Here, we consider the problem of learning a general model for the prediction of TF binding using DNase1-seq data and TF motif description in form of position specific energy matrices (PSEMs). Methods: We use TF ChIP-seq data as a gold-standard for model training and evaluation. Our contribution is a novel ensemble learning approach using random forest classifiers. In the context of the ENCODE-DREAM in vivo TF binding site prediction challenge we consider different learning setups. Results: Our results indicate that the ensemble learning approach is able to better generalize across tissues and cell-types compared to individual tissue-specific classifiers or a classifier built based upon data aggregated across tissues. Furthermore, we show that incorporating DNase1-seq peaks is essential to reduce the false positive rate of TF binding predictions compared to considering the raw DNase1 signal. Conclusions: Analysis of important features reveals that the models preferentially select motifs of other TFs that are close interaction partners in existing protein protein-interaction networks. Code generated in the scope of this project is available on GitHub: https://github.com/SchulzLab/TFAnalysis (DOI: 10.5281/zenodo.1409697).


Synlett ◽  
2020 ◽  
Author(s):  
Akira Yada ◽  
Kazuhiko Sato ◽  
Tarojiro Matsumura ◽  
Yasunobu Ando ◽  
Kenji Nagata ◽  
...  

AbstractThe prediction of the initial reaction rate in the tungsten-catalyzed epoxidation of alkenes by using a machine learning approach is demonstrated. The ensemble learning framework used in this study consists of random sampling with replacement from the training dataset, the construction of several predictive models (weak learners), and the combination of their outputs. This approach enables us to obtain a reasonable prediction model that avoids the problem of overfitting, even when analyzing a small dataset.


Sign in / Sign up

Export Citation Format

Share Document