On the spatiotemporal generalization of machine learning and ensemble models for simulating built‐up land expansion

2021 ◽  
Author(s):  
Hossein Shafizadeh‐Moghadam ◽  
Roozbeh Valavi ◽  
Ali Asghari ◽  
Masoud Minaei ◽  
Yuji Murayama
2020 ◽  
Vol 287 (1920) ◽  
pp. 20192882 ◽  
Author(s):  
Maya Wardeh ◽  
Kieran J. Sharkey ◽  
Matthew Baylis

Diseases that spread to humans from animals, zoonoses, pose major threats to human health. Identifying animal reservoirs of zoonoses and predicting future outbreaks are increasingly important to human health and well-being and economic stability, particularly where research and resources are limited. Here, we integrate complex networks and machine learning approaches to develop a new approach to identifying reservoirs. An exhaustive dataset of mammal–pathogen interactions was transformed into networks where hosts are linked via their shared pathogens. We present a methodology for identifying important and influential hosts in these networks. Ensemble models linking network characteristics with phylogeny and life-history traits are then employed to predict those key hosts and quantify the roles they undertake in pathogen transmission. Our models reveal drivers explaining host importance and demonstrate how these drivers vary by pathogen taxa. Host importance is further integrated into ensemble models to predict reservoirs of zoonoses of various pathogen taxa and quantify the extent of pathogen sharing between humans and mammals. We establish predictors of reservoirs of zoonoses, showcasing host influence to be a key factor in determining these reservoirs. Finally, we provide new insight into the determinants of zoonosis-sharing, and contrast these determinants across major pathogen taxa.
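The network construction described in this abstract (hosts linked via their shared pathogens) can be sketched as a projection of host-pathogen records onto hosts. This is a minimal illustration only, not the authors' pipeline: the records, the function names `project_hosts` and `degree`, and the use of simple degree as an importance measure are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical host-pathogen records (illustrative only, not data from the paper).
interactions = [
    ("host_a", "pathogen_1"),
    ("host_b", "pathogen_1"),
    ("host_b", "pathogen_2"),
    ("host_c", "pathogen_2"),
    ("host_d", "pathogen_3"),
]

def project_hosts(records):
    """Return {frozenset({h1, h2}): number of pathogens the two hosts share}."""
    hosts_by_pathogen = defaultdict(set)
    for host, pathogen in records:
        hosts_by_pathogen[pathogen].add(host)
    edges = defaultdict(int)
    for hosts in hosts_by_pathogen.values():
        # Every pair of hosts carrying the same pathogen gets (or strengthens) a link.
        for h1, h2 in combinations(sorted(hosts), 2):
            edges[frozenset((h1, h2))] += 1
    return dict(edges)

def degree(edges):
    """Count how many distinct hosts each host is linked to."""
    counts = defaultdict(int)
    for pair in edges:
        for host in pair:
            counts[host] += 1
    return dict(counts)

host_edges = project_hosts(interactions)
host_degree = degree(host_edges)
```

In this toy data, `host_b` shares a pathogen with two other hosts, while `host_d` shares none and therefore does not appear in the projected network at all.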


Energies ◽  
2021 ◽  
Vol 14 (23) ◽  
pp. 7834
Author(s):  
Christopher Hecht ◽  
Jan Figgener ◽  
Dirk Uwe Sauer

Electric vehicles may reduce greenhouse gas emissions from individual mobility. Due to the long charging times, accurate planning is necessary, for which the availability of charging infrastructure must be known. In this paper, we show how the occupation status of charging infrastructure can be predicted for the next day using two machine learning models: a Gradient Boosting Classifier and a Random Forest Classifier. Since both are ensemble models, binary training data (occupied vs. available) can be used to provide a certainty measure for predictions. The prediction may be used to adapt prices in a high-load scenario, predict grid stress, or forecast available power for smart or bidirectional charging. The models were chosen based on an evaluation of 13 different, typically used machine learning models. We show that it is necessary to know past charging station usage in order to predict future usage. Other features such as traffic density or weather have a limited effect. We show that a Gradient Boosting Classifier achieves 94.8% accuracy and a Matthews correlation coefficient of 0.838, making ensemble models a suitable tool. We further demonstrate how a model trained on binary data can perform non-binary predictions to give predictions in the categories “low likelihood” to “high likelihood”.
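One way an ensemble trained on binary labels can yield a graded prediction is to treat the share of members voting "occupied" as a certainty measure and bin it. The sketch below assumes a bagging-style ensemble such as a random forest; the vote list, the cut-offs of 1/3 and 2/3, and the intermediate "medium likelihood" label are illustrative assumptions, not values from the paper.

```python
def occupied_likelihood(member_votes):
    """Fraction of ensemble members that predict 'occupied' (encoded as 1)."""
    return sum(member_votes) / len(member_votes)

def likelihood_category(p, low=1 / 3, high=2 / 3):
    """Map a vote fraction to a coarse category (cut-offs are assumptions)."""
    if p < low:
        return "low likelihood"
    if p < high:
        return "medium likelihood"
    return "high likelihood"

# Hypothetical votes from ten ensemble members for one station and time slot.
votes = [1, 1, 0, 1, 1, 0, 1, 1, 1, 1]
p_occupied = occupied_likelihood(votes)
category = likelihood_category(p_occupied)
```

Here eight of ten members vote "occupied", so the vote fraction is 0.8 and the slot falls into the highest category under the assumed cut-offs.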


Sensors ◽  
2020 ◽  
Vol 20 (9) ◽  
pp. 2559 ◽  
Author(s):  
Celestine Iwendi ◽  
Suleman Khan ◽  
Joseph Henry Anajemba ◽  
Mohit Mittal ◽  
Mamdouh Alenezi ◽  
...  

The need to spot abnormal behavior inside and outside a network system led to the development of intrusion detection systems, and many researchers have applied soft computing and machine learning in this area. A single classifier alone appears insufficient to control network intruders. This limitation led us to perform dimensionality reduction by means of a correlation-based feature selection (CFS) approach, combined with a refined ensemble model. The paper aims to improve the Intrusion Detection System (IDS) by proposing CFS + Ensemble Classifiers (Bagging and AdaBoost), which achieve high accuracy, a high packet detection rate, and a low false alarm rate. Machine learning ensemble models with base classifiers (J48, Random Forest, and REPTree) were built. Both binary and multiclass classification were performed on the KDD99 and NSL-KDD datasets; in the binary case, all attacks were labeled as anomaly and the remaining traffic as normal. The multiclass labels consisted of five classes: four major attack types, namely Denial of Service (DoS), Probe, User-to-Root (U2R), and Remote-to-Local (R2L), plus the Normal class. Results from the experiment showed that our proposed model produces a false alarm rate (FAR) of 0 and a detection rate (DR) of 99.90% for the KDD99 dataset, and a FAR of 0.5% and a DR of 98.60% for the NSL-KDD dataset, when working with 6 and 13 selected features.
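The two headline metrics in this abstract can be computed from binary confusion counts. The sketch below uses the common definitions (detection rate as the share of attacks that are flagged, false alarm rate as the share of normal traffic that is flagged); the abstract does not spell out its formulas, so these definitions and the counts are assumptions for illustration.

```python
def detection_rate(true_pos, false_neg):
    """Share of actual attacks that the detector flags as attacks."""
    return true_pos / (true_pos + false_neg)

def false_alarm_rate(false_pos, true_neg):
    """Share of normal traffic that the detector wrongly flags as an attack."""
    return false_pos / (false_pos + true_neg)

# Hypothetical confusion counts, not results from the paper.
dr = detection_rate(true_pos=90, false_neg=10)
far = false_alarm_rate(false_pos=2, true_neg=98)
```

With these made-up counts, 90 of 100 attacks are caught and 2 of 100 normal records raise an alarm.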


2021 ◽  
Author(s):  
Mohamed A.M. Iesa ◽  
Abhinandan P Shirahatt ◽  
Harsha Sharma ◽  
Mohit Kumar Goyal ◽  
Amit Shrivastava ◽  
...  

Author(s):  
Wasiur Rhmann ◽  
Gufran Ahmad Ansari

Software engineering repositories have attracted researchers seeking to mine useful information about the quality attributes of software. These repositories help software professionals allocate resources efficiently across the software development life cycle. Software fault prediction is a quality assurance activity in which software faults are predicted before actual software testing. As exhaustive software testing is impossible, software fault prediction models can help allocate testing resources properly. Various machine learning techniques have been applied to create software fault prediction models. In this study, ensemble models are used for software fault prediction. Change metrics-based data are collected for an open-source Android project from a Git repository, code-based metrics data are obtained from the PROMISE data repository, and the datasets kc1, kc2, cm1, and pc1 are used for the experiments. Results showed that ensemble models performed better than machine learning and hybrid search-based algorithms. The bagging ensemble was found to be more effective in predicting faults than soft and hard voting.
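Bagging, the ensemble this abstract reports as most effective, trains each member on a bootstrap resample of the training data and combines members by majority vote. The sketch below is a toy version under stated assumptions: the base learner is a one-feature threshold rule (`fit_stump`) rather than any classifier used in the study, and the metric values and labels are invented.

```python
import random
from collections import Counter

def fit_stump(samples):
    """Pick the threshold on a single metric that minimises training errors.

    samples: list of (metric_value, label), label 1 = faulty, 0 = clean.
    The rule predicts 'faulty' when metric_value >= threshold.
    """
    best_threshold, best_errors = None, None
    for threshold in sorted({x for x, _ in samples}):
        errors = sum((x >= threshold) != bool(y) for x, y in samples)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold

def fit_bagging(samples, n_models=15, seed=0):
    """Train one stump per bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        bootstrap = [rng.choice(samples) for _ in samples]
        models.append(fit_stump(bootstrap))
    return models

def predict_bagging(models, x):
    """Majority vote over the stumps; n_models is odd, so no ties occur."""
    votes = Counter(int(x >= threshold) for threshold in models)
    return votes.most_common(1)[0][0]

# Hypothetical (code metric, faulty?) pairs, not taken from any PROMISE dataset.
train = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
models = fit_bagging(train)
low_risk = predict_bagging(models, 0)
high_risk = predict_bagging(models, 10)
```

A module whose metric lies below every learned threshold is predicted clean, and one above every threshold is predicted faulty; inputs in between depend on how the bootstrap resamples fell.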


2020 ◽  
Vol 391 ◽  
pp. 282-291 ◽  
Author(s):  
Juan Jesús Ruiz-Aguilar ◽  
Daniel Urda ◽  
José Antonio Moscoso-López ◽  
Javier González-Enrique ◽  
Ignacio J. Turias

2018 ◽  
Vol 20 (6) ◽  
pp. 2185-2199 ◽  
Author(s):  
Yanju Zhang ◽  
Ruopeng Xie ◽  
Jiawei Wang ◽  
André Leier ◽  
Tatiana T Marquez-Lago ◽  
...  

As a newly discovered post-translational modification (PTM), lysine malonylation (Kmal) regulates a myriad of cellular processes from prokaryotes to eukaryotes and has important implications in human diseases. Despite its functional significance, computational methods to accurately identify malonylation sites are still lacking and urgently needed. In particular, there is currently no comprehensive analysis and assessment of different features and machine learning (ML) methods that are required for constructing the necessary prediction models. Here, we review, analyze and compare 11 different feature encoding methods, with the goal of extracting key patterns and characteristics from residue sequences of Kmal sites. We identify optimized feature sets, with which four commonly used ML methods (random forest, support vector machines, K-nearest neighbor and logistic regression) and one recently proposed [Light Gradient Boosting Machine (LightGBM)] are trained on data from three species, namely, Escherichia coli, Mus musculus and Homo sapiens, and compared using randomized 10-fold cross-validation tests. We show that integration of the single method-based models through ensemble learning further improves the prediction performance and model robustness on the independent test. When compared to the existing state-of-the-art predictor, MaloPred, the optimal ensemble models were more accurate for all three species (AUC: 0.930, 0.923 and 0.944 for E. coli, M. musculus and H. sapiens, respectively). Using the ensemble models, we developed an accessible online predictor, kmal-sp, available at http://kmalsp.erc.monash.edu/.
We hope that this comprehensive survey and the proposed strategy for building more accurate models can serve as a useful guide for inspiring future developments of computational methods for PTM site prediction, expedite the discovery of new malonylation and other PTM types and facilitate hypothesis-driven experimental validation of novel malonylated substrates and malonylation sites.
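The randomized 10-fold cross-validation mentioned in this abstract amounts to shuffling the sample indices once, splitting them into ten folds, and holding each fold out in turn. The sketch below shows only that index bookkeeping; the function name `kfold_indices`, the seed, and the sample count are assumptions, and the abstract does not say whether the authors stratified their folds.

```python
import random

def kfold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k shuffled folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding by k gives folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [j for f, fold in enumerate(folds) if f != held_out for j in fold]
        yield train, test

# Hypothetical dataset size for illustration.
splits = list(kfold_indices(n_samples=25, k=10))
```

Each sample appears in exactly one test fold, so every model is scored once on data it was not trained on.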


2020 ◽  
Author(s):  
Roberto Silva ◽  
Bruna Barreira ◽  
Fernando Xavier ◽  
Antonio Saraiva ◽  
Carlos Cugnasca

The COVID-19 pandemic will severely affect the demand for healthcare. It is essential to continually monitor and predict the expected number of new cases for each country. We explored the use of econometric, machine learning, and ensemble models to predict the number of new cases per day for Brazil, China, Italy, and South Korea. These models can be used to make short-term predictions, complementing the epidemiological models. Our main findings were: (i) there is no single best model for all countries; (ii) ensembles can, in some instances, improve the results of individual models; and (iii) the ML models had worse results due to the lack of data.


2021 ◽  
Vol 13 (14) ◽  
pp. 2678
Author(s):  
Haixiao Ge ◽  
Fei Ma ◽  
Zhenwang Li ◽  
Zhengzheng Tan ◽  
Changwen Du

Accurate and timely detection of phenology at plot scale in rice breeding trials is crucial for understanding the heterogeneity of varieties and guiding field management. Traditionally, remote sensing studies of phenology detection have heavily relied on time-series vegetation index (VI) data. However, the methodology based on time-series VI data was often limited by the temporal resolution. In this study, three types of ensemble models, namely hard voting (majority voting), soft voting (weighted majority voting) and model stacking, were proposed to identify the principal phenological stages of rice based on unmanned aerial vehicle (UAV) RGB imagery. These ensemble models combined RGB-VIs, color space (e.g., RGB and HSV) and textures derived from UAV-RGB imagery, and five machine learning algorithms (random forest; k-nearest neighbors; Gaussian naïve Bayes; support vector machine and logistic regression) as base models to estimate phenological stages in rice breeding. The phenological estimation models were trained on the dataset of late-maturity cultivars and tested independently on the dataset of early-medium-maturity cultivars. The results indicated that all ensemble models outperformed individual machine learning models in all datasets. The soft voting strategy provided the best performance for identifying phenology, with overall accuracies of 90% and 93% and mean F1-scores of 0.79 and 0.81 in the calibration and validation datasets, respectively, which meant that the overall accuracy and mean F1-scores improved by 5% and 7%, respectively, in comparison with those of the best individual model (GNB) tested in this study. Therefore, the ensemble models demonstrated great potential in improving the accuracy of phenology detection in rice breeding.
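The difference between the hard and soft voting strategies named in this abstract can be shown with a few lines of code. This is a sketch under stated assumptions: the stage names, the three base models and their class probabilities are invented for illustration, and equal weights are used by default, whereas the study's own weighting is not given in the abstract.

```python
def hard_vote(probas):
    """Majority vote over each base model's most probable class."""
    picks = [max(p, key=p.get) for p in probas]
    # Sorting the candidates makes tie-breaking deterministic (alphabetical).
    return max(sorted(set(picks)), key=picks.count)

def soft_vote(probas, weights=None):
    """Pick the class with the highest weighted mean probability."""
    weights = weights or [1.0] * len(probas)
    total = sum(weights)
    averaged = {
        cls: sum(w * p[cls] for w, p in zip(weights, probas)) / total
        for cls in probas[0]
    }
    return max(averaged, key=averaged.get)

# Hypothetical class probabilities from three base models for one plot.
base_model_probas = [
    {"tillering": 0.55, "heading": 0.45},
    {"tillering": 0.60, "heading": 0.40},
    {"tillering": 0.10, "heading": 0.90},
]
hard_result = hard_vote(base_model_probas)
soft_result = soft_vote(base_model_probas)
```

Two of the three models narrowly favor "tillering", so hard voting returns it, while the third model's confident "heading" estimate tips the averaged probabilities and soft voting returns "heading". This is the kind of case in which the two strategies can disagree.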

