Contrasting determinants for the introduction and establishment success of exotic birds in Taiwan using decision trees models

PeerJ ◽  
2017 ◽  
Vol 5 ◽  
pp. e3092 ◽  
Author(s):  
Shih-Hsiung Liang ◽  
Bruno Andreas Walther ◽  
Bao-Sen Shieh

Background: Biological invasions have become a major threat to biodiversity, and identifying the determinants of success at different stages of the invasion process is essential both for preventive management and for testing ecological theories. When investigating variables associated with different stages of the invasion process in a local region such as Taiwan, traditional parametric analyses face potential problems: too many variables of different data types (nominal, ordinal, and interval) and a relatively small data set with too many missing values. Methods: We therefore used five decision tree models instead and compared their performance. Our dataset contains 283 exotic bird species that were transported to Taiwan; of these 283 species, 95 successfully escaped into the field (introduction success); of these 95 introduced species, 36 successfully reproduced in the field in Taiwan (establishment success). For each species, we collected 22 variables associated with human selectivity and species traits that may determine success during the introduction and establishment stages. For each decision tree model, we applied three variable treatments: (I) including all 22 variables, (II) excluding nominal variables, and (III) excluding nominal variables and replacing ordinal values with binary ones. Five performance measures were used to compare models, namely the area under the receiver operating characteristic curve (AUROC), specificity, precision, recall, and accuracy. Results: Gradient boosting models performed best overall among the five decision tree models, for both introduction and establishment success and across variable treatments.
The most important variables for predicting introduction success were bird family, the number of invaded countries, and variables associated with environmental adaptation, whereas the most important variables for predicting establishment success were the number of invaded countries and variables associated with reproduction. Discussion: Our final optimal models achieved relatively high performance values, and we discuss differences in performance with regard to sample size and variable treatments. Our results showed that the number of invaded countries was the most important determinant in the establishment model and the second most important determinant in the introduction model. Therefore, we suggest that the future introduction and establishment success of exotic birds may be gauged simply by looking at their previous success in invading other countries. Finally, we found that species traits related to reproduction were more important in establishment models than in introduction models; importantly, these determinants were not averaged values but either minimum or maximum values of species traits. Therefore, we suggest that, in addition to averaged values, reproductive potential represented by the minimum and maximum values of species traits should be considered in invasion studies.
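
The abstract compares models with AUROC, specificity, precision, recall, and accuracy. As a minimal sketch (not the authors' code), the threshold-based measures can be computed from confusion-matrix counts, and AUROC as the probability that a randomly chosen success is scored above a randomly chosen failure. The labels, scores, and the 0.5 cutoff below are hypothetical toy values.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary labels where 1 = success."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def safe_div(num: float, den: float) -> float:
    """Division that returns 0.0 instead of failing on an empty denominator."""
    return num / den if den else 0.0


def threshold_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "specificity": safe_div(tn, tn + fp),  # true negative rate
        "precision": safe_div(tp, tp + fp),
        "recall": safe_div(tp, tp + fn),  # true positive rate
        "accuracy": safe_div(tp + tn, len(y_true)),
    }


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(random positive outscores random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = establishment success, scores = model probabilities.
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.7, 0.6, 0.2, 0.4, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(threshold_metrics(y_true, y_pred))
print(auroc(y_true, scores))
```

With these toy values all four threshold metrics equal 2/3 and AUROC equals 8/9. AUROC does not depend on the cutoff, which is one reason it is often preferred for comparing models on small, imbalanced data sets like the 95 introduced versus 36 established species.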

2008 ◽  
Vol 12 (3) ◽  
Author(s):  
Jozef Zurada ◽  
Peng C. Lam

For many years, lenders have used traditional statistical techniques such as logistic regression and discriminant analysis to distinguish more precisely between creditworthy customers, who are granted loans, and non-creditworthy customers, who are denied loans. More recently, new machine learning techniques such as neural networks, decision trees, and support vector machines have been successfully employed to classify loan applicants into those who are likely to pay off a loan and those who are likely to default on it. Accurate classification benefits lenders through increased profits or reduced losses, and it benefits loan applicants, who can avoid overcommitment. This paper examines a historical data set of consumer loans issued by a German bank to individuals whom the bank considered qualified customers. The data set consists of the financial attributes of each customer and includes a mixture of loans that the customers paid off or defaulted on. The paper compares the classification accuracy rates of three decision tree techniques and analyzes their ability to generate easy-to-understand rules.
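
Decision trees produce readable rules because each node tests one attribute against a threshold. The abstract does not name the three techniques compared, so the sketch below only illustrates the general idea with one common split criterion (Gini impurity, used by CART): it finds the best threshold on a single numeric attribute and turns it into an IF-THEN rule. The attribute name `duration_months` and the toy data are hypothetical.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence, Tuple


def gini(labels: Sequence[Hashable]) -> float:
    """Gini impurity: 1 - sum over classes of p_c^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(values: Sequence[float], labels: Sequence[Hashable]) -> Tuple[Optional[float], float]:
    """Return (threshold, weighted Gini) of the best split `value <= threshold`.

    The threshold is None when no split improves on the unsplit node."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best: Tuple[Optional[float], float] = (None, gini(labels))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


def majority(labels: Sequence[Hashable]) -> Hashable:
    return Counter(labels).most_common(1)[0][0]


def split_to_rule(feature: str, values: Sequence[float], labels: Sequence[Hashable]) -> str:
    """Turn the best single split into a human-readable IF-THEN rule."""
    threshold, _ = best_split(values, labels)
    if threshold is None:
        return f"PREDICT {majority(labels)}"
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    return f"IF {feature} <= {threshold} THEN {majority(left)} ELSE {majority(right)}"


# Hypothetical loans: duration in months and observed outcome.
durations = [6, 12, 12, 24, 36, 48]
outcome = ["paid", "paid", "paid", "default", "default", "default"]
print(split_to_rule("duration_months", durations, outcome))
```

A full tree repeats this search recursively on each side of the split; every root-to-leaf path then reads as one rule of the kind the paper evaluates for interpretability.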


Geophysics ◽  
2020 ◽  
Vol 85 (4) ◽  
pp. WA147-WA158
Author(s):  
Kaibo Zhou ◽  
Jianyu Zhang ◽  
Yusong Ren ◽  
Zhen Huang ◽  
Luanxiao Zhao

Lithology identification based on conventional well-logging data is of great importance for characterizing geologic features and evaluating reservoir quality during the exploration and production development of petroleum reservoirs. However, the traditional lithology identification process has some limitations: (1) building a model is so time-consuming that lithology cannot be identified in real time during well drilling, (2) the model must be built by experienced geologists, which consumes considerable manpower and material resources, and (3) class imbalance in the labeled well-log data may reduce the classification performance of the model. We have developed a gradient boosting decision tree (GBDT) algorithm combined with the synthetic minority oversampling technique (SMOTE) to realize fast and automatic lithology identification. First, the raw well-log data are normalized with min-max normalization. Then, SMOTE is used during training to balance the number of samples in each class. Next, a lithology identification model is built by fitting GBDT to the preprocessed training data set. Finally, the model is verified on the testing data set. The experimental results indicate that the proposed approach improves lithology identification performance compared with other machine-learning approaches.
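
The first two preprocessing steps can be sketched in plain Python. Min-max normalization rescales each log curve to [0, 1]; SMOTE then creates synthetic minority samples by interpolating between a minority sample and one of its k nearest minority neighbours. This is an illustrative sketch, not the authors' implementation: `k=3`, the random seed, the two log curves, and the lithology labels are all assumptions, and the GBDT fitting step is not shown.

```python
import math
import random
from typing import List, Sequence


def min_max_normalize(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scale each column to [0, 1]; constant columns map to 0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]


def smote(minority: Sequence[Sequence[float]], n_new: int, k: int = 3, seed: int = 0) -> List[List[float]]:
    """Generate n_new synthetic samples, each on the segment between a minority
    sample and a randomly chosen one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(x, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


# Hypothetical well-log rows: [gamma ray, bulk density]; labels are illustrative only.
rows = [[50.0, 2.1], [80.0, 2.4], [120.0, 2.6], [65.0, 2.2], [110.0, 2.5], [58.0, 2.15]]
labels = ["lime", "sand", "sand", "lime", "sand", "sand"]

norm = min_max_normalize(rows)
minority = [r for r, lab in zip(norm, labels) if lab == "lime"]
new = smote(minority, n_new=labels.count("sand") - len(minority))
print(len(new))
```

Normalizing before SMOTE matters because the neighbour search uses Euclidean distance, so an unscaled curve with a large numeric range would otherwise dominate which samples count as "nearest".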


2009 ◽  
Vol 15 (12) ◽  
pp. 2852-2860 ◽  
Author(s):  
TIM M. BLACKBURN ◽  
PHILLIP CASSEY ◽  
JULIE L. LOCKWOOD

Energies ◽  
2019 ◽  
Vol 12 (13) ◽  
pp. 2522 ◽  
Author(s):  
Mengting Yao ◽  
Yun Zhu ◽  
Junjie Li ◽  
Hua Wei ◽  
Penghui He

Line loss rate plays an essential role in evaluating the economic operation of power systems. However, in a low-voltage (LV) distribution network, calculating the line loss rate has become more cumbersome because of the poor configuration of measuring and detecting devices, the difficulty of collecting operational data, and the excessive number of components and nodes. Most previous studies focused on approaches to calculating or predicting the line loss rate but rarely evaluated the prediction results. In this paper, we propose an approach based on a gradient boosting decision tree (GBDT) to predict the line loss rate. GBDT inherits the advantages of both statistical models and AI approaches: it can identify complex, nonlinear relationships while computing the relative importance of variables. An empirical study on a data set from one city demonstrates that our proposed approach performs well in predicting the line loss rate, given a large number of unlabeled examples. Experiments and analysis also confirmed the effectiveness of the proposed approach for anomaly detection and practical project management.
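
The abstract credits GBDT with modelling nonlinear relationships while also ranking variables. A minimal sketch of that idea, assuming squared loss and depth-1 trees (stumps), is shown below; the paper's actual loss, tree depth, and importance definition are not stated. Each round fits a stump to the current residuals (the negative gradient of squared loss), and a feature's importance is the total reduction in squared error from splits on it, normalized to sum to 1. The data set is synthetic and hypothetical.

```python
from typing import List, Tuple

Stump = Tuple[int, float, float, float]  # (feature index, threshold, left value, right value)


def fit_stump(X, residuals):
    """Find the split `x[j] <= t` minimising squared error on the residuals.

    Returns (sse, j, t, left_mean, right_mean), or None if no split is possible."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    return best


def fit_gbdt(X, y, n_rounds=50, lr=0.1):
    """Gradient boosting with stumps under squared loss, plus split-gain importance."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps: List[Stump] = []
    gain = [0.0] * len(X[0])
    for _ in range(n_rounds):
        # Negative gradient of 0.5 * (y - f)^2 is the ordinary residual y - f.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        best = fit_stump(X, residuals)
        if best is None:
            break
        sse, j, t, lm, rm = best
        mean_r = sum(residuals) / len(residuals)
        gain[j] += sum((r - mean_r) ** 2 for r in residuals) - sse  # error reduction credited to feature j
        stumps.append((j, t, lm, rm))
        pred = [p + lr * (lm if row[j] <= t else rm) for p, row in zip(pred, X)]
    total = sum(gain) or 1.0
    return {"base": base, "lr": lr, "stumps": stumps, "importance": [g / total for g in gain]}


def predict(model, row):
    out = model["base"]
    for j, t, lm, rm in model["stumps"]:
        out += model["lr"] * (lm if row[j] <= t else rm)
    return out


# Hypothetical data: the target depends only on feature 0; feature 1 is a periodic distractor.
X = [[i, (i * 7) % 5] for i in range(20)]
y = [0.0 if i < 10 else 1.0 for i in range(20)]
model = fit_gbdt(X, y)
print(model["importance"])
print(round(predict(model, [15, 0]), 3))
```

The learning rate `lr` shrinks each stump's contribution, so the prediction approaches the target gradually over rounds rather than in one step; libraries such as scikit-learn expose the same idea through a `learning_rate` parameter.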


Symmetry ◽  
2018 ◽  
Vol 10 (9) ◽  
pp. 386 ◽  
Author(s):  
Walaa Alajali ◽  
Wei Zhou ◽  
Sheng Wen ◽  
Yu Wang

Traffic prediction is a critical task for intelligent transportation systems (ITS). Prediction at intersections is challenging because it involves various participants, such as vehicles, cyclists, and pedestrians. In this paper, we propose a novel approach for accurate intersection traffic prediction that introduces data sources other than road traffic volume data into the prediction model. In particular, we take advantage of data collected from reports of road accidents and roadworks near the intersections. In addition, we investigate two types of learning schemes, namely batch learning and online learning. Three popular ensemble decision tree models, Gradient Boosting Regression Trees (GBRT), Random Forest (RF), and Extreme Gradient Boosting Trees (XGBoost), are used in the batch learning scheme, while the Fast Incremental Model Trees with Drift Detection (FIMT-DD) model is adopted for the online learning scheme. The proposed approach is evaluated using public data sets released by the Victorian Government of Australia. The results indicate that the accuracy of intersection traffic prediction can be improved by incorporating information on nearby accidents and roadworks.
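
In the online scheme, the model must notice when the traffic pattern changes. FIMT-DD is, as commonly described, equipped with a Page-Hinkley test that monitors prediction errors; the sketch below shows only that test in isolation, not the model tree itself. The parameters `delta` and `threshold` and the simulated error stream are illustrative assumptions.

```python
class PageHinkley:
    """Page-Hinkley test for an increase in the mean of a stream (e.g. absolute prediction errors)."""

    def __init__(self, delta: float = 0.005, threshold: float = 50.0) -> None:
        self.delta = delta          # change magnitude tolerated without an alarm
        self.threshold = threshold  # alarm threshold (often written lambda)
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0      # m_T: running sum of (x_t - mean_t - delta)
        self.min_cum = 0.0  # M_T: minimum of m_t seen so far

    def update(self, x: float) -> bool:
        """Add one observation; return True if a change is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n  # incremental mean
        self.cum += x - self.mean - self.delta
        self.min_cum = min(self.min_cum, self.cum)
        return self.cum - self.min_cum > self.threshold


# Simulated absolute errors: stable for 200 steps, then a sudden jump (e.g. after roadworks begin).
errors = [1.0] * 200 + [10.0] * 50
detector = PageHinkley()
alarm = next((t for t, e in enumerate(errors) if detector.update(e)), None)
print(alarm)
```

While errors are stable, the cumulative sum only drifts downward and no alarm fires; after the jump it grows by roughly the size of the increase each step, so the alarm fires a few observations after the change. A larger `threshold` trades detection delay for fewer false alarms.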


2021 ◽  
Vol 2021 ◽  
pp. 1-13
Author(s):  
Ciyun Lin ◽  
Hongli Zhang ◽  
Bowen Gong ◽  
Dayong Wu

Traffic safety is affected by many complex factors. Mind wandering (MW) is a potentially fatal threat to driving safety, and it is hard to detect and prevent because its occurrence mechanism is uncertain and complex. The aim of this study was to propose a framework for analyzing and predicting MW based on readily available driving status data. The data used in this study are single-trip information collected by questionnaire, including drivers' personal characteristics, the contextual information in which MW occurs, and in-vehicle environmental factors. After investigating the extent to which these factors influence MW, we use the chosen factors to forecast MW. Based on these results, we select factors that can be reliably obtained in real life to forecast MW. To verify that the newly explored factors improve forecast accuracy, we compare the results of our approach with those of existing approaches. We compare the results obtained by four machine-learning-enabled forecasting approaches on a real-life data set. The results show that the factors identified in this paper can significantly improve forecast accuracy. Evaluated with confusion matrices, ROC curves, and AUC, the gradient boosting decision tree algorithm performs better than the other forecasting approaches. The importance rankings of most factors obtained by the gradient boosting decision tree and by the questionnaire are the same.
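
The ROC curve used for the comparison can be built by sweeping a decision threshold from the highest score to the lowest and recording the false positive rate and true positive rate at each step; the AUC is then the area under those points, here computed with the trapezoidal rule. This is a generic sketch: the labels (1 = MW episode reported) and scores are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_curve(y_true: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return ROC points (FPR, TPR), lowering the threshold from the highest score."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC needs both positive and negative examples")
    ranked = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        s = ranked[i][0]
        # Examples with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == s:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear curve given as sorted (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical trips: 1 = MW episode reported; scores = predicted MW probability.
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.7, 0.6, 0.2, 0.4, 0.1]
points = roc_curve(y_true, scores)
print(points)
print(trapezoid_auc(points))
```

Handling tied scores as a group matters: stepping through them one at a time would make the curve, and hence the AUC, depend on the arbitrary order of tied examples.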


2018 ◽  
Vol 7 (2.28) ◽  
pp. 337
Author(s):  
Jason Gierman ◽  
Oliver Strong ◽  
Gongzhu Hu ◽  
. .

Student retention is a high-priority issue for many colleges and universities. Keeping students in school is the most basic condition for them to achieve the goals for which they went to college in the first place. Much research and practice across institutions has aimed at improving student retention rates, but colleges and universities are still trying to determine which factors are most important to student retention. In this paper, we present our experiments in building predictive models, particularly decision tree models, to predict full-time student retention. The data set of 1,965 cases from 1987 to 2000, obtained from the Delta Cost Project Database of the American Institutes for Research, has 541 variables. We used variable selection measures such as R-squared to reduce these to 45 variables and built decision tree models to fit the training data. Eight variables were identified as most influential on retention rates. Our experiments show that decision trees of moderate depth are suitable for building a retention model.
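
The abstract does not say exactly how R-squared was used to cut 541 variables down to 45. One plausible reading, sketched below as an assumption, is a univariate screen: fit a simple linear regression of the retention target on each candidate variable, rank variables by R², and keep the top k. The variable names and values are hypothetical.

```python
from typing import Dict, List, Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """R^2 of a simple least-squares fit y ~ a + b*x (the squared Pearson correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant variable (or target) explains nothing
    return sxy * sxy / (sxx * syy)


def select_top_k(columns: Dict[str, Sequence[float]], target: Sequence[float], k: int) -> List[str]:
    """Rank candidate variables by univariate R^2 against the target and keep the top k."""
    return sorted(columns, key=lambda name: r_squared(columns[name], target), reverse=True)[:k]


# Hypothetical candidates and retention target.
target = [1, 2, 3, 4, 5]
columns = {
    "var_a": [2, 4, 6, 8, 10],  # perfectly linear in the target
    "var_b": [5, 3, 4, 1, 2],   # strongly, negatively related
    "var_c": [1, 1, 1, 1, 1],   # constant, no information
}
print(select_top_k(columns, target, k=2))
```

A univariate screen is cheap enough for hundreds of variables, but it ignores interactions and redundancy between variables; the decision trees fitted afterwards can still pick up interactions among the retained variables.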


2020 ◽  
Vol 2020 ◽  
pp. 1-13
Author(s):  
Licheng Qu ◽  
Minghao Zhang ◽  
Zhaolu Li ◽  
Wei Li

Because traffic state data form a typical time series, the length of the data sequence is critical to the accuracy of traffic state prediction. To fully explore the causality within traffic data, this study established a temporal backtracking and multistep delay model based on recurrent neural networks (RNNs) to learn and extract long- and short-term dependencies in the traffic state data. On a real traffic data set, the coordinate descent algorithm was employed to search for the optimal backtracking length of the traffic sequence, and multistep delay predictions were performed to demonstrate the relationship between delay steps and prediction accuracy. In addition, the performance of three RNN variants (LSTM, GRU, and BiLSTM) was compared with that of six frequently used models: decision tree (DT), support vector machine (SVM), k-nearest neighbour (KNN), random forest (RF), gradient boosting decision tree (GBDT), and stacked autoencoder (SAE). The prediction results for 10 consecutive delay steps suggest that the RNNs are far more accurate than the other models because of their more powerful and accurate ability to represent patterns in time series. The results also show that RNNs can learn and mine longer time dependencies.
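
Both the backtracking length and the delay step determine how a traffic series is cut into training samples. Reading them, as an assumption, as the input window length and the prediction horizon, the sample construction can be sketched as follows; any model (RNN or tree-based) would then be trained on the resulting `X` and `y`. The series is a hypothetical placeholder.

```python
from typing import List, Sequence, Tuple


def make_windows(series: Sequence[float], backtrack: int, delay: int) -> Tuple[List[List[float]], List[float]]:
    """Build supervised samples from a time series.

    Each input holds `backtrack` consecutive values; its target is the value `delay`
    steps after the last input value (delay=1 is one-step-ahead prediction)."""
    if backtrack < 1 or delay < 1:
        raise ValueError("backtrack and delay must be >= 1")
    X, y = [], []
    for start in range(len(series) - backtrack - delay + 1):
        end = start + backtrack  # index just past the input window
        X.append(list(series[start:end]))
        y.append(series[end + delay - 1])
    return X, y


# Hypothetical traffic counts per interval.
flows = list(range(10))
X, y = make_windows(flows, backtrack=3, delay=2)
print(X[0], y[0])  # [0, 1, 2] 4
```

Searching for the optimal backtracking length then amounts to rebuilding the windows for each candidate `backtrack`, retraining, and keeping the value with the lowest validation error; longer windows give the model more history but leave fewer training samples.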

