Assessing Biotic and Abiotic Effects on Biodiversity Index Using Machine Learning

Forests ◽  
2021 ◽  
Vol 12 (4) ◽  
pp. 461
Author(s):  
Mahmoud Bayat ◽  
Harold Burkhart ◽  
Manouchehr Namiranian ◽  
Seyedeh Kosar Hamidi ◽  
Sahar Heidari ◽  
...  

Forest ecosystems play multiple important roles in meeting the habitat needs of different organisms and providing a variety of services to humans. Biodiversity is one of the structural features of dynamic and complex forest ecosystems. One of the most challenging issues in assessing forest ecosystems is understanding the relationship between biodiversity and environmental factors. The aim of this study was to investigate the effect of biotic and abiotic factors on tree diversity of Hyrcanian forests in northern Iran. For this purpose, we analyzed tree diversity in 8 forest sites in different locations from east to west of the Caspian Sea. A total of 15,988 trees were measured in 655 circular permanent sample plots (0.1 ha). A combination of machine learning methods was used for modeling and investigating the relationship between tree diversity and biotic and abiotic factors. The models included generalized additive models (GAMs), support vector machine (SVM), random forest (RF) and k-nearest neighbor (KNN). To determine the most important factors related to tree diversity, we used variables such as the average diameter at breast height (DBH) in the plot, basal area in largest trees (BAL), basal area (BA), number of trees per hectare, tree species, slope, aspect and elevation. A comparison of the RMSEs, relative RMSEs, and coefficients of determination of the different methods showed that the RF method resulted in the best models among all those tested. Based on the results of the RF method, elevation, BA and BAL were recognized as the most influential factors defining variation of tree diversity.
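
The three comparison criteria named in this abstract (RMSE, relative RMSE, and the coefficient of determination) can be illustrated with a minimal sketch. The function name `regression_metrics` is hypothetical, and the definitions used here are assumptions: relative RMSE is taken as RMSE divided by the observed mean, and R² as 1 − SS_res/SS_tot. The abstract does not state which definitions the authors applied.

```python
import math


def regression_metrics(observed, predicted):
    """Return RMSE, relative RMSE (% of the observed mean), and R^2.

    Assumed definitions (not confirmed by the abstract):
      RMSE          = sqrt(mean((obs - pred)^2))
      relative RMSE = 100 * RMSE / mean(obs)
      R^2           = 1 - SS_res / SS_tot
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    # Residual and total sums of squares
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    rmse = math.sqrt(ss_res / n)
    return {
        "rmse": rmse,
        "relative_rmse_pct": 100.0 * rmse / mean_obs,
        "r2": 1.0 - ss_res / ss_tot,
    }


# Illustrative values only, not data from the study
metrics = regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
```

Under these definitions, the model with the lowest RMSE and relative RMSE and the highest R² on held-out plots would rank best, which is the kind of comparison the abstract describes.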

Author(s):  
M. Esfandiari ◽  
S. Jabari ◽  
H. McGrath ◽  
D. Coleman

Abstract. Flooding is one of the most damaging natural hazards in urban areas around the world, including the city of Fredericton, New Brunswick, Canada. Fredericton was flooded in two consecutive years, 2018 and 2019. Due to the complicated behaviour of water when a river overflows its banks, estimating the flood extent is challenging. The issue becomes even more challenging when several different factors, such as land texture or surface flatness, affect the water flow with varying degrees of intensity. Recently, machine learning algorithms and statistical methods have been used in many research studies to generate flood susceptibility maps from topographical, hydrological, and geological conditioning factors. One of the major issues that researchers have faced is the complexity and number of features a machine learning algorithm requires as input to produce acceptable results. In this research, we used Random Forest to model the 2018 flood in Fredericton and analyzed the effect of several combinations of 12 different flood conditioning factors. The factors were tested against a Sentinel-2 optical satellite image acquired around the flood peak day. The highest accuracy was obtained using only 5 factors, namely altitude, slope, aspect, distance from the river, and land-use/cover, with 97.57% overall accuracy and a kappa coefficient of 95.14%.
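
The two scores reported in this abstract, overall accuracy and the kappa coefficient, are both derived from a confusion matrix. A minimal sketch follows, assuming the standard definition of Cohen's kappa. The function name `accuracy_and_kappa` and the example matrix are illustrative and not taken from the paper.

```python
def accuracy_and_kappa(confusion):
    """Overall accuracy and Cohen's kappa from a square confusion matrix.

    confusion[i][j] is the number of samples whose reference class is i
    and whose predicted class is j.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: share of samples on the diagonal
    observed = sum(confusion[i][i] for i in range(k)) / total
    # Chance agreement: sum over classes of (row total * column total) / total^2
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(k)
    ) / (total * total)
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Illustrative flooded / non-flooded matrix, not data from the study
acc, kappa = accuracy_and_kappa([[45, 5], [5, 45]])
```

For a two-class problem with balanced reference and predicted totals, chance agreement is 0.5 and kappa reduces to 2 × accuracy − 1. The reported pair (97.57% and 95.14%) fits that relation, although the abstract itself does not state that the validation sample was balanced.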


2021 ◽  
pp. 289-301
Author(s):  
B. Martín ◽  
J. González–Arias ◽  
J. A. Vicente–Vírseda

Our aim was to identify an optimal analytical approach for accurately predicting complex spatio–temporal patterns in animal species distribution. We compared the performance of eight modelling techniques (generalized additive models, regression trees, bagged CART, k–nearest neighbors, stochastic gradient boosting, support vector machines, neural network, and random forest, an enhanced form of bootstrap). We also applied extreme gradient boosting, an enhanced form of gradient boosting, to predict spatial patterns in the abundance of migrating Balearic shearwaters based on data gathered within eBird. Proxies of frontal systems and ocean productivity domains that have previously been used to characterize the oceanographic habitats of seabirds were derived from open–source datasets, quantified, and then used as predictors in the models. The random forest model showed the best performance according to the parameters assessed (RMSE and R²). The correlation between observed and predicted abundance with this model was also high. This study shows that the combination of machine learning techniques and massive data provided by open data sources is a useful approach for identifying the long–term spatial–temporal distribution of species at regional spatial scales.


Author(s):  
Shuxin Chen ◽  
Weimin Sun ◽  
Ying He

Abstract. Measuring the stellar parameters of A-type stars is more difficult than for FGK stars because of the sparse features in their spectra and the degeneracy between effective temperature (Teff) and gravity (logg). Modeling the relationship between fundamental stellar parameters and spectral features through machine learning is possible because we can exploit big data rather than a sparse set of known features. Once the model is successfully trained, it can be an efficient approach for predicting Teff and logg for A-type stars, especially when there is large uncertainty in the continuum caused by flux calibration or extinction. In this paper, A-type stars are selected from LAMOST DR7 with a signal-to-noise ratio greater than 50 and Teff ranging from 7000 K to 8500 K. We apply the Random Forest (RF) algorithm, one of the most widely used machine learning algorithms, to establish the regression relationship between the flux at all wavelengths and the corresponding stellar parameters (Teff and logg, respectively). The trained RF model can not only regress the stellar parameters but also rank the wavelengths by their sensitivity to the parameters. According to the rankings, we define line indices by merging adjacent wavelengths. The objectively defined line indices in this work are amendments to the Lick indices and include some weak lines. We use the Support Vector Regression algorithm based on our newly defined line indices to measure the temperature and gravity, and use some common stars from Simbad to evaluate our results. In addition, the Gaia HR diagram is used to check the accuracy of Teff and logg.


2021 ◽  
Author(s):  
Luoshu He ◽  
Suhui Ma ◽  
Jiangling Zhu ◽  
Xinyu Xiong ◽  
Yangang Li ◽  
...  

Abstract. Purpose: The local microclimate of different slope aspects in the same area can impact not only the soil environment and plant community but also the soil microbial community. However, the relationship between aboveground plant communities and belowground soil microbial communities on various slope aspects is not well understood. Methods: We investigated the above- and belowground relationship on different slope aspects and explored how soil properties influence this relationship. Plant community attributes were evaluated by plant species richness and plant total basal area. The soil microbial community was assessed based on both 16S rRNA and ITS rRNA, using high-throughput Illumina sequencing. Results: There was no significant correlation between plant richness and soil bacterial community composition on the north slope, but there was a positive correlation on the south slope and a significantly negative correlation on the flat site. There was a significantly negative correlation between soil fungal community composition and plant total basal area, which did not change with slope aspect. In addition, there was no significant correlation between plant community species richness and soil microbial species richness. Conclusions: In subalpine coniferous forests, the plant-soil bacteria relationship varies with slope aspect, but the plant-soil fungi relationship is relatively consistent across slope aspects. These results can improve our understanding of the relationship between plants and soil microorganisms in forest ecosystems under microtopographic changes and have important implications for biodiversity conservation and forest management in subalpine coniferous forests.


2020 ◽  
Vol 12 (11) ◽  
pp. 4748
Author(s):  
Minrui Zheng ◽  
Wenwu Tang ◽  
Akinwumi Ogundiran ◽  
Jianxin Yang

Settlement models help to understand the social–ecological functioning of landscape and associated land use and land cover change. One of the issues of settlement modeling is that models are typically used to explore the relationship between settlement locations and associated influential factors (e.g., slope and aspect). However, few studies in settlement modeling have adopted landscape visibility analysis. Landscape visibility provides useful information for understanding human decision-making associated with the establishment of settlements. In recent years, machine learning algorithms have demonstrated their capabilities in improving the performance of settlement modeling, particularly in capturing the nonlinear relationship between settlement locations and their drivers. However, simulation models using machine learning algorithms in settlement modeling are still not well studied. Moreover, overfitting and the optimization of model parameters are major challenges for most machine learning algorithms. Therefore, in this study, we pursued two research objectives. First, we aimed to evaluate the contribution of viewsheds and landscape visibility to the simulation modeling of settlement locations. Second, we aimed to examine the performance of machine learning algorithm-based simulation models for settlement location studies. Our study region is located in the metropolitan area of Oyo Empire, Nigeria, West Africa, ca. AD 1570–1830, and its pre-Imperial antecedents, ca. AD 1360–1570. We developed an event-driven spatial simulation model enabled by the random forest algorithm to represent dynamics in settlement systems in our study region. Experimental results demonstrate that viewsheds and landscape visibility may offer more insights into unveiling the underlying mechanism that drives settlement locations. The random forest algorithm, as a machine learning algorithm, provides solid support for establishing the relationship between settlement occurrences and their drivers.


2016 ◽  
Vol 74 (1) ◽  
pp. 102-111 ◽  
Author(s):  
Szymon Smoliński ◽  
Krzysztof Radtke

Marine spatial planning (MSP) is considered a valuable tool in the ecosystem-based management of marine areas. Predictive modelling may be applied in the MSP framework to obtain spatially explicit information about biodiversity patterns. The growing number of statistical approaches used for this purpose implies an urgent need for comparisons between different predictive techniques. In this study, we evaluated the performance of selected machine learning and regression-based methods applied to modelling fish community indices. We hypothesized that habitat features can influence fish assemblage and investigated the effect of environmental gradients on demersal fish diversity (species richness and Shannon–Weaver Index). We used fish data from the Baltic International Trawl Surveys (2001–2014) and maps of six potential predictors: bottom salinity, depth, seabed slope, growth season bottom temperature, seabed sediments and annual mean bottom current velocity. We compared the performance of six alternative modelling approaches: generalized linear models, generalized additive models, multivariate adaptive regression splines, support vector machines, boosted regression trees and random forests. We applied repeated 10-fold cross-validation, using accuracy as the measure of model quality. Finally, we selected random forest as the best performing algorithm and implemented it for the spatial prediction of fish diversity from the Baltic Proper to the Kattegat. To obtain information on the data reliability and confidence of the developed models, which are essential for MSP, we estimated the uncertainty of predictions as the standard deviation of the predictions obtained from all the trees in the random forest ensemble. We showed how state-of-the-art predictive techniques, based on easily available data and simple Geographic Information System tools, can be used to obtain reliable spatial information about fish diversity. Our comparative work highlighted the potential of machine learning methods to reduce prediction error in modelling of demersal fish diversity in the framework of MSP.
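
Two quantities mentioned in this abstract lend themselves to a short illustration: the diversity indices (species richness and the Shannon–Weaver index) and the per-location uncertainty taken as the standard deviation of predictions across the trees of a random forest. The sketch below is a minimal version under stated assumptions: the natural logarithm is used for the Shannon index, the population standard deviation is used for the spread, and all function names and example values are hypothetical. The abstract does not specify these choices.

```python
import math
import statistics


def species_richness(counts):
    """Number of species with at least one individual in the haul."""
    return sum(1 for c in counts if c > 0)


def shannon_index(counts):
    """Shannon index H' = -sum(p_i * ln(p_i)) over species with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def prediction_with_uncertainty(per_tree_predictions):
    """Ensemble prediction and its spread for a single location.

    per_tree_predictions holds one predicted value per tree in the forest.
    The mean is the ensemble prediction; the population standard deviation
    (an assumed choice) serves as the uncertainty estimate.
    """
    mean = statistics.mean(per_tree_predictions)
    spread = statistics.pstdev(per_tree_predictions)
    return mean, spread


# Illustrative values only, not data from the surveys
h = shannon_index([10, 10, 10, 10])  # four equally abundant species
mean, spread = prediction_with_uncertainty([1.0, 2.0, 3.0])
```

With equally abundant species the Shannon index equals the logarithm of the species count, so it adds information beyond richness only when abundances are uneven. Mapping the spread alongside the mean prediction shows where the trees disagree, which is how an uncertainty layer of the kind described could accompany a diversity map.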


2021 ◽  
Vol 11 ◽  
Author(s):  
Jianyong Wu ◽  
Conghe Song ◽  
Eric A. Dubinsky ◽  
Jill R. Stewart

Current microbial source tracking techniques that rely on grab samples analyzed by individual endpoint assays are inadequate to explain microbial sources across space and time. Modeling and predicting host sources of microbial contamination could add a useful tool for watershed management. In this study, we tested and evaluated machine learning models to predict the major sources of microbial contamination in a watershed. We examined the relationship between microbial sources, land cover, weather, and hydrologic variables in a watershed in Northern California, United States. Six models, including K-nearest neighbors (KNN), Naïve Bayes, support vector machine (SVM), simple neural network (NN), Random Forest, and XGBoost, were built to predict major microbial sources using land cover, weather and hydrologic variables. The results showed that these models successfully predicted microbial sources classified into two categories (human and non-human), with the average accuracy ranging from 69% (Naïve Bayes) to 88% (XGBoost). The area under the receiver operating characteristic (ROC) curve (AUC) indicated that XGBoost had the best performance (average AUC = 0.88), followed by Random Forest (average AUC = 0.84) and KNN (average AUC = 0.74). The importance index obtained from Random Forest indicated that precipitation and temperature were the two most important factors for predicting the dominant microbial source. These results suggest that machine learning models, particularly XGBoost, can predict the dominant sources of microbial contamination based on the relationship of microbial contaminants with daily weather and land cover, providing a powerful tool to understand microbial sources in water.
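
The AUC values used here to rank the classifiers can be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes AUC directly from that pairwise definition. The function name `roc_auc`, the coding of the human-source class as label 1, and the example scores are all assumptions made for illustration.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for the positive class (assumed here to be 'human'), 0 otherwise.
    scores: classifier scores, higher meaning more likely positive.
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Illustrative scores only, not data from the study
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Unlike accuracy, this measure does not depend on a single decision threshold, which may explain why the abstract reports both.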


2021 ◽  
Vol 1 (1) ◽  
pp. 407-413
Author(s):  
Nur Heri Cahyana ◽  
Yuli Fauziah ◽  
Agus Sasmito Aribowo

This study aims to determine the best tree-based ensemble machine learning methods for classifying a total of 34 datasets. It also examines the relationship between the number of records and columns in the test dataset and the number of estimators (trees) for each ensemble model, namely Random Forest, Extra Tree Classifier, AdaBoost, and Gradient Boosting. The four methods are compared on their maximum accuracy and on the number of estimators needed when classifying the test datasets. The experiments identified the best tree-based ensemble methods and the best number of estimators for classifying each dataset used in the study. The Extra Tree method is the best classifier for both binary-class and multi-class problems. Random Forest performs well on multi-class problems, and AdaBoost performs fairly well on binary-class problems. The number of rows, columns and data classes is positively correlated with the number of estimators. This means that processing a dataset with a large number of rows, columns or classes requires more estimators than processing a dataset with a small number. However, the number of classes is negatively correlated with accuracy, meaning that accuracy decreases as the number of classes increases.

