scholarly journals A Combined Strategy of Improved Variable Selection and Ensemble Algorithm to Map the Growing Stem Volume of Planted Coniferous Forest

2021 ◽  
Vol 13 (22) ◽  
pp. 4631
Author(s):  
Xiaodong Xu ◽  
Hui Lin ◽  
Zhaohua Liu ◽  
Zilin Ye ◽  
Xinyu Li ◽  
...  

Remote sensing technology is becoming mainstream for mapping the growing stem volume (GSV) and overcoming the shortage of traditional labor-consumed approaches. Naturally, the GSV estimation accuracy utilizing remote sensing imagery is highly related to the variable selection methods and algorithms. Thus, to reduce the uncertainty caused by variables and models, this paper proposes a combined strategy involving improved variable selection with the collinearity test and the secondary ensemble algorithm to obtain the optimally combined variables and extract a reliable GSV from several base models. Our study extracted four types of alternative variables from the Sentinel-1A and Sentinel-2A image datasets, including vegetation indices, spectral reflectance variables, backscattering coefficients, and texture features. Then, an improved variable selection criterion with the collinearity test was developed and evaluated based on machine learning algorithms (classification and regression trees (CART), k-nearest neighbors (KNN), support vector regression (SVR), and artificial neural network (ANN)) considering the correlation between variables and GSV (with random forest (RF), distance correlation coefficient (DC), maximal information coefficient (MIC), and Pearson correlation coefficient (PCC) as evaluation metrics), and the collinearity among the variables. Additionally, we proposed a secondary ensemble with an improved weighted average approach (IWA) to estimate the reliable forest GSV using the first ensemble models constructed by Bagging and AdaBoost. The experimental results demonstrated that the proposed variable selection criterion efficiently obtained the optimal combined variable set without affecting the forest GSV mapping accuracy. Specifically, considering the first ensemble, the relative root mean square error (rRMSE) values ranged from 21.91% to 30.28% for Bagging and 23.33% to 31.49% for AdaBoost, respectively. After the secondary ensemble involving the IWA, the rRMSE values ranged from 18.89% to 21.34%. Furthermore, the variance of the GSV mapped by the secondary ensemble with various ranking methods was significantly reduced. The results prove that the proposed combined strategy has great potential to reduce the GSV mapping uncertainty imposed by current variable selection approaches and algorithms.

2021 ◽  
Author(s):  
Xiaotong Zhu ◽  
Jinhui Jeanne Huang

<p>Remote sensing monitoring has the characteristics of wide monitoring range, celerity, low cost for long-term dynamic monitoring of water environment. With the flourish of artificial intelligence, machine learning has enabled remote sensing inversion of seawater quality to achieve higher prediction accuracy. However, due to the physicochemical property of the water quality parameters, the performance of algorithms differs a lot. In order to improve the predictive accuracy of seawater quality parameters, we proposed a technical framework to identify the optimal machine learning algorithms using Sentinel-2 satellite and in-situ seawater sample data. In the study, we select three algorithms, i.e. support vector regression (SVR), XGBoost and deep learning (DL), and four seawater quality parameters, i.e. dissolved oxygen (DO), total dissolved solids (TDS), turbidity(TUR) and chlorophyll-a (Chla). The results show that SVR is a more precise algorithm to inverse DO (R<sup>2</sup> = 0.81). XGBoost has the best accuracy for Chla and Tur inversion (R<sup>2</sup> = 0.75 and 0.78 respectively) while DL performs better in TDS (R<sup>2</sup> =0.789). Overall, this research provides a theoretical support for high precision remote sensing inversion of offshore seawater quality parameters based on machine learning.</p>


Forests ◽  
2019 ◽  
Vol 10 (12) ◽  
pp. 1073 ◽  
Author(s):  
Li ◽  
Li ◽  
Li ◽  
Liu

Forest biomass is a major store of carbon and plays a crucial role in the regional and global carbon cycle. Accurate forest biomass assessment is important for monitoring and mapping the status of and changes in forests. However, while remote sensing-based forest biomass estimation in general is well developed and extensively used, improving the accuracy of biomass estimation remains challenging. In this paper, we used China’s National Forest Continuous Inventory data and Landsat 8 Operational Land Imager data in combination with three algorithms, either the linear regression (LR), random forest (RF), or extreme gradient boosting (XGBoost), to establish biomass estimation models based on forest type. In the modeling process, two methods of variable selection, e.g., stepwise regression and variable importance-base method, were used to select optimal variable subsets for LR and machine learning algorithms (e.g., RF and XGBoost), respectively. Comfortingly, the accuracy of models was significantly improved, and thus the following conclusions were drawn: (1) Variable selection is very important for improving the performance of models, especially for machine learning algorithms, and the influence of variable selection on XGBoost is significantly greater than that of RF. (2) Machine learning algorithms have advantages in aboveground biomass (AGB) estimation, and the XGBoost and RF models significantly improved the estimation accuracy compared with the LR models. Despite that the problems of overestimation and underestimation were not fully eliminated, the XGBoost algorithm worked well and reduced these problems to a certain extent. (3) The approach of AGB modeling based on forest type is a very advantageous method for improving the performance at the lower and higher values of AGB. Some conclusions in this paper were probably different as the study area changed. The methods used in this paper provide an optional and useful approach for improving the accuracy of AGB estimation based on remote sensing data, and the estimation of AGB was a reference basis for monitoring the forest ecosystem of the study area.


2019 ◽  
Vol 20 (S2) ◽  
Author(s):  
Varun Khanna ◽  
Lei Li ◽  
Johnson Fung ◽  
Shoba Ranganathan ◽  
Nikolai Petrovsky

Abstract Background Toll-like receptor 9 is a key innate immune receptor involved in detecting infectious diseases and cancer. TLR9 activates the innate immune system following the recognition of single-stranded DNA oligonucleotides (ODN) containing unmethylated cytosine-guanine (CpG) motifs. Due to the considerable number of rotatable bonds in ODNs, high-throughput in silico screening for potential TLR9 activity via traditional structure-based virtual screening approaches of CpG ODNs is challenging. In the current study, we present a machine learning based method for predicting novel mouse TLR9 (mTLR9) agonists based on features including count and position of motifs, the distance between the motifs and graphically derived features such as the radius of gyration and moment of Inertia. We employed an in-house experimentally validated dataset of 396 single-stranded synthetic ODNs, to compare the results of five machine learning algorithms. Since the dataset was highly imbalanced, we used an ensemble learning approach based on repeated random down-sampling. Results Using in-house experimental TLR9 activity data we found that random forest algorithm outperformed other algorithms for our dataset for TLR9 activity prediction. Therefore, we developed a cross-validated ensemble classifier of 20 random forest models. The average Matthews correlation coefficient and balanced accuracy of our ensemble classifier in test samples was 0.61 and 80.0%, respectively, with the maximum balanced accuracy and Matthews correlation coefficient of 87.0% and 0.75, respectively. We confirmed common sequence motifs including ‘CC’, ‘GG’,‘AG’, ‘CCCG’ and ‘CGGC’ were overrepresented in mTLR9 agonists. Predictions on 6000 randomly generated ODNs were ranked and the top 100 ODNs were synthesized and experimentally tested for activity in a mTLR9 reporter cell assay, with 91 of the 100 selected ODNs showing high activity, confirming the accuracy of the model in predicting mTLR9 activity. Conclusion We combined repeated random down-sampling with random forest to overcome the class imbalance problem and achieved promising results. Overall, we showed that the random forest algorithm outperformed other machine learning algorithms including support vector machines, shrinkage discriminant analysis, gradient boosting machine and neural networks. Due to its predictive performance and simplicity, the random forest technique is a useful method for prediction of mTLR9 ODN agonists.


2022 ◽  
pp. 1-20
Author(s):  
Salim Moudache ◽  
◽  
Mourad Badri

This work aims to investigate the potential, from different perspectives, of a risk model to support Cross-Version Fault and Severity Prediction (CVFSP) in object-oriented software. The risk of a class is addressed from the perspective of two particular factors: the number of faults it can contain and their severity. We used various object-oriented metrics to capture the two risk factors. The risk of a class is modeled using the concept of Euclidean distance. We used a dataset collected from five successive versions of an open-source Java software system (ANT). We investigated different variants of the considered risk model, based on various combinations of object-oriented metrics pairs. We used different machine learning algorithms for building the prediction models: Naive Bayes (NB), J48, Random Forest (RF), Support Vector Machines (SVM) and Multilayer Perceptron (ANN). We investigated the effectiveness of the prediction models for Cross-Version Fault and Severity Prediction (CVFSP), using data of prior versions of the considered system. We also investigated if the considered risk model can give as output the Empirical Risk (ER) of a class, a continuous value considering both the number of faults and their different levels of severity. We used different techniques for building the prediction models: Linear Regression (LR), Gaussian Process (GP), Random forest (RF) and M5P (two decision trees algorithms), SmoReg and Artificial Neural Network (ANN). The considered risk model achieves acceptable results for both cross-version binary fault prediction (a g-mean of 0.714, an AUC of 0.725) and cross-version multi-classification of levels of severity (a g-mean of 0.758, an AUC of 0.771). The model also achieves good results in the estimation of the empirical risk of a class by considering both the number of faults and their levels of severity (intra-version analysis with a correlation coefficient of 0.659, cross-version analysis with a correlation coefficient of 0.486).


2021 ◽  
Vol 11 (21) ◽  
pp. 10062
Author(s):  
Aimin Li ◽  
Meng Fan ◽  
Guangduo Qin ◽  
Youcheng Xu ◽  
Hailong Wang

Monitoring open water bodies accurately is important for assessing the role of ecosystem services in the context of human survival and climate change. There are many methods available for water body extraction based on remote sensing images, such as the normalized difference water index (NDWI), modified NDWI (MNDWI), and machine learning algorithms. Based on Landsat-8 remote sensing images, this study focuses on the effects of six machine learning algorithms and three threshold methods used to extract water bodies, evaluates the transfer performance of models applied to remote sensing images in different periods, and compares the differences among these models. The results are as follows. (1) Various algorithms require different numbers of samples to reach their optimal consequence. The logistic regression algorithm requires a minimum of 110 samples. As the number of samples increases, the order of the optimal model is support vector machine, neural network, random forest, decision tree, and XGBoost. (2) The accuracy evaluation performance of each machine learning on the test set cannot represent the local area performance. (3) When these models are directly applied to remote sensing images in different periods, the AUC indicators of each machine learning algorithm for three regions all show a significant decline, with a decrease range of 0.33–66.52%, and the differences among the different algorithm performances in the three areas are obvious. Generally, the decision tree algorithm has good transfer performance among the machine learning algorithms with area under curve (AUC) indexes of 0.790, 0.518, and 0.697 in the three areas, respectively, and the average value is 0.668. The Otsu threshold algorithm is the optimal among threshold methods, with AUC indexes of 0.970, 0.617, and 0.908 in the three regions respectively and an average AUC of 0.832.


2021 ◽  
Vol 13 (24) ◽  
pp. 5166
Author(s):  
Jianjun Wang ◽  
Qi Zhou ◽  
Jiali Shang ◽  
Chang Liu ◽  
Tingxuan Zhuang ◽  
...  

In recent years, the delay in sowing has become a major obstacle to high wheat yield in Jiangsu Province, one of the major wheat producing areas in China; hence, it is necessary to screen wheat varieties are resilient for late sowing. This study aimed to provide an effective, fast, and non-destructive monitoring method of soil plant analysis development (SPAD) values, which can represent leaf chlorophyll contents, for late-sown winter wheat variety screening. This study acquired multispectral images using an unmanned aerial vehicle (UAV) at the overwintering stage of winter wheat growth, and further processed these images to extract reflectance of five single spectral bands and calculated 26 spectral vegetation indices. Based on these 31 variables, this study combined three variable selection methods (i.e., recursive feature elimination (RFE), random forest (RF), and Pearson correlation coefficient (r)) with four machine learning algorithms (i.e., random forest regression (RFR), linear kernel-based support vector regression (SVR), radial basis function (RBF) kernel-based SVR, and sigmoid kernel-based SVR), resulted in seven SVR models (i.e., RFE-SVR_linear, RF-SVR_linear, RF-SVR_RBF, RF-SVR_sigmoid, r-SVR_linear, r-SVR_RBF, and r-SVR_sigmoid) and three RFR models (i.e., RFE-RFR, RF-RFR, and r-RFR). The performances of the 10 machine learning models were evaluated and compared with each other according to the achieved coefficient of determination (R2), residual prediction deviation (RPD), root mean square error (RMSE), and relative RMSE (RRMSE) in SPAD estimation. Of the 10 models, the best one was the RF-SVR_sigmoid model, which was the combination of the RF variable selection method and the sigmoid kernel-based SVR algorithm. It achieved high accuracy in estimating SPAD values of the wheat canopy (R2 = 0.754, RPD = 2.017, RMSE = 1.716 and RRMSE = 4.504%). The newly developed UAV- and machine learning-based model provided a promising and real time method to monitor chlorophyll contents at the overwintering stage, which can benefit late-sown winter wheat variety screening.


2018 ◽  
Vol 10 (10) ◽  
pp. 1522 ◽  
Author(s):  
Gina Leonita ◽  
Monika Kuffer ◽  
Richard Sliuzas ◽  
Claudio Persello

The survey-based slum mapping (SBSM) program conducted by the Indonesian government to reach the national target of “cities without slums” by 2019 shows mapping inconsistencies due to several reasons, e.g., the dependency on the surveyor’s experiences and the complexity of the slum indicators set. By relying on such inconsistent maps, it will be difficult to monitor the national slum upgrading program’s progress. Remote sensing imagery combined with machine learning algorithms could support the reduction of these inconsistencies. This study evaluates the performance of two machine learning algorithms, i.e., support vector machine (SVM) and random forest (RF), for slum mapping in support of the slum mapping campaign in Bandung, Indonesia. Recognizing the complexity in differentiating slum and formal areas in Indonesia, the study used a combination of spectral, contextual, and morphological features. In addition, sequential feature selection (SFS) combined with the Hilbert–Schmidt independence criterion (HSIC) was used to select significant features for classifying slums. Overall, the highest accuracy (88.5%) was achieved by the SVM with SFS using contextual, morphological, and spectral features, which is higher than the estimated accuracy of the SBSM. To evaluate the potential of machine learning-based slum mapping (MLBSM) in support of slum upgrading programs, interviews were conducted with several local and national stakeholders. Results show that local acceptance for a remote sensing-based slum mapping approach varies among stakeholder groups. Therefore, a locally adapted framework is required to combine ground surveys with robust and consistent machine learning methods, for being able to deal with big data, and to allow the rapid extraction of consistent information on the dynamics of slums at a large scale.


2021 ◽  
Author(s):  
Ahmed A Al-Jaishi ◽  
Monica Taljaard ◽  
Melissa D Al-Jaishi ◽  
Sheikh S Abdullah ◽  
Lehana Thabane ◽  
...  

Abstract Background: Cluster randomized trials (CRTs) are becoming an increasingly important design. However, authors do not always adhere to requirements to explicitly identify the design as cluster randomized in titles and abstracts, making retrieval from bibliographic databases difficult. Machine learning algorithms may improve their identification and retrieval. Therefore, we aimed to develop machine learning algorithms that accurately determine whether a bibliographic citation is a CRT report. Methods: We trained, internally validated, and externally validated two convolutional neural networks and one support vector machines (SVM) algorithms to predict whether a citation is a CRT report or not. We exclusively used the information in an article citation, including the title, abstract, keywords, and subject headings. The algorithms' output was a probability from 0 to 1. We assessed algorithm performance using the area under the receiver operating characteristic (AUC) curves. Each algorithm's performance was evaluated individually and together as an ensemble. We randomly selected 5000 from 87,633 citations to train and internally validate our algorithms. Of the 5000 selected citations, 589 (12%) were confirmed CRT reports. We then externally validated our algorithms on an independent set of 1916 randomized trial citations, with 665 (35%) confirmed CRT reports. Results: In internal validation, the ensemble algorithm discriminated best for identifying CRT reports with an AUC of 98.6% (95% confidence interval: 97.8%, 99.4%), sensitivity of 97.7% (94.3%, 100%), and specificity of 85.0% (81.8%, 88.1%). In external validation, the ensemble algorithm had an AUC of 97.8 % (97.0%, 98.5%), sensitivity of 97.6% (96.4%, 98.6%), and specificity of 78.2% (75.9%, 80.4%)). All three individual algorithms performed well, but less so than the ensemble. Conclusions: We successfully developed high-performance algorithms that identified whether a citation was a CRT report with high sensitivity and moderately high specificity. We provide open-source software to facilitate the use of our algorithms in practice.


2019 ◽  
Vol 11 (22) ◽  
pp. 2716 ◽  
Author(s):  
Xianju Li ◽  
Zhuang Tang ◽  
Weitao Chen ◽  
Lizhe Wang

Land cover classification (LCC) of complex landscapes is attractive to the remote sensing community but poses great challenges. In complex open pit mining and agricultural development landscapes (CMALs), the landscape-specific characteristics limit the accuracy of LCC. The combination of traditional feature engineering and machine learning algorithms (MLAs) is not sufficient for LCC in CMALs. Deep belief network (DBN) methods achieved success in some remote sensing applications because of their excellent unsupervised learning ability in feature extraction. The usability of DBN has not been investigated in terms of LCC of complex landscapes and integrating multimodal inputs. A novel multimodal and multi-model deep fusion strategy based on DBN was developed and tested for fine LCC (FLCC) of CMALs in a 109.4 km 2 area of Wuhan City, China. First, low-level and multimodal spectral–spatial and topographic features derived from ZiYuan-3 imagery were extracted and fused. The features were then input into a DBN for deep feature learning. The developed features were fed to random forest and support vector machine (SVM) algorithms for classification. Experiments were conducted that compared the deep features with the softmax function and low-level features with MLAs. Five groups of training, validation, and test sets were performed with some spatial auto-correlations. A spatially independent test set and generalized McNemar tests were also employed to assess the accuracy. The fused model of DBN-SVM achieved overall accuracies (OAs) of 94.74%± 0.35% and 81.14% in FLCC and LCC, respectively, which significantly outperformed almost all other models. From this model, only three of the twenty land covers achieved OAs below 90%. In general, the developed model can contribute to FLCC and LCC in CMALs, and more deep learning algorithm-based models should be investigated in future for the application of FLCC and LCC in complex landscapes.


2018 ◽  
Vol 2 (1) ◽  
pp. 31-46
Author(s):  
Soumya K. Das ◽  
Prakash P. S. ◽  
Bharath Aithal

Building extraction has been a challenging task due to complex structures and features of various land use with matching spectral and spatial attributes in a satellite data. We attempted to extract building as features using machine-learning algorithms such as Support Vector Machine (SVM), Random Forests (RF), Artificial Neural Network (ANN) and Improved Ensemble Technique as Gradient Boosting. The techniques used increases their classification accuracies using spectral properties as well as indices such as Normalized Difference Vegetation Index (NDVI) as attributes. Extracted results through various methods, performance of three different machine learning such as Ensemble method, RF and SVM are applied and results are analyzed for their behavior in different building distribution. Different algorithms showed variations in accuracies and performance in different built-up conditions. Ensemble algorithm performed very well in all conditions followed by RF and SVM performed better in coarse resolution, while ANN performed better in high resolution and overall accuracies of all algorithms increased with better spatial resolution. Ensemble algorithm showed relatively efficient performance in regions with extensive heterogeneous features. These analyses can helpful to provide quantitative data for various stocktaking analysis and city managers for better administration capabilities.


Sign in / Sign up

Export Citation Format

Share Document