Research of Text Categorization Model based on Random Forests

Author(s): Dashen Xue, Fengxin Li

2019, Vol 11 (6), pp. 670
Author(s): Sarah Banks, Lori White, Amir Behnamian, Zhaohua Chen, Benoit Montpetit, ...

To better understand and mitigate threats to the long-term health and functioning of wetlands, there is a need to establish comprehensive inventorying and monitoring programs. Here, remote sensing data and machine learning techniques that could support or substitute for traditional field-based data collection are evaluated. For the Bay of Quinte on Lake Ontario, Canada, different combinations of multi-angle/temporal quad pol RADARSAT-2, simulated compact pol RADARSAT Constellation Mission (RCM), and high and low spatial resolution Digital Elevation and Surface Models (DEM and DSM, respectively) were used to classify six land cover classes with Random Forests: shallow water, marsh, swamp, water, forest, and agriculture/non-forested. Results demonstrate that high accuracies can be achieved with multi-temporal SAR data alone (e.g., user’s and producer’s accuracies ≥90% for a model based on a spring image and a summer image), or via fusion of SAR and DEM and DSM data for single dates/incidence angles (e.g., user’s and producer’s accuracies ≥90% for a model based on a spring image, DEM, and DSM data). For all models based on single SAR images, simulated compact pol data generally achieved lower accuracies than quad pol RADARSAT-2 data. However, it was possible to compensate for observed differences through either multi-temporal/angle data fusion or the inclusion of DEM and DSM data (i.e., as a result, there was not a statistically significant difference between multiple models). With a higher repeat-pass cycle than RADARSAT-2, RCM is expected to be a reliable source of C-band SAR data that will contribute positively to ongoing efforts to inventory wetlands and monitor change in areas containing the same land cover classes evaluated here.
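
As an illustrative aside, the user's and producer's accuracies quoted in this abstract are per-class measures that can be read off a confusion matrix. The sketch below is a minimal Python example and not the authors' code: the function name, the convention that rows are reference classes and columns are mapped classes, and all of the counts are assumptions made up for illustration.

```python
def users_and_producers_accuracy(confusion, labels):
    """Return {label: (users_accuracy, producers_accuracy)}.

    Assumed convention: confusion[i][j] is the number of samples whose
    reference class is labels[i] and whose mapped class is labels[j].
    """
    n = len(labels)
    result = {}
    for k, label in enumerate(labels):
        correct = confusion[k][k]
        # Row sum: all reference samples of this class.
        reference_total = sum(confusion[k])
        # Column sum: all samples mapped to this class.
        mapped_total = sum(confusion[i][k] for i in range(n))
        producers = correct / reference_total if reference_total else float("nan")
        users = correct / mapped_total if mapped_total else float("nan")
        result[label] = (users, producers)
    return result


# Hypothetical counts for three of the six classes (illustration only).
labels = ["marsh", "swamp", "water"]
confusion = [
    [45, 5, 0],   # reference marsh
    [3, 27, 0],   # reference swamp
    [2, 0, 18],   # reference water
]
acc = users_and_producers_accuracy(confusion, labels)
for label, (users, producers) in acc.items():
    print(f"{label}: user's={users:.3f}, producer's={producers:.3f}")
```

Producer's accuracy is the share of reference samples of a class that were mapped correctly, while user's accuracy is the share of samples mapped to a class that truly belong to it, so a model can score well on one and poorly on the other.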


2019, Vol 11 (16), pp. 1944
Author(s): Jessica Esteban, Ronald McRoberts, Alfredo Fernández-Landa, José Tomé, Erik Næsset

Despite the popularity of random forests (RF) as a prediction algorithm, methods for constructing confidence intervals for population means using this technique are still only sparsely reported. For two regional study areas (Spain and Norway) RF was used to predict forest volume or aboveground biomass using remotely sensed auxiliary data obtained from multiple sensors. Additionally, the changes per unit area of these forest attributes were estimated using indirect and direct methods. Multiple inferential frameworks have attracted increased recent attention for estimating the variances required for confidence intervals. For this study, three different statistical frameworks, design-based expansion, model-assisted and model-based estimators, were used for estimating population parameters and their variances. Pairs and wild bootstrapping approaches at different levels were compared for estimating the variances of the model-based estimates of the population means, as well as for mapping the uncertainty of the change predictions. The RF models accurately represented the relationship between the response and remotely sensed predictor variables, resulting in increased precision for estimates of the population means relative to design-based expansion estimates. Standard errors based on pairs bootstrapping within or internal to RF were considerably larger than standard errors based on both pairs and wild external bootstrapping of the entire RF algorithm. Pairs and wild external bootstrapping produced similar standard errors, but wild bootstrapping better mimicked the original structure of the sample data and better preserved the ranges of the predictor variables.
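
To make the distinction between the two external bootstrap schemes concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes a simple least-squares line as a stand-in for the random forest, Rademacher (±1) multipliers for the wild bootstrap, and synthetic data. All function and variable names are hypothetical.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares line, used here as a stand-in for the RF model."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # degenerate resample: fall back to a constant model
        return my, 0.0
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def predicted_mean(model, population_xs):
    """Model-based estimate of the population mean."""
    intercept, slope = model
    return statistics.fmean([intercept + slope * x for x in population_xs])


def pairs_bootstrap_se(xs, ys, population_xs, reps, rng):
    """Resample (x, y) pairs with replacement, then refit the whole model."""
    n = len(xs)
    estimates = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]
        model = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        estimates.append(predicted_mean(model, population_xs))
    return statistics.stdev(estimates)


def wild_bootstrap_se(xs, ys, population_xs, reps, rng):
    """Keep every x fixed; perturb y by sign-flipped residuals, then refit."""
    intercept, slope = fit_line(xs, ys)
    fitted = [intercept + slope * x for x in xs]
    residuals = [y - f for y, f in zip(ys, fitted)]
    estimates = []
    for _ in range(reps):
        ys_star = [f + r * rng.choice((-1.0, 1.0))
                   for f, r in zip(fitted, residuals)]
        estimates.append(predicted_mean(fit_line(xs, ys_star), population_xs))
    return statistics.stdev(estimates)


# Synthetic sample and population (illustration only).
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(40)]
ys = [2.0 + 0.5 * x + rng.gauss(0, 1) for x in xs]
population_xs = [rng.uniform(0, 10) for _ in range(200)]

se_pairs = pairs_bootstrap_se(xs, ys, population_xs, 500, rng)
se_wild = wild_bootstrap_se(xs, ys, population_xs, 500, rng)
print(f"pairs SE: {se_pairs:.4f}, wild SE: {se_wild:.4f}")
```

Because the wild scheme never resamples the predictors, every replicate keeps the original sample's predictor values and ranges, which is the property the abstract attributes to wild bootstrapping.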


Author(s): Kwan Yi, Jamshid Beheshti

The hidden Markov model (HMM) has been successfully used for speech recognition, part-of-speech tagging, and pattern recognition. In this study, we apply the HMM to automatically categorize digital documents into a standard library classification scheme. In the proposed framework, an HMM-based system is viewed as a model to generate a list of words and each document is seen as. . .

