Maximizing Correctness with Minimal User Effort to Learn Data Transformations

Author(s):  
Bo Wu ◽  
Craig A. Knoblock
2021 ◽  
Vol 13 (3) ◽  
pp. 408
Author(s):  
Charles Nickmilder ◽  
Anthony Tedde ◽  
Isabelle Dufrasne ◽  
Françoise Lessire ◽  
Bernard Tychon ◽  
...  

Accurate information about the available standing biomass on pastures is critical for the adequate management of grazing and its promotion to farmers. In this paper, machine learning models are developed to predict available biomass expressed as compressed sward height (CSH) from readily accessible meteorological, optical (Sentinel-2) and radar satellite data (Sentinel-1). This study assumed that combining heterogeneous data sources, data transformations and machine learning methods would improve the robustness and the accuracy of the developed models. A total of 72,795 records of CSH with a spatial positioning, collected in 2018 and 2019, were used and aggregated according to a pixel-like pattern. The resulting dataset was split into a training one with 11,625 pixellated records and an independent validation one with 4952 pixellated records. The models were trained with a 19-fold cross-validation. A wide range of performances was observed (with mean root mean square error (RMSE) of cross-validation ranging from 22.84 mm of CSH to infinite-like values), and the four best-performing models were a cubist, a glmnet, a neural network and a random forest. These models had an RMSE of independent validation lower than 20 mm of CSH at the pixel-level. To simulate the behavior of the model in a decision support system, performances at the paddock level were also studied. These were computed according to two scenarios: either the predictions were made at a sub-parcel level and then aggregated, or the data were aggregated at the parcel level and the predictions were made for these aggregated data. The results obtained in this study were more accurate than those found in the literature concerning pasture budgeting and grassland biomass evaluation. The training of the 124 models resulting from the described framework was part of the realization of a decision support system to help farmers in their daily decision making.
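The two paddock-level scenarios described in the abstract, and the RMSE used to score them, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function names, the `(paddock_id, features)` record layout, and the toy squared-feature model are hypothetical and are not taken from the paper, whose models (cubist, glmnet, neural network, random forest) are far richer.

```python
import math
from collections import defaultdict


def rmse(observed, predicted):
    """Root mean square error between two equal-length, non-empty sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mean(values):
    """Arithmetic mean of a non-empty iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def predict_then_aggregate(pixels, model):
    """Scenario 1: predict for each pixel, then average predictions per paddock."""
    by_paddock = defaultdict(list)
    for paddock_id, features in pixels:
        by_paddock[paddock_id].append(model(features))
    return {pid: mean(preds) for pid, preds in by_paddock.items()}


def aggregate_then_predict(pixels, model):
    """Scenario 2: average the features per paddock, then predict once."""
    by_paddock = defaultdict(list)
    for paddock_id, features in pixels:
        by_paddock[paddock_id].append(features)
    return {
        pid: model([mean(column) for column in zip(*rows)])
        for pid, rows in by_paddock.items()
    }


# Hypothetical stand-in for a trained model: any nonlinear function of the
# features makes the two scenarios give different paddock-level values.
def toy_model(features):
    return features[0] ** 2


# Hypothetical pixel records: (paddock_id, feature_vector)
pixels = [("A", [1.0]), ("A", [3.0])]
scenario_1 = predict_then_aggregate(pixels, toy_model)
scenario_2 = aggregate_then_predict(pixels, toy_model)
```

With a nonlinear model the two scenarios generally disagree (here the mean of the per-pixel predictions differs from the prediction on the mean feature), which is one reason to evaluate both, as the abstract reports doing.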


Author(s):  
Jonathan D Ericson ◽  
Elizabeth R Chrastil ◽  
William H Warren

Space syntax is an influential framework for quantifying the relationship between environmental geometry and human behavior. Although many studies report high syntactic–behavioral correlations, previous pedestrian data were collected at low spatiotemporal resolutions, and data transformations and sampling strategies vary widely; here, we systematically test the robustness of space syntax’s predictive strength by examining how these factors impact correlations. We used virtual reality and motion tracking to correlate 30 syntactic measures with high resolution walking trajectories downsampled at 10 grid resolutions and subjected to various log transformations. Overall, correlations declined with increasing grid resolution and were sensitive to data transformations. Moreover, simulations revealed spuriously high correlations (e.g., R² = 1) with sparsely sampled data (<23 locations). These results strongly suggest that syntactic–behavioral correlations are not robust to changes in spatiotemporal resolution, and that high correlations obtained in previous studies could be inflated due to transformations, data resolution, or sampling strategies.
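Two effects mentioned in the abstract, sensitivity to log transformations and inflated correlations under sparse sampling, can be shown in a minimal Python sketch. The data below are invented for illustration and the `r_squared` helper is a hypothetical name; this is not the authors' simulation. The two-point case is only the limiting example of sparse sampling, since any two distinct points lie exactly on a line.

```python
import math


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov ** 2 / (var_x * var_y)


# Sparse sampling: with only two locations, the fit is perfect by construction,
# regardless of whether the variables are actually related.
sparse = r_squared([1.0, 2.0], [5.0, 3.0])

# Transformations: invented data following y = x**2. The raw correlation is
# below 1, while the log-log correlation is exactly linear.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [x ** 2 for x in xs]
raw = r_squared(xs, ys)
logged = r_squared([math.log(x) for x in xs], [math.log(y) for y in ys])
```

Comparing `raw` with `logged` shows how the choice of transformation alone can change the reported correlation, and `sparse` shows why very small samples should not be read as evidence of a strong relationship.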


2012 ◽  
Vol 37 (1) ◽  
pp. 1-12 ◽  
Author(s):  
Wenfei Fan ◽  
Floris Geerts ◽  
Lixiao Zheng

Author(s):  
E. Onur Turgay ◽  
Thomas B. Pedersen ◽  
Yücel Saygın ◽  
Erkay Savaş ◽  
Albert Levi