choose parameter
Recently Published Documents


Author(s): John Waller

Geographic outliers at GBIF (Global Biodiversity Information Facility) are a known problem. Outliers can be errors, coordinates with high uncertainty, or simply occurrences from an undersampled region. In data cleaning pipelines, outliers are often removed (even if they are legitimate points) because the researcher does not have time to verify each record one by one. Outlier points are usually occurrences that need attention. Currently, there is no outlier detection implemented at GBIF, and it is up to users to flag outliers themselves. DBSCAN (a density-based algorithm for discovering clusters in large spatial databases with noise) is a simple and popular clustering algorithm. It uses two parameters, (1) distance and (2) a minimum number of points per cluster, to decide if something is an outlier. Since occurrence data can be very patchy, non-clustering distance-based methods will often fail (Fig. 1). DBSCAN does not need to know the expected number of clusters in advance. DBSCAN does well using only distance and does not require additional environmental variables like Bioclim. Advantages of DBSCAN: simple; easy to understand; only two parameters to set; scales well; no additional data sources needed; users would understand how their data was changed. Drawbacks: only uses distance; must choose parameter settings; sensitive to sparse global sampling; does not include any other relevant environmental information; can only flag outliers outside of a point blob. Outlier detection and error detection are different. If your goal is to produce a system with no false positives, it will fail.
While more complex environmentally informed outlier detection methods (like reverse jackknifing (Chapman 2005)) might perform better for certain examples or even in general, DBSCAN performs adequately on almost everything despite being very simple. Currently I am using DBSCAN to find errors and assess dataset quality. It is a Spark job written in Scala (github). It does not run on species with many (>30K) unique latitude-longitude points, since the current implementation relies on an in-memory distance matrix. However, around 99% of species (plants, animals, fungi) on GBIF have fewer than 30K unique lat-long points (2,283 of 222,993 species keys exceed this limit). There are other implementations (example) that might scale to many more points. There are no immediate plans to include DBSCAN outliers as a data quality flag on GBIF, but it could be done somewhat easily, since this type of method does not rely on any external environmental data sources and already runs on the GBIF cluster.
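
The two-parameter idea described above can be illustrated with a minimal sketch. This is not the author's Scala/Spark job; it is a small pure-Python DBSCAN written for illustration, assuming great-circle (haversine) distance in kilometres. The function names, the sample coordinates, and the parameter values (eps_km, min_pts) are all hypothetical. Like the implementation described in the abstract, it builds a full in-memory distance matrix, which is why this style of approach stops being practical for very large point sets.

```python
import math

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def dbscan_outliers(points, eps_km, min_pts):
    """Return a list of booleans, True where a point is DBSCAN noise."""
    n = len(points)
    # Full pairwise distance matrix: O(n^2) memory, fine only for small n.
    dist = [[haversine_km(points[i], points[j]) for j in range(n)]
            for i in range(n)]
    # A point's neighbourhood includes the point itself.
    neighbors = [[j for j in range(n) if dist[i][j] <= eps_km]
                 for i in range(n)]
    core = [len(nb) >= min_pts for nb in neighbors]

    labels = [None] * n  # None means "not assigned to any cluster"
    cluster = 0
    for i in range(n):
        if labels[i] is not None or not core[i]:
            continue
        # Grow a new cluster outward from this unvisited core point.
        labels[i] = cluster
        stack = [i]
        while stack:
            j = stack.pop()
            for k in neighbors[j]:
                if labels[k] is None:
                    labels[k] = cluster
                    if core[k]:  # only core points keep expanding the cluster
                        stack.append(k)
        cluster += 1
    # Points never reached from a core point are noise, i.e. outliers.
    return [label is None for label in labels]

# Hypothetical occurrence records: four nearby points and one far away.
points = [(55.68, 12.57), (55.70, 12.55), (55.66, 12.60),
          (55.72, 12.50), (0.0, 0.0)]
flags = dbscan_outliers(points, eps_km=100.0, min_pts=3)
print(flags)
```

With these illustrative settings the four nearby points form one cluster and the isolated point is flagged. Changing eps_km or min_pts changes what counts as an outlier, which is the "must choose parameter settings" drawback listed above.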


2018, Vol 18 (1), pp. 56-63
Author(s): Dinita Rahmalia

Pests in agriculture can cause plant disease and crop failure. The pest problem can be addressed with pesticide, but pesticide must be used proportionally, so manufacturers should fix a standard amount of active ingredient in pesticide production. A forecast is a prediction of some future events. In a forecasting problem, there are parameters that must be determined, and they can be estimated by an exact method or a heuristic method. Ant Colony Optimization (ACO) is inspired by the cooperative behavior of ant colonies, which can find the shortest path from their nest to a food source. In this research, we use a heuristic method, ACO, to estimate the exponential smoothing parameter for forecasts of pesticide active ingredient and pesticide sample weight. In the simulation, all ants choose the parameter randomly in the first iteration. During optimization, we update the pheromone until all ants choose similar parameters, so that the process converges and the variance approaches zero. The optimal exponential smoothing parameter can then be applied in forecasting with minimum sum of squared errors (SSE).
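
The procedure described above can be sketched roughly as follows. This is a simplified illustration, not the paper's algorithm: it assumes simple exponential smoothing initialised at the first observation, a discretised set of candidate values for the smoothing parameter, a pheromone deposit of 1 / (1 + SSE), and a fixed number of iterations instead of a variance-based stopping rule. The function names, the sample series, and every default value are hypothetical.

```python
import random

def ses_sse(series, alpha):
    """Sum of squared one-step-ahead errors for simple exponential smoothing."""
    forecast = series[0]  # assumption: first forecast equals first observation
    sse = 0.0
    for y in series[1:]:
        sse += (y - forecast) ** 2
        forecast = alpha * y + (1 - alpha) * forecast
    return sse

def aco_alpha(series, n_ants=20, n_iter=50, evaporation=0.1, seed=0):
    """Search for a low-SSE smoothing parameter with a pheromone-guided search."""
    rng = random.Random(seed)
    candidates = [i / 100 for i in range(1, 100)]  # alpha in 0.01 .. 0.99
    pheromone = [1.0] * len(candidates)  # uniform start: first picks are random
    best_alpha, best_sse = None, float("inf")
    for _ in range(n_iter):
        # Each ant picks a candidate with probability proportional to pheromone.
        picks = rng.choices(range(len(candidates)), weights=pheromone, k=n_ants)
        # Evaporate, then reinforce the candidates that were visited.
        pheromone = [(1 - evaporation) * p for p in pheromone]
        for idx in picks:
            sse = ses_sse(series, candidates[idx])
            pheromone[idx] += 1.0 / (1.0 + sse)  # lower SSE, larger deposit
            if sse < best_sse:
                best_alpha, best_sse = candidates[idx], sse
    return best_alpha, best_sse

# Hypothetical measurements, for illustration only.
series = [10.0, 10.4, 10.1, 10.8, 11.0, 10.9, 11.3, 11.6]
best_alpha, best_sse = aco_alpha(series)
print(best_alpha, best_sse)
```

Because the candidate set is small, an exhaustive scan over all candidate values would find the minimum directly; the pheromone loop is shown only to mirror the random-start and reinforcement steps the abstract describes.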


2015, Vol 8 (4), pp. 1071-1083
Author(s): I. Bilionis, B. A. Drewniak, E. M. Constantinescu

Abstract. Farming is using more of the land surface, as population increases and agriculture is increasingly applied for non-nutritional purposes such as biofuel production. This agricultural expansion exerts an increasing impact on the terrestrial carbon cycle. In order to understand the impact of such processes, the Community Land Model (CLM) has been augmented with a CLM-Crop extension that simulates the development of three crop types: maize, soybean, and spring wheat. The CLM-Crop model is a complex system that relies on a suite of parametric inputs that govern plant growth under a given atmospheric forcing and available resources. CLM-Crop development used measurements of gross primary productivity (GPP) and net ecosystem exchange (NEE) from AmeriFlux sites to choose parameter values that optimize crop productivity in the model. In this paper, we calibrate these parameters for one crop type, soybean, in order to provide a faithful projection in terms of both plant development and net carbon exchange. Calibration is performed in a Bayesian framework by developing a scalable and adaptive scheme based on sequential Monte Carlo (SMC). The model showed significant improvement of crop productivity with the new calibrated parameters. We demonstrate that the calibrated parameters are applicable across alternative years and different sites.


2014, Vol 7 (5), pp. 6733-6771
Author(s): I. Bilionis, B. A. Drewniak, E. M. Constantinescu

Abstract. Farming is using more terrestrial ground, as population increases and agriculture is increasingly used for non-nutritional purposes such as biofuel production. This agricultural expansion exerts an increasing impact on the terrestrial carbon cycle. In order to understand the impact of such processes, the Community Land Model (CLM) has been augmented with a CLM-Crop extension that simulates the development of three crop types: maize, soybean, and spring wheat. The CLM-Crop model is a complex system that relies on a suite of parametric inputs that govern plant growth under a given atmospheric forcing and available resources. CLM-Crop development used measurements of gross primary productivity and net ecosystem exchange from AmeriFlux sites to choose parameter values that optimize crop productivity in the model. In this paper we calibrate these parameters for one crop type, soybean, in order to provide a faithful projection in terms of both plant development and net carbon exchange. Calibration is performed in a Bayesian framework by developing a scalable and adaptive scheme based on sequential Monte Carlo (SMC).


2013, Vol 6 (1), pp. 379-398
Author(s): X. Zeng, B. A. Drewniak, E. M. Constantinescu

Abstract. Farming is using more terrestrial ground with increases in population and the expanding use of agriculture for non-nutritional purposes such as biofuel production. This agricultural expansion exerts an increasing impact on the terrestrial carbon cycle. In order to understand the impact of such processes, the Community Land Model (CLM) has been augmented with a CLM-Crop extension that simulates the development of three crop types: maize, soybean, and spring wheat. The CLM-Crop model is a complex system that relies on a suite of parametric inputs that govern plant growth under a given atmospheric forcing and available resources. CLM-Crop development used measurements of gross primary productivity and net ecosystem exchange from AmeriFlux sites to choose parameter values that optimize crop productivity in the model. In this paper we calibrate these values in order to provide a faithful projection in terms of both plant development and net carbon exchange, using a Markov chain Monte Carlo technique.

