Is there a 'right' spatial scale? Improving pedological multi-scale modelling by optimizing input data grain size: A case study using average local variance.

Author(s):  
Christopher Scarpone ◽  
Anders Knudby ◽  
Stephanie Melles ◽  
Andrew Millward

Current soil mapping practitioners are faced with a plethora of choices of digital data for input to their modelling approaches; these data have local to global extents and are highly variable in their grain size. Deciding at what scale to represent individual covariates for a specific project can therefore be difficult and confusing. Moreover, a lack of accessible methodology and tools for determining optimal input data scales (grain sizes) has led to the current status quo, which is to use data at the scale delivered by the data provider. Soil prediction models are typically applied using the grain size of the coarsest variable, scaling other data to match. In this study, average local variance was investigated as a method to determine optimal grain size(s) for input variables to a soil contaminant prediction model. The Meuse dataset was used, and heavy metal soil contamination was mapped using Random Forest. A Data Cube was employed to handle data inputs of varying grain size. Two scenarios were investigated for model prediction accuracy: (1) contaminant predictions made using data with optimized grain size, and (2) contaminant predictions made using input data where grain size was unchanged, "as received" from the data provider. Both model predictions were assessed using a cross-validation approach. Early results indicate that optimization of grain size based on average local variance can improve prediction accuracy, and they point toward the importance of understanding the spatial heterogeneity of an input variable, and how it changes with grain size, prior to its incorporation in a predictive model. This research lays a foundation for an automated approach practitioners can use to untangle the relationship between the intrinsic spatial scale of a process of interest and the scale of the input data that represents it.
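To make the grain-size step concrete, here is a minimal sketch of the average-local-variance (ALV) idea, assuming a 3×3 moving window and block-mean aggregation; the function names and demo raster are illustrative, not the authors' implementation.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def average_local_variance(raster, window=3):
    """Mean of the moving-window (local) variance across a raster."""
    mean = uniform_filter(raster, size=window)
    mean_sq = uniform_filter(raster ** 2, size=window)
    return float(np.nanmean(mean_sq - mean ** 2))

def alv_curve(raster, factors=(1, 2, 4, 8, 16)):
    """ALV at successively coarser grain sizes via block-mean aggregation."""
    curve = {}
    for f in factors:
        h, w = (raster.shape[0] // f) * f, (raster.shape[1] // f) * f
        coarse = raster[:h, :w].reshape(h // f, f, w // f, f).mean(axis=(1, 3))
        curve[f] = average_local_variance(coarse)
    return curve

# The aggregation factor where the ALV curve peaks is a candidate
# "optimal" grain size for that covariate.
rng = np.random.default_rng(0)
print(alv_curve(rng.normal(size=(256, 256))))
```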

Information ◽  
2021 ◽  
Vol 12 (11) ◽  
pp. 442
Author(s):  
Seol-Hyun Noh

A recurrent neural network (RNN) combines variable-length input data with a hidden state that depends on previous time steps to generate output data. RNNs have been widely used in time-series data analysis, and various RNN algorithms have been proposed, such as the standard RNN, long short-term memory (LSTM), and gated recurrent units (GRUs). In particular, it has been experimentally shown that LSTM and GRU achieve higher validation and prediction accuracy than the standard RNN. Learning ability here is a measure of how effectively the gradient of the error information is backpropagated. This study provides a theoretical and experimental basis for the result that LSTM and GRU have more efficient gradient descent than the standard RNN, by analyzing, and experimenting with, gradient vanishing in the standard RNN, LSTM, and GRU. The analysis shows that LSTM and GRU are robust to the degradation of gradient descent even when they learn from long-range input data, which means that their learning ability on long-range input data is greater than that of the standard RNN; this explains why LSTM and GRU achieve higher validation and prediction accuracy. In addition, it was verified that the experimental results of river-level prediction models, solar power generation prediction models, and speech signal models using the standard RNN, LSTM, and GRU are consistent with the gradient-vanishing analysis.
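The vanishing-gradient contrast can be illustrated numerically: in a scalar standard RNN, the backpropagated gradient over T steps is a product of terms w · tanh′(a_t), which shrinks geometrically, whereas an LSTM-style additive cell state decays only through its forget gate. The toy computation below, with hypothetical pre-activations and a fixed forget-gate value, is a sketch of this contrast, not the paper's experiment.

```python
import numpy as np

T, w = 100, 0.9                                   # sequence length, recurrent weight
rng = np.random.default_rng(1)
pre_acts = rng.normal(size=T)                     # hypothetical pre-activations a_t

# Standard RNN: gradient is a product of w * tanh'(a_t) terms.
rnn_grad = np.prod(w * (1.0 - np.tanh(pre_acts) ** 2))

# LSTM-like cell: gradient flows through the additive cell state,
# decaying only by the forget gate (here fixed at 0.99).
lstm_like_grad = 0.99 ** T

print(f"standard RNN gradient after {T} steps: {rnn_grad:.3e}")
print(f"LSTM-like cell gradient after {T} steps: {lstm_like_grad:.3e}")
```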


2021 ◽  
Vol 13 (7) ◽  
pp. 3870
Author(s):  
Mehrbakhsh Nilashi ◽  
Shahla Asadi ◽  
Rabab Ali Abumalloh ◽  
Sarminah Samad ◽  
Fahad Ghabban ◽  
...  

This study aims to develop a new approach based on machine learning techniques to assess sustainability performance. Two main dimensions of sustainability, ecological and human sustainability, were considered. A set of sustainability indicators was used, and the research method was developed using cluster analysis and prediction learning techniques. A Self-Organizing Map (SOM) was applied for data clustering, while Classification and Regression Trees (CART) were applied to assess sustainability performance. The proposed method was evaluated on the Sustainability Assessment by Fuzzy Evaluation (SAFE) dataset, which comprises various indicators of sustainability performance in 128 countries. Eight clusters were found in the data through the SOM clustering technique, and a prediction model was built for each cluster through the CART technique. In addition, an ensemble of CART models was constructed in each SOM cluster to increase prediction accuracy. All prediction models were assessed using the adjusted coefficient of determination. The results demonstrated that prediction accuracy was high in all CART models, and that the method combining ensembles of CART with clustering provides higher prediction accuracy than individual CART models. The main advantage of the proposed method is its ability to automate the extraction of decision rules from big data for prediction models. The method could be implemented as an effective tool for sustainability performance assessment.
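The pipeline can be sketched as follows, assuming the third-party MiniSom package for the SOM and scikit-learn's DecisionTreeRegressor/BaggingRegressor for CART and its ensemble; the synthetic data and the 2×4 map are stand-ins for the SAFE indicators and the paper's configuration.

```python
import numpy as np
from minisom import MiniSom
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import BaggingRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.normal(size=(128, 10))                    # 128 "countries" x 10 indicators
y = X[:, :3].sum(axis=1) + rng.normal(scale=0.1, size=128)

som = MiniSom(2, 4, input_len=10, random_seed=0)  # 2x4 map -> up to 8 clusters
som.train_random(X, 1000)
labels = np.array([som.winner(x)[0] * 4 + som.winner(x)[1] for x in X])

def adjusted_r2(r2, n, p):
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

for c in np.unique(labels):
    Xc, yc = X[labels == c], y[labels == c]
    if len(yc) < 15:                              # skip clusters too small to fit
        continue
    cart = DecisionTreeRegressor(max_depth=4).fit(Xc, yc)
    ens = BaggingRegressor(DecisionTreeRegressor(max_depth=4),
                           n_estimators=50, random_state=0).fit(Xc, yc)
    print(c,
          adjusted_r2(r2_score(yc, cart.predict(Xc)), len(yc), X.shape[1]),
          adjusted_r2(r2_score(yc, ens.predict(Xc)), len(yc), X.shape[1]))
```

In-sample adjusted R² is shown only for brevity; a held-out split would be needed for a real assessment.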


2020 ◽  
Vol 2020 ◽  
pp. 1-12
Author(s):  
Fu-Qing Cui ◽  
Wei Zhang ◽  
Zhi-Yun Liu ◽  
Wei Wang ◽  
Jian-bing Chen ◽  
...  

A comprehensive understanding of the variation law of soil thermal conductivity is a prerequisite for the design and construction of engineering applications in permafrost regions. Compared with unfrozen soil, the specimen preparation and experimental procedures for testing frozen soil thermal conductivity are more complex and challenging. In this work, exploiting the fact that unfrozen soil thermal conductivity reflects the essentially multiphase and porous structural characteristics of a soil, prediction models of frozen soil thermal conductivity using nonlinear regression and Support Vector Regression (SVR) methods have been developed. The thermal conductivity of multiple types of soil samples collected from the Qinghai-Tibet Engineering Corridor (QTEC) was tested by the transient plane source (TPS) method. Correlations of thermal conductivity between unfrozen and frozen soil were analyzed and identified. Based on the measured unfrozen soil thermal conductivity, prediction models of frozen soil thermal conductivity are proposed for 7 typical soils in the QTEC. To further facilitate engineering applications, prediction models for two broader soil categories (coarse- and fine-grained soil) have also been proposed. The results demonstrate that, compared with the nonideal prediction accuracy obtained using only water content and dry density as fitting parameters, the ternary fitting model has higher thermal conductivity prediction accuracy for the 7 types of frozen soils (the relative error is within 20% for more than 98% of the soil specimens). The SVR model further improves prediction accuracy, with the relative error within 15% for more than 98% of the soil specimens. For the coarse- and fine-grained soil categories, both models retain reliable prediction accuracy, with the coefficient of determination (R²) ranging from 0.8 to 0.91, which validates their applicability for small-sample soils. This study provides feasible prediction models for frozen soil thermal conductivity and guidelines for the thermal design and freeze-thaw damage prevention of engineering structures in cold regions.
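As a rough illustration of the SVR approach, the sketch below fits an RBF-kernel SVR to predict frozen-soil thermal conductivity from unfrozen-soil conductivity, water content, and dry density; the feature choices, hyperparameters, and synthetic data are assumptions for the example, not the paper's measurements.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 120
k_unfrozen = rng.uniform(0.5, 2.5, n)     # unfrozen conductivity, W/(m*K)
water = rng.uniform(5, 35, n)             # water content, %
rho_d = rng.uniform(1.3, 2.0, n)          # dry density, g/cm^3
X = np.column_stack([k_unfrozen, water, rho_d])
y = 1.1 * k_unfrozen + 0.01 * water + 0.2 * rho_d + rng.normal(0, 0.05, n)

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.05))
print("cross-validated R^2:", cross_val_score(model, X, y, cv=5, scoring="r2").mean())
```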


2021 ◽  
Author(s):  
Xia Li ◽  
Jiulong Cheng ◽  
Dehao Yu ◽  
Yangchun Han

Most landslide prediction models need to select non-landslide samples. At present, non-landslides are mainly selected by subjective inference or random selection, which makes it easy to mistakenly place non-landslide samples in high-risk areas. To solve this problem and improve the accuracy of landslide prediction, a method of selecting non-landslides by information value (IV) is proposed in this study. Firstly, 230 historical landslides and 10 landslide conditioning factors are extracted and interpreted using remote sensing (RS) imagery, a Geographic Information System (GIS), and field survey. Secondly, the random, buffer, river-channel-or-slope, and IV methods are used to obtain non-landslide samples, which are then applied to the popular SVM model for landslide hazard mapping (LHM) in the western area of Tumen City. The landslide hazard map based on the river-channel-or-slope method is seriously inconsistent with the actual situation of the study area; therefore, the remaining three methods (random, buffer, and IV) are verified and compared by accuracy, the receiver operating characteristic (ROC) curve, and the area under the curve (AUC). The results show that the landslide prediction accuracy of all three methods exceeds 80%, with the IV method highest. In addition, the IV method can identify very-high-hazard regions within a smaller area. Therefore, it is more reasonable to use IV to select non-landslides, and the IV method is more practical for landslide prevention and engineering construction. The results may provide useful basic information on landslide hazard for decision makers and planners.
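The IV selection step can be sketched as follows: for each class of a conditioning factor, IV = ln(P(class | landslide) / P(class)), and cells with a low summed IV across factors are candidate non-landslide samples. The single demo factor and synthetic grid below are illustrative assumptions, not the study's data.

```python
import numpy as np

rng = np.random.default_rng(0)
slope_class = rng.integers(0, 5, size=10_000)          # one factor, 5 classes
is_landslide = rng.random(10_000) < slope_class / 20   # steeper -> more slides

iv = {}
for c in np.unique(slope_class):
    p_class = np.mean(slope_class == c)
    p_class_given_slide = np.mean(slope_class[is_landslide] == c)
    iv[c] = np.log((p_class_given_slide + 1e-9) / p_class)

cell_iv = np.vectorize(iv.get)(slope_class)     # per-cell susceptibility score
candidates = np.flatnonzero(cell_iv < 0)        # low-susceptibility cells only
non_slides = rng.choice(candidates, size=230, replace=False)
print({int(c): round(float(v), 2) for c, v in iv.items()})
```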


2018 ◽  
Vol 11 (1) ◽  
pp. 64 ◽  
Author(s):  
Kyoung-jae Kim ◽  
Kichun Lee ◽  
Hyunchul Ahn

Measuring and managing the financial sustainability of borrowers is crucial to financial institutions for their risk management. As a result, building an effective corporate financial distress prediction model has long been an important research topic. Recently, researchers have strived to improve the accuracy of financial distress prediction models by applying various business analytics approaches, including statistical and artificial intelligence methods. Among them, support vector machines (SVMs) are becoming popular. SVMs require only small training samples and have little risk of overfitting if model parameters are properly tuned, yet they generally show high prediction accuracy since they can deal with complex nonlinear patterns. Despite these advantages, SVMs are often criticized because their architectural factors are determined by heuristics, such as the parameters of the kernel function and the subsets of appropriate features and instances. In this study, we propose globally optimized SVMs, denoted GOSVM, a novel hybrid SVM model designed to optimize feature selection, instance selection, and kernel parameters simultaneously. The study introduces a genetic algorithm (GA) to optimize these multiple heterogeneous design factors of SVMs at the same time. We apply the proposed model to a real-world financial distress prediction case, and experiments show that it significantly improves the prediction accuracy of conventional SVMs.
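A toy version of this GA-over-SVM search is sketched below: each chromosome encodes (log10 C, log10 gamma, feature mask), and fitness is cross-validated accuracy. The hand-rolled generational loop and synthetic data stand in for the paper's GA, and the instance-selection component is omitted for brevity.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=12,
                           n_informative=5, random_state=0)

def fitness(ch):
    mask = ch[2:].astype(bool)                 # which features to keep
    if not mask.any():
        return 0.0
    clf = SVC(C=10 ** ch[0], gamma=10 ** ch[1])
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

# Population: columns are log10(C), log10(gamma), then a 12-bit feature mask.
pop = np.column_stack([rng.uniform(-1, 2, 20), rng.uniform(-3, 0, 20),
                       rng.integers(0, 2, (20, 12))])
for gen in range(10):
    scores = np.array([fitness(ch) for ch in pop])
    parents = pop[np.argsort(scores)[-10:]]            # truncation selection
    kids = parents[rng.integers(0, 10, 10)].copy()
    kids[:, :2] += rng.normal(0, 0.2, (10, 2))         # mutate C and gamma
    flip = rng.random((10, 12)) < 0.1                  # mutate feature mask bits
    kids[:, 2:] = np.where(flip, 1 - kids[:, 2:], kids[:, 2:])
    pop = np.vstack([parents, kids])

best = pop[np.argmax([fitness(ch) for ch in pop])]
print("best cross-validated accuracy:", fitness(best))
```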


Author(s):  
Charles Peterson ◽  
Stephen Spear

Due to the current trend of amphibian declines (Wake 1998, Alford and Richards 1999, Semlitsch 2000), the monitoring and study of amphibian populations have become increasingly necessary. To properly conduct such studies, we must consider several issues, including the detectability of the species at a site, the current status of the population, and the spatial scale for sampling a population. Detectability is important to consider because some amphibian species are harder to detect than others; if a species is difficult to observe, it may occupy a greater number of sites than a survey indicates (MacKenzie et al. 2002). The appropriate spatial scale is also important for monitoring studies. For example, in a pond-breeding amphibian, do one or two breeding ponds with the appropriate terrestrial habitat constitute the correct sampling area for a population, or does a population utilize multiple ponds within a larger terrestrial area? If the sampling scale is not appropriate, then any conclusions drawn may be inaccurate (Wiens 1989). In addition, understanding the terrestrial habitat use of pond-breeding amphibians is important for both monitoring and conservation. Many pond-breeding amphibians use ponds for breeding and then utilize terrestrial zones around the pond for the rest of the year. The total area encompassed by these terrestrial zones is known as the terrestrial "buffer zone" or core habitat area for that population (Semlitsch 1998). To identify these core habitat areas, we must know not only the distance that the amphibians physically move from the breeding pond, but also the type of habitat that they will use. For example, short, steep slopes or rivers can serve as barriers to amphibian movement (Laan and Verboom 1990, Storfer 1999), even if they are within the movement range of a population. Understanding individual movement may also give insights into the spatial population structure of the species: if we can identify the average distance of movement, we can then infer whether a breeding pond is likely to host an isolated subpopulation based on its distance from other ponds.


Processes ◽  
2022 ◽  
Vol 10 (1) ◽  
pp. 158
Author(s):  
Ain Cheon ◽  
Jwakyung Sung ◽  
Hangbae Jun ◽  
Heewon Jang ◽  
Minji Kim ◽  
...  

The application of a machine learning (ML) model to bio-electrochemical anaerobic digestion (BEAD) is a future-oriented approach for improving process stability by predicting performances that have nonlinear relationships with various operational parameters. Five ML models, including tree-, regression-, and neural-network-based algorithms, were applied to predict the methane yield in a BEAD reactor. The results showed that 1-step-ahead ML models, which utilize prior data on BEAD performance, could enhance prediction accuracy. In addition, a 1-step-ahead algorithm with retraining improved prediction accuracy by 37.3% compared with the conventional multi-step-ahead algorithm; the improvement was particularly noteworthy in the tree- and regression-based ML models. Moreover, the 1-step-ahead algorithm with retraining showed high potential for efficient prediction using pH as a single input, which is plausibly an easier parameter to monitor than the others required in bioprocess models.
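A minimal sketch of the 1-step-ahead-with-retraining scheme follows: at each step the model is refitted on all data seen so far and predicts only the next observation. The pH-lag features, random-forest regressor, and synthetic series are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
T, lags = 200, 3
ph = 7 + 0.3 * np.sin(np.arange(T) / 10) + rng.normal(0, 0.05, T)
methane = 50 + 20 * (ph - 7) + rng.normal(0, 1, T)   # synthetic yield series

X = np.column_stack([ph[i:T - lags + i] for i in range(lags)])  # lagged pH
y = methane[lags:]

preds = []
for t in range(50, len(y)):                          # warm-up on first 50 points
    model = RandomForestRegressor(n_estimators=50, random_state=0)
    model.fit(X[:t], y[:t])                          # retrain on all data so far
    preds.append(model.predict(X[t:t + 1])[0])       # predict one step ahead

rmse = np.sqrt(np.mean((np.array(preds) - y[50:]) ** 2))
print("1-step-ahead RMSE:", rmse)
```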


2020 ◽  
Author(s):  
Sagnik Palmal ◽  
Kaustubh Adhikari ◽  
Javier Mendoza-Revilla ◽  
Macarena Fuentes-Guajardo ◽  
Caio C. Silva de Cerqueira ◽  
...  

We report an evaluation of prediction accuracy for eye, hair, and skin pigmentation based on genomic and phenotypic data for over 6,500 admixed Latin Americans (the CANDELA dataset). We examined the impact on prediction accuracy of three main factors: (i) the methods of prediction, including classical statistical methods and machine learning approaches; (ii) the inclusion of non-genetic predictors, continental genetic ancestry, and pigmentation SNPs in the prediction models; and (iii) the choice of pigmentation SNP set, comparing the commonly used HIrisPlex-S set (developed in Europeans) with novel SNP sets we defined here based on genome-wide association results in the CANDELA sample. We find that Random Forest and regression are globally the best-performing methods. Although continental genetic ancestry has substantial power for prediction of pigmentation in Latin Americans, the inclusion of pigmentation SNPs increases prediction accuracy considerably, particularly for skin color. For hair and eye color, HIrisPlex-S performs similarly to the CANDELA-specific prediction SNP sets. However, for skin pigmentation the performance of HIrisPlex-S is markedly lower than that of the SNP set defined here, including for predictions in an independent Native American dataset. These results reflect the relatively high variation in hair and eye color among the Europeans for whom HIrisPlex-S was developed, whereas their variation in skin pigmentation is comparatively lower. Furthermore, we show that the dataset used to train prediction models strongly impacts the portability of these models across Europeans and Native Americans.
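The kind of comparison reported here can be sketched as below: a Random Forest predicting a pigmentation category from continental-ancestry proportions alone versus ancestry plus a SNP panel (0/1/2 allele counts). All data are synthetic; the CANDELA phenotypes and SNP sets are not reproduced.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n, n_snps = 1000, 30
ancestry = rng.dirichlet([2, 2, 2], size=n)     # e.g. European/Native American/African
snps = rng.integers(0, 3, size=(n, n_snps))     # allele counts at pigmentation SNPs
score = 2 * ancestry[:, 0] + 0.1 * snps[:, :5].sum(axis=1)
skin_class = np.digitize(score, np.quantile(score, [0.33, 0.66]))  # 3 categories

for name, feats in [("ancestry only", ancestry),
                    ("ancestry + SNPs", np.column_stack([ancestry, snps]))]:
    clf = RandomForestClassifier(n_estimators=300, random_state=0)
    acc = cross_val_score(clf, feats, skin_class, cv=5).mean()
    print(f"{name}: cross-validated accuracy = {acc:.3f}")
```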

