Topsoil Texture Regionalization for Agricultural Soils in Germany—An Iterative Approach to Advance Model Interpretation

2022
Vol 1
Author(s):  
Anika Gebauer ◽  
Ali Sakhaee ◽  
Axel Don ◽  
Matteo Poggio ◽  
Mareike Ließ

Site-specific, spatially continuous soil texture data is required for many purposes, such as the simulation of carbon dynamics, the estimation of drought impact on agriculture, or the modeling of water erosion rates. At large scales, there are often only conventional polygon-based soil texture maps, which are hardly reproducible, contain abrupt changes at polygon borders, and are therefore not suitable for most quantitative applications. Digital soil mapping methods can provide the required soil texture information in the form of reproducible, site-specific predictions with associated uncertainties. Machine learning models were trained in a nested cross-validation approach to predict the spatial distribution of the topsoil (0–30 cm) clay, silt, and sand contents at 100 m resolution. The differential evolution algorithm was applied to optimize the model parameters. High-quality nationwide soil texture data of 2,991 soil profiles was obtained from the first German agricultural soil inventory. We tested an iterative approach by training models on predictor datasets of increasing size, which contained up to 50 variables. The best results were achieved when training the models on the complete predictor dataset. They explained about 59% of the variance in clay, 75% of the variance in silt, and 77% of the variance in sand content. The RMSE values were approximately 8.2 wt.% (clay), 11.8 wt.% (silt), and 15.0 wt.% (sand). Due to their high performance, the models were able to predict the spatial texture distribution. They captured the high importance of the soil-forming factors parent material and relief. Our results demonstrate the high predictive power of machine learning for predicting soil texture at large scales. The iterative approach enhanced model interpretability: it revealed that the incorporated soil maps partly substituted for the relief and parent material predictors. Overall, the spatially continuous soil texture predictions provide valuable input for many quantitative applications on agricultural topsoils in Germany.
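The abstract names the two core methodological pieces: nested cross-validation and differential-evolution parameter tuning. A minimal sketch of how they fit together is given below; the learner (a random forest), the two tuned hyperparameters, and the synthetic data are illustrative assumptions, since the abstract does not specify them.

```python
# Sketch: nested CV with differential-evolution hyperparameter tuning.
# The model and parameter choices are placeholders, not the paper's setup.
import numpy as np
from scipy.optimize import differential_evolution
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))           # stand-in for the predictor variables
y = X[:, 0] * 20 + rng.normal(size=300)  # stand-in for a texture fraction

def inner_cv_loss(params, X_tr, y_tr):
    """Negative mean R^2 on the inner folds for a candidate parameter vector."""
    n_estimators, max_depth = int(params[0]), int(params[1])
    model = RandomForestRegressor(n_estimators=n_estimators,
                                  max_depth=max_depth, random_state=0)
    return -cross_val_score(model, X_tr, y_tr, cv=3, scoring="r2").mean()

outer_scores = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    X_tr, y_tr = X[train_idx], y[train_idx]
    # Differential evolution searches the hyperparameter space on the inner folds.
    result = differential_evolution(inner_cv_loss, bounds=[(20, 100), (2, 12)],
                                    args=(X_tr, y_tr), maxiter=3, seed=0)
    best = RandomForestRegressor(n_estimators=int(result.x[0]),
                                 max_depth=int(result.x[1]),
                                 random_state=0).fit(X_tr, y_tr)
    outer_scores.append(best.score(X[test_idx], y[test_idx]))

print(f"outer-fold R^2: {np.mean(outer_scores):.2f}")
```

The inner loop scores candidate hyperparameters; the held-out outer folds then estimate how the tuned model generalizes, which is the point of nesting.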

2020
Author(s):  
Florian Schneider ◽  
Axel Don

Agricultural soils in Germany store about 2.5 Pg (1 Pg = 10¹⁵ g) of organic carbon at 0–100 cm depth. If this carbon were all powdered charcoal, it would fill a train with 61 million carriages, extending 2.5 times the distance to the moon. This study aimed at better understanding the origin of the organic carbon contained in mineral soils under agricultural use. For this, total organic carbon (TOC), C:N ratios, and particulate organic carbon (POC) of 2,939 crop- and grassland sites scattered across an 8 × 8 km grid over Germany were evaluated. Random forest algorithms were trained to predict TOC, C:N, POC, and their respective depth gradients down to 100 cm based on pedology, geology, climate, land-use, and management data. The data originated from the first German Agricultural Soil Inventory, which was completed in 2018, comprising 14,420 mineral soil samples and 36,163 years of reported management.

At 0–10 cm, land-use and/or texture were the major drivers of TOC, C:N, and POC. At greater depths, the effect of current land-use vanished, while soil texture remained important. Additionally, with increasing depth, soil parent materials and/or pedogenic processes gained in importance for explaining TOC, C:N, and POC. Colluvial material, buried topsoil, fluvio-marine deposits, and loess showed significantly higher TOC and POC contents and higher C:N ratios than soils developed from other parent materials. Podzols and Chernozems also showed significantly higher TOC and POC contents and a higher C:N ratio in the subsoil than other soil types at similar depths, because of illuvial organic matter deposits and bioturbation, respectively. At 30–70 cm depth, many sandy sites in north-western Germany showed TOC, POC, and C:N values above average, a legacy of historic peat- and heathland cover. The depth gradients of TOC, POC, and C:N showed little dependence on soil texture, suggesting that they were robust to differences in carbon stabilization due to organo-mineral associations. Instead, these depth gradients were largely driven by land-use (redistribution of carbon in cropland by ploughing) and by variables describing historic carbon inputs (e.g., information on topsoil burial). Hardpans with packing densities > 1.75 g cm⁻³ intensified the depth gradients of TOC, POC, and C:N significantly, suggesting that such densely packed layers restricted the growth of deep roots and therefore reduced organic carbon inputs into the subsoil.

Today's soil organic carbon stocks reflect past organic carbon inputs. Considering that at 0–10 cm, current land-use superseded the effect of past land cover on TOC, while land-use showed no effect on POC and C:N, we conclude that topsoil carbon stocks derive from relatively recent carbon inputs (< 100 years) with high turnover. In the subsoil, however, most carbon originated from the soil parent material or was translocated from the topsoil during soil formation. High C:N ratios and POC contents of buried topsoils confirm low turnover rates of subsoil carbon. The contribution of recent, root-derived carbon inputs to subsoils was small but significant. Loosening of widespread hardpans could facilitate deeper rooting and increase carbon stocks along with crop yield.
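Since the study's core tool is a random forest whose variable importance identifies the drivers of TOC, a hedged sketch of that setup may help. The feature names, synthetic data, and the TOC signal below are placeholders, not the inventory data or the authors' predictor set.

```python
# Sketch: random forest predicting TOC from mixed soil/climate/land-use
# predictors, then ranking drivers by impurity-based feature importance.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({
    "clay_pct":        rng.uniform(0, 60, n),      # texture
    "mean_temp_C":     rng.uniform(6, 11, n),      # climate
    "precip_mm":       rng.uniform(500, 1100, n),  # climate
    "is_grassland":    rng.integers(0, 2, n),      # land-use
    "packing_density": rng.uniform(1.2, 1.9, n),   # management/pedology
})
# Synthetic signal: grassland and clay content raise topsoil TOC.
toc = 1.0 + 0.02 * df["clay_pct"] + 0.8 * df["is_grassland"] \
      + rng.normal(0, 0.2, n)

rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(df, toc)
for name, imp in sorted(zip(df.columns, rf.feature_importances_),
                        key=lambda t: -t[1]):
    print(f"{name:16s} {imp:.3f}")
```

Training one such model per depth increment, as the study describes, is what lets the importance ranking shift from land-use and texture near the surface to parent material at depth.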


2021
Author(s):  
Clara Betancourt ◽  
Scarlet Stadtler ◽  
Timo Stomberg ◽  
Ann-Kathrin Edrich ◽  
Ankit Patnala ◽  
...  

Through the availability of multi-year ground-based ozone observations on a global scale, substantial geospatial metadata, and high-performance computing capacities, it is now possible to use machine learning for a global data-driven ozone assessment. In this presentation, we will show a novel, completely data-driven approach to map tropospheric ozone globally.

Our goal is to interpolate ozone metrics and aggregated statistics from the database of the Tropospheric Ozone Assessment Report (TOAR) onto a global 0.1° × 0.1° resolution grid. It is challenging to interpolate ozone, a toxic greenhouse gas, because its formation depends on many interconnected environmental factors acting on small scales. We conduct the interpolation with various machine learning methods trained on aggregated hourly ozone data from five years at more than 5,500 locations worldwide. We use several geospatial datasets as training inputs to provide proxies for the environmental factors controlling ozone formation, such as precursor emissions and climate. The resulting maps contain different ozone metrics, i.e., statistical aggregations widely used to assess air pollution impacts on health, vegetation, and climate.

The key aspects of this contribution are twofold: first, we apply explainable machine learning methods to the data-driven ozone assessment; second, we discuss the dominant uncertainties relevant to the ozone mapping and quantify their impact wherever possible. Our methods include a thorough a priori uncertainty estimation of the various data and methods, assessment of scientific consistency, identification of critical model parameters, use of ensemble methods, and error modeling.

Our work aims to increase the reliability and integrity of the derived ozone maps by providing scientific robustness to a data-centric machine learning task. This study hence represents a blueprint for how to formulate an environmental machine learning task scientifically, gather the necessary data, and develop a data-driven workflow that focuses on optimizing the transparency and applicability of its product to maximize its scientific knowledge return.
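The mapping task itself — learn an ozone metric from geospatial proxies at station locations, predict it on a regular grid, and attach an ensemble-based uncertainty — can be sketched as below. The proxy features, model choice, and per-tree spread as an uncertainty proxy are assumptions for illustration; the TOAR data and the authors' actual feature set are not used here.

```python
# Sketch: interpolate a station-level ozone metric onto a grid with a
# forest model; per-tree spread gives a rough ensemble uncertainty.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
n_stations = 1000
stations = np.column_stack([
    rng.uniform(-60, 60, n_stations),   # latitude
    rng.uniform(0, 3000, n_stations),   # elevation (m), proxy feature
    rng.uniform(0, 1, n_stations),      # emission-intensity proxy
])
ozone_metric = (30 + 0.001 * stations[:, 1] + 10 * stations[:, 2]
                + rng.normal(0, 2, n_stations))  # synthetic metric (ppb)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(stations, ozone_metric)

# Predict on a coarse grid of (latitude, elevation) cells at a fixed
# emission proxy, then summarize the per-tree ensemble.
grid = np.column_stack([
    np.repeat(np.linspace(-60, 60, 50), 20),
    np.tile(np.linspace(0, 3000, 20), 50),
    np.full(1000, 0.5),
])
per_tree = np.stack([t.predict(grid) for t in model.estimators_])
mean_map, spread = per_tree.mean(axis=0), per_tree.std(axis=0)
print(mean_map.shape, f"mean ensemble spread: {spread.mean():.2f}")
```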


Author(s):  
Mark Endrei ◽  
Chao Jin ◽  
Minh Ngoc Dinh ◽  
David Abramson ◽  
Heidi Poxon ◽  
...  

Rising power costs and constraints are driving a growing focus on the energy efficiency of high performance computing systems. The unique characteristics of a particular system and workload, and their effect on performance and energy efficiency, are typically difficult for application users to assess and control. Settings for optimum performance and energy efficiency can also diverge, so we need to identify trade-off options that guide a suitable balance between energy use and performance. We present statistical and machine learning models that require only a small number of runs to make accurate Pareto-optimal trade-off predictions using parameters that users can control. We study model training and validation using several parallel kernels and more complex workloads, including Algebraic Multigrid (AMG), the Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS), and Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics (LULESH). We demonstrate that we can train the models using as few as 12 runs, with prediction error of less than 10%. Our AMG results identify trade-off options that provide up to a 45% improvement in energy efficiency for around 10% performance loss. We reduce the sample measurement time required for AMG by 90%, from 13 h to 74 min.
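The trade-off step the abstract describes amounts to filtering model-predicted (runtime, energy) pairs down to the Pareto-optimal ones. A minimal sketch follows; the candidate configurations are synthetic placeholders for the statistical/ML model outputs, and the dominance filter is a generic formulation rather than the authors' exact procedure.

```python
# Sketch: keep only Pareto-optimal (runtime, energy) configurations,
# both objectives to be minimized.
import numpy as np

rng = np.random.default_rng(3)
configs = rng.uniform(size=(40, 2))  # columns: predicted runtime, energy

def pareto_front(points):
    """Indices of points not dominated by any other point (minimization)."""
    keep = []
    for i, p in enumerate(points):
        # q dominates p if q <= p in every objective and q < p in at least one.
        dominated = np.any(np.all(points <= p, axis=1) &
                           np.any(points < p, axis=1))
        if not dominated:
            keep.append(i)
    return keep

for i in sorted(pareto_front(configs), key=lambda i: configs[i, 0]):
    print(f"runtime={configs[i, 0]:.2f}  energy={configs[i, 1]:.2f}")
```

Walking along this front is exactly how one reads off options like "45% better energy efficiency for around 10% performance loss".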


Diagnostics
2021
Vol 11 (3)
pp. 574
Author(s):  
Gennaro Tartarisco ◽  
Giovanni Cicceri ◽  
Davide Di Pietro ◽  
Elisa Leonardi ◽  
Stefania Aiello ◽  
...  

In the past two decades, several screening instruments have been developed to detect toddlers who may be autistic, in both clinical and unselected samples. Among others, the Quantitative CHecklist for Autism in Toddlers (Q-CHAT) is a quantitative and normally distributed measure of autistic traits that demonstrates good psychometric properties in different settings and cultures. Recently, machine learning (ML) has been applied to behavioral science to improve the classification performance of autism screening and diagnostic tools, but mainly in children, adolescents, and adults. In this study, we used ML to investigate the accuracy and reliability of the Q-CHAT in discriminating young autistic children from non-autistic ones. Five different ML algorithms (random forest (RF), naïve Bayes (NB), support vector machine (SVM), logistic regression (LR), and K-nearest neighbors (KNN)) were applied to the complete set of Q-CHAT items. Our results showed that ML achieved an overall accuracy of 90%, and the SVM was the most effective, being able to classify autism with 95% accuracy. Furthermore, using the SVM–recursive feature elimination (RFE) approach, we selected a subset of 14 items ensuring 91% accuracy, while 83% accuracy was obtained from the three best discriminating items common to our subset and the previously reported Q-CHAT-10. This evidence confirms the high performance and cross-cultural validity of the Q-CHAT and supports the application of ML to create shorter and faster versions of the instrument that maintain high classification accuracy, to be used as a quick, easy, and high-performance tool in primary-care settings.
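SVM–RFE, the item-selection technique named above, iteratively drops the items with the smallest SVM weights. A hedged sketch using scikit-learn follows; the synthetic data stands in for the Q-CHAT item scores (25 items assumed), and only the target subset size of 14 is taken from the abstract.

```python
# Sketch: SVM with recursive feature elimination to select a 14-item
# subset, then score the reduced questionnaire by cross-validation.
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(4)
X = rng.integers(0, 5, size=(400, 25)).astype(float)  # 25 Likert-style items
y = (X[:, :3].sum(axis=1) + rng.normal(0, 1, 400) > 6).astype(int)

# RFE ranks items by the weights of a linear-kernel SVM, so the kernel
# must be linear for coefficients to be available.
rfe = RFE(SVC(kernel="linear"), n_features_to_select=14).fit(X, y)
selected_items = np.flatnonzero(rfe.support_)

acc = cross_val_score(SVC(kernel="linear"), X[:, selected_items], y, cv=5).mean()
print("selected items:", selected_items)
print(f"CV accuracy on reduced item set: {acc:.2f}")
```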


2021
Vol 11 (15)
pp. 6704
Author(s):  
Jingyong Cai ◽  
Masashi Takemoto ◽  
Yuming Qiu ◽  
Hironori Nakajo

Despite being heavily used in the training of deep neural networks (DNNs), multipliers are resource-intensive and in short supply in many scenarios. Previous work has shown the advantage of computing activation functions, such as the sigmoid, with shift-and-add operations, although such approaches fail to remove multiplications from training altogether. In this paper, we propose an approach that converts all multiplications in the forward and backward passes of DNNs into shift-and-add operations. Because the model parameters and backpropagated errors of a large DNN model are typically clustered around zero, these values can be approximated by their sine values. Multiplications between the weights and error signals are transformed into multiplications of their sine values, which can be replaced with simpler operations with the help of the product-to-sum formula. In addition, a rectified sine activation function is used to convert layer inputs into sine values as well. In this way, the original multiplication-intensive operations can be computed through simple shift-and-add operations. This trigonometric approximation method provides an efficient training and inference alternative for devices with insufficient hardware multipliers. Experimental results demonstrate that this method is able to obtain performance close to that of classical training algorithms. The approach we propose sheds new light on future hardware customization research for machine learning.
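The substitution rests on two facts stated above: for values near zero, x ≈ sin(x), and the product-to-sum identity sin(w)·sin(e) = ½[cos(w − e) − cos(w + e)] turns the product into additions, subtractions, and cosine evaluations. The short numerical check below only verifies the approximation error of this rewrite; it is not the paper's hardware implementation, and the value scales are assumptions.

```python
# Worked check: w * e  ≈  sin(w) * sin(e)  =  0.5 * (cos(w - e) - cos(w + e))
# for weights and backpropagated errors clustered around zero.
import numpy as np

rng = np.random.default_rng(5)
w = rng.normal(0, 0.05, 10_000)  # weights near zero (assumed scale)
e = rng.normal(0, 0.05, 10_000)  # error signals near zero (assumed scale)

exact = w * e
approx = 0.5 * (np.cos(w - e) - np.cos(w + e))  # equals sin(w) * sin(e)

print(f"max abs error:  {np.abs(exact - approx).max():.2e}")
print(f"mean abs error: {np.abs(exact - approx).mean():.2e}")
```

The approximation error shrinks rapidly as the operands approach zero, which is why the clustering of DNN weights and errors around zero is the premise the method relies on.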


Sensors
2021
Vol 21 (2)
pp. 656
Author(s):  
Xavier Larriva-Novo ◽  
Víctor A. Villagrá ◽  
Mario Vega-Barbas ◽  
Diego Rivera ◽  
Mario Sanz Rodrigo

Security in IoT networks is now mandatory, due to the large amounts of data that have to be handled. These systems are vulnerable to several cybersecurity attacks, which are increasing in number and sophistication. For this reason, new intrusion detection techniques have to be developed that are as accurate as possible for these scenarios. Intrusion detection systems based on machine learning algorithms have already shown high performance in terms of accuracy. This research proposes the study and evaluation of several preprocessing techniques based on traffic categorization for a machine learning neural network algorithm. For its evaluation, this research uses two benchmark datasets, UGR16 and UNSW-NB15, as well as one of the most widely used datasets, KDD99. The preprocessing techniques were evaluated with different scaling and normalization functions. All of these preprocessing models were applied to different sets of characteristics based on a categorization composed of four groups of features: basic connection features, content characteristics, statistical characteristics, and finally a group composed of traffic-based features and connection direction-based traffic characteristics. The objective of this research is to evaluate this categorization by using various data preprocessing techniques to obtain the most accurate model. Our proposal shows that, by applying the categorization of network traffic and several preprocessing techniques, accuracy can be enhanced by up to 45%. The preprocessing of a specific group of characteristics allows for greater accuracy, allowing the machine learning algorithm to correctly classify the parameters related to possible attacks.
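Group-wise preprocessing of this kind maps naturally onto a column transformer that applies a different scaling or normalization function to each feature group before the neural network. The sketch below illustrates that pattern; the column names, group assignments, and scaler choices are placeholders, not the actual KDD99/UGR16/UNSW-NB15 schemas or the authors' configuration.

```python
# Sketch: per-group scaling/normalization feeding a neural network
# classifier, one feature per group for brevity.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

rng = np.random.default_rng(6)
df = pd.DataFrame({
    "duration":   rng.exponential(1.0, 1000),   # basic connection feature
    "src_bytes":  rng.exponential(500, 1000),   # content characteristic
    "error_rate": rng.uniform(0, 1, 1000),      # statistical characteristic
    "conn_count": rng.integers(1, 100, 1000),   # traffic-based feature
})
y = (df["error_rate"] + rng.normal(0, 0.2, 1000) > 0.7).astype(int)

pre = ColumnTransformer([
    ("basic",   StandardScaler(), ["duration"]),
    ("content", MinMaxScaler(),   ["src_bytes"]),
    ("stats",   MinMaxScaler(),   ["error_rate"]),
    ("traffic", StandardScaler(), ["conn_count"]),
])
clf = Pipeline([
    ("pre", pre),
    ("nn", MLPClassifier(hidden_layer_sizes=(32,), max_iter=300,
                         random_state=0)),
])
clf.fit(df, y)
print(f"train accuracy: {clf.score(df, y):.2f}")
```

Swapping the scaler assigned to one group while holding the rest fixed is the kind of controlled comparison the evaluation describes.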

