Examining the Performance of PARACUDA-II Data-Mining Engine versus Selected Techniques to Model Soil Carbon from Reflectance Spectra

2018 ◽  
Vol 10 (8) ◽  
pp. 1172 ◽  
Author(s):  
Asa Gholizadeh ◽  
Mohammadmehdi Saberioon ◽  
Nimrod Carmon ◽  
Lubos Boruvka ◽  
Eyal Ben-Dor

The monitoring and quantification of soil carbon provide a better understanding of soil and atmosphere dynamics. Visible-near-infrared-short-wave-infrared (VIS-NIR-SWIR) reflectance spectroscopy can quantitatively estimate soil carbon content more rapidly and cost-effectively than traditional laboratory analysis. However, effective estimation of soil carbon using reflectance spectroscopy depends to a great extent on the selection of a suitable preprocessing sequence and data-mining algorithm. Many efforts have been dedicated to comparing conventional chemometric techniques and optimizing them for the prediction of soil properties. The current study instead focuses on the potential of the new data-mining engine PARACUDA-II®, recently developed at Tel-Aviv University (TAU), by comparing its performance in predicting soil oxidizable carbon (Cox) against common data-mining algorithms including partial least squares regression (PLSR), random forests (RF), boosted regression trees (BRT), support vector machine regression (SVMR), and memory-based learning (MBL). To this end, 103 soil samples from the Pokrok dumpsite in the Czech Republic were scanned with an ASD FieldSpec III Pro FR spectroradiometer in the laboratory under a strict protocol. Spectral preprocessing for the conventional data-mining techniques was conducted using Savitzky-Golay smoothing and the first derivative method. PARACUDA-II®, on the other hand, operates based on the all possibilities approach (APA) concept, a conditional Latin hypercube sampling (cLHs) algorithm, and parallel programming to evaluate all potential combinations of eight different spectral preprocessing techniques against the original reflectance and chemical data prior to model development. Results were compared in terms of the coefficient of determination (R²) and the root-mean-square error of prediction (RMSEp). The PARACUDA-II® engine performed better than the other selected schemes, with an R² of 0.80 and an RMSEp of 0.12; PLSR was less predictive than the other techniques, with R² = 0.63 and RMSEp = 0.29. The better performance of PARACUDA-II® can be attributed to its capability to assess all available options automatically, which allows otherwise hidden models to emerge and yields the best available model.
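
The two comparison metrics named in this abstract can be sketched in a few lines of Python. This is only an illustration: it assumes R² is computed as 1 - SS_res/SS_tot on the prediction set (the abstract does not say whether this form or the squared correlation was used), and the function names and sample values are hypothetical, not taken from the study.

```python
import math


def rmsep(observed, predicted):
    """Root-mean-square error of prediction over paired values."""
    n = len(observed)
    squared_errors = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared_errors / n)


def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical reference values and model predictions (not data from the paper).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(r_squared(observed, predicted), rmsep(observed, predicted))
```

A higher R² together with a lower RMSEp on the same prediction set is what the abstract reports for PARACUDA-II® relative to PLSR.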

2009 ◽  
Vol 89 (5) ◽  
pp. 579-587 ◽  
Author(s):  
C Nduwamungu ◽  
N Ziadi ◽  
L -É Parent ◽  
G F Tremblay

Near-infrared reflectance spectroscopy (NIRS) is a cost-effective and environmentally friendly technique of soil analysis that is particularly advantageous for intensive soil sampling and soil nutrient management. This study evaluated the potential of NIRS for predicting P, K, Ca, Mg, Cu, Zn, Mn, Fe, and Al extracted by Mehlich 3. We used 150 air-dried samples collected from a 15-ha site dominated by Orthic Humic Gleysol and Gleyed Dystric Brunisol soils. Calibration equations were developed using modified partial least squares regression. The accuracy of NIRS prediction was evaluated using the coefficient of determination (R²), the ratio of performance deviation (RPD), and the ratio of error range (RER). Reliable calibrations were found for Ca, Cu, and Mg (R² ≥ 0.7, RPD ≥ 1.75, and RER ≥ 8). Less-reliable calibrations were found for Al, Fe, K, Mn, P, and Zn (R² < 0.7, RPD < 1.75, and RER < 8). In the validation with independent samples, acceptable regression coefficients (i.e., 0.8 ≤ slope ≤ 1.2) were found only for Ca, Mg, and Mn. We presumed that the pH of the Mehlich 3 extractant (2.5 ± 0.1) may affect the solubility of most of these nutrients, regardless of soil texture, and consequently the potential of NIRS to predict them. The more strongly a nutrient was correlated with clay content, the more likely it was to be predictable by NIRS. The prediction models obtained for Al, Ca, Cu, Fe, K, Mg, and Mn could still be used for screening purposes in cases where high accuracy is not required. These NIRS prediction models should be validated across larger geographic areas of geological homogeneity.
Key words: Soil analysis, Mehlich 3, near-infrared reflectance spectroscopy, calibration
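
The reliability screen described in this abstract can be illustrated with a short Python sketch. It assumes the common definitions RPD = standard deviation of the reference values divided by the prediction error and RER = range of the reference values divided by the prediction error; the abstract does not spell these out, and the function names and sample numbers below are hypothetical.

```python
import statistics


def rpd(reference_values, prediction_error):
    """Ratio of performance deviation: SD of reference values / prediction error."""
    return statistics.stdev(reference_values) / prediction_error


def rer(reference_values, prediction_error):
    """Ratio of error range: range of reference values / prediction error."""
    return (max(reference_values) - min(reference_values)) / prediction_error


def is_reliable(r2, rpd_value, rer_value):
    """Apply the thresholds quoted in the abstract (R2 >= 0.7, RPD >= 1.75, RER >= 8)."""
    return r2 >= 0.7 and rpd_value >= 1.75 and rer_value >= 8


# Hypothetical laboratory values and prediction error (not data from the study).
reference = [10, 12, 14, 16, 18, 20]
error = 1.0
print(is_reliable(0.8, rpd(reference, error), rer(reference, error)))
```

Whether the error term is the standard error of prediction or the root-mean-square error varies between authors, so the choice should be stated whenever RPD or RER values are compared across studies.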


2013 ◽  
Vol 295-298 ◽  
pp. 644-647 ◽  
Author(s):  
Yu Kai Yao ◽  
Hong Mei Cui ◽  
Ming Wei Len ◽  
Xiao Yun Chen

The support vector machine (SVM) is a powerful data mining algorithm used mainly for classification and regression tasks. In this paper, SVM is used for disease prediction. We integrate stratified sampling and grid search to improve the classification accuracy of SVM, and propose an improved algorithm named SGSVM (Stratified sample and Grid search based SVM). To test the performance of SGSVM, heart-disease data from UCI are used in our experiment. The results show that SGSVM clearly improves classification accuracy, which is especially valuable in disease prediction.
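
The two ingredients named in this abstract, stratified sampling and grid search, can be sketched generically in Python. This is not the authors' SGSVM implementation: the function names are hypothetical, and `score_fn` stands in for whatever routine trains an SVM with the given parameters and returns its validation accuracy.

```python
import itertools
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Split sample indices so each class keeps roughly the same train/test ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def grid_search(param_grid, score_fn):
    """Score every parameter combination and return the best one with its score."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)  # placeholder: train and validate a model here
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

In practice `score_fn` would fit the classifier on the stratified training indices and evaluate it on held-out data, so that the grid search compares parameter settings on class-balanced splits.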


Molecules ◽  
2019 ◽  
Vol 24 (3) ◽  
pp. 428 ◽  
Author(s):  
Verena Wiedemair ◽  
Dominik Langore ◽  
Roman Garsleitner ◽  
Klaus Dillinger ◽  
Christian Huck

The performance of a newly developed pocket-sized near-infrared (NIR) spectrometer was investigated by analysing 46 cheese samples for their water and fat content, and comparing results with a benchtop NIR device. Additionally, the automated data analysis of the pocket-sized spectrometer and its cloud-based data analysis software, designed for laypeople, were put to the test by comparing their performance to that of a highly sophisticated multivariate data analysis software. All developed partial least squares regression (PLS-R) models yield a coefficient of determination (R²) of over 0.9, indicating high correlation between spectra and reference data for both spectrometers and all data analysis routes taken. In general, the analysis of grated cheese yields better results than whole pieces of cheese. Additionally, the ratios of performance to deviation (RPDs) and standard errors of prediction (SEPs) suggest that the performance of the pocket-sized spectrometer is comparable to the benchtop device. Small improvements are observable when sophisticated data analysis software is used instead of automated tools.


Author(s):  
Xuelong Zhang

With the advent of the big data era, people are eager to extract valuable knowledge from rapidly expanding data so that they can use these massive data stores more effectively. Traditional data processing technology provides only basic functions such as data query and statistics, and cannot extract the knowledge contained in the data to predict future trends. Therefore, with the rapid development of database technology and the rapid improvement of computing power, data mining (DM) came into existence. Research on DM algorithms draws on knowledge from various fields such as databases, statistics, pattern recognition, and artificial intelligence. Pattern recognition mainly extracts features from known data samples. A DM algorithm using pattern recognition technology is an effective way to obtain useful information from massive data, thus providing decision support, and has good application prospects. The support vector machine (SVM) is a pattern recognition algorithm proposed in recent years that avoids the curse of dimensionality through dimension raising and linearization. On this basis, this paper studies DM algorithms based on pattern recognition and proposes a DM algorithm based on SVM. The algorithm divides the vectors of the SV set into two different types and iterates repeatedly to obtain a classifier that converges to the final result. Finally, cross-validation simulation experiments show that the DM algorithm based on pattern recognition can effectively reduce training time and solve the mining problem for massive data, indicating that the algorithm is reasonable and feasible.


2002 ◽  
Vol 66 (2) ◽  
pp. 640-646 ◽  
Author(s):  
G. W. McCarty ◽  
J. B. Reeves ◽  
V. B. Reeves ◽  
R. F. Follett ◽  
J. M. Kimble

FLORESTA ◽  
2010 ◽  
Vol 40 (3) ◽  
Author(s):  
Paulo Ricardo Gherardi Hein ◽  
José Tarcísio Lima ◽  
Gilles Chaix

Optimization of calibrations based on near infrared spectroscopy for estimation of Eucalyptus wood properties. Near infrared spectroscopy (NIRS) is a rapid, non-destructive technique used for the evaluation, characterization, and classification of materials, especially those of biological origin. Extracting the information contained in NIR spectra is complex and requires the use of chemometric methods. By means of multivariate regression, the absorbance spectra can be related to wood properties, making their prediction in unknown samples possible. Some chemometric tools can improve the fit of the predictive models. The aim of this work was to simulate partial least squares regressions based on NIR spectra and laboratory data, and to study the influence of mathematical treatments, the removal of outliers, and wavelength selection on the fit of models for estimating the basic density and the modulus of elasticity in compression parallel to the grain of Eucalyptus wood. Applying the first and second derivatives to the spectra, discarding outliers, and selecting some of the spectral variables significantly improved the model fit, reducing the standard error and increasing the coefficient of determination and the ratio of performance to deviation.
Keywords: Near infrared spectroscopy; prediction; basic density; MOE; wood; Eucalyptus.
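
The derivative treatments mentioned in this abstract can be illustrated with a small Python sketch. The abstract does not say how the derivatives were computed (Savitzky-Golay filtering is a common choice in NIR work), so the simple central-difference scheme below is an assumption, and the function name and test spectrum are hypothetical.

```python
def spectral_derivative(wavelengths, values):
    """First derivative by central differences, one-sided at both ends."""
    n = len(values)
    if n < 2 or n != len(wavelengths):
        raise ValueError("need at least two points and equal-length inputs")
    result = []
    for i in range(n):
        lo = max(i - 1, 0)
        hi = min(i + 1, n - 1)
        slope = (values[hi] - values[lo]) / (wavelengths[hi] - wavelengths[lo])
        result.append(slope)
    return result


# Hypothetical absorbance values on a 2 nm grid (not data from the study).
wavelengths = [1000, 1002, 1004, 1006, 1008]
absorbance = [0.50, 0.52, 0.56, 0.62, 0.70]

first = spectral_derivative(wavelengths, absorbance)
# The second derivative is obtained by differentiating the first derivative.
second = spectral_derivative(wavelengths, first)
```

Derivatives remove constant and sloping baseline offsets from the spectra, which is one reason they often improve the fit of PLS regression models; they also amplify noise, so smoothing is usually applied alongside them.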


2020 ◽  
Vol 12 (4) ◽  
pp. 1476 ◽  
Author(s):  
Lei Han ◽  
Rui Chen ◽  
Huili Zhu ◽  
Yonghua Zhao ◽  
Zhao Liu ◽  
...  

Soil arsenic (AS) contamination has attracted a great deal of attention because of its detrimental effects on environments and humans. AS and inorganic AS compounds have been classified as a class of carcinogens by the World Health Organization. To select a high-precision method for predicting soil AS content using hyperspectral techniques, we collected 90 soil samples from six different land use types, and obtained the soil AS content by chemical analysis and the hyperspectral data from an indoor hyperspectral experiment. Partial least squares regression (PLSR), support vector regression (SVR), and a back propagation neural network (BPNN) were used to establish a relationship between the hyperspectral data and the soil AS content in order to predict the latter. In addition, the feasibility and modeling accuracy of different spectral resampling intervals, different spectral pretreatment methods, and feature bands versus the full band were compared and discussed to find the best inversion method for estimating soil AS content from hyperspectral data. The results show that 10 nm + second derivative (SD) + BPNN is the optimum method for predicting soil AS content; Rv² is 0.846 and the residual predictive deviation (RPD) is 2.536. These results can expand the representativeness and practicability of the model to a certain extent and provide a scientific basis and technical reference for soil pollution monitoring.
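
The interval resampling step compared in this abstract can be sketched in Python. The abstract does not state how the spectra were resampled, so averaging all bands that fall within each 10 nm window is an assumption here, and the function name and sample spectrum are hypothetical.

```python
def resample_by_mean(wavelengths, reflectance, interval_nm=10.0):
    """Average reflectance within consecutive windows of width interval_nm.

    Assumes wavelengths are sorted in ascending order. Returns the mean
    wavelength and mean reflectance of each non-empty window.
    """
    if not wavelengths or len(wavelengths) != len(reflectance):
        raise ValueError("inputs must be non-empty and of equal length")
    start = wavelengths[0]
    windows = {}
    for w, r in zip(wavelengths, reflectance):
        key = int((w - start) // interval_nm)
        windows.setdefault(key, []).append((w, r))
    centers, means = [], []
    for key in sorted(windows):
        members = windows[key]
        centers.append(sum(w for w, _ in members) / len(members))
        means.append(sum(r for _, r in members) / len(members))
    return centers, means


# Hypothetical 1 nm spectrum from 400 to 419 nm (not data from the study).
wavelengths = list(range(400, 420))
reflectance = [float(i) for i in range(20)]
centers, means = resample_by_mean(wavelengths, reflectance, interval_nm=10.0)
```

Coarser intervals reduce the number of correlated input bands and smooth out noise, at the cost of spectral detail; the abstract reports that a 10 nm interval combined with the second derivative and BPNN gave the best result in that study.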


Plant Methods ◽  
2020 ◽  
Vol 16 (1) ◽  
Author(s):  
Ilse E. Renner ◽  
Vincent A Fritz

Background: Glucobrassicin (GBS) and its hydrolysis product indole-3-carbinol are important nutritional constituents implicated in cancer chemoprevention. Dietary consumption of vegetable sources of GBS, such as cabbage and Brussels sprouts, is linked to tumor suppression, carcinogen excretion, and cancer-risk reduction. High-performance liquid chromatography (HPLC) is the current standard GBS identification method, and quantification is based on UV-light absorption in comparison to known standards or on mass spectrometry. These analytical techniques require expensive equipment, trained laboratory personnel, and hazardous chemicals, and are labor intensive. A rapid, nondestructive, inexpensive quantification method is needed to accelerate the adoption of GBS-enhancing production systems. Such an analytical method would allow producers to quantify the quality of their products and give plant breeders a high-throughput phenotyping tool to increase the scale of their breeding programs for high GBS-accumulating varieties. Near-infrared reflectance spectroscopy (NIRS) paired with partial least squares regression (PLSR) could be a useful tool to develop such a method. Results: Here we demonstrate that GBS concentrations of freeze-dried tissue from a wide variety of cabbage and Brussels sprouts can be predicted using partial least squares regression from NIRS data generated from wavelengths between 950 and 1650 nm. Cross-validation models had R² = 0.75 with RPD = 2.3 for predicting µmol GBS·100 g⁻¹ fresh weight and R² = 0.80 with RPD = 2.4 for predicting µmol GBS·g⁻¹ dry weight. Inspections of equation loadings suggest the molecular associations used in modeling may be due to first overtones from O–H stretching and/or N–H stretching of amines. Conclusions: A calibration model suitable for screening GBS concentration of freeze-dried leaf tissue using NIRS-generated data paired with PLSR can be created for cabbage and Brussels sprouts. Optimal NIRS wavelength ranges for calibration remain an open question.

