Soil properties: Their prediction and feature extraction from the LUCAS spectral library using deep convolutional neural networks

Geoderma ◽  
2021 ◽  
Vol 402 ◽  
pp. 115366
Author(s):  
Liang Zhong ◽  
Xi Guo ◽  
Zhe Xu ◽  
Meng Ding

<p>Soil, as a non-renewable resource, should be monitored continuously to prevent its degradation and promote sustainable agricultural management. Soil spectroscopy in the visible-near infrared range is a fast and cost-effective analytical technique to predict soil properties. The use of large soil spectral libraries can reduce the work needed to reliably estimate soil properties and obtain robust models capable of widespread applicability. Deep learning is well suited to big data analysis, and this approach could herald a profound change in the way we model soil spectral data generally. Accordingly, we explored the modeling potential of deep convolutional neural networks (DCNNs) for soil properties based on a large soil spectral library. The European topsoil dataset provided by the Land Use/Cover Area frame Survey (LUCAS) was used without any spectral pre-processing or added covariates. Two 16-layer DCNN models (ResNet-16 and VGGNet-16) were successfully used to make regression predictions of seven soil properties and classification predictions of soil texture into four groups and 12 levels. Our results showed that the ResNet-16 and VGGNet-16 models produced highly accurate predictions for most soil properties, outperforming both a shallow convolutional neural network and traditional machine learning approaches. Soil organic carbon content, nitrogen content, cation exchange capacity, pH, and calcium carbonate content were well predicted, having a ratio of performance to deviation (RPD) > 2.0. Soil potassium content was adequately predicted (1.4 ≤ RPD ≤ 2.0) and phosphorus content was poorly predicted (RPD < 1.4). The overall classification accuracy of soil texture was 0.749 (four groups) and 0.566 (12 levels). The position of feature wavelengths differed among the soil properties, for which multiple characteristic peaks were common.
This study fully demonstrates the modeling potential of deep learning with soil hyperspectral data, which could bring us closer to achieving precision agriculture.</p>
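
The RPD thresholds quoted in the abstract can be illustrated with a minimal sketch. It assumes the common definition of RPD as the standard deviation of the observed values divided by the root mean squared error of the predictions; the abstract does not state which formula or which standard deviation (sample or population) the authors used, so the sample form here is an assumption. The function names `rpd` and `rpd_class` are hypothetical and are not taken from the paper.

```python
import math
import statistics


def rpd(observed, predicted):
    """Ratio of performance to deviation: SD(observed) / RMSE(observed, predicted).

    Assumption: sample standard deviation (n - 1 denominator).
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    sd = statistics.stdev(observed)
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    rmse = math.sqrt(mse)
    # A perfect prediction has zero error; report an infinite ratio.
    return math.inf if rmse == 0 else sd / rmse


def rpd_class(value):
    """Map an RPD value to the three bands quoted in the abstract."""
    if value > 2.0:
        return "well predicted"
    if value >= 1.4:
        return "adequately predicted"
    return "poorly predicted"


if __name__ == "__main__":
    # Made-up illustrative numbers, not data from the LUCAS library.
    observed = [1.0, 2.0, 3.0, 4.0, 5.0]
    predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
    score = rpd(observed, predicted)
    print(round(score, 2), rpd_class(score))
```

A higher RPD means the prediction error is small relative to the natural spread of the property, which is why the abstract reads values above 2.0 as good predictions.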


BMC Genomics ◽  
2019 ◽  
Vol 20 (S9) ◽  
Author(s):  
Yang-Ming Lin ◽  
Ching-Tai Chen ◽  
Jia-Ming Chang

Abstract Background Tandem mass spectrometry allows biologists to identify and quantify protein samples in the form of digested peptide sequences. When performing peptide identification, spectral library search is more sensitive than traditional database search but is limited to peptides that have been previously identified. An accurate tandem mass spectrum prediction tool is thus crucial in expanding the peptide space and increasing the coverage of spectral library search. Results We propose MS2CNN, a non-linear regression model based on deep convolutional neural networks, a deep learning algorithm. The features for our model are amino acid composition, predicted secondary structure, and physical-chemical features such as isoelectric point, aromaticity, helicity, hydrophobicity, and basicity. MS2CNN was trained with five-fold cross validation on a three-way data split on the large-scale human HCD MS2 dataset of Orbitrap LC-MS/MS downloaded from the National Institute of Standards and Technology. It was then evaluated on a publicly available independent test dataset of human HeLa cell lysate from LC-MS experiments. On average, our model shows better cosine similarity and Pearson correlation coefficient (0.690 and 0.632) than MS2PIP (0.647 and 0.601) and is comparable with pDeep (0.692 and 0.642). Notably, for the more complex MS2 spectra of 3+ peptides, MS2CNN is significantly better than both MS2PIP and pDeep. Conclusions We showed that MS2CNN outperforms MS2PIP for 2+ and 3+ peptides and pDeep for 3+ peptides. This implies that MS2CNN, the proposed convolutional neural network model, generates highly accurate MS2 spectra for LC-MS/MS experiments using Orbitrap machines, which can be of great help in protein and peptide identifications. The results suggest that incorporating more data for the deep learning model may improve performance.
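
The two evaluation metrics named in the abstract, cosine similarity and the Pearson correlation coefficient, can be sketched as follows for a predicted and an observed spectrum. The sketch assumes both spectra are represented as equal-length intensity vectors over the same fragment ions; how MS2CNN actually aligns and normalizes peaks is not described in the abstract. The function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length intensity vectors."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty vectors of equal length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def pearson_correlation(a, b):
    """Pearson correlation, computed as the cosine similarity of the
    mean-centered vectors."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty vectors of equal length")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine_similarity([x - mean_a for x in a], [y - mean_b for y in b])


if __name__ == "__main__":
    # Made-up intensities for illustration only, not data from the paper.
    observed = [0.0, 0.8, 0.1, 1.0, 0.3]
    predicted = [0.1, 0.7, 0.0, 0.9, 0.4]
    print(cosine_similarity(observed, predicted))
    print(pearson_correlation(observed, predicted))
```

The two metrics differ only in the centering step: cosine similarity depends on the overall intensity level of the vectors, whereas Pearson correlation does not, which is one reason the two scores reported in the abstract are not identical.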

