Prediction of soil classes in a complex landscape in Southern Brazil

Author(s):  
Jean Michel Moura-Bueno ◽  
Ricardo Simão Diniz Dalmolin ◽  
Taciara Zborowski Horst-Heinen ◽  
Luciano Campos Cancian ◽  
Ricardo Bergamo Schenato ◽  
...  

Abstract: The objective of this work was to evaluate the use of covariate selection by expert knowledge on the performance of soil class predictive models in a complex landscape, in order to identify the best predictive model for digital soil mapping in the Southern region of Brazil. A total of 164 points were sampled in the field using the conditioned Latin hypercube, considering the covariates elevation, slope, and aspect. From the digital elevation model, environmental covariates were extracted, composing three sets, made up of: 21 covariates, covariates after the exclusion of the multicollinear ones, and covariates chosen by expert knowledge. Prediction was performed with the following models: decision tree, random forest, multiple logistic regression, and support vector machine. The accuracy of the models was evaluated by the kappa index (K), general accuracy (GA), and class accuracy. The prediction models were sensitive to the disproportionate sampling of soil classes. The best predicted map achieved a GA of 71% and K of 0.59. The use of the covariate set chosen by expert knowledge improves model performance in predicting soil classes in a complex landscape, and random forest is the best model for the spatial prediction of soil classes.
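
The two map-level metrics named in the abstract, general accuracy and the kappa index, can both be read off a confusion matrix. The following is a minimal sketch assuming Cohen's kappa is the kappa intended; the function name `accuracy_and_kappa` and the 3-class matrix are hypothetical illustrations, not data from the study.

```python
def accuracy_and_kappa(confusion):
    """Return (general accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] is the number of samples whose true class is i and
    whose predicted class is j.
    """
    k = len(confusion)
    n = sum(sum(row) for row in confusion)

    # Observed agreement: share of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / n

    # Agreement expected by chance, from the row and column marginals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)

    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical confusion matrix for three soil classes (made-up counts).
example = [
    [20, 5, 0],
    [4, 15, 1],
    [1, 2, 12],
]
ga, kappa = accuracy_and_kappa(example)
print(f"GA = {ga:.2f}, K = {kappa:.2f}")
```

Because kappa subtracts chance agreement, it sits below general accuracy whenever the class marginals are uneven, which is relevant when soil classes are sampled disproportionately.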

2020 ◽  
Vol 18 (1) ◽  
Author(s):  
Kerry E. Poppenberg ◽  
Vincent M. Tutino ◽  
Lu Li ◽  
Muhammad Waqas ◽  
Armond June ◽  
...  

Abstract Background: Intracranial aneurysms (IAs) are dangerous because of their potential to rupture. We previously found significant RNA expression differences in circulating neutrophils between patients with and without unruptured IAs and trained machine learning models to predict presence of IA using 40 neutrophil transcriptomes. Here, we aim to develop a predictive model for unruptured IA using neutrophil transcriptomes from a larger population and more robust machine learning methods. Methods: Neutrophil RNA extracted from the blood of 134 patients (55 with IA, 79 IA-free controls) was subjected to next-generation RNA sequencing. In a randomly-selected training cohort (n = 94), the Least Absolute Shrinkage and Selection Operator (LASSO) selected transcripts, from which we constructed prediction models via 4 well-established supervised machine-learning algorithms (K-Nearest Neighbors, Random Forest, and Support Vector Machines with Gaussian and cubic kernels). We tested the models in the remaining samples (n = 40) and assessed model performance by receiver-operating-characteristic (ROC) curves. Real-time quantitative polymerase chain reaction (RT-qPCR) of 9 IA-associated genes was used to verify gene expression in a subset of 49 neutrophil RNA samples. We also examined the potential influence of demographics and comorbidities on model prediction. Results: Feature selection using LASSO in the training cohort identified 37 IA-associated transcripts. Models trained using these transcripts had a maximum accuracy of 90% in the testing cohort. The testing performance across all methods had an average area under ROC curve (AUC) = 0.97, an improvement over our previous models. The Random Forest model performed best across both training and testing cohorts. RT-qPCR confirmed expression differences in 7 of 9 genes tested. Gene ontology and IPA network analyses performed on the 37 model genes reflected dysregulated inflammation, cell signaling, and apoptosis processes. In our data, demographics and comorbidities did not affect model performance. Conclusions: We improved upon our previous IA prediction models based on circulating neutrophil transcriptomes by increasing sample size and by implementing LASSO and more robust machine learning methods. Future studies are needed to validate these models in larger cohorts and further investigate effect of covariates.
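
The area under the ROC curve used to assess the models can be computed directly from held-out labels and classifier scores. The sketch below uses the rank-based (Mann-Whitney) formulation, which is one standard way to obtain AUC and not necessarily the procedure the authors used; the function name `roc_auc` and the six scores are hypothetical, not data from the study.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    labels: iterable of 0/1 class labels (1 = positive, e.g. IA present).
    scores: iterable of classifier scores, higher meaning "more positive".
    Ties between a positive and a negative count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")

    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical test-cohort labels and scores (made-up values).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a test cohort of a few dozen patients; a sort-based version would be preferable for large datasets.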


2020 ◽  
Author(s):  
Kerry E Poppenberg ◽  
Vincent M Tutino ◽  
Lu Li ◽  
Muhammad Waqas ◽  
Armond June ◽  
...  

Abstract Background: Intracranial aneurysms (IAs) are dangerous because of their potential to rupture. We previously found significant RNA expression differences in circulating neutrophils between patients with and without unruptured IAs and trained machine learning models to predict presence of IA using 40 neutrophil transcriptomes. Here, we aim to develop a predictive model for unruptured IA using neutrophil transcriptomes from a larger population and more robust machine learning methods. Methods: Neutrophil RNA extracted from the blood of 134 patients (55 with IA, 79 IA-free controls) was subjected to next-generation RNA sequencing. In a randomly-selected training cohort (n=94), the Least Absolute Shrinkage and Selection Operator (LASSO) selected transcripts, from which we constructed prediction models via 4 well-established supervised machine-learning algorithms (K-Nearest Neighbors, Random Forest, and Support Vector Machines with Gaussian and cubic kernels). We tested the models in the remaining samples (n=40) and assessed model performance by receiver-operating-characteristic (ROC) curves. Real-time quantitative polymerase chain reaction (RT-qPCR) of 9 IA-associated genes was used to verify gene expression in a subset of 49 neutrophil RNA samples. We also examined the potential influence of demographics and comorbidities on model prediction. Results: Feature selection using LASSO in the training cohort identified 37 IA-associated transcripts. Models trained using these transcripts had a maximum accuracy of 90% in the testing cohort. The testing performance across all methods had an average area under ROC curve (AUC)=0.97, an improvement over our previous models. The Random Forest model performed best across both training and testing cohorts. RT-qPCR confirmed expression differences in 7 of 9 genes tested. Gene ontology and IPA network analyses performed on the 37 model genes reflected dysregulated inflammation, cell signaling, and apoptosis processes. 
In our data, demographics and comorbidities did not affect model performance. Conclusions: We improved upon our previous IA prediction models based on circulating neutrophil transcriptomes by increasing sample size and by implementing LASSO and more robust machine learning methods. Future studies are needed to validate these models in larger cohorts and further investigate effect of covariates.


2020 ◽  
Author(s):  
Kerry E Poppenberg ◽  
Vincent M Tutino ◽  
Lu Li ◽  
Muhammad Waqas ◽  
Armond June ◽  
...  

Abstract Background: Intracranial aneurysms (IAs) are dangerous because of their potential to rupture. We previously found significant RNA expression differences in circulating neutrophils between patients with and without unruptured IAs and trained machine learning models to predict presence of IA using 40 neutrophil transcriptomes. Here, we aim to develop a predictive model for unruptured IA using neutrophil transcriptomes from a larger population and more robust machine learning methods. Methods: Neutrophil RNA extracted from the blood of 134 patients (55 with IA, 79 IA-free controls) was subjected to next-generation RNA sequencing. In a randomly-selected training cohort (n=94), the Least Absolute Shrinkage and Selection Operator (LASSO) selected transcripts, from which we constructed prediction models via 4 well-established supervised machine-learning algorithms (K-Nearest Neighbors, Random Forest, and Support Vector Machines with Gaussian and cubic kernels). We tested the models in the remaining samples (n=40) and assessed model performance by receiver-operating-characteristic (ROC) curves. Real-time quantitative polymerase chain reaction (RT-qPCR) of 10 IA-associated genes was used to verify gene expression in a subset of 50 neutrophil RNA samples. We also examined the potential influence of demographics and comorbidities on model prediction. Results: Feature selection using LASSO in the training cohort identified 37 IA-associated transcripts. Models trained using these transcripts had a maximum accuracy of 90% in the testing cohort. The testing performance across all methods had an average area under ROC curve (AUC)=0.97, an improvement over our previous models. The Random Forest model performed best across both training and testing cohorts. RT-qPCR confirmed expression differences in 8 of 10 genes tested. Gene ontology and IPA network analyses performed on the 37 model genes reflected dysregulated inflammation, cell signaling, and apoptosis processes. In our data, demographics and comorbidities did not affect model performance. Conclusions: We improved upon our previous IA prediction models based on circulating neutrophil transcriptomes by increasing sample size and by implementing LASSO and more robust machine learning methods. Future studies are needed to validate these models in larger cohorts and further investigate effect of covariates.


2020 ◽  
Vol 10 (24) ◽  
pp. 9151
Author(s):  
Yun-Chia Liang ◽  
Yona Maimury ◽  
Angela Hsiang-Ling Chen ◽  
Josue Rodolfo Cuevas Juarez

Air, an essential natural resource, has been compromised in terms of quality by economic activities. Considerable research has been devoted to predicting instances of poor air quality, but most studies are limited by insufficient longitudinal data, making it difficult to account for seasonal and other factors. Several prediction models have been developed using an 11-year dataset collected by Taiwan’s Environmental Protection Administration (EPA). Machine learning methods, including adaptive boosting (AdaBoost), artificial neural network (ANN), random forest, stacking ensemble, and support vector machine (SVM), produce promising results for air quality index (AQI) level predictions. A series of experiments, using datasets for three different regions to obtain the best prediction performance from the stacking ensemble, AdaBoost, and random forest, found the stacking ensemble delivers consistently superior performance for R² and RMSE, while AdaBoost provides best results for MAE.
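
The three scores compared in the abstract can rank models differently because they weight errors differently. The following is a minimal sketch of the standard definitions; the function name `regression_metrics` and the four AQI values are hypothetical and not taken from the EPA dataset.

```python
import math


def regression_metrics(observed, predicted):
    """Return (R^2, RMSE, MAE) for paired observed and predicted values."""
    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]

    # MAE weights every error linearly.
    mae = sum(abs(r) for r in residuals) / n

    # RMSE squares errors first, so large misses dominate.
    ss_res = sum(r * r for r in residuals)
    rmse = math.sqrt(ss_res / n)

    # R^2 compares residual variation with variation around the mean.
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    r2 = 1 - ss_res / ss_tot
    return r2, rmse, mae


# Hypothetical AQI observations and predictions (made-up values).
observed = [50, 60, 70, 80]
predicted = [52, 58, 73, 79]
r2, rmse, mae = regression_metrics(observed, predicted)
print(f"R2 = {r2:.3f}, RMSE = {rmse:.3f}, MAE = {mae:.3f}")
```

Since RMSE penalizes large errors more heavily than MAE, a model with a few large misses can have the better MAE but the worse RMSE, which is consistent with different methods leading on different metrics.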


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Lei Li ◽  
Desheng Wu

Purpose: Infractions of securities regulations (ISRs) by listed firms in their day-to-day operations and management have become a common problem. This paper proposes several machine learning approaches to forecast the risk of infractions by listed companies, addressing financial supervision that is neither effective nor precise. Design/methodology/approach: The proposed research framework for forecasting ISRs includes data collection and cleaning, feature engineering, data splitting, application of the prediction approaches, and model performance evaluation. We select Logistic Regression, Naïve Bayes, Random Forest, Support Vector Machines, Artificial Neural Network and Long Short-Term Memory Networks (LSTMs) as ISR prediction models. Findings: The results show that models that include prior infractions predict ISRs significantly better than those without them, especially for large sample sets. The results also indicate that, when judging whether a company has committed infractions, attention should be paid to novel artificial intelligence methods, the company's previous infractions, and large data sets. Originality/value: The findings could be used, to a certain degree, to help identify ISRs by listed companies. Overall, the results demonstrate the value of prior infractions of securities regulations as a predictor. This shows the importance of including more data sources when constructing distress models rather than only building increasingly complex models on the same data. This is also beneficial to regulatory authorities.


2020 ◽  
Author(s):  
Zhanyou Xu ◽  
Andreomar Kurek ◽  
Steven B. Cannon ◽  
Williams D. Beavis

Abstract: Selection of markers linked to alleles at quantitative trait loci (QTL) for tolerance to Iron Deficiency Chlorosis (IDC) has not been successful. Genomic selection has been advocated for continuous numeric traits such as yield and plant height. For ordinal data types such as IDC, genomic prediction models have not been systematically compared. The objectives of the research reported in this manuscript were to compare the most commonly used genomic prediction method, ridge regression, and its equivalent logistic ridge regression method with algorithmic modeling methods including random forest, gradient boosting, support vector machine, K-nearest neighbors, Naïve Bayes, and artificial neural network, using the usual comparator metric of prediction accuracy. In addition, we compared the methods using metrics of greater importance for decisions about selecting and culling lines for use in variety development and genetic improvement projects. These metrics include specificity, sensitivity, precision, decision accuracy, and area under the receiver operating characteristic curve. We found that Support Vector Machine provided the best specificity for culling IDC susceptible lines, while Random Forest GP models provided the best combined set of decision metrics for retaining IDC tolerant and culling IDC susceptible lines.
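
Several of the decision metrics listed above follow directly from the four cells of a binary confusion table. The sketch below is an illustration under stated assumptions: the function name `decision_metrics` is hypothetical, "tolerant" is arbitrarily treated as the positive class, the counts are made up, and the plain accuracy computed here is not claimed to match the manuscript's "decision accuracy".

```python
def decision_metrics(tp, fp, tn, fn):
    """Compute basic decision metrics from binary confusion-table counts.

    tp: tolerant lines predicted tolerant     (true positives)
    fp: susceptible lines predicted tolerant  (false positives)
    tn: susceptible lines predicted susceptible (true negatives)
    fn: tolerant lines predicted susceptible  (false negatives)
    """
    return {
        # Share of truly tolerant lines that are retained.
        "sensitivity": tp / (tp + fn),
        # Share of truly susceptible lines that are culled.
        "specificity": tn / (tn + fp),
        # Share of retained lines that are truly tolerant.
        "precision": tp / (tp + fp),
        # Share of all lines classified correctly.
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Hypothetical counts for 100 breeding lines (made-up values).
metrics = decision_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Specificity matters most when the goal is culling susceptible lines, while precision matters most when the goal is trusting the lines that are retained, so a single accuracy figure can hide the trade-off between them.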


2021 ◽  
Author(s):  
Eve Daly ◽  
David O Leary

Peatlands are becoming recognized as important carbon sequestration centres. Through restoration projects of peatlands in which the water table is raised, they may become carbon neutral or possibly carbon negative. Restoration projects require a knowledge of intra-peat variation across potentially large spatial areas. This is often difficult with traditional in-situ point measurements. The integration of multidimensional geophysical datasets and digital elevation models, combined with modern data analytical techniques, may provide a rapid means of accessing intra-peat variation. In this study, an airborne radiometric survey, being flown nationally over the Republic of Ireland, combined with a digital elevation model, is used to delineate areas within an industrial peatland where peat thickness is less than 1m. Radiometric data are particularly suited to peat studies as they are sensitive to water content and peat thickness and require relatively little expert knowledge to utilise. Peat, as a mostly organic material, acts as a low signal environment where variations in the signal are linked to intra-peat variation of thickness, density and/or water content. This study uses an unsupervised machine learning, self-organizing map clustering methodology to group the study site into three zones interpreted as 1) the edge of the bog where peat layer is thinning or there is influence on the radiometric signal from non-peat soils outside of the bog, 2) the normal peat conditions where thickness and saturation appear as a relative constant in the radiometric response, and 3) areas where the peat is either thinner or drier. A ground geophysical survey was conducted to verify this interpretation. The delineation of such spatial variations in the radiometric response could aid any restoration project in the initial stages or act as a baseline study to monitor changes to the peatland during and after a restoration project is complete.
Future work will see this methodology extended to other peatland types such as blanket bogs and natural raised bogs, as well as the integration of concurrent airborne electromagnetic data to link the near-surface radiometric response to the deeper vadose zone and define a more comprehensive classification scheme for these peatland sites.


Water ◽  
2020 ◽  
Vol 12 (8) ◽  
pp. 2160
Author(s):  
Daniel Kibirige ◽  
Endre Dobos

Soil moisture (SM) is a key variable in the climate system and a key parameter in earth surface processes. This study aimed to test citizen observatory (CO) data in developing a method to estimate surface SM distribution using Sentinel-1B C-band Synthetic Aperture Radar (SAR) and Landsat 8 data acquired between January 2019 and June 2019. An agricultural region of Tard in western Hungary was chosen as the study area. In situ soil moisture measurements in the uppermost 10 cm were carried out in 36 test fields simultaneously with SAR data acquisition. The effects of environmental covariates and the backscattering coefficient on SM were analyzed to perform SM estimation procedures. Three approaches were developed and compared for a continuous four-month period, using multiple regression analysis, regression-kriging and cokriging with the digital elevation model (DEM), and Sentinel-1B C-band and Landsat 8 images. CO data were evaluated over the landscape by expert knowledge and found to be representative of the major SM distribution processes, but they also showed some short-range variability that was difficult to explain at this scale. The proposed models were evaluated using two statistical metrics: the coefficient of determination (R²) and root mean square error (RMSE). Multiple linear regression provides more realistic spatial patterns over the landscape, even in a data-poor environment. Regression kriging was found to be a potential tool to refine the results, while ordinary cokriging was found to be less effective. The obtained results showed that CO data complemented with Sentinel-1B SAR, Landsat 8, and terrain data have the potential to estimate and map soil moisture content.


2020 ◽  
Vol 12 (17) ◽  
pp. 2767
Author(s):  
Yu Chen ◽  
Yongming Wei ◽  
Qinjun Wang ◽  
Fang Chen ◽  
Chunyan Lu ◽  
...  

A serious earthquake can trigger thousands of landslides and leave some slopes more prone to sliding in the future. Landslides threaten human lives and property, so mapping post-earthquake landslide susceptibility is valuable for a rapid response to landslide disasters in terms of relief resource allocation and post-earthquake reconstruction. Previous researchers have proposed many methods to map landslide susceptibility but have seldom considered the spatial structure information of the factors that influence a slide. In this study, we first developed a U-net-like model suitable for mapping post-earthquake landslide susceptibility. Post-earthquake high-spatial-resolution airborne images were used to produce a landslide inventory. Pre-earthquake Landsat TM (Thematic Mapper) images and influencing factors such as digital elevation model (DEM), slope, aspect, multi-scale topographic position index (mTPI), lithology, fault, road network, stream network, and macroseismic intensity (MI) were prepared as the input layers of the model. Application of the model to the heavily hit area of the destructive 2008 Wenchuan earthquake resulted in a high validation accuracy (precision 0.77, recall 0.90, F1 score 0.83, and AUC 0.90). The performance of the U-net-like model was also compared with that of traditional logistic regression (LR) and support vector machine (SVM) models on both the model area and an independent testing area, and the U-net-like model outperformed both traditional models. The U-net-like model introduced in this paper suggests that balancing the environmental influence of a pixel itself against that of its surrounding pixels is a useful and feasible way to improve landslide susceptibility mapping (LSM) when using remote sensing and GIS technology.
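
The reported F1 score can be cross-checked against the reported precision and recall, since F1 is their harmonic mean. The short sketch below uses only the three figures quoted in the abstract; the function name `f1_score` is a hypothetical helper.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall as quoted in the abstract.
f1 = f1_score(0.77, 0.90)
print(f"F1 = {f1:.2f}")
```

Here 2 × 0.77 × 0.90 / (0.77 + 0.90) = 1.386 / 1.67, which rounds to the 0.83 stated in the abstract, so the three figures are mutually consistent.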


2011 ◽  
Vol 62 (1) ◽  
pp. 5-16 ◽  
Author(s):  
Sebastian Vogel ◽  
Michael Märker ◽  
Florian Seiler

Revised modelling of the post-AD 79 volcanic deposits of Somma-Vesuvius to reconstruct the pre-AD 79 topography of the Sarno River plain (Italy)

In this study the methodology proposed by Vogel & Märker (2010) to reconstruct the pre-AD 79 topography and paleo-environmental features of the Sarno River plain (Italy) was considerably revised and improved. The methodology is based on an extensive dataset of stratigraphical information from the entire Sarno River plain, a high-resolution present-day digital elevation model (DEM) and a classification and regression tree approach. The dataset was re-evaluated and 32 additional stratigraphical drillings were collected in areas that were not or insufficiently covered by previous stratigraphic data. Altogether, an assemblage of 1,840 drillings was utilized, containing information about the depth from the present-day surface to the pre-AD 79 paleo-surface (thickness of post-AD 79 deposits) and the character of the pre-AD 79 paleo-layer of the Sarno River plain. Moreover, improved preprocessing of the input parameters yielded a distinct improvement in model performance in comparison to the previous model of Vogel & Märker (2010). Subsequently, a spatial model of the post-AD 79 deposits was generated. The modelled deposits were then used to reconstruct the pre-AD 79 topography of the Sarno River plain. Moreover, paleo-environmental and paleo-geomorphological features such as the paleo-coastline, the paleo-Sarno River and its floodplain, alluvial fans near the Tyrrhenian coast as well as abrasion terraces of historical and protohistorical coastlines were identified. This reconstruction represents a qualitative improvement of the previous work by Vogel & Märker (2010).

