Estimation of groundwater CO2 concentrations on a catchment scale using Random Forest

Author(s):  
Stefan Baltruschat ◽  
Steffen Bender ◽  
Jens Hartmann ◽  
Annika Nolte

<p>Water-rock interactions in the saturated and unsaturated zones govern the natural variability of CO<sub>2</sub> in groundwater. However, anthropogenic pollution, such as excessive input of organic and inorganic fertilizers or sewage leakage, can shift the carbonate-pH system in an aquifer. Additional dissolution of minerals and the associated mobilization of harmful heavy metals are possible consequences. Anthropogenic groundwater pollution is a particular issue where a protective confining layer is absent. On the other hand, attributing an environmental hazard such as fertilizer input to a single parameter remains intricate because of the large number of possible competing reactions, such as microbially controlled redox reactions. To overcome these obstacles, statistical methods based on machine learning are becoming increasingly important.</p><p>This study attempts to predict the CO<sub>2</sub> concentration in groundwater from a multi-feature selection using Random Forest. For this purpose, groundwater chemistry data (in situ measured bulk parameters, major ions, nutrients, trace elements and more) from more than 23000 wells and springs in Germany were collected and homogenized in a single database. Measured or calculated CO<sub>2</sub> concentrations are used to train the Random Forest algorithm and later to validate the model results. Besides chemistry data, features describing hydrogeology, soil characteristics, land use/land cover and climate serve as predictors to build the “forest”. The intention of this study is to establish comprehensive CO<sub>2</sub> predictions based on surface and climate features and to identify trends in local CO<sub>2</sub> production. The knowledge gained can be used as input for groundwater quality management processes and adaptation policies.</p>
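The abstract does not give the model configuration. As an illustration of the general Random Forest idea only (bootstrap resampling of the training wells, a random subset of features considered at each split, and averaging of the tree predictions), a minimal pure-Python sketch follows. The function names, default hyperparameters and any toy data are hypothetical and do not reproduce the authors' setup.

```python
import random
from statistics import mean


def build_tree(rows, targets, n_try, max_depth, min_leaf, rng):
    """Grow a regression tree; each split searches only n_try randomly chosen features."""
    if max_depth == 0 or len(targets) < 2 * min_leaf or len(set(targets)) == 1:
        return mean(targets)  # leaf node: mean of the training targets that reach it
    best = None  # (sum of squared errors, feature index, threshold)
    for f in rng.sample(range(len(rows[0])), n_try):
        for threshold in sorted({r[f] for r in rows}):
            left = [t for r, t in zip(rows, targets) if r[f] <= threshold]
            right = [t for r, t in zip(rows, targets) if r[f] > threshold]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            m_left, m_right = mean(left), mean(right)
            sse = sum((t - m_left) ** 2 for t in left) + sum((t - m_right) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, f, threshold)
    if best is None:  # no admissible split on the sampled features
        return mean(targets)
    _, f, threshold = best
    left = [(r, t) for r, t in zip(rows, targets) if r[f] <= threshold]
    right = [(r, t) for r, t in zip(rows, targets) if r[f] > threshold]
    return (f, threshold,
            build_tree([r for r, _ in left], [t for _, t in left], n_try, max_depth - 1, min_leaf, rng),
            build_tree([r for r, _ in right], [t for _, t in right], n_try, max_depth - 1, min_leaf, rng))


def predict_tree(node, x):
    """Walk from the root to a leaf; internal nodes are (feature, threshold, left, right) tuples."""
    while isinstance(node, tuple):
        f, threshold, left, right = node
        node = left if x[f] <= threshold else right
    return node


def fit_forest(X, y, n_trees=50, n_try=1, max_depth=4, min_leaf=2, seed=0):
    """Train each tree on a bootstrap sample (drawn with replacement) of the rows."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 n_try, max_depth, min_leaf, rng))
    return forest


def predict_forest(forest, x):
    """The forest prediction is the average of the individual tree predictions."""
    return mean(predict_tree(tree, x) for tree in forest)
```

In practice a library implementation (for example scikit-learn's `RandomForestRegressor`) would be used on a dataset of this size; the sketch only makes the bagging and feature-subsampling steps explicit.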

2012 ◽  
Vol 43 (4) ◽  
pp. 531-546 ◽  
Author(s):  
A. Alaoui ◽  
P. Spiess ◽  
M. Beyeler ◽  
R. Weingartner

The main aims of this study were to identify and characterize flow processes at the plot scale and to up-scale these processes to the catchment scale with TauDEM (Terrain Analysis Using Digital Elevation Models), based on in-situ sprinkling experiments. To calibrate the TauDEM-based method at the plot scale, in-situ sprinkling experiments were carried out at two plot scales (16 m² divided into 16 plots of 1 m² on various slopes). The marked differences in textural and structural porosity between forest and grassland soils appear to control runoff processes. While grassland soils were characterized by variable subsurface flow depending mainly on field slope, deep percolation was found mainly in forest soils. In addition, the map of flow directions shows that two factors play an important role: on the one hand, the spatial sequence of areas predisposed to surface runoff, and on the other, the tortuosity and length of the channels that enhance the cumulative water volume at the target outlets. When based on sprinkling experiments, the TauDEM-based method provides more quantitative information on flow dynamics at the catchment scale. However, additional investigations are needed to validate the flow calculations at a larger scale.
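TauDEM derives flow directions and contributing areas from a DEM; the abstract does not state which of its flow models was used. The sketch below shows the classic single-direction D8 idea (each cell drains to its steepest downslope neighbour, and flow accumulation counts the cells draining through each cell) on a small in-memory grid. It assumes square cells and does no pit filling or flat-area handling, which production tools typically provide; the function names are hypothetical.

```python
import math

# Row/column offsets of the eight D8 neighbours.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def d8_directions(dem):
    """Map each (row, col) to its steepest downslope neighbour, or None for pits/outlets."""
    rows, cols = len(dem), len(dem[0])
    directions = {}
    for r in range(rows):
        for c in range(cols):
            best, best_slope = None, 0.0
            for dr, dc in NEIGHBOURS:
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    # Diagonal neighbours are sqrt(2) cell widths away.
                    slope = (dem[r][c] - dem[nr][nc]) / math.hypot(dr, dc)
                    if slope > best_slope:
                        best, best_slope = (nr, nc), slope
            directions[(r, c)] = best
    return directions


def flow_accumulation(dem):
    """Number of cells (including the cell itself) draining through each cell."""
    directions = d8_directions(dem)
    acc = {cell: 1 for cell in directions}
    # Flow always goes strictly downhill, so visiting cells from highest to lowest
    # guarantees every upstream contribution is added before a cell passes it on.
    for cell in sorted(directions, key=lambda rc: dem[rc[0]][rc[1]], reverse=True):
        downstream = directions[cell]
        if downstream is not None:
            acc[downstream] += acc[cell]
    return acc
```

On a 3 × 3 grid that slopes down towards its top-right corner, every cell ends up draining to that corner, whose accumulation then equals the number of cells (9).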


2018 ◽  
Vol 10 (9) ◽  
pp. 1394 ◽  
Author(s):  
Emile Ndikumana ◽  
Dinh Ho Tong Minh ◽  
Hai Dang Nguyen ◽  
Nicolas Baghdadi ◽  
Dominique Courault ◽  
...  

The research and improvement of methods for crop monitoring are currently major challenges, especially for radar images because of their inherent speckle noise. The European Space Agency’s (ESA) Sentinel-1 constellation provides synthetic aperture radar (SAR) image coverage with a 6-day revisit period at a high spatial resolution (20 m pixel spacing). Sentinel-1 data are highly useful because they provide valuable information on vegetation cover. The objective of this work is to study the capabilities of multitemporal radar images for retrieving rice height and dry biomass from Sentinel-1 data. To do this, we train Sentinel-1 data against ground measurements with classical machine learning techniques (Multiple Linear Regression (MLR), Support Vector Regression (SVR) and Random Forest (RF)) to estimate rice height and dry biomass. The study is carried out on a multitemporal Sentinel-1 dataset acquired from May 2017 to September 2017 over the Camargue region, southern France. In-situ ground measurements of rice height and dry biomass were collected over 11 rice fields during the same period. The images were processed to produce a C-band radar stack including dual-polarization VV (vertical transmit, vertical receive) and VH (vertical transmit, horizontal receive) data. We found that the non-parametric methods (SVR and RF) outperformed the parametric MLR method for retrieving rice biophysical parameters. Rice height retrieved from dual-polarization data was strongly correlated with in-situ rice height, and Random Forest yielded the best performance, with a coefficient of determination R² = 0.92 and a root mean square error (RMSE) of 16% (7.9 cm). In addition, we demonstrated that the correlation of the Sentinel-1 signal with biomass was also very high in VH polarization, with R² = 0.9 and RMSE = 18% (162 g·m⁻²) (with the Random Forest method).
Such results indicate that high-quality Sentinel-1 radar data could be exploited effectively for rice biomass and height retrieval and could be used for operational tasks.
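The abstract reports both absolute RMSE and RMSE as a percentage (for example 16% alongside 7.9 cm) without stating the normalization. The sketch below assumes common definitions: R² as 1 − SS_res/SS_tot, RMSE as the square root of the mean squared residual, and relative RMSE as RMSE divided by the mean observed value. R² is sometimes reported instead as the squared Pearson correlation, which can differ. The function name and sample values are hypothetical.

```python
import math


def regression_scores(observed, predicted):
    """R^2, RMSE and relative RMSE (percent of the observed mean) for paired values."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))  # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)               # total sum of squares
    rmse = math.sqrt(ss_res / n)
    return {
        "r2": 1 - ss_res / ss_tot,
        "rmse": rmse,                               # same unit as the data, e.g. cm
        "rrmse_percent": 100 * rmse / mean_obs,     # assumed normalization by the mean
    }
```

For hypothetical heights observed = [40, 50, 60, 70] cm and predicted = [42, 48, 61, 69] cm, the residual sum of squares is 10 and the total sum of squares is 500, so R² = 0.98 and RMSE = √2.5 ≈ 1.58 cm (about 2.9% of the 55 cm mean).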


2019 ◽  
Vol 11 (14) ◽  
pp. 1719 ◽  
Author(s):  
Jiaxin Mi ◽  
Yongjun Yang ◽  
Shaoliang Zhang ◽  
Shi An ◽  
Huping Hou ◽  
...  

Understanding changes in land use/land cover (LULC) is important for environmental assessment and land management. However, tracking LULC dynamics has proved difficult, especially in large-scale underground mining areas with extensive LULC heterogeneity and a history of multiple disturbances, and further methodological research in this field is still needed. In this study, we tracked LULC change between 1987 and 2017 in the Nanjiao mining area, Shanxi Province, China, where years of underground mining and reforestation projects have taken place, using a random forest classifier and continuous Landsat imagery. We applied a Savitzky–Golay filter and a normalized difference vegetation index (NDVI)-based approach to detect temporal and spatial change, respectively. The accuracy assessment shows that the random forest classifier performs well in this heterogeneous area, with accuracies ranging from 81.92% to 86.6%, higher than those of the support vector machine (SVM), neural network (NN), and maximum likelihood (ML) algorithms. The LULC classification results reveal that cultivated forest in the mining area increased significantly after 2004, while the spatial extent of natural forest, buildings, and farmland decreased significantly after 2007. Significant vegetation loss was mainly caused by the transformation of natural forest and shrubs into grassland and bare land, respectively, whereas obvious NDVI increases were mainly caused by the conversion of grassland and buildings into cultivated forest, especially where villages were abandoned after mining subsidence. A partial correlation analysis showed that the extent of LULC change was significantly related to coal production and reforestation, indicating the effects of underground mining and reforestation projects on LULC change.
This study suggests that continuous Landsat classification with a random forest classifier could be effective for monitoring long-term LULC dynamics and could provide crucial information and data for understanding the driving forces of LULC change, environmental impact assessment, and ecological protection planning in large-scale mining areas.
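The abstract names a Savitzky–Golay filter for temporal change and an NDVI-based approach for spatial change but gives neither the filter window nor the change rule. The sketch below assumes the standard 5-point, second-order Savitzky–Golay smoothing weights (−3, 12, 17, 12, −3)/35, leaves the two values at each end of the series unsmoothed, and computes a simple per-pixel NDVI difference between two dates; these choices and the function names are hypothetical.

```python
# Weights of the 5-point, second-order Savitzky-Golay smoothing filter (they sum to 35).
SG5_WEIGHTS = (-3, 12, 17, 12, -3)


def ndvi(nir, red):
    """Normalized difference vegetation index from near-infrared and red reflectance."""
    return (nir - red) / (nir + red)


def savitzky_golay_5(series):
    """Smooth a time series; the first and last two values are left unchanged (assumed edge handling)."""
    smoothed = list(series)
    for i in range(2, len(series) - 2):
        window = series[i - 2:i + 3]
        smoothed[i] = sum(w * v for w, v in zip(SG5_WEIGHTS, window)) / 35
    return smoothed


def ndvi_change(nir_before, red_before, nir_after, red_after):
    """Per-pixel NDVI difference between two dates; positive values indicate greening."""
    return [ndvi(n1, r1) - ndvi(n0, r0)
            for n0, r0, n1, r1 in zip(nir_before, red_before, nir_after, red_after)]
```

Because these weights fit a local quadratic, a linear trend such as [0, 1, 2, 3, 4, 5, 6] passes through unchanged, while an isolated spike of 35 is damped to 17.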


2020 ◽  
Author(s):  
Luca Zappa ◽  
Matthias Forkel ◽  
Angelika Xaver ◽  
Wouter Dorigo

<p>Remotely sensed data from microwave sensors have been used successfully to retrieve soil moisture on a global scale. In particular, passive and active microwave sensors with large footprints can observe the same location at (sub-)daily frequency but typically have spatial resolutions on the order of tens of km. Therefore, such coarse-scale products can accurately capture the temporal dynamics of soil moisture but cannot provide spatial detail. However, several agricultural and hydrological applications could benefit greatly from soil moisture observations at sub-kilometer spatial resolution that preserve a daily revisit time.</p><p>Here, we present a framework for downscaling coarse-resolution satellite soil moisture products (ASCAT and SMAP) to high spatial resolution. Specifically, we build robust relationships between remotely sensed soil moisture and ancillary variables on soil texture, topography, and vegetation cover. These relationships are built through Random Forest regressions trained against in-situ soil moisture measurements. The proposed approach is developed and tested in an agricultural catchment equipped with a high-density network of in-situ sensors. Our results show strong consistency between the downscaled and the observed spatio-temporal patterns of soil moisture. Furthermore, including a proxy of vegetation cover in the Random Forest regressions considerably improves downscaling performance. Finally, if only limited training data can be used, priority should be given to increasing the number of sensor locations to adequately cover spatial heterogeneity, rather than to extending the duration of the measurements.</p><p>Future research will focus on including additional ancillary variables as model predictors, e.g. Land Surface Temperature or backscatter, and on applying the downscaling framework to other regions with similar environmental and climatic conditions.</p>


Author(s):  
Kuncoro Teguh Setiawan ◽  
Nana Suwargana ◽  
Devica Natalia Br. Ginting ◽  
Masita Dwi Mandini Manessa ◽  
Nanin Anggraini ◽  
...  

The scope of this research is the application of the random forest method to SPOT 7 data to produce bathymetry information for shallow waters in Indonesia. The study aimed to analyze the effect of seabed cover (benthic habitats) in shallow marine waters on bathymetry estimation from SPOT 7 satellite imagery. SPOT 7 imagery of the shallow sea waters of Gili Matra, West Nusa Tenggara Province, was used in this research. Bathymetry was estimated with two modifications of the random forest model trained on in-situ depth data: one without and one with benthic habitat information (coral reefs, seagrass, macroalgae, and substrates). For bathymetry estimation from SPOT 7 data, the first modification (without benthic habitats) achieved a coefficient of determination (R²) of 90.2% and an RMSE of 1.57, while the second modification (with benthic habitats) achieved an R² of 85.3% and an RMSE of 2.48. The first modification therefore achieved slightly better results than the second; thus, benthic habitat did not significantly influence bathymetry estimation from SPOT 7 imagery.


Author(s):  
Tania MIHAIESCU ◽  
Manuela CUC ◽  
Mihnea Andrei MIHAIESCU

At the center of the Transylvanian Plateau lies Ocna Sibiului, with salt deposits that in places extend up to one thousand meters below the surface. The presence of these deposits, together with salt extraction over the years, favored the formation of the now-famous salt lakes. Because of differing environmental conditions, lake basins and water quality differ from one lake to another. The studied lakes are considered to be connected with the salt deposits exposed by mining activities carried out in different periods. Other lakes in the area are already isolated from the salt deposits by natural sedimentation processes. The study addresses the lakes that still have a high degree of salinity, are used for balneological purposes, and show, due to various factors, variations in physico-chemical composition throughout the year. Water samples were collected from the six main lakes in Ocna Sibiului (Ocniţa - Avram Iancu; Rândunica; Negru; Fără Fund; Brâncoveanu and Gura Minei). Surveys were carried out in four periods (March-November) during 2012. Water temperature, electrical conductivity (at 25°C) and pH were measured in situ. Water pH varied from lake to lake, generally within the range 6.78 to 8.8, indicating neutral to slightly alkaline lake water. Conductivity values in the upper layers of the salt lakes vary over a wide range, from 45.7 mS cm⁻¹ to more than 200 mS cm⁻¹, differing between lacustrine units. The salt lakes are characterized by high contents of sodium and chloride ions. The other major ions are present in small amounts.
This paper aims to compare the past and present evolution, characteristics and environment of these lakes and, based on this and current research, to project a vision of their future and of how it will affect the lives of future generations living beside them.


2021 ◽  
Author(s):  
Nwamaka Okafor

IoT sensors are gaining popularity in environmental monitoring because of their relatively small size, low acquisition cost, and ease of installation and operation. They are becoming an increasingly important supplement to traditional monitoring systems, particularly for in-situ monitoring. However, data collected by IoT sensors are often plagued with missing values, usually resulting from sensor faults, network failures, drift and other operational issues. Several imputation strategies have been proposed for handling missing values in various application domains. This paper examines the performance of different imputation techniques, including Multiple Imputation by Chained Equations (MICE), Random Forest based imputation (missForest) and K-Nearest Neighbour (KNN), for handling missing values in sensor networks deployed for the quantification of greenhouse gases (GHGs). Two tasks were conducted. First, ozone (O3) and NO2/O3 concentration data collected with Aeroqual and Cairclip sensors, respectively, over a six-month period were corrupted by removing data intervals of different missing periods p, where p ∈ {1 day, 1 week, 2 weeks, 1 month}, and also random points at varying proportions r, where r ∈ {5%, 10%, 30%, 50%, 70%}. The missing data were then filled using the different imputation strategies, and their imputation accuracy was calculated. Second, the performance of sensor calibration by different regression models, including Multiple Linear Regression (MLR), Decision Tree (DT), Random Forest (RF) and XGBoost (XGB), trained on the different imputed datasets was evaluated. The analysis showed that MICE outperformed the other techniques in imputing the missing values in both the O3 and NO2/O3 datasets when missingness was introduced over periods p. The missForest technique, however, outperformed the rest when missingness was introduced as randomly occurring point errors.
While the analysis demonstrated the effects of missing and imputed data on sensor calibration, experimental results showed that a simple model trained on the imputed dataset can achieve state-of-the-art results for in-situ sensor calibration, improving the data quality of the sensor.
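The evaluation protocol described above (removing contiguous periods or random points, imputing, then scoring only the removed values) can be sketched as follows. Linear interpolation is used here purely as a simple stand-in imputer; it is not MICE, missForest or KNN, and the function names and parameters are hypothetical.

```python
import math
import random


def mask_period(series, start, length):
    """Copy of the series with a contiguous gap (period-type missingness) set to None."""
    corrupted = list(series)
    for i in range(start, min(start + length, len(series))):
        corrupted[i] = None
    return corrupted


def mask_random(series, proportion, seed=0):
    """Copy of the series with a random fraction of points (point-type missingness) set to None."""
    rng = random.Random(seed)
    corrupted = list(series)
    for i in rng.sample(range(len(series)), round(proportion * len(series))):
        corrupted[i] = None
    return corrupted


def linear_interpolate(series):
    """Stand-in imputer: fill gaps linearly between the nearest observed neighbours."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("cannot interpolate a series with no observed values")
    filled = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = series[right]   # leading gap: copy the first observation
        elif right is None:
            filled[i] = series[left]    # trailing gap: copy the last observation
        else:
            fraction = (i - left) / (right - left)
            filled[i] = series[left] + fraction * (series[right] - series[left])
    return filled


def imputation_rmse(original, corrupted, imputed):
    """RMSE computed only over the positions that were artificially removed."""
    missing = [i for i, v in enumerate(corrupted) if v is None]
    return math.sqrt(sum((original[i] - imputed[i]) ** 2 for i in missing) / len(missing))
```

Scoring only the removed positions keeps the untouched observations from inflating the apparent accuracy; the same loop can be repeated for each gap length p and proportion r.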


2021 ◽  
Vol 13 (17) ◽  
pp. 3342
Author(s):  
Marcel Urban ◽  
Konstantin Schellenberg ◽  
Theunis Morgenthal ◽  
Clémence Dubois ◽  
Andreas Hirner ◽  
...  

Increasing woody cover and overgrazing in semi-arid ecosystems are known to be the major factors driving land degradation. This study focuses on mapping the distribution of the slangbos shrub (Seriphium plumosum) in a test region in the Free State Province of South Africa. The goal of this study is to monitor slangbos encroachment on cultivated land by synergistically combining Synthetic Aperture Radar (SAR) (Sentinel-1) and optical (Sentinel-2) Earth observation information. Owing to their specific sensor characteristics, optical and radar satellite data are sensitive to different vegetation properties and to different surface scattering or reflection mechanisms. We used a supervised random forest classification to predict slangbos encroachment for each individual crop year between 2015 and 2020. Training data were derived from expert knowledge and in situ information from the Department of Agriculture, Land Reform and Rural Development (DALRRD). We found that the Sentinel-1 VH (cross-polarization) and Sentinel-2 SAVI (Soil Adjusted Vegetation Index) time series have the highest importance for the random forest classifier among all input parameters. The modelling results confirm the in situ observations that pastures are most affected by slangbos encroachment. Model accuracy was estimated via spatial cross-validation (SpCV), which yielded a classification precision of around 80% for the slangbos class within each time step.
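The abstract does not describe how the spatial folds were formed. One common way to make cross-validation spatial is to group samples into square blocks and keep each block entirely within one fold, which reduces the spatial autocorrelation between training and test samples. A minimal sketch of that idea follows, together with per-class precision (correct predictions of a class divided by all predictions of that class). Block size, fold count and function names are hypothetical.

```python
def spatial_block_folds(points, block_size, n_folds):
    """Assign each (x, y) point to a fold via the square block it falls in."""
    blocks = [(int(x // block_size), int(y // block_size)) for x, y in points]
    # Deterministic round-robin assignment of blocks to folds.
    block_fold = {b: i % n_folds for i, b in enumerate(sorted(set(blocks)))}
    return [block_fold[b] for b in blocks]


def block_cv_splits(points, block_size, n_folds):
    """Yield (train_indices, test_indices); points sharing a block never straddle train and test."""
    folds = spatial_block_folds(points, block_size, n_folds)
    for k in range(n_folds):
        train = [i for i, f in enumerate(folds) if f != k]
        test = [i for i, f in enumerate(folds) if f == k]
        yield train, test


def precision(y_true, y_pred, positive):
    """Share of samples predicted as `positive` that truly belong to it (0.0 if none predicted)."""
    predicted = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not predicted:
        return 0.0  # undefined; reported as 0.0 by convention here
    return sum(1 for t in predicted if t == positive) / len(predicted)
```

A classifier would be trained on each `train` index set and scored on the matching `test` set, for instance with `precision(..., positive="slangbos")`, repeating this for every time step.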


2018 ◽  
Vol 10 (9) ◽  
pp. 1393 ◽  
Author(s):  
Nicole DeLuca ◽  
Benjamin Zaitchik ◽  
Frank Curriero

Total suspended solids (TSS) is an important environmental parameter to monitor in the Chesapeake Bay because of its effects on submerged aquatic vegetation, pathogen abundance, and habitat damage to other aquatic life. Chesapeake Bay is home to an extensive and continuous network of in situ water quality monitoring stations that include TSS measurements. Satellite remote sensing can address the limited spatial and temporal extent of in situ sampling and has proven to be a valuable tool for monitoring water quality in estuarine systems. Most algorithms that derive TSS concentration in estuarine environments from satellite ocean color sensors use only the red and near-infrared bands because of their observed correlation with TSS concentration. In this study, we investigate whether using additional wavelengths from the Moderate Resolution Imaging Spectroradiometer (MODIS) as inputs to various statistical and machine learning models can improve satellite-derived TSS estimates in the Chesapeake Bay. After optimizing the best-performing multispectral model, a Random Forest regression, we compare its results to those from a widely used single-band algorithm for the Chesapeake Bay. We find that the Random Forest model modestly outperforms the single-band algorithm on a holdout cross-validation dataset and offers particular advantages under high TSS conditions. We also find that both methods are similarly generalizable across various partitions of space and time. The multispectral Random Forest model is, however, more data-intensive than the single-band algorithm, so the objectives of the application will ultimately determine which method is more appropriate.

