Application of machine learning techniques for regional bias correction of SWE estimates in Ontario, Canada


2020 ◽  
Vol 24 (10) ◽  
pp. 4887-4902
Author(s):  
Fraser King ◽  
Andre R. Erler ◽  
Steven K. Frey ◽  
Christopher G. Fletcher

Abstract. Snow is a critical contributor to Ontario's water-energy budget, with impacts on water resource management and flood forecasting. Snow water equivalent (SWE) describes the amount of water stored in a snowpack and is important in deriving estimates of snowmelt. However, only a limited number of sparsely distributed snow survey sites (n=383) exist throughout Ontario. The SNOw Data Assimilation System (SNODAS) is a daily, 1 km gridded SWE product that provides uniform spatial coverage across this region; however, we show here that SWE estimates from SNODAS display a strong positive mean bias of 50 % (16 mm SWE) when compared to in situ observations from 2011 to 2018. This study evaluates multiple statistical techniques of varying complexity, including simple subtraction, linear regression and machine learning methods to bias-correct SNODAS SWE estimates using absolute mean bias and RMSE as evaluation criteria. Results show that the random forest (RF) algorithm is most effective at reducing bias in SNODAS SWE, with an absolute mean bias of 0.2 mm and RMSE of 3.64 mm when compared with in situ observations. Other methods, such as mean bias subtraction and linear regression, are somewhat effective at bias reduction; however, only the RF method captures the nonlinearity in the bias and its interannual variability. Applying the RF model to the full spatio-temporal domain shows that the SWE bias is largest before 2015, during the spring melt period, north of 44.5° N and east (downwind) of the Great Lakes. As an independent validation, we also compare estimated snowmelt volumes with observed hydrographs and demonstrate that uncorrected SNODAS SWE is associated with unrealistically large volumes at the time of the spring freshet, while bias-corrected SWE values are highly consistent with observed discharge volumes.
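
As a rough illustration of the bias-correction workflow described in this abstract, the sketch below trains a random forest on co-located SNODAS and snow survey data and subtracts the predicted bias from the raw SNODAS SWE. The file name, predictor set, and hyperparameters are illustrative assumptions, not the authors' configuration.

```python
# Minimal sketch of random forest bias correction of gridded SWE
# (illustrative only; feature names and data layout are assumptions).
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Hypothetical table pairing SNODAS grid cells with in situ snow surveys.
df = pd.read_csv("snodas_vs_insitu.csv")                      # assumed file
features = ["snodas_swe", "lat", "lon", "elevation", "doy"]   # assumed predictors
target_bias = df["snodas_swe"] - df["insitu_swe"]             # bias the model learns

X_train, X_test, y_train, y_test = train_test_split(
    df[features], target_bias, test_size=0.25, random_state=0
)

rf = RandomForestRegressor(n_estimators=500, random_state=0)
rf.fit(X_train, y_train)

# Corrected SWE = raw SNODAS SWE minus the predicted bias.
corrected = X_test["snodas_swe"] - rf.predict(X_test)
rmse = mean_squared_error(df.loc[X_test.index, "insitu_swe"], corrected) ** 0.5
print(f"RMSE of bias-corrected SWE: {rmse:.2f} mm")
```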


2021 ◽  
Vol 13 (7) ◽  
pp. 1250
Author(s):  
Yanxing Hu ◽  
Tao Che ◽  
Liyun Dai ◽  
Lin Xiao

In this study, a machine learning algorithm was introduced to fuse gridded snow depth datasets. The input variables of the machine learning method included geolocation (latitude and longitude), topographic data (elevation), gridded snow depth datasets and in situ observations. A total of 29,565 in situ observations were used to train and optimize the machine learning algorithm. A total of five gridded snow depth datasets—Advanced Microwave Scanning Radiometer for the Earth Observing System (AMSR-E) snow depth, Global Snow Monitoring for Climate Research (GlobSnow) snow depth, Long time series of daily snow depth over the Northern Hemisphere (NHSD) snow depth, ERA-Interim snow depth and Modern-Era Retrospective Analysis for Research and Applications, version 2 (MERRA-2) snow depth—were used as input variables. The first three snow depth datasets are retrieved from passive microwave brightness temperature or assimilation with in situ observations, while the last two are snow depth datasets obtained from meteorological reanalysis data with a land surface model and data assimilation system. Then, three machine learning methods, i.e., Artificial Neural Networks (ANN), Support Vector Regression (SVR), and Random Forest Regression (RFR), were used to produce a fused snow depth dataset from 2002 to 2004. The RFR model performed best and was thus used to produce a new snow depth product from the fusion of the five snow depth datasets and auxiliary data over the Northern Hemisphere from 2002 to 2011. The fused snow depth product was verified at five well-known snow observation sites. The R² values at Sodankylä, Old Aspen, and Reynolds Mountains East were 0.88, 0.69, and 0.63, respectively. At the Swamp Angel Study Plot and Weissfluhjoch observation sites, which have an average snow depth exceeding 200 cm, the fused snow depth did not perform well. The spatial patterns of the average snow depth were analyzed seasonally, and the average snow depths of autumn, winter, and spring were 5.7, 25.8, and 21.5 cm, respectively. In the future, random forest regression will be used to produce a long time series of a fused snow depth dataset over the Northern Hemisphere or other specific regions.
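
The following is a minimal sketch of the random forest fusion step summarized above, with geolocation, elevation, and the five gridded products as predictors of in situ snow depth; all column and file names are assumptions for illustration.

```python
# Sketch of the random forest regression (RFR) fusion step
# (column and file names are assumptions).
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

df = pd.read_csv("snow_depth_training.csv")   # assumed table of co-located samples
predictors = ["lat", "lon", "elevation",
              "amsre_sd", "globsnow_sd", "nhsd_sd", "erainterim_sd", "merra2_sd"]

rfr = RandomForestRegressor(n_estimators=300, min_samples_leaf=2, random_state=1)
rfr.fit(df[predictors], df["insitu_sd"])      # trained against in situ snow depth

# Apply the fitted model to every grid cell of a given day to obtain fused snow depth.
grid = pd.read_csv("grid_predictors_20030115.csv")  # assumed gridded predictor table
grid["fused_sd_cm"] = rfr.predict(grid[predictors])
```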


2021 ◽  
Vol 13 (14) ◽  
pp. 2848
Author(s):  
Hao Sun ◽  
Qian Xu

Obtaining large-scale, long-term, and spatially continuous soil moisture (SM) data is crucial for climate change research, hydrology, and water resource management. ESA CCI SM is one such large-scale, long-term SM dataset (now spanning more than 40 years). However, data gaps exist, especially over China, due to limitations in remote sensing of SM such as complex topography, human-induced radio frequency interference (RFI), and vegetation disturbances. These gaps prevent the CCI SM data from achieving spatial continuity, which motivates the study of gap-filling methods. In order to develop suitable methods to fill the gaps of CCI SM over the whole area of China, we compared typical Machine Learning (ML) methods, including the Random Forest method (RF), the Feedforward Neural Network method (FNN), and the Generalized Linear Model (GLM), with a geostatistical method, i.e., Ordinary Kriging (OK). More than 30 years of passive–active combined CCI SM from 1982 to 2018 and other biophysical variables, such as the Normalized Difference Vegetation Index (NDVI), precipitation, air temperature, a Digital Elevation Model (DEM), soil type, and in situ SM from the International Soil Moisture Network (ISMN), were utilized in this study. Results indicated that: (1) data gaps in CCI SM are frequent in China and are found not only in cold seasons and areas but also in warm seasons and areas; the ratio of gap pixels to all pixels can exceed 80%, with an average of around 40%. (2) ML methods can fill all of the gaps in CCI SM. Among the ML methods, RF performed best in fitting the relationship between CCI SM and the biophysical variables. (3) Over simulated gap areas, RF performed comparably to OK, and both greatly outperformed the FNN and GLM methods. (4) Over in situ SM networks, RF achieved better performance than OK. (5) We also explored various strategies for gap-filling CCI SM; the best performance was achieved by constructing a monthly model with one RF simulating the monthly average SM and another RF simulating the monthly SM disturbance. This strategy, combined with an ML method such as RF, is therefore suggested for filling the gaps of CCI SM in China.
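
A minimal sketch of the best-performing strategy reported above (one RF for the monthly average SM plus a second RF for the monthly disturbance) is given below; the predictor names, table layout, and decomposition details are assumptions.

```python
# Sketch of the two-model gap-filling strategy: RF #1 predicts the monthly mean SM,
# RF #2 predicts the disturbance (deviation from that mean); the gap-filled value is
# their sum. Column names and the decomposition are illustrative assumptions.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

df = pd.read_csv("cci_sm_samples.csv")          # assumed training table of non-gap pixels
static_feats = ["lat", "lon", "dem", "soil_type"]
dynamic_feats = ["ndvi", "precip", "air_temp"]

# Assumed columns: pixel_id, month, cci_sm.
monthly_mean = df.groupby(["pixel_id", "month"])["cci_sm"].transform("mean")
df["sm_disturbance"] = df["cci_sm"] - monthly_mean

rf_mean = RandomForestRegressor(n_estimators=200, random_state=0)
rf_mean.fit(df[static_feats + ["month"]], monthly_mean)

rf_dist = RandomForestRegressor(n_estimators=200, random_state=0)
rf_dist.fit(df[static_feats + dynamic_feats + ["month"]], df["sm_disturbance"])

# For gap pixels, the filled SM is the predicted mean plus the predicted disturbance.
gaps = pd.read_csv("cci_sm_gap_pixels.csv")     # assumed table of gap locations
gaps["sm_filled"] = (rf_mean.predict(gaps[static_feats + ["month"]])
                     + rf_dist.predict(gaps[static_feats + dynamic_feats + ["month"]]))
```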


Author(s):  
Melika Sajadian ◽  
Ana Teixeira ◽  
Faraz S. Tehrani ◽  
Mathias Lemmens

Abstract. Built environments developed on compressible soils are susceptible to land deformation. The spatio-temporal monitoring and analysis of these deformations are necessary for the sustainable development of cities. Techniques such as Interferometric Synthetic Aperture Radar (InSAR), or predictions based on soil mechanics using in situ characterization such as Cone Penetration Testing (CPT), can be used to assess such land deformations. Despite the combined advantages of these two methods, the relationship between them has not yet been investigated. Therefore, the major objective of this study is to reconcile InSAR and CPT measurements using machine learning techniques in an attempt to better predict land deformation.


2021 ◽  
Vol 11 (22) ◽  
pp. 11054
Author(s):  
Khaled Yousef Almansi ◽  
Abdul Rashid Mohamed Shariff ◽  
Ahmad Fikri Abdullah ◽  
Sharifah Norkhadijah Syed Ismail

Palestinian healthcare institutions face difficulties in providing effective service delivery, particularly in times of crisis. Problems arising from inadequate healthcare service delivery are traceable to issues such as spatial coverage, emergency response time, infrastructure, and manpower. In the Gaza Strip, specifically, there is inadequate spatial distribution of, and accessibility to, healthcare facilities due to decades of conflicts. This study focuses on identifying hospital site suitability areas within the Gaza Strip in Palestine. The study aims to find an optimal solution for a suitable hospital location through suitability mapping using relevant environmental, topographic, and geodemographic parameters and their associated criteria. To find the most significant parameters that reduce the error rate and increase the efficiency of the suitability analysis, this study utilized machine learning methods. Identification of the most significant parameters (conditioning factors) that influence a suitable hospital location was achieved by employing correlation-based feature selection (CFS) with a greedy stepwise search algorithm. The suitability map of potential hospital sites was then modeled using support vector machine (SVM), multilayer perceptron (MLP), and linear regression (LR) models. The predicted sites were validated using CFS cross-validation and receiver operating characteristic (ROC) curve metrics. The CFS analysis shows very high correlations, with R² values of 0.94, 0.93, and 0.75 for the SVM, MLP, and LR models, respectively. Moreover, based on the areas under the ROC curve, the MLP model produced a prediction accuracy of 84.90%, the SVM model 75.60%, and the LR model 64.40%. The findings demonstrate that the machine learning techniques used in this study are reliable and are therefore a promising approach for assessing suitable hospital site locations for effective healthcare delivery planning and implementation.
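
The sketch below is a rough Python analogue of the workflow described above: a greedy, CFS-style forward search (using Hall's merit formula) followed by an SVM suitability model scored with ROC AUC. The data layout, feature handling, and model settings are assumptions rather than the study's actual implementation.

```python
# Rough analogue of CFS (greedy stepwise) feature selection plus an SVM suitability
# model evaluated with ROC AUC. All columns are assumed numeric; names are illustrative.
import numpy as np
import pandas as pd
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

df = pd.read_csv("gaza_hospital_sites.csv")          # assumed: candidate cells + label
y = df["suitable"]                                    # 1 = suitable, 0 = not suitable
candidates = [c for c in df.columns if c != "suitable"]

def cfs_merit(subset):
    """Hall's CFS merit: high feature-class, low feature-feature correlation."""
    k = len(subset)
    rcf = np.mean([abs(df[f].corr(y)) for f in subset])
    rff = 1.0 if k == 1 else np.mean(
        [abs(df[a].corr(df[b])) for i, a in enumerate(subset) for b in subset[i + 1:]])
    return k * rcf / np.sqrt(k + k * (k - 1) * rff)

selected, remaining = [], candidates[:]
while remaining:                                      # greedy stepwise (forward) search
    best = max(remaining, key=lambda f: cfs_merit(selected + [f]))
    if selected and cfs_merit(selected + [best]) <= cfs_merit(selected):
        break                                         # stop when merit no longer improves
    selected.append(best)
    remaining.remove(best)

svm = SVC(kernel="rbf", random_state=0)
auc = cross_val_score(svm, df[selected], y, cv=10, scoring="roc_auc").mean()
print(f"Selected features: {selected}, cross-validated ROC AUC: {auc:.3f}")
```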


Author(s):  
Ivanna Baturynska

Additive manufacturing (AM) is an attractive technology for the manufacturing industry due to its flexibility in design and functionality, but inconsistent quality is one of the major limitations preventing its use for the production of end-use parts. Predicting mechanical properties is one possible way to improve the repeatability of results. Part placement, part orientation, and STL model properties (number of mesh triangles, surface area, and volume) are used to predict the tensile modulus, nominal stress, and elongation at break for polyamide 2200 (also known as PA12). An EOS P395 polymer powder bed fusion system was used to fabricate 217 specimens in each of two identical builds (434 specimens in total). Prediction is performed for the XYZ, XZY, ZYX, and Angle orientations separately, and for all orientations together. Non-linear models based on machine learning methods have higher prediction accuracy than linear regression models. Linear regression models achieve prediction accuracy above 80% only for tensile modulus and elongation at break in the Angle orientation. Since orientation-based modeling has low prediction accuracy due to the small number of data points and a lack of information about material properties, these models need to be improved in the future based on additional experimental work.
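
As an illustration of the comparison described above, the sketch below contrasts a linear regression baseline with one nonlinear model (a random forest, used here as a stand-in for the study's machine learning methods) for predicting tensile modulus; column names and model settings are assumptions.

```python
# Sketch comparing a linear baseline with a nonlinear ML model for predicting
# tensile modulus from build placement, orientation, and STL descriptors
# (column names and the choice of random forest are assumptions).
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

df = pd.read_csv("pa12_specimens.csv")     # assumed: 434 specimens from two builds
features = ["x_pos", "y_pos", "z_pos", "orientation",
            "n_triangles", "stl_surface", "stl_volume"]
X = pd.get_dummies(df[features], columns=["orientation"])   # orientation is categorical
y = df["tensile_modulus"]

for name, model in [("linear regression", LinearRegression()),
                    ("random forest", RandomForestRegressor(n_estimators=300, random_state=0))]:
    r2 = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{name}: mean cross-validated R^2 = {r2:.2f}")
```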


Author(s):  
Jonathan Becker ◽  
Aveek Purohit ◽  
Zheng Sun

The USARSim group at NIST developed a simulated robot that operated in the Unreal Tournament 3 (UT3) gaming environment. They used a software PID controller to control the robot in UT3 worlds. Unfortunately, the PID controller did not work well, so NIST asked us to develop a better controller using machine learning techniques. In the process, we characterized the software PID controller and the robot's behavior in UT3 worlds. Using data collected from our simulations, we compared different machine learning techniques, including linear regression and reinforcement learning (RL). Finally, we implemented an RL-based controller in MATLAB and ran it in the UT3 environment via a TCP/IP link between MATLAB and UT3.
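
The paper's controller was written in MATLAB and connected to UT3 over TCP/IP; purely as an illustration of the kind of update rule such an RL controller could use, the sketch below shows a tabular, epsilon-greedy Q-learning step with placeholder states, actions, and rewards (none of which are taken from the paper).

```python
# Minimal tabular Q-learning update of the kind an RL steering controller could use.
# States, actions, reward, and environment interface are placeholders, not the
# authors' design; their controller ran in MATLAB and talked to UT3 over TCP/IP.
import random
from collections import defaultdict

ACTIONS = ["steer_left", "steer_straight", "steer_right"]   # assumed action set
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1                      # learning rate, discount, exploration

Q = defaultdict(float)                                      # (state, action) -> value

def choose_action(state):
    if random.random() < EPSILON:                           # epsilon-greedy exploration
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state):
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```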


2019 ◽  
Vol 90 (3) ◽  
pp. e33.1-e33
Author(s):  
K Li ◽  
VN Vakharia ◽  
R Sparks ◽  
LGS França ◽  
A McEvoy ◽  
...  

Objectives: Optimal trajectory planning for cranial laser interstitial thermal therapy (cLITT) in drug-resistant focal mesial temporal lobe epilepsy (MTLE). Design: A composite ablation score of ablated amygdalohippocampal complex (AHC) minus ablated parahippocampal gyrus (PHG) volumes was calculated and normalised. Random forest and linear regression were implemented to predict composite ablation scores and determine the optimal entry and target point combinations to maximize this score. Subjects: Ten patients with hippocampal sclerosis were included. Methods: Computer Assisted Planning (CAP) cLITT trajectories were generated using entry regions that include the inferior occipital gyri (IOG), middle occipital gyri (MOG), inferior temporal gyri (ITG), and middle temporal gyri (MTG). Target points were varied by sequential erosions and transformations of the centroid of the amygdala. In total, 760 trajectory combinations were generated per patient, and ablation volumes were calculated based on a conservative 15 mm maximum ablation diameter. Results: Linear regression was superior to random forest predictions. Linear regression indicated that maximal composite ablation scores were associated with entry points clustered around the junction of the IOG, MOG, and MTG. The optimal target point was a translation of the centroid of the amygdala anteriorly and medially. Conclusions: Machine learning techniques accurately predict composite ablation scores, with linear regression outperforming the random forest approach. Optimal CAP entry points for cLITT maximize ablation of the AHC and spare the PHG.
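
As a rough illustration of the regression task described above, the sketch below builds a normalised composite ablation score (ablated AHC minus ablated PHG volume) and compares linear regression with a random forest in predicting it from entry and target coordinates; the column names and min-max normalisation are assumptions.

```python
# Sketch of the composite-score regression: score = normalised (ablated AHC volume
# minus ablated PHG volume), predicted from trajectory entry/target coordinates.
# Column names and the normalisation choice are illustrative assumptions.
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

df = pd.read_csv("cap_trajectories.csv")    # assumed: 760 candidate trajectories per patient
raw = df["ablated_ahc_mm3"] - df["ablated_phg_mm3"]
df["composite_score"] = (raw - raw.min()) / (raw.max() - raw.min())   # assumed min-max scaling

X = df[["entry_x", "entry_y", "entry_z", "target_x", "target_y", "target_z"]]
for name, model in [("linear regression", LinearRegression()),
                    ("random forest", RandomForestRegressor(n_estimators=200, random_state=0))]:
    r2 = cross_val_score(model, X, df["composite_score"], cv=5, scoring="r2").mean()
    print(f"{name}: mean cross-validated R^2 = {r2:.2f}")
```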

