Hydrologic modeling by means of a hybrid downscaling approach: an application to the Sai Gon–Dong Nai Rivers Basin

Author(s):  
T. Trinh ◽  
V. T. Nguyen ◽  
N. Do ◽  
K. Carr ◽  
D. H. Tran ◽  
...  

Abstract The spatial and temporal availability and reliability of hydrological data contribute substantially to the accuracy of watershed modeling; unfortunately, such data requirements are difficult, and in many regions of the world perhaps impossible, to meet. In this study, hydrological conditions are simulated using the hydrologic model WEHY, whose input data are obtained from a hybrid downscaling technique that provides reliable hydrological data at high temporal and spatial resolution. The hybrid downscaling technique couples a hydroclimate model with a machine learning model: global atmospheric reanalysis data, including ERA-Interim, ERA-20C, and CFSR, are used as initial and boundary conditions for dynamical downscaling with the Weather Research and Forecasting (WRF) model, and an artificial neural network (ANN) then further downscales the WRF outputs to a finer resolution over the study watershed. The combined technique is applied to the third-largest river basin in Vietnam, the Sai Gon–Dong Nai Rivers Basin. Validation of the hybrid model falls within the 'satisfactory' range. After estimation of geomorphology and land cover within the watershed, WEHY's calibration and validation are performed based on observed rainfall data. The simulated flows matched the observed flows well in magnitude for both the rising and recession segments. Among the three selected reanalysis data sets, the best calibration and validation results were obtained from CFSR. These results are closer to the observations than those obtained using dynamical downscaling alone in combination with the WEHY model.
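As a rough illustration of the statistical stage of such a hybrid chain, the sketch below (Python, scikit-learn) trains a small ANN to map coarse WRF-type predictors to fine-scale rainfall at a target location; the predictors, network size, and data are synthetic placeholders, not the configuration used in the paper.

```python
# Toy sketch: ANN downscaling of coarse model output to a local rainfall series.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Stand-in coarse WRF predictors for one target location (e.g., precipitation,
# 2 m temperature, winds at surrounding grid cells) and fine-scale observed rainfall.
X_wrf = rng.normal(size=(2000, 12))
y_obs = np.maximum(0.0, X_wrf @ rng.normal(size=12) + rng.normal(scale=0.5, size=2000))

split = int(0.8 * len(y_obs))
ann = make_pipeline(StandardScaler(),
                    MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0))
ann.fit(X_wrf[:split], y_obs[:split])

rmse = np.sqrt(np.mean((ann.predict(X_wrf[split:]) - y_obs[split:]) ** 2))
print(f"validation RMSE: {rmse:.2f}")
```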

2021 ◽  
Author(s):  
Ahmet Irvem ◽  
Mustafa Ozbuldu

Abstract Evapotranspiration is an important parameter for hydrological, meteorological, and agricultural studies. However, calculating actual evapotranspiration is very challenging and costly, so potential evapotranspiration (PET) is typically computed from meteorological data and then used to estimate actual evapotranspiration. It is, however, very difficult to obtain complete and accurate data from meteorological stations in rural and mountainous regions. This study examined whether the Climate Forecast System Reanalysis (CFSR) data set can be used as an alternative to meteorological observation stations in computing annual and seasonal potential evapotranspiration. PET calculations using the CFSR reanalysis dataset for the period 1987-2017 were compared with those based on data observed at 259 weather stations in Turkey. The assessments showed that the seasons in which the CFSR reanalysis data set had the best prediction performance were winter (C' = 0.76 and PBias = -3.77) and autumn (C' = 0.75 and PBias = -12.10); the worst performance was observed for summer. The annual prediction performance was C' = 0.60 and PBias = -15.27. These findings indicate that PET calculated from the CFSR reanalysis data set is relatively successful for the study area. However, the data should be evaluated against observations before being used, especially in summer models.
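The percent bias (PBias) reported above can be computed as follows; this is a minimal sketch using one common sign convention (PBias = 100 × Σ(obs − sim) / Σ(obs)), with made-up PET values, and it does not reproduce the paper's C' agreement coefficient.

```python
# Minimal sketch: percent bias between PET from a reanalysis and PET from station data.
import numpy as np

pet_station = np.array([3.1, 4.8, 5.6, 2.2])  # mm/day, from observed meteorological data
pet_cfsr    = np.array([3.4, 5.1, 6.0, 2.5])  # mm/day, from CFSR reanalysis (illustrative)

def pbias(obs, sim):
    """Percent bias; negative values indicate overestimation under this convention."""
    return 100.0 * np.sum(obs - sim) / np.sum(obs)

print(f"PBias = {pbias(pet_station, pet_cfsr):.2f} %")
```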


2020 ◽  
Author(s):  
Swati Singh ◽  
Kaustubh Salvi ◽  
Subimal Ghosh ◽  
Subhankar Karmakar

The two downscaling approaches developed for regional climate prediction, statistical and dynamical, each have advantages and limitations. Statistical downscaling is computationally inexpensive but suffers from violation of the stationarity assumption in the statistical (predictor-predictand) relationship. Dynamical downscaling is assumed to handle stationarity but suffers from biases associated with various sources. Here we propose a joint approach that applies statistical methods, bias correction and statistical downscaling, to Coordinated Regional Climate Downscaling Experiment (CORDEX) evaluation runs. The evaluation runs are considered perfect simulations of the CORDEX Regional Climate Models (RCMs), with boundary conditions provided by ERA-Interim reanalysis data. The statistical methods are also applied to ERA-Interim reanalysis data and compared with observations for Indian Summer Monsoon characteristics. We evaluate the ability of the statistical methods under a non-stationary environment by treating years close to the extreme future runs (RCP8.5) as warmer years and preindustrial runs as cooler years. We find that statistical downscaling of the CORDEX evaluation runs shows skill in reproducing the signal of non-stationarity. The study can be extended by applying statistical downscaling to CORDEX RCMs with CMIP5 boundary conditions.
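For orientation, one widely used form of the bias-correction step mentioned above is empirical quantile mapping. The sketch below illustrates the idea on synthetic data; it is not claimed to be the authors' exact procedure.

```python
# Illustrative empirical quantile mapping: map model values onto the observed distribution.
import numpy as np

rng = np.random.default_rng(0)
obs_hist = rng.gamma(2.0, 4.0, size=5000)   # observed rainfall, calibration period
rcm_hist = rng.gamma(2.0, 5.0, size=5000)   # RCM evaluation run, same period
rcm_new  = rng.gamma(2.0, 5.5, size=5000)   # RCM values to be corrected

def quantile_map(x, model_ref, obs_ref, n_q=100):
    """Map each value's quantile under the model distribution onto the observed one."""
    q = np.linspace(0, 1, n_q)
    model_q = np.quantile(model_ref, q)
    obs_q = np.quantile(obs_ref, q)
    ranks = np.interp(x, model_q, q)   # quantile level of x under the model CDF
    return np.interp(ranks, q, obs_q)  # corresponding value under the observed CDF

rcm_corrected = quantile_map(rcm_new, rcm_hist, obs_hist)
print(rcm_corrected[:5])
```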


2020 ◽  
Author(s):  
Xun Wang ◽  
Vanessa Tolksdorf ◽  
Marco Otto ◽  
Dieter Scherer

Climatically triggered natural hazards such as landslides and glacier lake outburst floods pose a threat to human lives in the Third Pole region. The availability of accurate climate data with high spatial and temporal resolution is crucial for better understanding the climatic triggering mechanisms of these localized natural hazards. Within the framework of the project "Climatic and Tectonic Natural Hazard in Central Asia" (CaTeNA), the High Asia Refined analysis version 2 (HAR v2) is under production and is freely available upon request. HAR v2 is a regional atmospheric data set generated by dynamical downscaling of global ERA5 reanalysis data using the Weather Research and Forecasting (WRF) model. Compared with its predecessor (HAR), HAR v2 has an extended 10 km domain covering the Tibetan Plateau and the surrounding mountains, as well as a longer temporal coverage; it will be extended back to 1979 and will be continuously updated in the future. This presentation covers the following aspects: (1) summarizing the WRF configuration; (2) validating HAR v2 against observational data; (3) comparing HAR v2 with other gridded data sets, such as the newly developed ERA5-Land reanalysis data; and (4) providing information about data format, variable list, data access, etc.


2021 ◽  
Author(s):  
Theresa Schellander-Gorgas ◽  
Philip Lorenz ◽  
Frank Kreienkamp ◽  
Christoph Matulla

EPISODES is an empirical statistical downscaling method developed at the German national weather service, DWD (Kreienkamp et al. 2019). Its main aim is the downscaling of climate projections and climate predictions (seasonal to decadal) from global climate models (GCMs) to the regional scale. A specific aim is to enhance ensembles based on dynamical downscaling and to improve the robustness of derived indices and statements.

The methodology involves two main steps: first, analogue downscaling combined with linear regression and, second, a form of weather generator. An important precondition is the availability of long-term observational data sets of high quality and resolution. The synthetic time series resulting from EPISODES are multivariate and consistent in space and time. The data provide daily values for selected surface variables and can be delivered on a grid or at station locations. As such, they meet the main requirements for applications in climate impact research. Thanks to low computational needs, EPISODES can provide climate projections within a short time. This enables early insights into the local effects of climate change as projected by GCMs and allows flexibility in the selection of ensembles.

While good results for EPISODES projections have already been achieved for Germany, the methodology needs to be adapted for the more complex terrain of the Alpine region. This is done in close collaboration between DWD and ZAMG (Austria). Among other tasks, the adaptations include regionalization of the selection of relevant weather regimes, optimal fragmentation of the target region into climatic sub-zones, and correction of precipitation class frequencies.

The presentation will report on the progress of the adaptation process. In doing so, the quality of the downscaled climate projections is shown for a test ensemble in comparison with existing projections from the Austrian ÖKS15 data set and EURO-CORDEX.

Reference: Kreienkamp, F., Paxian, A., Früh, B., Lorenz, P., Matulla, C.: Evaluation of the empirical–statistical downscaling method EPISODES. Clim Dyn 52, 991–1026 (2019). https://doi.org/10.1007/s00382-018-4276-2
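As a loose, simplified illustration of the analogue-plus-regression idea (not the DWD implementation of EPISODES), the toy sketch below finds the k most similar historical days in predictor space and fits a local linear regression on those analogues.

```python
# Toy analogue downscaling with local linear regression; all data are synthetic.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
gcm_hist = rng.normal(size=(3000, 10))                 # historical large-scale predictors
local_hist = gcm_hist @ rng.normal(size=10) + rng.normal(scale=0.3, size=3000)
gcm_target = rng.normal(size=10)                       # one projected day to downscale

# 1) analogue search: k nearest historical days in predictor space
k = 50
dist = np.linalg.norm(gcm_hist - gcm_target, axis=1)
idx = np.argsort(dist)[:k]

# 2) local regression fitted only on the analogue days
reg = LinearRegression().fit(gcm_hist[idx], local_hist[idx])
print("downscaled local value:", reg.predict(gcm_target[None, :])[0])
```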


2020 ◽  
Author(s):  
Marc Philipp Bahlke ◽  
Natnael Mogos ◽  
Jonny Proppe ◽  
Carmen Herrmann

Heisenberg exchange spin coupling between metal centers is essential for describing and understanding the electronic structure of many molecular catalysts, metalloenzymes, and molecular magnets for potential application in information technology. We explore the machine-learnability of exchange spin coupling, which has not been studied yet. We employ Gaussian process regression since it can potentially deal with small training sets (as likely associated with the rather complex molecular structures required for exploring spin coupling) and since it provides uncertainty estimates (“error bars”) along with predicted values. We compare a range of descriptors and kernels for 257 small dicopper complexes and find that a simple descriptor based on chemical intuition, consisting only of copper-bridge angles and copper-copper distances, clearly outperforms several more sophisticated descriptors when it comes to extrapolating towards larger experimentally relevant complexes. Exchange spin coupling is about as easy to learn as the polarizability, while learning dipole moments is much harder. The strength of the sophisticated descriptors lies in their ability to linearize structure-property relationships, to the point that a simple linear ridge regression performs just as well as the kernel-based machine-learning model for our small dicopper data set. The superior extrapolation performance of the simple descriptor is unique to exchange spin coupling, reinforcing the crucial role of choosing a suitable descriptor, and highlighting the interesting question of the role of chemical intuition vs. systematic or automated selection of features for machine learning in chemistry and materials science.
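A minimal sketch of Gaussian process regression on such a simple two-component descriptor (bridge angle and Cu–Cu distance) is shown below; the data and kernel settings are synthetic placeholders, and the example only illustrates how uncertainty estimates accompany the predictions.

```python
# GPR on a two-feature, chemically intuitive descriptor, with predictive uncertainty.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(2)
# columns: Cu-bridge-Cu angle (deg), Cu...Cu distance (Angstrom); values are synthetic
X = np.column_stack([rng.uniform(85, 105, 200), rng.uniform(2.8, 3.4, 200)])
J = -5.0 * (X[:, 0] - 97.0) + rng.normal(scale=5.0, size=200)   # toy coupling values

kernel = 1.0 * RBF(length_scale=[5.0, 0.2]) + WhiteKernel(noise_level=25.0)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X[:150], J[:150])

J_pred, J_std = gpr.predict(X[150:], return_std=True)   # mean and "error bar"
print(f"first prediction: {J_pred[0]:.1f} +/- {J_std[0]:.1f}")
```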


Author(s):  
Jun Pei ◽  
Zheng Zheng ◽  
Hyunji Kim ◽  
Lin Song ◽  
Sarah Walworth ◽  
...  

An accurate scoring function is expected to correctly select the most stable structure from a set of pose candidates. One can hypothesize that a scoring function’s ability to identify the most stable structure might be improved by emphasizing the most relevant atom pairwise interactions. However, it is hard to evaluate the relative importance of each atom pair using traditional means. With the introduction of machine learning methods, it has become possible to determine the relative importance of each atom pair present in a scoring function. In this work, we use the Random Forest (RF) method to refine a pair potential developed by our laboratory (GARF6) by identifying relevant atom pairs that optimize the performance of the potential on our given task. Our goal is to construct a machine learning (ML) model that can accurately differentiate the native ligand binding pose from candidate poses using a potential refined by RF optimization. We successfully constructed RF models on an unbalanced data set using the ‘comparison’ concept, and the resultant RF models were tested on CASF-2013. In a comparison of the performance of our RF models against 29 scoring functions, we found that our models outperformed the other scoring functions in predicting the native pose. In addition, we used two artificially designed potential models to address the importance of the GARF potential in the RF models: (1) a scrambled probability function set, obtained by mixing up atom pairs and probability functions in GARF, and (2) a uniform probability function set, which shares the same peak positions with GARF but has fixed peak heights. The accuracy comparison among RF models based on the scrambled, uniform, and original GARF potentials clearly showed that the peak positions in the GARF potential are important while the well depths are not.
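The 'comparison' concept can be illustrated roughly as pairwise pose ranking: a random forest is trained on the difference between the feature vectors of two candidate poses and predicts which one is closer to the native pose. The features below are synthetic and do not correspond to the GARF pair-potential terms.

```python
# Toy pairwise-comparison classifier for pose ranking with a random forest.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(3)
n_pairs, n_feat = 2000, 40
pose_a = rng.normal(size=(n_pairs, n_feat))
pose_b = rng.normal(size=(n_pairs, n_feat))
# toy ground truth: the pose with the lower "score" along a hidden direction wins
w = rng.normal(size=n_feat)
label = (pose_a @ w < pose_b @ w).astype(int)     # 1 if pose_a is the better pose

X = pose_a - pose_b                               # comparison features
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X[:1500], label[:1500])
print("held-out accuracy:", rf.score(X[1500:], label[1500:]))
# feature_importances_ indicates which (toy) pair terms drive the comparison
print("top feature index:", int(np.argmax(rf.feature_importances_)))
```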


2019 ◽  
Vol 21 (9) ◽  
pp. 662-669 ◽  
Author(s):  
Junnan Zhao ◽  
Lu Zhu ◽  
Weineng Zhou ◽  
Lingfeng Yin ◽  
Yuchen Wang ◽  
...  

Background: Thrombin is the central protease of the vertebrate blood coagulation cascade, which is closely related to cardiovascular diseases. The inhibitory constant Ki is the most significant property of thrombin inhibitors. Method: This study was carried out to predict the Ki values of thrombin inhibitors from a large data set using machine learning methods. Because machine learning can find non-intuitive regularities in high-dimensional datasets, it can be used to build effective predictive models. A total of 6554 descriptors were collected for each compound, and an efficient descriptor selection method was used to find the appropriate descriptors. Four different methods, multiple linear regression (MLR), K Nearest Neighbors (KNN), Gradient Boosting Regression Tree (GBRT), and Support Vector Machine (SVM), were implemented to build prediction models with the selected descriptors. Results: The SVM model was the best among these methods, with R2 = 0.84 and MSE = 0.55 for the training set and R2 = 0.83 and MSE = 0.56 for the test set. Several validation methods, such as the y-randomization test and applicability domain evaluation, were adopted to assess the robustness and generalization ability of the model. The final model shows excellent stability and predictive ability and can be employed for rapid estimation of the inhibitory constant, which is of great help in designing novel thrombin inhibitors.
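A minimal sketch of such a workflow (descriptor selection followed by SVM regression, evaluated with R2 and MSE) is given below; the data, the number of selected descriptors, and the hyperparameters are synthetic placeholders rather than those used in the study.

```python
# Descriptor selection + SVR for an inhibitory-constant-like target, on synthetic data.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.metrics import r2_score, mean_squared_error

rng = np.random.default_rng(4)
X = rng.normal(size=(1000, 500))                 # stand-in for thousands of descriptors
y = X[:, :10] @ rng.normal(size=10) + rng.normal(scale=0.5, size=1000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = make_pipeline(StandardScaler(),
                      SelectKBest(f_regression, k=50),   # crude descriptor selection
                      SVR(C=10.0, gamma="scale"))
model.fit(X_tr, y_tr)

y_hat = model.predict(X_te)
print(f"test R2 = {r2_score(y_te, y_hat):.2f}, MSE = {mean_squared_error(y_te, y_hat):.2f}")
```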


Author(s):  
Ritu Khandelwal ◽  
Hemlata Goyal ◽  
Rajveer Singh Shekhawat

Introduction: Machine learning is an intelligent technology that acts as a bridge between business and data science. With data science involved, the business goal focuses on extracting valuable insights from available data. Bollywood, a multi-million-dollar industry, makes up a large part of Indian cinema. This paper attempts to predict whether an upcoming Bollywood movie will be a blockbuster, superhit, hit, average, or flop, applying machine learning techniques (classification and prediction). To build a classifier or prediction model, the first step is the learning stage, in which a training data set is used to train the model with some technique or algorithm; the rules generated in this stage form the model and help predict future trends in different types of organizations. Methods: Classification and prediction techniques, including Support Vector Machine (SVM), Random Forest, Decision Tree, Naïve Bayes, Logistic Regression, AdaBoost, and KNN, are applied and compared to find efficient and effective results. All these functionalities can be applied through GUI-based workflows organized into categories such as Data, Visualize, Model, and Evaluate. Result: Following the learning stage described above, the trained models generate rules that are used to predict the success of upcoming movies. Conclusion: This paper focuses on a comparative analysis based on parameters such as accuracy and the confusion matrix to identify the best possible model for predicting movie success. With this prediction, production houses can plan advertisement campaigns and choose the best time to release a movie according to its predicted success rate to gain higher benefits. Discussion: Data mining is the process of discovering patterns and relationships in large data sets to solve business problems and predict forthcoming trends. Such predictions can help production houses with advertisement planning and cost planning, making a movie more profitable.
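A compact sketch of the comparative-analysis step is given below: the listed classifiers are trained on the same features and compared by accuracy and confusion matrix. The features and five-class labels are synthetic stand-ins for the movie data.

```python
# Compare several classifiers on the same synthetic five-class data set.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=1200, n_features=20, n_informative=8,
                           n_classes=5, random_state=0)   # 5 classes: Flop ... Blockbuster
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "SVM": SVC(), "RandomForest": RandomForestClassifier(),
    "DecisionTree": DecisionTreeClassifier(), "NaiveBayes": GaussianNB(),
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "AdaBoost": AdaBoostClassifier(), "KNN": KNeighborsClassifier(),
}
for name, clf in models.items():
    clf.fit(X_tr, y_tr)
    print(f"{name:>18s}  accuracy = {accuracy_score(y_te, clf.predict(X_te)):.3f}")

best = max(models, key=lambda n: accuracy_score(y_te, models[n].predict(X_te)))
print("confusion matrix for", best)
print(confusion_matrix(y_te, models[best].predict(X_te)))
```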


Author(s):  
Dhilsath Fathima.M ◽  
S. Justin Samuel ◽  
R. Hari Haran

Aim: This work aims to develop an improved and robust machine learning model for predicting myocardial infarction (MI), which could have substantial clinical impact. Objectives: This paper explains how to build a machine-learning-based computer-aided analysis system for early and accurate prediction of MI, using the Framingham Heart Study dataset for validation and evaluation. The proposed computer-aided analysis model will support medical professionals in predicting myocardial infarction proficiently. Methods: The proposed model uses mean imputation to fill missing values in the data set and then applies principal component analysis (PCA) to extract the optimal features and enhance classifier performance. After PCA, the reduced features are partitioned into a training set (70%) used to train four widely used classifiers (support vector machine, k-nearest neighbor, logistic regression, and decision tree) and a test set (30%) used to evaluate the models with performance metrics such as the confusion matrix, accuracy, precision, sensitivity, F1-score, and AUC-ROC curve. Results: The classifier outputs are evaluated using the performance measures, and we observed that logistic regression provides higher accuracy than the KNN, SVM, and decision tree classifiers, and that PCA serves as a good feature extraction method for enhancing model performance. From these analyses, we conclude that logistic regression has better mean accuracy and accuracy standard deviation than the other three algorithms. The AUC-ROC curves of the proposed classifiers (Figures 4 and 5) show that logistic regression exhibits a good AUC-ROC score, around 70%, compared with the KNN and decision tree algorithms. Conclusion: From the result analysis, we infer that the proposed machine learning model acts as an optimal decision-making system, predicting acute myocardial infarction at an earlier stage than existing machine-learning-based prediction models; it can predict the presence of acute myocardial infarction from heart disease risk factors, helping to decide when to start lifestyle modification and medical treatment to prevent heart disease.
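A minimal sketch of the described pipeline (mean imputation, PCA, a 70/30 split, and four classifiers compared by accuracy and AUC-ROC) is shown below, with a synthetic binary data set standing in for the Framingham risk factors.

```python
# Imputation -> PCA -> classifier pipeline, compared across four classifiers.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=15, random_state=0)
X[np.random.default_rng(0).random(X.shape) < 0.05] = np.nan   # inject missing values

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
classifiers = {"SVM": SVC(probability=True), "KNN": KNeighborsClassifier(),
               "LogReg": LogisticRegression(max_iter=1000),
               "DecisionTree": DecisionTreeClassifier()}
for name, clf in classifiers.items():
    pipe = make_pipeline(SimpleImputer(strategy="mean"), StandardScaler(),
                         PCA(n_components=8), clf)
    pipe.fit(X_tr, y_tr)
    proba = pipe.predict_proba(X_te)[:, 1]
    print(f"{name:>12s}  acc = {accuracy_score(y_te, pipe.predict(X_te)):.3f}"
          f"  AUC = {roc_auc_score(y_te, proba):.3f}")
```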


2019 ◽  
Vol 9 (6) ◽  
pp. 1128 ◽  
Author(s):  
Yundong Li ◽  
Wei Hu ◽  
Han Dong ◽  
Xueyan Zhang

Aerial cameras, satellite remote sensing, and unmanned aerial vehicles (UAVs) equipped with cameras can facilitate search and rescue tasks after disasters. The traditional manual interpretation of huge volumes of aerial images is inefficient and could be replaced by machine learning-based methods combined with image processing techniques. Given the development of machine learning, researchers have found that convolutional neural networks can effectively extract features from images. Some target detection methods based on deep learning, such as the single-shot multibox detector (SSD) algorithm, can achieve better results than traditional methods. However, the impressive performance of machine learning-based methods relies on numerous labeled samples, and given the complexity of post-disaster scenarios, obtaining many samples in the aftermath of disasters is difficult. To address this issue, a damaged building assessment method using SSD with pretraining and data augmentation is proposed in the current study, highlighting the following aspects. (1) Objects are detected and classified into undamaged buildings, damaged buildings, and ruins. (2) A convolutional auto-encoder (CAE) based on VGG16 is constructed and trained using unlabeled post-disaster images; as a transfer learning strategy, the weights of the SSD model are initialized using the weights of the CAE counterpart. (3) Data augmentation strategies, such as image mirroring, rotation, Gaussian blur, and Gaussian noise processing, are used to augment the training data set. As a case study, aerial images of Hurricane Sandy in 2012 were used to validate the proposed method's effectiveness. Experiments show that the pretraining strategy improves overall accuracy by 10% compared with an SSD trained from scratch. These experiments also demonstrate that using data augmentation strategies can improve mAP and mF1 by 72% and 20%, respectively. Finally, the experiment is further verified on another dataset, from Hurricane Irma, and it is concluded that the proposed method is feasible.
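The listed augmentation strategies can be sketched as follows (mirroring, rotation, Gaussian blur, and Gaussian noise applied to an image tile before detector training); the image array is random and stands in for a labeled aerial tile.

```python
# Simple data-augmentation sketch for one image tile using NumPy and SciPy.
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(5)
tile = rng.random((300, 300, 3)).astype(np.float32)      # stand-in aerial tile

augmented = [
    tile[:, ::-1, :],                                                  # horizontal mirror
    ndimage.rotate(tile, angle=15, axes=(0, 1), reshape=False, mode="reflect"),  # rotation
    ndimage.gaussian_filter(tile, sigma=(1.5, 1.5, 0)),                # Gaussian blur
    np.clip(tile + rng.normal(scale=0.05, size=tile.shape), 0, 1),     # Gaussian noise
]
print("augmented tiles:", len(augmented), "shape:", augmented[0].shape)
```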

