Wave downscaling using machine learning

Author(s):  
Sara Santamaria Aguilar ◽  
Thomas Wahl

Future changes in the wind wave climate due to atmospheric changes can intensify present erosion and flood risk. Knowledge of both the mean and extreme wave climate is necessary for understanding changes in sediment dynamics and flood events at the coastline. To assess potential wave changes, ensemble nearshore wave projections are required to cover the entire range of wave conditions as well as the large uncertainties related to future climate states. However, nearshore wave projections are not available for most coastal regions because of the excessive computational effort required to dynamically downscale ensemble offshore wave data. As a result, the large relative contribution of waves to coastal flooding and erosion is commonly omitted in assessments of those hazards. In this context, machine learning models, with their low computational requirements, can be an efficient tool for downscaling ensemble global wave projections, provided they can accurately simulate the non-linear processes of wave propagation. Here, we analyse the performance of three machine learning methods, namely random forest, multivariate adaptive regression splines and artificial neural networks, for downscaling the wave climate along the coast of Florida. We further compare the performance of these three models with multiple linear regression, a frequently used statistical model that does not account for the non-linearities associated with wave propagation processes. We find that the three machine learning models perform better than multiple linear regression for all wave parameters (significant wave height, peak and mean periods, direction) along the entire coastline of Florida, which highlights the ability of these models to reproduce the non-linear wave propagation processes. Specifically, random forest shows the best performance and the lowest computational training times.
In addition, this model performs remarkably well in simulating extreme wave events compared to the other models. By following a tree bagging approach, random forest can also provide confidence intervals and reduce the tuning process. Tuning is one of the main disadvantages of artificial neural networks, which also perform well for wave downscaling but require more training and tuning effort. Although the significant wave height and the periods can be simulated with very high accuracy (R² higher than 0.9 and 0.8, respectively), the wave direction is poorly simulated by all models due to its circular behaviour. We find that transforming the direction into its sine and cosine can improve the model performance. Finally, we downscale an ensemble of global wave projections along the coast of Florida and assess potential changes in the wave climate of this region.
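
The sine and cosine transformation of wave direction mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names are hypothetical, and the example only shows why a circular variable is awkward to treat as a plain number (350° and 10° are close on the compass but far apart numerically).

```python
import math


def encode_direction(deg):
    """Map a direction in degrees to its (sine, cosine) pair."""
    rad = math.radians(deg)
    return math.sin(rad), math.cos(rad)


def decode_direction(sin_val, cos_val):
    """Recover a direction in degrees from a (sine, cosine) pair."""
    return math.degrees(math.atan2(sin_val, cos_val)) % 360.0


def circular_mean(directions_deg):
    """Average directions through their sine and cosine components."""
    pairs = [encode_direction(d) for d in directions_deg]
    mean_sin = sum(s for s, _ in pairs) / len(pairs)
    mean_cos = sum(c for _, c in pairs) / len(pairs)
    return decode_direction(mean_sin, mean_cos)


# Two directions 20 degrees apart, on either side of north.
naive_mean = (350.0 + 10.0) / 2                # arithmetic mean points south
wrapped_mean = circular_mean([350.0, 10.0])    # component mean points north
```

In a regression setting, a model would be trained on the two components as separate targets and the predicted pair converted back with `decode_direction`; whether the study did exactly this is not stated in the abstract.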

2021 ◽  
Vol 11 (19) ◽  
pp. 9296
Author(s):  
Talha Mahboob Alam ◽  
Mubbashar Mushtaq ◽  
Kamran Shaukat ◽  
Ibrahim A. Hameed ◽  
Muhammad Umer Sarwar ◽  
...  

Lack of education is a major concern in underdeveloped countries because it leads to poor human and economic development. The level of education in public institutions varies across all regions around the globe. Current disparities in access to education worldwide are mostly due to systemic regional differences and the distribution of resources. Previous research focused on evaluating students’ academic performance, but less has been done to measure the performance of educational institutions. Key performance indicators for the evaluation of institutional performance differ from student performance indicators. There is a dire need to evaluate educational institutions’ performance based on their disparities and academic results on a large scale. This study proposes a model to measure institutional performance based on key performance indicators through data mining techniques. Various feature selection methods were used to extract the key performance indicators. Several machine learning models, namely, J48 decision tree, support vector machines, random forest, rotation forest, and artificial neural networks were employed to build an efficient model. The results of the study were based on different factors, i.e., the number of schools in a specific region, teachers, school locations, enrolment, and availability of necessary facilities that contribute to school performance. It was also observed that urban regions performed well compared to rural regions due to the improved availability of educational facilities and resources. The results showed that artificial neural networks outperformed other models and achieved an accuracy of 82.9% when the relief-F based feature selection method was used. This study will help support efforts in governance for performance monitoring, policy formulation, target-setting, evaluation, and reform to address the issues and challenges in education worldwide.


2021 ◽  
Author(s):  
Moritz Feigl ◽  
Katharina Lebiedzinski ◽  
Mathew Herrnegger ◽  
Karsten Schulz

Abstract. Water temperature in rivers is a crucial environmental factor with the ability to alter hydro-ecological as well as socio-economic conditions within a catchment. The development of modelling concepts for predicting river water temperature is and will be essential for effective integrated water management and the development of adaptation strategies to future global changes (e.g. climate change). This study tests the performance of six different machine learning models: step-wise linear regression, random forest, eXtreme Gradient Boosting (XGBoost), feedforward neural networks (FNNs), and two types of recurrent neural networks (RNNs). All models are applied using different data inputs for daily water temperature prediction in 10 Austrian catchments ranging from 200 km² to 96000 km² and exhibiting a wide range of physiographic characteristics. The evaluated input data sets include combinations of daily means of air temperature, runoff, precipitation and global radiation. Bayesian optimization is applied to optimize the hyperparameters of all applied machine learning models. To make the results comparable to previous studies, two widely used benchmark models are applied additionally: linear regression and air2stream. With a mean root mean squared error (RMSE) of 0.55 °C, the tested models could significantly improve water temperature prediction compared to linear regression (1.55 °C) and air2stream (0.98 °C). In general, the results show a very similar performance of the tested machine learning models, with a median RMSE difference of 0.08 °C between the models. Of the six tested machine learning models, both FNNs and XGBoost performed best in 4 of the 10 catchments. RNNs are the best-performing models in the largest catchment, indicating that RNNs mainly perform well when processes with long-term dependencies are important.
Furthermore, a wide range of performance was observed for different hyperparameter sets for the tested models, showing the importance of hyperparameter optimization. The FNN model results in particular showed an extremely large RMSE standard deviation of 1.60 °C due to the chosen hyperparameters. This study evaluates different sets of input variables, machine learning models and training characteristics for daily stream water temperature prediction, acting as a basis for future development of regional multi-catchment water temperature prediction models. All preprocessing steps and models are implemented in the open-source R package wateRtemp to provide easy access to these modelling approaches and facilitate further research.
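
The RMSE used to compare the models above can be computed as in the following minimal sketch. The temperature values are made up purely for illustration and are not taken from the study, and the helper name `rmse` is an assumption rather than part of the wateRtemp package.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equally long sequences."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up daily water temperatures in degrees Celsius (illustration only).
observed = [4.0, 6.5, 9.0, 12.0]
model_a = [4.5, 6.0, 9.5, 11.5]   # hypothetical model, off by 0.5 each day
model_b = [5.5, 8.0, 7.5, 13.5]   # hypothetical model, off by 1.5 each day

rmse_a = rmse(observed, model_a)
rmse_b = rmse(observed, model_b)
```

Because RMSE is expressed in the same unit as the predicted variable, the two scores can be read directly as typical errors in °C.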


The student admission problem is very important in educational institutions. This paper applies machine learning models to predict a student's chance of being admitted to a master’s program, which helps students know in advance whether they are likely to be accepted. The models are multiple linear regression, k-nearest neighbor, random forest, and multilayer perceptron. Experiments show that the multilayer perceptron model surpasses the other models.


2020 ◽  
Author(s):  
Qing Wu ◽  
Fatma Nasoz ◽  
Jongyun Jung ◽  
Bibek Bhattarai

Abstract. Bone mineral density (BMD) is a highly heritable trait with heritability ranging from 50% to 80%. Numerous BMD-associated single nucleotide polymorphisms (SNPs) have been discovered by GWAS and GWAS meta-analysis. However, several studies found that combining these highly significant SNPs explained only a small percentage of BMD variance. This inconsistency may be caused by limitations of the linear regression approaches employed, because these traditional approaches lack the flexibility to model complex gene interactions and regulation. Hence, we developed various machine learning models of genomic data and ran experiments to identify the best machine learning model for BMD prediction at three different sites. We used genomic data from the Osteoporotic Fractures in Men (MrOS) cohort study (N=5,133) for analysis. Genotype imputation was conducted at the Sanger Imputation Server. A total of 1,103 BMD-associated SNPs were identified and the corresponding weighted genetic risk scores were calculated. Genetic variants, as well as age and other traditional BMD predictors, were included for modeling. Data were normalized and split into a training set (80%) and a test set (20%). BMD prediction models were built separately with random forest, gradient boosting, and neural network algorithms. Linear regression was used as a reference model. We applied non-parametric Wilcoxon signed-rank tests to the MSE of each model for pairwise model comparison. We found that gradient boosting shows the lowest MSE for each BMD site and that prediction models built with machine learning achieve improved performance when a large number of SNPs are included. With phenotype covariates plus 1,103 SNPs as predictors, all pairwise model differences were statistically significant except neural network vs. random forest at femoral neck BMD and gradient boosting vs. random forest at total hip BMD.
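
A weighted genetic risk score is commonly defined as the sum, over SNPs, of the number of risk alleles carried multiplied by a per-SNP weight. The sketch below shows that general definition only; the abstract does not state how the weights were obtained, so the weights, allele counts and function name here are hypothetical.

```python
def weighted_grs(allele_counts, weights):
    """Sum of risk-allele counts (0, 1 or 2 per SNP) times per-SNP weights."""
    if len(allele_counts) != len(weights):
        raise ValueError("one weight is required per SNP")
    return sum(w * g for g, w in zip(allele_counts, weights))


# Hypothetical per-SNP weights and one person's risk-allele counts.
weights = [0.02, 0.05, 0.01]
person = [2, 0, 1]

score = weighted_grs(person, weights)   # 2*0.02 + 0*0.05 + 1*0.01
```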


2021 ◽  
Author(s):  
Larissa Asito ◽  
Hélcio Pereira ◽  
Marcello Nogueira-Barbosa ◽  
Renato Tinós

We propose a computer-aided diagnosis system based on convolutional neural networks (CNNs) for the identification of osteosarcoma on bone radiographs. The CNN should indicate regions of the image that may contain tumors. To indicate these regions, we propose splitting the image into windows and classifying each window individually with a CNN. Pre-processing techniques, such as window exclusion and labeling, are proposed. Two CNNs are compared in the proposed system. The first is trained from scratch, while the second is a pre-trained CNN (VGG16). The CNNs are compared to four machine learning models that use features extracted from the image windows as inputs: multilayer perceptron (MLP), decision tree, random forest, and MLP with feature selection. In the experiments, the best performance was obtained by the pre-trained CNN.
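
The window-based approach described above can be sketched as follows. This is only an illustration of splitting a 2-D image into windows: the window size, the stride and the function name are assumptions, the abstract does not say whether the windows overlap, and the classification step is left out.

```python
def split_into_windows(image, window, stride):
    """Yield (top, left, patch) for every full window inside a 2-D image.

    The image is a list of equally long rows; patches that would extend
    past the image border are skipped.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    for top in range(0, height - window + 1, stride):
        for left in range(0, width - window + 1, stride):
            patch = [row[left:left + window] for row in image[top:top + window]]
            yield top, left, patch


# A toy 4x4 "image" whose pixel values are just their running index.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
windows = list(split_into_windows(image, window=2, stride=2))
```

Each patch would then be passed to the classifier, and the `(top, left)` coordinates allow positive windows to be marked back on the original radiograph.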


2021 ◽  
Vol 25 (5) ◽  
pp. 2951-2977
Author(s):  
Moritz Feigl ◽  
Katharina Lebiedzinski ◽  
Mathew Herrnegger ◽  
Karsten Schulz

Abstract. Water temperature in rivers is a crucial environmental factor with the ability to alter hydro-ecological as well as socio-economic conditions within a catchment. The development of modelling concepts for predicting river water temperature is and will be essential for effective integrated water management and the development of adaptation strategies to future global changes (e.g. climate change). This study tests the performance of six different machine-learning models: step-wise linear regression, random forest, eXtreme Gradient Boosting (XGBoost), feed-forward neural networks (FNNs), and two types of recurrent neural networks (RNNs). All models are applied using different data inputs for daily water temperature prediction in 10 Austrian catchments ranging from 200 to 96 000 km² and exhibiting a wide range of physiographic characteristics. The evaluated input data sets include combinations of daily means of air temperature, runoff, precipitation and global radiation. Bayesian optimization is applied to optimize the hyperparameters of all applied machine-learning models. To make the results comparable to previous studies, two widely used benchmark models are applied additionally: linear regression and air2stream. With a mean root mean squared error (RMSE) of 0.55 °C, the tested models could significantly improve water temperature prediction compared to linear regression (1.55 °C) and air2stream (0.98 °C). In general, the results show a very similar performance of the tested machine-learning models, with a median RMSE difference of 0.08 °C between the models. From the six tested machine-learning models both FNNs and XGBoost performed best in 4 of the 10 catchments. RNNs are the best-performing models in the largest catchment, indicating that RNNs mainly perform well when processes with long-term dependencies are important.
Furthermore, a wide range of performance was observed for different hyperparameter sets for the tested models, showing the importance of hyperparameter optimization. Especially the FNN model results showed an extremely large RMSE standard deviation of 1.60 °C due to the chosen hyperparameters. This study evaluates different sets of input variables, machine-learning models and training characteristics for daily stream water temperature prediction, acting as a basis for future development of regional multi-catchment water temperature prediction models. All preprocessing steps and models are implemented in the open-source R package wateRtemp to provide easy access to these modelling approaches and facilitate further research.


Author(s):  
Cecilia Martinez-Castillo ◽  
Gonzalo Astray ◽  
Juan Carlos Mejuto

Different machine learning models (multiple linear regression, support vector machines, artificial neural networks and random forests) are applied to predict the monthly global irradiation (MGI) from different input variables (latitude, longitude and altitude of the meteorological station, month, average temperatures, among others) for different areas of Galicia (Spain). The models were trained, validated and queried using data from three stations, and the best model of each type was checked on two independent stations. The results confirm that the best ML methodology is the ANN model, which presents the lowest RMSE values in the validation and querying phases, 122.6·10 kJ/(m²·day) and 113.6·10 kJ/(m²·day), respectively, and predicts adequately for the independent stations, 201.3·10 kJ/(m²·day) and 209.4·10 kJ/(m²·day), respectively. Given these good results, it is worthwhile to continue designing artificial neural networks for the analysis of monthly global irradiation.


Author(s):  
Farrikh Alzami ◽  
Erika Devi Udayanti ◽  
Dwi Puji Prabowo ◽  
Rama Aria Megantara

Sentiment analysis in the form of polarity classification is very important in everyday life: knowing the polarity, people can find out whether a given document has positive or negative sentiment, which helps them choose and make decisions. Sentiment analysis is usually done manually, so an automatic sentiment classification process is needed. However, studies that discuss which feature extraction methods and learning models suit unstructured sentiment analysis for the Amazon food review case are rare. This research explores several feature extraction methods, namely bag-of-words, TF-IDF, Word2Vec, and a combination of TF-IDF and Word2Vec, with several machine learning models (Random Forest, SVM, KNN and Naïve Bayes) to find a combination of feature extraction and learning model that can help add variety to the analysis of sentiment polarity. With document preparation (handling HTML tags, punctuation and special characters) and Snowball stemming, TF-IDF combined with SVM is suitable for polarity classification in unstructured sentiment analysis for the Amazon food review case, with a performance of 87.3 percent.
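
The TF-IDF weighting referred to above can be sketched in a few lines. This uses one common textbook variant (term frequency times the natural log of the inverse document frequency); library implementations often add smoothing and normalization, and the abstract does not say which variant was used. The reviews and the function name are made up for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Weight = (term count / document length) * ln(N / document frequency).
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Made-up reviews (illustration only).
reviews = ["great tasty snack", "stale snack", "great coffee"]
scores = tf_idf(reviews)
```

Terms that occur in fewer documents receive higher weights, so in the first review "tasty" outweighs "great" and "snack", which each appear in two reviews.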

