Reconstruction of Missing Hourly Precipitation Data to Increase Training Data Set for ANN

2022 ◽  
pp. 242-265
Author(s):  
Hema Nagaraja ◽  
Krishna Kant ◽  
K. Rajalakshmi

This paper investigates the hourly precipitation estimation capacity of an ANN trained on raw data and on data reconstructed using the proposed Precipitation Sliding Window Period (PSWP) method. Precipitation data from 11 Automatic Weather Stations (AWSs) in Delhi were obtained for Jan 2015 to Feb 2016. The proposed PSWP method uses both the time and space dimensions to fill in missing precipitation values. Hourly precipitation follows patterns within particular periods and across neighboring stations. Based on these patterns, a Local Cluster Sliding Window Period (LCSWP) and a Global Cluster Sliding Window Period (GCSWP) are defined for a single AWS and for all AWSs, respectively. The GCSWP is further classified into four categories, according to the patterns observed in it, to fill in the missing precipitation data. The experimental results indicate that the ANN trained with reconstructed data gives better estimates than the ANN trained with raw data: the average RMSE is 0.44 for the ANN trained with raw data and 0.34 for the ANN trained with reconstructed data.
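To make the reported error metric concrete, the following is a minimal Python sketch of how an RMSE can be computed and how the two reported averages compare. The function names `rmse` and `relative_reduction` are illustrative and not taken from the paper; only the values 0.44 and 0.34 come from the abstract.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], estimated: Sequence[float]) -> float:
    """Root-mean-square error between two equally long series."""
    if len(observed) != len(estimated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    squared = [(o - e) ** 2 for o, e in zip(observed, estimated)]
    return math.sqrt(sum(squared) / len(squared))


def relative_reduction(baseline: float, improved: float) -> float:
    """Fractional reduction of an error value relative to a baseline."""
    return (baseline - improved) / baseline


# Average RMSE values reported in the abstract (raw vs. reconstructed data).
reduction = relative_reduction(0.44, 0.34)
print(f"relative RMSE reduction: {reduction:.1%}")
```

Under this reading, moving from 0.44 to 0.34 is a reduction of a little under a quarter of the baseline error.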


2021 ◽  
Vol 8 (1) ◽  
pp. 33
Author(s):  
Carlos Javier Gamboa-Villafruela ◽  
José Carlos Fernández-Alvarez ◽  
Maykel Márquez-Mijares ◽  
Albenis Pérez-Alarcón ◽  
Alfo José Batista-Leyva

The short-term prediction of precipitation is a difficult spatio-temporal task because meteorological structures do not evolve uniformly over time. Neural networks such as the convolutional LSTM have recently shown promise for the spatio-temporal prediction of complex problems. In this research, we propose a convolutional LSTM neural network (CNN-LSTM) architecture for the immediate prediction of various short-term precipitation events using satellite data. The CNN-LSTM is trained with NASA Global Precipitation Measurement (GPM) precipitation data sets at 30-min intervals. The trained model is used to predict the sixteenth precipitation field from the preceding sequence of fifteen, and up to a time interval of 180 min. The results show that increasing the number of layers, as well as the amount of data in the training data set, improves the quality of the forecast.
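The "fifteen in, sixteenth out" setup described above can be sketched as a sliding window over a time-ordered sequence of frames. This is only an illustration under assumptions: the helper name `make_windows` is hypothetical, and integers stand in for the precipitation fields; the abstract does not describe the authors' actual data pipeline.

```python
from typing import List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def make_windows(
    frames: Sequence[Frame], n_in: int = 15
) -> List[Tuple[List[Frame], Frame]]:
    """Pair each run of n_in consecutive frames with the frame that follows it."""
    pairs = []
    for start in range(len(frames) - n_in):
        inputs = list(frames[start:start + n_in])
        target = frames[start + n_in]
        pairs.append((inputs, target))
    return pairs


# Stand-in data: 20 frames labelled 0..19 (real frames would be 2D fields).
frames = list(range(20))
pairs = make_windows(frames)
print(len(pairs), pairs[0][1], pairs[-1][1])
```

With frames spaced 30 min apart, a 180-min horizon corresponds to six such steps.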


2017 ◽  
Vol 2017 ◽  
pp. 1-12
Author(s):  
Gangquan Si ◽  
Jianquan Shi ◽  
Zhang Guo ◽  
Lixin Jia ◽  
Yanbin Zhang

Sparse strategies play a significant role in applications of the least squares support vector machine (LSSVM), because they alleviate the lack of sparseness and robustness in the LSSVM solution. In this paper, a sparse method using reconstructed support vectors is proposed and successfully applied to mill load prediction. Unlike other sparse algorithms, it does not select the support vectors from the training data set according to their ranked contributions to the LSSVM optimization. Instead, the reconstructed data are first obtained from an initial model built with all the training data. Support vectors are then selected from the reconstructed data set according to the location information given by density clustering of the training data set, and the selection terminates once the whole training data set has been traversed. Finally, the training model is built from the optimal reconstructed support vectors and the subsequently tuned hyperparameters. In addition, the paper puts forward a supplemental algorithm to remove redundant support vectors from the previous model. Extensive experiments on synthetic, benchmark, and mill load data sets are carried out, and the results illustrate the effectiveness of the proposed sparse method for LSSVM.


2019 ◽  
Vol 12 (2) ◽  
pp. 120-127 ◽  
Author(s):  
Wael Farag

Background: In this paper, a Convolutional Neural Network (CNN) that learns safe driving behavior and smooth steering manoeuvres is proposed as an enabler of autonomous driving technologies. The training data consist of images from a front-facing camera and the steering commands issued by an experienced driver driving in traffic as well as on urban roads. Methods: These data are then used to train the proposed CNN to perform what is called "Behavioral Cloning". The proposed Behavioral Cloning CNN is named "BCNet", and its deep seventeen-layer architecture was selected after extensive trials. BCNet was trained using the Adam optimization algorithm, a variant of the Stochastic Gradient Descent (SGD) technique. Results: The paper goes through the development and training process in detail and shows the image processing pipeline used in the development. Conclusion: After extensive simulations, the proposed approach proved successful in cloning the driving behavior embedded in the training data set.


Author(s):  
Ritu Khandelwal ◽  
Hemlata Goyal ◽  
Rajveer Singh Shekhawat

Introduction: Machine learning is an intelligent technology that acts as a bridge between business and data science. With data science involved, the business goal is to extract valuable insights from the available data. Bollywood, a multi-million-dollar industry, makes up a large part of Indian cinema. This paper attempts to predict whether an upcoming Bollywood movie will be a Blockbuster, Superhit, Hit, Average, or Flop, using machine learning techniques (classification and prediction). The first step in building a classifier or prediction model is the learning stage, in which a training data set is used to train the model with some technique or algorithm; the rules generated in this stage then form a model that can predict future trends in different types of organizations. Methods: Classification and prediction techniques such as Support Vector Machine (SVM), Random Forest, Decision Tree, Naïve Bayes, Logistic Regression, AdaBoost, and KNN are applied to find efficient and effective results. All of these can be applied through GUI-based workflows organized into categories such as Data, Visualize, Model, and Evaluate. Result: The first step in building a classifier or prediction model is the learning stage, in which a training data set is used to train the model with some technique or algorithm; the rules generated in this stage then form a model that can predict future trends in different types of organizations. Conclusion: This paper focuses on a comparative analysis based on parameters such as accuracy and the confusion matrix to identify the best possible model for predicting movie success. Using advertisement propaganda, production houses can plan the best time to release a movie according to the predicted success rate in order to gain higher benefits.
Discussion: Data mining is the process of discovering patterns in large data sets; the relationships discovered in this way help solve business problems and predict forthcoming trends. This prediction can help production houses with advertisement propaganda and cost planning, and by securing these factors they can make a movie more profitable.


2019 ◽  
Vol 9 (6) ◽  
pp. 1128 ◽  
Author(s):  
Yundong Li ◽  
Wei Hu ◽  
Han Dong ◽  
Xueyan Zhang

Aerial cameras, satellite remote sensing, or unmanned aerial vehicles (UAVs) equipped with cameras can facilitate search and rescue tasks after disasters. The traditional manual interpretation of huge aerial images is inefficient and could be replaced by machine learning-based methods combined with image processing techniques. With the development of machine learning, researchers have found that convolutional neural networks can effectively extract features from images. Some target detection methods based on deep learning, such as the single-shot multibox detector (SSD) algorithm, can achieve better results than traditional methods. However, the impressive performance of machine learning-based methods depends on large numbers of labeled samples. Given the complexity of post-disaster scenarios, obtaining many samples in the aftermath of a disaster is difficult. To address this issue, the current study proposes a damaged building assessment method using SSD with pretraining and data augmentation, with the following highlights. (1) Objects can be detected and classified into undamaged buildings, damaged buildings, and ruins. (2) A convolutional auto-encoder (CAE) that consists of VGG16 is constructed and trained using unlabeled post-disaster images. As a transfer learning strategy, the weights of the SSD model are initialized using the weights of the CAE counterpart. (3) Data augmentation strategies, such as image mirroring, rotation, Gaussian blur, and Gaussian noise processing, are used to augment the training data set. As a case study, aerial images of Hurricane Sandy in 2012 were used to validate the effectiveness of the proposed method. Experiments show that the pretraining strategy yields an improvement of 10% in overall accuracy compared with the SSD trained from scratch. These experiments also demonstrate that the data augmentation strategies can improve mAP and mF1 by 72% and 20%, respectively.
Finally, the results are further verified on another data set, from Hurricane Irma, and it is concluded that the proposed method is feasible.
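Three of the augmentation strategies named above (mirroring, rotation, and Gaussian noise) can be sketched in plain Python on a small grayscale image stored as nested lists. This is an illustrative sketch only: the function names, the 90-degree rotation angle, the noise level `sigma`, and the [0, 1] pixel range are assumptions, not details from the paper, and Gaussian blur is omitted for brevity.

```python
import random
from typing import List

Image = List[List[float]]  # grayscale image, rows of pixel values in [0, 1]


def mirror(img: Image) -> Image:
    """Flip the image left to right."""
    return [row[::-1] for row in img]


def rotate90(img: Image) -> Image:
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def add_gaussian_noise(img: Image, sigma: float, rng: random.Random) -> Image:
    """Add zero-mean Gaussian noise to each pixel, clipping to [0, 1]."""
    return [
        [min(1.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
        for row in img
    ]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
image: Image = [[0.1, 0.2], [0.3, 0.4]]
augmented = [mirror(image), rotate90(image), add_gaussian_noise(image, 0.05, rng)]
print(len(augmented), "augmented variants from one source image")
```

Each transformed copy keeps the label of its source image, which is how augmentation enlarges a small labeled training set.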


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Ryoya Shiode ◽  
Mototaka Kabashima ◽  
Yuta Hiasa ◽  
Kunihiro Oka ◽  
Tsuyoshi Murase ◽  
...  

Abstract The purpose of the study was to develop a deep learning network for estimating and constructing highly accurate 3D bone models directly from actual X-ray images and to verify its accuracy. The data used were 173 computed tomography (CT) images and 105 actual X-ray images of a healthy wrist joint. To compensate for the small size of the dataset, digitally reconstructed radiography (DRR) images generated from CT were used as training data instead of actual X-ray images. The DRR-like images were generated from actual X-ray images in the test and adapted to the network, and high-accuracy estimation of a 3D bone model from a small data set was possible. The 3D shape of the radius and ulna were estimated from actual X-ray images with accuracies of 1.05 ± 0.36 and 1.45 ± 0.41 mm, respectively.


Genetics ◽  
2021 ◽  
Author(s):  
Marco Lopez-Cruz ◽  
Gustavo de los Campos

Abstract Genomic prediction uses DNA sequences and phenotypes to predict genetic values. In homogeneous populations, theory indicates that the accuracy of genomic prediction increases with sample size. However, differences in allele frequencies and in linkage disequilibrium patterns can lead to heterogeneity in SNP effects. In this context, calibrating genomic predictions using a large, potentially heterogeneous, training data set may not lead to optimal prediction accuracy. Some studies tried to address this sample size/homogeneity trade-off using training set optimization algorithms; however, this approach assumes that a single training data set is optimum for all individuals in the prediction set. Here, we propose an approach that identifies, for each individual in the prediction set, a subset from the training data (i.e., a set of support points) from which predictions are derived. The methodology that we propose is a Sparse Selection Index (SSI) that integrates Selection Index methodology with sparsity-inducing techniques commonly used for high-dimensional regression. The sparsity of the resulting index is controlled by a regularization parameter (λ); the G-BLUP (the prediction method most commonly used in plant and animal breeding) appears as a special case which happens when λ = 0. In this study, we present the methodology and demonstrate (using two wheat data sets with phenotypes collected in ten different environments) that the SSI can achieve significant (anywhere between 5-10%) gains in prediction accuracy relative to the G-BLUP.


Water ◽  
2021 ◽  
Vol 13 (1) ◽  
pp. 107
Author(s):  
Elahe Jamalinia ◽  
Faraz S. Tehrani ◽  
Susan C. Steele-Dunne ◽  
Philip J. Vardon

Climatic conditions and vegetation cover influence water flux in a dike, and potentially the dike stability. A comprehensive numerical simulation is computationally too expensive to be used for the near real-time analysis of a dike network. Therefore, this study investigates a random forest (RF) regressor to build a data-driven surrogate for a numerical model to forecast the temporal macro-stability of dikes. To that end, daily inputs and outputs of a ten-year coupled numerical simulation of an idealised dike (2009–2019) are used to create a synthetic data set, comprising features that can be observed from a dike surface, with the calculated factor of safety (FoS) as the target variable. The data set before 2018 is split into training and testing sets to build and train the RF. The predicted FoS is strongly correlated with the numerical FoS for data that belong to the test set (before 2018). However, the trained model shows lower performance for data in the evaluation set (after 2018) if further surface cracking occurs. This proof-of-concept shows that a data-driven surrogate can be used to determine dike stability for conditions similar to the training data, which could be used to identify vulnerable locations in a dike network for further examination.
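The data partitioning described above (training and testing sets drawn from the period before 2018, and a later evaluation set) can be sketched as follows. The helper name `split_dataset`, the 1 January 2018 cutoff date, the 20% test fraction, and the random shuffle are all assumptions made for illustration; the abstract does not state how the pre-2018 data were divided.

```python
import random
from datetime import date, timedelta
from typing import List, Tuple

Record = Tuple[date, float]  # (day, factor of safety); features omitted


def split_dataset(
    records: List[Record],
    cutoff: date,
    test_fraction: float = 0.2,
    seed: int = 0,
) -> Tuple[List[Record], List[Record], List[Record]]:
    """Return (train, test, evaluation): records before the cutoff are
    shuffled and split; records on or after it form the evaluation set."""
    before = [r for r in records if r[0] < cutoff]
    evaluation = [r for r in records if r[0] >= cutoff]
    random.Random(seed).shuffle(before)
    n_test = int(len(before) * test_fraction)
    return before[n_test:], before[:n_test], evaluation


# Ten synthetic daily records straddling the assumed cutoff.
first_day = date(2017, 12, 27)
records = [(first_day + timedelta(days=i), 1.5) for i in range(10)]
train, test, evaluation = split_dataset(records, cutoff=date(2018, 1, 1))
print(len(train), len(test), len(evaluation))
```

Keeping the evaluation period strictly later than the training period is what exposes the drop in performance on conditions the model has not seen.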

