A comparative study of different imputation methods for daily rainfall data in east-coast Peninsular Malaysia

Author(s):  
Siti Mariana Che Mat Nor ◽  
Shazlyn Milleana Shaharudin ◽  
Shuhaida Ismail ◽  
Nurul Hila Zainuddin ◽  
Mou Leong Tan

Rainfall data are the most significant inputs in hydrological and climatological modelling. However, rainfall datasets are prone to missing values for various reasons. This study aims to impute missing rainfall values using several imputation methods: Replace by Mean, Nearest Neighbor, Random Forest (RF), Nonlinear Iterative Partial Least Squares (NIPALS) and Markov Chain Monte Carlo (MCMC). Daily rainfall datasets from 48 rainfall stations across east-coast Peninsular Malaysia were used. The datasets were then fed into a Multiple Linear Regression (MLR) model. The performance of the above methods was evaluated using the Root Mean Square Error (RMSE), Mean Absolute Error (MAE) and Nash-Sutcliffe Efficiency Coefficient (CE). The experimental results showed that RF coupled with MLR (RF-MLR) was the most suitable approach for imputing the missing data in east-coast Peninsular Malaysia.
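The evaluation step described above can be illustrated with a minimal Python sketch. It assumes that observed and imputed series are equal-length lists of daily rainfall in mm. The function names are hypothetical. Only the simplest method (Replace by Mean) is shown, and the MLR stage is omitted. CE uses the standard Nash-Sutcliffe form: one minus the ratio of squared errors to the spread of the observations about their mean.

```python
import math


def replace_by_mean(series):
    """Fill None entries with the mean of the observed values (Replace by Mean)."""
    observed = [v for v in series if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in series]


def rmse(obs, sim):
    """Root mean square error between observed and simulated values."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def mae(obs, sim):
    """Mean absolute error between observed and simulated values."""
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)


def nash_sutcliffe(obs, sim):
    """Nash-Sutcliffe efficiency (CE): 1 is a perfect fit, <= 0 is no better than the observed mean."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


# Hypothetical example: a gap at index 1 is filled by the mean, then scored
# against an assumed known true value.
truth = [0.0, 12.5, 3.0, 0.0, 7.5]
imputed = replace_by_mean([0.0, None, 3.0, 0.0, 7.5])
print(imputed)
print(rmse(truth, imputed), mae(truth, imputed), nash_sutcliffe(truth, imputed))
```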

2016 ◽  
Vol 78 (9-4) ◽  
Author(s):  
Nur Shazwani Muhammad ◽  
Amieroul Iefwat Akashah ◽  
Jazuri Abdullah

Extreme rainfall events are the main cause of flooding. This study aimed to examine seven extreme rainfall indices, namely extreme rain sum (XRS), very wet day intensity (I95), extremely wet day intensity (I99), very wet day proportion (R95), extremely wet day proportion (R99), very wet days (N95) and extremely wet days (N99). It did so using the Mann-Kendall (MK) test and its normalized Z statistic. The analyses are based on daily rainfall data from Bayan Lepas, Subang, Senai, Kuantan and Kota Bharu. The east coast states received more rainfall than any other part of Peninsular Malaysia, and Kota Bharu station recorded the highest XRS, 648 mm. The analyses also indicate that stations in the eastern part of Peninsular Malaysia experienced higher XRS, I95, I99, R95 and R99 than stations in the western and northern parts. Subang and Senai show the highest number of wet and very wet days (N95) compared with the other stations. In addition, all stations except Kota Bharu show increasing trends for most of the extreme rainfall indices. These upward trends indicate that extreme rainfall events became more severe over the period 1960 to 2014.
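The MK test used above can be sketched in a few lines of Python. This minimal version assumes an annual index series with no tied values. It computes the S statistic, its variance n(n-1)(2n+5)/18 and the continuity-corrected normalized Z. The tie correction is omitted. The 1.96 threshold (two-sided, 5% level) is a standard choice, not a detail taken from the study.

```python
import math


def mann_kendall(series):
    """Return (S, Z) for the Mann-Kendall trend test.

    Assumes no tied values, so the variance has no tie correction.
    """
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of the pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)  # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, z


# Hypothetical annual XRS values (mm); |Z| > 1.96 means a significant trend at 5%.
xrs = [310.0, 295.5, 402.0, 388.0, 451.5, 420.0, 505.0, 498.5]
s, z = mann_kendall(xrs)
label = "increasing" if z > 1.96 else "decreasing" if z < -1.96 else "no significant trend"
print(s, round(z, 3), label)
```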


2017 ◽  
Vol 13 (4-1) ◽  
pp. 375-380 ◽  
Author(s):  
Izzat Fakharuddin Kamaruzaman ◽  
Wan Zawiah Wan Zin ◽  
Noratiqah Mohd Ariff

This study modified a method for treating missing values in daily rainfall data from 104 selected rainfall stations. The daily rainfall data were obtained from the Department of Irrigation and Drainage Malaysia (DID) for the period 1965 to 2015. The missing values over this 51-year period were estimated using various weighting methods. Three tests of model performance were used to determine the best imputation method. The findings indicate that the proposed method is more efficient than the traditional method. The homogeneity of the data series was checked using the homogeneity tests recommended in the existing literature. The results indicated that more than 40% of the rainfall stations were homogeneous based on the proposed method.
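The abstract does not say which weighting schemes were compared. Inverse distance weighting (IDW) is one common weighting method. It estimates a missing value at a target station as a distance-weighted average of neighbouring stations. The sketch below is a hypothetical illustration of that general idea, not the modified method proposed in the study. The distances and the power parameter are assumptions.

```python
def idw_estimate(neighbours, power=2.0):
    """Inverse distance weighted estimate of a missing daily rainfall value.

    neighbours: list of (distance_km, rainfall_mm) pairs for nearby stations
    on the same day; entries whose rainfall is None (also missing) are skipped.
    Distances must be positive.
    """
    usable = [(d, v) for d, v in neighbours if v is not None]
    if not usable:
        raise ValueError("no neighbouring station has data for this day")
    weights = [1.0 / d ** power for d, _ in usable]  # closer stations weigh more
    return sum(w * v for w, (_, v) in zip(weights, usable)) / sum(weights)


# Hypothetical neighbours of a target station on one day.
day = [(5.0, 12.0), (10.0, 20.0), (8.0, None)]
print(round(idw_estimate(day), 2))
```

A larger `power` concentrates the weight on the nearest stations. `power=2` is only a conventional starting value here.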


2018 ◽  
Vol 17 (4) ◽  
pp. 334-347
Author(s):  
Kwanchayanawish MACHANA ◽  
Amonrat KANOKRUNG ◽  
Sirinart SRICHAN ◽  
Boonyadist VONGSAK ◽  
Maliwan KUTAKO ◽  
...  

The fatty acid profiles of five microalgae from the east coast of Thailand (Amphora sp., Chaetoceros sp., Melosira sp., Bellerochae sp. and Lithodesmium sp.) were determined by conventional gas chromatography with flame ionization detection (GC-FID). The results showed that fatty acids suitable for biodiesel production were the most common components in all the microalgae profiles. The GC chromatograms showed that both Amphora sp. and Chaetoceros sp. contained the essential omega-3 fatty acids eicosapentaenoic acid (EPA) and docosahexaenoic acid (DHA). Additionally, this study assessed whether Fourier transform infrared (FT-IR) microspectroscopy could be used to evaluate and monitor the biochemical composition of microalgae, including lipid, carbohydrate and protein profiles, using colorimetric methods. FT-IR spectra combined with the biochemical values of lipid, carbohydrate and protein contents were used to build predictive models by partial least squares (PLS) regression. Cross-validation of the lipid, protein and carbohydrate models showed high statistical accuracy, with root mean square errors of cross-validation (RMSECV) of approximately 0.5–3.22 %. The coefficients of regression between the actual and predicted values of lipids, carbohydrates and proteins were 92.66, 95.73 and 96.43 %, respectively. The RPD (ratio of performance to deviation) values were all high (> 3), indicating good predictive accuracy. This study suggests that FT-IR could be a tool for simultaneously measuring the biochemical contents of microalgal cells.
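The RMSECV and RPD figures quoted above are simple functions of the cross-validated predictions. The following is a minimal Python sketch. It assumes the PLS model has already produced one cross-validated prediction per sample; the PLS fit itself is omitted. RPD is taken as the sample standard deviation of the reference values divided by RMSECV, the usual chemometrics definition.

```python
import math
import statistics


def rmsecv(reference, cv_predicted):
    """Root mean square error of cross-validation from held-out predictions."""
    n = len(reference)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(reference, cv_predicted)) / n)


def rpd(reference, cv_predicted):
    """Ratio of performance to deviation: SD of reference values / RMSECV."""
    return statistics.stdev(reference) / rmsecv(reference, cv_predicted)


# Hypothetical lipid contents (% dry weight) and cross-validated PLS predictions.
lipid = [12.1, 18.4, 22.7, 15.3, 27.9, 20.2]
predicted = [12.9, 17.6, 23.5, 14.8, 26.7, 21.0]
print(round(rmsecv(lipid, predicted), 3), round(rpd(lipid, predicted), 2))
```

Under the reading in the abstract, an RPD above 3 would count as good predictive accuracy.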


2013 ◽  
Vol 17 (4) ◽  
pp. 1311-1318 ◽  
Author(s):  
F. Yusof ◽  
I. L. Kane ◽  
Z. Yusop

A short-memory process that encounters occasional structural breaks in mean can show a slower rate of decay in the autocorrelation function and other properties of fractionally integrated I(d) processes. In this paper, we employed the semiparametric procedure of Geweke and Porter-Hudak (1983) for estimating the fractional differencing parameter. We used it to analyse nine daily rainfall data sets across Malaysia. The results indicate that all the data sets exhibit long memory. Furthermore, an empirical fluctuation process using the ordinary least squares (OLS)-based cumulative sum (CUSUM) test was applied to detect break dates, and break dates were detected in all data sets. The data sets were partitioned at their respective break dates, and the test for long memory was applied again to all subseries. The results show that all subseries follow the same pattern as the original series. The original series were split at the break date, and the fractional parameters d1 and d2 were estimated on the resulting subseries. These estimates confirm that there is long memory in the data generating process (DGP). This evidence therefore indicates true long memory rather than long memory caused by structural breaks.
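The Geweke and Porter-Hudak (GPH) estimator regresses the log periodogram at the lowest Fourier frequencies λ_j = 2πj/n on log(4 sin²(λ_j/2)). The fractional differencing parameter d is minus the OLS slope. Below is a minimal pure-Python sketch. The bandwidth m = n^0.5 is a common default assumed here, not necessarily the study's choice. The CUSUM break-date test is not shown.

```python
import cmath
import math
import random


def periodogram(x, j):
    """Periodogram ordinate at Fourier frequency lambda_j = 2*pi*j/n."""
    n = len(x)
    lam = 2 * math.pi * j / n
    mean = sum(x) / n
    dft = sum((xt - mean) * cmath.exp(-1j * lam * t) for t, xt in enumerate(x))
    return abs(dft) ** 2 / (2 * math.pi * n), lam


def gph_estimate(x, power=0.5):
    """GPH log-periodogram estimate of the fractional differencing parameter d."""
    n = len(x)
    m = int(n ** power)  # number of low frequencies used in the regression
    regressors, responses = [], []
    for j in range(1, m + 1):
        ordinate, lam = periodogram(x, j)
        regressors.append(math.log(4 * math.sin(lam / 2) ** 2))
        responses.append(math.log(ordinate))
    x_bar = sum(regressors) / m
    y_bar = sum(responses) / m
    slope = sum((a - x_bar) * (b - y_bar) for a, b in zip(regressors, responses)) / sum(
        (a - x_bar) ** 2 for a in regressors
    )
    return -slope  # d is minus the OLS slope


# Usage on simulated data only: white noise (d = 0) versus a random walk (d = 1).
rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(1024)]
walk = []
level = 0.0
for e in noise:
    level += e
    walk.append(level)
print(round(gph_estimate(noise), 2), round(gph_estimate(walk), 2))
```

Applying the same estimator separately to the subseries before and after a detected break gives the d1 and d2 comparison described in the abstract.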


Author(s):  
Eiman Tamah Alshammari

The motivation of this paper is to find the most accurate technique for predicting ground-level ozone at Al Jahra station, Kuwait. Data on meteorological variables (air temperature, relative humidity, solar radiation, wind direction and wind speed) and on the concentrations of seven environmental pollutants (SO2, NO2, NO, CO2, CO, NMHC and CH4) were used to forecast the atmospheric ozone concentration. Three methods were used to predict ground-level ozone: PLS regression, support vector machine (SVM) and multiple least-squares regression. Fifteen parameters were used to evaluate the performance of the methods. Multiple least-squares regression, partial least squares (PLS) regression, and SVM with linear and radial kernels were the best performers. Their mean absolute errors (MAE) were 9.17 × 10⁻³, 9.72 × 10⁻³, 9.64 × 10⁻³ and 9.12 × 10⁻³, respectively. SVM with a polynomial kernel had an MAE of 5.46 × 10⁻². These results show that these methods could be used to predict ground-level ozone concentrations at Al Jahra station in Kuwait.
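The SVM variants compared above differ in the kernel function that measures similarity between two input vectors. Here each vector is a set of meteorological and pollutant readings. Below is a minimal Python sketch of the three kernels. The parameter names `gamma`, `degree` and `coef0` and their defaults are illustrative assumptions, not the settings used in the study.

```python
import math


def linear_kernel(x, y):
    """Linear kernel: plain dot product."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, gamma=0.1):
    """Radial basis function kernel: exp(-gamma * squared Euclidean distance)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    """Polynomial kernel: (gamma * <x, y> + coef0) ** degree."""
    return (gamma * linear_kernel(x, y) + coef0) ** degree


# Hypothetical standardized readings for two hours (e.g. temperature, humidity, NO2).
hour_a = [0.5, -1.2, 0.3]
hour_b = [0.4, -0.9, 0.8]
print(linear_kernel(hour_a, hour_b), rbf_kernel(hour_a, hour_b), polynomial_kernel(hour_a, hour_b))
```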


2018 ◽  
Vol 7 (3.11) ◽  
pp. 168
Author(s):  
Aisar Ashra M. Ashri ◽  
Wardah Tahir ◽  
Nurul Syahira M. Harmay ◽  
Intan Shafeenar A. Mohtar ◽  
Sazali Osman ◽  
...  

Intense hydrological events such as floods have been increasing recently, especially in Peninsular Malaysia. It is therefore important to forecast intense rainfall as part of flood preparedness and mitigation measures. In this study, precipitation outputs from the Weather Research and Forecasting (WRF) Numerical Weather Prediction (NWP) model were validated against observed rainfall measurements. The model was run at a horizontal resolution of 3 km. Forecast rainfall events for three states in the East Coast region (Kelantan, Terengganu and Pahang) were compared with the observed rainfall data. Their accuracy was then verified statistically using the False Alarm Ratio (FAR) and Probability of Detection (POD). The results indicate that the models have very promising potential for producing quantitative precipitation forecasts (QPF) for flood forecasting in Kelantan, Terengganu and Pahang. These three states are located in the East Coast region of Peninsular Malaysia and experience floods every year. Accurate rainfall forecasts can therefore be used to improve the forecast information used as flood indicators.
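POD and FAR are computed from a 2×2 contingency table of forecast versus observed rain events:

- POD = hits / (hits + misses)
- FAR = false alarms / (hits + false alarms)

Below is a minimal Python sketch. The 1 mm/day event threshold and the paired daily totals are assumptions for illustration.

```python
def contingency(forecast, observed, threshold=1.0):
    """Count hits, misses and false alarms for rain events at or above threshold (mm)."""
    hits = misses = false_alarms = 0
    for f, o in zip(forecast, observed):
        forecast_event = f >= threshold
        observed_event = o >= threshold
        if forecast_event and observed_event:
            hits += 1
        elif observed_event:
            misses += 1
        elif forecast_event:
            false_alarms += 1
    return hits, misses, false_alarms


def probability_of_detection(hits, misses):
    """Fraction of observed events that were forecast (1 is perfect)."""
    return hits / (hits + misses)


def false_alarm_ratio(hits, false_alarms):
    """Fraction of forecast events that did not occur (0 is perfect)."""
    return false_alarms / (hits + false_alarms)


# Hypothetical daily totals (mm): WRF forecast versus rain gauge.
wrf = [25.0, 0.0, 8.5, 3.2, 0.4, 40.1]
gauge = [31.0, 2.5, 0.0, 5.0, 0.0, 22.7]
h, m, fa = contingency(wrf, gauge)
print(probability_of_detection(h, m), false_alarm_ratio(h, fa))
```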


2015 ◽  
Vol 754-755 ◽  
pp. 923-932 ◽  
Author(s):  
Norazian Mohamed Noor ◽  
A.S. Yahaya ◽  
N.A. Ramli ◽  
Mohd Mustafa Al Bakri Abdullah

Hourly PM10 concentrations measured at eight monitoring stations in Peninsular Malaysia in 2006 were used to generate simulated missing data. The gap lengths of the simulated missing values were limited to 12 hours, since actual gaps in the data are considered short. Two percentages of simulated missing data were generated: 5 % and 15 %. Several single imputation methods were applied to fill in the simulated missing data: linear interpolation (LI), nearest neighbour interpolation (NN), mean above below (MAB), daily mean (DM), mean 12-hour (12M), mean 6-hour (6M), row mean (RM) and previous year (PY). In addition, multiple imputation (MI) was conducted for comparison with the single imputation methods. Performance was evaluated using four statistical criteria: mean absolute error, root mean squared error, prediction accuracy and index of agreement. The results show that 6M performs comparably to LI, which indicates that a shorter averaging time gives better predictions. The other single imputation methods also predict the missing data well, except for PY. RM and MI perform moderately, and their performance improves at the higher fraction of missing data. LR is the worst method for both simulated missing data percentages.
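Two of the single imputation methods listed above can be sketched for an hourly series with gaps marked as `None`. Linear interpolation draws a straight line between the observed values on either side of a gap. For mean above below (MAB), the sketch assumes that every missing hour is replaced by the mean of those two bracketing values. That is a reading of the method's name rather than a definition from the study. Gaps at the start or end of the series are left unfilled.

```python
def interior_gaps(series):
    """Yield (left, right) indices of the observed values bracketing each run of None."""
    n = len(series)
    i = 0
    while i < n:
        if series[i] is None:
            end = i
            while end < n and series[end] is None:
                end += 1
            if i > 0 and end < n:  # skip gaps touching either end of the series
                yield i - 1, end
            i = end
        else:
            i += 1


def linear_interpolation(series):
    """Fill each interior gap along the straight line between its bracketing values."""
    filled = list(series)
    for left, right in interior_gaps(series):
        step = (series[right] - series[left]) / (right - left)
        for k in range(left + 1, right):
            filled[k] = series[left] + step * (k - left)
    return filled


def mean_above_below(series):
    """Fill each interior gap with the mean of its two bracketing values (assumed MAB)."""
    filled = list(series)
    for left, right in interior_gaps(series):
        for k in range(left + 1, right):
            filled[k] = (series[left] + series[right]) / 2
    return filled


# Hypothetical hourly PM10 readings (µg/m3) with a 3-hour gap.
pm10 = [48.0, 52.0, None, None, None, 68.0, 71.0]
print(linear_interpolation(pm10))
print(mean_above_below(pm10))
```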


2019 ◽  
Vol 9 (2) ◽  
pp. 186-192
Author(s):  
Fawaz Kh. Aswad ◽  
Ali A. Yousif ◽  
Sayran A. Ibrahim

In this research, the effect of the random component in the modified Thomas-Fiering model for generating daily rainfall data was studied, with Akre station as a case study. Random components with four distributions were used:

- normal random numbers
- the Wilson-Hilferty (W-H) transformation
- the truncated W-H transformation
- the Kirby modification (KM) of the W-H transformation

The model was applied to the daily rainfall data available for Akre station for 2000–2006 and was then used to generate rainfall data for 2006 and 2007. The correlation coefficients between the observed and generated data were:

- 0.82 for normal random numbers
- 0.77 for the W-H transformation
- 0.89 for the truncated W-H transformation
- 0.87 for the KM of the W-H transformation

The chi-square test, the Kolmogorov–Smirnov test, root mean squared error (RMSE) and mean absolute error (MAE) were used to compare the observed and generated data. All results passed the chi-square and Kolmogorov–Smirnov tests, with calculated values below the tabulated values at the 5% significance level. For RMSE and MAE, the truncated W-H transformation gave the lowest values. Therefore, the W-H transformation is the best choice for generating rainfall data at Akre station.
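In its classic form, the Thomas-Fiering model generates the next value from the current one by a lag-one regression plus a scaled random component:

q(j+1) = mean(j+1) + b(j)·(q(j) - mean(j)) + t·s(j+1)·sqrt(1 - r(j)²), where b(j) = r(j)·s(j+1)/s(j).

The Wilson-Hilferty transformation turns a standard normal deviate z into a deviate with skewness g:

t = (2/g)·(1 + g·z/6 - g²/36)³ - 2/g.

The sketch below assumes this classic form. It does not reproduce the study's modifications, the truncated W-H variant or the Kirby variant. Clipping negative rainfall to zero is an added assumption.

```python
import math
import random


def wilson_hilferty(z, skew):
    """Map a standard normal deviate z to a deviate with the given skewness."""
    if skew == 0:
        return z
    return (2 / skew) * (1 + skew * z / 6 - skew ** 2 / 36) ** 3 - 2 / skew


def thomas_fiering_step(q_prev, mean_prev, mean_next, sd_prev, sd_next, r, t):
    """One step of the classic Thomas-Fiering recursion with random deviate t.

    r is the lag-one correlation between consecutive periods; negative
    rainfall is clipped to zero (an assumption, not part of the original model).
    """
    b = r * sd_next / sd_prev  # regression coefficient between consecutive periods
    q_next = mean_next + b * (q_prev - mean_prev) + t * sd_next * math.sqrt(1 - r ** 2)
    return max(0.0, q_next)


# Hypothetical statistics (mm) for one pair of consecutive days.
rng = random.Random(7)
t = wilson_hilferty(rng.gauss(0.0, 1.0), skew=1.5)
print(thomas_fiering_step(q_prev=14.0, mean_prev=10.0, mean_next=12.0,
                          sd_prev=2.0, sd_next=3.0, r=0.5, t=t))
```

Swapping `wilson_hilferty` for a plain normal deviate corresponds to the "normal random numbers" variant compared in the abstract.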


2016 ◽  
Vol 13 (1) ◽  
pp. 83
Author(s):  
Siti Nur Zahrah Amin Burhanuddin ◽  
Sayang Mohd Deni ◽  
Norazan Mohamed Ramli

Good-quality rainfall data are essential in hydrological and meteorological analyses. Poor-quality rainfall data affect the analyses and can subsequently produce misleading results. This study therefore proposes modified methods for treating missing rainfall data that produce more accurate estimates. The old normal ratio method and the modified normal ratio method based on the trimmed mean are each combined with the geographical coordinate method. The modified methods were tested at various levels of missing data, using 36 years of complete daily rainfall records from eighteen meteorological stations in Peninsular Malaysia. The results indicate that both modified methods improved the estimation of missing rainfall values at the target station, as shown by the lowest error measures. The modified normal ratio method based on the trimmed mean, combined with the geographical coordinate method, was found to be the most appropriate for the Batu Kurau and Sg. Bernam stations. The modified old normal ratio method with geographical coordinates was the most accurate at the Genting Klang station.
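The classic normal ratio method estimates the missing value at a target station as P_x = (1/n) Σ (N_x / N_i) P_i. Here P_i is the same-day rainfall at neighbour i, and N_x and N_i are the long-term normal (mean) rainfall at the target and neighbour stations. The sketch below shows that formula. Computing the normals with a trimmed mean is one hypothetical reading of the trimmed-mean modification. The geographical coordinate weighting is not reproduced.

```python
def trimmed_mean(values, proportion=0.1):
    """Mean after dropping the given proportion of values from each end."""
    data = sorted(values)
    k = int(len(data) * proportion)
    kept = data[k:len(data) - k]
    return sum(kept) / len(kept)


def normal_ratio(neighbour_values, neighbour_normals, target_normal):
    """Normal ratio estimate of a missing value at the target station."""
    n = len(neighbour_values)
    return sum(
        target_normal / normal * value  # scale each neighbour by the ratio of normals
        for value, normal in zip(neighbour_values, neighbour_normals)
    ) / n


# Hypothetical annual totals (mm) used to compute normals with a trimmed mean.
target_normal = trimmed_mean([2410, 2650, 1980, 3900, 2520, 2600, 2450, 2380, 2700, 2550])
normals = [trimmed_mean([2210, 2300, 2150, 2800, 2250]),
           trimmed_mean([2900, 3050, 2980, 3100, 2870])]
print(round(normal_ratio([18.0, 25.5], normals, target_normal), 2))
```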

