A comparison of gap-filling algorithms for eddy covariance fluxes and their drivers

2020
Author(s):  
Atbin Mahabbati ◽  
Jason Beringer ◽  
Matthias Leopold ◽  
Ian McHugh ◽  
James Cleverly ◽  
...  

Abstract. The errors and uncertainties associated with gap-filling algorithms for water, carbon and energy flux data have always been one of the prominent challenges of the global network of microclimatological tower sites that use the eddy covariance (EC) technique. To address this concern and find more efficient gap-filling algorithms, we reviewed eight algorithms for estimating missing values of environmental drivers and, separately, of three major fluxes in EC time series. We then examined the performance of these algorithms for different gap-filling scenarios utilising data from five OzFlux Network towers during 2013. The objectives of this research were (a) to evaluate the impact of training- and testing-window lengths on the performance of each algorithm and (b) to compare the performance of traditional and newer gap-filling techniques for the EC data, both for fluxes and for their corresponding meteorological drivers. The performance of the algorithms was evaluated by generating nine different training–testing window lengths, ranging from a day to 365 days. In each scenario, the gaps covered the data for the entirety of 2013 by consecutively repeating them, where, in each step, values were modelled using data from the preceding window. After running each scenario, a variety of statistical metrics were used to evaluate the performance of the algorithms. The algorithms showed different levels of sensitivity to the training–testing windows: the Prophet Forecast Model (FBP) was the most sensitive, whilst the performance of artificial neural networks (ANNs), for instance, did not vary considerably with window length. The performance of the algorithms generally decreased with increasing training–testing window length, yet the differences were not considerable for windows smaller than 60 days.
Gap-filling of the environmental drivers showed no significant difference amongst the algorithms; the linear algorithms showed slight superiority over the machine learning (ML) algorithms, except for the random forest (RF) algorithm estimating the ground heat flux (RMSEs of 30.17 and 34.93 for RF and CLR, respectively). For the major fluxes, though, the ML algorithms showed superiority (9 % less RMSE on average), except for support vector regression (SVR), which produced significantly biased estimates. Even though ANNs, RF and extreme gradient boost (XGB) performed comparably in gap-filling of the major fluxes, RF provided relatively more consistent results with less bias. The results indicated that no single algorithm outperforms the others in all situations, but RF is a potential alternative to ANNs for flux gap-filling.

2021
Vol 10 (1)
pp. 123-140
Author(s):  
Atbin Mahabbati ◽  
Jason Beringer ◽  
Matthias Leopold ◽  
Ian McHugh ◽  
James Cleverly ◽  
...  

Abstract. The errors and uncertainties associated with gap-filling algorithms for water, carbon, and energy flux data have always been one of the main challenges of the global network of microclimatological tower sites that use the eddy covariance (EC) technique. To address these concerns and find more efficient gap-filling algorithms, we reviewed eight algorithms to estimate missing values of environmental drivers and nine algorithms for the three major fluxes typically found in EC time series. We then examined the algorithms' performance for different gap-filling scenarios utilising the data from five EC towers during 2013. This research's objectives were (a) to evaluate the impact of gap length on the performance of each algorithm and (b) to compare the performance of traditional and new gap-filling techniques for the EC data, for fluxes, and separately for their corresponding meteorological drivers. The algorithms' performance was evaluated by generating nine gap windows with different lengths, ranging from a day to 365 d. In each scenario, a gap period was chosen randomly, and the data were removed from the dataset accordingly. After running each scenario, a variety of statistical metrics were used to evaluate the algorithms' performance. The algorithms showed different levels of sensitivity to gap length: the Prophet Forecast Model (FBP) was the most sensitive, whilst the performance of artificial neural networks (ANNs), for instance, did not vary as much with gap length. The algorithms' performance generally decreased with increasing gap length, yet the differences were not significant for windows smaller than 30 d. No significant differences between the algorithms were recognised for the meteorological and environmental drivers.
However, the linear algorithms showed slight superiority over those of machine learning (ML), except for the random forest (RF) algorithm estimating the ground heat flux (root mean square errors – RMSEs – of 28.91 and 33.92 for RF and classic linear regression – CLR – respectively). For the major fluxes, by contrast, the ML algorithms and marginal distribution sampling (MDS) showed superiority over the other algorithms. Even though ANNs, RF, and eXtreme Gradient Boost (XGB) showed comparable performance in gap-filling of the major fluxes, RF provided more consistent results with slightly less bias than the other ML algorithms. The results indicated that no single algorithm outperforms the others in all situations, but RF is a potential alternative to MDS and ANNs for flux gap-filling.
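The gap-length experiment described in this abstract can be sketched as follows. Everything below (the synthetic half-hourly drivers, the toy flux model, and the gap positions) is invented for illustration and assumes scikit-learn is available; it is not the authors' code.

```python
# Insert artificial gaps of increasing length into a synthetic NEE-like series
# and measure the random forest gap-filling error for each gap length.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 48 * 120                                  # 120 days of half-hourly records
hour = (np.arange(n) % 48) / 2.0
swin = np.clip(700 * np.sin((hour - 6) / 12 * np.pi), 0, None)  # toy radiation
tair = 10 + 10 * np.sin((hour - 9) / 12 * np.pi) + rng.normal(0, 1, n)
nee = -0.01 * swin + 0.05 * tair + rng.normal(0, 0.3, n)  # toy CO2 flux
X = np.column_stack([swin, tair, hour])       # meteorological drivers as inputs

errs = []
for days in (1, 7, 30):
    gap = slice(n // 2, n // 2 + 48 * days)   # artificial gap of `days` length
    mask = np.ones(n, bool)
    mask[gap] = False
    rf = RandomForestRegressor(n_estimators=100, random_state=0)
    rf.fit(X[mask], nee[mask])                # train on the non-gap data
    err = float(np.sqrt(np.mean((rf.predict(X[gap]) - nee[gap]) ** 2)))
    errs.append(err)
    print(f"{days:2d}-day gap RMSE: {err:.3f}")
```

Because the drivers here fully determine the toy flux up to noise, the RMSE stays near the noise floor for all gap lengths; with real data, error growth with gap length (as reported above) reflects unmodelled variability.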


2017
Vol 14 (23)
pp. 5551-5569
Author(s):  
Luke Gregor ◽  
Schalk Kok ◽  
Pedro M. S. Monteiro

Abstract. The Southern Ocean accounts for 40 % of oceanic CO2 uptake, but the estimates are bound by large uncertainties due to a paucity of observations. Gap-filling empirical methods have been used to good effect to approximate pCO2 from satellite-observable variables in other parts of the ocean, but many of these methods are not in agreement in the Southern Ocean. In this study we propose two additional methods that perform well in the Southern Ocean: support vector regression (SVR) and random forest regression (RFR). The methods are used to estimate ΔpCO2 in the Southern Ocean based on SOCAT v3, achieving similar trends to the SOM-FFN method by Landschützer et al. (2014). Results show that the SOM-FFN and RFR approaches have RMSEs of similar magnitude (14.84 and 16.45 µatm, where 1 atm  =  101 325 Pa), whereas the SVR method has a larger RMSE (24.40 µatm). However, the larger errors for SVR and RFR are, in part, due to an increase in coastal observations from SOCAT v2 to v3, where the SOM-FFN method used v2 data. The success of both SOM-FFN and RFR depends on the ability to adapt to different modes of variability. The SOM-FFN achieves this by having independent regression models for each cluster, while this flexibility is intrinsic to the RFR method. Analyses of the estimates show that the SVR's sensitivity to outliers and the RFR's robustness to them significantly shape the outcomes. Further analyses of the methods were performed by using a synthetic dataset to assess the following: which method (RFR or SVR) has the best performance? What is the effect of using time, latitude and longitude as proxy variables on ΔpCO2? What is the impact of the sampling bias in the SOCAT v3 dataset on the estimates? We find that while RFR is indeed better than SVR, the ensemble of the two methods outperforms either one, due to complementary strengths and weaknesses of the methods.
Results also show that for the RFR and SVR implementations, it is better to include coordinates as proxy variables, as RMSE scores are lowered and the phasing of the seasonal cycle is more accurate. Lastly, we show that there is only a weak bias due to undersampling. The synthetic data provide a useful framework to test methods in regions of sparse data coverage and show potential as a useful tool to evaluate methods in future studies.
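The two-member ensemble idea, with coordinates used as proxy predictors, can be sketched as below. The synthetic "ΔpCO2" target and the feature choices are assumptions made for the example, not the paper's data or tuning.

```python
# Average RFR and SVR predictions of a toy ΔpCO2 target, with latitude
# included as a proxy variable alongside the physical predictors.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 2000
lat = rng.uniform(-70, -35, n)                    # Southern Ocean latitudes
sst = 20 + 0.4 * lat + rng.normal(0, 1, n)        # toy sea-surface temperature
chl = rng.lognormal(0, 0.5, n)                    # toy chlorophyll
dpco2 = 2 * sst - 5 * np.log(chl) + 0.5 * lat + rng.normal(0, 3, n)

X = np.column_stack([sst, chl, lat])              # coordinate as proxy variable
Xtr, Xte, ytr, yte = train_test_split(X, dpco2, random_state=0)

rfr = RandomForestRegressor(n_estimators=100, random_state=0).fit(Xtr, ytr)
svr = SVR(C=10).fit(Xtr, ytr)
ens = 0.5 * (rfr.predict(Xte) + svr.predict(Xte))  # simple two-member ensemble

def rmse(y, p):
    return float(np.sqrt(np.mean((y - p) ** 2)))

print("RFR:", rmse(yte, rfr.predict(Xte)))
print("SVR:", rmse(yte, svr.predict(Xte)))
print("ensemble:", rmse(yte, ens))
```

A plain mean of the two members is the simplest combination; weighted or stacked ensembles are natural extensions when one member is systematically stronger.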


2019
Vol 16 (16)
pp. 3113-3131
Author(s):  
Mathias Göckede ◽  
Fanny Kittler ◽  
Carsten Schaller

Abstract. Methane flux measurements by the eddy-covariance technique are subject to large uncertainties, particularly owing to the highly intermittent nature of methane emissions. Outbursts of high methane emissions, termed event fluxes, hold the potential to introduce systematic biases into derived methane budgets, since under such conditions the assumption of stationarity of the flow is violated. In this study, we investigate the net impact of this effect by comparing eddy-covariance fluxes against a wavelet-derived reference that is not negatively influenced by non-stationarity. Our results demonstrate that methane emission events influenced 3 %–4 % of the flux measurements and did not lead to systematic biases in methane budgets for the analyzed summer season; however, the presence of events substantially increased uncertainties in short-term flux rates. The wavelet results provided an excellent reference to evaluate the performance of three different gap-filling approaches for eddy-covariance methane fluxes, and we show that none of them could reproduce the range of observed flux rates. The integrated performance of the gap-filling methods for the longer-term dataset varied between the two eddy-covariance towers involved in this study, and we show that gap-filling remains a large source of uncertainty, linked to limited insight into the mechanisms governing the short-term variability in methane emissions. By broadening our observational methane flux database to a wider range of conditions, including the direct resolution of short-term variability on the order of minutes, wavelet-derived fluxes hold the potential to generate new insight into methane exchange with the atmosphere and thereby improve our understanding of the underlying processes.


Author(s):  
Marina Azer ◽  
Mohamed Taha ◽  
Hala H. Zayed ◽  
Mahmoud Gadallah

Social media presence is a crucial part of our lives, and social media are now considered a more important source of information than traditional sources. Twitter has become one of the prevalent social sites for exchanging viewpoints and feelings. This work proposes a supervised machine learning system for detecting false news. One of the challenges in credibility detection is finding new features that are most predictive of classifier performance. Both content-based features and user-based features are used; the importance of the features and their impact on performance are examined, and the reasons for choosing the final feature set with the k-best method are explained. Seven supervised machine learning classifiers are used: naïve Bayes (NB), support vector machine (SVM), k-nearest neighbours (KNN), logistic regression (LR), random forest (RF), maximum entropy (ME), and conditional random forest (CRF). Models were trained and tested on the Pheme dataset. An analysis of the features is presented, comparing them with content-based features as the decisive factors in determining validity. Random forest showed the highest performance both when using user-based features only (accuracy of 82.2 %) and when using a mixture of content-based and user-based features, which achieved the best result overall (accuracy of 83.4 %). In contrast, logistic regression performed best when using content-based features only. Performance is measured by accuracy, precision, recall, and F1 score. We compared our feature set with those of other studies and assessed the impact of our new features, finding that they yield a substantial improvement in the detection and verification of false news compared with current results.
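The k-best selection step followed by a random forest, as described above, can be sketched like this. The feature matrix is a synthetic stand-in generated with scikit-learn; the Pheme dataset and the authors' actual features are not reproduced.

```python
# Rank candidate features with SelectKBest (ANOVA F-test) and evaluate a
# random forest on the top-k subset via cross-validation.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Stand-in for a mixed content+user feature matrix with binary labels.
X, y = make_classification(n_samples=600, n_features=30, n_informative=8,
                           random_state=0)

selector = SelectKBest(f_classif, k=10).fit(X, y)  # keep the 10 best features
X_top = selector.transform(X)

rf = RandomForestClassifier(n_estimators=200, random_state=0)
acc = cross_val_score(rf, X_top, y, cv=5).mean()
print(f"cv accuracy on top-10 features: {acc:.3f}")
```

In practice the selector should be fitted inside each cross-validation fold (e.g. via a `Pipeline`) to avoid selection bias; it is done globally here only for brevity.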


Agriculture
2021
Vol 11 (11)
pp. 1106
Author(s):  
Yan Hu ◽  
Lijia Xu ◽  
Peng Huang ◽  
Xiong Luo ◽  
Peng Wang ◽  
...  

A rapid, nondestructive tea classification method is of great practical significance. This study uses fluorescence hyperspectral technology and machine learning to distinguish Oolong tea by analyzing the spectral features of tea in the wavelength range of 475 to 1100 nm. The spectral data are preprocessed by multivariate scattering correction (MSC) and standard normal variate (SNV) transformation, which can effectively reduce the impact of baseline drift and tilt. Then principal component analysis (PCA) and t-distributed stochastic neighbour embedding (t-SNE) are adopted for feature dimensionality reduction and visual display. Random Forest-Recursive Feature Elimination (RF-RFE) is used for feature selection. Decision Tree (DT), Random Forest Classification (RFC), K-Nearest Neighbor (KNN) and Support Vector Machine (SVM) are used to establish the classification models. The results show that MSC-RF-RFE-SVM is the best model for the classification of Oolong tea, with accuracies of 100% and 98.73% on the training and test sets, respectively. It can be concluded that fluorescence hyperspectral technology and machine learning are feasible for classifying Oolong tea.
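The preprocessing → RF-RFE → SVM chain described above can be sketched as a scikit-learn pipeline. The data are synthetic stand-ins for the preprocessed spectra, and standard scaling stands in for the MSC/SNV step; parameter values are assumptions.

```python
# Pipeline: scaling, recursive feature elimination with a random forest as the
# ranking estimator, then an RBF-kernel SVM classifier.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# Stand-in for spectral features of three tea classes.
X, y = make_classification(n_samples=400, n_features=60, n_informative=10,
                           n_classes=3, n_clusters_per_class=1, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

model = make_pipeline(
    StandardScaler(),                              # stands in for MSC/SNV
    RFE(RandomForestClassifier(n_estimators=100, random_state=0),
        n_features_to_select=15),                  # RF-RFE feature selection
    SVC(kernel="rbf", C=10),
)
model.fit(Xtr, ytr)
score = model.score(Xte, yte)
print(f"test accuracy: {score:.3f}")
```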


Entropy
2019
Vol 21 (4)
pp. 386
Author(s):  
Lin Lin ◽  
Bin Wang ◽  
Jiajin Qi ◽  
Da Wang ◽  
Nantian Huang

To improve the accuracy of the recognition of complicated mechanical faults in bearings, a large number of features containing fault information need to be extracted. In most studies of bearing fault diagnosis, the influence of limited fault training samples has not been considered. Furthermore, commonly used multi-classifiers can misidentify the type or severity of faults when normal samples are not used as training samples. Therefore, a novel bearing fault diagnosis method based on the one-class classification concept and random forest is proposed to reduce the impact of limited fault training samples. First, the bearing vibration signals are decomposed into numerous intrinsic mode functions using the empirical wavelet transform. Then, 284 features, including multiple entropy measures, are extracted from the original signal and the intrinsic mode functions to construct the initial feature set. Lastly, a hybrid classifier, based on a one-class support vector machine trained on normal samples and a random forest trained on imbalanced fault data lacking some specific severities, is set up to accurately identify the mechanical state and specific fault type of the bearings. The experimental results show that the proposed method can significantly improve the classification accuracy compared with traditional methods for different diagnostic targets.
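The hybrid one-class SVM plus random forest idea can be sketched as follows: the one-class SVM, trained only on normal samples, flags anomalies, and the random forest (trained only on fault data) then labels the fault type. The feature vectors below are synthetic stand-ins, not extracted vibration features.

```python
# One-class SVM as a normal/abnormal gate, random forest as the fault-type
# classifier for samples the gate rejects.
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
normal = rng.normal(0, 1, (300, 8))     # features of healthy bearings
fault_a = rng.normal(3, 1, (80, 8))     # fault type A features
fault_b = rng.normal(-3, 1, (80, 8))    # fault type B features

ocsvm = OneClassSVM(nu=0.05, gamma="scale").fit(normal)  # normal samples only
faults = np.vstack([fault_a, fault_b])
labels = np.array([0] * 80 + [1] * 80)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(faults, labels)

def diagnose(x):
    # +1 from the one-class SVM means "looks normal"; otherwise ask the RF.
    if ocsvm.predict(x.reshape(1, -1))[0] == 1:
        return "normal"
    return f"fault_{'AB'[rf.predict(x.reshape(1, -1))[0]]}"

print(diagnose(fault_a[0]), diagnose(fault_b[0]))
```

The gate keeps the multi-class model from ever being asked about healthy data, which is the point of the one-class stage when fault samples are scarce or incomplete.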


2018
Vol 10 (9)
pp. 3282
Author(s):  
Da Liu ◽  
Kun Sun ◽  
Han Huang ◽  
Pingzhou Tang

Accurate load forecasting can help alleviate the impact of renewable-energy access to the network, facilitate the power plants in arranging unit maintenance, and help power broker companies develop reasonable quotation plans. However, traditional prediction methods are insufficient for analysing load sequence fluctuations: economic variables are not introduced into the input variable selection, and redundant information interferes with the final prediction results. In this paper, ensemble empirical mode decomposition (EEMD) is used to decompose the electricity consumption sequence. Appropriate economic variables are selected as model inputs for each decomposed sequence, which is modelled separately according to its characteristics. The models are then constructed by selecting optimal random forest parameters, and finally the component predictions are recombined. A case study shows that the model's prediction accuracy is better than that of random forest, support vector machine, and seasonal naïve benchmarks, verifying the validity and feasibility of the method for monthly load forecasting.
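The decompose-model-recombine pattern described above can be sketched as follows. A simple rolling-mean trend/residual split stands in for EEMD (which would normally require a dedicated library such as PyEMD), each component gets its own random forest on lagged values, and the component forecasts are summed; the monthly load series is synthetic.

```python
# Decompose a toy monthly load series into two components, forecast each with
# its own random forest on 12 lagged values, then recombine the forecasts.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
t = np.arange(120)  # 10 years of synthetic monthly load
load = 100 + 0.5 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 2, 120)

k = 12
trend = np.convolve(load, np.ones(k) / k, mode="same")  # stand-in component 1
resid = load - trend                                    # stand-in component 2

def lagged(y, p=12):
    # Build a lag matrix: row i holds y[i:i+p], target is y[i+p].
    X = np.column_stack([y[i:len(y) - p + i] for i in range(p)])
    return X, y[p:]

pred = 0.0
for comp in (trend, resid):
    X, y = lagged(comp)
    rf = RandomForestRegressor(n_estimators=100, random_state=0)
    rf.fit(X[:-1], y[:-1])              # hold out the final step
    pred += rf.predict(X[-1:])[0]       # one-step-ahead forecast per component

print(f"forecast: {pred:.1f}, actual: {load[-1]:.1f}")
```

With a real EEMD, each intrinsic mode function would replace the two stand-in components, and exogenous economic variables would be appended to the lag matrix of the relevant components.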


Author(s):  
A. Jamali ◽  
A. Abdul Rahman

Abstract. Environmental change monitoring in the earth sciences needs land use land cover change (LULCC) modeling to investigate the impact of climate change phenomena, such as droughts and floods, on earth surface land cover. As land cover has a direct impact on Land Surface Temperature (LST), land cover mapping is an essential part of climate change modeling. In this paper, for land use land cover mapping (LULCM), image classification of Sentinel-1A Synthetic Aperture Radar (SAR) Ground Range Detected (GRD) data using two machine learning algorithms, Support Vector Machine (SVM) and Random Forest (RF), is implemented in the R programming language, and the algorithms are compared in terms of overall classification accuracy. Across the eight scenarios defined in this research, the RF and SVM classification methods achieve best-case overall accuracies of 90.81 and 92.09 percent, respectively.


2021
Vol 10 (2)
pp. 116
Author(s):  
Haleh Azizi ◽  
Hassan Reza

Several studies have been conducted in recent years to discriminate between fractured zones (FZs) and non-fractured zones (NFZs) in oil wells. These studies have applied data mining techniques to petrophysical logs (PLs) with generally valuable results; however, identifying fractured and non-fractured zones is difficult when imbalanced data are treated as though they were balanced during analysis. We studied the importance of using balanced data to detect fractured zones using PLs. We applied Random Forest and Support Vector Machine classifiers to PLs from eight oil wells drilled into a fractured carbonate reservoir, using both imbalanced and balanced datasets, and validated our results with image logs. A significant difference between accuracy and precision indicates imbalanced data with fractured zones categorized as the minor class. The results indicated that the accuracy of imbalanced and balanced datasets is similar, but precision is significantly improved by balancing, regardless of how low or high the calculated indices might be.
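The accuracy-versus-precision effect on imbalanced classes, and the gain from balancing, can be illustrated with a toy example. The data below are synthetic (scikit-learn's generator), not the authors' well logs, and class weighting stands in for whatever balancing scheme they used.

```python
# Compare accuracy and minority-class precision for a random forest trained
# without and with class balancing on a 95/5 imbalanced dataset.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score

# Fractured zones as the rare positive class (~5 % of samples).
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], flip_y=0.05,
                           random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=0)

for cw in (None, "balanced"):
    rf = RandomForestClassifier(n_estimators=200, class_weight=cw,
                                random_state=0).fit(Xtr, ytr)
    p = rf.predict(Xte)
    print(f"class_weight={cw}: accuracy={accuracy_score(yte, p):.3f}, "
          f"precision={precision_score(yte, p, zero_division=0):.3f}")
```

Accuracy barely moves between the two runs because the majority class dominates it; precision on the rare class is the metric that responds to balancing, which mirrors the finding in the abstract.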


2021
Author(s):  
Jingyu Yao ◽  
Zhongming Gao ◽  
Jianping Huang ◽  
Heping Liu ◽  
Guoyin Wang

Abstract. Gap-filling eddy covariance CO2 fluxes is challenging at dryland sites because of the small magnitude of the fluxes. Here, four machine learning (ML) algorithms, artificial neural network (ANN), k-nearest neighbours (KNN), random forest (RF), and support vector machine (SVM), are employed and evaluated for gap-filling CO2 fluxes over a semi-arid sagebrush ecosystem with artificial gaps of different lengths. The ANN and RF algorithms outperform the KNN and SVM in filling gaps ranging from hours to days, with the RF being more time efficient than the ANN. The performance of the ANN and RF degrades markedly for extremely long gaps of two months. In addition, our results suggest that there is no need to fill the daytime and nighttime gaps in net ecosystem exchange (NEE) separately when using the ANN and RF. With the ANN and RF, the gap-filling-induced uncertainties in the annual NEE at this site are estimated to be within 16 g C m−2, whereas the uncertainties with the KNN and SVM can be as large as 27 g C m−2. To better fill extremely long gaps of a few months, we test a two-layer gap-filling framework based on the RF. With this framework, the model performance is improved significantly, especially for the nighttime data. This approach therefore provides an alternative for filling extremely long gaps to characterize annual carbon budgets and interannual variability in dryland ecosystems.
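A head-to-head comparison of the four gap-filling algorithms named above can be sketched on one artificial gap. Everything here (the synthetic half-hourly drivers, toy NEE model, gap position, and model hyperparameters) is an assumption for illustration; it is not the study's setup.

```python
# Fill a one-week artificial gap in a synthetic NEE series with RF, KNN, SVM,
# and a small neural network, and compare their RMSEs over the gap.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(5)
n = 48 * 90                                   # 90 days of half-hourly records
hour = (np.arange(n) % 48) / 2.0
swin = np.clip(600 * np.sin((hour - 6) / 12 * np.pi), 0, None)
tair = 12 + 9 * np.sin((hour - 9) / 12 * np.pi) + rng.normal(0, 1, n)
nee = -0.008 * swin + 0.04 * tair + rng.normal(0, 0.2, n)
X = np.column_stack([swin, tair, hour])

gap = slice(n // 2, n // 2 + 48 * 7)          # one-week artificial gap
mask = np.ones(n, bool)
mask[gap] = False

models = {
    "RF": RandomForestRegressor(n_estimators=100, random_state=0),
    "KNN": KNeighborsRegressor(n_neighbors=10),
    "SVM": make_pipeline(StandardScaler(), SVR(C=10)),
    "ANN": make_pipeline(StandardScaler(),
                         MLPRegressor(hidden_layer_sizes=(32,), max_iter=500,
                                      random_state=0)),
}
results = {}
for name, m in models.items():
    m.fit(X[mask], nee[mask])
    results[name] = float(np.sqrt(np.mean((m.predict(X[gap]) - nee[gap]) ** 2)))
    print(f"{name}: RMSE {results[name]:.3f}")
```

On real dryland fluxes, the rankings reported above (ANN and RF ahead of KNN and SVM) emerge from exactly this kind of held-out-gap evaluation, repeated over many gap positions and lengths.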

