Quantile regression using gradient boosted decision trees for daily residential energy load disaggregation

2021 ◽  
Vol 2069 (1) ◽  
pp. 012107
Author(s):  
B Delcroix ◽  
S Sansregret ◽  
G Larochelle Martin ◽  
A Daoud

Abstract The building sector is responsible for approximately one-third of total energy consumption worldwide. This sector is undergoing a major digital transformation, with buildings increasingly equipped with connected devices such as smart meters and IoT devices. This transformation offers the opportunity to better monitor and optimize building operations. In the province of Quebec (Canada), most buildings are equipped with smart meters providing electricity usage data every 15 minutes. A major current challenge is to disaggregate the different energy uses from smart meter data, a discipline called non-intrusive load monitoring in the literature. In this work, the aim is to develop and validate a model, potentially generalizable to all houses, that identifies the daily share of each energy use based on building information, weather data and smart meter data. Input features are selected and ordered using an aggregated score composed of the correlation coefficient, the feature importance given by a decision tree, and the predictive power score. Two modelling methods based on quantile regression are tested: linear regression (LR) and gradient boosted decision trees (GBDT). Compared to ordinary least squares regression, quantile methods inherently provide more robustness and confidence intervals. Both models are trained and validated using separate datasets collected in 8 houses in Canada where metering and sub-metering were performed during a whole year. Results on the test dataset indicate better performance of the GBDT model compared to the LR model, with a coefficient of determination of 0.88 (vs. 0.78), a mean absolute error of 6.34 % (vs. 8.89 %) and a maximum absolute error between the actual and predicted values in 95 % of the cases of 17.2 % (vs. 23.1 %).
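The link between quantile regression and confidence intervals mentioned above can be illustrated with the pinball (quantile) loss, which is the objective that quantile models minimize. The sketch below is not the authors' model: it fits only a constant prediction by grid search, and the `shares` values, the whole-percent `grid`, and the 5 % / 95 % levels are hypothetical choices made for illustration.

```python
def pinball_loss(y_true, y_pred, q):
    """Mean pinball (quantile) loss for a quantile level q in (0, 1)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        # Under-prediction is weighted by q, over-prediction by (1 - q).
        total += q * diff if diff >= 0 else (q - 1) * diff
    return total / len(y_true)


def best_constant(y_true, q, candidates):
    """Return the candidate constant prediction with the lowest pinball loss."""
    return min(
        candidates,
        key=lambda c: pinball_loss(y_true, [c] * len(y_true), q),
    )


# Hypothetical daily shares (%) of one energy use; not data from the paper.
shares = [10, 12, 15, 18, 20, 22, 25, 30, 35, 60]
grid = range(0, 101)  # candidate constant predictions, in whole percent

median_fit = best_constant(shares, 0.5, grid)   # central estimate
lower = best_constant(shares, 0.05, grid)       # lower bound of the interval
upper = best_constant(shares, 0.95, grid)       # upper bound of the interval
```

Minimizing the loss at a low and a high quantile level yields the two ends of an interval, while the median fit is insensitive to the single large value (60) in the hypothetical data. A gradient boosted model applies the same loss, but with trees in place of the constant.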

2013 ◽  
Vol 2 (1) ◽  
pp. 6
Author(s):  
IDA AYU PRASETYA UTHAMI ◽  
I KOMANG GDE SUKARSA ◽  
I PUTU EKA NILA KENCANA

In regression analysis, the method commonly used to estimate the parameters is Ordinary Least Squares (OLS). The principle of OLS is to minimize the sum of squared errors. If any of its assumptions is not met, the OLS estimator is no longer the best linear unbiased estimator (BLUE). One of the assumptions that must be met is homoscedasticity, a condition in which the variance of the error is constant. Violation of this assumption is referred to as heteroscedasticity. When heteroscedasticity is present, other regression techniques are needed, such as median quantile regression, which defines the median as the solution that minimizes the sum of absolute errors. This study aimed to estimate the regression parameters for data known to exhibit heteroscedasticity. The secondary data were taken from the book Basic Econometrics (Gujarati, 2004), and the analysis was performed in EViews 6. Parameter estimation was carried out by estimating the regression parameters at each quantile, after which the estimator at the median quantile was chosen as the estimator of the regression coefficients. The results showed that the heteroscedasticity problem was resolved by median quantile regression, with an R2 value of about 71 percent, although the errors still did not follow a normal distribution. It can therefore be concluded that median quantile regression can overcome heteroscedasticity, but the data still show non-normality.
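The contrast between minimizing squared errors (OLS) and minimizing absolute errors (median regression) can be sketched for a single predictor. The code below is an illustrative brute-force approach, not the EViews procedure used in the study. It relies on the known property that some line minimizing the sum of absolute errors passes through at least two data points. The data are hypothetical, with one extreme observation standing in for a region of high error variance.

```python
from itertools import combinations


def sum_abs_error(xs, ys, slope, intercept):
    """Sum of absolute residuals for the line y = intercept + slope * x."""
    return sum(abs(y - (intercept + slope * x)) for x, y in zip(xs, ys))


def lad_line(xs, ys):
    """Median (least absolute deviation) line found by testing every point pair."""
    best = None
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        if x1 == x2:
            continue  # a vertical line is not a valid regression line
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        loss = sum_abs_error(xs, ys, slope, intercept)
        if best is None or loss < best[0]:
            best = (loss, slope, intercept)
    return best[1], best[2]


def ols_line(xs, ys):
    """Ordinary least squares line from the closed-form solution."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data: four points on y = x plus one extreme observation.
xs = [0, 1, 2, 3, 4]
ys = [0, 1, 2, 3, 40]

lad_slope, lad_intercept = lad_line(xs, ys)
ols_slope, ols_intercept = ols_line(xs, ys)
```

On these data the median line stays on the four well-behaved points, while the OLS slope is pulled strongly toward the extreme observation. The pairwise search costs on the order of n³ operations, so it is suitable only for small illustrations; practical quantile regression software uses linear programming or similar solvers.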


2021 ◽  
Author(s):  
Renan N.D. Almeida ◽  
Michael Greenberg ◽  
Cedoljub Bundalovic-Torma ◽  
Alexandre Martel ◽  
Pauline W. Wang ◽  
...  

Pseudomonas syringae is a genetically diverse bacterial species complex responsible for numerous agronomically important crop diseases. Individual isolates of P. syringae are typically assigned pathovar names based on their host of isolation and the associated disease symptoms, and these pathovar designations are often assumed to reflect host specificity. Unfortunately, this assumption has rarely been rigorously tested, which poses a challenge when trying to identify genetic factors associated with host specificity. Here we develop a rapid seed infection assay to measure the virulence of 121 diverse P. syringae isolates on common bean (Phaseolus vulgaris). This collection includes P. syringae phylogroup 2 bean isolates (pathovar syringae) that cause bacterial spot disease and P. syringae phylogroup 3 bean isolates (pathovar phaseolicola) that cause the much more serious halo blight disease. We find that phylogroup 2 strains generally show lower levels of host specificity on bean, with the average level of virulence for all strains in this phylogroup (irrespective of host of isolation) being higher than the average level for all other P. syringae strains. We then use gradient boosted decision trees to model the P. syringae virulence weights using whole genome kmers, type III secreted effector kmers, and the presence / absence of type III effectors and phytotoxins. Our machine learning model performed best using whole genome data, and we were able to predict bean virulence with high accuracy (mean absolute error as low as 0.05). Finally, we functionally validated the model by predicting virulence for 16 strains and found that 15 (94%) of the strains had virulence levels within the bounds of estimated predictions given the calculated RMSE values. This study further illustrates that P. syringae phylogroup 2 strains may have evolved a different lifestyle than other P. syringae strains and demonstrates the power of machine learning for predicting host specific adaptation.


Author(s):  
Wan Nur Shaziayani ◽  
Ahmad Zia Ul-Saufie ◽  
Hasfazilah Ahmat ◽  
Dhiya Al-Jumeily

Abstract Air pollution is becoming a significant global environmental issue. The sources of air pollution in Malaysia are either mobile or stationary. Motor vehicles are one of the mobile sources. Stationary sources include emissions from urban development, quarrying, power plants and petrochemical plants. The most noticeable contaminant in Peninsular Malaysia is particulate matter (PM10), the highest contributor to the Air Pollution Index (API) compared to other pollution parameters. The aim of this study is to determine the best loss function between quantile regression (QR) and ordinary least squares (OLS) using a boosted regression tree (BRT) for the prediction of PM10 concentration in Alor Setar, Klang and Kota Bharu, Malaysia. Model comparison statistics using the coefficient of determination (R2), prediction accuracy (PA), index of agreement (IA), normalized absolute error (NAE) and root mean square error (RMSE) show that QR is slightly better than OLS, with R2 (0.60–0.73), PA (0.78–0.85), IA (0.86–0.92), NAE (0.15–0.17) and RMSE (9.52–22.15) for next-day predictions in the BRT model.


2018 ◽  
Vol 19 (2) ◽  
pp. 392-403 ◽  
Author(s):  
Omolbani Mohammadrezapour ◽  
Jamshid Piri ◽  
Ozgur Kisi

Abstract Evapotranspiration is an important component in planning and management of water resources. It depends on climatic factors and the influence of these factors on each other makes evapotranspiration estimation difficult. This study attempts to explore the possibility of predicting this important component using three different heuristic methods: support vector machine (SVM), adaptive neuro-fuzzy inference system (ANFIS) and gene expression programming (GEP). In this regard, according to the Food and Agriculture Organization of the United Nations (FAO) Penman-Monteith equation, the monthly potential evapotranspiration in four synoptic stations (Zahedan, Zabol, Iranshahr, and Chabahar) was calculated using monthly weather data. The weather data were then used as inputs to the SVM, ANFIS and GEP models to estimate potential evapotranspiration. Five different input combinations were tried in the applications. The results of SVM, ANFIS and GEP models were compared based on the coefficient of determination (R2), mean absolute error and root mean square error. Findings showed that the SVM model, whose inputs are average air temperature, relative humidity, wind speed, and sunny hours of the current and one previous month, performed better than the other models for the Zahedan, Zabol, Iranshahr, and Chabahar stations. Comparison of the three heuristic methods indicated that in all stations, the SVM, GEP and ANFIS models took first, second, and third place in estimation of the monthly potential evapotranspiration, respectively.


Author(s):  
Ibrahim Abdullahi ◽  
Abubakar Yahaya

In this article, an alternative to ordinary least squares (OLS) regression based on an analytical solution in the Statgraphics software is considered, and this alternative is none other than the quantile regression (QR) model. We also present a goodness of fit statistic as well as approximate distributions of the associated test statistics for the parameters. Furthermore, we suggest a goodness of fit statistic called the least absolute deviation (LAD) coefficient of determination. The procedure is presented, illustrated and validated by a numerical example based on a publicly available dataset on fuel consumption in miles per gallon in highway driving.


Irriga ◽  
2018 ◽  
Vol 23 (1) ◽  
pp. 154-167
Author(s):  
Ramon Amaro de Sales ◽  
Evandro Chaves de Oliveira ◽  
Marcus José Alves Lima ◽  
Eduardo Monteiro Gelcer ◽  
Robson Argolo dos Santos ◽  
...  

Adjustment of coefficients of reference evapotranspiration estimation equations for São Mateus, ES. Ramon Amaro de Sales (1), Evandro Chaves de Oliveira (2), Marcus José Alves Lima (3), Eduardo Monteiro Gelcer (4), Robson Argolo dos Santos (5) and Cássio Furtado Lima (2). (1) Programa de Pós-graduação em Produção Vegetal, Universidade Federal do Espírito Santo, CEP 29500-000, Alegre-ES; (2) Instituto Federal de Educação, Ciência e Tecnologia do Espírito Santo, Campus Itapina, CEP 29709-910, Colatina-ES; (3) Universidade Federal Rural da Amazônia, Campus Capitão Poço, CEP 66077-530, Belém-PA; (4) University of Florida, Department of Agricultural and Biological Engineering, Gainesville, Florida, 32608, United States; (5) Programa de Pós-graduação em Engenharia Agrícola, Universidade Federal de Viçosa, CEP 36570-900, Viçosa-MG.
Abstract The objective of this work was to adjust empirical methods for estimating reference evapotranspiration (ETo) at daily scale for the São Mateus, ES region, using data from the weather station of the National Institute of Meteorology (Instituto Nacional de Meteorologia). A 15-year historical series (2000–2015) was used: the first 14 years were used to adjust the parameters, and the year 2015 provided independent data to validate the adjustments. The FAO-56 PM method was the reference for evaluating the other methods: Priestley-Taylor, Tanner-Pelton, Turc, Jensen-Haise, Makkink, Camargo, Hamon, Hargreaves and Samani, Linacre, and Benevides Lopes. The performance of the methods was analyzed using the coefficient of determination (R2), Willmott's index of agreement (d), the normalized root mean squared error (RMSEn) and the sum of absolute error (SEA). The results showed that the methods using solar radiation as a predictor variable were more accurate than those using only temperature and/or relative air humidity.
The best-performing methods were Turc, Jensen-Haise, Priestley-Taylor, Tanner-Pelton, and Makkink, which presented RMSEn values between 3 and 5% and d equal to 0.99, while the others presented RMSEn of 21–29% and d lower than 0.81, even after adjustment. Keywords: Penman-Monteith-FAO, irrigation, water need, agricultural meteorology
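Willmott's index of agreement (d), used above to rank the methods, can be computed as in the following minimal sketch. It assumes the original (1981) form of the index, d = 1 − Σ(P − O)² / Σ(|P − Ō| + |O − Ō|)², which the abstract does not state explicitly. The function name and the sample values are hypothetical.

```python
def willmott_d(observed, predicted):
    """Willmott's index of agreement: 1.0 is perfect agreement, 0.0 is none."""
    mean_obs = sum(observed) / len(observed)
    # Numerator: sum of squared differences between prediction and observation.
    squared_error = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    # Denominator: "potential error" measured around the observed mean.
    potential_error = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2
        for o, p in zip(observed, predicted)
    )
    return 1.0 - squared_error / potential_error


# Hypothetical daily ETo values (mm/day): reference method vs. an empirical one.
reference = [3.0, 4.0, 5.0, 6.0]
empirical = [3.2, 3.9, 5.3, 5.8]
d_value = willmott_d(reference, empirical)
```

The function is undefined when every observed and predicted value equals the observed mean, because the denominator is then zero; a production version would need to handle that case.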


Author(s):  
Vasileios Ntouros ◽  
Nikolaos Kampelis ◽  
Martina Senzacqua ◽  
Theoni Karlessi ◽  
Margarita-Niki Assimakopoulos ◽  
...  

Abstract Smart meters, one of the crucial enablers of the smart-grid concept and cornerstones in smart planning for cities, offer the opportunity for consumers to address their energy consumption effectively through timely and accurate data on their energy usage. However, previous studies have shown that smart meters may not lead to the desired energy savings unless actively used by households. To this end, the research presented in this paper investigates the penetration of smart meters at community level and explores how such a metering system can help people to understand and manage their energy use better. It examines the awareness about smart meters, looks into their presence in current accommodation and focuses on the views people have about smart meters. For this purpose, a questionnaire was prepared and distributed to a group of individuals residing in the wide area of Ancona province in Italy. Although the deployment of modern second-generation smart meters started in 2017 replacing the outdated smart meters massively installed in the 2000s, the results show low-to-moderate levels of awareness of modern smart meters among the respondents and a low presence of second-generation metering devices in their current accommodation. However, the general view expressed by the participants about smart meters is positive. The findings demonstrate that respondents are in need not only of a gauge that measures energy consumption but also of a tool that assists them to manage effectively their energy use.


2021 ◽  
pp. 1-15
Author(s):  
O. Basturk ◽  
C. Cetek

ABSTRACT In this study, prediction of aircraft Estimated Time of Arrival (ETA) is proposed using machine learning algorithms. Accurate prediction of ETA is important for management of delay and air traffic flow, runway assignment, gate assignment, collaborative decision making (CDM), coordination of ground personnel and equipment, and optimisation of arrival sequence. Machine learning is able to learn from experience and make predictions with weak assumptions or no assumptions at all. In the proposed approach, general flight information, trajectory data and weather data were obtained from different sources in various formats. Raw data were converted to tidy data and inserted into a relational database. To obtain the features for training the machine learning models, the data were explored, cleaned and transformed into convenient features. New features were also derived from the available data. Random forests and deep neural networks were used to train the machine learning models. Both models can predict the ETA with a mean absolute error (MAE) of less than 6 min after departure, and less than 3 min after terminal manoeuvring area (TMA) entrance. Additionally, a web application was developed to dynamically predict the ETA using the proposed models.


Author(s):  
Zhai Mingyu ◽  
Wang Sutong ◽  
Wang Yanzhang ◽  
Wang Dujuan

Abstract Data-driven techniques improve the quality of talent training comprehensively for university by discovering potential academic problems and proposing solutions. We propose an interpretable prediction method for university student academic crisis warning, which consists of K-prototype-based student portrait construction and Catboost–SHAP-based academic achievement prediction. The academic crisis warning experiment is carried out on desensitized multi-source student data of a university. The experimental results show that the proposed method has significant advantages over common machine learning algorithms. In terms of achievement prediction, mean square error (MSE) reaches 24.976, mean absolute error (MAE) reaches 3.551, and the coefficient of determination (R2) reaches 80.3%. The student portrait and Catboost–SHAP method are used for visual analysis of the academic achievement factors, which provide intuitive decision support and guidance assistance for education administrators.


Agronomy ◽  
2021 ◽  
Vol 11 (6) ◽  
pp. 1207
Author(s):  
Gonçalo C. Rodrigues ◽  
Ricardo P. Braga

This study aims to evaluate NASA POWER reanalysis products for daily surface maximum (Tmax) and minimum (Tmin) temperatures, solar radiation (Rs), relative humidity (RH) and wind speed (Ws) when compared with observed data from 14 weather stations distributed across the Alentejo Region, Southern Portugal, which has a hot summer Mediterranean climate. Results showed that there is good agreement between NASA POWER reanalysis and observed data for all parameters except wind speed, with a coefficient of determination (R2) higher than 0.82, a normalized root mean square error (NRMSE) varying from 8 to 20%, and a normalized mean bias error (NMBE) ranging from –9 to 26% for those variables. Based on these results, and in order to improve the accuracy of the NASA POWER dataset, two bias corrections were applied to all weather variables: one for the Alentejo Region as a whole; another for each location individually. Results improved significantly, especially when a local bias correction is performed, with Tmax and Tmin presenting an improvement of the mean NRMSE of 6.6 °C (from 8.0 °C) and 16.1 °C (from 20.5 °C), respectively, while the mean NMBE decreased from 10.65 to 0.2%. Rs results also show a very high goodness of fit, with a mean NRMSE of 11.2% and a mean NMBE equal to 0.1%. Additionally, bias corrected RH data performed acceptably, with an NRMSE lower than 12.1% and an NMBE below 2.1%. However, even when a bias correction is performed, Ws lacks the performance shown by the remaining weather variables, with an NRMSE never lower than 19.6%. Results show that NASA POWER can be useful for the generation of weather data sets where ground weather station data are missing or unavailable.
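The effect of a bias correction on NRMSE and NMBE can be sketched as follows. The abstract does not specify the correction method or the normalization, so this example makes two labeled assumptions: the correction is a simple additive shift by the mean bias, and both metrics are normalized by the mean of the observations. All function names and values are hypothetical.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def nrmse(observed, estimated):
    """RMSE divided by the observed mean (assumed normalization)."""
    rmse = sqrt(mean([(e - o) ** 2 for o, e in zip(observed, estimated)]))
    return rmse / mean(observed)


def nmbe(observed, estimated):
    """Mean bias divided by the observed mean (assumed normalization)."""
    return mean([e - o for o, e in zip(observed, estimated)]) / mean(observed)


def additive_bias_correction(observed, estimated):
    """Shift the estimates by their mean bias (assumed correction method)."""
    bias = mean([e - o for o, e in zip(observed, estimated)])
    return [e - bias for e in estimated]


# Hypothetical station observations and reanalysis values for one variable.
station_obs = [10.0, 20.0, 30.0]
reanalysis = [12.0, 22.0, 32.0]

corrected = additive_bias_correction(station_obs, reanalysis)
nmbe_before = nmbe(station_obs, reanalysis)
nmbe_after = nmbe(station_obs, corrected)
```

In this toy case the reanalysis is offset by a constant, so the additive shift removes the error entirely. With real data the bias would normally be estimated on a calibration period and applied to a separate period, and a residual NRMSE would remain.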

