Performance of Repeated Cross Validation for Machine Learning Models in Building Energy Analysis

Author(s):  
Xiangfei Li ◽  
Baoquan Yin ◽  
Wei Tian ◽  
Yu Sun
2020 ◽  
Vol 15 (3) ◽  
pp. 83-93
Author(s):  
Wei Tian ◽  
Chuanqi Zhu ◽  
Yunliang Liu ◽  
Baoquan Yin ◽  
Jiaxin Shi

ABSTRACT Urban building energy analysis has attracted more attention as the population living in cities increases as does the associated energy consumption in urban environments. This paper proposes a systematic bottom-up method to conduct energy analysis and assess energy saving potentials by combining dynamic engineering-based energy models, machine learning models, and global sensitivity analysis within the GIS (Geographic Information System) environment for large-scale urban buildings. This method includes five steps: database construction of building parameters, automation of creating building models at the GIS environment, construction of machine learning models for building energy assessment, sensitivity analysis for choosing energy saving measures, and GIS visual evaluation of energy saving schemes. Campus buildings in Tianjin (China) are used as a case study to demonstrate the application of the method proposed in this research. The results indicate that the method proposed here can provide reliable and fast analysis to evaluate the energy performance of urban buildings and determine effective energy saving measures to reduce energy consumption of urban buildings. Moreover, the GIS-based analysis is very useful to both create energy models of buildings and display energy analysis results for urban buildings.


2021 ◽  
Vol 13 (3) ◽  
pp. 408
Author(s):  
Charles Nickmilder ◽  
Anthony Tedde ◽  
Isabelle Dufrasne ◽  
Françoise Lessire ◽  
Bernard Tychon ◽  
...  

Accurate information about the available standing biomass on pastures is critical for the adequate management of grazing and its promotion to farmers. In this paper, machine learning models are developed to predict available biomass expressed as compressed sward height (CSH) from readily accessible meteorological, optical (Sentinel-2) and radar satellite data (Sentinel-1). This study assumed that combining heterogeneous data sources, data transformations and machine learning methods would improve the robustness and the accuracy of the developed models. A total of 72,795 records of CSH with a spatial positioning, collected in 2018 and 2019, were used and aggregated according to a pixel-like pattern. The resulting dataset was split into a training one with 11,625 pixellated records and an independent validation one with 4952 pixellated records. The models were trained with a 19-fold cross-validation. A wide range of performances was observed (with mean root mean square error (RMSE) of cross-validation ranging from 22.84 mm of CSH to infinite-like values), and the four best-performing models were a cubist, a glmnet, a neural network and a random forest. These models had an RMSE of independent validation lower than 20 mm of CSH at the pixel-level. To simulate the behavior of the model in a decision support system, performances at the paddock level were also studied. These were computed according to two scenarios: either the predictions were made at a sub-parcel level and then aggregated, or the data were aggregated at the parcel level and the predictions were made for these aggregated data. The results obtained in this study were more accurate than those found in the literature concerning pasture budgeting and grassland biomass evaluation. The training of the 124 models resulting from the described framework was part of the realization of a decision support system to help farmers in their daily decision making.


2018 ◽  
Vol 124 (5) ◽  
pp. 1284-1293 ◽  
Author(s):  
Alexander H. K. Montoye ◽  
Bradford S. Westgate ◽  
Morgan R. Fonley ◽  
Karin A. Pfeiffer

Wrist-worn accelerometers are gaining popularity for measurement of physical activity. However, few methods for predicting physical activity intensity from wrist-worn accelerometer data have been tested on data not used to create the methods (out-of-sample data). This study utilized two previously collected data sets [Ball State University (BSU) and Michigan State University (MSU)] in which participants wore a GENEActiv accelerometer on the left wrist while performing sedentary, lifestyle, ambulatory, and exercise activities in simulated free-living settings. Activity intensity was determined via direct observation. Four machine learning models (plus 2 combination methods) and six feature sets were used to predict activity intensity (30-s intervals) with the accelerometer data. Leave-one-out cross-validation and out-of-sample testing were performed to evaluate accuracy in activity intensity prediction, and classification accuracies were used to determine differences among feature sets and machine learning models. In out-of-sample testing, the random forest model (77.3–78.5%) had higher accuracy than other machine learning models (70.9–76.4%) and accuracy similar to combination methods (77.0–77.9%). Feature sets utilizing frequency-domain features had improved accuracy over other feature sets in leave-one-out cross-validation (92.6–92.8% vs. 87.8–91.9% in MSU data set; 79.3–80.2% vs. 76.7–78.4% in BSU data set) but similar or worse accuracy in out-of-sample testing (74.0–77.4% vs. 74.1–79.1% in MSU data set; 76.1–77.0% vs. 75.5–77.3% in BSU data set). All machine learning models outperformed the euclidean norm minus one/GGIR method in out-of-sample testing (69.5–78.5% vs. 53.6–70.6%). From these results, we recommend out-of-sample testing to confirm generalizability of machine learning models. Additionally, random forest models and feature sets with only time-domain features provided the best accuracy for activity intensity prediction from a wrist-worn accelerometer. NEW & NOTEWORTHY This study includes in-sample and out-of-sample cross-validation of an alternate method for deriving meaningful physical activity outcomes from accelerometer data collected with a wrist-worn accelerometer. This method uses machine learning to directly predict activity intensity. By so doing, this study provides a classification model that may avoid high errors present with energy expenditure prediction while still allowing researchers to assess adherence to physical activity guidelines.


2020 ◽  
pp. 98-105
Author(s):  
Darshan Jagannath Pangarkar ◽  
Rajesh Sharma ◽  
Amita Sharma ◽  
Madhu Sharma

Prediction of crop yield can help traders, agri-business and government agencies to plan their activities accordingly. It can help government agencies to manage situations like over or under production. Traditionally statistical and crop simulation methods are used for this purpose. Machine learning models can be great deal of help. Aim of present study is to assess the predictive ability of various machine learning models for Cluster bean (Cyamopsis tetragonoloba L. Taub.) yield prediction. Various machine learning models were applied and tested on panel data of 19 years i.e. from 1999-2000 to 2017-18 for the Bikaner district of Rajasthan. Various data mining steps were performed before building a model. K- Nearest Nighbors (K-NN), Support Vector Regression (SVR) with various kernels, and Random forest regression were applied. Cross validation was also performed to know extra sampler validity. The best fitted model was chosen based cross validation scores and R2 values. Besides the coefficient of determination (R2), root mean squared error (RMSE), mean absolute error (MAE), and root relative squared error (RRSE) were calculated for the testing set. Support vector regression with linear kernel has the lowest RMSE (23.19), RRSE (0.14), MAE (19.27) values followed by random forest regression and second-degree polynomial support vector regression with the value of gamma = auto. Instead there was a little difference with R2, placing support vector regression first (98.31%), followed by second-degree polynomial support vector regression with value of gamma = auto (89.83%) and second-degree polynomial support vector regression with value of gamma = scale (88.83%). On two-fold cross validation, support vector regression with a linear kernel had the highest cross validation score explaining 71% (+/-0.03) followed by second-degree polynomial support vector regression with a value of gamma = auto and random forest regression. KNN and support vector regression with radial basis function as a kernel function had negative cross validation scores. Support vector regression with linear kernel was found to be the best-fitted model for predicting the yield as it had higher sample validity (98.31%) and global validity (71%).


2019 ◽  
Vol 47 ◽  
pp. 101484 ◽  
Author(s):  
Saleh Seyedzadeh ◽  
Farzad Pour Rahimian ◽  
Parag Rastogi ◽  
Ivan Glesk

Sign in / Sign up

Export Citation Format

Share Document