Systematic Feature Selection Process Applied in Short-Term Data-Driven Building Energy Forecasting Models: A Case Study of a Campus Building

Author(s):  
Liang Zhang ◽  
Jin Wen ◽  
Yimin Chen

An accurate building energy forecasting model is a key component for real-time and advanced control of building energy systems and for building-to-grid integration. With the fast deployment and advancement of building automation systems, data are collected by hundreds and sometimes thousands of sensors every few minutes in buildings, which provides great potential for data-driven building energy forecasting. To develop building energy forecasting models from a large number of potential inputs, feature selection is a critical procedure to ensure model accuracy and computational efficiency. Though the theory of feature selection is well developed in the statistics and machine learning fields, it is not well studied in the application of building energy modeling. In this paper, a feature selection framework proposed in an earlier study is examined using a real campus building in Philadelphia. This feature selection framework combines domain knowledge and statistical methods and is developed for short-term data-driven building energy forecasting. In this case study, the feasibility of using this feature selection framework to develop a whole-building energy forecasting model and a chiller energy forecasting model is studied. Results show that, for both whole-building and chiller energy forecasting applications, the model with the systematic feature selection process performs better (in terms of cross-validation error of the forecasted output) than the other models, including one with conventional inputs and one that uses only a single feature selection technique.
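
The abstract above does not spell out the framework's steps, so the following is only a minimal sketch of the general idea of combining domain knowledge with a statistical filter. It assumes a hypothetical domain shortlist of sensor names and uses the absolute Pearson correlation with the target as the statistical criterion; the function names, the sensor names, and the ranking rule are all illustrative assumptions, not the authors' method.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        return 0.0
    return cov / (dev_x * dev_y)


def select_features(data, target, domain_shortlist, top_k):
    """Two-stage selection (hypothetical): domain pre-filter, then a statistical ranking.

    data: dict mapping feature name -> list of readings
    target: list of energy readings aligned with the feature lists
    domain_shortlist: names an engineer considers physically relevant (assumption)
    """
    # Stage 1: keep only the candidates that domain knowledge suggests.
    candidates = [name for name in domain_shortlist if name in data]
    # Stage 2: rank the survivors by |correlation| with the target and keep the top_k.
    ranked = sorted(candidates, key=lambda name: abs(pearson(data[name], target)), reverse=True)
    return ranked[:top_k]


# Illustrative data only; these sensor names and values are made up.
example_data = {
    "outdoor_temp": [1, 2, 3, 4, 5],
    "occupancy": [2, 1, 4, 3, 5],
    "door_sensor": [5, 5, 5, 5, 5],
    "noise": [3, 1, 2, 5, 4],
}
example_target = [10, 20, 30, 40, 50]
example_selected = select_features(
    example_data, example_target, ["outdoor_temp", "occupancy", "door_sensor"], top_k=2
)
```

In practice the selected subset would then be judged by the cross-validation error of the forecasting model trained on it, which is the comparison criterion the abstract names.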

Author(s):  
G. T. Alckmin ◽  
L. Kooistra ◽  
A. Lucieer ◽  
R. Rawnsley

Vegetation indices (VIs) have been extensively employed as a feature for dry matter (DM) estimation. During the past five decades more than a hundred vegetation indices have been proposed. Inevitably, the selection of the optimal index or subset of indices is neither trivial nor obvious. This study, performed on a year-round observation of perennial ryegrass (n = 900), indicates that for this response variable (i.e. kg.DM.ha⁻¹), more than 80% of indices present a high degree of collinearity (|correlation| > 0.8). Additionally, the absence of an established workflow for feature selection and modelling is a handicap when trying to establish meaningful relations between spectral data and biophysical/biochemical features. Within this case study, an unsupervised and supervised filtering process is applied to an initial dataset of 97 VIs. This research analyses the effects of the proposed filtering and feature selection process on the overall stability of the final models. Consequently, this analysis provides a straightforward framework to filter and select VIs. This approach was able to provide a reduced feature set for a robust model and to quantify trade-offs between optimal models (i.e. lowest root mean square error, RMSE = 412.27 kg.DM.ha⁻¹) and tolerable models (with a smaller number of features: 4 VIs and within 10% of the lowest RMSE).
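
Two ideas in this abstract lend themselves to a short sketch: dropping indices whose absolute pairwise correlation exceeds 0.8, and picking the smallest model whose RMSE is within 10% of the lowest. The abstract does not say how either step was implemented, so the greedy keep-first rule and the function names below are assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        return 0.0
    return cov / (dev_x * dev_y)


def drop_collinear(features, threshold=0.8):
    """Unsupervised filter (assumed greedy rule): keep a feature only if its
    |correlation| with every already-kept feature is at most the threshold."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[other])) <= threshold for other in kept):
            kept.append(name)
    return kept


def tolerable_model(candidates, tolerance=0.10):
    """Among (n_features, rmse) pairs, return the one with the fewest features
    whose RMSE is within `tolerance` of the lowest RMSE."""
    best_rmse = min(rmse for _, rmse in candidates)
    within = [(n, rmse) for n, rmse in candidates if rmse <= best_rmse * (1 + tolerance)]
    return min(within)  # tuples compare by feature count first, then RMSE
```

For example, calling `tolerable_model` on a list of `(n_features, rmse)` pairs returns the most compact model that still meets the 10% criterion, which mirrors the trade-off between optimal and tolerable models that the abstract describes.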


Energies ◽  
2020 ◽  
Vol 13 (4) ◽  
pp. 780 ◽  
Author(s):  
Zihao Li ◽  
Daniel Friedrich ◽  
Gareth P. Harrison

There is great interest in data-driven forecasting of building energy consumption using machine learning (ML) models. However, little research considers classification-based ML models. This paper compares regression and classification ML models for daily electricity and thermal load modelling in a large, mixed-use university building. The independent feature variables of the model include outdoor temperature, historical energy consumption data sets, and several types of ‘agent schedules’ that provide proxy information based on broad classes of activity undertaken by the building’s inhabitants. The case study compares four different ML models tested on three different feature sets, with a genetic algorithm (GA) used to optimize the feature sets for those ML models that lack an embedded feature selection process. The results show that the regression models perform significantly better than the classification models for the prediction of electricity demand and slightly better for the prediction of heat demand. The GA feature selection improves the performance of all models and demonstrates that historical heat demand, temperature, and the ‘agent schedules’, which derive from large occupancy fluctuations in the building, are the main factors influencing the heat demand prediction. For electricity demand prediction, feature selection picks almost all of the available ‘agent schedule’ features and the historical electricity demand. Historical heat demand is not picked as a feature for electricity demand prediction by the GA feature selection and vice versa. However, the exclusion of historical heat/electricity demand from the selected features significantly reduces the performance of the demand prediction.
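
As a rough illustration of GA-based feature selection, the sketch below evolves binary masks over the feature set, where a 1 means the feature is included. The abstract gives no details of the GA configuration, so the population size, crossover and mutation scheme, elitism, and the `fitness` callback are all assumptions; in a real study `fitness` would be a validation score of the ML model trained on the masked features.

```python
import random


def ga_select(n_features, fitness, pop_size=20, generations=30, mutation_rate=0.1, seed=0):
    """Evolve binary feature masks; returns (best_mask, best_fitness_per_generation).

    fitness: callable taking a mask (list of 0/1) and returning a score to maximize.
    Requires n_features >= 2 and pop_size >= 4.
    """
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        next_gen = [population[0][:]]           # elitism: carry the best mask over unchanged
        parents = population[: pop_size // 2]   # truncation selection: top half breeds
        while len(next_gen) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randint(1, n_features - 1)  # single-point crossover
            child = mother[:cut] + father[cut:]
            # Flip each bit independently with probability mutation_rate.
            child = [1 - gene if rng.random() < mutation_rate else gene for gene in child]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history
```

Because the best mask is always copied into the next generation, the recorded best fitness never decreases, which makes the search easy to monitor.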


2022 ◽  
Vol 13 (2) ◽  
pp. 1-20
Author(s):  
Byron Marshall ◽  
Michael Curry ◽  
Robert E. Crossler ◽  
John Correia

Survey items developed in behavioral Information Security (InfoSec) research should be practically useful in identifying individuals who are likely to create risk by failing to comply with InfoSec guidance. The literature shows that attitudes, beliefs, and perceptions drive compliance behavior and has influenced the creation of a multitude of training programs focused on improving one’s InfoSec behaviors. While automated controls and directly observable technical indicators are generally preferred by InfoSec practitioners, difficult-to-monitor user actions can still compromise the effectiveness of automatic controls. For example, despite prohibition, doubtful or skeptical employees often increase organizational risk by using the same password to authenticate to corporate and external services. Analysis of network traffic or device configurations is unlikely to provide evidence of these vulnerabilities, but responses to well-designed surveys might. Guided by the relatively new IPAM model, this study administered 96 survey items from the behavioral InfoSec literature, across three separate points in time, to 217 respondents. Using systematic feature selection techniques, manageable subsets of 29, 20, and 15 items were identified and tested as predictors of non-compliance with security policy. The feature selection process validates IPAM's innovation in using nuanced self-efficacy and planning items across multiple time frames. Prediction models were trained using several ML algorithms. Practically useful levels of prediction accuracy were achieved with, for example, ensemble tree models identifying 69% of the riskiest individuals within the top 25% of the sample. The findings indicate the usefulness of psychometric items from the behavioral InfoSec literature in guiding training programs and other cybersecurity control activities and demonstrate that they are promising as additional inputs to AI models that monitor networks for security events.
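
The "69% of the riskiest individuals within the top 25% of the sample" figure reads like a capture rate: rank respondents by predicted risk, take the top quarter, and count what share of the truly risky respondents falls inside it. The abstract does not define the metric precisely, so the sketch below is one plausible interpretation with hypothetical names.

```python
def capture_rate(scores, is_risky, top_fraction=0.25):
    """Share of truly risky individuals found among the top-scored fraction.

    scores: predicted risk per respondent (higher means riskier)
    is_risky: 1 if the respondent is actually in the risky group, else 0
    """
    n_top = max(1, int(len(scores) * top_fraction))
    # Rank respondents from highest to lowest predicted risk.
    ranked = sorted(zip(scores, is_risky), key=lambda pair: pair[0], reverse=True)
    captured = sum(label for _, label in ranked[:n_top])
    total_risky = sum(is_risky)
    return captured / total_risky if total_risky else 0.0
```

A model that ranks at random would be expected to capture about 25% of the risky group in the top 25%, so a capture rate well above that indicates the scores carry useful signal.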


Author(s):  
Shuojiang Xu ◽  
Kim Hua Tan

Since the start of the 21st century, enterprises have combined supply chain management with big data to improve their products and service levels. In China's healthcare industry, supply chain decisions are made based on experience because of environmental complexities such as changing policies and license delays. A flexible and dynamic big-data-driven analysis approach for supply chain decisions is urgently required. This report presents a case study of a CRT forecasting model built on inventory data to predict market demand from previous transaction data. First, a basic statistical approach is applied to reveal surface-level patterns and suggest some decisions. Next, a CRT model is built on several independent variables. The CRT model is then compared with a CHAID model to choose the better one as the basis for an improved model. Finally, some limitations and future work are discussed.

