Systematic Feature Selection Process Applied in Short-Term Data-Driven Building Energy Forecasting Models: A Case Study of a Campus Building

Volume 3: Vibration in Mechanical Systems; Modeling and Validation; Dynamic Systems and Control Education; Vibrations and Control of Systems; Modeling and Estimation for Vehicle Safety and Integrity; Modeling and Control of IC Engines and Aftertreatment Systems; Unmanned Aerial Vehicles (UAVs) and Their Applications; Dynamics and Control of Renewable Energy Systems; Energy Harvesting; Control of Smart Buildings and Microgrids; Energy Systems ◽

10.1115/dscc2017-5073 ◽

2017 ◽

Author(s):

Liang Zhang ◽

Jin Wen ◽

Yimin Chen

Keyword(s):

Feature Selection ◽

Selection Process ◽

Building Energy ◽

Data Driven ◽

Forecasting Model ◽

Energy Forecasting ◽

Systematic Feature ◽

Selection Framework ◽

Term Data

An accurate building energy forecasting model is a key component for real-time and advanced control of building energy systems and for building-to-grid integration. With the fast deployment and advancement of building automation systems, data are collected by hundreds and sometimes thousands of sensors every few minutes in buildings, which provides great potential for data-driven building energy forecasting. To develop building energy forecasting models from a large number of potential inputs, feature selection is a critical procedure for ensuring model accuracy and computational efficiency. Though the theory of feature selection is well developed in the statistics and machine learning fields, it is not well studied in the application of building energy modeling. In this paper, a feature selection framework proposed in an earlier study is examined using a real campus building in Philadelphia. This framework combines domain knowledge and statistical methods and is developed for short-term data-driven building energy forecasting. The case study examines the feasibility of using the framework to develop a whole-building energy forecasting model and a chiller energy forecasting model. Results show that, for both applications, the model built with the systematic feature selection process performs better (in terms of cross-validation error of the forecasted output) than the other models, including one with conventional inputs and one that uses only a single feature selection technique.
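The abstract compares candidate input sets by cross-validation error. A minimal sketch of that kind of comparison follows; it is not the authors' framework. The 1-nearest-neighbour regressor, the interleaved folds, the function name `kfold_cv_rmse`, and the toy data are all assumptions made for illustration.

```python
from math import sqrt

def kfold_cv_rmse(rows, targets, columns, k=5):
    """k-fold cross-validation RMSE of a 1-nearest-neighbour regressor
    that only looks at the feature indices listed in `columns`.
    The 1-NN model is a stand-in, not the model used in the paper."""
    n = len(rows)
    squared_error = 0.0
    for fold in range(k):
        test_idx = list(range(fold, n, k))      # interleaved folds
        held_out = set(test_idx)
        train_idx = [i for i in range(n) if i not in held_out]
        for i in test_idx:
            # Nearest training sample, measured on the chosen columns only.
            nearest = min(
                train_idx,
                key=lambda j: sum((rows[i][c] - rows[j][c]) ** 2 for c in columns),
            )
            squared_error += (targets[i] - targets[nearest]) ** 2
    return sqrt(squared_error / n)

# Toy data (hypothetical): column 0 drives the target, column 1 is unrelated.
rows = [(i, (7 * i) % 20) for i in range(20)]
targets = [2 * i for i in range(20)]

rmse_informative = kfold_cv_rmse(rows, targets, columns=[0])  # 2.0 on this toy data
rmse_noise = kfold_cv_rmse(rows, targets, columns=[1])        # larger than 2.0
print(rmse_informative, rmse_noise)
```

For real forecasting data, contiguous (blocked) folds may be a better fit than the interleaved folds used here, since neighbouring time steps are strongly correlated.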

A systematic feature selection procedure for short-term data-driven building energy forecasting model development

Energy and Buildings ◽

10.1016/j.enbuild.2018.11.010 ◽

2019 ◽

Vol 183 ◽

pp. 428-442 ◽

Cited By ~ 19

Author(s):

Liang Zhang ◽

Jin Wen

Keyword(s):

Feature Selection ◽

Model Development ◽

Selection Procedure ◽

Building Energy ◽

Data Driven ◽

Forecasting Model ◽

Short Term ◽

Energy Forecasting ◽

Systematic Feature ◽

Term Data

Active Learning Strategy for High Fidelity Short-Term Data-Driven Building Energy Forecasting

Energy and Buildings ◽

10.1016/j.enbuild.2021.111026 ◽

2021 ◽

pp. 111026

Author(s):

Liang Zhang ◽

Jin Wen

Keyword(s):

Active Learning ◽

Learning Strategy ◽

Building Energy ◽

Data Driven ◽

High Fidelity ◽

Short Term ◽

Energy Forecasting ◽

Active Learning Strategy ◽

Term Data

Data-driven Whole Building Energy Forecasting Model for Data Predictive Control

10.17918/3k51-2078 ◽

2021 ◽

Author(s):

Liang Zhang

Keyword(s):

Predictive Control ◽

Building Energy ◽

Data Driven ◽

Forecasting Model ◽

Energy Forecasting

Data-Driven Modeling and Optimization of Building Energy Consumption: a Case Study

2020 IEEE Power & Energy Society General Meeting (PESGM) ◽

10.1109/pesgm41954.2020.9281663 ◽

2020 ◽

Author(s):

Divas Grover ◽

Yaser P. Fallah ◽

Qun Zhou ◽

P.E. Ian LaHiff

Keyword(s):

Energy Consumption ◽

Building Energy ◽

Data Driven ◽

Building Energy Consumption ◽

Modeling And Optimization ◽

Data Driven Modeling

FEATURE FILTERING AND SELECTION FOR DRY MATTER ESTIMATION ON PERENNIAL RYEGRASS: A CASE STUDY OF VEGETATION INDICES

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xlii-2-w13-1827-2019 ◽

2019 ◽

Vol XLII-2/W13 ◽

pp. 1827-1831 ◽

Cited By ~ 3

Author(s):

G. T. Alckmin ◽

L. Kooistra ◽

A. Lucieer ◽

R. Rawnsley

Keyword(s):

Feature Selection ◽

Perennial Ryegrass ◽

Dry Matter ◽

Selection Process ◽

Vegetation Indices ◽

Robust Model ◽

Trade Offs ◽

Initial Dataset ◽

High Degree

Abstract. Vegetation indices (VIs) have been extensively employed as a feature for dry matter (DM) estimation. During the past five decades, more than a hundred vegetation indices have been proposed. Inevitably, the selection of the optimal index or subset of indices is neither trivial nor obvious. This study, performed on a year-round observation of perennial ryegrass (n = 900), indicates that for this response variable (i.e. kg.DM.ha⁻¹), more than 80% of indices present a high degree of collinearity (|correlation| > 0.8). Additionally, the absence of an established workflow for feature selection and modelling is a handicap when trying to establish meaningful relations between spectral data and biophysical/biochemical features. Within this case study, an unsupervised and supervised filtering process is proposed for an initial dataset of 97 VIs. This research analyses the effects of the proposed filtering and feature selection process on the overall stability of the final models. Consequently, this analysis provides a straightforward framework to filter and select VIs. This approach was able to provide a reduced feature set for a robust model and to quantify trade-offs between optimal models (i.e. lowest root mean square error, RMSE = 412.27 kg.DM.ha⁻¹) and tolerable models (with a smaller number of features: 4 VIs, and within 10% of the lowest RMSE).
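Two ideas in this abstract can be sketched in a few lines: dropping features whose absolute pairwise correlation exceeds 0.8, and preferring the smallest model whose RMSE is within 10% of the best one. The sketch below is only one plausible reading, not the paper's workflow. The greedy keep-first rule, the function names, the feature names, and all numbers are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes non-zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def drop_collinear(features, threshold=0.8):
    """Greedy filter: keep a feature only if its |r| with every
    already-kept feature is at or below the threshold."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

def tolerable_model(candidates, tolerance=0.10):
    """candidates: (n_features, rmse) pairs. Return the pair with the fewest
    features whose RMSE is within `tolerance` of the lowest RMSE."""
    best = min(rmse for _, rmse in candidates)
    acceptable = [c for c in candidates if c[1] <= best * (1 + tolerance)]
    return min(acceptable, key=lambda c: (c[0], c[1]))

# Hypothetical index values, not data from the study.
features = {
    "vi_a": [1, 2, 3, 4, 5],
    "vi_b": [2, 4, 6, 8, 10.5],   # nearly a rescaled copy of vi_a
    "vi_c": [5, 1, 4, 2, 3],
}
kept = drop_collinear(features)
choice = tolerable_model([(12, 100.0), (4, 108.0), (2, 125.0)])
print(kept, choice)
```

Because the filter is greedy, the result depends on the order of the features; ranking them first (for example by correlation with the response) would be a natural refinement.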

A systematic feature selection process for a Sinhala character recognition system

2016 IEEE International Conference on Information and Automation for Sustainability (ICIAfS) ◽

10.1109/iciafs.2016.7946523 ◽

2016 ◽

Cited By ~ 3

Author(s):

Titus Nanda Kumara ◽

Roshan Ragel

Keyword(s):

Feature Selection ◽

Character Recognition ◽

Selection Process ◽

Recognition System ◽

Systematic Feature

Demand Forecasting for a Mixed-Use Building Using Agent-Schedule Information with a Data-Driven Model

Energies ◽

10.3390/en13040780 ◽

2020 ◽

Vol 13 (4) ◽

pp. 780 ◽

Cited By ~ 4

Author(s):

Zihao Li ◽

Daniel Friedrich ◽

Gareth P. Harrison

Keyword(s):

Feature Selection ◽

Energy Consumption ◽

Selection Process ◽

Demand Forecasting ◽

Electricity Demand ◽

Data Driven ◽

Feature Sets ◽

Demand Prediction ◽

Heat Demand ◽

Mixed Use

There is great interest in data-driven modelling for the forecasting of building energy consumption while using machine learning (ML) modelling. However, little research considers classification-based ML models. This paper compares the regression and classification ML models for daily electricity and thermal load modelling in a large, mixed-use, university building. The independent feature variables of the model include outdoor temperature, historical energy consumption data sets, and several types of ‘agent schedules’ that provide proxy information that is based on broad classes of activity undertaken by the building’s inhabitants. The case study compares four different ML models testing three different feature sets with a genetic algorithm (GA) used to optimize the feature sets for those ML models without an embedded feature selection process. The results show that the regression models perform significantly better than classification models for the prediction of electricity demand and slightly better for the prediction of heat demand. The GA feature selection improves the performance of all models and demonstrates that historical heat demand, temperature, and the ‘agent schedules’, which derive from large occupancy fluctuations in the building, are the main factors influencing the heat demand prediction. For electricity demand prediction, feature selection picks almost all ‘agent schedule’ features that are available and the historical electricity demand. Historical heat demand is not picked as a feature for electricity demand prediction by the GA feature selection and vice versa. However, the exclusion of historical heat/electricity demand from the selected features significantly reduces the performance of the demand prediction.
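The abstract does not give the details of its genetic algorithm, so the following is only a generic sketch of GA-based feature selection over binary masks, with tournament selection, one-point crossover, bit-flip mutation, and elitism. The feature names, the toy fitness function, and every parameter value are assumptions; in practice the fitness would be a validation score of the ML model trained on the selected features.

```python
import random

def ga_select(n_features, fitness, pop_size=20, generations=30,
              mutation_rate=0.1, seed=0):
    """Search for a 0/1 feature mask that maximises `fitness`.
    Requires n_features >= 2 (needed for one-point crossover)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def tournament():
        # Pick two distinct individuals and return the fitter one.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]                      # elitism: carry the best forward
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randint(1, n_features - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best, fitness(best)

# Hypothetical feature names and a toy fitness: reward three "useful"
# inputs and penalise the two noise inputs.
FEATURES = ["outdoor_temp", "hist_heat", "hist_elec",
            "agent_schedule", "noise_1", "noise_2"]
USEFUL = {0, 1, 3}

def toy_fitness(mask):
    return sum(1.0 if i in USEFUL else -0.5 for i, g in enumerate(mask) if g)

mask, score = ga_select(len(FEATURES), toy_fitness)
print([name for name, g in zip(FEATURES, mask) if g], score)
```

Fixing the seed makes a run repeatable, but a GA is a heuristic and is not guaranteed to return the optimal subset.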

Machine Learning and Survey-based Predictors of InfoSec Non-Compliance

ACM Transactions on Management Information Systems ◽

10.1145/3466689 ◽

2022 ◽

Vol 13 (2) ◽

pp. 1-20

Author(s):

Byron Marshall ◽

Michael Curry ◽

Robert E. Crossler ◽

John Correia

Keyword(s):

Feature Selection ◽

Security Policy ◽

Prediction Models ◽

Training Programs ◽

Selection Process ◽

Multiple Time ◽

Compliance Behavior ◽

Systematic Feature ◽

Tree Models ◽

Time Frames

Survey items developed in behavioral Information Security (InfoSec) research should be practically useful in identifying individuals who are likely to create risk by failing to comply with InfoSec guidance. The literature shows that attitudes, beliefs, and perceptions drive compliance behavior and has influenced the creation of a multitude of training programs focused on improving one's InfoSec behaviors. While automated controls and directly observable technical indicators are generally preferred by InfoSec practitioners, difficult-to-monitor user actions can still compromise the effectiveness of automatic controls. For example, despite prohibition, doubtful or skeptical employees often increase organizational risk by using the same password to authenticate corporate and external services. Analysis of network traffic or device configurations is unlikely to provide evidence of these vulnerabilities, but responses to well-designed surveys might. Guided by the relatively new IPAM model, this study administered 96 survey items from the behavioral InfoSec literature, across three separate points in time, to 217 respondents. Using systematic feature selection techniques, manageable subsets of 29, 20, and 15 items were identified and tested as predictors of non-compliance with security policy. The feature selection process validates IPAM's innovation in using nuanced self-efficacy and planning items across multiple time frames. Prediction models were trained using several ML algorithms. Practically useful levels of prediction accuracy were achieved with, for example, ensemble tree models identifying 69% of the riskiest individuals within the top 25% of the sample. The findings indicate the usefulness of psychometric items from the behavioral InfoSec literature in guiding training programs and other cybersecurity control activities and demonstrate that they are promising as additional inputs to AI models that monitor networks for security events.
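The figure "69% of the riskiest individuals within the top 25% of the sample" reads like a capture rate: the share of truly risky respondents who land in the highest-scored quarter. The abstract does not define the metric precisely, so the sketch below shows one plausible reading. The function name `capture_rate` and the toy scores and labels are hypothetical.

```python
def capture_rate(scores, is_risky, top_fraction=0.25):
    """Share of truly risky individuals found among the `top_fraction`
    of the sample with the highest predicted risk scores."""
    total_risky = sum(is_risky)
    if total_risky == 0:
        raise ValueError("no risky individuals in the sample")
    n_top = max(1, round(len(scores) * top_fraction))
    # Rank respondents from highest to lowest predicted risk.
    ranked = sorted(zip(scores, is_risky), key=lambda pair: pair[0], reverse=True)
    captured = sum(flag for _, flag in ranked[:n_top])
    return captured / total_risky

# Toy example (hypothetical): 8 respondents, 4 of them truly risky.
scores   = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
is_risky = [1,   1,   0,   0,   1,   0,   0,   1]
rate = capture_rate(scores, is_risky)   # top 2 respondents hold 2 of 4 risky cases
print(rate)
```

A random ranking would be expected to capture about 25% of the risky cases in the top quarter, which is the baseline against which a figure like 69% can be judged.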

Data-Driven Inventory Management in the Healthcare Supply Chain

Supply Chain and Logistics Management ◽

10.4018/978-1-7998-0945-6.ch067 ◽

2020 ◽

pp. 1390-1403

Author(s):

Shuojiang Xu ◽

Kim Hua Tan

Keyword(s):

Big Data ◽

Supply Chain ◽

Inventory Management ◽

Data Driven ◽

Forecasting Model ◽

Market Demand ◽

Improved Model ◽

Future Work ◽

Statistic Approach

Since the start of the 21st century, enterprises have combined supply chain management with big data to improve their product and service levels. In China's healthcare industry, supply chain decisions are made based on experience, owing to environmental complexities such as changing policies and license delays. A flexible and dynamic big-data-driven analysis approach for supply chain decisions is urgently required. This report presents a case study of a CRT forecasting model that uses inventory data to predict market demand based on previous transaction data. First, a basic statistical approach is applied to reveal surface-level patterns and suggest some decisions. Next, a CRT model is built on several independent variables. The CRT and CHAID models are then compared to choose the better one as the basis for an improved model. Finally, some limitations and future work are discussed.
