Stacking Ensemble Tree Models to Predict Energy Performance in Residential Buildings

2021 ◽  
Vol 13 (15) ◽  
pp. 8298
Author(s):  
Ahmed Salih Mohammed ◽  
Panagiotis G. Asteris ◽  
Mohammadreza Koopialipoor ◽  
Dimitrios E. Alexakis ◽  
Minas E. Lemonis ◽  
...  

In this research, a new machine-learning approach was proposed to evaluate the effects of eight input parameters (surface area, relative compactness, wall area, overall height, roof area, orientation, glazing area distribution, and glazing area) on two output parameters, heating load (HL) and cooling load (CL), of residential buildings. The strength of association between each input parameter and each output was systematically investigated using basic statistical analysis tools to identify the most effective and important input variables. Different combinations of the data were then designed using intelligent systems, and the combination containing the most informative inputs was selected for the development of stacking models. Various machine learning models, i.e., XGBoost, random forest, classification and regression tree, and the M5 tree model, were then developed to predict the HL and CL values of building energy performance; the same techniques also served as base learners in the stacking models. The XGBoost-based model achieved higher accuracy (HL: coefficient of determination, R2 = 0.998; CL: R2 = 0.971) and lower error (HL: root mean square error, RMSE = 0.461; CL: RMSE = 1.607) than the other developed models in predicting both HL and CL values. Using these new stacking-based techniques, the research provides alternative solutions for predicting the HL and CL parameters with appropriate accuracy and runtime.
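The stacking scheme described above can be sketched with scikit-learn's `StackingRegressor`. This is a minimal, illustrative example on synthetic data, not the paper's pipeline: the eight-feature matrix and target are random placeholders, and `GradientBoostingRegressor` stands in for XGBoost so the snippet needs only scikit-learn.

```python
# Minimal stacking-ensemble sketch for heating-load (HL) prediction.
# Data and model settings are illustrative placeholders.
import numpy as np
from sklearn.ensemble import (StackingRegressor, RandomForestRegressor,
                              GradientBoostingRegressor)
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((200, 8))                                # 8 building parameters
y = 10 + 20 * X[:, 0] - 5 * X[:, 3] + rng.normal(0, 0.5, 200)  # synthetic HL

base_learners = [
    ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
    ("cart", DecisionTreeRegressor(max_depth=5, random_state=0)),
    ("gbt", GradientBoostingRegressor(random_state=0)),  # stand-in for XGBoost
]
# A meta-learner combines the base learners' out-of-fold predictions.
stack = StackingRegressor(estimators=base_learners,
                          final_estimator=LinearRegression())

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
stack.fit(X_tr, y_tr)
r2 = stack.score(X_te, y_te)                             # R^2 on held-out data
```

The same pattern accepts an `xgboost.XGBRegressor` as a base learner when that library is available.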

Author(s):  
Bambang Biantoro ◽  
Hernadewita Hernadewita

Problem solving in multistage production processes is a challenge for industry. The use of modern techniques such as machine learning to solve quality problems continues to be developed; the decision tree is one such technique. The tire industry has entered the era of Industry 4.0 through the use of information technology such as barcodes and radio frequency identification. Exploiting these data with machine learning to find the root causes of problems can support the tire industry in industrial competition. This study explores process data in the tire industry to solve one tire quality problem, namely radial run-out. The root causes were identified using the Classification and Regression Tree (CART) technique, with 60 production-process factors as input variables. The study found that the factors influencing the radial run-out value are the lots of the Tread, Bead, and Sidewall components, and that the causes of high radial run-out are variations in the lots of the Tread and Bead components. The resulting decision tree model has a precision of 74.7% in detecting high radial run-out events, and the improvements to the Tread and Bead component lots derived from the decision tree reduced the radial run-out defect rate by 99.9%. Keywords: decision tree; root cause analysis; radial run-out tire; data mining
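A CART model used this way is a classifier over process factors whose precision on the "high run-out" class is then checked. The sketch below is illustrative only: it reduces the 60 factors to three hypothetical component-lot features and uses synthetic data with a planted lot-interaction pattern.

```python
# Illustrative CART sketch for detecting high radial run-out from
# component-lot features. All data are synthetic placeholders.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import precision_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(300, 3))        # tread / bead / sidewall lot ids
y = ((X[:, 0] == 2) & (X[:, 1] == 4)).astype(int)  # planted "high run-out" rule

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
cart = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

# Precision: of the events the tree flags as high run-out, how many are real.
precision = precision_score(y_te, cart.predict(X_te), zero_division=0)
```

Inspecting the fitted tree's split features (e.g., via `sklearn.tree.export_text`) is what surfaces the lot factors as root-cause candidates.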


2020 ◽  
Author(s):  
Anurag Sohane ◽  
Ravinder Agarwal

Abstract Various simulation tools and conventional algorithms are used to determine human knee muscle forces during dynamic movement. These may be adequate for clinical use but have drawbacks, such as high computational times, muscle redundancy, and low cost-effectiveness. Recently, there has been interest in developing supervised learning-based prediction models for this computationally demanding process. The present work develops cost-effective and efficient machine learning (ML) models to predict knee muscle force for clinical interventions from input parameters such as height, mass, and angle. A dataset of 500 human musculoskeletal simulations was used to train and test four ML models. The dataset was obtained from the AnyBody Modeling System via AnyPyTools, in which a human musculoskeletal model performed a squatting movement during inverse dynamic analysis. The results show that the random forest model outperforms the other selected models (neural network, generalized linear model, decision tree) in terms of mean square error (MSE), coefficient of determination (R2), and correlation (r). On the test dataset, the MSEs of predicted vs. actual muscle forces from the random forest model for the Biceps Femoris, Rectus Femoris, Vastus Medialis, and Vastus Lateralis are 19.92, 9.06, 5.97, and 5.46; the correlations are 0.94, 0.92, 0.92, and 0.94; and the R2 values are 0.88, 0.84, 0.84, and 0.89, respectively.
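The three goodness-of-fit measures reported above (MSE, R2, and Pearson r) can be computed directly in NumPy; a small helper, with illustrative input arrays rather than real muscle-force data, might look like this:

```python
# Compute MSE, R^2, and Pearson correlation for predicted vs. actual values.
import numpy as np

def fit_metrics(actual, predicted):
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    mse = np.mean((actual - predicted) ** 2)          # mean square error
    ss_res = np.sum((actual - predicted) ** 2)        # residual sum of squares
    ss_tot = np.sum((actual - actual.mean()) ** 2)    # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                        # coefficient of determination
    r = np.corrcoef(actual, predicted)[0, 1]          # Pearson correlation
    return mse, r2, r
```

Note that R2 can be negative for a poor fit, while r only measures linear association, which is why the paper reports both alongside MSE.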


2021 ◽  
Vol 13 (4) ◽  
pp. 1595
Author(s):  
Valeria Todeschi ◽  
Roberto Boghetti ◽  
Jérôme H. Kämpf ◽  
Guglielmina Mutani

Building energy-use models and tools can simulate and represent the distribution of energy consumption of buildings located in an urban area. The aim of these models is to simulate the energy performance of buildings at multiple temporal and spatial scales, taking into account both the building shape and the surrounding urban context. This paper investigates existing models by simulating the hourly space-heating consumption of residential buildings in an urban environment. Existing bottom-up urban-energy models were applied to the city of Fribourg in order to evaluate the accuracy and flexibility of energy simulations. Two common energy-use models, a machine learning model and a GIS-based engineering model, were compared and evaluated against anonymized monitoring data. The study shows that the simulations were quite accurate, with annual mean absolute percentage errors of 12.8% and 19.3% for the machine learning and GIS-based engineering models, respectively, on residential buildings built in different periods of construction. Moreover, a sensitivity analysis using the Morris method was carried out on the GIS-based engineering model in order to assess the impact of input variables on space-heating consumption and to identify possible optimization opportunities in the existing model.
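The mean absolute percentage error (MAPE) used above to score both models is straightforward to implement; the consumption figures in the example call below are made up for illustration:

```python
# Mean absolute percentage error between measured and simulated values.
import numpy as np

def mape(measured, simulated):
    measured = np.asarray(measured, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    # Percentage error per observation, averaged; measured values must be nonzero.
    return 100.0 * np.mean(np.abs((measured - simulated) / measured))
```

For example, `mape([100, 200], [90, 210])` averages a 10% and a 5% error to give 7.5.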


2021 ◽  
Vol 9 ◽  
Author(s):  
Manish Pandey ◽  
Aman Arora ◽  
Alireza Arabameri ◽  
Romulus Costache ◽  
Naveen Kumar ◽  
...  

This study developed a new ensemble model and tested another ensemble model for flood susceptibility mapping in the Middle Ganga Plain (MGP). The results of the two models were quantitatively compared for their performance in zoning flood-susceptible areas in the low-altitude, humid subtropical fluvial floodplain environment of the MGP. This part of the MGP, in the central Ganga River Basin (GRB), is experiencing worsening floods under the changing climate, causing increasing loss of life and property. With its monsoonal subtropical humid climate, ground subsidence induced by active tectonics, growing population, and shifting land-use/land-cover trends and patterns, the MGP is a natural laboratory for testing the full genre of susceptibility prediction models, with the aim of identifying the best-performing model for this type of topoclimatic setting given a constant number and type of input parameters. This works toward the goal of model universality. Based on a highly accurate flood inventory and 12 flood predictors (FPs), selected from field experience of the study area and a literature survey, two machine learning (ML) ensemble models, CART-FR and CART-EBF, were developed by bagging frequency ratio (FR) and evidential belief function (EBF) with classification and regression tree (CART), and applied for flood susceptibility zonation mapping. Flood and non-flood points randomly generated from the flood inventory were apportioned in a 70:30 ratio for training and validating the ensembles.
Based on evaluation using the threshold-independent area under the receiver operating characteristic (AUROC) curve, 14 threshold-dependent evaluation metrics, and the seed cell area index (SCAI), which assess different aspects of the ensembles, the study suggests that CART-EBF (AUCSR = 0.843; AUCPR = 0.819) performed better than CART-FR (AUCSR = 0.828; AUCPR = 0.802). The variability in the performance of these novel advanced ensembles, and their comparison with the results of other published models, supports the need to test these and other genres of susceptibility models in other topoclimatic environments as well. The results of this study are important for natural hazard managers and can be used to compute damages through risk analysis.
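The threshold-independent AUROC statistic used above has a simple rank interpretation: the probability that a randomly chosen flood point receives a higher susceptibility score than a randomly chosen non-flood point. A small NumPy implementation of that definition, with toy labels and scores:

```python
# AUROC as the probability that a positive (flood) point outranks a
# negative (non-flood) point, counting ties as half.
import numpy as np

def auroc(labels, scores):
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels == 1]               # susceptibility scores at flood points
    neg = scores[labels == 0]               # scores at non-flood points
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```

For example, `auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns 0.75, since three of the four flood/non-flood pairs are ranked correctly.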


2019 ◽  
Vol 18 (05) ◽  
pp. 1579-1603 ◽  
Author(s):  
Zhijiang Wan ◽  
Hao Zhang ◽  
Jiajin Huang ◽  
Haiyan Zhou ◽  
Jie Yang ◽  
...  

Many studies have developed machine learning methods for discriminating Major Depressive Disorder (MDD) from normal controls based on multi-channel electroencephalogram (EEG) data, but few have used single-channel EEG collected from the forehead scalp to discriminate MDD. Here, an EEG dataset was collected from the Fp1 and Fp2 electrodes of a 32-channel EEG system. The results demonstrate that classification performance based on EEG from the Fp1 location exceeds that based on the Fp2 location, and show that single-channel EEG analysis can discriminate MDD at a level comparable to multi-channel EEG analysis. Furthermore, a portable EEG device recording from the Fp1 location was used to collect a second dataset. A Classification and Regression Tree combined with a genetic algorithm (GA) achieved the highest accuracy of 86.67% under leave-one-participant-out cross-validation, which shows that single-channel EEG-based machine learning is promising for MDD prescreening applications.
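Leave-one-participant-out cross-validation, the protocol used above, holds out all epochs from one participant per fold so that no subject appears in both training and test sets. A hedged sketch with scikit-learn's `LeaveOneGroupOut` on synthetic feature data (the GA feature-selection step from the paper is omitted, and the features and group sizes are placeholders):

```python
# Leave-one-participant-out CV with a CART classifier on synthetic
# "EEG feature" data. 15 participants, 4 epochs each, are placeholders.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.random((60, 5))                        # hypothetical Fp1 features
y = (X[:, 0] + 0.1 * rng.normal(size=60) > 0.5).astype(int)  # MDD vs control
groups = np.repeat(np.arange(15), 4)           # participant id per epoch

scores = []
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups):
    clf = DecisionTreeClassifier(random_state=0)
    clf.fit(X[train_idx], y[train_idx])        # train on 14 participants
    scores.append(clf.score(X[test_idx], y[test_idx]))  # test on the held-out one

accuracy = float(np.mean(scores))              # participant-level mean accuracy
```

Grouping by participant avoids the optimistic bias that epoch-level random splits would introduce.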


Energies ◽  
2020 ◽  
Vol 13 (3) ◽  
pp. 571 ◽  
Author(s):  
Azadeh Sadeghi ◽  
Roohollah Younes Sinaki ◽  
William A. Young ◽  
Gary R. Weckman

As the level of greenhouse gas emissions increases, so does the importance of the energy performance of buildings (EPB). Two of the main measures of EPB are a structure's heating load (HL) and cooling load (CL). HLs and CLs depend on several variables, such as relative compactness, surface area, wall area, roof area, overall height, orientation, glazing area, and glazing area distribution. This research uses deep neural networks (DNNs) to forecast HLs and CLs for a variety of structures. The DNNs explored include multi-layer perceptron (MLP) networks, and each model was developed through extensive testing of many combinations of layers, processing elements, and data preprocessing techniques. The results show that a DNN improves on traditional artificial neural network (ANN) models for modeling HLs and CLs. To extract knowledge from a trained model, a post-processing technique called sensitivity analysis (SA) was applied to the model that performed best, with respect to the selected goodness-of-fit metric, on an independent set of testing data. There are two forms of SA, local and global; both aim to determine the significance of independent variables within a model, but local SA assumes inputs are independent of each other, while global SA does not. To further the contribution of the research presented in this article, the results of a global SA, called state-based sensitivity analysis (SBSA), are compared with the results of a traditional local technique, called sensitivity analysis about the mean (SAAM). The results demonstrate an improvement over existing conclusions in the literature, which is of particular interest to decision-makers and designers of building structures.
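A common way to realize a local, about-the-mean sensitivity analysis is to sweep each input over its observed range while holding the others at their means and record the spread of the model output. The sketch below is one such implementation under that assumption, with a made-up linear function standing in for a trained DNN; it is not the SAAM or SBSA procedure from the article itself.

```python
# Local sensitivity analysis about the mean: vary one input at a time,
# hold the rest at their means, measure the output spread.
import numpy as np

def model(x):
    # Hypothetical stand-in for a trained predictor f: R^3 -> heating load.
    return 3.0 * x[0] - 2.0 * x[1] + 0.1 * x[2]

def sensitivity_about_mean(model, X, n_steps=11):
    X = np.asarray(X, dtype=float)
    means = X.mean(axis=0)
    spreads = []
    for j in range(X.shape[1]):
        grid = np.linspace(X[:, j].min(), X[:, j].max(), n_steps)
        outputs = []
        for v in grid:
            x = means.copy()
            x[j] = v                      # perturb only input j
            outputs.append(model(x))
        spreads.append(max(outputs) - min(outputs))  # output range for input j
    return spreads

X = np.random.default_rng(2).random((100, 3))
s = sensitivity_about_mean(model, X)      # one spread per input
```

A global method would instead vary inputs jointly, which is why the article contrasts SAAM with the state-based (global) SBSA.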


Energies ◽  
2020 ◽  
Vol 13 (17) ◽  
pp. 4300
Author(s):  
Kosuke Sasakura ◽  
Takeshi Aoki ◽  
Masayoshi Komatsu ◽  
Takeshi Watanabe

Data centers (DCs) have become increasingly important in recent years, and highly efficient and reliable operation and management of DCs is now required. The heat density generated by racks and information and communication technology (ICT) equipment is predicted to increase, so maintaining an appropriate temperature environment in the server room, where high heat is generated, is crucial for ensuring continuous service. It is especially important to predict changes in rack intake temperature in the server room when the computer room air conditioner (CRAC) is shut down, which can cause a rapid rise in temperature. However, predicting the rack temperature accurately is quite difficult, which in turn makes it difficult to determine the impact on service in advance. In this research, we propose a model that predicts the rack intake temperature after the CRAC is shut down. Specifically, we use machine learning to construct a gradient boosting decision tree model from CRAC, ICT-equipment, and rack intake temperature data. Experimental results demonstrate that the proposed method has very high prediction accuracy: the coefficient of determination was 0.90 and the root mean square error (RMSE) was 0.54. Our model makes it possible to evaluate the impact on service and determine whether action to maintain the temperature environment is required. We also clarify the effect of the explanatory variables and training data on the model's accuracy.
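The modeling step above (a gradient boosting decision tree regressor scored by R2 and RMSE) can be sketched with scikit-learn; the feature set and temperature data below are synthetic placeholders, not data-center measurements:

```python
# Gradient-boosted tree regression of rack intake temperature, scored
# by RMSE and R^2. All data are illustrative placeholders.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X = rng.random((400, 4))   # e.g., CRAC state, ICT load, elapsed time, rack position
y = 22.0 + 8.0 * X[:, 1] * X[:, 2] + rng.normal(0, 0.3, 400)  # intake temp (deg C)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
gbdt = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)

pred = gbdt.predict(X_te)
rmse = float(np.sqrt(np.mean((y_te - pred) ** 2)))   # root mean square error
r2 = gbdt.score(X_te, y_te)                          # coefficient of determination
```

The fitted model's `feature_importances_` attribute gives one view of the effect of the explanatory variables that the paper analyzes.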


10.2196/18910 ◽  
2020 ◽  
Vol 8 (7) ◽  
pp. e18910
Author(s):  
Debbie Rankin ◽  
Michaela Black ◽  
Raymond Bond ◽  
Jonathan Wallace ◽  
Maurice Mulvenna ◽  
...  

Background The exploitation of synthetic data in health care is at an early stage. Synthetic data could unlock the potential within health care datasets that are too sensitive for release. Several synthetic data generators have been developed to date; however, studies evaluating their efficacy and generalizability are scarce. Objective This work sets out to understand the difference in performance between supervised machine learning models trained on synthetic data and those trained on real data. Methods A total of 19 open health datasets were selected for experimental work. Synthetic data were generated using three synthetic data generators that apply classification and regression tree, parametric, and Bayesian network approaches. Real and synthetic data were used (separately) to train five supervised machine learning models: stochastic gradient descent, decision tree, k-nearest neighbors, random forest, and support vector machine. Models were tested only on real data to determine whether a model developed by training on synthetic data can be used to accurately classify new, real examples. The impact of statistical disclosure control on model performance was also assessed. Results A total of 92% of models trained on synthetic data had lower accuracy than those trained on real data. Tree-based models trained on synthetic data deviated in accuracy from models trained on real data by 0.177 (18%) to 0.193 (19%), while other models had lower deviations of 0.058 (6%) to 0.072 (7%). The winning classifier when trained and tested on real data matched the winner among models trained on synthetic data and tested on real data in 26% (5/19) of cases for classification and regression tree and parametric synthetic data, and in 21% (4/19) of cases for Bayesian network-generated synthetic data. Tree-based models performed best with real data, winning in 95% (18/19) of cases; this was not the case for models trained on synthetic data.
When tree-based models are excluded, the winning classifier for real and synthetic data is matched in 74% (14/19), 53% (10/19), and 68% (13/19) of cases for classification and regression tree, parametric, and Bayesian network synthetic data, respectively. Statistical disclosure control methods did not have a notable impact on data utility. Conclusions The results of this study are promising, with only small decreases in accuracy observed in models trained on synthetic data compared with models trained on real data, where both are tested on real data. Such deviations are expected and manageable. Tree-based classifiers show some sensitivity to synthetic data, and the underlying cause requires further investigation. This study highlights the potential of synthetic data and the need for further evaluation of their robustness. Synthetic data must preserve both individual privacy and data utility in order to instill confidence in health care departments using such data to inform policy decision-making.
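The core protocol above (train on synthetic, test only on real, and compare against a model trained on real data) can be sketched end to end. The example below is illustrative only: the "real" dataset is a toy two-class Gaussian problem, and the synthetic generator is a crude per-class parametric (Gaussian) resampler, far simpler than the three generators the study evaluates.

```python
# Train-on-synthetic / test-on-real protocol with a decision tree,
# compared against train-on-real. All data are toy placeholders.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
# Toy "real" dataset: two Gaussian classes in 3 dimensions.
X_real = np.vstack([rng.normal(0, 1, (150, 3)), rng.normal(2, 1, (150, 3))])
y_real = np.array([0] * 150 + [1] * 150)
X_tr, X_te, y_tr, y_te = train_test_split(X_real, y_real,
                                          stratify=y_real, random_state=0)

# Crude parametric generator: resample from per-class mean/std of X_tr.
X_syn_parts, y_syn = [], []
for c in (0, 1):
    Xc = X_tr[y_tr == c]
    X_syn_parts.append(rng.normal(Xc.mean(axis=0), Xc.std(axis=0), Xc.shape))
    y_syn += [c] * len(Xc)
X_syn = np.vstack(X_syn_parts)
y_syn = np.array(y_syn)

# Both models are scored on the same held-out REAL test set.
acc_real = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr).score(X_te, y_te)
acc_syn = DecisionTreeClassifier(random_state=0).fit(X_syn, y_syn).score(X_te, y_te)
```

The gap `acc_real - acc_syn` is the kind of accuracy deviation the study tabulates per model and per generator.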

