Data-Driven Modelling of Smart Building Ventilation Subsystem

Considering the advances in building monitoring and control through networks of interconnected devices, effective handling of the associated rich data streams is becoming an important challenge. In many situations, the application of conventional system identification or approximate grey-box models, partly theoretical and partly data driven, is either infeasible or unsuitable. The paper discusses and illustrates an application of black-box modelling, achieved using data mining techniques, for the purpose of smart building ventilation subsystem control. We present the implementation and evaluation of a data mining methodology on data collected over more than one year of operation. The case study is carried out on four air handling units of a modern campus building for preliminary decision support for facility managers. The data processing and learning framework consists of two steps: raw data streams are compressed using the Symbolic Aggregate Approximation method, and the resulting segments are then input into a Support Vector Machine algorithm. The results are useful for deriving the behaviour of each piece of equipment in various modes of operation and can be built upon for fault detection or energy efficiency applications. Challenges related to online operation within a commercial Building Management System are also discussed, as the approach shows promise for deployment.
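The compression step named in this abstract, Symbolic Aggregate Approximation (SAX), can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `sax_word`, the four-letter alphabet, and the segment count are assumptions chosen for illustration. The sketch z-normalises a series, averages it over equal segments, and maps each segment mean to a symbol using equiprobable breakpoints of the standard normal distribution.

```python
from statistics import NormalDist, mean, pstdev


def sax_word(series, n_segments, alphabet="abcd"):
    """Convert a numeric series into a SAX word (hypothetical helper)."""
    if not 0 < n_segments <= len(series):
        raise ValueError("n_segments must be between 1 and len(series)")

    # Step 1: z-normalise; a constant series maps to all zeros.
    mu, sigma = mean(series), pstdev(series)
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]

    # Step 2: Piecewise Aggregate Approximation, one mean per segment.
    n = len(z)
    paa = []
    for i in range(n_segments):
        start = i * n // n_segments
        end = (i + 1) * n // n_segments
        paa.append(mean(z[start:end]))

    # Step 3: breakpoints that split N(0, 1) into equiprobable regions.
    k = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(j / k) for j in range(1, k)]

    # Step 4: each segment mean becomes the symbol of its region.
    return "".join(alphabet[sum(v > b for b in breakpoints)] for v in paa)


# Illustrative values only, not data from the study.
print(sax_word([1, 1, 1, 1, 5, 5, 5, 5], n_segments=2))
```

In a pipeline like the one the abstract describes, words of this kind (or a numeric encoding of them) would serve as the inputs to a classifier such as a Support Vector Machine.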


Early Detection of Red Palm Weevil, Rhynchophorus ferrugineus (Olivier), Infestation Using Data Mining

Plants ◽

10.3390/plants10010095 ◽

2021 ◽

Vol 10 (1) ◽

pp. 95

Author(s):

Heba Kurdi ◽

Amal Al-Aldawsari ◽

Isra Al-Turaiki ◽

Abdulrahman S. Aldawood

Keyword(s):

Data Mining ◽

Plant Size ◽

Support Vector ◽

Classification Algorithms ◽

Palm Tree ◽

Rhynchophorus Ferrugineus ◽

Red Palm Weevil ◽

Palm Weevil ◽

Using Data ◽

F Measure

In the past 30 years, the red palm weevil (RPW), Rhynchophorus ferrugineus (Olivier), a pest that is highly destructive to all types of palms, has rapidly spread worldwide. However, detecting infestation with the RPW is highly challenging because symptoms are not visible until the death of the palm tree is inevitable. In addition, the use of automated RPW identification tools to predict infestation is complicated by a lack of RPW datasets. In this study, we assessed the capability of 10 state-of-the-art data mining classification algorithms, Naive Bayes (NB), KSTAR, AdaBoost, bagging, PART, J48 Decision tree, multilayer perceptron (MLP), support vector machine (SVM), random forest, and logistic regression, to use plant-size and temperature measurements collected from individual trees to predict RPW infestation in its early stages, before significant damage is caused to the tree. The performance of the classification algorithms was evaluated in terms of accuracy, precision, recall, and F-measure using a real RPW dataset. The experimental results showed that infestations with RPW can be predicted using data mining with an accuracy of up to 93%, precision above 87%, recall equal to 100%, and F-measure greater than 93%. Additionally, we found that temperature and circumference are the most important features for predicting RPW infestation. However, we strongly call for collecting and aggregating more RPW datasets to run more experiments to validate these results and provide more conclusive findings.
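The four evaluation measures named in this abstract follow directly from the counts in a binary confusion matrix. The sketch below shows the standard definitions; the function name and the counts passed to it are hypothetical and are not taken from the study's dataset.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F-measure from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Precision: share of predicted infestations that are real.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: share of real infestations that were detected.
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure: harmonic mean of precision and recall.
    f_measure = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


# Hypothetical counts: no infested tree is missed (fn = 0), two false alarms.
metrics = classification_metrics(tp=14, fp=2, fn=0, tn=14)
print(metrics)
```

With no missed infestations, recall is 1.0 regardless of the number of false alarms, which is why a recall of 100% can coexist with a lower precision.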


Osteoporosis Risk Prediction Using Data Mining Algorithms

Journal of Community Health Research ◽

10.18502/jchr.v9i2.3401 ◽

2020 ◽

Author(s):

Efat Jabarpour ◽

Amin Abedini ◽

Abbasali Keshtkar

Keyword(s):

Data Mining ◽

Personal Information ◽

Disease Diagnosis ◽

Support Vector ◽

Data Mining Algorithms ◽

Industry Standard ◽

Disease Information ◽

Increased Risk ◽

Using Data ◽

Mining Algorithms

Introduction: Osteoporosis is a disease that reduces bone density and degrades the quality of bone microstructure, leading to an increased risk of fractures. It is one of the major causes of disability and death in elderly people. The current study aims at determining the factors influencing the incidence of osteoporosis and providing a predictive model for the disease diagnosis to increase the diagnostic speed and reduce diagnostic costs. Methods: Individuals' data, including personal information, lifestyle, and disease information, were reviewed. A new model has been presented based on the Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology. In addition, Support Vector Machine (SVM) and Bayesian (Tree Augmented Naïve Bayes, TAN) methods were applied, with Clementine12 as the data mining tool. Results: Several features were found to affect the disease. Rules were extracted that can be used as a pattern for predicting the patients' status. Classification precision was 88.39% for SVM and 91.29% for TAN; the precision of TAN was higher than that of the other methods. Conclusion: The most influential factors in osteoporosis were identified and can be used to predict the possibility of osteoporosis in a new individual with defined characteristics.


Dynamic data processing for building energy consumption

Journal of Construction Materials ◽

10.36756/jcm.v2.2.4 ◽

2021 ◽

Vol 2 (2) ◽

Author(s):

Farid Sartipi ◽

Keyword(s):

Energy Consumption ◽

Local Governments ◽

Study Data ◽

Commercial Building ◽

Air Conditioners ◽

Smart Building ◽

Building Management ◽

Using Data ◽

The Moment ◽

The Given

With the growing attention to smart buildings, local governments are seeking practical ways to optimize the energy consumption of commercial buildings. An ideal smart building is capable of monitoring its own energy consumption and adjusting the operation of electric devices, such as lighting and air conditioners, based on occupant behaviour. In this study, data were obtained from the monitoring sensors in a commercial building located in the heart of Sydney from 2013 until 2020 at 15-minute intervals. The data derivation and analysis are currently intrinsically static, which makes it difficult for building management to make instantaneous decisions regarding the measures to be taken to lower energy consumption. Using data analysis and visualization tools in Tableau, this study provides detailed insights into the trends in energy consumption in the given building. The outcomes facilitate decision making for building management and can be seen as a milestone towards a dynamic optimization protocol, which is introduced in the second part of this study.


Email Worm Detection Using Data Mining

Techniques and Applications for Advanced Information Privacy and Security ◽

10.4018/978-1-60566-210-7.ch002 ◽

2011 ◽

pp. 20-34

Author(s):

Mohammad M. Masud ◽

Latifur Khan ◽

Bhavani Thuraisingham

Keyword(s):

Data Mining ◽

Feature Selection ◽

Principal Component ◽

Classification Model ◽

Support Vector ◽

Two Phase ◽

Feature Selection Technique ◽

Worm Detection ◽

Phase Selection ◽

Using Data

This chapter applies data mining techniques to detect email worms. Email messages contain a number of different features, such as the total number of words in the message body/subject, the presence/absence of binary attachments, the type of attachments, and so on. The goal is to obtain an efficient classification model based on these features. The solution consists of several steps. First, the number of features is reduced using two different approaches: feature selection and dimension reduction. This step is necessary to reduce noise and redundancy in the data. The feature-selection technique is called Two-phase Selection (TPS), which is a novel combination of a decision tree and a greedy selection algorithm. The dimension reduction is performed by Principal Component Analysis. Second, the reduced data is used to train a classifier. Different classification techniques have been used, such as Support Vector Machine (SVM), Naïve Bayes, and their combination. Finally, the trained classifiers are tested on a dataset containing both known and unknown types of worms. These results have been compared with published results. It is found that the proposed TPS selection along with SVM classification achieves the best accuracy in detecting both known and unknown types of worms.
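The abstract does not spell out how Two-phase Selection works, so the following is not TPS itself. It is a minimal sketch of generic greedy forward selection, the kind of procedure a greedy phase might resemble. The function name, the `score` callback, and the toy feature names are all assumptions; in practice `score` would be something like the validation accuracy of a classifier trained on the candidate subset.

```python
def greedy_forward_selection(features, score, max_features):
    """Add one feature at a time, keeping the one that most improves score."""
    selected = []
    best_score = score(selected)
    while len(selected) < max_features:
        candidates = [f for f in features if f not in selected]
        if not candidates:
            break
        # Evaluate every remaining feature added to the current subset.
        top = max(candidates, key=lambda f: score(selected + [f]))
        top_score = score(selected + [top])
        if top_score <= best_score:
            break  # no remaining feature improves the score
        selected.append(top)
        best_score = top_score
    return selected


# Toy scoring function with made-up per-feature gains (illustration only).
gains = {"word_count": 0.3, "has_binary_attachment": 0.2, "subject_length": 0.0}
toy_score = lambda subset: sum(gains[f] for f in subset)
print(greedy_forward_selection(list(gains), toy_score, max_features=3))
```

The loop stops early when no candidate improves the score, so uninformative features are left out even if the feature budget has not been reached.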


Multi-step streamflow forecasting using data-driven non-linear methods in contrasting climate regimes

Journal of Hydroinformatics ◽

10.2166/hydro.2013.042 ◽

2013 ◽

Vol 16 (3) ◽

pp. 671-689 ◽

Cited By ~ 25

Author(s):

Daniel J. Karran ◽

Efrat Morin ◽

Jan Adamowski

Keyword(s):

Wavelet Transforms ◽

Model Performance ◽

Probability Of Detection ◽

Data Driven ◽

Coefficient Of Determination ◽

Support Vector ◽

Lead Times ◽

Streamflow Forecasting ◽

Non Linear ◽

Using Data

Despite the popularity of data-driven non-linear methods for forecasting streamflow, there has been no exploration of how well such models perform in climate regimes with differing hydrological characteristics, nor has the performance of these models, coupled with wavelet transforms, been compared for lead times of less than 1 month. This study compares the use of four different models, namely artificial neural networks (ANNs), support vector regression (SVR), wavelet-ANN, and wavelet-SVR, in a Mediterranean, an Oceanic, and a Hemiboreal watershed. Model performance was tested for 1-, 2-, and 3-day forecasting lead times, measured by fractional standard error, the coefficient of determination, Nash–Sutcliffe model efficiency, multiplicative bias, probability of detection, and false alarm rate. SVR-based models performed best overall, but no one model outperformed the others in more than one watershed, suggesting that some models may be more suitable for certain types of data. Overall model performance varied greatly between climate regimes, suggesting that higher persistence and slower hydrological processes (i.e. snowmelt, glacial runoff, and subsurface flow) support reliable forecasting using daily and multi-day lead times.
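Several of the performance measures listed in this abstract have compact standard definitions, sketched below in Python. The abstract gives no formulas, so the exact forms are assumptions: multiplicative bias is taken as the ratio of total simulated to total observed flow, events are defined by a hypothetical flow `threshold`, and the false alarm measure is computed as false alarms divided by all forecast events (definitions of this measure vary between sources).

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance  # undefined if observations are constant


def multiplicative_bias(observed, simulated):
    """Ratio of total simulated to total observed flow (assumed definition)."""
    return sum(simulated) / sum(observed)


def detection_scores(observed, simulated, threshold):
    """Probability of detection and false alarm measure for threshold events."""
    hits = misses = false_alarms = 0
    for o, s in zip(observed, simulated):
        if o >= threshold and s >= threshold:
            hits += 1
        elif o >= threshold:
            misses += 1
        elif s >= threshold:
            false_alarms += 1
    pod = hits / (hits + misses) if hits + misses else float("nan")
    # Assumed form: false alarms as a share of all forecast events.
    far = (
        false_alarms / (hits + false_alarms)
        if hits + false_alarms
        else float("nan")
    )
    return pod, far


# Illustrative flows only, not data from the study.
obs = [1.0, 5.0, 6.0, 2.0]
sim = [1.0, 6.0, 2.0, 7.0]
print(nash_sutcliffe(obs, sim), multiplicative_bias(obs, sim))
print(detection_scores(obs, sim, threshold=4.0))
```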


Prognosis of Diabetes Using Data mining Approach-Fuzzy C Means Clustering and Support Vector Machine

International Journal of Computer Trends and Technology ◽

10.14445/22312803/ijctt-v11p120 ◽

2014 ◽

Vol 11 (2) ◽

pp. 94-98 ◽

Cited By ~ 14

Author(s):

Ravi Sanakal ◽

Smt. T Jayakumari

Keyword(s):

Data Mining ◽

Support Vector Machine ◽

Support Vector ◽

Fuzzy C Means ◽

Data Mining Approach ◽

Fuzzy C Means Clustering ◽

Using Data


Data mining, fuzzy AHP and TOPSIS for optimizing taxpayer supervision

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v18.i1.pp75-87 ◽

2020 ◽

Vol 18 (1) ◽

pp. 75

Author(s):

M. Jupri ◽

Riyanarto Sarno

Keyword(s):

Data Mining ◽

Nearest Neighbor ◽

Fuzzy Ahp ◽

Support Vector ◽

K Nearest Neighbor ◽

Data Mining Algorithms ◽

Using Data ◽

Time Required ◽

Mining Algorithms

Achieving optimal tax revenue requires effective and efficient tax supervision, which can be achieved by classifying taxpayer compliance with tax regulations. Considering this issue, this paper proposes the classification of taxpayer compliance using data mining algorithms, i.e. C4.5, Support Vector Machine, K-Nearest Neighbor, Naive Bayes, and Multilayer Perceptron, based on taxpayer compliance data. Taxpayer compliance can be classified into four classes: (1) formal and material compliant taxpayers, (2) formal compliant taxpayers, (3) material compliant taxpayers, and (4) formal and material non-compliant taxpayers. Furthermore, the results of the data mining algorithms are compared using Fuzzy AHP and TOPSIS to determine the best-performing classifier based on the criteria of Accuracy, F-Score, and Time required. The selection of priority taxpayers for more detailed supervision at each level of taxpayer compliance is ranked using Fuzzy AHP and TOPSIS based on criteria derived from the dataset variables. The results show that C4.5 is the best-performing classifier, with a preference value of 0.998, whereas the MLP algorithm yields the lowest preference value of 0.131. Alternative taxpayer A233 is the top-priority taxpayer with a preference value of 0.433, whereas alternative taxpayer A051 is the lowest-priority taxpayer with a preference value of 0.036.
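The preference values reported in this abstract are the kind of closeness score that TOPSIS produces. The sketch below shows a standard crisp TOPSIS ranking under stated assumptions: the weights are plain numbers supplied by hand (the paper derives them with Fuzzy AHP, which is not reproduced here), and the classifier names and criterion values are hypothetical, not the paper's results.

```python
from math import sqrt


def topsis(matrix, weights, benefit):
    """Return a closeness score in [0, 1] for each alternative (row).

    benefit[j] is True when larger values of criterion j are better.
    Assumes no all-zero column and at least two distinct alternatives.
    """
    n_criteria = len(weights)
    # Vector-normalise each criterion column, then apply its weight.
    norms = [sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_criteria)]
    weighted = [
        [weights[j] * row[j] / norms[j] for j in range(n_criteria)]
        for row in matrix
    ]
    # Ideal and anti-ideal values per criterion.
    columns = list(zip(*weighted))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(columns)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(columns)]
    scores = []
    for row in weighted:
        d_best = sqrt(sum((v - i) ** 2 for v, i in zip(row, ideal)))
        d_worst = sqrt(sum((v - a) ** 2 for v, a in zip(row, anti)))
        scores.append(d_worst / (d_best + d_worst))
    return scores


# Hypothetical criteria per classifier: accuracy, F-score, time in seconds.
classifiers = ["model_a", "model_b", "model_c"]
values = [[0.95, 0.94, 2.0], [0.90, 0.88, 1.0], [0.80, 0.79, 9.0]]
scores = topsis(values, weights=[0.4, 0.4, 0.2], benefit=[True, True, False])
for name, s in sorted(zip(classifiers, scores), key=lambda p: -p[1]):
    print(name, round(s, 3))
```

Time is marked as a cost criterion (`benefit=False`), so a slower classifier is pushed away from the ideal solution even when its accuracy is competitive.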


Fusion of Unobtrusive Sensing Solutions for Home-Based Activity Recognition and Classification Using Data Mining Models and Methods

Applied Sciences ◽

10.3390/app11199096 ◽

2021 ◽

Vol 11 (19) ◽

pp. 9096

Author(s):

Idongesit Ekerete ◽

Matias Garcia-Constantino ◽

Alexandros Konios ◽

Mustafa A. Mustafa ◽

Yohanca Diaz-Skeete ◽

...

Keyword(s):

Data Mining ◽

Activity Recognition ◽

Stochastic Gradient Descent ◽

Support Vector ◽

Home Environments ◽

Use Of Data ◽

Home Based ◽

Actual Recognition ◽

Using Data ◽

Unobtrusive Sensing

This paper proposes the fusion of Unobtrusive Sensing Solutions (USSs) for human Activity Recognition and Classification (ARC) in home environments. It also considers the use of data mining models and methods for cluster-based analysis of datasets obtained from the USSs. The ability to recognise and classify activities performed in home environments can help monitor health parameters in vulnerable individuals. This study addresses five principal concerns in ARC: (i) users’ privacy, (ii) wearability, (iii) data acquisition in a home environment, (iv) actual recognition of activities, and (v) classification of activities from single to multiple users. Timestamp information from contact sensors mounted at strategic locations in a kitchen environment helped obtain the time, location, and activity of 10 participants during the experiments. A total of 11,980 thermal blobs gleaned from privacy-friendly USSs such as ceiling and lateral thermal sensors were fused using data mining models and methods. Experimental results demonstrated cluster-based activity recognition, classification, and fusion of the datasets with an average regression coefficient of 0.95 for tested features and clusters. In addition, a pooled mean accuracy of 96.5% was obtained on the evaluation test using classification-by-clustering and statistical methods for models such as Neural Network, Support Vector Machine, K-Nearest Neighbour, and Stochastic Gradient Descent.


Data-driven decision-making in creating class rosters

Journal of Research in Innovative Teaching & Learning ◽

10.1108/jrit-03-2019-0045 ◽

2020 ◽

Vol ahead-of-print (ahead-of-print) ◽

Author(s):

Rebecca Wolf ◽

Joseph M. Reilly ◽

Steven M. Ross

Keyword(s):

Decision Making ◽

Student Learning ◽

Low Income ◽

School Leaders ◽

Data Driven ◽

Data Driven Decision Making ◽

Content Type ◽

Use Of Data ◽

Using Data ◽

Rich Data

Purpose: This article informs school leaders and staffs about existing research findings on the use of data-driven decision-making in creating class rosters. Given that teachers are the most important school-based educational resource, decisions regarding the assignment of students to particular classes and teachers are highly impactful for student learning. Classroom compositions of peers can also influence student learning. Design/methodology/approach: A literature review was conducted on the use of data-driven decision-making in the rostering process. The review addressed the merits of using various quantitative metrics in the rostering process. Findings: Findings revealed that, despite often being purposeful about rostering, school leaders and staffs have generally not engaged in data-driven decision-making in creating class rosters. Using data-driven rostering may have benefits, such as limiting the questionable practice of assigning the least effective teachers in the school to the youngest or lowest performing students. School leaders and staffs may also work to minimize negative peer effects due to concentrating low-achieving, low-income, or disruptive students in any one class. Any data-driven system used in rostering, however, would need to be adequately complex to account for multiple influences on student learning. Based on the research reviewed, quantitative data alone may not be sufficient for effective rostering decisions. Practical implications: Given the rich data available to school leaders and staffs, data-driven decision-making could inform rostering and contribute to more efficacious and equitable classroom assignments. Originality/value: This article is the first to summarize relevant research across multiple bodies of literature on the opportunities for and challenges of using data-driven decision-making in creating class rosters.
