Multi-View Data Analysis Techniques for Monitoring Smart Building Systems

In smart buildings, many different systems work in coordination to accomplish their tasks. In this process, the sensors associated with these systems collect large amounts of data generated in a streaming fashion, which is prone to concept drift. Such data are heterogeneous due to the wide range of sensors collecting information about different characteristics of the monitored systems. All of this makes the monitoring task very challenging. Traditional clustering algorithms are not well equipped to address these challenges. In this work, we study the use of the MV Multi-Instance Clustering algorithm for multi-view analysis and mining of smart building systems’ sensor data. It is demonstrated how this algorithm can be used to perform contextual as well as integrated analysis of the systems. Various scenarios in which the algorithm can be used to analyze the data generated by the systems of a smart building are examined and discussed. In addition, it is shown how the extracted knowledge can be visualized to detect trends in the systems’ behavior and how it can aid domain experts in the systems’ maintenance. In the experiments conducted, the proposed approach successfully detected the deviating behaviors known to have previously occurred and also identified some new deviations during the monitored period. Based on the experimental results, it can be concluded that the proposed algorithm can be used for monitoring, analyzing, and detecting deviating behaviors of the systems in a smart building domain.

Data Fusion Using a Multi-Sensor Sparse-Based Clustering Algorithm

Remote Sensing ◽

10.3390/rs12234007 ◽

2020 ◽

Vol 12 (23) ◽

pp. 4007

Author(s):

Kasra Rafiezadeh Shahi ◽

Pedram Ghamisi ◽

Behnood Rasti ◽

Robert Jackisch ◽

Paul Scheunders ◽

...

Keyword(s):

Clustering Algorithm ◽

Spatial Information ◽

Clustering Algorithms ◽

Hyperspectral Data ◽

Sensor Data ◽

Data Sets ◽

Data Types ◽

Data Set ◽

Multiple Data Sets ◽

Imaging Sensors

The increasing amount of information acquired by imaging sensors in Earth Sciences results in the availability of a multitude of complementary data (e.g., spectral, spatial, elevation) for monitoring of the Earth’s surface. Many studies were devoted to investigating the usage of multi-sensor data sets in the performance of supervised learning-based approaches at various tasks (i.e., classification and regression) while unsupervised learning-based approaches have received less attention. In this paper, we propose a new approach to fuse multiple data sets from imaging sensors using a multi-sensor sparse-based clustering algorithm (Multi-SSC). A technique for the extraction of spatial features (i.e., morphological profiles (MPs) and invariant attribute profiles (IAPs)) is applied to high spatial-resolution data to derive the spatial and contextual information. This information is then fused with spectrally rich data such as multi- or hyperspectral data. In order to fuse multi-sensor data sets a hierarchical sparse subspace clustering approach is employed. More specifically, a lasso-based binary algorithm is used to fuse the spectral and spatial information prior to automatic clustering. The proposed framework ensures that the generated clustering map is smooth and preserves the spatial structures of the scene. In order to evaluate the generalization capability of the proposed approach, we investigate its performance not only on diverse scenes but also on different sensors and data types. The first two data sets are geological data sets, which consist of hyperspectral and RGB data. The third data set is the well-known benchmark Trento data set, including hyperspectral and LiDAR data. Experimental results indicate that this novel multi-sensor clustering algorithm can provide an accurate clustering map compared to the state-of-the-art sparse subspace-based clustering algorithms.
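For orientation, the generic sparse subspace clustering (SSC) program that lasso-based approaches of this kind build on can be written as below. This is the textbook single-sensor formulation, not the Multi-SSC objective of the paper; the symbols $X$ (data matrix with one sample per column), $C$ (sparse self-representation coefficients), $\lambda$ (trade-off weight) and $W$ (affinity matrix) are assumptions introduced here for illustration.

```latex
\begin{aligned}
\min_{C}\;& \|C\|_{1} + \frac{\lambda}{2}\,\|X - XC\|_{F}^{2}
\quad \text{s.t.}\quad \operatorname{diag}(C) = 0, \\
W \;=\;& |C| + |C|^{\top}
\end{aligned}
```

In this generic scheme each sample is expressed as a sparse combination of the other samples, and spectral clustering is then applied to the symmetric affinity matrix $W$ to obtain the clustering map.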

Clustering Algorithms and Validation Indices for a Wide mmWave Spectrum

Information ◽

10.3390/info10090287 ◽

2019 ◽

Vol 10 (9) ◽

pp. 287 ◽

Cited By ~ 2

Author(s):

Bogdan Antonescu ◽

Miead Tehrani Moayyed ◽

Stefano Basagni

Keyword(s):

Communication Systems ◽

Clustering Algorithm ◽

Radio Channel ◽

Clustering Algorithms ◽

Wireless Communication Systems ◽

Cluster Validity Indices ◽

Validity Indices ◽

Wide Range ◽

Radio Signals ◽

Urban Scenario

Radio channel propagation models for the millimeter wave (mmWave) spectrum are extremely important for planning future 5G wireless communication systems. Transmitted radio signals are received as clusters of multipath rays. Identifying these clusters provides better spatial and temporal characteristics of the mmWave channel. This paper deals with the clustering process and its validation across a wide range of frequencies in the mmWave spectrum below 100 GHz. By way of simulations, we show that in outdoor communication scenarios clustering of received rays is influenced by the frequency of the transmitted signal. This demonstrates the sparse characteristic of the mmWave spectrum (i.e., we obtain a lower number of rays at the receiver for the same urban scenario). We use the well-known k-means clustering algorithm to group arriving rays at the receiver. The accuracy of this partitioning is studied with both cluster validity indices (CVIs) and score fusion techniques. Finally, we analyze how the clustering solution changes with narrower-beam antennas, and we provide a comparison of the cluster characteristics for different types of antennas.
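The pairing of k-means with a cluster validity index described above can be sketched in a few lines. The following is a minimal illustration, not the authors' simulation code: the ray features (delay, angle), the starting centroids, and the choice of the mean silhouette as the validity index are all assumptions made for the example.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, init_centroids, max_iter=100):
    """Plain Lloyd's k-means starting from caller-supplied centroids."""
    centroids = [tuple(c) for c in init_centroids]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


def mean_silhouette(points, labels):
    """Mean silhouette score, one common cluster validity index."""
    clusters = set(labels)
    if len(clusters) < 2:
        return 0.0
    scores = []
    for i, p in enumerate(points):
        own = [
            dist(p, q)
            for j, q in enumerate(points)
            if labels[j] == labels[i] and j != i
        ]
        if not own:  # singleton cluster: silhouette defined as 0
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(
            sum(dist(p, q) for q, lab in zip(points, labels) if lab == c)
            / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical rays described by (delay, angle) in arbitrary units.
rays = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = kmeans(rays, init_centroids=[rays[0], rays[3]])
score = mean_silhouette(rays, labels)
```

A silhouette close to 1 indicates compact, well-separated clusters; repeating the run for several values of k and comparing the index is the usual way such indices are used to choose the number of clusters.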

Enhancement of Sales promotion using Clustering Techniques in Data Mart

INTERNATIONAL JOURNAL OF COMPUTERS & TECHNOLOGY ◽

10.24297/ijct.v15i2.6934 ◽

2015 ◽

Vol 15 (2) ◽

pp. 6534-6540

Author(s):

Vithya Gopalakrishnan

Keyword(s):

Time Complexity ◽

Clustering Algorithm ◽

Linear Time ◽

Clustering Algorithms ◽

Unsupervised Classification ◽

Sales Promotion ◽

Golay Code ◽

Sales Data ◽

Wide Range ◽

Noise Data

Clustering is an important research topic in a wide range of unsupervised classification applications. It is a technique that divides data into meaningful groups. The K-means algorithm is one of the most popular clustering algorithms. It belongs to the partition-based grouping techniques, which rely on the iterative relocation of data points between clusters. It does not support global clustering, and it has a quadratic time complexity of O(n²). Existing conventional data clustering algorithms were not designed to handle huge amounts of data. To overcome these issues, the Golay code clustering algorithm is selected. A Golay code-based system is used to facilitate the identification of sets of codewords that embody similar object behaviors. The time complexity of the Golay code clustering algorithm is O(n). In this work, the collected sales data is preprocessed by removing all null and empty attributes and then eliminating redundant and noisy data. To enhance sales promotion, the K-means and Golay code clustering algorithms are used to cluster the sales data in terms of place and item. The performance of these algorithms is analyzed in terms of accuracy and execution time. Our results show that the Golay code algorithm outperforms the K-means algorithm in all respects.

Accurate recapture identification for genetic mark–recapture studies with error-tolerant likelihood-based match calling and sample clustering

Royal Society Open Science ◽

10.1098/rsos.160457 ◽

2016 ◽

Vol 3 (12) ◽

pp. 160457 ◽

Cited By ~ 6

Author(s):

Suresh A. Sethi ◽

Daniel Linden ◽

John Wenburg ◽

Cara Lewis ◽

Patrick Lemons ◽

...

Keyword(s):

Clustering Algorithm ◽

Clustering Algorithms ◽

Study Data ◽

Genotyping Error ◽

Nucleotide Polymorphisms ◽

Pacific Walrus ◽

Mark Recapture ◽

Pekania Pennanti ◽

Wide Range ◽

The Face

Error-tolerant likelihood-based match calling presents a promising technique to accurately identify recapture events in genetic mark–recapture studies by combining probabilities of latent genotypes and probabilities of observed genotypes, which may contain genotyping errors. Combined with clustering algorithms to group samples into sets of recaptures based upon pairwise match calls, these tools can be used to reconstruct accurate capture histories for mark–recapture modelling. Here, we assess the performance of a recently introduced error-tolerant likelihood-based match-calling model and sample clustering algorithm for genetic mark–recapture studies. We assessed both biallelic (i.e. single nucleotide polymorphisms; SNP) and multiallelic (i.e. microsatellite; MSAT) markers using a combination of simulation analyses and case study data on Pacific walrus (Odobenus rosmarus divergens) and fishers (Pekania pennanti). A novel two-stage clustering approach is demonstrated for genetic mark–recapture applications. First, repeat captures within a sampling occasion are identified. Subsequently, recaptures across sampling occasions are identified. The likelihood-based matching protocol performed well in simulation trials, demonstrating utility for use in a wide range of genetic mark–recapture studies. Moderately sized SNP (64+) and MSAT (10–15) panels produced accurate match calls for recaptures and accurate non-match calls for samples from closely related individuals in the face of low to moderate genotyping error. Furthermore, matching performance remained stable or increased as the number of genetic markers increased, genotyping error notwithstanding.

Handling WSD using Hierarchical Clustering Algorithm with sentences

International Journal of Scientific Research in Science Engineering and Technology ◽

10.32628/ijsrset1841120 ◽

2018 ◽

pp. 83-88

Author(s):

Mohana Priya K ◽

Pooja Ragavi S ◽

Krishna Priya G

Keyword(s):

Hierarchical Clustering ◽

Similarity Measure ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Cosine Similarity Measure ◽

Hierarchical Clustering Algorithm ◽

Multiple Levels ◽

Pos Tagger ◽

Sentence Clustering ◽

The Right

Clustering is the process of grouping objects into subsets that have meaning in the context of a particular problem. It does not rely on predefined classes. It is referred to as an unsupervised learning method because no information is provided about the "right answer" for any of the objects. Many clustering algorithms have been proposed and are used in different applications. Sentence clustering is one of the best clustering techniques. A hierarchical clustering algorithm is applied at multiple levels for accuracy. A POS tagger is used for tagging, together with the Porter stemmer. The WordNet dictionary is used to determine similarity by invoking the Jiang-Conrath and cosine similarity measures. Grouping is performed with respect to the highest similarity value, with a mean threshold. This paper incorporates many parameters for finding the similarity between words. To identify the disambiguated words, sense identification is performed for the adjectives and the results are compared. The SemCor and machine learning datasets are employed. Compared with previous results for WSD, our work shows a substantial improvement, reaching 91.2%.
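The cosine similarity measure mentioned above, and the idea of grouping by the highest similarity value, can be illustrated with a short sketch. This is a simplified bag-of-words version written for illustration only: it omits the POS tagging, stemming, WordNet and Jiang-Conrath components of the paper, and the function names and example sentences are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cosine_similarity(sentence_a, sentence_b):
    """Cosine similarity between the word-count vectors of two sentences."""
    va = Counter(sentence_a.lower().split())
    vb = Counter(sentence_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence is similar to nothing
    return dot / (norm_a * norm_b)


def most_similar_pair(sentences):
    """Indices of the two most similar sentences.

    This is the pair an agglomerative (bottom-up hierarchical)
    clustering would merge first.
    """
    return max(
        combinations(range(len(sentences)), 2),
        key=lambda pair: cosine_similarity(sentences[pair[0]], sentences[pair[1]]),
    )


sentences = [
    "he sat on the river bank",
    "the river bank was muddy",
    "the bank approved the loan",
]
first_merge = most_similar_pair(sentences)
```

Here the two sentences that use "bank" in the river sense share the most vocabulary and are merged first, which is the intuition behind using sentence clusters as context for disambiguating a word.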

DRSA: a non-hierarchical clustering algorithm using k-NN graph and its application in vegetation classification

Vegetation of Russia ◽

10.31111/vegrus/2015.27.125 ◽

2015 ◽

pp. 125-138 ◽

Cited By ~ 2

Author(s):

I. V. Goncharenko

Keyword(s):

Cluster Analysis ◽

Clustering Algorithm ◽

Nearest Neighbor ◽

Clustering Algorithms ◽

Protein Structures ◽

Hierarchical Cluster ◽

Vegetation Classification ◽

K Nearest Neighbor ◽

Neighbor Graph ◽

Nearest Neighbor Graph

In this article, we propose a new method of non-hierarchical cluster analysis using a k-nearest-neighbor graph and discuss it with respect to vegetation classification. The method of k-nearest neighbor (k-NN) classification was originally developed in 1951 (Fix, Hodges, 1951). Later, the term “k-NN graph” and a few k-NN clustering algorithms appeared (Cover, Hart, 1967; Brito et al., 1997). In biology, k-NN is used in the analysis of protein structures and genome sequences. Most k-NN clustering algorithms first build an “excessive” graph, a so-called hypergraph, and then truncate it into subgraphs by partitioning and coarsening the hypergraph. We developed a different strategy, “upward” clustering, which forms (assembles sequentially) one cluster after another. Until now, graph-based cluster analysis has not been considered for the classification of vegetation datasets.
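As background for the graph structure the abstract refers to, the sketch below builds a plain k-NN graph by brute force. It is not the DRSA algorithm itself, whose cluster-assembly step is not described here; the function name and the example points are hypothetical.

```python
import math


def knn_graph(points, k):
    """Map each point index to the indices of its k nearest neighbours.

    Brute-force construction using Euclidean distance; ties are broken
    by the lower index so the result is deterministic.
    """
    def euclid(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    graph = {}
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: (euclid(p, points[j]), j))
        graph[i] = others[:k]
    return graph


# Hypothetical 2-D sample plots; real vegetation data would typically
# use species-abundance vectors and an ecological dissimilarity measure.
plots = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
graph = knn_graph(plots, k=1)
```

With k=1 the two nearby pairs point at each other, so the graph already falls apart into two components; graph-based clustering methods exploit exactly this kind of local neighbourhood structure.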

Binary Spectrum Feature for Improved Classifier Performance

10.36227/techrxiv.12993122 ◽

2020 ◽

Author(s):

Nalika Ulapane ◽

Karthick Thiyagarajan ◽

Sarath Kodagoda

Keyword(s):

Machine Learning ◽

Classification Performance ◽

Feature Reduction ◽

Sensor Data ◽

Machine Learning Techniques ◽

Support Vector ◽

Svm Classifier ◽

Monitoring Task ◽

Classifier Performance ◽

Spectrum Feature

Classification has become a vital task in modern machine learning and Artificial Intelligence applications, including smart sensing. Numerous machine learning techniques are available to perform classification. Similarly, numerous practices, such as feature selection (i.e., selection of a subset of descriptor variables that optimally describe the output), are available to improve classifier performance. In this paper, we consider the case of a given supervised learning classification task that has to be performed making use of continuous-valued features. It is assumed that an optimal subset of features has already been selected. Therefore, no further feature reduction, or feature addition, is to be carried out. Then, we attempt to improve the classification performance by passing the given feature set through a transformation that produces a new feature set which we have named the “Binary Spectrum”. Via a case study example done on some Pulsed Eddy Current sensor data captured from an infrastructure monitoring task, we demonstrate how the classification accuracy of a Support Vector Machine (SVM) classifier increases through the use of this Binary Spectrum feature, indicating the feature transformation’s potential for broader usage.

User Power Behavior Similarity Clustering Based on Unsupervised Extreme Learning Machine Algorithm

Recent Advances in Electrical & Electronic Engineering (Formerly Recent Patents on Electrical & Electronic Engineering) ◽

10.2174/2352096512666191004130655 ◽

2020 ◽

Vol 13 (5) ◽

pp. 641-649

Author(s):

Yuancheng Li ◽

Yaqi Cui ◽

Xiaolong Zhang

Keyword(s):

Extreme Learning Machine ◽

Clustering Algorithm ◽

Characteristic Curve ◽

Clustering Algorithms ◽

Data Sets ◽

Residential Areas ◽

Processing Power ◽

Learning Machine ◽

Advanced Metering ◽

Matlab Programming

Background: Advanced Metering Infrastructure (AMI) for the smart grid is growing rapidly, which results in exponential growth of the data collected and transmitted by the devices. Clustering this data can give the electricity company a better understanding of the personalized and differentiated needs of the user. Objective: The existing clustering algorithms for processing such data generally have some problems, such as insufficient data utilization, high computational complexity and low accuracy of behavior recognition. Methods: In order to improve the clustering accuracy, this paper proposes a new clustering method based on the electrical behavior of the user. Starting with the analysis of user load characteristics, the user electricity data samples were constructed. The daily load characteristic curve was extracted through an improved extreme learning machine clustering algorithm and effective index criteria. Moreover, clustering analysis was carried out for different users from industrial, commercial and residential areas. The improved extreme learning machine algorithm, also called Unsupervised Extreme Learning Machine (US-ELM), is an extension and improvement of the original Extreme Learning Machine (ELM), which realizes the unsupervised clustering task on the basis of the original ELM. Results: Experiments were run on four different data sets in MATLAB, and the results were compared with those of other commonly used clustering algorithms. The experimental results show that the US-ELM algorithm has higher accuracy in processing power data. Conclusion: The unsupervised ELM algorithm can greatly reduce the time consumption and improve the effectiveness of clustering.

Career Development Theory: An Integrated Analysis

The Oxford Handbook of Career Development ◽

10.1093/oxfordhb/9780190069704.013.10 ◽

2020 ◽

Author(s):

Julia Yates

Keyword(s):

Career Choice ◽

Subject Matter ◽

Academic Disciplines ◽

Development Theory ◽

Integrated Analysis ◽

Theoretical Frameworks ◽

Wide Range ◽

Comprehensive Picture ◽

The Subject ◽

Career Practice

Career theories are developed to help make sense of the complexity of career choice and development. The intricacy of the subject matter is such that career theories most often focus on one or two aspects of the phenomenon. As such, the challenges of integrating the theories with each other, and integrating them within career practice, are not insignificant. In this chapter, an overview of the theoretical landscape is offered that illustrates how the theories align with each other to build up a comprehensive picture of career choice and development. The chapter introduces a wide range of theoretical frameworks, spanning seven decades and numerous academic disciplines, and discusses the most well-known theorists alongside less familiar names. The chapter is structured around four concepts: identity, environment, career learning, and psychological career resources. Suggestions are offered for the incorporation of theories in career practice.

The IDEAL household energy dataset, electricity, gas, contextual sensor data and survey data for 255 UK homes

Scientific Data ◽

10.1038/s41597-021-00921-y ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Martin Pullinger ◽

Jonathan Kilgour ◽

Nigel Goddard ◽

Niklas Berliner ◽

Lynda Webb ◽

...

Keyword(s):

Survey Data ◽

Energy Demand ◽

Secondary Data ◽

Hot Water ◽

Power Measurement ◽

Sensor Data ◽

Energy Awareness ◽

Household Energy ◽

Wide Range ◽

The Ideal

The IDEAL household energy dataset described here comprises electricity, gas and contextual data from 255 UK homes over a 23-month period ending in June 2018, with a mean participation duration of 286 days. Sensors gathered 1-second electricity data, pulse-level gas data, 12-second temperature, humidity and light data for each room, and 12-second temperature data from boiler pipes for central heating and hot water. 39 homes also included plug-level monitoring of selected electrical appliances, real-power measurement of mains electricity and key sub-circuits, and more detailed temperature monitoring of gas- and heat-using equipment, including radiators and taps. Survey data included occupant demographics, values, attitudes and self-reported energy awareness, household income, energy tariffs, and building, room and appliance characteristics. Linked secondary data comprises weather and level of urbanisation. The data is provided in comma-separated format with a custom-built API to facilitate usage, and has been cleaned and documented. The data has a wide range of applications, including investigating energy demand patterns and drivers, modelling building performance, and undertaking Non-Intrusive Load Monitoring research.
