dsCleaner: A Python Library to Clean, Preprocess and Convert Non-Intrusive Load Monitoring Datasets

Energy disaggregation, also known as non-intrusive load monitoring (NILM), addresses the problem of separating whole-home electricity usage into individual, appliance-specific consumption, which is a typical application of data analysis. NILM aims to help households understand how energy is used and, consequently, how to manage it effectively, thereby improving energy efficiency, which is considered one of the twin pillars of sustainable energy policy (i.e., energy efficiency and renewable energy). Although NILM is unidentifiable (many different combinations of appliance loads can produce the same aggregate signal), it is widely believed that the NILM problem can be addressed by data science. Most existing approaches address the energy disaggregation problem with conventional techniques such as sparse coding, non-negative matrix factorization, and the hidden Markov model. Recent advances reveal that deep neural networks (DNNs) can achieve favorable performance for NILM, since DNNs can inherently learn the discriminative signatures of different appliances. In this article, we propose a novel method named adversarial energy disaggregation based on DNNs. We introduce the idea of adversarial learning into NILM, which is new for the energy disaggregation task. Our method trains a generator and multiple discriminators in an adversarial fashion. The proposed method not only learns shared representations for different appliances but also captures the specific multimode structures of each appliance. Extensive experiments on real-world datasets verify that our method can achieve new state-of-the-art performance.
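
NILM is usually framed with an additive model: the whole-home (mains) power at each time step is approximately the sum of the appliance-level powers. The minimal Python sketch below assumes exactly that model, with hypothetical appliance names and wattages; it does not implement the adversarial method described above. It also shows why the problem is unidentifiable: two different appliance decompositions yield the same aggregate.

```python
from typing import Dict, List, Sequence


def aggregate(appliances: Dict[str, Sequence[float]]) -> List[float]:
    """Sum per-appliance power traces (watts) into a whole-home trace."""
    traces = list(appliances.values())
    length = len(traces[0])
    if any(len(t) != length for t in traces):
        raise ValueError("all appliance traces must have the same length")
    return [sum(t[i] for t in traces) for i in range(length)]


def residual(mains: Sequence[float], estimates: Dict[str, Sequence[float]]) -> List[float]:
    """Mains minus the sum of estimated appliance traces; all zeros means a perfect fit."""
    return [m - e for m, e in zip(mains, aggregate(estimates))]


# Hypothetical ground truth: a fridge and a kettle sampled at four time steps.
truth = {"fridge": [100.0, 100.0, 0.0, 0.0], "kettle": [0.0, 2000.0, 2000.0, 0.0]}
mains = aggregate(truth)  # [100.0, 2100.0, 2000.0, 0.0]

# A different decomposition that explains the mains signal equally well.
alternative = {"fridge": [100.0, 100.0, 100.0, 0.0], "kettle": [0.0, 2000.0, 1900.0, 0.0]}
print(residual(mains, alternative))  # [0.0, 0.0, 0.0, 0.0]
```

Because both decompositions fit the mains signal exactly, a disaggregation model has to rely on knowledge of appliance signatures (which DNN-based methods learn from data) to prefer one over the other.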

The Context of IST for Solid Information Retrieval and Infrastructure Building

Research Anthology on Recent Trends, Tools, and Implications of Computer Programming ◽

10.4018/978-1-7998-3016-0.ch092 ◽

2021 ◽

pp. 2040-2054

Author(s):

Prantosh Kumar Paul

Keyword(s):

Developing Countries ◽

Information Technologies ◽

Data Science ◽

Information Science ◽

Swot Analysis ◽

Vital Role ◽

Knowledge Dissemination ◽

Business Analytics ◽

Limited Information ◽

Information Sciences

Development and progress depend mainly on education and its effective dissemination. Technologies and engineering solutions are important for businesses and corporate houses, and in this context educational initiatives and programs play a vital role. Developing countries face many problems and are therefore fostering new academic innovation and research on economic development. Information Technologies and management science are important for sound business solutions; therefore, education and knowledge dissemination play an important and valuable role. In many developing countries, there are gaps between industrial needs and the availability of skilled labor. Information Sciences and Computing are among the most valuable areas of study in today's knowledge world. The components, subsets, and subfields of Information Sciences and Technology are rapidly emerging worldwide; popular emerging areas include Cloud Computing, Green Computing, Green Systems, Big-Data Science, the Internet, Business Analytics, and Business Intelligence. Developing countries (such as China, Colombia, Malaysia, Mauritius, India, Brazil, and South Africa) depend in many ways on knowledge dissemination and skilled manpower for their development. Thus, there is an urgent need to introduce such programs, and most of them are proposed here. Information Science and Technology (IST) programs at the Bachelor's, Master's, and Doctoral levels are listed along with their academic and industrial contexts. This article highlights these programs with a SWOT analysis.

An Attention-Based Latent Information Extraction Network (ALIEN) for High-Order Feature Interactions

Applied Sciences ◽

10.3390/app10165468 ◽

2020 ◽

Vol 10 (16) ◽

pp. 5468

Author(s):

Ruo Huang ◽

Shelby McIntyre ◽

Meina Song ◽

Haihong E ◽

Zhonghong Ou

Keyword(s):

Information Extraction ◽

Recommender Systems ◽

Vital Role ◽

High Order ◽

Sequential Patterns ◽

Sequence Information ◽

Recommendation Algorithm ◽

Feature Interactions ◽

Real World Datasets ◽

Click Through Rate

One of the primary tasks for commercial recommender systems is to predict the probabilities of users clicking items, e.g., advertisements, music, and products, because such predictions have a decisive impact on profitability. The classic recommendation algorithm, collaborative filtering (CF), still plays a vital role in many industrial recommender systems. However, although plain CF is good at capturing similar users' preferences for items based on their past interactions, it falls short in (1) modeling the influence of users' sequential patterns from their individual interaction histories and (2) capturing the relevance of users' and items' attributes. In this work, we developed an attention-based latent information extraction network (ALIEN) for click-through rate prediction that integrates (1) implicit user similarity in terms of click patterns (analogous to CF), (2) low- and high-order feature interactions, and (3) historical sequence information. The new model is based on deep learning, which goes beyond the capabilities of conventional approaches such as matrix factorization (MF) and k-means. In addition, the approach makes its recommendations explainable by interpreting the contributions of different features and historical interactions. Experiments on real-world datasets demonstrate considerable improvements over strong baselines.
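
One common way to weight a user's historical interactions by their relevance to a candidate item is dot-product attention: score each past item embedding against the candidate, normalize the scores with a softmax, and take the weighted sum. The Python sketch below shows only that generic mechanism, with hypothetical 3-dimensional embeddings; it is an illustrative assumption, not ALIEN's actual architecture, which the abstract does not specify. Attention weights like the ones returned here are one way to inspect which past interactions contribute most to a prediction.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(candidate: Sequence[float],
                   history: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Dot-product attention of a candidate item over a user's past items.

    Returns (weights, pooled): weights[i] is the relevance of history[i] to the
    candidate, and pooled is the weighted sum of the history embeddings.
    """
    scores = [sum(c * h for c, h in zip(candidate, item)) for item in history]
    weights = softmax(scores)
    pooled = [sum(w * item[d] for w, item in zip(weights, history))
              for d in range(len(candidate))]
    return weights, pooled


# Hypothetical 3-d embeddings: the first past item matches the candidate exactly.
candidate = [1.0, 0.0, 0.0]
history = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
weights, pooled = attention_pool(candidate, history)
print([round(w, 3) for w in weights])  # [0.576, 0.212, 0.212]
```

In a full click-through rate model, the pooled vector would typically be combined with user, item, and feature-interaction representations before a final prediction layer; this sketch does not reproduce how ALIEN combines them.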

The Context of IST for Solid Information Retrieval and Infrastructure Building

International Journal of Information Retrieval Research ◽

10.4018/ijirr.2018010106 ◽

2018 ◽

Vol 8 (1) ◽

pp. 86-100

Author(s):

Prantosh Kumar Paul

Keyword(s):

Developing Countries ◽

Information Technologies ◽

Data Science ◽

Information Science ◽

Swot Analysis ◽

Vital Role ◽

Knowledge Dissemination ◽

Business Analytics ◽

Limited Information ◽

Information Sciences

Development and progress depend mainly on education and its effective dissemination. Technologies and engineering solutions are important for businesses and corporate houses, and in this context educational initiatives and programs play a vital role. Developing countries face many problems and are therefore fostering new academic innovation and research on economic development. Information Technologies and management science are important for sound business solutions; therefore, education and knowledge dissemination play an important and valuable role. In many developing countries, there are gaps between industrial needs and the availability of skilled labor. Information Sciences and Computing are among the most valuable areas of study in today's knowledge world. The components, subsets, and subfields of Information Sciences and Technology are rapidly emerging worldwide; popular emerging areas include Cloud Computing, Green Computing, Green Systems, Big-Data Science, the Internet, Business Analytics, and Business Intelligence. Developing countries (such as China, Colombia, Malaysia, Mauritius, India, Brazil, and South Africa) depend in many ways on knowledge dissemination and skilled manpower for their development. Thus, there is an urgent need to introduce such programs, and most of them are proposed here. Information Science and Technology (IST) programs at the Bachelor's, Master's, and Doctoral levels are listed along with their academic and industrial contexts. This article highlights these programs with a SWOT analysis.

A Multi-Agent NILM Architecture for Event Detection and Load Classification

Energies ◽

10.3390/en13174396 ◽

2020 ◽

Vol 13 (17) ◽

pp. 4396

Author(s):

André Eugenio Lazzaretti ◽

Douglas Paulo Bertrand Renaux ◽

Carlos Raimundo Erig Lima ◽

Bruna Machado Mulinari ◽

Hellen Cristina Ancelmo ◽

...

Keyword(s):

Feature Extraction ◽

Event Detection ◽

Classification Accuracy ◽

Agent Architecture ◽

Performance Improvements ◽

Load Monitoring ◽

Multi Agent ◽

Improved Performance ◽

New Algorithms

A multi-agent architecture for a Non-Intrusive Load Monitoring (NILM) solution is presented and evaluated. The underlying rationale for such an architecture is that each agent (load event detection, feature extraction, or classification) outperforms others of the same type in particular scenarios; hence, by combining the expertise of these agents, the system achieves improved performance. Known NILM algorithms, as well as new algorithms proposed by the authors, were individually evaluated and compared. The proposed architecture considers a NILM system composed of Load Monitoring Modules (LMMs) that report to a Center of Operations, as required in larger facilities. For the purpose of evaluating and comparing performance, five load event detection agents, five feature extraction agents, and five classification agents were studied so that the best combinations of agents could be implemented in the LMMs. To evaluate the proposed system, the COOLL dataset and the LIT-Dataset were used. Performance improvements were observed in all scenarios, with power-ON and power-OFF detection improving by up to 13% and classification accuracy improving by up to 9.4%.
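
Load event detection looks for abrupt changes in the aggregate power that mark an appliance switching on or off. The simplest such detector, shown here as a hypothetical Python sketch, flags consecutive samples whose power difference exceeds a fixed threshold and labels positive steps as power-ON and negative steps as power-OFF. It is not one of the five event detection agents evaluated in the paper, and the 50 W default threshold is an arbitrary assumption.

```python
from typing import List, Sequence, Tuple


def detect_events(power: Sequence[float],
                  threshold: float = 50.0) -> List[Tuple[int, str, float]]:
    """Return (sample index, 'ON' or 'OFF', step size in watts) for each large step.

    A step is the difference between a sample and the one before it; steps whose
    magnitude is at least `threshold` are reported as events.
    """
    events = []
    for i in range(1, len(power)):
        delta = power[i] - power[i - 1]
        if delta >= threshold:
            events.append((i, "ON", delta))
        elif delta <= -threshold:
            events.append((i, "OFF", delta))
    return events


# Hypothetical aggregate trace: a ~1.2 kW load switches on at sample 2 and off at sample 5.
signal = [5.0, 6.0, 1205.0, 1203.0, 1204.0, 4.0, 5.0]
print(detect_events(signal))  # [(2, 'ON', 1199.0), (5, 'OFF', -1200.0)]
```

A fixed threshold ignores start-up transients, slow ramps, and overlapping events, which is one reason why combining several detectors, as the multi-agent architecture above does, may pay off.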

Causal Discovery Combining K2 with Brain Storm Optimization Algorithm

Molecules ◽

10.3390/molecules23071729 ◽

2018 ◽

Vol 23 (7) ◽

pp. 1729

Author(s):

Yinghan Hong ◽

Zhifeng Hao ◽

Guizhen Mai ◽

Han Huang ◽

Arun Kumar Sangaiah

Keyword(s):

Real World ◽

Data Science ◽

Learning Algorithm ◽

Causal Structure ◽

Scientific Discovery ◽

Causal Discovery ◽

Causal Mechanism ◽

Topological Order ◽

Brain Storm Optimization ◽

Real World Datasets

Exploring and detecting causal relations among variables has shown great practical value in recent years, offers numerous opportunities for scientific discovery, and is commonly seen as the core of data science. Among causal discovery methods, constraint-based approaches can recover causal structures from passive observational data in general cases and have shown broad prospects in numerous real-world applications. However, they do not work well when the graph is sufficiently large. To alleviate this problem, this paper presents an improved causal structure learning algorithm, K2-BSO, which combines K2 with brain storm optimization (BSO). Here, BSO searches for the optimal topological order of nodes instead of searching the graph space. The paper assumes that the dataset is generated according to a causal diagram in which each variable is generated from its parents through a causal mechanism. We designed an elaborate distance function for the clustering step in BSO according to the mechanism of K2. The graph space is therefore reduced to a smaller topological-order space, which can be further reduced by an efficient clustering method. Experimental results on various real-world datasets show that our method outperforms traditional search-and-score methods and state-of-the-art genetic-algorithm-based methods.
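
K2 needs a topological order of the nodes as input: given that order, it greedily adds, for each node, the predecessor that most increases the Cooper-Herskovits (K2) score, stopping when no addition helps. The Python sketch below implements only this classic inner step for a fixed order; the BSO search over orders and the paper's clustering distance function are not shown. It assumes discrete variables whose values are the integers 0 to r-1, and the `max_parents` limit is an illustrative assumption.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Row = Tuple[int, ...]


def k2_node_score(data: Sequence[Row], child: int, parents: Sequence[int],
                  arity: Sequence[int]) -> float:
    """Log Cooper-Herskovits (K2) score of `child` given a candidate parent set.

    Values of variable v are assumed to be the integers 0 .. arity[v] - 1.
    """
    r = arity[child]
    # N_ij: rows with parent configuration j; N_ijk: those rows where child == k.
    n_ij = Counter(tuple(row[p] for p in parents) for row in data)
    n_ijk = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    score = 0.0
    # Unobserved parent configurations contribute exactly zero, so they are skipped.
    for config, n in n_ij.items():
        score += math.lgamma(r) - math.lgamma(n + r)  # log((r-1)! / (N_ij + r - 1)!)
        for k in range(r):
            score += math.lgamma(n_ijk[(config, k)] + 1)  # log(N_ijk!)
    return score


def k2(data: Sequence[Row], order: Sequence[int], arity: Sequence[int],
       max_parents: int = 2) -> Dict[int, List[int]]:
    """Greedy K2: each node may only take parents that precede it in `order`."""
    parents: Dict[int, List[int]] = {}
    for pos, child in enumerate(order):
        current: List[int] = []
        best = k2_node_score(data, child, current, arity)
        while len(current) < max_parents:
            scored = [(k2_node_score(data, child, current + [c], arity), c)
                      for c in order[:pos] if c not in current]
            if not scored:
                break
            new_score, new_parent = max(scored)
            if new_score <= best:  # stop when no single addition improves the score
                break
            best = new_score
            current.append(new_parent)
        parents[child] = current
    return parents


# Hypothetical toy data: X1 copies X0, and X2 is independent of both.
rows = [(a, a, b) for a in (0, 1) for b in (0, 1)] * 10
print(k2(rows, order=[0, 1, 2], arity=[2, 2, 2]))  # {0: [], 1: [0], 2: []}
```

An outer search such as BSO could score each candidate order by the sum of the node scores K2 obtains for it, which is why a good order makes the rest of the search cheap; the exact fitness used in K2-BSO may differ.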

Cached Sufficient Statistics for Efficient Machine Learning with Large Datasets

Journal of Artificial Intelligence Research ◽

10.1613/jair.453 ◽

1998 ◽

Vol 8 ◽

pp. 67-91 ◽

Cited By ~ 93

Author(s):

A. Moore ◽

M. S. Lee

Keyword(s):

Machine Learning ◽

Data Structures ◽

Rule Learning ◽

Worst Case ◽

Sufficient Statistics ◽

Frequent Sets ◽

Efficient Machine ◽

Real World Datasets ◽

Selection Algorithms ◽

New Algorithms

This paper introduces new algorithms and data structures for quick counting for machine learning datasets. We focus on the counting task of constructing contingency tables, but our approach is also applicable to counting the number of records in a dataset that match conjunctive queries. Subject to certain assumptions, the costs of these operations can be shown to be independent of the number of records in the dataset and loglinear in the number of non-zero entries in the contingency table. We provide a very sparse data structure, the ADtree, to minimize memory use. We provide analytical worst-case bounds for this structure for several models of data distribution. We empirically demonstrate that tractably-sized data structures can be produced for large real-world datasets by (a) using a sparse tree structure that never allocates memory for counts of zero, (b) never allocating memory for counts that can be deduced from other counts, and (c) not bothering to expand the tree fully near its leaves. We show how the ADtree can be used to accelerate Bayes net structure finding algorithms, rule learning algorithms, and feature selection algorithms, and we provide a number of empirical results comparing ADtree methods against traditional direct counting approaches. We also discuss the possible uses of ADtrees in other machine learning methods, and discuss the merits of ADtrees in comparison with alternative representations such as kd-trees, R-trees and Frequent Sets.
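
Ideas (a) and (b) can be illustrated with a deliberately small, simplified ADtree in Python: each AD node stores a count, each Vary node omits the child for its most common value (MCV) and for every value with a zero count, and queries recover the omitted MCV counts by subtraction. This is a simplified sketch, not the authors' implementation: it skips idea (c) (not expanding the tree near its leaves), assumes attributes take integer values 0 to r-1, and requires query pairs to be sorted by attribute index.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Record = Tuple[int, ...]
Query = Sequence[Tuple[int, int]]  # (attribute index, value) pairs, sorted by attribute


class ADNode:
    """Number of records matching a conjunctive query, plus a Vary node per later attribute."""

    def __init__(self, records: List[Record], start: int, arity: Sequence[int]):
        self.count = len(records)
        self.vary = [VaryNode(records, a, arity) for a in range(start, len(arity))]


class VaryNode:
    """Splits records on one attribute, storing no child for the MCV or for zero counts."""

    def __init__(self, records: List[Record], attr: int, arity: Sequence[int]):
        groups: Dict[int, List[Record]] = {}
        for rec in records:
            groups.setdefault(rec[attr], []).append(rec)
        # Most common value (MCV); its counts are deduced later by subtraction.
        self.mcv = max(range(arity[attr]), key=lambda v: len(groups.get(v, [])))
        # Values with zero count never appear in `groups`, so they allocate nothing.
        self.children = {v: ADNode(recs, attr + 1, arity)
                         for v, recs in groups.items() if v != self.mcv}


def count(node: Optional[ADNode], query: Query, start: int = 0) -> int:
    """Count records matching `query` below `node`, whose first Vary node is for `start`."""
    if node is None:  # a branch that was never allocated because its count is zero
        return 0
    if not query:
        return node.count
    (attr, value), rest = query[0], query[1:]
    vary = node.vary[attr - start]
    if value != vary.mcv:
        return count(vary.children.get(value), rest, attr + 1)
    # MCV: all matches of `rest` at this node, minus the matches under every stored value.
    others = sum(count(child, rest, attr + 1) for child in vary.children.values())
    return count(node, rest, start) - others


# Hypothetical dataset with three discrete attributes of arity 2, 3 and 2.
arity = [2, 3, 2]
records = [(i % 2, (i // 2) % 3, (i // 3) % 2) for i in range(20)]
root = ADNode(records, 0, arity)
print(count(root, [(0, 1), (2, 0)]))
```

Any cell of a contingency table over these attributes is a conjunctive query, so the same `count` call can fill such tables without rescanning the records.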

Loch Prospector: Metadata Visualization for Lakes of Open Data

10.31219/osf.io/2s76d ◽

2020 ◽

Author(s):

Neha Makhija ◽

Mansi Jain ◽

Nikolaos Tziavelis ◽

Laura Di Rocco ◽

Sara Di Bartolomeo ◽

...

Keyword(s):

Data Management ◽

Data Science ◽

Open Data ◽

Data Availability ◽

Great Promise ◽

Management Techniques ◽

New Challenges ◽

Integration Data ◽

New Algorithms ◽

Structural Aspects

Data lakes are an emerging storage paradigm that promotes data availability over integration. Prime examples are repositories of Open Data, which show great promise for transparent data science. Due to the lack of proper integration, data lakes may not have a common, consistent schema, and traditional data management techniques fall short on these repositories. Much recent research has tried to address the new challenges associated with data lakes. Researchers in this area are mainly interested in the structural properties of the data for developing new algorithms, yet typical Open Data portals offer limited functionality in that respect and instead focus on data semantics. We propose Loch Prospector, a visualization to assist data management researchers in exploring and understanding the most crucial structural aspects of Open Data (in particular, metadata attributes), along with the associated task abstraction for their work. Our visualization enables researchers to navigate the contents of data lakes effectively and easily accomplish what were previously laborious tasks. A copy of this paper with all supplemental material is available at osf.io/zkxv9

The S2-Ensemble Fusion Algorithm

International Journal of Neural Systems ◽

10.1142/s0129065711003012 ◽

2011 ◽

Vol 21 (06) ◽

pp. 505-525 ◽

Cited By ~ 11

Author(s):

Bruno Baruque ◽

Emilio Corchado ◽

Hujun Yin

Keyword(s):

Supervised Learning ◽

Complete Analysis ◽

Fusion Algorithm ◽

Self Organizing Maps ◽

The Family ◽

Real World Datasets ◽

Different Characteristics ◽

Novel Model ◽

New Algorithms ◽

Selection Of

This paper presents a novel model for performing classification and visualization of high-dimensional data by combining two enhancing techniques. The first is semi-supervised learning, an extension of supervised learning that incorporates unlabeled information into the learning process. The second is ensemble learning, which replicates the analysis and is followed by a fusion mechanism that combines the results of the previously performed analyses in order to improve on the result of a single model. The proposed learning scheme, termed S2-Ensemble, is applied to several unsupervised learning algorithms within the family of topology maps, such as Self-Organizing Maps and Neural Gas. This study also includes a thorough investigation of the characteristics of these novel schemes by means of quality measures, which allow a complete analysis of the resulting classifiers from various perspectives on the different ways in which these classifiers are used. The study conducts empirical evaluations and comparisons on various real-world datasets from the UCI repository, which exhibit different characteristics, so as to cover an extensive range of situations in which the presented new algorithms can be applied.

Microbeam analysis. EMSA/MAS standard file format for spectral-data exchange

10.3403/30230652 ◽

2013 ◽

Keyword(s):

Spectral Data ◽

Data Exchange ◽

File Format ◽

Microbeam Analysis ◽

Standard File Format