A Machine Learning Pipeline for Demand Response Capacity Scheduling

Demand response (DR) is an integral component of smart grid operations that offers the flexibility needed to support decarbonisation of the grid. In incentive-based DR programs, deviations from the scheduled DR capacity affect the grid’s energy balance and result in revenue losses for the DR participants. This issue worsens as DR delivery increases from participants such as large consumer buildings, which have few standard methods to follow for DR capacity scheduling. The DR capacity that such consumers can make available through load curtailment can be forecast reliably with supervised machine learning (ML) models. This study demonstrates the development of data-driven, ML-based total and flexible load forecast models for a retail building. The ML model development tasks (data pre-processing, training-testing dataset preparation, cross-validation, algorithm selection, hyperparameter optimisation, feature ranking, model selection and model evaluation) are guided by deployment-centric design criteria such as reliability, computational efficiency and scalability. Based on the selected performance metrics, the day-ahead and week-ahead ML-based load forecast models developed for the retail building are shown to outperform the time-series persistence models used for benchmarking. Furthermore, the deployment of these models for DR capacity scheduling is proposed as an ML pipeline that can be realised with ML workflows, computational resources, and systems for monitoring and visualisation. The ML pipeline supports faster, cost-effective and large-scale deployment of forecast models that enable reliable DR capacity scheduling without affecting the grid’s energy balance. Minimising revenue losses encourages increased DR participation from large consumer buildings, ensuring further flexibility in the smart grid.
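
The persistence benchmark named in the abstract above can be illustrated with a minimal Python sketch. It assumes an hourly load series and uses mean absolute error as the score; the abstract does not say which metrics the authors selected, so the metric, the function names and the sample data are all hypothetical.

```python
def persistence_forecast(load, horizon):
    """Forecast each step as the value observed `horizon` steps earlier.

    The first `horizon` positions have no earlier observation and are None.
    """
    if horizon <= 0:
        raise ValueError("horizon must be positive")
    return [None] * min(horizon, len(load)) + list(load[:-horizon])


def mean_absolute_error(actual, forecast):
    """Average absolute error over the steps that have a forecast."""
    pairs = [(a, f) for a, f in zip(actual, forecast) if f is not None]
    if not pairs:
        raise ValueError("no overlapping forecast values")
    return sum(abs(a - f) for a, f in pairs) / len(pairs)


# Hypothetical hourly load (kW): two identical days, then a day 1 kW higher.
hourly_load = [float(h) for h in range(24)] * 2 + [h + 1.0 for h in range(24)]

# Day-ahead persistence: tomorrow's load is assumed to equal today's.
day_ahead = persistence_forecast(hourly_load, horizon=24)
print(mean_absolute_error(hourly_load, day_ahead))  # 0.5
```

A week-ahead variant would use `horizon=168` on the same hourly series. An ML forecast model is only worth deploying if it scores better than this kind of baseline on the same data.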


Attack and Anomaly Detection in IoT Networks Using Supervised Machine Learning Approaches

Revue d'Intelligence Artificielle ◽

10.18280/ria.350102 ◽

2021 ◽

Vol 35 (1) ◽

pp. 11-21

Author(s):

Himani Tyagi ◽

Rajendra Kumar

Keyword(s):

Machine Learning ◽

Performance Metrics ◽

Detection System ◽

Feature Reduction ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Testing Time ◽

Learning Approaches ◽

Reduction Techniques ◽

Share Data

IoT is characterized by communication between things (devices) that constantly share data, analyze it, and make decisions while connected to the internet. This interconnected architecture attracts cyber criminals who seek to expose IoT systems to failure. It is therefore imperative to develop a system that can accurately and automatically detect anomalies and attacks occurring in IoT networks. In this paper, an Intrusion Detection System (IDS) is developed, based on a novel feature set extracted by synthesizing the BoT-IoT dataset, that can swiftly, accurately and automatically differentiate benign from malicious traffic. Instead of using available feature reduction techniques such as PCA, which can change the core meaning of variables, a unique feature set of only seven lightweight features is developed that is also IoT-specific and independent of attack traffic. The results demonstrate the effectiveness of these seven features in detecting four varieties of attack, namely DDoS, DoS, Reconnaissance, and Information Theft. Furthermore, the study shows the applicability and efficiency of supervised machine learning algorithms (KNN, LR, SVM, MLP, DT, RF) in IoT security. The performance of the proposed system is validated using performance metrics such as accuracy, precision, recall, F-score and ROC. Although the Decision Tree (99.9%) and Random Forest (99.9%) classifiers achieve the same accuracy, other metrics such as training and testing time show Random Forest to be comparatively better.


Insider Threat Detection Using Supervised Machine Learning Algorithms on an Extremely Imbalanced Dataset

International Journal of Cyber Warfare and Terrorism ◽

10.4018/ijcwt.2020040101 ◽

2020 ◽

Vol 10 (2) ◽

pp. 1-26

Author(s):

Naghmeh Moradpoor Sheykhkanloo ◽

Adam Hall

Keyword(s):

Machine Learning ◽

Performance Metrics ◽

Machine Learning Algorithms ◽

Third Party ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Insider Threat ◽

Threat Detection ◽

Imbalanced Dataset ◽

The Impact

An insider threat can take many forms and fall under different categories. These include the malicious insider, the careless/unaware/uneducated/naïve employee, and the third-party contractor. Machine learning techniques have been studied in published literature as a promising solution for such threats. However, they can be biased and/or inaccurate when the associated dataset is hugely imbalanced. This article therefore addresses insider threat detection on an extremely imbalanced dataset, which includes employing a popular balancing technique known as spread subsample. The results show that although balancing the dataset using this technique did not improve performance metrics, it did reduce the time taken to build and to test the model. Additionally, the authors found that running the chosen classifiers with parameters other than the default ones has an impact in both the balanced and imbalanced scenarios, but the impact is significantly stronger when using the imbalanced dataset.
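
The balancing step mentioned above can be illustrated with a rough Python sketch. It assumes that "spread subsample" means random undersampling that caps every class at a chosen multiple (the maximum spread) of the rarest class's size. This is an illustration under that assumption, not the authors' implementation or any library's filter, and the function name, parameters and sample data are hypothetical.

```python
import random
from collections import Counter, defaultdict


def spread_subsample(rows, labels, max_spread=1.0, seed=0):
    """Randomly undersample so no class exceeds max_spread x the rarest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    if not by_class:
        return [], []

    rarest = min(len(members) for members in by_class.values())
    cap = max(1, int(rarest * max_spread))

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Small classes are kept whole; large ones are sampled without replacement.
        kept = members if len(members) <= cap else rng.sample(members, cap)
        out_rows.extend(kept)
        out_labels.extend([label] * len(kept))
    return out_rows, out_labels


# Hypothetical data: 95 normal records and 5 insider-threat records.
rows = list(range(100))
labels = ["normal"] * 95 + ["threat"] * 5

balanced_rows, balanced_labels = spread_subsample(rows, labels, max_spread=1.0)
print(Counter(balanced_labels))
```

With `max_spread=1.0` both classes end up the same size. The smaller training set is consistent with the shorter build and test times the abstract reports, while discarding majority-class records is one plausible reason the performance metrics did not improve.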


Using Random Forests on Real-World City Data for Urban Planning in a Visual Semantic Decision Support System

Sensors ◽

10.3390/s19102266 ◽

2019 ◽

Vol 19 (10) ◽

pp. 2266 ◽

Cited By ~ 1

Author(s):

Nikolaos Sideris ◽

Georgios Bardis ◽

Athanasios Voulodimos ◽

Georgios Miaoulis ◽

Djamchid Ghazanfarpour

Keyword(s):

Machine Learning ◽

Urban Planning ◽

Random Forests ◽

Real World ◽

Performance Metrics ◽

World City ◽

Supervised Machine Learning ◽

Machine Learning Techniques ◽

Support Vector ◽

Real World Data

The constantly increasing amount and availability of urban data derived from varying sources lead to an assortment of challenges that include, among others, consolidating, visualizing, and making maximal use of these data. A preeminent problem affecting urban planning is the appropriate choice of location to host a particular activity (either a commercial or a common welfare service) or the correct use of an existing building or empty space. In this paper, we propose an approach that addresses these challenges with the aid of machine learning techniques. The proposed system fuses various types of data from different sources, encodes them using a novel semantic model that can capture and utilize both low-level geometric information and higher-level semantic information, and subsequently feeds them to a random forests classifier, as well as to other supervised machine learning models for comparison. Our experimental evaluation on multiple real-world data sets, comparing the performance of several classifiers (including Feedforward Neural Networks, Support Vector Machines, Bag of Decision Trees, k-Nearest Neighbors and Naïve Bayes), indicated the superiority of Random Forests in terms of the examined performance metrics (Accuracy, Specificity, Precision, Recall, F-measure and G-mean).
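
The six performance metrics listed above can be computed from a confusion matrix, as in the following Python sketch. It assumes a binary classification setting and takes G-mean to be the geometric mean of recall and specificity, which is one common definition. The abstract does not say which definitions or averaging scheme the authors used, and the function name and the counts are hypothetical.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute common metrics from binary confusion-matrix counts."""

    def ratio(numerator, denominator):
        # Return 0.0 rather than raising when a denominator is zero.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)          # also called sensitivity
    specificity = ratio(tn, tn + fp)
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "specificity": specificity,
        "precision": precision,
        "recall": recall,
        "f_measure": ratio(2 * precision * recall, precision + recall),
        "g_mean": math.sqrt(recall * specificity),
    }


# Hypothetical counts for one classifier on a held-out test set.
scores = binary_metrics(tp=40, fp=10, tn=40, fn=10)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Reporting specificity and G-mean alongside accuracy matters when classes are imbalanced, because a classifier that ignores the minority class can still score a high accuracy.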


A deep learning and novelty detection framework for rapid phenotyping in high-content screening

10.1101/134627 ◽

2017 ◽

Cited By ~ 2

Author(s):

Christoph Sommer ◽

Rudolf Hoefler ◽

Matthias Samwer ◽

Daniel W. Gerlich

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Large Scale ◽

Novelty Detection ◽

A Priori ◽

Mitotic Cell ◽

Supervised Machine Learning ◽

High Content Screening ◽

Data Sets ◽

User Training

Supervised machine learning is a powerful and widely used method to analyze high-content screening data. Despite its accuracy, efficiency, and versatility, supervised machine learning has drawbacks, most notably its dependence on a priori knowledge of expected phenotypes and time-consuming classifier training. We provide a solution to these limitations with CellCognition Explorer, a generic novelty detection and deep learning framework. Application to several large-scale screening data sets on nuclear and mitotic cell morphologies demonstrates that CellCognition Explorer enables discovery of rare phenotypes without user training, which has broad implications for improved assay development in high-content screening.


A machine learning approach to predict ethnicity using personal name and census location in Canada

PLoS ONE ◽

10.1371/journal.pone.0241239 ◽

2020 ◽

Vol 15 (11) ◽

pp. e0241239

Author(s):

Kai On Wong ◽

Osmar R. Zaïane ◽

Faith G. Davis ◽

Yutaka Yasui

Keyword(s):

Machine Learning ◽

First Nations ◽

Predictive Value ◽

Large Scale ◽

Performance Metrics ◽

Characteristic Curve ◽

Machine Learning Algorithms ◽

Support Vector ◽

Learning Approach ◽

Machine Learning Approach

Background: Canada is an ethnically diverse country, yet the lack of ethnicity information in many large databases impedes effective population research and interventions. Automated ethnicity classification using machine learning has shown potential to address this data gap, but its performance in Canada is largely unknown. This study developed a large-scale machine learning framework to predict ethnicity using a novel set of name and census location features. Methods: Using the 1901 census, multiclass and binary classification machine learning pipelines were developed. The 13 ethnic categories examined were Aboriginal (First Nations, Métis, Inuit, and all-combined), Chinese, English, French, Irish, Italian, Japanese, Russian, Scottish, and others. Machine learning algorithms included regularized logistic regression, C-support vector, and naïve Bayes classifiers. Name features consisted of the entire name string, substrings, double-metaphones, and various name-entity patterns, while location features consisted of the entire location string and substrings of province, district, and subdistrict. Predictive performance metrics included sensitivity, specificity, positive predictive value, negative predictive value, F1, Area Under the Curve for the Receiver Operating Characteristic curve, and accuracy. Results: The census had 4,812,958 unique individuals. For multiclass classification, the highest performance achieved was 76% F1 and 91% accuracy. For binary classifications for Chinese, French, Italian, Japanese, Russian, and others, F1 ranged from 68% to 95% (median 87%). The lower performance for English, Irish, and Scottish (F1 ranged from 63% to 67%) was likely due to their shared cultural and linguistic heritage. Adding census location features to the name-based models strongly improved the prediction in Aboriginal classification (F1 increased from 50% to 84%). Conclusions: The automated machine learning approach using only name and census location features can predict the ethnicity of Canadians, with performance varying by ethnic category.


Learning from the 2018 Western Japan Heavy Rains to Detect Floods during the 2019 Hagibis Typhoon

Remote Sensing ◽

10.3390/rs12142244 ◽

2020 ◽

Vol 12 (14) ◽

pp. 2244

Author(s):

Luis Moya ◽

Erick Mas ◽

Shunichi Koshimura

Keyword(s):

Machine Learning ◽

Real Time ◽

Local Governments ◽

Large Scale ◽

Damage Identification ◽

Remote Sensing Data ◽

Early Response ◽

Training Data ◽

Supervised Machine Learning ◽

A Current

Applications of machine learning to remote sensing data appear to be endless. Its use in damage identification for early response in the aftermath of a large-scale disaster, however, faces a specific issue: collecting training data right after a disaster is costly, time-consuming, and often impossible. This study analyzes a possible solution to this issue: collecting training data from past disaster events to calibrate a discriminant function, so that affected areas in a current disaster can then be identified in near real-time. This paper reports the performance of a supervised machine learning classifier that learns from training data collected from the 2018 heavy rainfall in Okayama Prefecture, Japan, and identifies floods caused by Typhoon Hagibis on 12 October 2019 in eastern Japan. The results show moderate agreement with flood maps provided by local governments and public institutions, and support the assumption that information from previous disasters can be used to identify a current disaster in near real-time.


Supervised machine learning for diagnostic classification from large-scale neuroimaging datasets

Brain Imaging and Behavior ◽

10.1007/s11682-019-00191-8 ◽

2019 ◽

Vol 14 (6) ◽

pp. 2378-2416 ◽

Cited By ~ 5

Author(s):

Pradyumna Lanka ◽

D Rangaprakash ◽

Michael N. Dretsch ◽

Jeffrey S. Katz ◽

Thomas S. Denney ◽

...

Keyword(s):

Machine Learning ◽

Large Scale ◽

Supervised Machine Learning ◽

Diagnostic Classification


Simulation Based Grid Optimization to Enhance Renewable Energy Storage in Iceland

Volume 6A: Energy ◽

10.1115/imece2014-36143 ◽

2014 ◽

Author(s):

Michael Sugar ◽

Runar Unnthorsson

Keyword(s):

Renewable Energy ◽

Energy Storage ◽

Energy Balance ◽

Large Scale ◽

Performance Metrics ◽

Tidal Energy ◽

Transmission System ◽

Transmission Model ◽

Renewable Energy Resources ◽

Peak Shaving

Renewable energy resources are contributing ever more to the generation mix worldwide; however, grids expanding in size and complexity have given rise to unforeseen complications such as frequency oscillations, voltage sags and spikes, and power outages. In 2013, nearly 100% of electricity generation in Iceland was from hydropower and geothermal sources. There is also high potential for wind and tidal energy; both options are being explored and would benefit from additional technologies to manage fluctuations and store surplus energy. Landsnet is the sole transmission system operator (TSO) responsible for energy balance in Iceland. On the consumer side, load variations make it difficult for utilities to meet ever-changing demand. Research indicates that high-capacity electrical energy storage (EES) has the potential to be economically beneficial as well as carbon neutral, while improving power and voltage quality, enabling peak-shaving, reducing the number of grid failures and smoothing natural fluctuations in renewable energy (RE) sources. Two complex resource deployment scenarios are modeled using GridCommand™ Distribution: (1) large-scale 10 MWh capacity EES evenly distributed across the transmission system, and (2) large-scale 10 MWh capacity EES clustered at targeted substations in the transmission system. Results reveal that 10 MWh capacity battery EES at a density of 60% in the transmission model provides optimal performance conditions. Optimal conditions are defined by EES performance metrics and signify improvements in power quality, energy balance, and peak-shaving when electricity demand is at its highest. EES technologies are presented and tested at different locations across the Icelandic grid to predict which solutions are best for the future development of the electricity system.
