A Big Data Framework for Intrusion Detection in Smart Grids Using Apache Spark

Author(s):  
K Vimalkumar ◽  
N Radhika
Author(s):  
J. Boehm ◽  
K. Liu ◽  
C. Alis

In the geospatial domain we have now reached the point where the data volumes we handle have clearly grown beyond the capacity of most desktop computers. This is particularly true in the area of point cloud processing. It is therefore naturally attractive to explore established big data frameworks for big geospatial data. The very first hurdle is the import of geospatial data into big data frameworks, commonly referred to as data ingestion. Geospatial data is typically encoded in specialised binary file formats, which are not natively supported by existing big data frameworks. Instead, such file formats are supported by software libraries that are restricted to single-CPU execution. We present an approach that allows the use of existing point cloud file format libraries on the Apache Spark big data framework. We demonstrate the ingestion of large volumes of point cloud data into a compute cluster. The approach uses a map function to distribute the data ingestion across the nodes of a cluster. We test the capabilities of the proposed method by loading billions of points into a commodity-hardware compute cluster, and we discuss the implications for scalability and performance. The performance is benchmarked against an existing native Apache Spark data import implementation.
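The map-based ingestion pattern the abstract describes can be sketched in miniature: a single-CPU, per-file parse function (the stand-in here for a point cloud format library) is applied over a list of file paths, exactly the decomposition Spark's map/flatMap would distribute across cluster nodes. All file names and the binary record layout below are hypothetical, chosen only to make the sketch self-contained.

```python
import os
import struct
import tempfile
from itertools import chain

# Hypothetical minimal binary point format: three little-endian doubles (x, y, z).
POINT = struct.Struct("<3d")

def write_points(path, points):
    """Write a tiny binary 'point cloud tile' for the demonstration."""
    with open(path, "wb") as f:
        for p in points:
            f.write(POINT.pack(*p))

def parse_file(path):
    """Single-CPU reader, stand-in for a point cloud file format library."""
    with open(path, "rb") as f:
        data = f.read()
    return [POINT.unpack_from(data, i) for i in range(0, len(data), POINT.size)]

# On Spark the same function would be distributed as:
#   sc.parallelize(paths).flatMap(parse_file)
# Here a plain map over the paths shows the identical per-file decomposition.
tmp = tempfile.mkdtemp()
paths = []
for i in range(3):
    p = os.path.join(tmp, f"tile_{i}.bin")
    write_points(p, [(float(i), float(j), 0.0) for j in range(4)])
    paths.append(p)

points = list(chain.from_iterable(map(parse_file, paths)))
print(len(points))  # 3 files x 4 points each
```

Because each file is parsed independently, the only coordination needed is the list of paths, which is why a map function suffices to parallelise ingestion.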


Sensors ◽  
2021 ◽  
Vol 21 (16) ◽  
pp. 5650
Author(s):  
Jenniffer S. Guerrero-Prado ◽  
Wilfredo Alfonso-Morales ◽  
Eduardo F. Caicedo-Bravo

Advanced Metering Infrastructure (AMI) data represent a real-time source of information not only about electricity consumption but also about other social, demographic, and economic dynamics within a city. This paper presents a Data Analytics/Big Data framework applied to AMI data as a tool to leverage the potential of these data within Smart City applications. The framework includes three fundamental aspects. First, the architectural view places AMI within the Smart Grid Architecture Model (SGAM). Second, the methodological view describes the transformation of raw data into knowledge, represented by the DIKW hierarchy and the NIST Big Data interoperability model. Finally, a binding element between the two views is represented by human expertise and skills, which deepen the understanding of the results and transform knowledge into wisdom. Our new view faces the challenges arriving in energy markets by adding a binding element that supports optimal and efficient decision-making. To show how our framework works, we developed a case study that implements each component of the framework for a load forecasting application in a Colombian Retail Electricity Provider (REP). The MAPE for some of the REP's markets was less than 5%. In addition, the case shows the effect of the binding element, as it raises new development alternatives and becomes a feedback mechanism for more assertive decision-making.
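The case study reports forecast quality as MAPE (Mean Absolute Percentage Error). As a reference for that metric, here is a minimal sketch; the load values are illustrative numbers, not REP data.

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent."""
    return 100.0 * sum(abs(a - f) / abs(a)
                       for a, f in zip(actual, forecast)) / len(actual)

# Illustrative hourly load values (e.g. MWh); any real application would use
# the provider's metered and forecast series instead.
actual   = [100.0, 120.0, 110.0, 130.0]
forecast = [102.0, 118.0, 108.0, 127.0]
print(round(mape(actual, forecast), 2))  # → 1.95
```

A MAPE below 5%, as reported for some of the REP's markets, means the forecast deviates from the metered load by less than 5% on average.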


Big Data ◽  
2021 ◽  
Author(s):  
Santosh Kumar Sahu ◽  
Durga Prasad Mohapatra ◽  
Jitendra Kumar Rout ◽  
Kshira Sagar Sahoo ◽  
Ashish Kr. Luhach


2017 ◽  
Vol 151 ◽  
pp. 369-380 ◽  
Author(s):  
Amr A. Munshi ◽  
Yasser A.-R. I. Mohamed
Keyword(s):  
Big Data ◽  

2021 ◽  
Vol 11 (10) ◽  
pp. 4557
Author(s):  
Mladen Amović ◽  
Miro Govedarica ◽  
Aleksandra Radulović ◽  
Ivana Janković

Smart cities use digital technologies such as cloud computing, the Internet of Things, and open data in order to overcome the limitations of traditional representation and exchange of geospatial data. This concept ensures a significant increase in the use of data to establish new services that contribute to better sustainable development and monitoring of all phenomena that occur in urban areas. The use of modern geoinformation technologies, such as sensors for collecting different geospatial and related data, requires adequate storage options for further data analysis. In this paper, we suggest the biG dAta sMart cIty maNagEment SyStem (GAMINESS), which is based on the Apache Spark big data framework. The model of the GAMINESS management system is based on the principles of big data modeling, which differ greatly from those of standard databases. This approach provides the ability to store and manage huge amounts of structured, semi-structured, and unstructured data in real time. System performance is raised to a higher level by process parallelization, explained through the five V principles of the big data paradigm. Existing solutions based on the five V principles focus only on data visualization, not the data themselves, and are often limited by different storage mechanisms and by the ability to perform complex analyses on large amounts of data with the expected performance. The GAMINESS management system overcomes these disadvantages by converting smart city data to a big data structure without limitations related to data formats or the standards in use. The suggested model contains two components, a geospatial component and a sensor component, based on the CityGML and SensorThings standards. The developed model has the ability to exchange data, regardless of the standard or data format used, into the proposed Apache Spark data framework schema. The verification of the proposed model is done within a case study for part of the city of Novi Sad.
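The conversion step the abstract describes, mapping CityGML-style and SensorThings-style records into one common schema, can be sketched as follows. The flat schema and all field names below are hypothetical illustrations of the idea, not the actual GAMINESS schema.

```python
# Hypothetical flat target schema into which heterogeneous smart city records
# are converted before loading into a Spark table; field names are illustrative.
SCHEMA = ("source", "feature_id", "lon", "lat", "observed_property", "value")

def from_citygml(record):
    """Map a CityGML-style building record to the flat schema."""
    return {"source": "CityGML", "feature_id": record["gml_id"],
            "lon": record["lon"], "lat": record["lat"],
            "observed_property": None, "value": None}

def from_sensorthings(obs):
    """Map a SensorThings-style Observation to the same flat schema."""
    return {"source": "SensorThings", "feature_id": obs["thing_id"],
            "lon": obs["lon"], "lat": obs["lat"],
            "observed_property": obs["property"], "value": obs["result"]}

rows = [
    from_citygml({"gml_id": "BLDG_17", "lon": 19.85, "lat": 45.25}),
    from_sensorthings({"thing_id": "T-3", "lon": 19.84, "lat": 45.26,
                       "property": "PM2.5", "result": 12.7}),
]
# Every row now fits the same schema, regardless of its source standard.
assert all(set(r) == set(SCHEMA) for r in rows)
```

Once both standards map to one schema, the rows can be stored and queried together, which is the property that lets the system analyse geospatial and sensor data jointly.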


The focus of this work is on detecting and classifying attacks in network traffic using binary and multi-class machine learning classifiers, namely Random Forest, in a distributed Big Data environment using Apache Spark. The classifier is tested on the UNSW-NB15 dataset. Major problems with this type of dataset include high dimensionality and imbalanced data. To address the issue of high dimensionality, both Information Gain and Principal Component Analysis (PCA) were applied before training and testing the data using Random Forest in Apache Spark. Binary and multi-class Random Forest classifiers were compared in a distributed environment, with and without PCA, using varying numbers of Spark cores and Random Forest trees, in terms of runtime and statistical measures. The highest accuracy, 99.94%, was obtained by the binary classifier using 8 cores and 30 trees. This study obtained higher accuracy and lower false alarm rates (FAR) than previously achieved, with low testing times.
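The Information Gain criterion used above for dimensionality reduction scores each feature by how much knowing it reduces the entropy of the class label. A pure-Python toy version (the study applied it at scale in Spark; the traffic labels below are invented, not UNSW-NB15 data):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """IG(class; feature) = H(class) - H(class | feature), discrete feature."""
    n = len(labels)
    conditional = 0.0
    for v in set(feature_values):
        subset = [l for f, l in zip(feature_values, labels) if f == v]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional

# Toy binary traffic labels: 0 = normal, 1 = attack.
labels  = [0, 0, 1, 1, 1, 0]
perfect = ["a", "a", "b", "b", "b", "a"]  # determines the label exactly
useless = ["a", "b", "a", "b", "a", "b"]  # nearly independent of the label

print(information_gain(perfect, labels))  # 1.0: feature fully predicts the class
print(information_gain(useless, labels))  # close to 0: feature carries little signal
```

Ranking features by this score and dropping the low scorers is what shrinks the feature space before the Random Forest is trained, complementing the PCA variant of the experiment.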


Author(s):  
Muhammad Junaid ◽  
Shiraz Ali Wagan ◽  
Nawab Muhammad Faseeh Qureshi ◽  
Choon Sung Nam ◽  
Dong Ryeol Shin

IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 226380-226396
Author(s):  
Diana Martinez-Mosquera ◽  
Rosa Navarrete ◽  
Sergio Lujan-Mora
