A Big Data Framework for Intrusion Detection in Smart Grids Using Apache Spark

Author(s):  
K Vimalkumar ◽  
N Radhika
Author(s):  
J. Boehm ◽  
K. Liu ◽  
C. Alis

In the geospatial domain we have now reached the point where the data volumes we handle have clearly grown beyond the capacity of most desktop computers. This is particularly true in the area of point cloud processing. It is therefore naturally attractive to explore established big data frameworks for big geospatial data. The very first hurdle is the import of geospatial data into big data frameworks, commonly referred to as data ingestion. Geospatial data is typically encoded in specialised binary file formats, which are not natively supported by existing big data frameworks. Instead, such file formats are supported by software libraries that are restricted to single-CPU execution. We present an approach that allows the use of existing point cloud file format libraries on the Apache Spark big data framework. We demonstrate the ingestion of large volumes of point cloud data into a compute cluster. The approach uses a map function to distribute the data ingestion across the nodes of a cluster. We test the capabilities of the proposed method by loading billions of points into a commodity-hardware compute cluster, and we discuss the implications for scalability and performance. The performance is benchmarked against an existing native Apache Spark data import implementation.
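The map-based ingestion pattern the abstract describes can be sketched in miniature: a single-CPU, per-file parse function (the stand-in here for a point cloud format library) is applied over a list of file paths, exactly the decomposition Spark's map/flatMap would distribute across cluster nodes. All file names and the binary record layout below are hypothetical, chosen only to make the sketch self-contained.

```python
import os
import struct
import tempfile
from itertools import chain

# Hypothetical minimal binary point format: three little-endian doubles (x, y, z).
POINT = struct.Struct("<3d")

def write_points(path, points):
    """Write a tiny binary 'point cloud tile' for the demonstration."""
    with open(path, "wb") as f:
        for p in points:
            f.write(POINT.pack(*p))

def parse_file(path):
    """Single-CPU reader, stand-in for a point cloud file format library."""
    with open(path, "rb") as f:
        data = f.read()
    return [POINT.unpack_from(data, i) for i in range(0, len(data), POINT.size)]

# On Spark the same function would be distributed as:
#   sc.parallelize(paths).flatMap(parse_file)
# Here a plain map over the paths shows the identical per-file decomposition.
tmp = tempfile.mkdtemp()
paths = []
for i in range(3):
    p = os.path.join(tmp, f"tile_{i}.bin")
    write_points(p, [(float(i), float(j), 0.0) for j in range(4)])
    paths.append(p)

points = list(chain.from_iterable(map(parse_file, paths)))
print(len(points))  # 3 files x 4 points each
```

Because each file is parsed independently, the only coordination needed is the list of paths, which is why a map function suffices to parallelise ingestion.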


Sensors ◽  
2021 ◽  
Vol 21 (16) ◽  
pp. 5650
Author(s):  
Jenniffer S. Guerrero-Prado ◽  
Wilfredo Alfonso-Morales ◽  
Eduardo F. Caicedo-Bravo

Advanced Metering Infrastructure (AMI) data represent a real-time source of information not only about electricity consumption but also about other social, demographic, and economic dynamics within a city. This paper presents a Data Analytics/Big Data framework applied to AMI data as a tool to leverage the potential of these data within Smart City applications. The framework includes three fundamental aspects. First, the architectural view places AMI within the Smart Grid Architecture Model (SGAM). Second, the methodological view describes the transformation of raw data into knowledge, represented by the DIKW hierarchy and the NIST Big Data interoperability model. Finally, a binding element between the two views is represented by human expertise and skills, which deepen the understanding of the results and transform knowledge into wisdom. Our new view faces the challenges arriving in energy markets by adding a binding element that supports optimal and efficient decision-making. To show how our framework works, we developed a case study that implements each component of the framework for a load forecasting application in a Colombian Retail Electricity Provider (REP). The MAPE for some of the REP's markets was less than 5%. In addition, the case shows the effect of the binding element, as it raises new development alternatives and becomes a feedback mechanism for more assertive decision-making.
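The case study reports forecast quality as MAPE (Mean Absolute Percentage Error). As a reference for that metric, here is a minimal sketch; the load values are illustrative numbers, not REP data.

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent."""
    return 100.0 * sum(abs(a - f) / abs(a)
                       for a, f in zip(actual, forecast)) / len(actual)

# Illustrative hourly load values (e.g. MWh); any real application would use
# the provider's metered and forecast series instead.
actual   = [100.0, 120.0, 110.0, 130.0]
forecast = [102.0, 118.0, 108.0, 127.0]
print(round(mape(actual, forecast), 2))  # → 1.95
```

A MAPE below 5%, as reported for some of the REP's markets, means the forecast deviates from the metered load by less than 5% on average.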


Big Data ◽  
2021 ◽  
Author(s):  
Santosh Kumar Sahu ◽  
Durga Prasad Mohapatra ◽  
Jitendra Kumar Rout ◽  
Kshira Sagar Sahoo ◽  
Ashish Kr. Luhach


2017 ◽  
Vol 151 ◽  
pp. 369-380 ◽  
Author(s):  
Amr A. Munshi ◽  
Yasser A.-R. I. Mohamed
Keyword(s):  
Big Data ◽  

2021 ◽  
Vol 11 (10) ◽  
pp. 4557
Author(s):  
Mladen Amović ◽  
Miro Govedarica ◽  
Aleksandra Radulović ◽  
Ivana Janković

Smart cities use digital technologies such as cloud computing, the Internet of Things, and open data in order to overcome the limitations of traditional representation and exchange of geospatial data. This concept ensures a significant increase in the use of data to establish new services that contribute to better sustainable development and monitoring of all phenomena that occur in urban areas. The use of modern geoinformation technologies, such as sensors for collecting different geospatial and related data, requires adequate storage options for further data analysis. In this paper, we suggest the biG dAta sMart cIty maNagEment SyStem (GAMINESS), which is based on the Apache Spark big data framework. The model of the GAMINESS management system is based on the principles of big data modeling, which differ greatly from those of standard databases. This approach provides the ability to store and manage huge amounts of structured, semi-structured, and unstructured data in real time. System performance is raised to a higher level by process parallelization, explained through the five V principles of the big data paradigm. Existing solutions based on the five V principles focus only on data visualization, not the data themselves, and are often limited by different storage mechanisms and by the ability to perform complex analyses on large amounts of data with the expected performance. The GAMINESS management system overcomes these disadvantages by converting smart city data to a big data structure without limitations related to data formats or the standards in use. The suggested model contains two components, a geospatial component and a sensor component, based on the CityGML and SensorThings standards. The developed model has the ability to exchange data, regardless of the standard or data format used, into the proposed Apache Spark data framework schema. The verification of the proposed model is done within a case study for part of the city of Novi Sad.
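The conversion step the abstract describes, mapping CityGML-style and SensorThings-style records into one common schema, can be sketched as follows. The flat schema and all field names below are hypothetical illustrations of the idea, not the actual GAMINESS schema.

```python
# Hypothetical flat target schema into which heterogeneous smart city records
# are converted before loading into a Spark table; field names are illustrative.
SCHEMA = ("source", "feature_id", "lon", "lat", "observed_property", "value")

def from_citygml(record):
    """Map a CityGML-style building record to the flat schema."""
    return {"source": "CityGML", "feature_id": record["gml_id"],
            "lon": record["lon"], "lat": record["lat"],
            "observed_property": None, "value": None}

def from_sensorthings(obs):
    """Map a SensorThings-style Observation to the same flat schema."""
    return {"source": "SensorThings", "feature_id": obs["thing_id"],
            "lon": obs["lon"], "lat": obs["lat"],
            "observed_property": obs["property"], "value": obs["result"]}

rows = [
    from_citygml({"gml_id": "BLDG_17", "lon": 19.85, "lat": 45.25}),
    from_sensorthings({"thing_id": "T-3", "lon": 19.84, "lat": 45.26,
                       "property": "PM2.5", "result": 12.7}),
]
# Every row now fits the same schema, regardless of its source standard.
assert all(set(r) == set(SCHEMA) for r in rows)
```

Once both standards map to one schema, the rows can be stored and queried together, which is the property that lets the system analyse geospatial and sensor data jointly.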


The focus of this work is on detecting and classifying attacks in network traffic using binary and multi-class machine learning classifiers, namely Random Forest, in a distributed Big Data environment using Apache Spark. The classifier is tested on the UNSW-NB15 dataset. Major problems with this type of dataset include high dimensionality and imbalanced data. To address the issue of high dimensionality, both Information Gain and Principal Component Analysis (PCA) were applied before training and testing the data using Random Forest in Apache Spark. Binary and multi-class Random Forest classifiers were compared in a distributed environment, with and without PCA, using varying numbers of Spark cores and Random Forest trees, in terms of runtime and statistical measures. The highest accuracy, 99.94%, was obtained by the binary classifier using 8 cores and 30 trees. This study obtained higher accuracy and lower false alarm rates (FAR) than previously achieved, with low testing times.
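The Information Gain criterion used above for dimensionality reduction scores each feature by how much knowing it reduces the entropy of the class label. A pure-Python toy version (the study applied it at scale in Spark; the traffic labels below are invented, not UNSW-NB15 data):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """IG(class; feature) = H(class) - H(class | feature), discrete feature."""
    n = len(labels)
    conditional = 0.0
    for v in set(feature_values):
        subset = [l for f, l in zip(feature_values, labels) if f == v]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional

# Toy binary traffic labels: 0 = normal, 1 = attack.
labels  = [0, 0, 1, 1, 1, 0]
perfect = ["a", "a", "b", "b", "b", "a"]  # determines the label exactly
useless = ["a", "b", "a", "b", "a", "b"]  # nearly independent of the label

print(information_gain(perfect, labels))  # 1.0: feature fully predicts the class
print(information_gain(useless, labels))  # close to 0: feature carries little signal
```

Ranking features by this score and dropping the low scorers is what shrinks the feature space before the Random Forest is trained, complementing the PCA variant of the experiment.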


Author(s):  
Muhammad Junaid ◽  
Shiraz Ali Wagan ◽  
Nawab Muhammad Faseeh Qureshi ◽  
Choon Sung Nam ◽  
Dong Ryeol Shin

IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 226380-226396
Author(s):  
Diana Martinez-Mosquera ◽  
Rosa Navarrete ◽  
Sergio Lujan-Mora
