The Need to Consider Hardware Selection when Designing Big Data Applications Supported by Metadata

Security and Privacy Issues of Big Data

Cyber Law, Privacy, and Security; DOI: 10.4018/978-1-5225-8897-9.ch019; 2019; pp. 375-407

Author(s): José Moura, Carlos Serrão

Keyword(s): Social Networks, Big Data, Personal Data, Security And Privacy, Data Systems, Computing Systems, Big Data Applications, Big Data Systems, Privacy Issues

This chapter reviews the most important aspects of how computing infrastructures should be configured and intelligently managed to meet the most notable security requirements of Big Data applications. One of these is privacy, a pertinent concern because users share ever more personal data and content from their devices and computers with social networks and public clouds. A secure framework for social networks is therefore a very active research topic, and it is addressed, with case studies, in one of the chapter's two sections. In addition, traditional security mechanisms such as firewalls and demilitarized zones (DMZs) are not well suited to the computing systems that support Big Data. Software-Defined Networking (SDN) is an emerging management solution that could become a convenient mechanism for implementing security in Big Data systems, as a second case study at the end of the chapter shows. The chapter also discusses current relevant work and identifies open issues.
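
To make the SDN idea more concrete, the sketch below models a centrally managed flow table in which security policy is expressed as prioritized match-action rules rather than as a fixed perimeter firewall or DMZ. It is a hypothetical, simplified model written for illustration only: the `FlowRule` and `FlowTable` classes, the subnets, and port 8020 are assumptions, it does not implement the OpenFlow protocol or any real controller's API, and it is not the chapter's case-study setup.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import List, Optional


@dataclass(frozen=True)
class FlowRule:
    """Hypothetical match-action rule, loosely modelled on SDN flow entries."""
    priority: int
    src: str                  # source prefix in CIDR form, e.g. "10.0.1.0/24"
    dst: str                  # destination prefix in CIDR form
    dst_port: Optional[int]   # None matches any destination port
    action: str               # "forward" or "drop"

    def matches(self, src_ip: str, dst_ip: str, port: int) -> bool:
        return (ip_address(src_ip) in ip_network(self.src)
                and ip_address(dst_ip) in ip_network(self.dst)
                and (self.dst_port is None or self.dst_port == port))


class FlowTable:
    """Rules are checked from highest to lowest priority; unmatched traffic gets the default."""

    def __init__(self, default_action: str = "drop") -> None:
        self.rules: List[FlowRule] = []
        self.default_action = default_action

    def install(self, rule: FlowRule) -> None:
        # A controller can push new rules at run time, e.g. to quarantine a host.
        self.rules.append(rule)
        self.rules.sort(key=lambda r: r.priority, reverse=True)

    def lookup(self, src_ip: str, dst_ip: str, port: int) -> str:
        for rule in self.rules:
            if rule.matches(src_ip, dst_ip, port):
                return rule.action
        return self.default_action


# Hypothetical policy: the analytics subnet may reach the storage subnet on port 8020.
table = FlowTable()
table.install(FlowRule(10, "10.0.1.0/24", "10.0.2.0/24", 8020, "forward"))
print(table.lookup("10.0.1.5", "10.0.2.10", 8020))  # forward
# Quarantine one suspicious host with a higher-priority drop rule.
table.install(FlowRule(100, "10.0.1.5/32", "0.0.0.0/0", None, "drop"))
print(table.lookup("10.0.1.5", "10.0.2.10", 8020))  # drop
```

The design point the sketch tries to capture is that policy lives in one programmable table: isolating a host is a single rule installation rather than a reconfiguration of the network perimeter.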

PerTract: Model Extraction and Specification of Big Data Systems for Performance Prediction by the Example of Apache Spark and Hadoop

Big Data and Cognitive Computing; DOI: 10.3390/bdcc3030047; 2019; Vol 3 (3); pp. 47

Author(s): Johannes Kroß, Helmut Krcmar

Keyword(s): Big Data, Response Times, Apache Spark, Data Systems, Domain Specific, Model And Simulation, Big Data Applications, Simulation Based, Measurement Results, Big Data Systems

Evaluating and predicting the performance of big data applications is required to size capacities efficiently and to manage operations. Gaining deep insight into the system architecture, component dependencies, resource demands, and configurations is difficult for engineers. To address these challenges, this paper presents an approach to automatically extract and transform system specifications in order to predict the performance of applications. It consists of three components. First, a system- and tool-agnostic domain-specific language (DSL) allows modeling of the performance-relevant factors of big data applications, computing resources, and data workload. Second, DSL instances are automatically extracted from monitored measurements of Apache Spark and Apache Hadoop (i.e., YARN and HDFS) systems. Third, these instances are transformed into inputs for model- and simulation-based performance evaluation tools to enable predictions. By adapting DSL instances, our approach enables engineers to predict the performance of applications for different scenarios, such as changing data input and resources. We evaluate our approach by predicting the performance of the linear regression and random forest applications of the HiBench benchmark suite. Simulation results of adjusted DSL instances, compared with measurement results, show accurate predictions, with errors below 15% based on averages of response times and resource utilization.
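
As a rough illustration of the "adapt the model instance, then predict" workflow and of an average-based accuracy metric, the sketch below uses a deliberately simple analytical model: each stage has a per-megabyte CPU demand (as might be derived from monitoring data), stages run one after another, and work within a stage is assumed to split perfectly across executors. This is not PerTract's DSL or its simulation tooling; the class names, fields, and numbers are hypothetical.

```python
from dataclasses import dataclass, replace
from statistics import mean
from typing import List


@dataclass(frozen=True)
class Stage:
    name: str
    cpu_ms_per_mb: float  # hypothetical resource demand extracted from measurements


@dataclass(frozen=True)
class AppModel:
    stages: List[Stage]
    input_mb: float
    executors: int

    def predict_response_ms(self) -> float:
        # Stages run sequentially; each stage splits its work evenly
        # across executors (an idealized assumption that ignores contention).
        return sum(s.cpu_ms_per_mb * self.input_mb / self.executors for s in self.stages)


def relative_error_of_means(predicted: List[float], measured: List[float]) -> float:
    """|mean(predicted) - mean(measured)| / mean(measured)."""
    m = mean(measured)
    return abs(mean(predicted) - m) / m


base = AppModel([Stage("load", 2.0), Stage("train", 8.0)], input_mb=1024, executors=4)
print(base.predict_response_ms())  # 2560.0

# "What if" scenarios: adapt the model instance instead of re-running the job.
bigger_input = replace(base, input_mb=2048)
more_executors = replace(base, executors=8)
print(bigger_input.predict_response_ms(), more_executors.predict_response_ms())  # 5120.0 1280.0
```

With hypothetical measured response times of 2400 ms and 2600 ms (mean 2500 ms), the predicted 2560 ms gives a relative error of 60 / 2500 = 0.024, i.e. 2.4%.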

An Empirical Research on Service-Oriented Architecture (SOA) for Data Exchange in BIGDATA Systems

International Journal of Innovative Technology and Exploring Engineering - Special Issue; DOI: 10.35940/ijitee.c1087.0193s20; 2020; Vol 9 (3S); pp. 407-409

Keyword(s): Big Data, Data Exchange, Success Factors, Service Oriented Architecture, Data Gathering, Value Added, Data Systems, Big Data Applications, Service Oriented, Big Data Systems

Service-Oriented Architecture (SOA) is a method for designing, developing, and organizing systems that represent reusable business functionality. The objective of this study is to find the critical success factors needed to implement SOA in BIG DATA systems. The study also aims to classify the mistakes made when implementing SOA. The adoption of SOA has prompted developers to test its uses and applications. The analysed results should be useful for researchers who would like to implement SOA with BIG DATA systems. SOA brings numerous advantages, such as greater flexibility and appropriate alignment among processes, as well as reduced integration and maintenance costs. Generally, BIG DATA concerns large-volume, complex, growing data sets with multiple, autonomous sources. BIG DATA applications arise where data collection has grown tremendously and is beyond the ability of commonly used software tools to capture, manage, and process within a tolerable elapsed time [1]. The most fundamental task for BIG DATA applications is to explore the large volumes of data and extract useful information or knowledge for future actions. The main purpose of this study is to identify the important factors needed to implement SOA in BIG DATA systems. Zhang and Yang propose a reengineering approach that restructures an organization's legacy systems toward SOA. This paper also describes various challenges of SOA and identifies the problems to be solved in order to improve SOA-based services for data exchange in BIG DATA systems.

Tools for the Storage and Analysis of Spatial Big Data

Proceedings of 10th International Conference "Environmental Engineering"; DOI: 10.3846/enviro.2017.216; 2017

Author(s): Przemysław Lisowski, Adam Piórkowski, Andrzej Lesniak

Keyword(s): Big Data, Spatial Data, File Systems, Large Datasets, Distributed File Systems, Data Systems, Data Production, Spatial Big Data, Big Data Systems, Access To Data

Storing large amounts of spatial data in GIS systems is problematic, and the problem is growing because of ever-increasing data production from a variety of sources. The phenomenon of collecting huge amounts of data is called Big Data. Existing solutions are capable of processing and storing large volumes of spatial data, and they also introduce new approaches to data processing. Conventional techniques work with ordinary data but are not suitable for large datasets; they operate efficiently only when coupled with distributed file systems and with algorithms able to reduce tasks. This review focuses on the characteristics of large spatial data and discusses the opportunities offered by spatial big data systems. It also draws attention to the problems of data indexing and data access, and to the solutions proposed in this area.
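
One common way to approach the indexing problem is to map two-dimensional coordinates onto a one-dimensional space-filling curve, so that a distributed store can sort and range-partition points by a single key. The sketch below computes a Z-order (Morton) key by quantizing longitude and latitude onto a grid and interleaving the bits. It is an illustrative assumption, not a technique taken from the reviewed paper; the 16-bit grid resolution, the `partition_of` helper, and the sample coordinates are hypothetical.

```python
def interleave_bits(x: int, y: int, bits: int) -> int:
    """Morton code: bit i of x goes to position 2i, bit i of y to position 2i + 1."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def z_order_key(lon: float, lat: float, bits: int = 16) -> int:
    """Quantize (lon, lat) onto a 2^bits x 2^bits grid and return its Z-order key."""
    if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
        raise ValueError("coordinates out of range")
    max_cell = (1 << bits) - 1
    x = int((lon + 180.0) / 360.0 * max_cell)
    y = int((lat + 90.0) / 180.0 * max_cell)
    return interleave_bits(x, y, bits)


def partition_of(key: int, bits: int = 16, prefix_bits: int = 4) -> int:
    """Range-partition by the key's leading bits, giving 2**prefix_bits spatial buckets."""
    return key >> (2 * bits - prefix_bits)


# Two nearby hypothetical points and one distant point.
for name, lon, lat in [("A", 19.94, 50.06), ("B", 19.95, 50.07), ("C", -9.14, 38.72)]:
    key = z_order_key(lon, lat)
    print(name, key, partition_of(key))
```

Points that are close in space tend to share key prefixes and therefore land in the same partition, which keeps range queries local; the curve's jumps at cell boundaries are the usual trade-off of this scheme.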

Security and Privacy Issues of Big Data

Web Services; DOI: 10.4018/978-1-5225-7501-6.ch114; 2019; pp. 2197-2229

Author(s): José Moura, Carlos Serrão

Keyword(s): Social Networks, Big Data, Personal Data, Security And Privacy, Data Systems, Computing Systems, Big Data Applications, Big Data Systems, Privacy Issues

Big Data Management Canvas: A Reference Model for Value Creation from Data

Big Data and Cognitive Computing; DOI: 10.3390/bdcc3010019; 2019; Vol 3 (1); pp. 19; Cited by ~5

Author(s): Michael Kaufmann

Keyword(s): Big Data, Data Management, Value Creation, Reference Model, Solution Space, Use Cases, Data Systems, Big Data Applications, Big Data Systems, Map Data

Many big data projects are technology-driven and, thus, expensive and inefficient. It is often unclear how to exploit existing data resources and map data, systems, and analytics results to actual use cases. Existing big data reference models are mostly either technological or business-oriented in nature, but do not consistently align both aspects. To address this issue, a reference model for big data management is proposed that operationalizes value creation from big data by linking business targets with technical implementation. The purpose of this model is to provide a goal- and value-oriented framework to effectively map and plan purposeful big data systems aligned with a clear value proposition. Based on an epistemic model that conceptualizes big data management as a cognitive system, the solution space of data value creation is divided into five layers: preparation, analysis, interaction, effectuation, and intelligence. To operationalize the model, each of these layers is subdivided into corresponding business and IT aspects to create a link from use cases to technological implementation. The resulting reference model, the big data management canvas, can be applied to classify and extend existing big data applications and to derive and plan new big data solutions, visions, and strategies for future projects. To validate the model in the context of existing information systems, the paper describes three cases of big data management in existing companies.
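
The canvas structure described above (five layers, each split into business and IT aspects) can be captured as a small data structure. The sketch below also flags layers where only one side is filled in, as a crude check of the link from use case to implementation that the model aims for. The class names, the "unaligned" check, and the example use case are hypothetical illustrations, not part of the published model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

LAYERS = ("preparation", "analysis", "interaction", "effectuation", "intelligence")


@dataclass
class LayerCell:
    business: List[str] = field(default_factory=list)  # business aspects of the layer
    it: List[str] = field(default_factory=list)        # IT / implementation aspects


@dataclass
class BigDataCanvas:
    use_case: str
    cells: Dict[str, LayerCell] = field(
        default_factory=lambda: {layer: LayerCell() for layer in LAYERS})

    def add(self, layer: str, business: Optional[str] = None, it: Optional[str] = None) -> None:
        if layer not in self.cells:
            raise ValueError(f"unknown layer: {layer}")
        if business:
            self.cells[layer].business.append(business)
        if it:
            self.cells[layer].it.append(it)

    def unaligned_layers(self) -> List[str]:
        """Layers where a business aspect has no IT counterpart, or vice versa."""
        return [layer for layer, cell in self.cells.items()
                if bool(cell.business) != bool(cell.it)]


canvas = BigDataCanvas("customer churn prediction")  # hypothetical use case
canvas.add("preparation", business="consolidate customer records",
           it="ingest CRM exports into a data lake")
canvas.add("analysis", business="estimate churn risk")
print(canvas.unaligned_layers())  # ['analysis']
```

Empty layers are not reported, since a given use case need not touch every layer; only one-sided entries are treated as gaps.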

A Novel Improved Grey Wolf Optimization Algorithm Based Resource Management Strategy for Big Data Systems

Journal of Computational and Theoretical Nanoscience; DOI: 10.1166/jctn.2021.9383; 2021; Vol 18 (4); pp. 1227-1232

Author(s): L. R. Aravind Babu, J. Saravana Kumar

Keyword(s): Cloud Computing, Big Data, Resource Management, Scheduling Algorithm, Data Systems, Grey Wolf, End User, Grey Wolf Optimization, Cloud Resource, Big Data Systems

Presently, big data is very popular because it is useful in diverse domains such as social media and e-commerce transactions. Cloud computing offers on-demand service, broad network access, resource pooling, rapid elasticity, and measured service. Cloud resources are usually heterogeneous, and the application requirements of end users change rapidly over time, so resource management is a tedious process. At the same time, resource management and scheduling play a vital part in cloud computing (CC) outcomes, particularly when the environment is used for big data analysis and workloads with little predictability enter the cloud dynamically. Identifying optimal scheduling solutions with diverse variables on a varying platform remains a crucial problem. On a cloud platform, scheduling techniques should adapt quickly to changes and to the input workload. In this paper, an improved grey wolf optimization (IGWO) algorithm based on the oppositional learning principle is employed to carry out scheduling effectively. The presented IGWO-based scheduling algorithm achieves optimal cloud resource usage and offers significantly more effective solutions than the compared methods.
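
The abstract does not spell out the IGWO update rules, so the sketch below shows the standard grey wolf optimizer (the three best wolves, alpha, beta, and delta, guide the pack while the coefficient `a` decreases linearly from 2 towards 0) combined with basic opposition-based learning, in which each candidate `x` in `[lb, ub]` is compared with its opposite `lb + ub - x` and the fitter one is kept. It is applied to a toy problem: assigning independent tasks to VMs so as to minimize makespan. The position-to-VM decoding, the makespan objective, and all parameter values are assumptions for illustration; this is not the authors' IGWO.

```python
import random
from typing import List, Sequence, Tuple


def makespan(assign: Sequence[int], task_len: Sequence[float], vm_speed: Sequence[float]) -> float:
    """Finish time of the busiest VM when task i runs on VM assign[i]."""
    load = [0.0] * len(vm_speed)
    for task, vm in enumerate(assign):
        load[vm] += task_len[task] / vm_speed[vm]
    return max(load)


def decode(pos: Sequence[float], n_vms: int) -> List[int]:
    """Map each continuous coordinate to a VM index by flooring and clipping."""
    return [min(n_vms - 1, max(0, int(x))) for x in pos]


def gwo_obl_schedule(task_len, vm_speed, wolves=20, iters=100, seed=0) -> Tuple[List[int], float]:
    rng = random.Random(seed)
    n, m = len(task_len), len(vm_speed)
    lb, ub = 0.0, float(m)  # each coordinate lives in [0, m]

    def fitness(pos):
        return makespan(decode(pos, m), task_len, vm_speed)

    def opposite(pos):
        # Opposition-based learning: reflect every coordinate inside [lb, ub].
        return [lb + ub - x for x in pos]

    # Initial pack: random wolves, each replaced by its opposite if that is fitter.
    pack = []
    for _ in range(wolves):
        pos = [rng.uniform(lb, ub) for _ in range(n)]
        pack.append(min(pos, opposite(pos), key=fitness))
    best = min(pack, key=fitness)

    for t in range(iters):
        alpha, beta, delta = sorted(pack, key=fitness)[:3]
        a = 2.0 - 2.0 * t / iters  # decreases linearly from 2 towards 0
        new_pack = []
        for wolf in pack:
            new_pos = []
            for d in range(n):
                total = 0.0
                for leader in (alpha, beta, delta):
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    D = abs(C * leader[d] - wolf[d])
                    total += leader[d] - A * D
                # Average of the three leader-guided moves, clipped to the bounds.
                new_pos.append(min(ub, max(lb, total / 3.0)))
            new_pack.append(min(new_pos, opposite(new_pos), key=fitness))
        pack = new_pack
        candidate = min(pack, key=fitness)
        if fitness(candidate) < fitness(best):
            best = candidate  # keep the best schedule seen so far

    return decode(best, m), fitness(best)


tasks = [4.0, 4.0, 4.0, 4.0]  # hypothetical task lengths
speeds = [1.0, 1.0]           # hypothetical VM speeds
assignment, span = gwo_obl_schedule(tasks, speeds, wolves=10, iters=30, seed=1)
print(assignment, span)
```

The pack size must be at least 3 so that alpha, beta, and delta exist; tracking the best-so-far position adds simple elitism, since the plain pack update does not guarantee it.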

A Survey of Data Mining Implementation in Smart City Applications

Qubahan Academic Journal; DOI: 10.48161/qaj.v1n2a52; 2021; Vol 1 (2); pp. 91-99

Author(s): Zainab Salih Ageed, Subhi R. M. Zeebaree, Mohammed Mohammed Sadeeq, Shakir Fattah Kak, Zryan Najat Rashid, ...

Keyword(s): Big Data, Smart Cities, Modern Technology, Data Systems, Community Model, Multiple Data, Big Data Applications, Intelligent City, Intelligent Cities, Big Data Systems

Many policymakers envisage using a community model and Big Data technology to achieve the sustainability demanded by intelligent city components and to raise living standards. Smart cities use various technologies to improve their residents' health, housing, electricity, education, and water supply. This involves reducing costs and resource consumption and communicating more effectively and creatively with employees. Big data analytics is a comparatively modern technology that is capable of expanding intelligent urban facilities. Because digitalization is an essential part of daily life, large volumes of data are now extracted and processed, and they can be used in several valuable areas. In many business and utility domains, including the intelligent urban domain, exploiting data successfully and using it in multiple ways is critical. This paper examines how big data can be used for more innovative societies. It explores the possibilities, challenges, and benefits of applying big data systems in intelligent cities and compares and contrasts different intelligent city and big data concepts. It also seeks to define criteria for creating big data applications for innovative city services.

Security and Privacy Issues of Big Data

Handbook of Research on Trends and Future Directions in Big Data and Web Intelligence - Advances in Data Mining and Database Management; DOI: 10.4018/978-1-4666-8505-5.ch002; 2015; pp. 20-52; Cited by ~8

Author(s): José Moura, Carlos Serrão

Keyword(s): Social Networks, Big Data, Personal Data, Security And Privacy, Data Systems, Computing Systems, Big Data Applications, Big Data Systems, Privacy Issues

SEDC-Based Hardware-Level Fault Tolerance and Fault Secure Checker Design for Big Data and Cloud Computing

Scientific Programming; DOI: 10.1155/2018/7306837; 2018; Vol 2018; pp. 1-16

Author(s): Zahid Ali Siddiqui, Jeong-A Lee, Unsang Park

Keyword(s): Cloud Computing, Big Data, Fault Tolerance, Error Detection, Power Dissipation, Binary Data, Concurrent Error Detection, Data Systems, Power Efficient, Big Data Systems

Fault tolerance is of great importance for big data systems. Although several software-based, application-level techniques exist for fault tolerance in big data systems, there is potential research space at the hardware level. Big data needs to be processed inexpensively and efficiently, and traditional hardware architectures, although adequate, are not optimal for this purpose. In this paper, we propose a hardware-level fault tolerance scheme for big data and cloud computing that can be used alongside existing software-level fault tolerance to improve the overall performance of the systems. The proposed scheme uses the concurrent error detection (CED) method to detect hardware-level faults, with the help of Scalable Error Detecting Codes (SEDC) and their checker. SEDC is an all-unidirectional error detection (AUED) technique capable of detecting multiple unidirectional errors. The SEDC scheme exploits data segmentation and parallel encoding to assign code words. Consequently, the SEDC scheme can be scaled to any binary data length “n” with constant latency and lower complexity than other AUED schemes, making it an ideal candidate for big data processing hardware. We also present a novel area-, delay-, and power-efficient scalable fault-secure checker design based on SEDC. To show the effectiveness of our scheme, we (1) compared the cost of hardware-based fault tolerance with an existing software-based fault tolerance technique used in HDFS and (2) compared the performance of the proposed checker in terms of area, speed, and power dissipation with the well-known Berger code and m-out-of-2m code checkers.
The experimental results show that (1) the proposed SEDC-based hardware-level fault tolerance scheme significantly reduces the average cost associated with software-based fault tolerance in a big data application, and (2) the proposed fault-secure checker outperforms the state-of-the-art checkers in terms of area, delay, and power dissipation.
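
The SEDC encoding itself is not detailed in this abstract, so the sketch below instead models the classic Berger code that the SEDC checker is compared against, to illustrate what all-unidirectional error detection means. The check symbol is the binary count of zeros in the data word, so errors that only flip 1s to 0s (or only 0s to 1s) always cause a mismatch, while a mix of both directions can go unnoticed. The function names are hypothetical, and this is a software model rather than a hardware checker design.

```python
from typing import List


def berger_check_bits(data: List[int]) -> List[int]:
    """Berger check symbol: the number of 0s in the data word, in binary (MSB first)."""
    k = max(1, len(data).bit_length())  # ceil(log2(n + 1)) bits can count up to n zeros
    zeros = data.count(0)
    return [(zeros >> i) & 1 for i in reversed(range(k))]


def berger_encode(data: List[int]) -> List[int]:
    return data + berger_check_bits(data)


def berger_is_valid(codeword: List[int], n: int) -> bool:
    """Checker: recompute the check symbol from the n data bits and compare."""
    data, check = codeword[:n], codeword[n:]
    return berger_check_bits(data) == check


def flip(word: List[int], positions: List[int]) -> List[int]:
    out = list(word)
    for p in positions:
        out[p] ^= 1
    return out


data = [1, 0, 1, 1, 0, 0, 1]      # n = 7 data bits, 3 zeros
cw = berger_encode(data)           # check bits [0, 1, 1]
print(cw, berger_is_valid(cw, 7))  # [1, 0, 1, 1, 0, 0, 1, 0, 1, 1] True

# Unidirectional error: two 1 -> 0 flips (one data bit, one check bit) is detected.
print(berger_is_valid(flip(cw, [0, 8]), 7))  # False
# Bidirectional error: one 1 -> 0 and one 0 -> 1 in the data keeps the zero count, so it escapes.
print(berger_is_valid(flip(cw, [0, 1]), 7))  # True
```

A Berger checker must recount all n data bits, which grows with word length; per the abstract, SEDC's segmentation and parallel encoding aim to keep latency constant as n scales.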
