PerTract: Model Extraction and Specification of Big Data Systems for Performance Prediction by the Example of Apache Spark and Hadoop

2019 ◽  
Vol 3 (3) ◽  
pp. 47
Author(s):  
Johannes Kroß ◽  
Helmut Krcmar

Evaluating and predicting the performance of big data applications is required to efficiently size capacities and manage operations. Gaining profound insights into the system architecture, dependencies of components, resource demands, and configurations is difficult for engineers. To address these challenges, this paper presents an approach to automatically extract and transform system specifications to predict the performance of applications. It consists of three components. First, a system- and tool-agnostic domain-specific language (DSL) allows the modeling of performance-relevant factors of big data applications, computing resources, and data workload. Second, DSL instances are automatically extracted from monitored measurements of Apache Spark and Apache Hadoop (i.e., YARN and HDFS) systems. Third, these instances are transformed to model- and simulation-based performance evaluation tools to allow predictions. By adapting DSL instances, our approach enables engineers to predict the performance of applications for different scenarios such as changing data input and resources. We evaluate our approach by predicting the performance of linear regression and random forest applications of the HiBench benchmark suite. Simulation results of adjusted DSL instances compared to measurement results show accurate predictions, with errors below 15% based on averages for response times and resource utilization.

Author(s):  
José Moura ◽  
Carlos Serrão

This chapter reviews the most important aspects of how computing infrastructures should be configured and intelligently managed to fulfill the most notable security requirements of Big Data applications. One of them is privacy. It is a pertinent aspect to address because users share more and more personal data and content through their devices and computers with social networks and public clouds. A secure framework for social networks is therefore a very active research topic. This topic is addressed in one of the chapter's two case-study sections. In addition, traditional security mechanisms such as firewalls and demilitarized zones are not suitable for computing systems that support Big Data. SDN is an emergent management solution that could become a convenient mechanism to implement security in Big Data systems, as we show through a second case study at the end of the chapter. The chapter also discusses current relevant work and identifies open issues.


Service-Oriented Architecture (SOA) is a method for designing, developing, and organizing systems that represent reusable business functionality. The objective of this study is to identify the critical success factors needed to implement SOA in Big Data systems. The study also aims to classify erroneous practices in the implementation of SOA. The adoption of SOA has prompted authors to document its uses and applications. The analysed results should be useful for researchers who would like to implement SOA with Big Data systems. SOA brings numerous advantages, such as added flexibility and appropriate alignment among processes, as well as reduced cost of integration and maintenance. Generally, Big Data concerns large-volume, complex, growing data sets with multiple, autonomous sources. In Big Data applications, data collection has grown tremendously and is beyond the ability of commonly used software tools to capture, manage, and process within an acceptable time [1]. The most fundamental task for Big Data applications is to explore the large volumes of data and extract useful information for future actions. Zhang and Yang suggest a reengineering approach that restructures legacy systems toward SOA by considering the organization. This paper also describes various challenges of SOA and identifies the problems to be solved to improve SOA-based services for data exchange in Big Data systems.


Web Services ◽  
2019 ◽  
pp. 2197-2229
Author(s):  
José Moura ◽  
Carlos Serrão

This chapter reviews the most important aspects of how computing infrastructures should be configured and intelligently managed to fulfill the most notable security requirements of Big Data applications. One of them is privacy. It is a pertinent aspect to address because users share more and more personal data and content through their devices and computers with social networks and public clouds. A secure framework for social networks is therefore a very active research topic. This topic is addressed in one of the chapter's two case-study sections. In addition, traditional security mechanisms such as firewalls and demilitarized zones are not suitable for computing systems that support Big Data. SDN is an emergent management solution that could become a convenient mechanism to implement security in Big Data systems, as we show through a second case study at the end of the chapter. The chapter also discusses current relevant work and identifies open issues.


2019 ◽  
Vol 3 (1) ◽  
pp. 19 ◽  
Author(s):  
Michael Kaufmann

Many big data projects are technology-driven and thus expensive and inefficient. It is often unclear how to exploit existing data resources and map data, systems, and analytics results to actual use cases. Existing big data reference models are mostly either technological or business-oriented in nature, but do not consistently align both aspects. To address this issue, a reference model for big data management is proposed that operationalizes value creation from big data by linking business targets with technical implementation. The purpose of this model is to provide a goal- and value-oriented framework to effectively map and plan purposeful big data systems aligned with a clear value proposition. Based on an epistemic model that conceptualizes big data management as a cognitive system, the solution space of data value creation is divided into five layers: preparation, analysis, interaction, effectuation, and intelligence. To operationalize the model, each of these layers is subdivided into corresponding business and IT aspects to create a link from use cases to technological implementation. The resulting reference model, the big data management canvas, can be applied to classify and extend existing big data applications and to derive and plan new big data solutions, visions, and strategies for future projects. To validate the model in the context of existing information systems, the paper describes three cases of big data management in existing companies.


2021 ◽  
Vol 1 (2) ◽  
pp. 91-99
Author(s):  
Zainab Salih Ageed ◽  
Subhi R. M. Zeebaree ◽  
Mohammed Mohammed Sadeeq ◽  
Shakir Fattah Kak ◽  
Zryan Najat Rashid ◽  
...  

Many policymakers envisage using a community model and Big Data technology to achieve the sustainability demanded by intelligent city components and to raise living standards. Smart cities use different technologies to improve their residents' health, housing, electricity, learning, and water supplies. This involves reducing costs and resource utilization and communicating more effectively and creatively with employees. Big data analysis is a comparatively recent technology that is capable of expanding intelligent urban facilities. Because digitalization is an essential part of daily life, digital extraction has produced large volumes of data that can be used in several valuable areas. In many business and utility domains, including the intelligent urban domain, successful exploitation and varied use of data are critical. This paper examines how big data can be used for more innovative societies. It explores the possibilities, challenges, and benefits of applying big data systems in intelligent cities and compares and contrasts different intelligent city and big data ideas. It also seeks to define criteria for the creation of big data applications for innovative city services.


Author(s):  
Nathan Regola ◽  
David A. Cieslak ◽  
Nitesh V. Chawla

The selection of hardware to support big data systems is complex. Even defining the term “big data” is difficult. “Big data” can mean a large volume of data in a database, a MapReduce cluster that processes data, analytics and reporting applications that must access large datasets to operate, algorithms that can effectively operate on large datasets, or even basic scripts that produce a needed result by leveraging data. Big data systems can be composed of many component systems. For these reasons, it appears difficult to create a universal, representative benchmark that approximates a “big data” workload. Along with the trend to utilize large datasets and sophisticated tools to analyze data, cloud computing has emerged as an effective method of leasing compute time. This chapter explores some of the issues at the intersection of virtualized computing (since cloud computing often uses virtual machines), metadata stores, and big data. Metadata is important because it enables many applications and users to access datasets and use them effectively without relying on extensive human knowledge about the data.


2019 ◽  
pp. 1598-1630
Author(s):  
José Moura ◽  
Carlos Serrão

This chapter reviews the most important aspects of how computing infrastructures should be configured and intelligently managed to fulfill the most notable security requirements of Big Data applications. One of them is privacy. It is a pertinent aspect to address because users share more and more personal data and content through their devices and computers with social networks and public clouds. A secure framework for social networks is therefore a very active research topic. This topic is addressed in one of the chapter's two case-study sections. In addition, traditional security mechanisms such as firewalls and demilitarized zones are not suitable for computing systems that support Big Data. SDN is an emergent management solution that could become a convenient mechanism to implement security in Big Data systems, as we show through a second case study at the end of the chapter. The chapter also discusses current relevant work and identifies open issues.

