The Need to Consider Hardware Selection when Designing Big Data Applications Supported by Metadata

Author(s):  
Nathan Regola ◽  
David A. Cieslak ◽  
Nitesh V. Chawla

The selection of hardware to support big data systems is complex. Even defining the term “big data” is difficult. “Big data” can mean a large volume of data in a database, a MapReduce cluster that processes data, analytics and reporting applications that must access large datasets to operate, algorithms that can operate effectively on large datasets, or even basic scripts that produce a needed result by leveraging data. Big data systems can be composed of many component systems. For these reasons, it appears difficult to create a universal, representative benchmark that approximates a “big data” workload. Along with the trend toward utilizing large datasets and sophisticated tools to analyze data, cloud computing has emerged as an effective method of leasing compute time. This chapter explores some of the issues at the intersection of virtualized computing (since cloud computing often uses virtual machines), metadata stores, and big data. Metadata is important because it enables many applications and users to access datasets and use them effectively without relying on extensive human knowledge about the data.
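As a rough illustration of the role such a metadata store can play, the minimal sketch below shows a hypothetical dataset descriptor that an application might consult before deciding how to process the data. The field names, values, and the placement heuristic are illustrative assumptions, not taken from the chapter.

```python
# A hypothetical dataset descriptor as it might appear in a metadata store.
# Field names and values are illustrative assumptions only.
dataset_metadata = {
    "name": "clickstream_q1",
    "format": "tsv",
    "size_bytes": 2_800_000_000_000,       # ~2.8 TB
    "location": "hdfs:///data/clickstream/q1",
    "schema": ["timestamp", "user_id", "url", "referrer"],
    "partitioning": "by day",
    "access_pattern": "sequential scan",   # hint for hardware/placement decisions
}

def suggest_placement(meta, node_memory_bytes):
    """Toy heuristic: cache small datasets in memory, stream large ones from disk."""
    if meta["size_bytes"] < 0.5 * node_memory_bytes:
        return "cache in memory on a single large-memory node"
    return "stream from distributed storage across the cluster"

print(suggest_placement(dataset_metadata, node_memory_bytes=256 * 2**30))
```

The point is only that a machine-readable description of size, layout, and access pattern lets an application (or a provisioning tool) make such decisions without a human who knows the dataset.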

Author(s):  
José Moura ◽  
Carlos Serrão

This chapter reviews the most important aspects of how computing infrastructures should be configured and intelligently managed to fulfil the most notable security requirements of Big Data applications. One of them is privacy. It is a pertinent aspect to address because users share more and more personal data and content through their devices and computers with social networks and public clouds. Consequently, a secure framework for social networks is a very active research topic. This topic is addressed, with case studies, in one of the two sections of the current chapter. In addition, traditional security mechanisms such as firewalls and demilitarized zones are not well suited to computing systems that support Big Data. SDN is an emergent management solution that could become a convenient mechanism to implement security in Big Data systems, as we show through a second case study at the end of the chapter. The chapter also discusses relevant current work and identifies open issues.
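As a minimal sketch of the kind of policy SDN makes possible (not the chapter's case study, and not tied to any particular controller), the snippet below models flow-rule decisions that restrict access to a Big Data cluster's management ports to a trusted subnet; the subnet and port numbers are assumptions.

```python
import ipaddress

# Conceptual match/action policy; a real deployment would install equivalent
# rules through an SDN controller's northbound API.
TRUSTED_SUBNET = ipaddress.ip_network("10.0.1.0/24")   # assumed admin subnet
PROTECTED_PORTS = {8020, 8088, 9000}                    # assumed HDFS/YARN ports

def decide(packet):
    """Return 'forward' or 'drop' for a packet dict with src_ip and dst_port."""
    if packet["dst_port"] in PROTECTED_PORTS:
        if ipaddress.ip_address(packet["src_ip"]) not in TRUSTED_SUBNET:
            return "drop"   # block cluster management traffic from untrusted hosts
    return "forward"

print(decide({"src_ip": "10.0.1.15", "dst_port": 8020}))    # forward
print(decide({"src_ip": "192.168.5.7", "dst_port": 8020}))  # drop
```

Expressing such a policy centrally, rather than in per-host firewalls, is the kind of convenience the chapter attributes to SDN-based management.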


2019 ◽  
Vol 3 (3) ◽  
pp. 47
Author(s):  
Johannes Kroß ◽  
Helmut Krcmar

Evaluating and predicting the performance of big data applications are required to efficiently size capacities and manage operations. Gaining profound insights into the system architecture, dependencies of components, resource demands, and configurations cause difficulties to engineers. To address these challenges, this paper presents an approach to automatically extract and transform system specifications to predict the performance of applications. It consists of three components. First, a system-and tool-agnostic domain-specific language (DSL) allows the modeling of performance-relevant factors of big data applications, computing resources, and data workload. Second, DSL instances are automatically extracted from monitored measurements of Apache Spark and Apache Hadoop (i.e., YARN and HDFS) systems. Third, these instances are transformed to model- and simulation-based performance evaluation tools to allow predictions. By adapting DSL instances, our approach enables engineers to predict the performance of applications for different scenarios such as changing data input and resources. We evaluate our approach by predicting the performance of linear regression and random forest applications of the HiBench benchmark suite. Simulation results of adjusted DSL instances compared to measurement results show accurate predictions errors below 15% based upon averages for response times and resource utilization.
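The paper's DSL is not reproduced here; the sketch below only illustrates, under assumed names, the kind of information such an instance would capture (computing resources, workload, and application parameters) before being handed to a simulation tool.

```python
from dataclasses import dataclass

# Hypothetical, simplified shape of a DSL instance; names are assumptions,
# not the paper's actual metamodel.
@dataclass
class ClusterResources:
    nodes: int
    cores_per_node: int
    memory_gb_per_node: int

@dataclass
class SparkWorkload:
    application: str          # e.g. "linear_regression" from HiBench
    input_size_gb: float
    executors: int
    executor_cores: int

@dataclass
class PerformanceModel:
    resources: ClusterResources
    workload: SparkWorkload

scenario = PerformanceModel(
    resources=ClusterResources(nodes=8, cores_per_node=16, memory_gb_per_node=64),
    workload=SparkWorkload("linear_regression", input_size_gb=100.0,
                           executors=16, executor_cores=4),
)
# Editing 'resources' or 'input_size_gb' and re-running the simulation is how
# such a model supports what-if predictions for different scenarios.
```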


Service-Oriented Architecture (SOA) is a method for designing, developing, and organizing systems that represent reusable business functionality. The objective of this study is to identify the critical success factors needed to implement SOA in Big Data systems. Our study also aims to classify erroneous practices in the execution of SOA. The adoption of SOA has motivated researchers to examine its requirements and applications. The analysed results should be very useful for researchers who would like to implement SOA with Big Data systems. SOA offers numerous advantages, such as added flexibility and proper alignment among processes, as well as reduced cost of integration and maintenance. Generally, Big Data concerns large-volume, complex, growing data sets with multiple, autonomous sources. Big Data applies where data collection has grown tremendously and is beyond the ability of commonly used software tools to capture, manage, and process it as it grows [1]. The most essential task for Big Data applications is to explore the large volumes of data and extract useful information or knowledge for future actions. The main purpose of this study is to identify the important factors that are needed to implement SOA in Big Data systems. Zhang and Yang suggest a reengineering approach that restructures legacy systems towards SOA while taking the organization into consideration. This paper also describes various challenges of SOA and identifies the problems that must be solved to improve SOA-based services for data exchange in Big Data systems.


Author(s):  
Przemysław Lisowski ◽  
Adam Piórkowski ◽  
Andrzej Lesniak

Storing large amounts of spatial data in GIS systems is problematic. This problem is growing due to ever-increasing data production from a variety of data sources. The phenomenon of collecting huge amounts of data is called Big Data. Existing solutions are capable of processing and storing large volumes of spatial data, and they also demonstrate new approaches to data processing. Conventional techniques work with ordinary data but are not suitable for large datasets; they operate efficiently only when combined with distributed file systems and algorithms capable of dividing and reducing tasks. This review focuses on the characteristics of large spatial data and discusses the opportunities offered by spatial big data systems. The work also draws attention to the problems of indexing and access to data, and to solutions proposed in this area.
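As one hedged illustration of the indexing problem the review raises, the snippet below derives a Z-order (Morton) key by interleaving the bits of discretized coordinates. Such keys are a common, generic way to turn two-dimensional locations into sortable partition keys in distributed stores; this is not a solution proposed in the paper.

```python
def morton_key(x, y, bits=16):
    """Interleave the bits of two non-negative integer coordinates (Z-order)."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key

def cell(lon, lat, bits=16):
    """Discretize longitude/latitude onto a grid and return its Morton key."""
    x = int((lon + 180.0) / 360.0 * ((1 << bits) - 1))
    y = int((lat + 90.0) / 180.0 * ((1 << bits) - 1))
    return morton_key(x, y, bits)

# Nearby points tend to receive similar keys, so range scans over the key
# space roughly correspond to spatial neighbourhoods.
print(cell(19.94, 50.06))
print(cell(19.96, 50.07))
```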


2019 ◽  
Vol 3 (1) ◽  
pp. 19 ◽  
Author(s):  
Michael Kaufmann

Many big data projects are technology-driven and thus expensive and inefficient. It is often unclear how to exploit existing data resources and map data, systems and analytics results to actual use cases. Existing big data reference models are mostly either technological or business-oriented in nature, but do not consistently align both aspects. To address this issue, a reference model for big data management is proposed that operationalizes value creation from big data by linking business targets with technical implementation. The purpose of this model is to provide a goal- and value-oriented framework to effectively map and plan purposeful big data systems aligned with a clear value proposition. Based on an epistemic model that conceptualizes big data management as a cognitive system, the solution space of data value creation is divided into five layers: preparation, analysis, interaction, effectuation, and intelligence. To operationalize the model, each of these layers is subdivided into corresponding business and IT aspects to create a link from use cases to technological implementation. The resulting reference model, the big data management canvas, can be applied to classify and extend existing big data applications and to derive and plan new big data solutions, visions, and strategies for future projects. To validate the model in the context of existing information systems, the paper describes three cases of big data management in existing companies.
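A toy rendering of the layered structure is sketched below: the layer names follow the abstract, while the paired business/IT entries are invented for a hypothetical churn-prediction use case, simply to show how a use case can be traced from business goal down to technical implementation.

```python
# Illustrative only: layer names from the abstract; example entries are assumptions.
big_data_canvas = {
    "preparation":  {"business": "identify customer data sources",
                     "it": "ingest CRM and web logs into a data lake"},
    "analysis":     {"business": "define churn indicators",
                     "it": "train a churn model on historical data"},
    "interaction":  {"business": "inform account managers",
                     "it": "expose churn scores in a dashboard"},
    "effectuation": {"business": "run retention campaigns",
                     "it": "trigger offers via the marketing platform"},
    "intelligence": {"business": "measure campaign impact",
                     "it": "feed outcomes back into model retraining"},
}

for layer, aspects in big_data_canvas.items():
    print(f"{layer:13s} business: {aspects['business']:35s} IT: {aspects['it']}")
```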


2021 ◽  
Vol 18 (4) ◽  
pp. 1227-1232
Author(s):  
L. R. Aravind Babu ◽  
J. Saravana Kumar

Presently, big data is very popular, since it proves helpful in diverse domains such as social media, e-commerce transactions, etc. Cloud computing offers on-demand services, broad network access, resource pooling, rapid elasticity, and measured services. Cloud resources are usually heterogeneous, and the application requirements of the end user change rapidly from time to time, so resource management is a tedious process. At the same time, resource management and scheduling play a vital part in cloud computing (CC), particularly when the environment is employed for the analysis of big data and minimally predictable workloads dynamically enter the cloud. The identification of optimal scheduling solutions with diverse variables on varying platforms still remains a crucial problem. On a cloud platform, scheduling techniques should be able to adapt to changes quickly and according to the input workload. In this paper, an improved grey wolf optimization (IGWO) algorithm with an oppositional learning principle is employed to carry out the scheduling task in an effective way. The presented IGWO-based scheduling algorithm achieves optimal cloud resource usage and significantly outperforms the compared methods.
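The authors' exact IGWO is not reproduced here; the sketch below shows, under generic assumptions, how opposition-based learning is typically combined with grey wolf optimization: the initial population is doubled with opposite points, and the standard alpha/beta/delta update then drives the search (here minimizing a toy load-balancing cost rather than a real scheduling objective).

```python
import random

def opposite(x, lo, hi):
    """Opposition-based learning: reflect a candidate within its bounds."""
    return [lo + hi - xi for xi in x]

def gwo_obl(cost, dim, lo, hi, wolves=10, iters=50):
    # Initial population plus its opposites; keep the best half.
    pop = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(wolves)]
    pop += [opposite(x, lo, hi) for x in pop]
    pop = sorted(pop, key=cost)[:wolves]
    for t in range(iters):
        a = 2 - 2 * t / iters                      # linearly decreasing coefficient
        alpha, beta, delta = sorted(pop, key=cost)[:3]
        new_pop = []
        for x in pop:
            moved = []
            for d in range(dim):
                steps = []
                for leader in (alpha, beta, delta):
                    r1, r2 = random.random(), random.random()
                    A, C = 2 * a * r1 - a, 2 * r2
                    steps.append(leader[d] - A * abs(C * leader[d] - x[d]))
                moved.append(min(max(sum(steps) / 3, lo), hi))
            new_pop.append(moved)
        pop = new_pop
    return min(pop, key=cost)

# Toy cost: squared deviation of per-VM load from a balanced target load.
cost = lambda x: sum((xi - 5.0) ** 2 for xi in x)
print(gwo_obl(cost, dim=4, lo=0.0, hi=10.0))
```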


2021 ◽  
Vol 1 (2) ◽  
pp. 91-99
Author(s):  
Zainab Salih Ageed ◽  
Subhi R. M. Zeebaree ◽  
Mohammed Mohammed Sadeeq ◽  
Shakir Fattah Kak ◽  
Zryan Najat Rashid ◽  
...  

Many policymakers envisage using a community model and Big Data technology to achieve the sustainability demanded by intelligent city components and to raise living standards. Smart cities use different technologies to make their residents more successful in their health, housing, electricity, learning, and water supplies. This involves reducing costs and the utilization of resources, and communicating more effectively and creatively with their people. Big data analysis is a comparatively modern technology that is capable of expanding intelligent urban services. Since digitalization is an essential part of daily life, data extraction has resulted in large volumes of data that can be processed and used in several valuable areas. In many business and utility domains, including the intelligent urban domain, successful exploitation and varied use of data are critical. This paper examines how big data can be used to build more innovative societies. It explores the possibilities, challenges, and benefits of applying big data systems in intelligent cities, and compares and contrasts different intelligent city and big data ideas. It also seeks to define criteria for the creation of big data applications for innovative city services.


2018 ◽  
Vol 2018 ◽  
pp. 1-16
Author(s):  
Zahid Ali Siddiqui ◽  
Jeong-A Lee ◽  
Unsang Park

Fault tolerance is of great importance for big data systems. Although several software-based, application-level techniques exist for fault security in big data systems, there is potential research space at the hardware level. Big data needs to be processed inexpensively and efficiently, and traditional hardware architectures, although adequate, are not optimal for this purpose. In this paper, we propose a hardware-level fault tolerance scheme for big data and cloud computing that can be used together with existing software-level fault tolerance to improve the overall performance of the systems. The proposed scheme uses the concurrent error detection (CED) method to detect hardware-level faults, with the help of Scalable Error Detecting Codes (SEDC) and their checker. SEDC is an all unidirectional error detection (AUED) technique capable of detecting multiple unidirectional errors. The SEDC scheme exploits data segmentation and parallel encoding features for assigning code words. Consequently, the SEDC scheme can be scaled to any binary data length “n” with constant latency and lower complexity compared to other AUED schemes, making it a strong candidate for use in big data processing hardware. We also present a novel area-, delay-, and power-efficient, scalable fault-secure checker design based on SEDC. In order to show the effectiveness of our scheme, we (1) compared the cost of hardware-based fault tolerance with an existing software-based fault tolerance technique used in HDFS and (2) compared the performance of the proposed checker in terms of area, speed, and power dissipation with the well-known Berger code and m-out-of-2m code checkers. The experimental results show that (1) the proposed SEDC-based hardware-level fault tolerance scheme significantly reduces the average cost associated with software-based fault tolerance in a big data application, and (2) the proposed fault-secure checker outperforms the state-of-the-art checkers in terms of area, delay, and power dissipation.
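The SEDC construction itself is not reproduced here; as a hedged illustration of the all unidirectional error detection property it shares with the Berger code (one of the checkers it is compared against), the sketch below encodes a data word with a zero-count check symbol and flags a unidirectional error.

```python
def berger_encode(bits):
    """Append a Berger check symbol: the binary count of zeros in the data."""
    k = len(bits) - sum(bits)                   # number of zeros in the data word
    width = max(1, len(bits).bit_length())      # check bits needed for the count
    check = [(k >> i) & 1 for i in range(width - 1, -1, -1)]
    return bits + check, width

def berger_check(codeword, check_width):
    """Return True if the codeword is consistent (no error detected)."""
    data, check = codeword[:-check_width], codeword[-check_width:]
    k = len(data) - sum(data)
    return k == int("".join(map(str, check)), 2)

data = [1, 0, 1, 1, 0, 0, 1, 0]
code, w = berger_encode(data)
print(berger_check(code, w))                    # True: valid codeword

# A unidirectional error (here, forcing the leading bits to 0, i.e. 1->0 flips
# only) changes the zero count in one direction and is always detected.
corrupted = [0 if i < 3 else b for i, b in enumerate(code)]
print(berger_check(corrupted, w))               # False: error detected
```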

