MR-MVPP: A map-reduce-based approach for creating MVPP in data warehouses for big data applications

2021 ◽  
Vol 570 ◽  
pp. 200-224
Author(s):  
Hossein Azgomi ◽  
Mohammad Karim Sohrabi
Author(s):  
P. Lalitha Surya Kumari

This chapter gives information about the most important aspects in how computing infrastructures should be configured and intelligently managed to fulfill the most notably security aspects required by big data applications. Big data is one area where we can store, extract, and process a large amount of data. All these data are very often unstructured. Using big data, security functions are required to work over the heterogeneous composition of diverse hardware, operating systems, and network domains. A clearly defined security boundary like firewalls and demilitarized zones (DMZs), conventional security solutions, are not effective for big data as it expands with the help of public clouds. This chapter discusses the different concepts like characteristics, risks, life cycle, and data collection of big data, map reduce components, issues and challenges in big data, cloud secure alliance, approaches to solve security issues, introduction of cybercrime, YARN, and Hadoop components.


Author(s):  
K. Radha ◽  
B. Thirumala Rao

<p>We are living in on-Demand Digital Universe with data spread by users and organizations at a very high rate. This data is categorized as Big Data because of its Variety, Velocity, Veracity and Volume. This data is again classified into unstructured, semi-structured and structured. Large datasets require special processing systems; it is a unique challenge for academicians and researchers. Map Reduce jobs use efficient data processing techniques which are applied in every phases of Map Reduce such as Mapping, Combining, Shuffling, Indexing, Grouping and Reducing. Big Data has essential characteristics as follows Variety, Volume and Velocity, Viscosity, Virality. Big Data is one of the current and future research frontiers. In many areas Big Data is changed such as public administration, scientific research, business, The Financial Services Industry, Automotive Industry, Supply Chain, Logistics, and Industrial Engineering, Retail, Entertainment, etc. Other Big Data applications are exist in atmospheric science, astronomy, medicine, biologic, biogeochemistry, genomics and interdisciplinary and complex researches.  This paper is presents the Essential Characteristics of Big Data Applications and State of-the-art tools and techniques to handle data-intensive applications and also building index for web pages available online and see how Map and Reduce functions can be executed by considering input as a set of documents.</p><p> </p>


Author(s):  
Shalin Eliabeth S. ◽  
Sarju S.

Big data privacy preservation is one of the most disturbed issues in current industry. Sometimes the data privacy problems never identified when input data is published on cloud environment. Data privacy preservation in hadoop deals in hiding and publishing input dataset to the distributed environment. In this paper investigate the problem of big data anonymization for privacy preservation from the perspectives of scalability and time factor etc. At present, many cloud applications with big data anonymization faces the same kind of problems. For recovering this kind of problems, here introduced a data anonymization algorithm called Two Phase Top-Down Specialization (TPTDS) algorithm that is implemented in hadoop. For the data anonymization-45,222 records of adults information with 15 attribute values was taken as the input big data. With the help of multidimensional anonymization in map reduce framework, here implemented proposed Two-Phase Top-Down Specialization anonymization algorithm in hadoop and it will increases the efficiency on the big data processing system. By conducting experiment in both one dimensional and multidimensional map reduce framework with Two Phase Top-Down Specialization algorithm on hadoop, the better result shown in multidimensional anonymization on input adult dataset. Data sets is generalized in a top-down manner and the better result was shown in multidimensional map reduce framework by the better IGPL values generated by the algorithm. The anonymization was performed with specialization operation on taxonomy tree. The experiment shows that the solutions improves the IGPL values, anonymity parameter and decreases the execution time of big data privacy preservation by compared to the existing algorithm. This experimental result will leads to great application to the distributed environment.


Sign in / Sign up

Export Citation Format

Share Document