Negative Cost Girth Problem Using Map-Reduce Framework

Abstract On a graph with a negative cost cycle, the shortest path is undefined, but the number of edges of the shortest negative cost cycle could be computed. It is called Negative Cost Girth (NCG). The NCG problem is applied in many optimization issues such as scheduling and model verification. The existing polynomial algorithms suffer from high computation and memory consumption. In this paper, a powerful Map-Reduce framework implemented to find the NCG of a graph. The proposed algorithm runs in O(log k) parallel time over O(n3) on each Hadoop nodes, where n; k are the size of the graph and the value of NCG, respectively. The Hadoop implementation of the algorithm shows that the total execution time is reduced by 50% compared with polynomial algorithms, especially in large networks concerning increasing the numbers of Hadoop nodes. The result proves the efficiency of the approach for solving the NCG problem to process big data in a parallel and distributed way.

Download Full-text

Big data Performance Evalution of Map-Reduce Pig and Hive

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.f9002.088619 ◽

2019 ◽

Vol 8 (6) ◽

pp. 2982-2985

Keyword(s):

Big Data ◽

Data Processing ◽

Execution Time ◽

Research Work ◽

Map Reduce ◽

Bench Mark ◽

Big Data Processing ◽

Traditional System ◽

Client Request ◽

Societal Problems

Big data is nothing but unstructured and structured data which is not possible to process by our traditional system its not only have the volume of data also velocity and verity of data, Processing means ( store and analyze for knowledge information to take decision), Every living, non living and each and every device generates tremendous amount of data every fraction of seconds, Hadoop is a software frame work to process big data to get knowledge out of stored data and enhance the business and solve the societal problems, Hadoop basically have two important components HDFS and Map Reduce HDFS for store and mapreduce to process. HDFS includes name node and data nodes for storage, Map-Reduce includes frame works of Job tracker and Task tracker. Whenever client request Hadoop to store name node responds with available free memory data nodes then client will write data to respective data nodes then replication factor of hadoop copies the blocks of data with other data nodes to overcome fault tolerance Name node stores the meta of data nodes. Replication is for back-up as hadoop HDFS uses commodity hardware for storage, also name node have back-up secondary name node as only point of failure the hadoop. Whenever clients want to process the data, client request the name node Job tracker then Name node communicate to Task tracker for task done. All the above components of hadoop are frame works on-top of OS for efficient utilization and manage the system recourses for big data processing. Big data processing performance is measured with bench marks programs in our research work we compared the processing i.e. execution time of bench mark program word count with Hadoop Map-Reduce python Jar code, PIG script and Hive query with same input file big.txt. and we can say that Hive is much faster than PIG and Map-reduce Python jar code Map-reduce execution time is 1m, 29sec Pig Execution time is 57 sec Hive execution time is 31 sec

Download Full-text

Fast Frequent Item Mining from Big Data using Map Reduce and Bit Vectors

Turkish Journal of Computer and Mathematics Education (TURCOMAT) ◽

10.17762/turcomat.v12i2.1525 ◽

2021 ◽

Vol 12 (2) ◽

pp. 1866-1875

Author(s):

Thirumaran. S, Et. al.

Keyword(s):

Big Data ◽

Large Data ◽

Search Space ◽

Large Data Sets ◽

Map Reduce ◽

Data Sets ◽

Memory Consumption ◽

Vertical Data ◽

Data Representations ◽

Common Problems

One of the most important areas that are constantly being focused recently is the big data and mining frequent patterns from them is an interesting vertical which is perpetually being evolved and gained plethora of attention among the research fraternities. Generally, the data is mined with the aid of Apriori based algorithms, tree based algorithm and hash based algorithm but most of these existing algorithms suffer many snags and limitations. This paper proposes a new method that overrides and overcomes the most common problems related to speed, memory consumption and search space. The algorithm named Dual Mine employs binary vector representation and vertical data representations in the map reduce and then discover the most patterns from the large data sets. The Dual mine algorithm is then compared with some of the existing algorithms to determine the efficiency of the proposed algorithm and from the experimental results it is quite evident that the proposed algorithm “Dual Mine” outscored the other algorithms by a big magnitude with respect to speed and memory.

Download Full-text

A Comparison of ORC-Compress Performance with Big Data Workload on Virtualization

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.855.153 ◽

2016 ◽

Vol 855 ◽

pp. 153-158

Author(s):

Kritwara Rattanaopas ◽

Sureerat Kaewkeerat ◽

Yanapat Chuchuen

Keyword(s):

Big Data ◽

Execution Time ◽

Large Data ◽

Map Reduce ◽

Data Set ◽

Relational Information ◽

Open Source Data ◽

Space Saving ◽

Source Data ◽

Better Than

Big Data is widely used in many organizations nowadays. Hive is an open source data warehouse system for managing large data set. It provides a SQL-like interface to Hadoop over Map-Reduce framework. Currently, Big Data solution starts to adopt HiveQL tool to improve execution time of relational information. In this paper, we investigate on an execution time of query processing issues comparing two algorithm of ORC file: ZLIB and SNAPPY. The results show that ZLIB can compress data up to 87% compared to NONE compressing data. It was better than SNAPPY which has space saving 79%. However, the key for reducing execution time is Map-Reduce that were shown by a less query execution time when mapper and data node were equal. For example, all query suites in 6-node(ZLIB/SNAPPY) with 250-million table rows has quite similar execution time comparison to 9-node(ZLIB/SNAPPY) with 350-million table rows.

Download Full-text

Big Data Privacy Preservation Using Two Phase Top-Down Specialization Algorithm with Multidimensional Map Reduce Framework on Hadoop

International Journal of Distributed and Cloud Computing ◽

10.21863/ijdcc/2015.3.2.009 ◽

2015 ◽

Vol 3 (2) ◽

Author(s):

Shalin Eliabeth S. ◽

Sarju S.

Keyword(s):

Big Data ◽

Data Privacy ◽

Privacy Preservation ◽

Experimental Result ◽

Map Reduce ◽

Distributed Environment ◽

Top Down ◽

Two Phase ◽

Data Anonymization ◽

Big Data Privacy

Big data privacy preservation is one of the most disturbed issues in current industry. Sometimes the data privacy problems never identified when input data is published on cloud environment. Data privacy preservation in hadoop deals in hiding and publishing input dataset to the distributed environment. In this paper investigate the problem of big data anonymization for privacy preservation from the perspectives of scalability and time factor etc. At present, many cloud applications with big data anonymization faces the same kind of problems. For recovering this kind of problems, here introduced a data anonymization algorithm called Two Phase Top-Down Specialization (TPTDS) algorithm that is implemented in hadoop. For the data anonymization-45,222 records of adults information with 15 attribute values was taken as the input big data. With the help of multidimensional anonymization in map reduce framework, here implemented proposed Two-Phase Top-Down Specialization anonymization algorithm in hadoop and it will increases the efficiency on the big data processing system. By conducting experiment in both one dimensional and multidimensional map reduce framework with Two Phase Top-Down Specialization algorithm on hadoop, the better result shown in multidimensional anonymization on input adult dataset. Data sets is generalized in a top-down manner and the better result was shown in multidimensional map reduce framework by the better IGPL values generated by the algorithm. The anonymization was performed with specialization operation on taxonomy tree. The experiment shows that the solutions improves the IGPL values, anonymity parameter and decreases the execution time of big data privacy preservation by compared to the existing algorithm. This experimental result will leads to great application to the distributed environment.

Download Full-text

Preemptive Scheduling for Two-Processor Systems

Fundamenta Informaticae ◽

10.3233/fi-1988-11102 ◽

1988 ◽

Vol 11 (1) ◽

pp. 1-19

Author(s):

Andrzej Rowicki

Keyword(s):

Execution Time ◽

Preemptive Scheduling ◽

Total Execution Time ◽

Schedule Length ◽

Execution Times ◽

Dependent Tasks

The purpose of the paper is to consider an algorithm for preemptive scheduling for two-processor systems with identical processors. Computations submitted to the systems are composed of dependent tasks with arbitrary execution times and contain no loops and have only one output. We assume that preemptions times are completely unconstrained, and preemptions consume no time. Moreover, the algorithm determines the total execution time of the computation. It has been proved that this algorithm is optimal, that is, the total execution time of the computation (schedule length) is minimized.

Download Full-text

Addressing big data problem using Hadoop and Map Reduce

2012 Nirma University International Conference on Engineering (NUiCONE) ◽

10.1109/nuicone.2012.6493198 ◽

2012 ◽

Cited By ~ 77

Author(s):

Aditya B. Patel ◽

Manashvi Birla ◽

Ushma Nair

Keyword(s):

Big Data ◽

Map Reduce ◽

Data Problem

Download Full-text

Kringing Regressive Map reduce Entropy Feature Extraction based Rocchio Adaptive Boost Ensemble Classifier for Early Disease Diagnosis with Big Data

Dynamic Systems and Applications ◽

10.46719/dsa20213064 ◽

2021 ◽

Vol 30 (6) ◽

Author(s):

A Kaliappan ◽

D Chitra

Keyword(s):

Feature Extraction ◽

Big Data ◽

Disease Diagnosis ◽

Ensemble Classifier ◽

Map Reduce ◽

Early Disease

Download Full-text

Data-Intensive Task Scheduling for Heterogeneous Big Data Analytics in IoT System

Energies ◽

10.3390/en13174508 ◽

2020 ◽

Vol 13 (17) ◽

pp. 4508

Author(s):

Xin Li ◽

Liangyuan Wang ◽

Jemal H. Abawajy ◽

Xiaolin Qin ◽

Giovanni Pau ◽

...

Keyword(s):

Big Data ◽

Data Analysis ◽

Task Scheduling ◽

Execution Time ◽

Data Centers ◽

Big Data Analysis ◽

Data Locality ◽

Data Migration ◽

Task Execution ◽

Task Execution Time

Efficient big data analysis is critical to support applications or services in Internet of Things (IoT) system, especially for the time-intensive services. Hence, the data center may host heterogeneous big data analysis tasks for multiple IoT systems. It is a challenging problem since the data centers usually need to schedule a large number of periodic or online tasks in a short time. In this paper, we investigate the heterogeneous task scheduling problem to reduce the global task execution time, which is also an efficient method to reduce energy consumption for data centers. We establish the task execution for heterogeneous tasks respectively based on the data locality feature, which also indicate the relationship among the tasks, data blocks and servers. We propose a heterogeneous task scheduling algorithm with data migration. The core idea of the algorithm is to maximize the efficiency by comparing the cost between remote task execution and data migration, which could improve the data locality and reduce task execution time. We conduct extensive simulations and the experimental results show that our algorithm has better performance than the traditional methods, and data migration actually works to reduce th overall task execution time. The algorithm also shows acceptable fairness for the heterogeneous tasks.

Download Full-text