Greedy and Local Ratio Algorithms in the MapReduce Model

Author(s):  
Nicholas J. A. Harvey ◽  
Christopher Liaw ◽  
Paul Liu


Author(s):  
Mykhajlo Klymash ◽  
Olena Hordiichuk — Bublivska ◽  
Ihor Tchaikovskyi ◽  
Oksana Urikova

This article investigates the features of processing large data arrays in distributed systems. A method of singular value decomposition (SVD) is used to reduce the volume of data processed by eliminating redundancy. Dependencies of computational efficiency for distributed systems were obtained using the MPI message-passing protocol and the MapReduce model of node interaction. The efficiency of each technology was analyzed for different data sizes: non-distributed systems proved inefficient for large volumes of information owing to their limited computing performance. The authors propose using distributed systems combined with singular value decomposition, which reduces the amount of information processed. The study of systems based on the MPI protocol and the MapReduce model yielded the dependence of computation time on the number of processes, confirming the expediency of distributed computing when processing large data sets. It was also found that distributed systems using the MapReduce model work considerably more efficiently than MPI-based ones, especially with large amounts of data, while MPI performs calculations more efficiently for small amounts of information. As the data sets grow, it is advisable to use the MapReduce model.
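As a concrete illustration of the reduction step described above, here is a minimal NumPy sketch of a truncated SVD; the abstract gives no code, so the matrix shape and rank are illustrative assumptions. The point is that a distributed system would ship the rank-k factors between nodes instead of the full matrix.

```python
import numpy as np

# Illustrative data matrix: rows are records, columns are features.
rng = np.random.default_rng(0)
A = rng.random((1000, 200))

# Thin SVD, then keep only the top-k singular values/vectors.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 20  # illustrative rank, chosen for the desired compression ratio
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# The rank-k factors are what a distributed system would transmit:
# 1000*k + k + k*200 values instead of 1000*200, which is the
# redundancy elimination the article refers to.
stored = U[:, :k].size + k + Vt[:k, :].size
print(f"relative storage: {stored / A.size:.2%}")
print(f"relative error:   {np.linalg.norm(A - A_k) / np.linalg.norm(A):.3f}")
```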


Author(s):  
M. Ilayaraja ◽  
S. Hemalatha ◽  
P. Manickam ◽  
K. Sathesh Kumar ◽  
K. Shankar

Cloud computing is characterized as the provision of resources or services over the internet to clients on demand by cloud providers. It delivers everything as a service over the web according to client demand, for example operating systems, network hardware, storage, computing resources, and software. Nowadays, an Intrusion Detection System (IDS) serves as a powerful mechanism that enables experts to take action when a system is compromised by intrusions. Most intrusion detection frameworks are built on machine learning techniques, and the dataset commonly used for intrusion detection is Knowledge Discovery in Databases (KDD). This paper detects and classifies intruded data using Machine Learning (ML) with the MapReduce model. The first phase applies the Hadoop MapReduce model to reduce the size of the database, with optimal weights determined for the reducer model, and the second stage uses a Decision Tree (DT) classifier to detect intrusions. The DT classifier applies an appropriate classifier to decide the class labels for non-homogeneous leaf nodes: the decision tree itself provides a coarse segmentation profile, while the leaf-level classifier provides information about the attributes that influence the label within a segment. The proposed method achieves a detection accuracy of 96.21%, compared with existing classifiers such as Neural Network (NN), Naive Bayes (NB), and K Nearest Neighbor (KNN).
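To make the classification stage concrete, below is a minimal decision-tree sketch in scikit-learn. The synthetic data, feature count, and labels are placeholders; the paper's Hadoop MapReduce reduction stage and the actual KDD records are not reproduced here.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in for KDD-style records (41 features per connection).
rng = np.random.default_rng(0)
X = rng.random((5000, 41))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)  # toy "intrusion" label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = DecisionTreeClassifier(max_depth=8, random_state=0).fit(X_tr, y_tr)
print("detection accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```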


Sensors ◽  
2018 ◽  
Vol 18 (11) ◽  
pp. 3627 ◽  
Author(s):  
Yi Zhang ◽  
Zebin Wu ◽  
Jin Sun ◽  
Yan Zhang ◽  
Yaoqin Zhu ◽  
...  

Anomaly detection aims to separate anomalous pixels from the background, and has become an important application of remotely sensed hyperspectral image processing. Anomaly detection methods based on low-rank and sparse representation (LRASR) can accurately detect anomalous pixels. However, with the significant volume increase of hyperspectral image repositories, such techniques consume a significant amount of time (mainly due to the massive amount of matrix computations involved). In this paper, we propose a novel distributed parallel algorithm (DPA) that redesigns key operators of LRASR in terms of the MapReduce model to accelerate LRASR on cloud computing architectures. Independent computation operators are identified and executed in parallel on Spark. Specifically, we reconstitute the hyperspectral images in an appropriate format for efficient DPA processing, design an optimized storage strategy, and develop a pre-merge mechanism to reduce data transmission. In addition, a repartitioning policy is proposed to improve the DPA's efficiency. Our experimental results demonstrate that the newly developed DPA achieves very high speedups when accelerating LRASR while maintaining similar accuracies. Moreover, the proposed DPA is shown to be scalable with the number of computing nodes and capable of processing big hyperspectral images involving massive amounts of data.
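The abstract does not include code, but a minimal PySpark sketch of the parallelization pattern it describes (partitioning image blocks across executors and running a per-block operator) might look as follows. The block layout, sizes, and the placeholder scoring function are assumptions; the real LRASR operators are far more involved.

```python
import numpy as np
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lrasr-sketch").getOrCreate()
sc = spark.sparkContext

# Hypothetical layout: each record is (block_id, pixel-block matrix) of a
# hyperspectral image; 64 pixels x 189 bands per block is illustrative.
rng = np.random.default_rng(0)
blocks = sc.parallelize([(i, rng.random((64, 189))) for i in range(32)])

# Spread block-level matrix work evenly across executors, in the spirit
# of the repartitioning policy the paper proposes.
blocks = blocks.repartition(8)

def block_scores(records):
    # Placeholder for LRASR's low-rank + sparse decomposition: a cheap
    # per-pixel residual norm stands in for the real anomaly score.
    for block_id, m in records:
        yield block_id, np.linalg.norm(m - m.mean(axis=0), axis=1)

scores = blocks.mapPartitions(block_scores).collect()
spark.stop()
```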


Author(s):  
M. McDermott ◽  
S. K. Prasad ◽  
S. Shekhar ◽  
X. Zhou

Discovery of interesting paths and regions in spatio-temporal data sets is important to many fields, such as the earth and atmospheric sciences, GIS, public safety, and public health, both as a goal in itself and as a preliminary step in a larger series of computations. This discovery is usually an exhaustive procedure that quickly becomes extremely time-consuming under traditional paradigms and hardware and, given the rapidly growing sizes of today's data sets, is outpacing the rate at which computational capacity is growing. In our previous work (Prasad et al., 2013a) we achieved a 50-fold speedup over a sequential implementation using a single GPU. Here we achieve near-linear speedup over that result on interesting-path discovery by using Apache Hadoop to distribute the workload across multiple GPU nodes. Leveraging the parallel architecture of GPUs, we drastically reduce the computation time of a three-dimensional spatio-temporal interest-region search on a single tile of normalized difference vegetation index (NDVI) data for Saudi Arabia. We further observe an almost linear speedup in compute performance by distributing this workload across several GPUs with a simple MapReduce model. This increases processing speed 10-fold over the comparable sequential implementation while simultaneously increasing the amount of data processed 384-fold, allowing us to process the entire selected data set instead of a constrained window.
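As a hypothetical illustration of how the tile-level work distribution could be expressed, here is a Hadoop Streaming style mapper in Python. The input format, the gpu_region_search stand-in, and the emitted key/value scheme are all assumptions, since the paper's actual CUDA kernels and job configuration are not given in the abstract.

```python
# mapper.py -- Hadoop Streaming style sketch: each input line names one
# NDVI tile; the mapper hands the tile to a local GPU routine and emits
# candidate interest regions for the reducer to merge.
import sys

def gpu_region_search(tile_path):
    # Stand-in for the GPU kernel of Prasad et al.; the abstract gives
    # no interface, so (region_id, score) pairs are an assumption.
    return [(tile_path + ":r0", 1.0)]

for line in sys.stdin:
    tile = line.strip()
    if not tile:
        continue
    for region, score in gpu_region_search(tile):
        print(f"{region}\t{score}")
```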


2013 ◽  
Vol 756-759 ◽  
pp. 1701-1705
Author(s):  
Han Lin Sun

MapReduce is a widely adopted parallel programming model. The standard MapReduce model is designed for data-intensive processing; however, some machine learning algorithms are computation-intensive, time-consuming tasks that process the same data set repeatedly. In this paper, we propose an improved MapReduce model for computation-intensive algorithms, constructed from a service-combination perspective. In this model, the whole task is divided into many subtasks, taking the algorithm's parameters into account, and datagrams with an acknowledgement mechanism are used as the communication channel among cluster workers. We take the multifractal detrended fluctuation analysis algorithm as an example to demonstrate the model.
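To make the communication mechanism concrete, here is a minimal sketch of a master that sends parameter-indexed subtasks over UDP datagrams and retries until it receives an acknowledgement, in the spirit of the channel the paper describes. The addresses, message format, and the q-exponent range of the multifractal DFA example are assumed details.

```python
import socket

# Illustrative worker address; the paper does not specify a wire format.
WORKER = ("127.0.0.1", 9999)

def send_subtask(sock, subtask_id, retries=3, timeout=1.0):
    """Send one subtask id and wait for the worker's acknowledgement,
    resending on timeout (lost datagram or lost ACK)."""
    sock.settimeout(timeout)
    payload = subtask_id.encode()
    for _ in range(retries):
        sock.sendto(payload, WORKER)
        try:
            ack, _ = sock.recvfrom(64)
            if ack == b"ACK:" + payload:
                return True
        except socket.timeout:
            continue
    return False

# Subtasks indexed by the algorithm's parameters, e.g. the q exponents of
# multifractal DFA, each of which can be computed independently.
with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
    for q in range(-5, 6):
        ok = send_subtask(s, f"mfdfa-q={q}")
        print(q, "acknowledged" if ok else "no ACK")
```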


Author(s):  
Carlos Goncalves ◽  
Luis Assuncao ◽  
Jose C. Cunha

Data analytics applications handle large data sets subject to multiple processing phases, some of which can execute in parallel on clusters, grids, or clouds. Such applications can benefit from using the MapReduce model, which only requires the end-user to define the application algorithms for input data processing and the map and reduce functions, but this entails installing and configuring specific frameworks such as Apache Hadoop or Elastic MapReduce on the Amazon cloud. In order to provide more flexibility in defining and adjusting the application configurations, as well as in specifying the composition of the application phases and their orchestration, the authors describe an approach for supporting MapReduce stages as sub-workflows in the AWARD framework (Autonomic Workflow Activities Reconfigurable and Dynamic). The authors discuss how a text mining application is represented as a complex workflow with multiple phases, where individual workflow nodes support MapReduce computations. Access to intermediate data produced during the MapReduce computations is supported by a data sharing abstraction. The authors describe two implementations of this abstraction, one based on a shared tuple space and another based on an in-memory distributed key/value store. The authors describe the implementation of the framework and a set of developed tools, present their experimentation with the execution of the text mining algorithm over multiple Amazon EC2 (Elastic Compute Cloud) instances, and report on the speed-up and size-up results obtained with up to 20 EC2 instances and for corpus sizes of up to 97 million words.
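The data sharing abstraction can be sketched as follows. This toy, in-process version (the class and method names are our own, loosely echoing tuple-space out/rd operations) only imitates the two networked implementations the authors describe, a shared tuple space and an in-memory distributed key/value store.

```python
import threading

class TupleSpace:
    """Toy stand-in for AWARD's data sharing abstraction between
    workflow stages; the real implementations are networked services."""

    def __init__(self):
        self._data = {}
        self._cv = threading.Condition()

    def out(self, key, value):
        # Publish an intermediate result (e.g. one MapReduce stage's output).
        with self._cv:
            self._data[key] = value
            self._cv.notify_all()

    def rd(self, key):
        # Block until another workflow stage has published under `key`.
        with self._cv:
            self._cv.wait_for(lambda: key in self._data)
            return self._data[key]

space = TupleSpace()
space.out(("map", 0), {"mining": 3, "text": 5})
print(space.rd(("map", 0)))
```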

