Document Similarity Measurement Using Ferret Algorithm and Map Reduce Programming Model

Author(s):  
Condro Wibawa
Irwan Bastian
Metty Mustikasari

Author(s):
ThanhThuong T. Huynh
TruongAn Phamnguyen
Nhon V. Do

To represent text documents more expressively, a graph-based semantic model is proposed that incorporates semantic information among keyphrases as well as the structural information of the text. The method produces structured representations of texts by utilizing common, popular knowledge bases (e.g., DBpedia, Wikipedia) to acquire fine-grained information about concepts, entities, and their semantic relations, resulting in a knowledge-rich interpretation. We demonstrate the benefits of these representations on the task of document similarity measurement. Relevance between two documents is evaluated by calculating the semantic similarity between the two keyphrase graphs that represent them. Experimental results show that our approach outperforms standard baselines based on traditional document representations and comes close in performance to specialized methods tuned particularly to this task on the specific dataset.
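As a minimal sketch of the kind of keyphrase-graph comparison the abstract describes, the snippet below scores two graphs by overlap of their concept nodes and relation edges. The graph representation, the Jaccard scoring, and the equal node/edge weighting are illustrative assumptions, not the authors' actual formulation.

```python
# Sketch: similarity between two keyphrase graphs, scored as a weighted
# combination of node (concept) overlap and edge (relation) overlap.
# The 0.5/0.5 weighting is an illustrative assumption.

def graph_similarity(nodes_a, edges_a, nodes_b, edges_b,
                     node_weight=0.5, edge_weight=0.5):
    """Jaccard-style similarity over shared concepts and relations."""
    def jaccard(x, y):
        return len(x & y) / len(x | y) if x | y else 0.0

    return (node_weight * jaccard(nodes_a, nodes_b)
            + edge_weight * jaccard(edges_a, edges_b))

# Example: tiny graphs whose nodes are DBpedia-style concept labels and
# whose edges are (concept, relation, concept) triples.
doc1_nodes = {"MapReduce", "Hadoop", "Similarity"}
doc1_edges = {("MapReduce", "implementedBy", "Hadoop")}
doc2_nodes = {"MapReduce", "Similarity", "WordNet"}
doc2_edges = {("MapReduce", "implementedBy", "Hadoop")}

print(graph_similarity(doc1_nodes, doc1_edges, doc2_nodes, doc2_edges))
```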


Author(s):  
Mohammed Erritali
Abderrahim Beni-Hssane
Marouane Birjali
Youness Madani

Semantic indexing and document similarity are important information retrieval problems in Big Data with broad applications. In this paper, we investigate the MapReduce programming model as a framework for managing distributed processing over large collections of documents. We then survey the state of the art in approaches for computing document similarity. Finally, we propose an approach to semantic similarity measurement that uses WordNet as an external semantic resource. For evaluation, we compare the proposed approach against previously presented approaches using our new MapReduce algorithm. Experimental results show that our proposed approach outperforms the state-of-the-art ones in running time and improves the measurement of semantic similarity.
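The sketch below illustrates one plausible shape for such a WordNet-backed map task: each document word is scored against query words via NLTK's WordNet interface, and the mapper emits (doc_id, score) pairs for a reducer to aggregate. The mapper signature and the max-over-synsets scoring are assumptions for illustration, not the paper's exact algorithm.

```python
# Requires NLTK with the WordNet data installed: nltk.download('wordnet')
from nltk.corpus import wordnet as wn

def word_similarity(w1, w2):
    """Best path similarity over all synset pairs of the two words."""
    scores = [s1.path_similarity(s2) or 0.0   # path_similarity may return None
              for s1 in wn.synsets(w1)
              for s2 in wn.synsets(w2)]
    return max(scores, default=0.0)

def map_task(doc_id, words, query_words):
    """Emit (doc_id, score) pairs; a reducer would aggregate per document."""
    for w in words:
        best = max((word_similarity(w, q) for q in query_words), default=0.0)
        yield doc_id, best

# Example map call on one small document.
for key, value in map_task("doc-1", ["car", "journey"], ["automobile", "trip"]):
    print(key, value)
```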


2013
Vol 2013
pp. 1-12
Author(s):
Lei Liu
Dongqing Liu
Shuai Lü
Peng Zhang

Map-Reduce-Merge is an improved parallel programming model based on Map-Reduce for cloud computing environments. Through the new Merge module, Map-Reduce-Merge can process multiple related heterogeneous datasets more efficiently. To demonstrate the validity and effectiveness of this new model, we present a rigorous description of the Map-Reduce-Merge model using Haskell. First, we describe the basic program skeleton of the Map-Reduce-Merge programming model. Second, we give an abstract description of the Merge module by analyzing its structure and function, with Haskell as the description tool. Third, we evaluate the Map-Reduce-Merge model on the basis of our description. Our abstract description captures the functional characteristics of the Map-Reduce-Merge model, providing a theoretical basis for designing more efficient parallel programming models for join operations.
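A minimal sketch of the Map-Reduce-Merge skeleton follows (the paper itself uses Haskell as the description tool; Python is used here for consistency with the other sketches). Two datasets are mapped and reduced independently, then the Merge module joins the two reduced outputs on matching keys. Function names are illustrative.

```python
from collections import defaultdict

def map_reduce(dataset, mapper, reducer):
    """Standard Map-Reduce: group mapper output by key, then reduce each group."""
    groups = defaultdict(list)
    for key, value in dataset:
        for k, v in mapper(key, value):
            groups[k].append(v)
    return {k: reducer(k, vs) for k, vs in groups.items()}

def merge(left, right, merger):
    """Merge module: combine two reduced datasets on matching keys (a join)."""
    return {k: merger(left[k], right[k]) for k in left.keys() & right.keys()}

# Example: join per-customer order totals with customer names.
orders = [("c1", 10), ("c1", 5), ("c2", 7)]
names = [("c1", "Alice"), ("c2", "Bob")]

totals = map_reduce(orders, lambda k, v: [(k, v)], lambda k, vs: sum(vs))
labels = map_reduce(names, lambda k, v: [(k, v)], lambda k, vs: vs[0])
print(merge(totals, labels, lambda total, name: (name, total)))
```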


2017
Vol 8 (1)
pp. 45-60
Author(s):
Zakaria Benmounah
Souham Meshoul
Mohamed Batouche

One remarkable consequence of the rapid advances in information technology is the production of data sets so large or complex that available processing methods, cluster analysis among them, are inadequate; clustering has become more challenging and complex. In this paper, the authors describe a highly scalable Differential Evolution (DE) algorithm based on the map-reduce programming model. The traditional use of DE for clustering large data sets is so time-consuming as to be infeasible; map-reduce, on the other hand, is a programming model that emerged recently to enable the design of parallel and distributed approaches. The paper presents a four-stage map-reduce Differential Evolution algorithm termed DE-MRC; each of the four stages is a map-reduce process dedicated to a particular DE operation. DE-MRC has been tested on a real parallel platform of 128 interconnected computers and more than 30 GB of data. Experimental results show the high scalability and robustness of DE-MRC.
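As a sketch of the DE operations that one map phase of such an algorithm might apply to a single candidate solution, the snippet below performs rand/1 mutation followed by binomial crossover. The four-phase orchestration and all parameter values here are illustrative assumptions, not the authors' DE-MRC implementation.

```python
import random

def de_map_step(target, population, f=0.5, cr=0.9):
    """Produce a trial vector for one target vector (one map input record)."""
    r1, r2, r3 = random.sample([p for p in population if p is not target], 3)
    j_rand = random.randrange(len(target))  # guarantee at least one mutated gene
    return [
        r1[j] + f * (r2[j] - r3[j])         # rand/1 mutation
        if random.random() < cr or j == j_rand
        else target[j]                      # otherwise keep the target gene
        for j in range(len(target))
    ]

# Example: one map step over a toy population of 2-D candidate centroids.
population = [[0.0, 1.0], [0.5, 0.2], [1.0, 0.8], [0.3, 0.9]]
print(de_map_step(population[0], population))
```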


2013
Vol 427-429
pp. 2618-2621
Author(s):
Ling Shen
Qing Xi Peng

As emerging data-intensive applications receive more and more attention from researchers, near-duplicate text detection over large-scale data poses a severe challenge. This paper presents an algorithm based on MapReduce and an ontology for near-duplicate text detection, computing pairwise document similarity in large document collections. We map the words in each document to their synonyms and then calculate the similarity between documents. MapReduce is a programming model and an associated implementation for processing and generating large data sets: users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. In a large-scale test, experimental results demonstrate that this approach outperforms other state-of-the-art solutions. Advantages such as linear running time and accuracy make the algorithm valuable in practice.
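The synonym-mapping idea can be sketched as follows: normalize each word to a canonical synonym before comparing documents, then score the pair with Jaccard similarity. The tiny synonym table stands in for the ontology, and the pairwise scoring shown here is what the paper distributes across map and reduce tasks; all names are illustrative.

```python
# Toy stand-in for the ontology/synonym resource.
SYNONYMS = {"automobile": "car", "vehicle": "car", "fast": "quick"}

def canonical(words):
    """Map each word to its canonical synonym before comparison."""
    return {SYNONYMS.get(w, w) for w in words}

def near_duplicate_score(doc_a, doc_b):
    """Jaccard similarity over synonym-normalized word sets."""
    a, b = canonical(doc_a), canonical(doc_b)
    return len(a & b) / len(a | b) if a | b else 0.0

doc1 = ["the", "quick", "automobile"]
doc2 = ["the", "fast", "car"]
print(near_duplicate_score(doc1, doc2))  # 1.0 after synonym mapping
```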

