Feasibility study of K-means and K-medoids for analysis of the Hadoop MapReduce framework for big data

Author(s):  
Subhash Chandra ◽  
Deepak Motwani
2020 ◽  
Vol 8 (5) ◽  
pp. 4712-4717

In this century, big data manipulation is a challenging task in the field of web mining, because the volume of web content is increasing massively day by day. Retrieving efficient, relevant, and meaningful information from this massive amount of web data with a search engine is quite difficult, and different search engines use different ranking algorithms to retrieve relevant information. A new page ranking algorithm, named the Similarity Measurement Technique (SMT), is presented based on synonymous word counts using the Hadoop MapReduce framework. Hadoop MapReduce is used to partition the big data and provides a scalable, economical, and easy way to process it; intermediate results of iterative jobs are stored on the local disk. In this algorithm, SMT takes a query from the user, parses it using Hadoop, and calculates the rank of web pages. For experimental purposes, a wiki data file was used, and the page rank (PR) algorithm, the improvised page rank (IPR) algorithm, and the proposed SMT method were applied to calculate the rank of all web pages and compared against one another. The proposed method provides better scoring accuracy than the other approaches and reduces the theme drift problem.
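
As an illustration of the synonym-counting step, here is a minimal sketch in Hadoop Streaming style (a mapper and a reducer over stdin/stdout); the abstract does not give SMT's exact scoring formula, so the toy synonym table and the simple count-based score below are hypothetical.

```python
#!/usr/bin/env python3
# Hypothetical sketch of synonym-based page scoring in Hadoop Streaming
# style. Input lines are assumed to be "page_id<TAB>text"; the synonym
# table and count-based score are illustrative assumptions, not SMT itself.
import sys

# Assumed query expansion: each query term mapped to its synonym set.
QUERY_SYNONYMS = {
    "big": {"big", "large", "massive"},
    "data": {"data", "information"},
}

def mapper(lines):
    """Emit (page_id, 1) for every token matching a query term or synonym."""
    for line in lines:
        page_id, _, text = line.rstrip("\n").partition("\t")
        for token in text.lower().split():
            for synonyms in QUERY_SYNONYMS.values():
                if token in synonyms:
                    print(f"{page_id}\t1")

def reducer(lines):
    """Sum matches per page; the totals serve as the rank score."""
    current, total = None, 0
    for line in lines:
        page_id, _, count = line.rstrip("\n").partition("\t")
        if page_id != current:
            if current is not None:
                print(f"{current}\t{total}")
            current, total = page_id, 0
        total += int(count)
    if current is not None:
        print(f"{current}\t{total}")

if __name__ == "__main__":
    # Phase selected by a CLI argument, e.g. "smt.py map" / "smt.py reduce".
    mapper(sys.stdin) if sys.argv[1:] == ["map"] else reducer(sys.stdin)
```

In a real deployment, the same script would be passed to Hadoop Streaming as both the mapper and reducer commands, with Hadoop handling the shuffle between them.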


2014 ◽  
Vol 8 (3) ◽  
pp. 115-119
Author(s):  
Vidyullatha Pellakuri ◽  
Dr. D. Rajeswara Rao

2020 ◽  
Vol 12 (1) ◽  
pp. 12-25
Author(s):  
Iyapparaja M ◽  
Deva Arul S

The Social Internet of Things (SIoT) supports several novel networking applications and services for the IoT in a more productive and powerful way, and SIoT is currently a hotter topic than other extensions of the IoT. In this research, the authors process SIoT big data using the well-known MapReduce framework. Unwanted data and noise are removed from the database using a Gabor filter, and the large databases are mapped and reduced using the Hadoop MapReduce (HMR) technique to improve the efficiency of the proposed GA-EHO. Feature selection with GA-EHO is then performed on the filtered dataset. The proposed system is implemented with several machine learning classifiers for classifying the data, and the efficiency of the proposed work is evaluated. In the simulation results, the proposed GA-EHO achieves a specificity, maximum accuracy, and sensitivity of about 87.88%, 99.1%, and 81%, respectively. The results are also compared with other existing techniques.
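
For intuition, below is a minimal wrapper-style feature-selection sketch using a plain genetic algorithm; the paper's GA-EHO hybrid (a genetic algorithm combined with Elephant Herding Optimization) is not specified in the abstract, so the operators, the KNN fitness function, and the synthetic data here are illustrative assumptions only.

```python
# Illustrative wrapper-style feature selection with a plain GA.
# The GA-EHO hybrid from the paper is not reproduced here; this only
# shows the general select-by-classifier-accuracy pattern.
import random
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, n_features=20, random_state=0)
N_FEATURES, POP, GENS = X.shape[1], 20, 15

def fitness(mask):
    cols = [i for i, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    # Cross-validated classifier accuracy on the selected columns.
    return cross_val_score(KNeighborsClassifier(), X[:, cols], y, cv=3).mean()

population = [[random.randint(0, 1) for _ in range(N_FEATURES)]
              for _ in range(POP)]
for _ in range(GENS):
    scored = sorted(population, key=fitness, reverse=True)
    parents = scored[: POP // 2]                  # truncation selection
    children = []
    while len(children) < POP - len(parents):
        a, b = random.sample(parents, 2)
        cut = random.randrange(1, N_FEATURES)     # one-point crossover
        child = a[:cut] + b[cut:]
        i = random.randrange(N_FEATURES)          # single-bit mutation
        child[i] ^= 1
        children.append(child)
    population = parents + children

best = max(population, key=fitness)
print("selected features:", [i for i, bit in enumerate(best) if bit])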


2019 ◽  
Vol 6 (1) ◽  
Author(s):  
Mahdi Torabzadehkashi ◽  
Siavash Rezaei ◽  
Ali HeydariGorji ◽  
Hosein Bobarshad ◽  
Vladimir Alves ◽  
...  

In the era of big data applications, the demand for more sophisticated data centers and high-performance data processing mechanisms is increasing drastically. Data originally reside in storage systems; to process them, application servers must fetch them from the storage devices, which imposes a data-movement cost on the system. This cost is directly related to the distance between the processing engines and the data, and it is the key motivation for distributed processing platforms such as Hadoop, which move processing closer to the data. Computational storage devices (CSDs) push the "move process to data" paradigm to its ultimate boundary by deploying embedded processing engines inside storage devices to process data. In this paper, we introduce Catalina, an efficient and flexible computational storage platform that provides a seamless environment for processing data in-place. Catalina is the first CSD equipped with a dedicated application processor running a full-fledged operating system that provides filesystem-level data access for applications, so a vast spectrum of applications can be ported to run on Catalina CSDs. Due to these unique features, to the best of our knowledge, Catalina is the only in-storage processing platform that can be seamlessly deployed in clusters to run distributed applications such as Hadoop MapReduce and HPC applications in-place, without any modifications to the underlying distributed processing framework. As a proof of concept, we build a fully functional Catalina prototype and a CSD-equipped platform with 16 Catalina CSDs, and run the Intel HiBench Hadoop and HPC benchmarks to investigate the benefits of deploying Catalina CSDs in distributed processing environments. The experimental results show up to a 2.2× improvement in performance and a 4.3× reduction in energy consumption for the Hadoop MapReduce benchmarks. Additionally, thanks to the Neon SIMD engines, the performance and energy efficiency of DFT algorithms improve by up to 5.4× and 8.9×, respectively.
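
As a rough illustration of the data-movement argument, the sketch below compares a hypothetical fetch-then-process path against in-place processing; all bandwidth and compute figures are invented for illustration and are not taken from the paper (the paper's measured results are the 2.2× and 4.3× figures quoted above).

```python
# Back-of-the-envelope model of the "move process to data" trade-off.
# Every number here is a made-up assumption, not a measurement.
DATASET_GB = 100
LINK_GB_PER_S = 1.0      # assumed host <-> storage bandwidth
HOST_COMPUTE_S = 50.0    # assumed host-side processing time
CSD_COMPUTE_S = 90.0     # assumed slower in-storage time (weaker cores)

host_total = DATASET_GB / LINK_GB_PER_S + HOST_COMPUTE_S  # fetch, then process
csd_total = CSD_COMPUTE_S                                 # process in-place

print(f"host path: {host_total:.0f}s, in-storage path: {csd_total:.0f}s")
# Even with slower embedded cores, skipping the transfer can win whenever
# the dataset is large relative to the link bandwidth.
```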


2021 ◽  
pp. 016555152110137
Author(s):  
N.R. Gladiss Merlin ◽  
Vigilson Prem. M

Large and complex data have become a valuable resource in biomedical discovery, greatly expanding the scientific resources from which helpful information can be retrieved. However, indexing and retrieving patient information from disparate sources of big data is challenging in biomedical research. In this research, indexing and retrieval of patient information are performed using the proposed Jaya-Sine Cosine Algorithm (Jaya–SCA)-based MapReduce framework. Initially, the input big data are distributed randomly to the mappers. The average of each mapper's data is calculated, and these averages are forwarded to the reducer, where the representative data are stored. For each user query, the input query is matched against the reducer, and the search then switches over to the corresponding mapper to retrieve the best-matching result. Bilevel matching is performed while retrieving the data from the mapper, based on the distance between the query and the data. The similarity measure is computed based on the parametric-enabled similarity measure (PESM), cosine similarity, and the proposed Jaya–SCA, which integrates the Jaya algorithm and the Sine Cosine Algorithm (SCA). On the StatLog Heart Disease dataset, the proposed Jaya–SCA algorithm attained a maximum F-measure, recall, and precision of 0.5323, 0.4400, and 0.6867, respectively.
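
Below is a minimal sketch of the bilevel mapper/reducer lookup described above, assuming plain cosine similarity on toy data; the PESM measure and the Jaya–SCA optimization are not specified in enough detail in the abstract to reproduce, so only the two-level retrieval structure is illustrated.

```python
# Toy sketch of bilevel retrieval: level 1 matches the query against the
# reducer's per-mapper representatives, level 2 ranks records inside the
# chosen mapper. Cosine similarity stands in for the paper's combined
# PESM / cosine / Jaya-SCA measure, which is an assumption.
import numpy as np

rng = np.random.default_rng(0)
records = rng.random((1000, 13))                 # toy stand-in for records
mappers = np.array_split(records, 10)            # records spread over 10 mappers
reducer = np.stack([m.mean(axis=0) for m in mappers])  # one representative each

def cosine(a, b):
    return (b @ a) / (np.linalg.norm(b, axis=-1) * np.linalg.norm(a) + 1e-12)

def retrieve(query, k=5):
    # Level 1: match the query against the reducer's representatives.
    best_mapper = int(np.argmax(cosine(query, reducer)))
    # Level 2: switch to that mapper and rank its records by similarity.
    block = mappers[best_mapper]
    top = np.argsort(cosine(query, block))[::-1][:k]
    return block[top]

print(retrieve(rng.random(13)).shape)            # -> (5, 13)
```

The 13-dimensional toy vectors mirror the 13 attributes of the StatLog Heart Disease dataset used in the paper's evaluation.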

