Weather Data Analytics Using Hadoop with Map-Reduce

Author(s): Priyanka Dinesh More, Sunita Nandgave, Megha Kadam
2018, Vol 7 (1), pp. 113-116

Author(s): Alaa Hussein Al-Hamami, Ali Adel Flayyih

A database is a set of data organized and stored in a manner that allows users to access it easily and conveniently. In the era of big data, however, traditional data-analytics methods may not be able to manage and process such large volumes of data. To handle big data efficiently, this work applies the Map-Reduce technique to big data distributed on the cloud. The approach was evaluated on a Hadoop server and applied to Electroencephalogram (EEG) big data as a case study. The proposed approach showed a clear improvement in managing and processing the EEG big data, with an average 50% reduction in response time. The results provide EEG researchers and specialists with an easy and fast method of handling EEG big data.
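The abstract describes applying Map-Reduce to EEG records on Hadoop but gives no implementation details. As a minimal sketch of that pattern, the following Python simulates the map, shuffle/sort, and reduce phases locally to compute a per-channel mean amplitude; the `channel,timestamp,amplitude` line format is an assumption for illustration, not the paper's actual EEG layout.

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    """Emit (channel, amplitude) for one EEG sample line."""
    channel, _timestamp, amplitude = line.strip().split(",")
    yield channel, float(amplitude)

def reducer(channel, amplitudes):
    """Collapse one channel's samples into a mean amplitude."""
    values = list(amplitudes)
    return channel, sum(values) / len(values)

def run_local(lines):
    """Simulate Hadoop's map -> shuffle/sort -> reduce phases locally."""
    pairs = sorted((kv for line in lines for kv in mapper(line)),
                   key=itemgetter(0))                  # shuffle/sort by key
    for channel, group in groupby(pairs, key=itemgetter(0)):
        yield reducer(channel, (amp for _, amp in group))

if __name__ == "__main__":
    sample = ["Fp1,0,12.5", "Fp1,1,13.1", "Cz,0,8.4", "Cz,1,9.0"]
    for channel, mean_amp in run_local(sample):
        print(f"{channel}\tmean amplitude = {mean_amp:.2f}")
```

On a real cluster the mapper and reducer would run as separate Hadoop Streaming scripts reading stdin; the local driver here only stands in for the framework's shuffle.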


2019, Vol 8 (S3), pp. 35-40
Author(s): S. Mamatha, T. Sudha

In this digital world, as organizations evolve rapidly around data-centric assets, the explosion of data and the size of databases have grown exponentially. Data is generated from different sources such as business processes, transactions, social networking sites, and web servers, and exists in structured as well as unstructured form. The term "big data" is used for large data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process within a tolerable elapsed time. Big data varies in size, ranging from a few dozen terabytes to many petabytes in a single data set. Difficulties include capture, storage, search, sharing, analytics, and visualization. Big data is available in structured, unstructured, and semi-structured formats, and relational databases fail to store such multi-structured data. Apache Hadoop is an efficient, robust, reliable, and scalable framework to store, process, transform, and extract big data. The Hadoop framework is open-source, free software available from the Apache Software Foundation. In this paper we present Hadoop, HDFS, Map Reduce, and a c-means big data algorithm to minimize the effort of big data analysis using Map Reduce code. The objective of this paper is to summarize the state-of-the-art efforts in clinical big data analytics and highlight what might be needed to enhance the outcomes of clinical big data analytics tools and related fields.
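The abstract names a c-means MapReduce algorithm without specifying it, so the following is a minimal sketch of one standard fuzzy c-means (Bezdek) iteration phrased as a map/reduce round: the "map" step computes each point's cluster memberships and emits weighted contributions, and the "reduce" step aggregates them into new centroids. The fuzzifier m = 2 and the update formulas are the textbook formulation, not necessarily the paper's variant.

```python
import math
from collections import defaultdict

M = 2.0  # fuzzifier of the standard fuzzy c-means objective

def memberships(point, centroids):
    """Membership u_j = 1 / sum_k (d_j / d_k)^(2/(M-1)) for each cluster j."""
    dists = [math.dist(point, c) for c in centroids]
    if 0.0 in dists:                      # point coincides with a centroid
        return [1.0 if d == 0.0 else 0.0 for d in dists]
    return [1.0 / sum((dj / dk) ** (2.0 / (M - 1.0)) for dk in dists)
            for dj in dists]

def fcm_iteration(points, centroids):
    """One map -> reduce round: returns the updated centroid list."""
    dim = len(points[0])
    sums = defaultdict(lambda: [0.0] * dim)   # per-cluster weighted sums
    weights = defaultdict(float)              # per-cluster sum of u^M
    for p in points:                          # "map": emit contributions
        for j, u in enumerate(memberships(p, centroids)):
            w = u ** M
            for d in range(dim):
                sums[j][d] += w * p[d]
            weights[j] += w
    return [[s / weights[j] for s in sums[j]]  # "reduce": new centroids
            for j in range(len(centroids))]

if __name__ == "__main__":
    pts = [(1.0, 1.0), (1.2, 0.8), (8.0, 8.0), (7.9, 8.3)]
    cents = [[0.0, 0.0], [10.0, 10.0]]
    for _ in range(5):
        cents = fcm_iteration(pts, cents)
    print(cents)  # centroids settle near the two point groups
```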


Author(s): K.G. Rani Roopha Devi, R. Mahendra Chozhan, M. Karthika
2021, Vol 10 (4), pp. 0-0

Big data analytics is an innovative approach for extracting data from huge data warehouse systems. It relies on compressing high volumes of data into clusters using MapReduce and HDFS. However, data processing takes considerable time to extract and store in Hadoop clusters. The proposed system addresses the time delay in the shuffle phase of MapReduce caused by scheduling and sequencing. To improve processing speed, the proposed work uses a Compressed Elastic Search Index (CESI) and a MapReduce-Based Next Generation Sequencing Approach (MRBNGSA). This approach increases the speed of data retrieval from HDFS clusters because of how the data is stored: only the metadata is kept in HDFS, which consumes less memory at runtime than storing the full data volume. The approach reduces the CPU utilization and memory allocation of the resource manager in the Hadoop framework and improves data processing speed, so that the shuffle-phase delay is reduced to minimum latency.
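CESI and MRBNGSA are specific to this paper and not publicly documented, so the sketch below only illustrates the underlying claim the abstract makes: keeping a small metadata index (key to byte offset) and fetching full records lazily avoids loading the whole dataset. The newline-delimited JSON layout, file name, and `id` field are assumptions for illustration.

```python
import json

def build_index(path):
    """Scan a newline-delimited JSON file once, recording byte offsets only."""
    index = {}
    with open(path, "rb") as f:
        offset = 0
        for line in f:
            record = json.loads(line)
            index[record["id"]] = (offset, len(line))  # metadata, not the data
            offset += len(line)
    return index

def fetch(path, index, key):
    """Retrieve one record by seeking directly to its stored offset."""
    offset, length = index[key]
    with open(path, "rb") as f:
        f.seek(offset)
        return json.loads(f.read(length))

if __name__ == "__main__":
    with open("records.jsonl", "wb") as f:
        for i in range(3):
            f.write(json.dumps({"id": i, "payload": "x" * 100}).encode() + b"\n")
    idx = build_index("records.jsonl")
    print(fetch("records.jsonl", idx, 2)["id"])  # -> 2, without a full scan
```

The index itself is tiny (two integers per record), which is the memory-saving effect the abstract attributes to storing only metadata in HDFS.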


Data analytics (DA) is the job of reviewing datasets in order to draw conclusions about the information they contain, increasingly using specialized systems and software. With the emergence of big data, data analytics became a necessity. The problems we consider arise in a fraud detection application, where we address major aspects such as an application-independent format (XML/JSON) for the clusterization process based on a no-label (unsupervised) classification algorithm; we focus on the resulting clusters to enhance the oversampling process and exploit the merits of parallel computing to speed up our system. We aim to use MapReduce functionality in our application and deploy it on Amazon AWS. Datasets gathered for studies often comprise millions of records and can carry hard-to-detect concealed pitfalls. In this paper we work on two datasets: the first is a medical dataset and the second is a customer dataset. Big data analytics is the suggested solution in this day and age, with growing demands for analyzing huge information sets and performing the required processing on complicated data structures. The main problems at present are how to store and analyze the large amounts of data generated from heterogeneous sources such as social media, and how to make the data quickly accessible on a limited budget. The Map-Reduce framework helps resolve these problems: by offering an integrated approach to machine learning, it speeds up processing. We explore the LEOS algorithm, SVM, MapReduce, and the JOSE algorithm, along with their requirements, benefits, disadvantages, difficulties, and corresponding solutions.
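The LEOS and JOSE algorithms named here are not publicly specified, so the following is only a generic sketch of the cluster-then-oversample idea the abstract describes: group minority-class (e.g., fraud) records with k-means, then synthesize new samples by interpolating between points within the same cluster so the synthetic cases stay locally plausible. All names and parameters are illustrative.

```python
import math
import random

def kmeans(points, k, iters=10):
    """Plain k-means; returns a cluster label per point."""
    centroids = random.sample(points, k)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members)
                                     for c in zip(*members))
    return labels

def oversample(minority, k=2, n_new=4):
    """Create synthetic minority samples by within-cluster interpolation."""
    labels = kmeans(minority, k)
    synthetic = []
    for _ in range(n_new):
        j = random.randrange(k)
        cluster = [p for p, l in zip(minority, labels) if l == j] or minority
        a, b = random.choice(cluster), random.choice(cluster)
        t = random.random()
        synthetic.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return synthetic

if __name__ == "__main__":
    random.seed(0)
    fraud = [(1.0, 2.0), (1.1, 2.2), (9.0, 9.5), (8.8, 9.1)]
    print(oversample(fraud, k=2))
```

Clustering first keeps interpolation within dense minority regions, which is the stated motivation for using unsupervised clusters to enhance oversampling; each cluster's oversampling is independent, so the work parallelizes naturally as MapReduce tasks.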

