EFFECTIVE AND EFFICIENT WAY OF REDUCING DEPENDENCY ON DATASETS WITH THE HELP OF MAPREDUCE ON BIG DATA

Author(s):  
Satish Londhe
Smita Mahajan

With the fast development of networks, organizations today are overflowing with collections of millions of records in a large number of combinations. This big data poses challenges for business problems and requires further analysis with high-performance procedures. The Hadoop and MapReduce methods are discussed from the data mining standpoint. In the proposed research work, performance is improved through parallelization of different operations such as loading the data, building indexes, and evaluating queries. The performance analysis is carried out with a minimum of three nodes in the Amazon cloud environment. HBase is an open-source, non-relational, distributed database model. It runs on top of Hadoop and stores a single key with multiple values. Looping is avoided when retrieving a particular record from huge datasets, so less time is consumed in processing the data. The HDFS file system is used to store the data after the MapReduce operations are performed, and the execution time decreases as the number of nodes increases. The performance analysis is tuned with parameters such as execution complexity.
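
As a minimal sketch of the key-based retrieval this abstract describes, the snippet below uses the standard HBase Java client to fetch a single row by key, so no scan (loop) over the whole dataset is needed; the table, column family, and row key names are illustrative assumptions, not the authors' schema.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.util.Bytes;

    public class SingleKeyLookup {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Table table = conn.getTable(TableName.valueOf("sales"))) { // hypothetical table
                // Direct key lookup: HBase locates the row without iterating the dataset.
                Result row = table.get(new Get(Bytes.toBytes("order-42")));  // hypothetical key
                byte[] value = row.getValue(Bytes.toBytes("d"), Bytes.toBytes("amount"));
                System.out.println(Bytes.toString(value));
            }
        }
    }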

2021
Vol 26 (1)
pp. 67-77
Author(s):
Siva Sankari Subbiah
Jayakumar Chinnappan

Nowadays, all organizations collect huge volumes of data without knowing its usefulness. The fast development of the Internet helps organizations capture data in many different formats through the Internet of Things (IoT), social media, and other disparate sources. The dimensionality of these datasets increases day by day at an extraordinary rate, resulting in large-scale datasets with high dimensionality. The present paper reviews the opportunities and challenges of feature selection for processing high-dimensional data with reduced complexity and improved accuracy. In the modern big data world, feature selection is significant in reducing the dimensionality and overfitting of the learning process. Many feature selection methods have been proposed by researchers for obtaining more relevant features, especially from big datasets, which helps to provide accurate learning results without degradation in performance. This paper discusses the importance of feature selection, basic feature selection approaches, centralized and distributed big data processing using Hadoop and Spark, and the challenges of feature selection, and it provides a summary of the related research work done by various researchers. As a result, big data analysis with feature selection improves the accuracy of learning.
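
As a hedged sketch of distributed feature selection of the kind this review surveys, the snippet below uses Spark MLlib's ChiSqSelector through its Java API to keep the top-k most relevant features of a labeled dataset; the input path, column names, and k = 50 are illustrative assumptions.

    import org.apache.spark.ml.feature.ChiSqSelector;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class SelectFeatures {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder().appName("feature-selection").getOrCreate();
            // LibSVM input yields a vector column "features" and a numeric "label".
            Dataset<Row> data = spark.read().format("libsvm").load("data.libsvm"); // path is illustrative
            ChiSqSelector selector = new ChiSqSelector()
                    .setNumTopFeatures(50)             // keep the 50 most relevant features
                    .setFeaturesCol("features")
                    .setLabelCol("label")
                    .setOutputCol("selectedFeatures");
            Dataset<Row> reduced = selector.fit(data).transform(data);
            reduced.show(5);
            spark.stop();
        }
    }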


2020
Vol 12 (21)
pp. 9255
Author(s):
Madhubala Ganesan
Ah-Lian Kor
Colin Pattinson
Eric Rondeau

Internet of Things (IoT) coupled with big data analytics is emerging as the core of smart and sustainable systems that bolster economic, environmental, and social sustainability. Cloud-based data centers provide high-performance computing power to analyze voluminous IoT data and provide invaluable insights to support decision making. However, the multifarious servers in data centers appear to be a black hole of superfluous energy consumption, contributing to 23% of the global carbon dioxide (CO2) emissions of the ICT (Information and Communication Technology) industry. IoT-related energy research focuses on low-power sensors and enhanced machine-to-machine communication performance. To date, cloud-based data centers still face energy-related challenges which are detrimental to the environment. Virtual machine (VM) consolidation is a well-known approach to effect energy-efficient cloud infrastructures. Although several research works demonstrate positive results for VM consolidation in simulated environments, there is a gap for investigations on real, physical cloud infrastructure for big data workloads. This research work addresses that gap by conducting experiments on real, physical cloud infrastructure. The primary goal of setting up a real physical cloud infrastructure is the evaluation of dynamic VM consolidation approaches, which include integrated algorithms from existing relevant research. An open-source VM consolidation framework, OpenStack NEAT, is adopted, and experiments are conducted on a multi-node OpenStack cloud with Apache Spark as the big data platform. Open-source OpenStack has been deployed because it enables rapid innovation and boosts scalability as well as resource utilization. Additionally, this research work investigates the performance based on service level agreement (SLA) metrics and the energy usage of compute hosts. Relevant results concerning the best-performing combination of algorithms are presented and discussed.
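
For readers unfamiliar with dynamic VM consolidation, the sketch below shows the shape of a simple static-threshold overload detector of the kind such frameworks can plug in; the 80% threshold and class/method names are my own illustrative assumptions, not OpenStack NEAT's actual API.

    /** Minimal static-threshold overload detector sketch. A consolidation
     *  manager would migrate VMs away from hosts flagged as overloaded and
     *  power down hosts that become idle, saving energy. */
    public class ThresholdOverloadDetector {
        private final double threshold; // e.g. 0.8 = 80% CPU utilization

        public ThresholdOverloadDetector(double threshold) {
            this.threshold = threshold;
        }

        /** cpuHistory holds recent utilization samples in [0, 1]. */
        public boolean isOverloaded(double[] cpuHistory) {
            double sum = 0.0;
            for (double u : cpuHistory) sum += u;
            double mean = sum / cpuHistory.length;
            return mean > threshold; // above threshold => candidate for VM migration
        }

        public static void main(String[] args) {
            ThresholdOverloadDetector d = new ThresholdOverloadDetector(0.8);
            System.out.println(d.isOverloaded(new double[] {0.91, 0.87, 0.95})); // true
        }
    }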


The speedy development of the Internet has led to huge quantities of digital data becoming available online, and the volume of digital data being successfully stored keeps increasing. To process, analyze, and link this huge volume of stored data and obtain correct information, considerable computation is required; efficient processing and implementation are also needed for scientific data performance analysis. We compare against the already existing MapReduce technique with Hadoop to afford high performance and efficiency for large volumes of data. The Hadoop distributed architecture with MapReduce programming is analyzed here.
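
As a concrete reference point for the MapReduce programming model analyzed here, the canonical Hadoop word-count job is sketched below using the standard Hadoop Java API; input and output paths come from the command line, and this is an illustration rather than the authors' workload.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {
        public static class TokenMapper extends Mapper<Object, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();
            @Override
            protected void map(Object key, Text value, Context ctx)
                    throws java.io.IOException, InterruptedException {
                for (String tok : value.toString().split("\\s+")) {
                    if (tok.isEmpty()) continue;
                    word.set(tok);
                    ctx.write(word, ONE); // emit (word, 1) for every token
                }
            }
        }

        public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                    throws java.io.IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) sum += v.get();
                ctx.write(key, new IntWritable(sum)); // total count per word
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenMapper.class);
            job.setCombinerClass(SumReducer.class); // local pre-aggregation on each node
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }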


2020
Vol 3 (2)
pp. 245-254
Author(s):
Firuza Tahmazli-Khaligova

In a traditional High Performance Computing system, it is possible to process a huge data volume. The nature of events in classic High Performance Computing is static. A distributed exascale system has a different nature: processing big data in such a system evokes a new challenge, since its dynamic and interactive character changes the status of processes and system elements. This paper discusses how the big data attributes of volume, velocity, and variety influence the dynamic and interactive nature of a distributed exascale system. While investigating the effect of the dynamic and interactive nature of exascale systems on computing big data, this research work suggests a Markov chain model. The model provides a transition matrix, which identifies system status and memory sharing, and lets us analyze the convergence of the two systems. As a result, both systems are explored with respect to their influence on each other.
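
To make the Markov chain idea concrete, here is a minimal worked example (my own illustration, not the paper's matrix): a two-state chain over system statuses, say "stable" and "reconfiguring", whose row-stochastic transition matrix converges to a stationary distribution regardless of the starting state:

    P = \begin{pmatrix} 0.9 & 0.1 \\ 0.4 & 0.6 \end{pmatrix},
    \qquad \pi P = \pi, \quad \pi_1 + \pi_2 = 1
    \;\Rightarrow\; \pi = (0.8,\ 0.2).

Row i gives the probabilities of moving from state i to each state in one step; solving the fixed-point equations shows the system spends 80% of its time in the first state in the long run, which is the kind of convergence analysis a transition matrix enables.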


Author(s):  
Monika
Pardeep Kumar
Sanjay Tyagi

In a cloud computing environment, QoS (Quality of Service) and cost are the key elements to be taken care of. Today, in the era of big data, data must be handled properly while satisfying each request. In such cases, while handling requests over large data or requests from scientific applications, the flow of information must be sustained. In this paper, a brief introduction to workflow scheduling is given, and a detailed survey of various scheduling algorithms is performed using various parameters.
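
As one illustrative example of the class of algorithms such surveys cover, the sketch below implements a simple min-min style heuristic: repeatedly assign the unscheduled task with the smallest achievable completion time to the VM that achieves it. The completion-time matrix and its values are hypothetical.

    import java.util.Arrays;

    /** Minimal min-min scheduling sketch: ect[i][j] is the estimated
     *  completion time of task i on VM j; ready[j] tracks when VM j is free. */
    public class MinMinScheduler {
        public static int[] schedule(double[][] ect) {
            int tasks = ect.length, vms = ect[0].length;
            double[] ready = new double[vms];
            int[] assignment = new int[tasks];
            boolean[] done = new boolean[tasks];
            for (int round = 0; round < tasks; round++) {
                int bestTask = -1, bestVm = -1;
                double best = Double.MAX_VALUE;
                // Find the (task, VM) pair with the smallest finish time.
                for (int i = 0; i < tasks; i++) {
                    if (done[i]) continue;
                    for (int j = 0; j < vms; j++) {
                        double finish = ready[j] + ect[i][j];
                        if (finish < best) { best = finish; bestTask = i; bestVm = j; }
                    }
                }
                done[bestTask] = true;
                assignment[bestTask] = bestVm;
                ready[bestVm] = best; // VM is busy until this task finishes
            }
            return assignment;
        }

        public static void main(String[] args) {
            double[][] ect = { {3, 5}, {4, 2}, {6, 7} }; // hypothetical times
            System.out.println(Arrays.toString(schedule(ect)));
        }
    }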


2005
Vol 29 (4)
pp. 507-517
Author(s):
Alex Ellery
Lutz Richter
Reinhold Bertrand

The European Space Agency's (ESA) ExoMars rover has recently been subject to a Phase A study led by EADS Astrium, UK. This rover mission represents a highly ambitious venture in that the rover is of considerable size (200+ kg) with high mobility, carrying a highly complex scientific instrument suite (Pasteur) of up to 40 kg in mass devoted to exobiological investigation of the Martian surface and sub-surface. The chassis design has been a particular challenge given the inhospitable terrain on Mars and the need to traverse such terrain robustly in order to deliver the scientific instruments to science targets of exobiological interest. We present some of the results and design issues encountered during the Phase A study related to the chassis. In particular, we have focussed on the overall tractive performance of a number of candidate chassis designs and selected the RCL (Science & Technology Rover Company Ltd, Russia) concept C design as the baseline option in terms of high performance with minimal mechanical complexity overhead. This design is a six-wheeled double-rocker bogie design that provides springless suspension and maintains approximately equal weight distribution across each wheel.


Symmetry
2021
Vol 13 (2)
pp. 317
Author(s):
Chithambaramani Ramalingam
Prakash Mohan

The increasing demand for cloud computing has shifted business toward a huge demand for cloud services, which offer platform, software, and infrastructure for the day-to-day use of cloud consumers. Numerous new cloud service providers have been introduced to the market with unique features that assist service developers in collaborating and migrating services among multiple cloud service providers to address the varying requirements of cloud consumers. Many interfaces and proprietary application programming interfaces (APIs) are available for migration and collaboration services among cloud providers, but standardization efforts are lacking. The target of the research work was to summarize the issues involved in semantic cloud portability and interoperability in the multi-cloud environment and to define the standardization effort imminently needed for migrating and collaborating services in the multi-cloud environment.
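
To illustrate the kind of standardization gap this survey describes, a provider-agnostic interface of the following shape is what portability layers typically introduce so services can move between vendors without code changes; the interface and its method names are entirely hypothetical, not an existing API.

    /** Hypothetical provider-agnostic abstraction: each vendor's proprietary
     *  API is wrapped behind one portable interface. */
    public interface CloudProvider {
        String provision(String image, int cpus, int memoryGb); // returns an instance id
        void migrate(String instanceId, CloudProvider target);  // cross-provider move
        void terminate(String instanceId);
    }

A service written against such an interface could switch vendors by swapping the implementation class, which is precisely the portability that today's proprietary APIs make difficult.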

