Development of a Holistic Prototype Hadoop System for Big Data Handling

Author(s):  
Manisha K. Gupta ◽  
Md. Nadeem Akhtar Hasid ◽  
Sourav Dhar ◽  
H. S. Mruthyunjaya
Keyword(s):  
Big Data ◽


Author(s):
Pankaj Dadheech ◽  
Dinesh Goyal ◽  
Sumit Srivastava ◽  
Ankit Kumar

Spatial queries are frequently used in Hadoop for large-scale data processing. However, the vast size of spatial data makes it difficult to process spatial queries efficiently, which is why the Hadoop framework is employed for processing this Big Data. We use Boolean queries and geometric Boolean spatial data for query optimization in the Hadoop system. In this paper, we present a lightweight and adaptable spatial index for big data processed in Hadoop frameworks. Results demonstrate the efficiency and effectiveness of our spatial indexing scheme across a variety of spatial queries.
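The abstract gives no implementation details, so as a rough illustration of what a lightweight spatial index with Boolean queries can look like, here is a minimal sketch: a uniform grid index over bounding boxes with an intersection (Boolean AND) query. The grid-cell scheme, names, and data are assumptions for the example, not the authors' actual design.

```python
from collections import defaultdict

CELL = 10.0  # grid cell size; an assumed tuning parameter

def cells(xmin, ymin, xmax, ymax):
    """Yield the grid cells overlapped by a bounding box."""
    for i in range(int(xmin // CELL), int(xmax // CELL) + 1):
        for j in range(int(ymin // CELL), int(ymax // CELL) + 1):
            yield (i, j)

def build_index(records):
    """Map each grid cell to the ids of the boxes that overlap it."""
    index = defaultdict(set)
    for rid, box in records.items():
        for c in cells(*box):
            index[c].add(rid)
    return index

def query_and(index, records, q):
    """Boolean AND query: ids whose boxes intersect the query box q."""
    candidates = set()
    for c in cells(*q):
        candidates |= index.get(c, set())
    qx0, qy0, qx1, qy1 = q
    return {r for r in candidates
            if records[r][0] <= qx1 and records[r][2] >= qx0
            and records[r][1] <= qy1 and records[r][3] >= qy0}

records = {1: (0, 0, 5, 5), 2: (12, 12, 20, 18), 3: (4, 4, 14, 9)}
index = build_index(records)
print(query_and(index, records, (3, 3, 13, 8)))  # -> {1, 3}
```

The grid prunes most records before the exact intersection test, which is the essential idea behind lightweight spatial indexing in a distributed setting.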


2018 ◽  
Vol 7 (3.1) ◽  
pp. 63 ◽  
Author(s):  
R Revathy ◽  
R Aroul Canessane

Data are vital to support decision making. If data have low veracity, decisions based on them are unlikely to be sound. The Internet of Things (IoT) burdens big data with error, inconsistency, incompleteness, deception, and model approximation. Improving data veracity is critical to addressing these difficulties. In this article, we summarize the key characteristics and challenges of IoT that affect data handling and decision making. We review the landscape of measuring and enhancing data veracity and of mining uncertain data streams. We also propose five recommendations for the future development of veracious big IoT data analytics, relating to the heterogeneous and distributed nature of IoT data, autonomous decision making, context-aware and domain-optimized methodologies, data cleaning and processing techniques for IoT edge devices, and privacy-preserving, personalized, and secure data management.
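The article surveys veracity rather than giving an algorithm; still, as a loose illustration of what scoring the veracity of an IoT stream might involve, the sketch below combines completeness, range consistency, and timeliness into one score. The weights, thresholds, and function name are invented for this example.

```python
import time

def veracity_score(readings, lo=-40.0, hi=85.0, max_age_s=300.0):
    """Score a batch of (timestamp, value) sensor readings in [0, 1]
    from three simple signals: completeness, consistency, timeliness."""
    if not readings:
        return 0.0
    now = time.time()
    total = len(readings)
    present = sum(1 for _, v in readings if v is not None)
    completeness = present / total
    in_range = sum(1 for _, v in readings
                   if v is not None and lo <= v <= hi)
    consistency = in_range / present if present else 0.0
    fresh = sum(1 for t, _ in readings if now - t <= max_age_s)
    timeliness = fresh / total
    # Equal weights; a real deployment would tune these per domain.
    return (completeness + consistency + timeliness) / 3.0

batch = [(time.time() - 10, 21.5), (time.time() - 20, None),
         (time.time() - 4000, 400.0)]
print(round(veracity_score(batch), 2))
```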


2020 ◽  
Vol 4 (3) ◽  
pp. 577-577
Author(s):  
Vania V Estrela

Background: A database (DB) that stores indexed information about drug delivery, tests, and their temporal behavior is paramount in new Biomedical Cyber-Physical Systems (BCPSs). The term Database as a Service (DBaaS) means that a provider delivers the hardware, software, and other infrastructure required by companies to operate their databases on demand, instead of each company keeping an internal data warehouse. Methods: BCPS attributes are presented and discussed. One needs to retrieve detailed knowledge reliably to make adequate healthcare treatment decisions. Furthermore, these DBs store, organize, manipulate, and retrieve the necessary data from an ocean of Big Data (BD) associated processes. DBs fall into two broad families: Structured Query Language (SQL) and NoSQL. Results: This work investigates how to retrieve biomedical knowledge reliably to make adequate healthcare treatment decisions, and how Biomedical DBaaSs store, organize, manipulate, and retrieve the necessary data from the associated Big Data processes. Conclusion: A NoSQL DB offers more flexibility to change while the BCPS is running, permitting queries and data handling according to context and situation. A DBaaS must be adaptive and support DB management across a wide variety of sources, modalities, and dimensionalities, alongside conventional data handling.
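To make the NoSQL flexibility point concrete, here is a minimal sketch using the well-known pymongo API: documents of different shapes coexist in one collection and can be queried by context. The database, collection, and field names are hypothetical, and a running local MongoDB instance is assumed.

```python
from pymongo import MongoClient  # assumes a local MongoDB instance

client = MongoClient("mongodb://localhost:27017")
col = client["bcps_demo"]["drug_delivery"]  # hypothetical DB/collection

# Two documents with different shapes coexist in the same collection:
col.insert_one({"patient_id": 1, "drug": "aspirin", "dose_mg": 100})
col.insert_one({"patient_id": 2, "drug": "insulin",
                "dose_units": 10, "sensor": {"glucose_mg_dl": 142}})

# Query by context without a fixed schema:
for doc in col.find({"sensor.glucose_mg_dl": {"$gt": 120}}):
    print(doc["patient_id"], doc["drug"])
```

An SQL DB would require a schema migration before the second record could be stored; the document model absorbs it while the system keeps running.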


Author(s):  
Abou_el_ela Abdou Hussein

Day by day, advances in web technologies have led to tremendous growth in the volume of data generated daily. This mountain of huge and widely distributed data sets gives rise to the phenomenon called big data: collections of massive, heterogeneous, unstructured, and complex data. The big data life cycle can be represented as collecting (capturing), storing, distributing, manipulating, interpreting, analyzing, investigating, and visualizing big data. Traditional techniques such as Relational Database Management Systems (RDBMSs) cannot handle big data because of their inherent limitations, so advances in computing architecture are required to handle both the data storage requisites and the heavy processing needed to analyze huge volumes and varieties of data economically. Among the many technologies for manipulating big data, one is Hadoop. Hadoop can be understood as an open-source distributed data-processing framework and is one of the most prominent and well-known solutions to the problem of handling big data. Apache Hadoop is based on the Google File System and the MapReduce programming paradigm. In this paper we survey big data characteristics, starting from the first three V's, which researchers have extended over time to more than fifty-six V's, and we compare the literature to arrive at the best representation and precise clarification of all the big data V characteristics. We highlight the challenges facing big data processing and how to overcome them using Hadoop, and its use in processing big data sets as a solution to various problems in a distributed, cloud-based environment. This paper focuses mainly on the different components of Hadoop, such as Hive, Pig, and HBase. We also give a thorough description of Hadoop's pros and cons, and of improvements that address Hadoop's problems, including a proposed cost-efficient scheduler algorithm for heterogeneous Hadoop systems.
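As a concrete reminder of the MapReduce paradigm Hadoop is built on, below is the textbook word-count example written for Hadoop Streaming, which pipes data through an external mapper and reducer. It is a generic illustration, not the paper's proposed scheduler, and the jar path in the docstring is a placeholder.

```python
#!/usr/bin/env python3
"""Minimal Hadoop Streaming word count. Typical invocation (placeholder paths):
hadoop jar hadoop-streaming.jar -files wc.py \
    -mapper 'wc.py map' -reducer 'wc.py reduce' -input in/ -output out/"""
import sys

def mapper():
    # Emit (word, 1) for every word on stdin.
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

def reducer():
    # Hadoop delivers input sorted by key, so counts can be summed
    # over each consecutive run of identical words.
    current, count = None, 0
    for line in sys.stdin:
        word, n = line.rsplit("\t", 1)
        if word != current:
            if current is not None:
                print(f"{current}\t{count}")
            current, count = word, 0
        count += int(n)
    if current is not None:
        print(f"{current}\t{count}")

if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()
```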


Author(s):  
James Darby-Taylor ◽  
Fernando Luís-Ferreira ◽  
João Sarraipa ◽  
Ricardo Jardim-Goncalves

Abstract The quality of care provided to citizens by professionals and institutions depends on the quality and availability of information. Early commencement of treatment and medication, and decisions on how to proceed, depend heavily on patients' data in the different modalities available. It is also important to note that large pools of data help inform health and wellbeing parameters for the largest possible community. To make that possible it is necessary both to follow the best hospital practices and to obtain consent and collaboration from patients. Accomplishing such a goal requires practices that adhere to legal constraints and are transparent in their handling of data, and requires transmitting those practices and protocols to professionals and patients. The present document provides a framework envisaging the seamless application of clinical procedures, following legal guidance and making the process known, secure, and trustworthy. It aims to contribute to clinical practice and clinical research, and thereby to big data analysis, by ensuring trust and best clinical data handling.


Information ◽  
2019 ◽  
Vol 10 (7) ◽  
pp. 222 ◽  
Author(s):  
Sungchul Lee ◽  
Ju-Yeon Jo ◽  
Yoohwan Kim

Background: Hadoop has become the base framework for big data systems, built on the simple observation that moving computation is cheaper than moving data. Hadoop increases data locality in the Hadoop Distributed File System (HDFS) to improve system performance: network traffic among the nodes of a big data system is reduced by increasing the share of data-local tasks on each machine. Previous research increased data locality in only one of the MapReduce stages to improve Hadoop performance, and there is currently no mathematical performance model for data locality in Hadoop. Methods: This study builds a Hadoop performance analysis model with data locality that covers the entire MapReduce process. The paper explains the data locality concept in the map and shuffle stages, and shows how to apply the performance analysis model to improve a Hadoop system by establishing deep data locality. Results: The benefit of deep data locality was demonstrated through three tests: a simulation-based test, a cloud test, and a physical test. Across these tests, the authors improved Hadoop performance by over 34% using deep data locality. Conclusions: Deep data locality improved Hadoop performance by reducing data movement in HDFS.
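The abstract does not reproduce the authors' model, but a much simpler toy model conveys the flavor of reasoning mathematically about data locality: assuming each HDFS block's replicas are placed uniformly at random, the sketch below estimates the probability that a map task can run data-locally given how many nodes currently have free slots. The function and parameter values are assumptions for illustration, not the paper's model.

```python
from math import comb

def p_data_local(n_nodes, replicas, free_slots):
    """Probability that at least one of `free_slots` randomly free nodes
    holds one of `replicas` copies of the block a map task needs
    (uniform placement; a toy model, not the paper's)."""
    no_local = comb(n_nodes - replicas, free_slots) / comb(n_nodes, free_slots)
    return 1.0 - no_local

# With HDFS's default 3 replicas on a 100-node cluster:
for s in (1, 5, 10, 20):
    print(s, round(p_data_local(100, 3, s), 3))
```

Even this toy model shows why locality improves as the scheduler has more placement choices; deep data locality pushes the same reasoning through both the map and shuffle stages.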


2018 ◽  
Vol 43 (4) ◽  
pp. 179-190
Author(s):  
Pritha Guha

Executive Summary Very large or complex data sets, which are difficult to process or analyse using traditional data handling techniques, are usually referred to as big data. The idea of big data is characterized by the three 'V's, volume, velocity, and variety ( Liu, McGree, Ge, & Xie, 2015 ), referring respectively to the volume of data, the velocity at which the data are processed, and the wide variety of forms in which big data are available. Every single day, sectors such as credit risk management, healthcare, media, retail, retail banking, climate prediction, DNA analysis, and sports generate petabytes of data (1 petabyte = 2⁵⁰ bytes). Even basic handling of big data therefore poses significant challenges, one of them being organizing the data in a way that yields better insights for analysis and decision-making. With the explosion of data in our lives, it has become very important to use statistical tools to analyse them.


Symmetry ◽  
2020 ◽  
Vol 12 (4) ◽  
pp. 669 ◽  
Author(s):  
Eunseo Oh ◽  
Hyunsoo Lee

Developments in the fields of the industrial Internet of Things (IIoT) and big data technologies have made it possible to collect a large amount of meaningful industrial process and quality data. The gathered data are analyzed using contemporary statistical methods and machine learning techniques, and the extracted knowledge can then be used for predictive maintenance or prognostic health management. However, it is difficult to gather complete data due to several issues in IIoT, such as devices breaking down, running out of battery, or undergoing scheduled maintenance. Data with missing values are often ignored, as they may contain insufficient information from which to draw conclusions. To overcome these issues, we propose a novel, effective missing-data handling mechanism based on symmetry principles. While other existing methods attempt only to estimate the missing parts, the proposed method generates a complete data set using Gaussian process regression and a generative adversarial network. To prove the effectiveness of the proposed framework, we examine a real-world industrial case involving an air pressure system (APS), where we use the proposed method to make quality predictions and compare the results with existing state-of-the-art estimation methods.
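The full pipeline combines Gaussian process regression with a generative adversarial network; the sketch below shows only the GPR half, imputing simulated sensor dropouts with scikit-learn's GaussianProcessRegressor. The toy signal, kernel choice, and names are invented, so this illustrates the general technique rather than the authors' exact configuration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
t = np.linspace(0, 10, 60)
y = np.sin(t) + rng.normal(0, 0.1, t.size)
y[rng.choice(t.size, 15, replace=False)] = np.nan  # simulate sensor dropouts

# Fit a GP on the observed points only.
mask = ~np.isnan(y)
gpr = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(),
                               normalize_y=True)
gpr.fit(t[mask].reshape(-1, 1), y[mask])

# Fill the gaps with the GP posterior mean; the posterior std quantifies
# how uncertain each imputed value is.
mean, std = gpr.predict(t[~mask].reshape(-1, 1), return_std=True)
y_filled = y.copy()
y_filled[~mask] = mean
print(np.round(y_filled[:5], 2))
```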

