Examining Heterogeneity Structured on a Large Data Volume with Minimal Incompleteness

2021 ◽  
Vol 9 (2) ◽  
pp. 30-37
Author(s):  
Nahla Aljojo

While Big Data analytics can provide a variety of benefits, processing heterogeneous data comes with its own set of limitations. Because transaction patterns must be studied independently when working with Bitcoin data, this study examines Twitter data related to Bitcoin and investigates communication patterns in Bitcoin transaction tweets. Using the hashtags #Bitcoin or #BTC, a vast amount of Twitter data was gathered and mined to uncover the patterns that different groups (speculators, traders, or stakeholders) use when discussing Bitcoin transactions. The aim is to determine the direction of Bitcoin transaction tweets based on historical data. To that end, this research proposes using Big Data analytics to track Bitcoin transaction communications in tweets in order to discover patterns, using MapReduce on the Hadoop platform. The findings indicate that in the map step of the procedure, Hadoop tokenizes the dataset and passes the tokens to the mapper, where thirteen patterns were established and then reduced to three patterns using attributes previously stored in the Hadoop context. One of these is emoji data, which had been left out of previous research discussions; yet text is only one piece of the puzzle in Bitcoin transaction interaction, and the key part of it is "No certainty, only possibilities" in Bitcoin transactions.
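The tokenize-map-reduce flow described in the abstract can be sketched in plain Python emulating a Hadoop streaming job; the sample tweets and the hashtag-counting "pattern" here are purely illustrative, not the study's actual data or pattern definitions.

```python
# Minimal sketch of a Hadoop-style map/reduce pass over tweets.
# The tweets below are hypothetical examples, not the study's dataset.
from collections import defaultdict

def mapper(tweet):
    # Map step: tokenize the tweet and emit (pattern, 1) pairs
    # for the hashtags the study collected on.
    for token in tweet.lower().split():
        if token in ("#bitcoin", "#btc"):
            yield (token, 1)

def reducer(pairs):
    # Reduce step: aggregate counts per emitted pattern key.
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

tweets = [
    "Just bought the dip #Bitcoin",
    "#BTC to the moon",
    "HODL #bitcoin #btc",
]
pairs = [p for t in tweets for p in mapper(t)]
print(reducer(pairs))  # {'#bitcoin': 2, '#btc': 2}
```

In a real Hadoop streaming job, `mapper` and `reducer` would read from stdin and write tab-separated key/value lines, with the framework handling the shuffle between them.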

Author(s):  
P. Venkateswara Rao ◽  
A. Ramamohan Reddy ◽  
V. Sucharita

In the field of aquaculture, digital advancements constantly produce huge amounts of data, which has brought aquaculture into the big data world. As development progresses, the need for data management and analytics models increases. All of this data cannot be stored on a single machine, so there is a need for a solution that stores and analyzes huge amounts of data, which is precisely Big Data. In this chapter a framework is developed that provides a solution for shrimp disease analysis using historical data, based on Hive and Hadoop. The data regarding shrimp is acquired from different sources such as aquaculture websites and various laboratory reports. After collection from these sources, noise is removed from the data. The data is normalized, uploaded to HDFS, and placed in a file format that Hive supports. Finally, the classified data is stored in its designated location. Based on the features extracted from the aquaculture data, HiveQL can be used to analyze shrimp disease symptoms.
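The clean-then-query pipeline above can be sketched as follows; the record fields, table name, and HiveQL query are hypothetical illustrations, not taken from the chapter.

```python
# Sketch of the pipeline: normalize noisy records before loading
# to HDFS, then aggregate with HiveQL. Field and table names are
# illustrative assumptions.
def normalize(record):
    # Strip whitespace noise and lowercase categorical fields.
    return {k: v.strip().lower() for k, v in record.items()}

raw = [
    {"pond": " P1 ", "symptom": " White Spots "},
    {"pond": "P2", "symptom": "white spots"},
    {"pond": "P1", "symptom": "Loose Shell "},
]
clean = [normalize(r) for r in raw]

# A HiveQL query of the kind that could run over the normalized
# table once it is stored in a Hive-compatible file on HDFS.
query = """
SELECT symptom, COUNT(*) AS cases
FROM shrimp_records
GROUP BY symptom
ORDER BY cases DESC
"""
print(clean[0])  # {'pond': 'p1', 'symptom': 'white spots'}
```

Normalizing before the load matters because HiveQL's `GROUP BY` treats " White Spots " and "white spots" as distinct keys otherwise.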


Author(s):  
Sai Hanuman Akundi ◽  
Soujanya R ◽  
Madhuri PM

In recent years, vast quantities of data have been managed in various medical applications, and multiple organizations worldwide have generated this type of data; together, these heterogeneous data are called big data. Data characterized by quantity, speed and variety are what the term big data denotes. The healthcare sector, renowned for generating large amounts of heterogeneous data, has faced the need to handle large data from different sources. We can use Big Data analysis to make proper decisions in the health system by tweaking some of the current machine learning algorithms. If we have a large amount of data from which we want to predict outcomes or identify patterns, machine learning is the way forward. In this article, a brief overview of Big Data, its functionality and approaches to Big Data analytics is presented, which play an important role in and significantly affect healthcare information technology. Within this paper we present a comparative study of machine learning algorithms. We need to make effective use of all the current machine learning algorithms to anticipate accurate outcomes in the world of nursing.
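A comparative study of this kind can be sketched with two toy classifiers scored on the same held-out records; the patient fields, thresholds, and data below are hypothetical, and real work would use library implementations with proper cross-validation.

```python
# Toy comparison of two classifiers on hypothetical patient records,
# in the spirit of a comparative study: a fixed clinical rule versus
# a 1-nearest-neighbour learner.
def rule_based(patient):
    # Flag high risk if blood pressure exceeds a fixed threshold.
    return patient["bp"] > 140

def knn_1(patient, train):
    # 1-nearest-neighbour on blood pressure alone.
    nearest = min(train, key=lambda p: abs(p["bp"] - patient["bp"]))
    return nearest["risk"]

train = [{"bp": 120, "risk": False}, {"bp": 160, "risk": True}]
test = [{"bp": 150, "risk": True}, {"bp": 118, "risk": False}]

for name, clf in [("rule", rule_based), ("knn", lambda p: knn_1(p, train))]:
    accuracy = sum(clf(p) == p["risk"] for p in test) / len(test)
    print(name, accuracy)
```

The same scoring loop extends to any number of algorithms, which is the structure a comparative table in such a study summarizes.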


2017 ◽  
pp. 83-99
Author(s):  
Sivamathi Chokkalingam ◽  
Vijayarani S.

The term Big Data refers to large-scale information management and analysis technologies that exceed the capability of traditional data processing technologies. Big Data is differentiated from traditional technologies in three ways: the volume, velocity and variety of data. Big Data analytics is the process of analyzing large data sets containing a variety of data types to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful business information. Since Big Data is a newly emerging field, there is a need to develop new technologies and algorithms for handling it. The main objective of this paper is to provide knowledge about the various research challenges of Big Data analytics. A brief overview of the various types of Big Data analytics is discussed in this paper. For each type of analytics, the paper describes the process steps and tools, and gives a banking application. Some research challenges of big data analytics, and possible solutions to those challenges, are also discussed.


atp magazin ◽  
2016 ◽  
Vol 58 (09) ◽  
pp. 62 ◽  
Author(s):  
Martin Atzmueller ◽  
Benjamin Klöpper ◽  
Hassan Al Mawla ◽  
Benjamin Jäschke ◽  
Martin Hollender ◽  
...  

Big data technologies offer new opportunities for analyzing the historical data generated by process plants. One of these possibilities is the development of new types of operator support systems (OSS) that help plant operators during operations and in dealing with critical situations. The FEE project aims to develop such support functions based on big data analytics of historical plant data. In this contribution, we share our first insights and lessons learned in developing big data applications and outline the approaches and tools that we developed in the course of the project.


2019 ◽  
Vol 8 (S3) ◽  
pp. 35-40
Author(s):  
S. Mamatha ◽  
T. Sudha

In this digital world, as organizations evolve rapidly around data-centric assets, the explosion of data and the size of databases have been growing exponentially. Data is generated from different sources such as business processes, transactions, social networking sites and web servers, and exists in structured as well as unstructured form. The term "Big Data" is used for large data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process within a tolerable elapsed time. Big data varies in size, ranging from a few dozen terabytes to many petabytes in a single data set. Difficulties include capture, storage, search, sharing, analytics and visualization. Big data is available in structured, unstructured and semi-structured formats, and relational databases fail to store such multi-structured data. Apache Hadoop is an efficient, robust, reliable and scalable framework to store, process, transform and extract big data; it is open-source, free software available from the Apache Software Foundation. In this paper we present Hadoop, HDFS, MapReduce and a c-means big data algorithm to minimize the effort of big data analysis using MapReduce code. The objective of this paper is to summarize state-of-the-art efforts in clinical big data analytics and highlight what might be needed to enhance the outcomes of clinical big data analytics tools and related fields.
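One iteration of a c-means-style clustering step can be expressed as a map phase and a reduce phase, as sketched below; hard (k-means-style) assignments are used for brevity, whereas true fuzzy c-means would emit weighted memberships per centroid, and the data is hypothetical.

```python
# Simplified sketch of one clustering iteration written as MapReduce
# phases. Hard assignments stand in for fuzzy c-means memberships.
from collections import defaultdict

def map_assign(point, centroids):
    # Map: emit (index of nearest centroid, point).
    idx = min(range(len(centroids)), key=lambda i: abs(point - centroids[i]))
    return (idx, point)

def reduce_update(pairs):
    # Reduce: recompute each centroid as the mean of its points.
    groups = defaultdict(list)
    for idx, point in pairs:
        groups[idx].append(point)
    return {i: sum(pts) / len(pts) for i, pts in groups.items()}

data = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]
centroids = [0.0, 10.0]
pairs = [map_assign(x, centroids) for x in data]
print(reduce_update(pairs))  # {0: 1.0, 1: 9.0}
```

Iterating map and reduce until the centroids stop moving completes the algorithm; on Hadoop, each iteration is one MapReduce job with the current centroids broadcast to every mapper.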


Author(s):  
Guowei Cai ◽  
Sankaran Mahadevan

This manuscript explores the application of big data analytics in online structural health monitoring. As smart sensor technology progresses and low-cost online monitoring becomes increasingly feasible, large quantities of highly heterogeneous data can be acquired during monitoring, exceeding the capacity of traditional data analytics techniques. This paper investigates big data techniques to handle the high-volume data obtained in structural health monitoring. In particular, we investigate the analysis of infrared thermal images for structural damage diagnosis. We explore the MapReduce technique to parallelize the data analytics and efficiently handle the high volume, high velocity and high variety of information. In our study, MapReduce is implemented on the Spark platform, and image processing functions such as the uniform filter and Sobel filter are wrapped in the mappers. The methodology is illustrated with concrete slabs, using actual experimental data with induced damage.
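A filter of the kind wrapped in the mappers can be sketched in pure Python; in a Spark pipeline the function would be applied per image via something like `rdd.map(sobel_x)`. The image values below are hypothetical, and a production version would use a vectorized library rather than nested loops.

```python
# A pure-Python horizontal Sobel gradient, the kind of image-processing
# function the mappers wrap. Input is a 2D list of pixel intensities.
KX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]  # horizontal Sobel kernel

def sobel_x(img):
    # Convolve interior pixels with the 3x3 kernel; borders stay 0.
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = sum(
                KX[j][i] * img[y + j - 1][x + i - 1]
                for j in range(3) for i in range(3)
            )
    return out

# A vertical edge: dark columns on the left, bright on the right.
img = [[0, 0, 255, 255]] * 4
print(sobel_x(img)[1])  # [0, 1020, 1020, 0] -- strong response at the edge
```

Because each image is filtered independently, the mapper needs no communication with other partitions, which is what makes the workload embarrassingly parallel under MapReduce.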


Author(s):  
Mamata Rath

Big data analytics is a refined approach to fusing large data sets, comprising collections of data elements, in order to expose hidden patterns, undetected associations, business logic, client inclinations, and other helpful business information. Big data analytics involves challenging techniques to mine and extract relevant data, including querying a database, effectively mining the data, and inspecting data in order to enhance the technical execution of various task segments. The capacity to synthesize a lot of data can enable an organization to manage substantial information that can influence the business. In this way, the primary goal of big data analytics is to help business organizations reach an enhanced comprehension of their data and, subsequently, make efficient and informed decisions.


Author(s):  
Pushpa Mannava

Data mining is considered a vital procedure, as it is used for discovering new, valid, useful and understandable patterns in data. Integrating data mining methods into cloud computing gives a versatile and scalable design that can be used for reliable mining of significant quantities of data from virtually integrated data resources, with the goal of producing beneficial information that is useful in decision making. The procedure of extracting hidden, beneficial patterns and useful information from big data is called big data analytics, and it is carried out by applying advanced analytics techniques to large data collections. This paper provides information about big data analytics in intra-data-center networks, the components of data mining, and data mining techniques.


2020 ◽  
Vol 17 (8) ◽  
pp. 3798-3803
Author(s):  
M. D. Anto Praveena ◽  
B. Bharathi

Big Data analytics has become a growing field, and it plays a pivotal role in healthcare and research practices. Big data analytics in healthcare covers the integration and analysis of vast amounts of dynamic heterogeneous data. Medical records of patients include several kinds of data, including medical conditions, medications and test findings. One of the major challenges of analytics and prediction in healthcare is data preprocessing, in which outlier identification and correction is an important challenge. Outliers are extreme values that deviate from the other values of an attribute; they may be simply experimental errors, or novelty. Outlier identification is the method of identifying data objects whose behavior differs somewhat from expectations. Detecting outliers in time series data differs from detecting them in normal data. Time series data are data recorded over a series of time periods; such data are identified and cleaned to produce a quality dataset. In this proposed work, a hybrid outlier detection algorithm, extended LSTM-GAN, is used to recognize outliers in time series data. The proposed extended algorithm achieved better performance in time series analysis on ECG dataset processing compared with traditional methodologies.
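The problem setting can be illustrated with a generic sliding-window z-score detector; this is only a baseline sketch of time-series outlier detection on hypothetical ECG-like values, not the paper's extended LSTM-GAN method, which is far more involved.

```python
# Generic sliding-window z-score outlier detector for a time series.
# This baseline stands in for, and is much simpler than, the paper's
# extended LSTM-GAN approach.
import statistics

def window_outliers(series, width=5, threshold=3.0):
    # Flag points that deviate strongly from their preceding window.
    flagged = []
    for i, x in enumerate(series):
        window = series[max(0, i - width):i] or [x]
        mean = statistics.fmean(window)
        sd = statistics.pstdev(window) or 1.0  # avoid dividing by zero
        if abs(x - mean) / sd > threshold:
            flagged.append(i)
    return flagged

ecg_like = [0.1, 0.2, 0.1, 0.15, 5.0, 0.1, 0.2]
print(window_outliers(ecg_like))  # [4] -- the index of the spike
```

A learned detector such as an LSTM-GAN replaces the fixed window statistics with a model of normal signal shape, which is what lets it catch outliers that are locally plausible but globally anomalous.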

