An independent time optimized hybrid infrastructure for big data analytics

2020 ◽  
Vol 34 (28) ◽  
pp. 2050311
Author(s):  
Satvik Vats ◽  
B. B. Sagar

In the Big Data domain, platform dependency can alter the behavior of a business because of the different kinds (structured, semi-structured and unstructured) and characteristics of the data. With traditional infrastructure, different kinds of data cannot be processed simultaneously, since each tool is platform-dependent for a particular task, so the responsibility of selecting suitable tools lies with the user. The variety of data generated by different sources calls for tool selection without human intervention. Further, these tools face resource limitations when dealing with large volumes of data, which degrades their performance in terms of execution time. Therefore, in this work we propose a model in which different data analytics tools share a common infrastructure to provide data independence and a resource-sharing environment: the proposed model shares a common (hybrid) Hadoop Distributed File System (HDFS) between three Name-Nodes (master nodes), three Data-Nodes and one Client-Node, which work under a DeMilitarized Zone (DMZ). To realize this model, we implemented Mahout, R-Hadoop and Splunk sharing a common HDFS. Using our model, we ran k-means clustering, Naïve Bayes and recommender algorithms on three different datasets, movie rating, newsgroup and Spam SMS, representing structured, semi-structured and unstructured data, respectively. Our model selected the appropriate tool in each case, e.g. Mahout for the newsgroup dataset, which the other tools cannot process; this shows that our model provides data independence. Results of the proposed model are further compared with the legacy (individual) model in terms of execution time and scalability. The improved performance of the proposed model establishes the hypothesis that it overcomes the resource limitations of the legacy model.
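The tool-selection step is the heart of the claimed data independence. Below is a minimal Python sketch of such a dispatcher, assuming a simple mapping from data kind to tool; the Mahout-newsgroup pairing follows the abstract, while the other two pairings and all names are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the model's tool-selection step: route each
# dataset to the analytics tool suited to its kind of data.
# Only the Mahout <-> newsgroup (semi-structured) pairing is stated in
# the abstract; the other two pairings are assumptions.

TOOL_FOR_KIND = {
    "structured": "R-Hadoop",      # e.g. movie-rating data (assumed pairing)
    "semi-structured": "Mahout",   # e.g. newsgroup data (per the abstract)
    "unstructured": "Splunk",      # e.g. Spam SMS logs (assumed pairing)
}

def select_tool(data_kind: str) -> str:
    """Return the analytics tool registered for a dataset's kind."""
    try:
        return TOOL_FOR_KIND[data_kind]
    except KeyError:
        raise ValueError(f"No registered tool for data kind: {data_kind}")

for dataset, kind in [("movie-rating", "structured"),
                      ("newsgroup", "semi-structured"),
                      ("spam-sms", "unstructured")]:
    print(f"{dataset}: dispatched to {select_tool(kind)}")
```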

Symmetry ◽  
2020 ◽  
Vol 12 (8) ◽  
pp. 1274 ◽  
Author(s):  
Satvik Vats ◽  
Bharat Bhushan Sagar ◽  
Karan Singh ◽  
Ali Ahmadian ◽  
Bruno A. Pansera

Traditional data analytics tools are designed to deal with asymmetrical types of data, i.e., structured, semi-structured, and unstructured. The diverse behavior of data produced by different sources requires the selection of suitable tools, and the restriction of resources when dealing with a huge volume of data is a challenge that affects the tools' execution time. Therefore, in the present paper we propose a time-optimization model that shares a common HDFS (Hadoop Distributed File System) between three Name-Nodes (master nodes), three Data-Nodes, and one Client-Node. These nodes work under a DeMilitarized Zone (DMZ) to maintain symmetry. Machine-learning jobs are explored from an independent platform to realize this model. In the first node (Name-node 1), Mahout is installed with all machine-learning libraries through the Maven repositories. In the second node (Name-node 2), R is connected to Hadoop and runs through Shiny Server. Splunk is configured in the third node (Name-node 3) and is used to analyze the logs. Experiments are performed between the proposed and legacy models to evaluate response time, execution time, and throughput. K-means clustering, Naïve Bayes, and recommender algorithms are run on three different data sets, i.e., movie rating, newsgroup, and Spam SMS, representing structured, semi-structured, and unstructured data, respectively. The selection of tools defines data independence, e.g., the newsgroup data set runs on Mahout because the other tools are not compatible with this data. The outcomes show that the performance of the proposed model establishes the hypothesis that it overcomes the resource limitations of the legacy model. In addition, the proposed model can process any kind of algorithm on different data sets residing in their native formats.
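A hedged sketch of the evaluation loop described here, measuring execution time for one job and deriving throughput from the input size; the job command is a placeholder, and only the measured quantities (execution time, throughput) come from the abstract.

```python
# Illustrative harness (assumption, not the authors' code): time one
# analytics job and report execution time and throughput.
import subprocess
import time

def run_job(cmd: list[str], input_bytes: int) -> dict:
    """Run one job via the shell and report timing-based metrics."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    elapsed = time.perf_counter() - start
    return {
        "execution_time_s": elapsed,
        "throughput_mb_s": (input_bytes / 1e6) / elapsed,
    }

# Placeholder invocation; the real jobs run on the shared HDFS cluster.
stats = run_job(["echo", "k-means on movie-rating data"],
                input_bytes=250_000_000)
print(stats)
```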


2018 ◽  
Vol 7 (2.26) ◽  
pp. 80
Author(s):  
Dr E. Laxmi Lydia ◽  
M Srinivasa Rao

Big Data is among the most prominent subjects across the cloud research area; its main characteristics are volume, velocity and variety, which are difficult to manage through traditional software and the various methodologies available for it. Data arriving from the various domains of Big Data is handled through Hadoop, an open-source framework developed to provide solutions at scale. Big data analytics in Hadoop is done through the MapReduce framework, which is the key engine of a Hadoop cluster and is extensively used these days; it is a batch-processing system. Apache developed an engine named Tez, which supports interactive queries and does not write temporary data into the Hadoop Distributed File System (HDFS). This paper focuses on a performance comparison of MapReduce and Tez; the performance of the two engines is examined under compression of the input files and of the map output files. To compare the two engines, we used the Bzip2 compression algorithm for the input files and Snappy for the map output files. The Word Count and Terasort benchmarks are used in our experiments. For the Word Count benchmark, the results show that the Tez engine has better execution time than the Hadoop MapReduce engine for both compressed and non-compressed data, reducing execution time by nearly 39% compared to the Hadoop MapReduce engine. Conversely, for the Terasort benchmark, the Tez engine has a higher execution time than the Hadoop MapReduce engine.
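The following Python sketch illustrates how such a comparison can be driven, assuming Hadoop and Tez are installed: the same WordCount job is submitted twice, once on the classic MapReduce engine and once on Tez via mapreduce.framework.name=yarn-tez, with Snappy compression of map output. The jar name and HDFS paths are placeholders; the Hadoop property names are standard.

```python
# Sketch of the benchmark set-up: WordCount over the same bzip2 input
# under MapReduce and under Tez, with Snappy-compressed map output.
# Jar and paths are placeholders; run where Hadoop/Tez are configured.
import subprocess

COMMON = [
    "-D", "mapreduce.map.output.compress=true",
    "-D", "mapreduce.map.output.compress.codec="
          "org.apache.hadoop.io.compress.SnappyCodec",
]

def wordcount(framework: str, out_dir: str) -> None:
    subprocess.run(
        ["hadoop", "jar", "hadoop-mapreduce-examples.jar", "wordcount",
         "-D", f"mapreduce.framework.name={framework}", *COMMON,
         # bzip2 input is splittable and decompressed transparently:
         "/data/words.bz2", out_dir],
        check=True,
    )

wordcount("yarn", "/out/mr")        # classic MapReduce engine
wordcount("yarn-tez", "/out/tez")   # same job routed through Tez
```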


2019 ◽  
Vol 4 (2) ◽  
pp. 235
Author(s):  
Firman Arifin ◽  
Budi Nur Iman ◽  
Elly Purwantini ◽  
...  

Understanding public interest and opinion is a necessary task in highly intense political competition. Big data analytics of social media provides an important source of information that candidates can utilize and manage, and through which they can engage voters in a targeted campaigning agenda. One such source of big data is social media interaction: social media empowers the public to participate proactively in campaigning activities. This paper examines trends gathered from data analytics of the two contending groups in the 2019 Indonesian election, tracking recent patterns of public engagement via social media analytics, specifically on Twitter. From these trends and patterns, the study develops the analysis into the proposed model.
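As an illustration only (not the authors' pipeline), engagement trends of the kind described can be surfaced by counting hashtag mentions over a batch of tweets; the tweets and tags below are made up.

```python
# Toy example of trend extraction: tally hashtag mentions in a batch
# of tweet texts. All data here is fabricated for illustration.
from collections import Counter

tweets = [
    "Debate tonight! #candidateA",
    "#candidateB rally drew a huge crowd",
    "#candidateA #candidateB who won?",
]

mentions = Counter(tag for t in tweets
                   for tag in t.split() if tag.startswith("#"))
print(mentions.most_common())  # engagement ranking per hashtag
```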


2017 ◽  
Vol 27 (01) ◽  
pp. 1740003 ◽  
Author(s):  
Claudia Misale ◽  
Maurizio Drocco ◽  
Marco Aldinucci ◽  
Guy Tremblay

In the world of Big Data analytics, a series of tools aims at simplifying the programming of applications to be executed on clusters. Although each tool claims to provide better programming, data and execution models (for which only informal, and often confusing, semantics is generally provided), all share a common underlying model, namely the Dataflow model. The model we propose shows how the various tools share the same expressiveness at different levels of abstraction. The contribution of this work is twofold: first, we show that the proposed model is (at least) as general as existing batch and streaming frameworks (e.g., Spark, Flink, Storm), thus making it easier to understand high-level data-processing applications written in such frameworks. Second, we provide a layered model that can represent tools and applications following the Dataflow paradigm, and we show how the analyzed tools fit into each level.
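A minimal Python sketch of the shared abstraction the paper identifies: a program as a graph of operators connected by data channels, which batch and streaming runtimes alike instantiate. The class and method names are illustrative assumptions, not the paper's formalism.

```python
# Minimal dataflow abstraction: operators form a graph; evaluation
# pulls data through the chain, independent of the executing engine.
from dataclasses import dataclass, field

@dataclass
class Operator:
    name: str
    fn: callable
    inputs: list = field(default_factory=list)

    def then(self, name, fn):
        """Append a downstream operator and return it."""
        return Operator(name, fn, inputs=[self])

    def evaluate(self, source):
        data = source if not self.inputs else self.inputs[0].evaluate(source)
        return self.fn(data)

# WordCount as a dataflow: tokenize -> count, engine-agnostic.
pipeline = (Operator("tokenize",
                     lambda lines: [w for l in lines for w in l.split()])
            .then("count",
                  lambda words: {w: words.count(w) for w in set(words)}))
print(pipeline.evaluate(["big data big analytics"]))
```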


2017 ◽  
Vol 21 (1) ◽  
pp. 1-6 ◽  
Author(s):  
David J. Pauleen ◽  
William Y.C. Wang

Purpose: This viewpoint study aims to make the case that the field of knowledge management (KM) must respond to the significant changes that big data/analytics is bringing to operationalizing the production of organizational data and information.

Design/methodology/approach: This study expresses the opinions of the guest editors of "Does Big Data Mean Big Knowledge? Knowledge Management Perspectives on Big Data and Analytics".

Findings: A Big Data/Analytics-Knowledge Management (BDA-KM) model is proposed that illustrates the centrality of knowledge as the guiding principle in the use of big data/analytics in organizations.

Research limitations/implications: This is an opinion piece, and the proposed model still needs to be empirically verified.

Practical implications: It is suggested that academics and practitioners in KM must be capable of controlling the application of big data/analytics, and the study calls for further research investigating how KM can conceptually and operationally use and integrate big data/analytics to foster organizational knowledge for better decision-making and organizational value creation.

Originality/value: The BDA-KM model is one of the early models placing knowledge as the primary consideration in the successful organizational use of big data/analytics.


Author(s):  
Adriano Fernandes ◽  
Jonathan Barretto ◽  
Jonas Fernandes

Big data analytics is becoming more popular every day as a tool for evaluating large volumes of data on demand. Apache Hadoop, Spark, Storm, and Flink are four of the most widely used big data processing frameworks. Although all four support big data analysis, they vary in how they are used and in the infrastructure that supports them. This paper defines a general collection of key performance indicators (KPIs), namely processing time, CPU usage, latency, execution time, performance, scalability, and fault tolerance, and contrasts the four frameworks against these KPIs in a literature review. For non-real-time workloads, Spark was found to beat the Apache Hadoop and Apache Storm frameworks on multiple KPIs, including processing time, CPU usage, latency, execution time, and scalability. In terms of processing time, CPU consumption, latency, execution time, and performance, Flink surpassed the Apache Spark and Apache Storm frameworks.
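For context, two of the surveyed KPIs (execution time and CPU use) can be measured for any candidate workload with only the standard library, as in this hedged sketch; the workload is a stand-in, not one of the reviewed frameworks.

```python
# Hedged micro-benchmark sketch: wall-clock execution time plus CPU
# utilisation (process CPU time over wall time) for a workload.
import time

def profile(workload, *args):
    wall0, cpu0 = time.perf_counter(), time.process_time()
    workload(*args)
    wall = time.perf_counter() - wall0
    cpu = time.process_time() - cpu0
    return {"execution_time_s": wall, "cpu_utilisation": cpu / wall}

# Stand-in workload: sort a reversed list of one million integers.
print(profile(sorted, list(range(1_000_000, 0, -1))))
```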


Author(s):  
Dr. Robert Bestak ◽  
Dr. S. Smys

The internet connectivity extended by the Internet of Things to the tangible things around us, used in our day-to-day life, has converted these devices into smart objects and led to the generation of huge data sets holding both valuable and invaluable information. To handle the generated information properly and mine the valuable parts from it, analytics is engaged in the cloud. For timely access, fog services are usually preferred over the cloud, as they bring cloud services down to the user's edge and reduce the time complexity of accessing the information. This paper therefore proposes big data analytics for a fog-assisted healthcare application to effectively handle the health information diagnosed for aged persons. The proposed model is simulated using the iFogSim toolkit to examine the performance of the fog-assisted smart healthcare application.
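A toy latency model, an assumption for illustration rather than part of the paper's iFogSim simulation, makes the fog argument concrete: the fog node sits at the network edge, so the propagation component of access latency shrinks even if per-node processing is slightly slower.

```python
# Toy access-latency model (illustrative assumption, not iFogSim):
# latency = round-trip propagation delay + service time at the node.
def access_latency_ms(propagation_ms: float, processing_ms: float) -> float:
    return 2 * propagation_ms + processing_ms

cloud = access_latency_ms(propagation_ms=80.0, processing_ms=5.0)
fog = access_latency_ms(propagation_ms=2.0, processing_ms=8.0)
print(f"cloud: {cloud} ms, fog: {fog} ms")  # fog wins on access time
```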


2019 ◽  
Vol 54 (5) ◽  
pp. 20
Author(s):  
Dheeraj Kumar Pradhan
