An independent time optimized hybrid infrastructure for big data analytics

2020 ◽  
Vol 34 (28) ◽  
pp. 2050311
Author(s):  
Satvik Vats ◽  
B. B. Sagar

In the Big Data domain, platform dependency can alter the behavior of a business because of the different kinds (structured, semi-structured and unstructured) and characteristics of the data. With traditional infrastructure, different kinds of data cannot be processed simultaneously, since each tool is platform-dependent for a particular task, so the responsibility of selecting suitable tools lies with the user. The variety of data generated by different sources calls for tool selection without human intervention. Further, these tools face resource limitations when dealing with large volumes of data, which degrades their performance in terms of execution time. Therefore, in this work we propose a model in which different data analytics tools share a common infrastructure to provide data independence and a resource-sharing environment: the proposed model shares a common (hybrid) Hadoop Distributed File System (HDFS) between three Name-Nodes (master nodes), three Data-Nodes and one Client-Node, which work under a DeMilitarized Zone (DMZ). To realize this model, we implemented Mahout, R-Hadoop and Splunk sharing a common HDFS. Using our model, we ran k-means clustering, Naïve Bayes and recommender algorithms on three different datasets, movie rating, newsgroup and Spam SMS, representing structured, semi-structured and unstructured data, respectively. Our model selected the appropriate tool in each case, e.g. Mahout for the newsgroup dataset, which the other tools cannot process; this shows that our model provides data independence. Results of the proposed model are further compared with the legacy (individual) model in terms of execution time and scalability. The improved performance of the proposed model establishes the hypothesis that it overcomes the resource limitations of the legacy model.
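The tool-selection step is the heart of the claimed data independence. Below is a minimal Python sketch of such a dispatcher, assuming a simple mapping from data kind to tool; the Mahout-newsgroup pairing follows the abstract, while the other two pairings and all names are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the model's tool-selection step: route each
# dataset to the analytics tool suited to its kind of data.
# Only the Mahout <-> newsgroup (semi-structured) pairing is stated in
# the abstract; the other two pairings are assumptions.

TOOL_FOR_KIND = {
    "structured": "R-Hadoop",      # e.g. movie-rating data (assumed pairing)
    "semi-structured": "Mahout",   # e.g. newsgroup data (per the abstract)
    "unstructured": "Splunk",      # e.g. Spam SMS logs (assumed pairing)
}

def select_tool(data_kind: str) -> str:
    """Return the analytics tool registered for a dataset's kind."""
    try:
        return TOOL_FOR_KIND[data_kind]
    except KeyError:
        raise ValueError(f"No registered tool for data kind: {data_kind}")

for dataset, kind in [("movie-rating", "structured"),
                      ("newsgroup", "semi-structured"),
                      ("spam-sms", "unstructured")]:
    print(f"{dataset}: dispatched to {select_tool(kind)}")
```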

Symmetry ◽  
2020 ◽  
Vol 12 (8) ◽  
pp. 1274 ◽  
Author(s):  
Satvik Vats ◽  
Bharat Bhushan Sagar ◽  
Karan Singh ◽  
Ali Ahmadian ◽  
Bruno A. Pansera

Traditional data analytics tools are designed to deal with asymmetrical types of data, i.e., structured, semi-structured, and unstructured. The diverse behavior of data produced by different sources requires the selection of suitable tools, and the restriction of resources when dealing with a huge volume of data is a challenge that affects the tools' execution time. Therefore, in the present paper we propose a time-optimization model that shares a common HDFS (Hadoop Distributed File System) between three Name-Nodes (master nodes), three Data-Nodes, and one Client-Node. These nodes work under a DeMilitarized Zone (DMZ) to maintain symmetry. Machine-learning jobs are explored from an independent platform to realize this model. In the first node (Name-node 1), Mahout is installed with all machine-learning libraries through the Maven repositories. In the second node (Name-node 2), R is connected to Hadoop and runs through Shiny Server. Splunk is configured in the third node (Name-node 3) and is used to analyze the logs. Experiments are performed between the proposed and legacy models to evaluate response time, execution time, and throughput. K-means clustering, Naïve Bayes, and recommender algorithms are run on three different data sets, i.e., movie rating, newsgroup, and Spam SMS, representing structured, semi-structured, and unstructured data, respectively. The selection of tools defines data independence, e.g., the newsgroup data set runs on Mahout because the other tools are not compatible with this data. The outcomes show that the performance of the proposed model establishes the hypothesis that it overcomes the resource limitations of the legacy model. In addition, the proposed model can process any kind of algorithm on different data sets residing in their native formats.
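A hedged sketch of the evaluation loop described here, measuring execution time for one job and deriving throughput from the input size; the job command is a placeholder, and only the measured quantities (execution time, throughput) come from the abstract.

```python
# Illustrative harness (assumption, not the authors' code): time one
# analytics job and report execution time and throughput.
import subprocess
import time

def run_job(cmd: list[str], input_bytes: int) -> dict:
    """Run one job via the shell and report timing-based metrics."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    elapsed = time.perf_counter() - start
    return {
        "execution_time_s": elapsed,
        "throughput_mb_s": (input_bytes / 1e6) / elapsed,
    }

# Placeholder invocation; the real jobs run on the shared HDFS cluster.
stats = run_job(["echo", "k-means on movie-rating data"],
                input_bytes=250_000_000)
print(stats)
```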


2018 ◽  
Vol 7 (2.26) ◽  
pp. 80
Author(s):  
Dr E. Laxmi Lydia ◽  
M Srinivasa Rao

Big Data is among the most prominent subjects across the cloud research area; its main characteristics are volume, velocity and variety, which are difficult to manage through traditional software and the various methodologies available for it. Data arriving from the various domains of Big Data is handled through Hadoop, an open-source framework developed to provide solutions at scale. Big data analytics in Hadoop is done through the MapReduce framework, which is the key engine of a Hadoop cluster and is extensively used these days; it is a batch-processing system. Apache developed an engine named Tez, which supports interactive queries and does not write temporary data into the Hadoop Distributed File System (HDFS). This paper focuses on a performance comparison of MapReduce and Tez; the performance of the two engines is examined under compression of the input files and of the map output files. To compare the two engines, we used the Bzip2 compression algorithm for the input files and Snappy for the map output files. The Word Count and Terasort benchmarks are used in our experiments. For the Word Count benchmark, the results show that the Tez engine has better execution time than the Hadoop MapReduce engine for both compressed and non-compressed data, reducing execution time by nearly 39% compared to the Hadoop MapReduce engine. Conversely, for the Terasort benchmark, the Tez engine has a higher execution time than the Hadoop MapReduce engine.
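The following Python sketch illustrates how such a comparison can be driven, assuming Hadoop and Tez are installed: the same WordCount job is submitted twice, once on the classic MapReduce engine and once on Tez via mapreduce.framework.name=yarn-tez, with Snappy compression of map output. The jar name and HDFS paths are placeholders; the Hadoop property names are standard.

```python
# Sketch of the benchmark set-up: WordCount over the same bzip2 input
# under MapReduce and under Tez, with Snappy-compressed map output.
# Jar and paths are placeholders; run where Hadoop/Tez are configured.
import subprocess

COMMON = [
    "-D", "mapreduce.map.output.compress=true",
    "-D", "mapreduce.map.output.compress.codec="
          "org.apache.hadoop.io.compress.SnappyCodec",
]

def wordcount(framework: str, out_dir: str) -> None:
    subprocess.run(
        ["hadoop", "jar", "hadoop-mapreduce-examples.jar", "wordcount",
         "-D", f"mapreduce.framework.name={framework}", *COMMON,
         # bzip2 input is splittable and decompressed transparently:
         "/data/words.bz2", out_dir],
        check=True,
    )

wordcount("yarn", "/out/mr")        # classic MapReduce engine
wordcount("yarn-tez", "/out/tez")   # same job routed through Tez
```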


2019 ◽  
Vol 4 (2) ◽  
pp. 235
Author(s):  
Firman Arifin ◽  
Budi Nur Iman ◽  
Elly Purwantini ◽  
...  

Understanding public interest and opinion is a necessary task in highly intense political competition. Big data analytics of social media provides an important source of information that candidates can utilize and manage, and through which they can engage voters in a targeted campaigning agenda. One such source of big data is social media interaction: social media empowers the public to participate proactively in campaigning activities. This paper examines trends gathered from data analytics of the two contending groups in the 2019 Indonesian election, tracking recent patterns of public engagement via social media analytics, specifically on Twitter. From these trends and patterns, the study develops the analysis into the proposed model.
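As an illustration only (not the authors' pipeline), engagement trends of the kind described can be surfaced by counting hashtag mentions over a batch of tweets; the tweets and tags below are made up.

```python
# Toy example of trend extraction: tally hashtag mentions in a batch
# of tweet texts. All data here is fabricated for illustration.
from collections import Counter

tweets = [
    "Debate tonight! #candidateA",
    "#candidateB rally drew a huge crowd",
    "#candidateA #candidateB who won?",
]

mentions = Counter(tag for t in tweets
                   for tag in t.split() if tag.startswith("#"))
print(mentions.most_common())  # engagement ranking per hashtag
```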


2017 ◽  
Vol 27 (01) ◽  
pp. 1740003 ◽  
Author(s):  
Claudia Misale ◽  
Maurizio Drocco ◽  
Marco Aldinucci ◽  
Guy Tremblay

In the world of Big Data analytics, a series of tools aims at simplifying the programming of applications to be executed on clusters. Although each tool claims to provide better programming, data and execution models (for which only informal, and often confusing, semantics is generally provided), all share a common underlying model, namely the Dataflow model. The model we propose shows how the various tools share the same expressiveness at different levels of abstraction. The contribution of this work is twofold: first, we show that the proposed model is (at least) as general as existing batch and streaming frameworks (e.g., Spark, Flink, Storm), thus making it easier to understand high-level data-processing applications written in such frameworks. Second, we provide a layered model that can represent tools and applications following the Dataflow paradigm, and we show how the analyzed tools fit into each level.
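A minimal Python sketch of the shared abstraction the paper identifies: a program as a graph of operators connected by data channels, which batch and streaming runtimes alike instantiate. The class and method names are illustrative assumptions, not the paper's formalism.

```python
# Minimal dataflow abstraction: operators form a graph; evaluation
# pulls data through the chain, independent of the executing engine.
from dataclasses import dataclass, field

@dataclass
class Operator:
    name: str
    fn: callable
    inputs: list = field(default_factory=list)

    def then(self, name, fn):
        """Append a downstream operator and return it."""
        return Operator(name, fn, inputs=[self])

    def evaluate(self, source):
        data = source if not self.inputs else self.inputs[0].evaluate(source)
        return self.fn(data)

# WordCount as a dataflow: tokenize -> count, engine-agnostic.
pipeline = (Operator("tokenize",
                     lambda lines: [w for l in lines for w in l.split()])
            .then("count",
                  lambda words: {w: words.count(w) for w in set(words)}))
print(pipeline.evaluate(["big data big analytics"]))
```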


2017 ◽  
Vol 21 (1) ◽  
pp. 1-6 ◽  
Author(s):  
David J. Pauleen ◽  
William Y.C. Wang

Purpose: This viewpoint study aims to make the case that the field of knowledge management (KM) must respond to the significant changes that big data/analytics is bringing to operationalizing the production of organizational data and information.

Design/methodology/approach: This study expresses the opinions of the guest editors of "Does Big Data Mean Big Knowledge? Knowledge Management Perspectives on Big Data and Analytics".

Findings: A Big Data/Analytics-Knowledge Management (BDA-KM) model is proposed that illustrates the centrality of knowledge as the guiding principle in the use of big data/analytics in organizations.

Research limitations/implications: This is an opinion piece, and the proposed model still needs to be empirically verified.

Practical implications: It is suggested that academics and practitioners in KM must be capable of controlling the application of big data/analytics, and the study calls for further research investigating how KM can conceptually and operationally use and integrate big data/analytics to foster organizational knowledge for better decision-making and organizational value creation.

Originality/value: The BDA-KM model is one of the early models placing knowledge as the primary consideration in the successful organizational use of big data/analytics.


Author(s):  
Adriano Fernandes ◽  
Jonathan Barretto ◽  
Jonas Fernandes

Big data analytics is becoming more popular every day as a tool for evaluating large volumes of data on demand. Apache Hadoop, Spark, Storm, and Flink are four of the most widely used big data processing frameworks. Although all four support big data analysis, they vary in how they are used and in the infrastructure that supports them. This paper defines a general collection of key performance indicators (KPIs), namely processing time, CPU usage, latency, execution time, performance, scalability, and fault tolerance, and contrasts the four frameworks against these KPIs in a literature review. For non-real-time workloads, Spark was found to beat the Apache Hadoop and Apache Storm frameworks on multiple KPIs, including processing time, CPU usage, latency, execution time, and scalability. In terms of processing time, CPU consumption, latency, execution time, and performance, Flink surpassed the Apache Spark and Apache Storm frameworks.
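For context, two of the surveyed KPIs (execution time and CPU use) can be measured for any candidate workload with only the standard library, as in this hedged sketch; the workload is a stand-in, not one of the reviewed frameworks.

```python
# Hedged micro-benchmark sketch: wall-clock execution time plus CPU
# utilisation (process CPU time over wall time) for a workload.
import time

def profile(workload, *args):
    wall0, cpu0 = time.perf_counter(), time.process_time()
    workload(*args)
    wall = time.perf_counter() - wall0
    cpu = time.process_time() - cpu0
    return {"execution_time_s": wall, "cpu_utilisation": cpu / wall}

# Stand-in workload: sort a reversed list of one million integers.
print(profile(sorted, list(range(1_000_000, 0, -1))))
```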


Author(s):  
Dr. Robert Bestak ◽  
Dr. S. Smys

The internet connectivity extended by the Internet of Things to the tangible things around us, used in our day-to-day life, has converted these devices into smart objects and led to the generation of huge data sets holding both valuable and invaluable information. To handle the generated information properly and mine the valuable parts from it, analytics is engaged in the cloud. For timely access, fog services are usually preferred over the cloud, as they bring cloud services down to the user's edge and reduce the time complexity of accessing the information. This paper therefore proposes big data analytics for a fog-assisted healthcare application to effectively handle the health information diagnosed for aged persons. The proposed model is simulated using the iFogSim toolkit to examine the performance of the fog-assisted smart healthcare application.
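A toy latency model, an assumption for illustration rather than part of the paper's iFogSim simulation, makes the fog argument concrete: the fog node sits at the network edge, so the propagation component of access latency shrinks even if per-node processing is slightly slower.

```python
# Toy access-latency model (illustrative assumption, not iFogSim):
# latency = round-trip propagation delay + service time at the node.
def access_latency_ms(propagation_ms: float, processing_ms: float) -> float:
    return 2 * propagation_ms + processing_ms

cloud = access_latency_ms(propagation_ms=80.0, processing_ms=5.0)
fog = access_latency_ms(propagation_ms=2.0, processing_ms=8.0)
print(f"cloud: {cloud} ms, fog: {fog} ms")  # fog wins on access time
```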


2019 ◽  
Vol 54 (5) ◽  
pp. 20
Author(s):  
Dheeraj Kumar Pradhan
