Transparent Network Memory Storage for Efficient Container Execution in Big Data Clouds

Author(s):  
Juhyun Bae ◽  
Ling Liu ◽  
KaHo Chow ◽  
Yanzhao Wu ◽  
Gong Su ◽  
...


2016 ◽  
Vol 33 (6) ◽  
pp. 1680-1704 ◽  
Author(s):  
Bao-Rong Chang ◽  
Hsiu-Fen Tsai ◽  
Yun-Che Tsai ◽  
Chin-Fu Kuo ◽  
Chi-Chung Chen

Purpose – The purpose of this paper is to integrate and optimize a multiple big data processing platform with the features of high performance, high availability and high scalability in a big data environment. Design/methodology/approach – First, the integration of Apache Hive, Cloudera Impala and BDAS Shark makes the platform support SQL-like queries. Next, users access a single interface, and the proposed optimizer automatically selects the best-performing big data warehouse platform. Finally, the distributed memory storage system Memcached, incorporated into the distributed file system Apache HDFS, is employed for fast caching of query results. Therefore, if users issue the same SQL command, the same result is returned rapidly from the cache system instead of repeating the search in the big data warehouse and taking a longer time to retrieve it. Findings – As a result, the proposed approach significantly improves overall performance and dramatically reduces search time when querying a database, especially for highly repeatable SQL commands under multi-user mode. Research limitations/implications – Currently, Shark's latest stable version 0.9.1 does not support the latest versions of Spark and Hive. In addition, this series of software only supports Oracle JDK7; using Oracle JDK8 or OpenJDK causes serious errors, and some software will be unable to run. Practical implications – One problem with this system is that some blocks are missing when too many blocks are stored in one result (about 100,000 records). Another problem is that sequential writing into the in-memory cache wastes time. Originality/value – When the remaining memory capacity is 2 GB or less on each server, Impala and Shark incur heavy page swapping, causing extremely low performance. When the data scale is larger, it may cause a JVM I/O exception and crash the program. However, when the remaining memory capacity is sufficient, Shark is faster than Hive and Impala. Impala's consumption of memory resources is between those of Shark and Hive, and this amount of remaining memory is sufficient for Impala's maximum performance. In this study, each server allocates 20 GB of memory for cluster computing and sets the amount of remaining memory at Level 1: 3 percent (0.6 GB), Level 2: 15 percent (3 GB) and Level 3: 75 percent (15 GB) as the critical points. The program automatically selects Hive when the remaining memory is less than 15 percent, Impala at 15 to 75 percent, and Shark at more than 75 percent.
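The selection rule in the last sentence can be written out directly. The sketch below is an illustrative Python version of that threshold logic, not the authors' code; the psutil call used to read free memory and the fixed 20 GB per-node budget are assumptions for the example.

```python
# Illustrative sketch of the engine-selection rule from the abstract.
# Assumptions: psutil is available, and 20 GB per server is reserved for the cluster.
import psutil

NODE_BUDGET_GB = 20.0  # memory allocated per server for cluster computing

def select_engine(remaining_gb: float) -> str:
    """Pick a warehouse engine from the fraction of the 20 GB budget still free."""
    remaining_ratio = remaining_gb / NODE_BUDGET_GB
    if remaining_ratio < 0.15:     # below Level 2 (3 GB): fall back to disk-based Hive
        return "Hive"
    elif remaining_ratio < 0.75:   # between 3 GB and 15 GB: Impala
        return "Impala"
    else:                          # above Level 3 (15 GB): in-memory Shark
        return "Shark"

if __name__ == "__main__":
    free_gb = psutil.virtual_memory().available / 2**30
    print(select_engine(free_gb))
```

In the described system this selection would only run on a cache miss, since a repeated SQL command is answered directly from the Memcached layer.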


2018 ◽  
Vol 7 (3.29) ◽  
pp. 12
Author(s):  
L Chandra Sekhar Reddy ◽  
Dr D. Murali

We live today in a digital world where a tremendous amount of data is generated by each digital service we use. This vast amount of data is called Big Data. According to Wikipedia, Big Data is a term for data sets so large or complex that traditional data-processing application software is inadequate to deal with them [5]. The challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating, and maintaining the confidentiality of information. Google's streaming service, YouTube, is one of the best examples of services that produce a massive amount of data in a brief period. Extraction from this significant amount of data is done using Hadoop and MapReduce to measure performance. Hadoop is a framework that offers reliable, distributed storage through HDFS (Hadoop Distributed File System) and analysis through MapReduce. MapReduce is a programming model and a corresponding implementation for processing large data sets. This article presents the analysis of Big Data on YouTube using the Hadoop and MapReduce techniques.
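As a rough illustration of the MapReduce model mentioned above, the sketch below expresses a per-category count over a YouTube dataset as a single mrjob job in Python. The tab-separated layout and the category sitting in the fourth column are assumptions for the example, not the dataset's documented schema.

```python
# category_count.py -- minimal MapReduce sketch using the mrjob library.
# Assumption: each input line is a tab-separated YouTube record with the
# video category in the fourth field.
from mrjob.job import MRJob

class CategoryCount(MRJob):
    """Counts YouTube videos per category, MapReduce-style."""

    def mapper(self, _, line):
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 3:
            yield fields[3], 1          # emit (category, 1) pairs

    def reducer(self, category, counts):
        yield category, sum(counts)     # aggregate the counts per category

if __name__ == "__main__":
    CategoryCount.run()
```

Run locally with `python category_count.py input.tsv`, or on a cluster with `python category_count.py -r hadoop hdfs:///path/to/youtube.tsv`.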


2019 ◽  
Vol 6 (1) ◽  
Author(s):  
Saad Ahmed Dheyab ◽  
Mohammed Najm Abdullah ◽  
Buthainah Fahran Abed

Abstract: The analysis and processing of big data are one of the most important challenges that researchers are working on, seeking approaches that handle it with high performance, low cost and high accuracy. In this paper, a novel approach for big data processing and management is proposed that differs from existing ones: the proposed method employs not only main memory to read and handle big data, but also memory-mapped space that extends beyond main-memory storage. From a methodological viewpoint, the novelty of this paper is the segmentation stage of big data using memory mapping and the broadcasting of all segments to a number of processors using a parallel message passing interface. From an application viewpoint, the paper presents a high-performance approach based on a homogeneous network that works in parallel to encrypt and decrypt big data using the AES algorithm. The approach is implemented on the Windows operating system using .NET libraries.
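The paper's implementation uses .NET on Windows; the following is only a minimal Python sketch of the same idea, in which a large file is memory-mapped, cut into fixed-size segments, and the segments are encrypted with AES in parallel. The segment size, the demo key, the ECB mode (chosen only for brevity), and the use of multiprocessing in place of an MPI broadcast are all assumptions.

```python
# Minimal sketch: memory-mapped segmentation plus parallel AES encryption.
import mmap
import os
from multiprocessing import Pool
from Crypto.Cipher import AES  # pycryptodome

KEY = b"0123456789abcdef"        # 16-byte demo key (assumption, not a real secret)
SEGMENT = 16 * 1024 * 1024       # 16 MiB segments (assumption)

def encrypt_segment(args):
    """Read one segment through a memory map and return its AES ciphertext."""
    path, offset, length = args
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        chunk = mm[offset:offset + length]
    if len(chunk) % 16:                      # pad to the AES block size
        chunk += b"\0" * (16 - len(chunk) % 16)
    return AES.new(KEY, AES.MODE_ECB).encrypt(chunk)  # ECB only for brevity

def encrypt_file(path):
    """Split the file into segments and encrypt them in parallel worker processes."""
    size = os.path.getsize(path)
    tasks = [(path, off, min(SEGMENT, size - off)) for off in range(0, size, SEGMENT)]
    with Pool() as pool:                     # stands in for the MPI broadcast of segments
        return pool.map(encrypt_segment, tasks)
```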


Author(s):  
Alessandra Bernardi ◽  
Martina Iannacito ◽  
Duccio Rocchini

Abstract: We propose a new method to estimate plant diversity with the Rényi and Rao indexes through the so-called High Order Singular Value Decomposition (HOSVD) of tensors. Starting from NASA multispectral images, we evaluate diversity and compare the original diversity estimates with those obtained via the HOSVD compression methods for big data. Our strategy turns out to be extremely powerful in terms of memory storage and precision of the outcome. The obtained results are promising enough to support the efficiency of our method in the ecological framework.
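A minimal NumPy sketch of the two ingredients named above: a truncated HOSVD that compresses a multispectral data cube mode by mode, and the Rényi diversity index of order alpha computed on an abundance vector. The multilinear ranks and the way the probability vector is built are assumptions for illustration, not the authors' processing pipeline.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_multiply(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` along axis `mode`."""
    t = np.moveaxis(tensor, mode, 0)
    res = matrix @ t.reshape(t.shape[0], -1)
    return np.moveaxis(res.reshape((matrix.shape[0],) + t.shape[1:]), 0, mode)

def truncated_hosvd(tensor, ranks):
    """Compress `tensor` by keeping `ranks[n]` leading singular vectors per mode."""
    factors = []
    for mode, r in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :r])
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_multiply(core, u.T, mode)
    return core, factors

def renyi_index(p, alpha=2.0):
    """Rényi diversity of order alpha for an abundance vector p (alpha != 1)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0] / p.sum()
    return np.log(np.sum(p ** alpha)) / (1.0 - alpha)

# Example: compress a (bands x rows x cols) cube to multilinear rank (2, 10, 10),
# then reconstruct the approximation from the core and the factor matrices.
cube = np.random.rand(4, 50, 50)
core, factors = truncated_hosvd(cube, ranks=(2, 10, 10))
approx = core
for mode, u in enumerate(factors):
    approx = mode_multiply(approx, u, mode)
print(renyi_index(np.bincount(np.random.randint(0, 5, 1000)), alpha=2.0))
```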


The rise of IoT real-time data has led to new demands for mining systems to learn complex models with millions to billions of parameters, which promise adequate capacity to digest massive datasets and offer powerful predictive analytics. To support Big Data mining, high-performance computing platforms are required, which in turn demand careful designs to unleash the full power of the Big Data. Pattern mining poses a lot of interesting research problems, and there are many areas that are still not well understood. A very elementary challenge is to distinguish the meaningful data, referred to as "Smart Data", from the junk data that pours into the internet. Eighty-five percent of all data are noisy or meaningless, and it is a very tough task to verify, separate, and refine the data from the noisy junk. The authors propose a distributed pattern-mining algorithm to address the heterogeneity, scaling, and hidden-data problems of Big Data. The algorithm is evaluated on parameters such as cost, speed, space, and overhead, with IoT used as the source of heterogeneous Big Data. In this paper, we present the results of all tests, which show that the new method gives accurate results and valid outputs, verified against the results of other established methods. The results also show that the new method can handle big datasets, determine frequent patterns, and produce association rule sets faster than conventional methods while using less memory for processing. Overall, the new method offers competitive performance in memory storage and processing speed compared with conventional frequent-pattern-mining methods such as the Apriori and FP-Growth techniques.
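For reference, the Apriori baseline that the proposed method is compared against works roughly as in the sketch below, a simplified single-machine Python version rather than the authors' distributed algorithm; the example transactions are invented IoT sensor events.

```python
# Simplified Apriori sketch: mine frequent itemsets from a list of transactions.
from itertools import combinations  # kept for readers extending this to rule generation

def apriori(transactions, min_support=2):
    """Return frequent itemsets (as frozensets) with their absolute support counts."""
    transactions = [set(t) for t in transactions]
    k_sets = {frozenset([item]) for t in transactions for item in t}
    frequent, k = {}, 1
    while k_sets:
        # count support of each candidate, keep those meeting the threshold
        counts = {c: sum(1 for t in transactions if c <= t) for c in k_sets}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # join frequent k-itemsets into (k+1)-itemset candidates
        k += 1
        k_sets = {a | b for a in level for b in level if len(a | b) == k}
    return frequent

# Example: sensor-event transactions from an assumed IoT stream
events = [["temp", "humid"], ["temp", "motion"], ["temp", "humid", "motion"], ["humid"]]
print(apriori(events, min_support=2))
```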


Author(s):  
Xiaochun Tang ◽  
Jiawen Zhou

Aim. The MRO2 system is a data management platform that manages and stores all kinds of data across a product's lifecycle; both mass storage capacity and scalability are therefore required. Existing big-data stores for the MRO2 system either focus only on the storage problem or address only the scalability issue. In this paper, a two-layer data management model is proposed, in which the top layer uses memory storage for scalability and the bottom layer uses a distributed key-value (KV) store for mass storage. By adding a key-group middle layer between the application and the KV storage system, the keys needed for real-time processing are grouped and cached on a single node. This satisfies the characteristics of real-time applications and improves dynamic scalability. The proposed protocol for dynamic key groups supporting real-time distributed computation in the MRO2 system is explained in detail, and the protocol for creating and deleting key groups is then introduced. The third topic is an implementation of a big-data store supporting the MRO2 system, in which the delay times for creating and deleting dynamic transaction groups are used for estimation. Finally, experiments to evaluate the proposed method are carried out. Its response time compares favorably with the other methods used in big-data storage systems.
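A minimal sketch of the two-layer idea, assuming a generic key-value backend with get/put methods: the top layer caches a named key group in memory so the keys needed together for real-time processing are served from one place, and writes the values back when the group is deleted. The class and method names are assumptions for illustration, not the paper's protocol.

```python
# Sketch of a key-group cache layered over a distributed key-value store.
class KeyGroupStore:
    def __init__(self, kv_backend):
        self.kv = kv_backend          # bottom layer: distributed key-value store
        self.groups = {}              # top layer: group name -> {key: value} cache

    def create_group(self, group, keys):
        """Pull the group's keys from the KV store into the in-memory cache."""
        self.groups[group] = {k: self.kv.get(k) for k in keys}

    def get(self, group, key):
        """Serve reads from the cached group; fall back to the KV store."""
        return self.groups.get(group, {}).get(key, self.kv.get(key))

    def put(self, group, key, value):
        """Write into the cached group so hot keys stay local until the group is dropped."""
        self.groups.setdefault(group, {})[key] = value

    def delete_group(self, group):
        """Flush the cached values back to the KV store and drop the group."""
        for k, v in self.groups.pop(group, {}).items():
            self.kv.put(k, v)
```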


2016 ◽  
Vol 2016 ◽  
pp. 1-15 ◽  
Author(s):  
Aymen Abdullah Alsaffar ◽  
Hung Phuoc Pham ◽  
Choong-Seon Hong ◽  
Eui-Nam Huh ◽  
Mohammad Aazam

Despite the wide utilization of cloud computing (e.g., services, applications, and resources), some services, applications, and smart devices are not able to fully benefit from this attractive cloud computing paradigm due to the following issues: (1) smart devices might be lacking in capacity (e.g., processing, memory, storage, battery, and resource allocation), (2) they might be lacking in network resources, and (3) the high network latency to a centralized server in the cloud might not be acceptable for delay-sensitive applications, services, and resource allocation requests. Fog computing is a promising paradigm that can extend cloud resources to the edge of the network, solving the abovementioned issues. As a result, in this work we propose an architecture for IoT service delegation and resource allocation based on collaboration between fog and cloud computing. We provide a new algorithm consisting of the decision rules of a linearized decision tree based on three conditions (service size, completion time, and VM capacity) for managing and delegating user requests in order to balance the workload. Moreover, we propose an algorithm to allocate resources that meets service level agreement (SLA) and quality of service (QoS) requirements, as well as optimizing big data distribution across fog and cloud computing. Our simulation results show that the proposed approach can efficiently balance the workload, improve resource allocation, optimize big data distribution, and outperform other existing methods.
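A minimal sketch of the three-condition delegation rules described above, written as plain Python; the threshold values and the ordering of the checks are assumptions for illustration, not the paper's tuned decision tree.

```python
# Sketch of fog-vs-cloud delegation based on service size, completion time, and VM capacity.
def delegate(service_size_mb, completion_deadline_s, fog_vm_capacity_mb):
    """Decide whether a request is served at the fog edge or delegated to the cloud."""
    if service_size_mb > fog_vm_capacity_mb:
        return "cloud"        # the fog VM cannot hold the service at all
    if completion_deadline_s < 1.0:
        return "fog"          # delay-sensitive: keep the work at the network edge
    if service_size_mb < 0.5 * fog_vm_capacity_mb:
        return "fog"          # small enough to run locally without exhausting the VM
    return "cloud"            # large, delay-tolerant work goes to the data center

print(delegate(service_size_mb=200, completion_deadline_s=0.5, fog_vm_capacity_mb=512))
```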


ASHA Leader ◽  
2013 ◽  
Vol 18 (2) ◽  
pp. 59-59

Find Out About 'Big Data' to Track Outcomes

