Transparent Network Memory Storage for Efficient Container Execution in Big Data Clouds

Author(s):  
Juhyun Bae ◽  
Ling Liu ◽  
KaHo Chow ◽  
Yanzhao Wu ◽  
Gong Su ◽  
...


2016 ◽  
Vol 33 (6) ◽  
pp. 1680-1704 ◽  
Author(s):  
Bao-Rong Chang ◽  
Hsiu-Fen Tsai ◽  
Yun-Che Tsai ◽  
Chin-Fu Kuo ◽  
Chi-Chung Chen

Purpose – The purpose of this paper is to integrate and optimize a multiple big data processing platform with the features of high performance, high availability and high scalability in a big data environment. Design/methodology/approach – First, the integration of Apache Hive, Cloudera Impala and BDAS Shark makes the platform support SQL-like queries. Next, users access a single interface, and the proposed optimizer automatically selects the best-performing big data warehouse platform. Finally, the distributed memory storage system Memcached, incorporated into the distributed file system Apache HDFS, is employed for fast caching of query results. Therefore, if users issue the same SQL command, the same result is returned rapidly from the cache system instead of repeating the search in the big data warehouse and taking a longer time to retrieve it. Findings – As a result, the proposed approach significantly improves overall performance and dramatically reduces search time when querying a database, especially for highly repeatable SQL commands under multi-user mode. Research limitations/implications – Currently, Shark's latest stable version 0.9.1 does not support the latest versions of Spark and Hive. In addition, this series of software only supports Oracle JDK7; using Oracle JDK8 or OpenJDK causes serious errors, and some software will be unable to run. Practical implications – One problem with this system is that some blocks are missing when too many blocks are stored in one result (about 100,000 records). Another problem is that sequential writing into the in-memory cache wastes time. Originality/value – When the remaining memory capacity is 2 GB or less on each server, Impala and Shark incur heavy page swapping, causing extremely low performance. When the data scale is larger, it may cause a JVM I/O exception and crash the program. However, when the remaining memory capacity is sufficient, Shark is faster than Hive and Impala. Impala's consumption of memory resources is between those of Shark and Hive, and this amount of remaining memory is sufficient for Impala's maximum performance. In this study, each server allocates 20 GB of memory for cluster computing and sets the amount of remaining memory at Level 1: 3 percent (0.6 GB), Level 2: 15 percent (3 GB) and Level 3: 75 percent (15 GB) as the critical points. The program automatically selects Hive when the remaining memory is less than 15 percent, Impala at 15 to 75 percent, and Shark at more than 75 percent.
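The selection rule in the last sentence can be written out directly. The sketch below is an illustrative Python version of that threshold logic, not the authors' code; the psutil call used to read free memory and the fixed 20 GB per-node budget are assumptions for the example.

```python
# Illustrative sketch of the engine-selection rule from the abstract.
# Assumptions: psutil is available, and 20 GB per server is reserved for the cluster.
import psutil

NODE_BUDGET_GB = 20.0  # memory allocated per server for cluster computing

def select_engine(remaining_gb: float) -> str:
    """Pick a warehouse engine from the fraction of the 20 GB budget still free."""
    remaining_ratio = remaining_gb / NODE_BUDGET_GB
    if remaining_ratio < 0.15:     # below Level 2 (3 GB): fall back to disk-based Hive
        return "Hive"
    elif remaining_ratio < 0.75:   # between 3 GB and 15 GB: Impala
        return "Impala"
    else:                          # above Level 3 (15 GB): in-memory Shark
        return "Shark"

if __name__ == "__main__":
    free_gb = psutil.virtual_memory().available / 2**30
    print(select_engine(free_gb))
```

In the described system this selection would only run on a cache miss, since a repeated SQL command is answered directly from the Memcached layer.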


2018 ◽  
Vol 7 (3.29) ◽  
pp. 12
Author(s):  
L Chandra Sekhar Reddy ◽  
Dr D. Murali

We live today in a digital world where a tremendous amount of data is generated by each digital service we use. This vast amount of data is called Big Data. According to Wikipedia, Big Data is a term for data sets so large or complex that traditional data-processing application software is inadequate to deal with them [5]. The challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating, and maintaining the confidentiality of information. Google's streaming service, YouTube, is one of the best examples of services that produce a massive amount of data in a brief period. Extraction from this significant amount of data is done using Hadoop and MapReduce to measure performance. Hadoop is a framework that offers reliable, distributed storage through HDFS (Hadoop Distributed File System) and analysis through MapReduce. MapReduce is a programming model and a corresponding implementation for processing large data sets. This article presents the analysis of Big Data on YouTube using the Hadoop and MapReduce techniques.
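As a rough illustration of the MapReduce model mentioned above, the sketch below expresses a per-category count over a YouTube dataset as a single mrjob job in Python. The tab-separated layout and the category sitting in the fourth column are assumptions for the example, not the dataset's documented schema.

```python
# category_count.py -- minimal MapReduce sketch using the mrjob library.
# Assumption: each input line is a tab-separated YouTube record with the
# video category in the fourth field.
from mrjob.job import MRJob

class CategoryCount(MRJob):
    """Counts YouTube videos per category, MapReduce-style."""

    def mapper(self, _, line):
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 3:
            yield fields[3], 1          # emit (category, 1) pairs

    def reducer(self, category, counts):
        yield category, sum(counts)     # aggregate the counts per category

if __name__ == "__main__":
    CategoryCount.run()
```

Run locally with `python category_count.py input.tsv`, or on a cluster with `python category_count.py -r hadoop hdfs:///path/to/youtube.tsv`.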


2019 ◽  
Vol 6 (1) ◽  
Author(s):  
Saad Ahmed Dheyab ◽  
Mohammed Najm Abdullah ◽  
Buthainah Fahran Abed

Abstract: The analysis and processing of big data are one of the most important challenges that researchers are working on, seeking approaches that handle it with high performance, low cost and high accuracy. In this paper, a novel approach for big data processing and management is proposed that differs from existing ones: the proposed method employs not only main memory to read and handle big data, but also memory-mapped space that extends beyond main-memory storage. From a methodological viewpoint, the novelty of this paper is the segmentation stage of big data using memory mapping and the broadcasting of all segments to a number of processors using a parallel message passing interface. From an application viewpoint, the paper presents a high-performance approach based on a homogeneous network that works in parallel to encrypt and decrypt big data using the AES algorithm. The approach is implemented on the Windows operating system using .NET libraries.
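The paper's implementation uses .NET on Windows; the following is only a minimal Python sketch of the same idea, in which a large file is memory-mapped, cut into fixed-size segments, and the segments are encrypted with AES in parallel. The segment size, the demo key, the ECB mode (chosen only for brevity), and the use of multiprocessing in place of an MPI broadcast are all assumptions.

```python
# Minimal sketch: memory-mapped segmentation plus parallel AES encryption.
import mmap
import os
from multiprocessing import Pool
from Crypto.Cipher import AES  # pycryptodome

KEY = b"0123456789abcdef"        # 16-byte demo key (assumption, not a real secret)
SEGMENT = 16 * 1024 * 1024       # 16 MiB segments (assumption)

def encrypt_segment(args):
    """Read one segment through a memory map and return its AES ciphertext."""
    path, offset, length = args
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        chunk = mm[offset:offset + length]
    if len(chunk) % 16:                      # pad to the AES block size
        chunk += b"\0" * (16 - len(chunk) % 16)
    return AES.new(KEY, AES.MODE_ECB).encrypt(chunk)  # ECB only for brevity

def encrypt_file(path):
    """Split the file into segments and encrypt them in parallel worker processes."""
    size = os.path.getsize(path)
    tasks = [(path, off, min(SEGMENT, size - off)) for off in range(0, size, SEGMENT)]
    with Pool() as pool:                     # stands in for the MPI broadcast of segments
        return pool.map(encrypt_segment, tasks)
```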


Author(s):  
Alessandra Bernardi ◽  
Martina Iannacito ◽  
Duccio Rocchini

Abstract: We propose a new method to estimate plant diversity with the Rényi and Rao indexes through the so-called High Order Singular Value Decomposition (HOSVD) of tensors. Starting from NASA multispectral images, we evaluate diversity and compare the original diversity estimates with those obtained via the HOSVD compression methods for big data. Our strategy turns out to be extremely powerful in terms of memory storage and precision of the outcome. The obtained results are promising enough to support the efficiency of our method in the ecological framework.
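A minimal NumPy sketch of the two ingredients named above: a truncated HOSVD that compresses a multispectral data cube mode by mode, and the Rényi diversity index of order alpha computed on an abundance vector. The multilinear ranks and the way the probability vector is built are assumptions for illustration, not the authors' processing pipeline.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_multiply(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` along axis `mode`."""
    t = np.moveaxis(tensor, mode, 0)
    res = matrix @ t.reshape(t.shape[0], -1)
    return np.moveaxis(res.reshape((matrix.shape[0],) + t.shape[1:]), 0, mode)

def truncated_hosvd(tensor, ranks):
    """Compress `tensor` by keeping `ranks[n]` leading singular vectors per mode."""
    factors = []
    for mode, r in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :r])
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_multiply(core, u.T, mode)
    return core, factors

def renyi_index(p, alpha=2.0):
    """Rényi diversity of order alpha for an abundance vector p (alpha != 1)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0] / p.sum()
    return np.log(np.sum(p ** alpha)) / (1.0 - alpha)

# Example: compress a (bands x rows x cols) cube to multilinear rank (2, 10, 10),
# then reconstruct the approximation from the core and the factor matrices.
cube = np.random.rand(4, 50, 50)
core, factors = truncated_hosvd(cube, ranks=(2, 10, 10))
approx = core
for mode, u in enumerate(factors):
    approx = mode_multiply(approx, u, mode)
print(renyi_index(np.bincount(np.random.randint(0, 5, 1000)), alpha=2.0))
```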


The rise of IoT real-time data has led to new demands for mining systems to learn complex models with millions to billions of parameters, which promise adequate capacity to digest massive datasets and offer powerful predictive analytics. To support Big Data mining, high-performance computing platforms are required, which in turn demand careful designs to unleash the full power of the Big Data. Pattern mining poses a lot of interesting research problems, and there are many areas that are still not well understood. A very elementary challenge is to distinguish the meaningful data, referred to as "Smart Data", from the junk data that pours into the internet. Eighty-five percent of all data are noisy or meaningless, and it is a very tough task to verify, separate, and refine the data from the noisy junk. The authors propose a distributed pattern-mining algorithm to address the heterogeneity, scaling, and hidden-data problems of Big Data. The algorithm is evaluated on parameters such as cost, speed, space, and overhead, with IoT used as the source of heterogeneous Big Data. In this paper, we present the results of all tests, which show that the new method gives accurate results and valid outputs, verified against the results of other established methods. The results also show that the new method can handle big datasets, determine frequent patterns, and produce association rule sets faster than conventional methods while using less memory for processing. Overall, the new method offers competitive performance in memory storage and processing speed compared with conventional frequent-pattern-mining methods such as the Apriori and FP-Growth techniques.
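For reference, the Apriori baseline that the proposed method is compared against works roughly as in the sketch below, a simplified single-machine Python version rather than the authors' distributed algorithm; the example transactions are invented IoT sensor events.

```python
# Simplified Apriori sketch: mine frequent itemsets from a list of transactions.
from itertools import combinations  # kept for readers extending this to rule generation

def apriori(transactions, min_support=2):
    """Return frequent itemsets (as frozensets) with their absolute support counts."""
    transactions = [set(t) for t in transactions]
    k_sets = {frozenset([item]) for t in transactions for item in t}
    frequent, k = {}, 1
    while k_sets:
        # count support of each candidate, keep those meeting the threshold
        counts = {c: sum(1 for t in transactions if c <= t) for c in k_sets}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # join frequent k-itemsets into (k+1)-itemset candidates
        k += 1
        k_sets = {a | b for a in level for b in level if len(a | b) == k}
    return frequent

# Example: sensor-event transactions from an assumed IoT stream
events = [["temp", "humid"], ["temp", "motion"], ["temp", "humid", "motion"], ["humid"]]
print(apriori(events, min_support=2))
```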


Author(s):  
Xiaochun Tang ◽  
Jiawen Zhou

Aim. The MRO2 system is a data management platform that manages and stores all kinds of data across a product's lifecycle; both mass storage capacity and scalability are therefore required. Existing big-data stores for the MRO2 system either focus only on the storage problem or address only the scalability issue. In this paper, a two-layer data management model is proposed, in which the top layer uses memory storage for scalability and the bottom layer uses a distributed key-value (KV) store for mass storage. By adding a key-group middle layer between the application and the KV storage system, the keys needed for real-time processing are grouped and cached on a single node. This satisfies the characteristics of real-time applications and improves dynamic scalability. The proposed protocol for dynamic key groups supporting real-time distributed computation in the MRO2 system is explained in detail, and the protocol for creating and deleting key groups is then introduced. The third topic is an implementation of a big-data store supporting the MRO2 system, in which the delay times for creating and deleting dynamic transaction groups are used for estimation. Finally, experiments to evaluate the proposed method are carried out. Its response time compares favorably with the other methods used in big-data storage systems.
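A minimal sketch of the two-layer idea, assuming a generic key-value backend with get/put methods: the top layer caches a named key group in memory so the keys needed together for real-time processing are served from one place, and writes the values back when the group is deleted. The class and method names are assumptions for illustration, not the paper's protocol.

```python
# Sketch of a key-group cache layered over a distributed key-value store.
class KeyGroupStore:
    def __init__(self, kv_backend):
        self.kv = kv_backend          # bottom layer: distributed key-value store
        self.groups = {}              # top layer: group name -> {key: value} cache

    def create_group(self, group, keys):
        """Pull the group's keys from the KV store into the in-memory cache."""
        self.groups[group] = {k: self.kv.get(k) for k in keys}

    def get(self, group, key):
        """Serve reads from the cached group; fall back to the KV store."""
        return self.groups.get(group, {}).get(key, self.kv.get(key))

    def put(self, group, key, value):
        """Write into the cached group so hot keys stay local until the group is dropped."""
        self.groups.setdefault(group, {})[key] = value

    def delete_group(self, group):
        """Flush the cached values back to the KV store and drop the group."""
        for k, v in self.groups.pop(group, {}).items():
            self.kv.put(k, v)
```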


2016 ◽  
Vol 2016 ◽  
pp. 1-15 ◽  
Author(s):  
Aymen Abdullah Alsaffar ◽  
Hung Phuoc Pham ◽  
Choong-Seon Hong ◽  
Eui-Nam Huh ◽  
Mohammad Aazam

Despite the wide utilization of cloud computing (e.g., services, applications, and resources), some services, applications, and smart devices are not able to fully benefit from this attractive cloud computing paradigm due to the following issues: (1) smart devices might be lacking in capacity (e.g., processing, memory, storage, battery, and resource allocation), (2) they might be lacking in network resources, and (3) the high network latency to a centralized server in the cloud might not be acceptable for delay-sensitive applications, services, and resource allocation requests. Fog computing is a promising paradigm that can extend cloud resources to the edge of the network, solving the abovementioned issues. As a result, in this work we propose an architecture for IoT service delegation and resource allocation based on collaboration between fog and cloud computing. We provide a new algorithm consisting of the decision rules of a linearized decision tree based on three conditions (service size, completion time, and VM capacity) for managing and delegating user requests in order to balance the workload. Moreover, we propose an algorithm to allocate resources that meets service level agreement (SLA) and quality of service (QoS) requirements, as well as optimizing big data distribution across fog and cloud computing. Our simulation results show that the proposed approach can efficiently balance the workload, improve resource allocation, optimize big data distribution, and outperform other existing methods.
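A minimal sketch of the three-condition delegation rules described above, written as plain Python; the threshold values and the ordering of the checks are assumptions for illustration, not the paper's tuned decision tree.

```python
# Sketch of fog-vs-cloud delegation based on service size, completion time, and VM capacity.
def delegate(service_size_mb, completion_deadline_s, fog_vm_capacity_mb):
    """Decide whether a request is served at the fog edge or delegated to the cloud."""
    if service_size_mb > fog_vm_capacity_mb:
        return "cloud"        # the fog VM cannot hold the service at all
    if completion_deadline_s < 1.0:
        return "fog"          # delay-sensitive: keep the work at the network edge
    if service_size_mb < 0.5 * fog_vm_capacity_mb:
        return "fog"          # small enough to run locally without exhausting the VM
    return "cloud"            # large, delay-tolerant work goes to the data center

print(delegate(service_size_mb=200, completion_deadline_s=0.5, fog_vm_capacity_mb=512))
```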


ASHA Leader ◽  
2013 ◽  
Vol 18 (2) ◽  
pp. 59-59

Find Out About 'Big Data' to Track Outcomes

