DynDL: Scheduling Data-Locality-Aware Tasks with Dynamic Data Transfer Cost for Multicore-Server-Based Big Data Clusters

2018 ◽  
Vol 8 (11) ◽  
pp. 2216
Author(s):  
Jiahui Jin ◽  
Qi An ◽  
Wei Zhou ◽  
Jiakai Tang ◽  
Runqun Xiong

Network bandwidth is a scarce resource in big data environments, so data locality is a fundamental problem for data-parallel frameworks such as Hadoop and Spark. This problem is exacerbated in multicore server-based clusters, where multiple tasks running on the same server compete for the server's network bandwidth. Existing approaches address this problem by scheduling computational tasks near the input data while considering the server's free time, data placement, and data transfer costs. However, such approaches usually assign identical values to data transfer costs, even though a multicore server's data transfer cost increases with the number of data-remote tasks; as a result, they minimize data-processing time ineffectively. As a solution, we propose DynDL (Dynamic Data Locality), a novel data-locality-aware task-scheduling model that handles dynamic data transfer costs for multicore servers. DynDL offers greater flexibility than existing approaches by using a set of non-decreasing functions to evaluate dynamic data transfer costs. We also propose online and offline algorithms (based on DynDL) that minimize data-processing time and adaptively adjust data locality. Although the DynDL scheduling problem is NP-complete (nondeterministic polynomial-complete), we prove that the offline algorithm runs in quadratic time and generates optimal results for DynDL's specific uses. Using a series of simulations and real-world executions, we show that, in terms of data-processing time, our algorithms are 30% better than algorithms that do not consider dynamic data transfer costs. Moreover, they can adaptively adjust data locality based on the server's free time, data placement, and network bandwidth, and can schedule tens of thousands of tasks within subseconds or seconds.
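As an illustration of the cost model described above, the following minimal Python sketch (not the paper's algorithm; the function names, unit compute time, and example inputs are placeholders) greedily assigns tasks online and charges each data-remote task an extra delay given by a non-decreasing function of the number of remote tasks already on the candidate server:

def schedule_tasks(tasks, servers, remote_cost):
    """Assign each task to the server with the lowest marginal completion time.

    tasks       : list of (task_id, data_server) pairs; data_server hosts the input.
    servers     : dict server -> current load (estimated busy time).
    remote_cost : non-decreasing function mapping the number of data-remote
                  tasks already on a server to an extra transfer delay.
    """
    remote_counts = {s: 0 for s in servers}
    assignment = {}
    for task_id, data_server in tasks:
        best_server, best_finish = None, float("inf")
        for s, load in servers.items():
            is_remote = (s != data_server)
            extra = remote_cost(remote_counts[s] + 1) if is_remote else 0.0
            finish = load + 1.0 + extra          # 1.0 = nominal compute time per task
            if finish < best_finish:
                best_server, best_finish = s, finish
        assignment[task_id] = best_server
        servers[best_server] = best_finish
        if best_server != data_server:
            remote_counts[best_server] += 1
    return assignment

# Example: transfer cost grows linearly with the number of remote tasks on a server.
tasks = [(0, "s1"), (1, "s1"), (2, "s2"), (3, "s1")]
servers = {"s1": 0.0, "s2": 0.0}
print(schedule_tasks(tasks, servers, remote_cost=lambda k: 0.5 * k))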

2020 ◽  
Vol 14 ◽  
pp. 174830262096239 ◽  
Author(s):  
Chuang Wang ◽  
Wenbo Du ◽  
Zhixiang Zhu ◽  
Zhifeng Yue

With the wide application of intelligent sensors and the internet of things (IoT) in the smart job shop, a large amount of real-time production data is collected. Accurate analysis of the collected data can help producers make effective decisions. Compared with traditional data processing methods, artificial intelligence, as the main big data analysis method, is increasingly applied to the manufacturing industry. However, different AI models differ in their ability to process real-time data from smart job shop production. Based on this, a real-time big data processing method for the job shop production process based on Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks is proposed. This method uses historical production data extracted from the IoT job shop as the original data set and, after data preprocessing, uses the LSTM and GRU models to train on and predict the real-time data of the job shop. Through the description and implementation of the model, it is compared with K-nearest neighbors (KNN), decision tree (DT), and traditional neural network models. The results show that, in real-time big data processing of the production process, the LSTM and GRU models outperform the traditional neural network, KNN, and DT. With performance similar to LSTM, the GRU model requires much less training time than the LSTM model.
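A minimal sketch of how such a comparison might be set up, assuming the Keras API with made-up layer sizes, window length, and synthetic stand-in data (none of which come from the paper):

# Comparable LSTM and GRU regressors for windowed real-time production data.
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import LSTM, GRU, Dense

def build_model(cell, window=20, features=4):
    """cell is either LSTM or GRU; both models are otherwise identical."""
    model = Sequential([cell(32, input_shape=(window, features)), Dense(1)])
    model.compile(optimizer="adam", loss="mse")
    return model

# Synthetic stand-in for preprocessed IoT job-shop data: (samples, window, features).
X = np.random.rand(1000, 20, 4)
y = np.random.rand(1000, 1)

for name, cell in [("LSTM", LSTM), ("GRU", GRU)]:
    model = build_model(cell)
    model.fit(X, y, epochs=2, batch_size=64, verbose=0)
    print(name, "MSE on the same data:", model.evaluate(X, y, verbose=0))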


Author(s):  
Amitava Choudhury ◽  
Kalpana Rangra

The type and amount of data in human society are growing at an amazing speed, driven by emerging new services such as cloud computing, the internet of things, and location-based services. The era of big data has arrived. As data has become a fundamental resource, how to manage and utilize big data better has attracted much attention. Especially with the development of the internet of things, how to process a large amount of real-time data has become a great challenge in research and applications. Recently, cloud computing technology has attracted much attention for its high performance, but how to use cloud computing technology for large-scale real-time data processing has not been fully studied. In this chapter, various big data processing techniques are discussed.


2020 ◽  
Vol 1432 ◽  
pp. 012110
Author(s):  
Jesús Silva ◽  
Hugo Hernández Palma ◽  
William Niebles Núñez ◽  
David Ovallos-Gazabon ◽  
Noel Varela

2019 ◽  
Vol 16 (8) ◽  
pp. 3211-3215 ◽  
Author(s):  
S. Prince Mary ◽  
D. Usha Nandini ◽  
B. Ankayarkanni ◽  
R. Sathyabama Krishna

Integrating cloud computing and big data is a difficult and challenging task, and determining the number of resources needed to complete a job is equally difficult. Virtualization is therefore implemented; the processing involves three phases: the map phase, the shuffle phase, and the reduce phase. Much prior research has applied heterogeneous MapReduce applications and used the least-work-left policy to distribute work across server systems. In this paper, we discuss how virtualization is used for Hadoop jobs to achieve effective data processing and to determine a job's processing time, and a balanced partition algorithm is used. The main objective is to implement virtualization on our local machines.
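For reference, the three phases mentioned above can be sketched in a few lines of Python on a toy word-count job; the hash partitioner here merely stands in for the balanced partition algorithm and is not the authors' implementation:

from collections import defaultdict

def map_phase(records):
    # Emit (key, 1) pairs for every word in every input line.
    for line in records:
        for word in line.split():
            yield word, 1

def shuffle_phase(pairs, n_partitions=2):
    # Assign each key to a partition; a balanced partitioner would also account
    # for per-key load so partitions finish at roughly the same time.
    partitions = [defaultdict(list) for _ in range(n_partitions)]
    for key, value in pairs:
        partitions[hash(key) % n_partitions][key].append(value)
    return partitions

def reduce_phase(partitions):
    # Aggregate the values grouped under each key.
    return {key: sum(values) for part in partitions for key, values in part.items()}

records = ["big data on hadoop", "hadoop jobs on virtual machines"]
print(reduce_phase(shuffle_phase(map_phase(records))))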


2015 ◽  
Vol 9s1 ◽  
pp. BBI.S28988 ◽  
Author(s):  
Frank A. Feltus ◽  
Joseph R. Breen ◽  
Juan Deng ◽  
Ryan S. Izard ◽  
Christopher A. Konger ◽  
...  

In the last decade, high-throughput DNA sequencing has become a disruptive technology and pushed the life sciences into a distributed ecosystem of sequence data producers and consumers. Given the power of genomics and declining sequencing costs, biology is an emerging “Big Data” discipline that will soon enter the exabyte data range when all subdisciplines are combined. These datasets must be transferred across commercial and research networks in creative ways since sending data without thought can have serious consequences on data processing time frames. Thus, it is imperative that biologists, bioinformaticians, and information technology engineers recalibrate data processing paradigms to fit this emerging reality. This review attempts to provide a snapshot of Big Data transfer across networks, which is often overlooked by many biologists. Specifically, we discuss four key areas: 1) data transfer networks, protocols, and applications; 2) data transfer security including encryption, access, firewalls, and the Science DMZ; 3) data flow control with software-defined networking; and 4) data storage, staging, archiving and access. A primary intention of this article is to orient the biologist in key aspects of the data transfer process in order to frame their genomics-oriented needs to enterprise IT professionals.


Sensors ◽  
2018 ◽  
Vol 18 (9) ◽  
pp. 2994 ◽  
Author(s):  
Bhagya Silva ◽  
Murad Khan ◽  
Changsu Jung ◽  
Jihun Seo ◽  
Diyan Muhammad ◽  
...  

The Internet of Things (IoT), inspired by the tremendous growth of connected heterogeneous devices, has pioneered the notion of the smart city. Various components, i.e., smart transportation, smart community, smart healthcare, smart grid, etc., which are integrated within the smart city architecture, aim to enrich the quality of life (QoL) of urban citizens. However, real-time processing requirements and exponential data growth hinder smart city realization. Therefore, herein we propose a Big Data analytics (BDA)-embedded experimental architecture for smart cities. Two major aspects are served by the BDA-embedded smart city. First, it facilitates exploitation of urban Big Data (UBD) in planning, designing, and maintaining smart cities. Second, it employs BDA to manage and process voluminous UBD to enhance the quality of urban services. The three tiers of the proposed architecture are responsible for data aggregation, real-time data management, and service provisioning. Moreover, offline and online data processing tasks are further expedited by integrating data normalizing and data filtering techniques into the proposed work. By analyzing authenticated datasets, we obtained the threshold values required for urban planning and city operation management. Performance metrics, in terms of online and offline data processing for the proposed dual-node Hadoop cluster, were obtained using the aforementioned authentic datasets. Throughput and processing-time analysis performed with regard to existing works confirms the performance superiority of the proposed work. Hence, we can claim the applicability and reliability of implementing the proposed BDA-embedded smart city architecture in the real world.
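A minimal sketch of the normalizing and filtering steps referred to above, applied to a small batch of urban sensor readings; the field names, the outlier limit, and the min-max scaling choice are assumptions rather than details from the paper:

def normalize(values):
    # Min-max scale a list of numeric readings into [0, 1].
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

def filter_outliers(readings, field, limit):
    # Drop records whose value exceeds a plausibility threshold.
    return [r for r in readings if r[field] <= limit]

readings = [
    {"sensor": "traffic-01", "vehicles_per_min": 42},
    {"sensor": "traffic-02", "vehicles_per_min": 310},   # implausible spike, dropped
    {"sensor": "traffic-03", "vehicles_per_min": 57},
]
clean = filter_outliers(readings, "vehicles_per_min", limit=200)
scaled = normalize([r["vehicles_per_min"] for r in clean])
print(clean, scaled)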


Author(s):  
D A Smuseva ◽  
A Y Rolich ◽  
L S Voskov ◽  
I Y Malakhov

The paper reviews the current state of the Augmented Reality (AR) and Internet of Things (IoT) markets. The possibilities of implementing AR for visualizing Big Data from IoT devices are considered. A review and analysis of visualization methods, tools, products, and data systems are presented, along with an overview of AR programs and devices and of development environments. The paper presents existing classifications of computerized data visualization tools and proposes a new classification that takes into account interactive visualization, the purpose of the tool, the type of software product, the availability of ready-made templates, and other characteristics. The article proposes an architecture for collecting data from IoT endpoint devices based on the Heltec modules. Experiments on the developed experimental stand were carried out with Heltec devices of both versions to determine the number of losses as the distance between the sending device and the receiving device increases. Power-consumption measurements of these devices are presented for two modes: standby and message transmission for the Heltec endpoint device, and standby and message reception for the base station. These studies were conducted using various data transfer protocols (LoRa, Wi-Fi, and Bluetooth). The paper also presents the development of a digital twin of a university building and of augmented reality software for receiving data from real-time data collection devices.
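As a simple illustration of the distance experiments described above, the following sketch computes per-distance packet-loss rates from sent/received counters; the counts are invented placeholders, not measured values:

measurements = {
    # distance_m: (packets_sent, packets_received)
    50:  (1000, 998),
    150: (1000, 973),
    300: (1000, 901),
}
for distance, (sent, received) in sorted(measurements.items()):
    loss_pct = 100.0 * (sent - received) / sent
    print(f"{distance} m: {loss_pct:.1f}% packet loss")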


10.29007/db8n ◽  
2019 ◽  
Author(s):  
Mohammad Hossain ◽  
Maninder Singh ◽  
Sameer Abufardeh

Time is a critical factor in processing a very large volume of data, a.k.a. 'Big Data'. Many existing data mining algorithms (supervised and unsupervised) become futile because of the ubiquitous use of horizontal processing, i.e., row-by-row processing of stored data. Processing time for big data is further exacerbated by its high dimensionality (number of features) and high cardinality (number of records). To address this processing-time issue, we proposed a vertical approach with predicate trees (pTrees). Our approach structures data into columns of bit slices, which range from a few to hundreds and are processed vertically, i.e., column by column. We tested and compared our vertical approach to the traditional (horizontal) approach using three basic Boolean operations, namely addition, subtraction, and multiplication, with 10 data sizes ranging from half a billion bits to 5 billion bits. The results are analyzed with respect to processing time and speed gain for both approaches. They show that our vertical approach outperformed the traditional approach for all operations (add, subtract, and multiply) across all data sizes, with speed gains between 24% and 96%. We conclude that our approach, being in a data-mining-ready format, is best suited for operations involving complex computations in big data applications, where it achieves significant speed gains.
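To make the vertical idea concrete, the sketch below (a simplified illustration, not the authors' pTree implementation) stores each bit position of a column as one bitmap over all records and adds two columns slice by slice with bitwise operations:

def to_bit_slices(column, width):
    """Bit slice i is an integer whose j-th bit is bit i of column[j]."""
    return [sum(((v >> i) & 1) << j for j, v in enumerate(column)) for i in range(width)]

def add_bit_slices(a_slices, b_slices):
    """Ripple-carry addition over whole slices: one XOR/AND pass per bit position."""
    carry, out = 0, []
    for a, b in zip(a_slices, b_slices):
        out.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    out.append(carry)                      # overflow slice
    return out

def from_bit_slices(slices, n_rows):
    """Convert slices back to per-row integers, for checking the result."""
    return [sum(((s >> j) & 1) << i for i, s in enumerate(slices)) for j in range(n_rows)]

a, b = [3, 5, 7, 2], [1, 6, 4, 9]
s = add_bit_slices(to_bit_slices(a, 4), to_bit_slices(b, 4))
print(from_bit_slices(s, len(a)))          # [4, 11, 11, 11]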


Author(s):  
Rajni Aron ◽  
Deepak Kumar Aggarwal

Cloud Computing has become a buzzword in the IT industry. Cloud Computing, which provides inexpensive computing resources on a pay-as-you-go basis, is rapidly gaining momentum as a substitute for traditional Information Technology (IT) infrastructure in organizations. The increased utilization of Clouds therefore makes the execution of Big Data processing jobs a vital research area. As more and more users have started to store and process their real-time data in Cloud environments, resource provisioning and scheduling of Big Data processing jobs become key considerations for the efficient execution of Big Data applications. This chapter discusses the fundamental concepts behind the terms Cloud Computing and Big Data and the relationship between them. It will help researchers identify the important characteristics of Cloud Resource Management Systems for handling Big Data processing jobs and will also help them select the most suitable technique for processing Big Data jobs in a Cloud Computing environment.

