A Survey on Job Scheduling in Big Data

2016 ◽  
Vol 16 (3) ◽  
pp. 35-51 ◽  
Author(s):  
M. Senthilkumar ◽  
P. Ilango

Abstract Scheduling for Big Data applications has become an active research area over the last three years. The Hadoop framework has become one of the most popular and widely used frameworks for distributed data processing. Hadoop is open-source software that allows users to utilize hardware effectively. The various scheduling algorithms for the MapReduce model in Hadoop differ in design and behavior and address issues such as data locality, resource awareness, energy, and time. This paper gives an outline of job scheduling, a classification of schedulers, and a comparison of existing algorithms along with their advantages, drawbacks, and limitations. We also discuss various tools and frameworks used for monitoring and ways to improve MapReduce performance. This paper helps beginners and researchers understand the scheduling mechanisms used in Big Data.
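As a minimal illustration of the scheduler classification such surveys cover, the sketch below contrasts Hadoop's FIFO policy (strict submission order) with a fair, round-robin-over-pools policy. The job names and pool labels are invented for the example; real Hadoop schedulers also weigh resources, not just ordering.

```python
from collections import deque

# Hypothetical jobs: (name, pool). FIFO runs jobs in submission order;
# a fair scheduler interleaves jobs across pools so no pool starves.
jobs = [("j1", "etl"), ("j2", "etl"), ("j3", "analytics"), ("j4", "etl")]

def fifo_order(jobs):
    # FIFO: strict submission order, regardless of pool.
    return [name for name, _ in jobs]

def fair_order(jobs):
    # Fair (round-robin over pools): each pool gets a turn in rotation.
    pools = {}
    for name, pool in jobs:
        pools.setdefault(pool, deque()).append(name)
    order = []
    while any(pools.values()):
        for pool in list(pools):
            if pools[pool]:
                order.append(pools[pool].popleft())
    return order

print(fifo_order(jobs))  # ['j1', 'j2', 'j3', 'j4']
print(fair_order(jobs))  # ['j1', 'j3', 'j2', 'j4']
```

Under FIFO the `analytics` job waits behind every earlier `etl` job; under the fair policy it runs second, which is the intuition behind fairness-oriented schedulers discussed in the survey.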

2018 ◽  
Vol 11 (1) ◽  
pp. 90
Author(s):  
Sara Alomari ◽  
Mona Alghamdi ◽  
Fahd S. Alotaibi

Auditing services for outsourced data, especially big data, have been an active research area recently, and many remote data auditing (RDA) schemes have been proposed. The two categories of RDA, Provable Data Possession (PDP) and Proof of Retrievability (PoR), represent the core schemes from which most researchers derive new schemes supporting additional capabilities such as batch and dynamic auditing. In this paper, we investigate the most popular PDP schemes, since many PDP techniques have been further refined to achieve efficient integrity verification. We first review the literature to establish the required background on auditing services and related schemes. Second, we specify a methodology for attaining the research goals. We then define each selected PDP scheme and the auditing properties used to compare the chosen schemes. Finally, we determine, where possible, which scheme is optimal for handling big data auditing.
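The core idea behind PDP-style auditing can be sketched as a challenge-response protocol over randomly sampled blocks. The sketch below is deliberately simplified: real PDP schemes use homomorphic authenticators so the server returns a constant-size proof rather than the blocks themselves; plain per-block HMAC tags here are only for illustration.

```python
import hashlib
import hmac
import os
import random

# Simplified PDP-style sketch: the data owner keeps per-block MAC tags;
# the auditor challenges random block indices and verifies the server's
# response against the stored tags. (Illustrative only: real PDP uses
# homomorphic tags and constant-size proofs.)

key = os.urandom(32)
blocks = [b"block-%d" % i for i in range(100)]   # the outsourced data
tags = [hmac.new(key, b, hashlib.sha256).digest() for b in blocks]

def challenge(n_blocks, sample=5):
    # Random spot-check: probabilistic possession guarantee.
    return random.sample(range(n_blocks), sample)

def server_respond(indices):
    # An honest server returns the challenged blocks.
    return [blocks[i] for i in indices]

def verify(indices, returned):
    return all(hmac.compare_digest(
                   hmac.new(key, blk, hashlib.sha256).digest(), tags[i])
               for i, blk in zip(indices, returned))

idx = challenge(len(blocks))
print(verify(idx, server_respond(idx)))          # True while data is intact
```

A server that silently dropped or corrupted a challenged block would fail `verify`, which is the possession guarantee PDP formalizes.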


2018 ◽  
Vol 60 (5-6) ◽  
pp. 321-326 ◽  
Author(s):  
Christoph Boden ◽  
Tilmann Rabl ◽  
Volker Markl

Abstract The last decade has been characterized by the collection and availability of unprecedented amounts of data, due to rapidly decreasing storage costs and the omnipresence of sensors and data-producing global online services. In order to process and analyze this data deluge, novel distributed data processing systems resting on the paradigm of data flow, such as Apache Hadoop, Apache Spark, or Apache Flink, were built and have been scaled to tens of thousands of machines. However, writing efficient implementations of data analysis programs on these systems requires a deep understanding of systems programming, preventing large groups of data scientists and analysts from using this technology efficiently. In this article, we present some of the main achievements of the research carried out by the Berlin Big Data Center (BBDC). We introduce the two domain-specific languages Emma and LARA, which are deeply embedded in Scala and enable declarative specification and automatic parallelization of data analysis programs; the PEEL framework for transparent and reproducible benchmark experiments on distributed data processing systems; and approaches to foster the interpretability of machine learning models. Finally, we provide an overview of the challenges to be addressed in the second phase of the BBDC.


2017 ◽  
Vol 28 (06) ◽  
pp. 661-682
Author(s):  
Rashed Mazumder ◽  
Atsuko Miyaji ◽  
Chunhua Su

Security, privacy, and data integrity are critical issues in Big Data applications for IoT-enabled environments and cloud-based services, and many challenges remain in establishing secure computation for Big Data applications. Authenticated encryption (AE) plays a core role in ensuring Big Data's confidentiality, integrity, and real-time security, and many AE proposals exist in the literature. Generally, the security notion of AE distinguishes two settings: nonce respect and nonce reuse. Recent studies show that nonce reuse sacrifices the security bound of the AE. In this paper, we propose a nonce-respecting scheme and a probabilistic encryption scheme that are more efficient and suitable for big data applications; both are based on a keyed function. Our first scheme (FS) operates in parallel mode, its security relies on nonce respect, it supports associated data, and it requires fewer calls to functions/block ciphers. Our second scheme, by contrast, is based on probabilistic encryption and is expected to be a lightweight solution because of its weaker security-model construction. Moreover, both schemes satisfy a reasonable privacy security bound.
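To make the nonce-respect notion concrete, the sketch below builds a generic nonce-respecting AE from a keyed function (HMAC used as a PRF), in encrypt-then-MAC form. This is a textbook-style illustration under stated assumptions, not the paper's FS construction; the security contract is that each (key, nonce) pair is used at most once.

```python
import hashlib
import hmac
import os

# Generic nonce-respecting AE sketch built from a keyed function (HMAC as
# a PRF) in encrypt-then-MAC form. Illustrates the nonce-respect setting
# only; it is NOT the FS scheme proposed in the paper.

def _keystream(key, nonce, length):
    # Counter-mode keystream derived from the keyed function.
    out, ctr = b"", 0
    while len(out) < length:
        out += hmac.new(key, nonce + ctr.to_bytes(4, "big"),
                        hashlib.sha256).digest()
        ctr += 1
    return out[:length]

def encrypt(enc_key, mac_key, nonce, ad, plaintext):
    # Nonce respect: the caller must never reuse (key, nonce).
    ct = bytes(p ^ k for p, k in
               zip(plaintext, _keystream(enc_key, nonce, len(plaintext))))
    tag = hmac.new(mac_key, nonce + ad + ct, hashlib.sha256).digest()
    return ct, tag

def decrypt(enc_key, mac_key, nonce, ad, ct, tag):
    # Verify the tag (covering nonce, associated data, and ciphertext)
    # before decrypting.
    expect = hmac.new(mac_key, nonce + ad + ct, hashlib.sha256).digest()
    if not hmac.compare_digest(expect, tag):
        raise ValueError("authentication failed")
    return bytes(c ^ k for c, k in
                 zip(ct, _keystream(enc_key, nonce, len(ct))))

ek, mk, nonce = os.urandom(32), os.urandom(32), os.urandom(12)
ct, tag = encrypt(ek, mk, nonce, b"header", b"big data record")
print(decrypt(ek, mk, nonce, b"header", ct, tag))  # b'big data record'
```

Reusing the nonce with the same key would leak the XOR of two plaintexts, which is precisely why nonce reuse degrades the security bound the abstract mentions.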


Author(s):  
Uttama Garg

The amount of data in today's world is increasing exponentially, and effectively analyzing Big Data is a very complex task. The MapReduce programming model, created by Google in 2004, revolutionized the big-data computing market, and it is now used for scientific and research analysis as well as for commercial purposes. The MapReduce model, however, is quite a low-level programming model and has many limitations. Active research is being undertaken to build models that overcome these limitations. In this paper we study some popular data-analytic models that redress some of the limitations of MapReduce, namely ASTERIX and Pregel (Giraph). We discuss these models briefly and, through the discussion, highlight how they overcome MapReduce's limitations.
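The MapReduce model the paragraph refers to can be sketched in-memory in a few lines: map emits key/value pairs, the framework shuffles them by key, and reduce aggregates each group. The word-count example below is the model's canonical illustration; a real Hadoop job would express the same logic as Mapper and Reducer classes distributed across a cluster.

```python
from collections import defaultdict

# Minimal in-memory sketch of the MapReduce model: map emits key/value
# pairs, shuffle groups them by key, reduce aggregates each group.

def map_phase(docs):
    for doc in docs:
        for word in doc.split():
            yield word, 1                      # emit (word, 1)

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)              # group values by key
    return groups

def reduce_phase(groups):
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data big compute", "data locality"]
print(reduce_phase(shuffle(map_phase(docs))))
# {'big': 2, 'data': 2, 'compute': 1, 'locality': 1}
```

The "low-level" character criticized in the paper is visible even here: joins, iteration, and graph traversals (Pregel's domain) must all be hand-encoded as chains of such map/shuffle/reduce rounds.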


Author(s):  
Rajni Aron ◽  
Deepak Kumar Aggarwal

Cloud Computing has become a buzzword in the IT industry. Providing inexpensive computing resources on a pay-as-you-go basis, Cloud Computing is rapidly gaining momentum as a substitute for traditional Information Technology (IT) based organizations. The increased utilization of Clouds therefore makes the execution of Big Data processing jobs a vital research area. As more and more users store and process their real-time data in Cloud environments, resource provisioning and scheduling of Big Data processing jobs become key considerations for the efficient execution of Big Data applications. This chapter discusses the fundamental concepts underlying Cloud Computing and Big Data and the relationship between them. It will help researchers identify the important characteristics of Cloud resource management systems for handling Big Data processing jobs, and will also help them select the most suitable technique for processing Big Data jobs in a Cloud Computing environment.


Author(s):  
Richard Earl

Topology remains a large, active research area in mathematics. Unsurprisingly, its character has changed over the last century: there is considerably less current interest in general topology, but whole new areas have emerged, such as topological data analysis, which helps analyze big data sets. The Epilogue concludes that the interfaces of topology with other areas have remained rich and numerous, and it can be hard to tell where topology stops and geometry, algebra, analysis, or physics begins. Often that richness comes from studying structures with interconnected flavours of algebra, geometry, and topology, but sometimes a result seemingly of an entirely algebraic nature, say, can be proved by purely topological means.


2020 ◽  
Vol 9 (1) ◽  
pp. 1151-1155

In both industry and research, big data applications consume most of the storage space in use. Among the sources of big data, video streams from CCTV cameras are as important as other sources such as medical data and social media data. CCTV cameras are deployed for security purposes in all places where security matters greatly. Security can be defined in different ways, such as theft identification and violence detection, and in most highly secured areas it plays a major role in real-time environments. This paper discusses detecting and recognising the facial features of persons using deep learning concepts, covering object detection, action detection, and identification. The issues found in existing methods are identified and summarized.


Big Data ◽  
2016 ◽  
pp. 1110-1128
Author(s):  
Ruben C. Huacarpuma ◽  
Daniel da C. Rodrigues ◽  
Antonio M. Rubio Serrano ◽  
João Paulo C. Lustosa da Costa ◽  
Rafael T. de Sousa Júnior ◽  
...  

The Brazilian Ministry of Planning, Budget, and Management (MP) manages enormous amounts of data generated on a daily basis. Processing all of this data more efficiently can reduce operating costs, thereby making better use of public resources. In this chapter, the authors construct a Big Data framework to deal with data loading and querying problems in distributed data processing. They evaluate the proposed Big Data processes by comparing them with the current centralized process used by MP in its Integrated System for Human Resources Management (in Portuguese: Sistema Integrado de Administração de Pessoal – SIAPE). This study focuses primarily on a NoSQL solution using HBase and Cassandra, which is compared to the relational PostgreSQL implementation used as a baseline. The inclusion of Big Data technologies in the proposed solution noticeably improves loading and querying performance.

