A Survey on Some Big Data Applications Tools and Technologies

Author(s): Nazia Tazeen, Sandhya Rani K.

Big Data is a broad area that deals with enormous data sets. The term refers to data sets of huge volume and increasingly diverse structure, originating from diverse sources and growing rapidly. Large amounts of data are generated by fast data transmission between devices in sectors such as healthcare, science, media, business, entertainment and engineering. The capacity to collect and store these data is a major concern. Apache Hadoop is a collection of open-source software for storing big data and performing analytics and other related operations on it. Many organizations base their decisions on knowledge extracted from huge and complex data sets; because decision making depends on it, Big Data has to be accurately classified and analyzed. To overcome the complex challenges posed by Big Data, various Big Data tools and technologies have been developed. Big Data applications and the tools and technologies used to handle them are briefly discussed in this paper.

Author(s): Aakriti Shukla, Dr Damodar Prasad Tiwari

Dimension reduction, or feature selection, is considered the backbone of big data applications for improving performance. In recent years many scholars have shifted their attention to data science and analysis for real-time applications using big data integration. Interacting with big data manually takes a long time, so feature selection must be made elastic and scalable to handle high workloads in a distributed system. In this study, a survey of alternative optimization techniques for feature selection is presented, together with an analysis of their limitations. This study contributes to the development of methods for improving the efficiency of feature selection on large, complex data sets.
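
As a concrete illustration of what such feature selection looks like in practice, the following is a minimal sketch (not taken from the surveyed works) using scikit-learn in Python; the synthetic data, the ANOVA F-score criterion, and k=20 are all illustrative assumptions.

# Minimal sketch: filter-style feature selection ahead of a classifier.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 500))            # 1,000 samples, 500 candidate features
y = (X[:, :5].sum(axis=1) > 0).astype(int)  # only the first 5 features are informative

# Keep the 20 features with the highest ANOVA F-scores, then fit the classifier.
model = make_pipeline(SelectKBest(f_classif, k=20), LogisticRegression(max_iter=1000))
model.fit(X, y)
print("selected feature indices:", model[0].get_support(indices=True))

In a distributed setting, filter scores of this kind can be computed per partition and merged, which is one way the elasticity discussed above can be achieved.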


Author(s): Abou_el_ela Abdou Hussein

Day by day, advanced web technologies have led to tremendous growth in the volume of data generated daily. This mountain of huge, dispersed data sets leads to the phenomenon called big data: collections of massive, heterogeneous, unstructured and complex data sets. The big data life cycle can be represented as collecting (capturing), storing, distributing, manipulating, interpreting, analyzing, investigating and visualizing big data. Traditional techniques such as Relational Database Management Systems (RDBMS) cannot handle big data because of their inherent limitations, so advances in computing architecture are required to handle both the data storage requirements and the heavy processing needed to analyze huge volumes and varieties of data economically. Among the many technologies for manipulating big data, one of the most prominent and well-known is Hadoop, an open-source distributed data processing framework for overcoming the problems of handling big data. Apache Hadoop is based on the Google File System and the MapReduce programming paradigm. In this paper we examine big data characteristics, starting from the first three V's, which have been extended by researchers over time to more than fifty-six V's, and we compare researchers' definitions to arrive at the best representation and a precise clarification of all big data V characteristics. We highlight the challenges facing big data processing and show how to overcome them using Hadoop and its use in processing big data sets as a solution for various problems in a distributed, cloud-based environment. This paper mainly focuses on different components of Hadoop such as Hive, Pig, and HBase. We also give a thorough description of Hadoop's pros and cons and of improvements that address Hadoop's problems, including a proposed cost-efficient scheduler algorithm for heterogeneous Hadoop systems.
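
To make the MapReduce paradigm mentioned above concrete, the following is a minimal word-count sketch for Hadoop Streaming in Python; it is an illustrative example rather than code from the paper, and the script names and submission command are assumptions (the streaming jar path depends on the installation).

# mapper.py -- Hadoop Streaming mapper: emit "word<TAB>1" for every word on stdin.
import sys

for line in sys.stdin:
    for word in line.split():
        print(f"{word}\t1")

# reducer.py -- Hadoop Streaming reducer: sum the counts for each word.
# Hadoop sorts the mapper output by key before it reaches the reducer.
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rsplit("\t", 1)
    if word != current_word:
        if current_word is not None:
            print(f"{current_word}\t{current_count}")
        current_word, current_count = word, 0
    current_count += int(count)
if current_word is not None:
    print(f"{current_word}\t{current_count}")

# A typical (installation-dependent) submission:
#   hadoop jar /path/to/hadoop-streaming.jar \
#     -files mapper.py,reducer.py -mapper mapper.py -reducer reducer.py \
#     -input /data/books -output /data/wordcounts

The same job can be expressed in Hive or Pig with far less code, which is why those components sit on top of Hadoop in the stack described above.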


Web Services, 2019, pp. 1430-1443
Author(s): Louise Leenen, Thomas Meyer

Governments, military forces and other organisations responsible for cybersecurity deal with vast amounts of data that have to be understood in order to support intelligent decision making. Because of the volume of information pertinent to cybersecurity, automation is required for processing and decision making, specifically to provide advance warning of possible threats. The ability to detect patterns in vast data sets, and to understand the significance of the detected patterns, is essential in the cyber defence domain. Big data technologies supported by semantic technologies can improve cybersecurity, and thus cyber defence, by providing support for processing and understanding the huge amounts of information in the cyber environment. The term big data analytics refers to advanced analytic techniques such as machine learning, predictive analysis, and other intelligent processing techniques applied to large data sets that contain different data types; the purpose is to detect patterns, correlations, trends and other useful information. Semantic technology is a knowledge representation paradigm in which the meaning of data is encoded separately from the data itself. The use of semantic technologies such as logic-based systems to support decision making is becoming increasingly popular. However, most automated systems are currently based on syntactic rules, which are generally not sophisticated enough to deal with the complexity of the decisions that must be made. Incorporating semantic information allows for increased understanding and sophistication in cyber defence systems. This paper argues that both big data analytics and semantic technologies are necessary to provide countermeasures against cyber threats. An overview of the use of semantic technologies and big data technologies in cyber defence is provided, and important areas for future research in the combined domains are discussed.
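
As a small illustration of encoding meaning separately from the data (a sketch using the Python rdflib library; the ontology namespace, class names, and port-scan example are invented for illustration and are not from the paper):

# Semantic layer vs. data layer: the ontology states what a port scan means,
# and the query exploits that meaning rather than a syntactic rule.
from rdflib import Graph, Literal, Namespace, RDF, RDFS

CYBER = Namespace("http://example.org/cyber#")  # hypothetical ontology namespace
g = Graph()

# Ontology (meaning): a port scan is a kind of reconnaissance.
g.add((CYBER.PortScan, RDFS.subClassOf, CYBER.Reconnaissance))

# Data: one observed event, described with the ontology's vocabulary.
g.add((CYBER.event42, RDF.type, CYBER.PortScan))
g.add((CYBER.event42, CYBER.sourceAddress, Literal("203.0.113.7")))

# Ask for reconnaissance events; the subclass link in the ontology supplies the answer.
results = g.query("""
    PREFIX cyber: <http://example.org/cyber#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?event ?src WHERE {
        ?event a/rdfs:subClassOf* cyber:Reconnaissance .
        ?event cyber:sourceAddress ?src .
    }
""")
for event, src in results:
    print(event, src)

Because the reconnaissance relationship lives in the ontology rather than in each event record, a new attack type can be covered by adding one subclass triple instead of rewriting syntactic rules.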


2022, pp. 67-76
Author(s): Dineshkumar Bhagwandas Vaghela

The term big data has arisen from the rapid generation of data in various organizations. In big data, "big" is the buzzword: the data are so large and complex that traditional database applications are unable to process them (i.e., they are inadequate to deal with such volumes of data). Big data are usually described by five Vs (volume, velocity, variety, variability, veracity) and can be structured, semi-structured, or unstructured. Big data analytics is the process of uncovering hidden patterns and unknown correlations and predicting future values from large and complex data sets. This chapter covers the following topics in detail: the history of big data and business analytics, big data analytics technologies and tools, and big data analytics uses and challenges.


2018, Vol 43 (4), pp. 179-190
Author(s): Pritha Guha

Executive Summary: Very large or complex data sets, which are difficult to process or analyse using traditional data handling techniques, are usually referred to as big data. The idea of big data is characterized by the three 'V's of volume, velocity, and variety (Liu, McGree, Ge, & Xie, 2015), referring respectively to the volume of data, the velocity at which the data are processed, and the wide variety of forms in which big data are available. Every single day, sectors such as credit risk management, healthcare, media, retail, retail banking, climate prediction, DNA analysis, and sports generate petabytes of data (1 petabyte = 2^50 bytes). Even basic handling of big data therefore poses significant challenges, one of them being organizing the data in such a way that it yields better insights for analysis and decision-making. With the explosion of data in our lives, it has become very important to use statistical tools to analyse them.
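
For concreteness, the binary unit used above expands as follows (a standard conversion, not a result from the article): 1 petabyte = 2^50 bytes = 1,125,899,906,842,624 bytes, or roughly 1.13 × 10^15 bytes.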


2021, Vol 8
Author(s): Steefan Contractor, Moninya Roughan

Ocean data timeseries are vital for a diverse range of stakeholders (from government to industry to academia) to underpin research, support decision making, and identify environmental change. However, continuous monitoring and observation of ocean variables is difficult and expensive. Moreover, since oceans are vast, observations are typically sparse in spatial and temporal resolution. In addition, the hostile ocean environment creates challenges for collecting and maintaining data sets, such as instrument malfunctions and servicing, often resulting in temporal gaps of varying lengths. Neural networks (NN) have proven effective in many diverse big data applications, but few oceanographic applications have been tested using modern frameworks and architectures. Therefore, here we demonstrate a proof-of-concept neural network application using the popular off-the-shelf framework TensorFlow to predict subsurface ocean variables, including dissolved oxygen and nutrient (nitrate, phosphate, and silicate) concentrations and temperature timeseries, and we show how these models can be used successfully to gap-fill data products. We achieved a final prediction accuracy of over 96% for oxygen and temperature, and mean squared errors (MSE) of 2.63, 0.0099, and 0.78 for nitrates, phosphates, and silicates, respectively. The temperature gap-filling was done with an innovative contextual Long Short-Term Memory (LSTM) NN that uses data before and after the gap as separate feature variables. We also demonstrate the application of a novel dropout-based approach to approximate the Bayesian uncertainty of these temperature predictions. This Bayesian uncertainty is represented in the form of 100 Monte Carlo dropout estimates of the two longest gaps in the temperature timeseries, from a model with 25% dropout in the input and recurrent LSTM connections. Throughout the study, we present the NN training process, including the tuning of the large number of NN hyperparameters, which could pose a barrier to uptake among researchers and other oceanographic data users. Our models can be scaled up and applied operationally to provide consistent, gap-free data to all data users, thus encouraging data uptake for data-based decision making.
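
For readers unfamiliar with the Monte Carlo dropout technique mentioned above, the following is a minimal sketch (not the authors' code) using TensorFlow/Keras in Python; the toy data, layer sizes, and training settings are illustrative assumptions, with the 25% dropout on the input and recurrent LSTM connections mirroring the rate quoted in the abstract.

# Minimal sketch: an LSTM with dropout, then Monte Carlo dropout at prediction time.
import numpy as np
import tensorflow as tf

# Toy data: 200 sequences of 30 time steps with 1 feature each (stand-in for a timeseries).
X = np.random.rand(200, 30, 1).astype("float32")
y = np.random.rand(200, 1).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(30, 1)),
    # 25% dropout on both the input and the recurrent connections of the LSTM.
    tf.keras.layers.LSTM(32, dropout=0.25, recurrent_dropout=0.25),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=32, verbose=0)

# Monte Carlo dropout: keep dropout active at inference (training=True) and repeat
# the forward pass; the spread of the estimates approximates predictive uncertainty.
mc_preds = np.stack([model(X[:5], training=True).numpy() for _ in range(100)])
mean, std = mc_preds.mean(axis=0), mc_preds.std(axis=0)
print(mean.ravel(), std.ravel())

The spread of the 100 stochastic forward passes plays the role of the Bayesian uncertainty reported for the two longest temperature gaps in the study.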


Author(s): Mr. Prakash Ukhalkar, Dr. Santosh Parakh, Dr. Rajesh Phursule, Mrs. Leena Sanu, ...

Business success is critical for any organization, irrespective of its size. Understanding business value and the advanced technology capabilities of Business Intelligence (BI) solutions for analysis and decision making is a need for every manager, researcher and analyst. Data are available at every corner and on every platform, produced everywhere by digital equipment. These complex data sets can be used for analysis and decision-making on various platforms and are more accessible than ever for finding insights and explanations. There is an abundance of datasets and data analysts, yet organizations are still unable to leverage their potential to the fullest, and this is perhaps why most data analytics initiatives fail. This paper covers the need for Augmented Analytics adoption in companies to maximize business value from the advanced analytics capabilities added to BI tools, supporting effective, accurate and timely decision making and business analysis.


2013, Vol 1 (1), pp. 19-25
Author(s): Abdelkader Baaziz, Luc Quoniam

"Big Data is the oil of the new economy" has been the most famous quotation of the last three years; it was even adopted by the World Economic Forum in 2011. In fact, Big Data is like crude oil: it is valuable, but unrefined it cannot be used. It must be broken down and analyzed for it to have value. But what about the Big Data generated by the petroleum industry, and particularly by its upstream segment? Upstream is no stranger to Big Data. Understanding and leveraging data in the upstream segment enables firms to remain competitive throughout planning, exploration, delineation, and field development. Oil and gas companies conduct advanced geophysical modeling and simulation to support operations, where 2D, 3D and 4D seismic surveys generate significant data during exploration phases. They closely monitor the performance of their operational assets. To do this, they use tens of thousands of data-collecting sensors in subsurface wells and surface facilities to provide continuous, real-time monitoring of assets and environmental conditions. Unfortunately, this information comes in various and increasingly complex forms, making it a challenge to collect, interpret, and leverage the disparate data. As an example, Chevron's internal IT traffic alone exceeds 1.5 terabytes a day. Big Data technologies integrate common and disparate data sets to deliver the right information at the appropriate time to the correct decision-maker. These capabilities help firms act on large volumes of data, transforming decision-making from reactive to proactive and optimizing all phases of exploration, development and production. Furthermore, Big Data offers multiple opportunities to ensure safer, more responsible operations; another invaluable effect would be shared learning. The aim of this paper is to explain how to use Big Data technologies to optimize operations, and how Big Data can help experts make the decisions that lead to the desired outcomes.
Keywords: Big Data; Analytics; Upstream Petroleum Industry; Knowledge Management; KM; Business Intelligence; BI; Innovation; Decision-making under Uncertainty


2017, Vol 31 (3), pp. 45-61
Author(s): Uday S. Murthy, Guido L. Geerts

ABSTRACT The term “Big Data” refers to massive volumes of data that grow at an increasing rate and encompass complex data types such as audio and video. While the applications of Big Data and analytic techniques for business purposes have received considerable attention, it is less clear how external sources of Big Data relate to the transaction processing-oriented world of accounting information systems. This paper uses the Resource-Event-Agent Enterprise Ontology (REA) (McCarthy 1982; International Standards Organization [ISO] 2007) to model the implications of external Big Data sources on business transactions. The five-phase REA-based specification of a business transaction as defined in ISO (2007) is used to formally define associations between specific Big Data elements and business transactions. Using Big Data technologies such as Apache Hadoop and MapReduce, a number of information extraction patterns are specified for extracting business transaction-related information from Big Data. We also present a number of analytics patterns to demonstrate how decision making in accounting can benefit from integrating specific external Big Data sources and conventional transactional data. The model and techniques presented in this paper can be used by organizations to formalize the associations between external Big Data elements in their environment and their accounting information artifacts, to build architectures that extract information from external Big Data sources for use in an accounting context, and to leverage the power of analytics for more effective decision making.
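
As a sketch of how an external big data source might be joined to conventional transactional records for analytics, the snippet below uses Apache Spark's Python DataFrame API rather than raw MapReduce for brevity; the file names, columns, and join key are invented for illustration and are not taken from the paper's REA specification.

# Minimal sketch: enrich accounting transactions with an external big data source.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("external-big-data-to-transactions").getOrCreate()

# Conventional transactional data (e.g., an export from the accounting system).
sales = spark.read.csv("sales_transactions.csv", header=True, inferSchema=True)

# External big data source (e.g., harvested product reviews), one JSON record per line.
reviews = spark.read.json("product_reviews.json")

# Extraction step: reduce the external source to keyed facts (average sentiment per product).
review_scores = reviews.groupBy("product_id").agg(F.avg("sentiment").alias("avg_sentiment"))

# Analytics step: attach the external facts to transactions and compare revenue by sentiment.
enriched = sales.join(review_scores, on="product_id", how="left")
enriched.groupBy(F.round("avg_sentiment", 1).alias("sentiment_band")) \
        .agg(F.avg("amount").alias("avg_revenue")) \
        .show()

The two steps are meant only to echo the paper's distinction between information extraction patterns and analytics patterns, not to reproduce its formal model.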

