4. Big data analytics

Author(s):  
Dawn E. Holmes

‘Big data analytics’ argues that big data is only valuable if we can extract useful information from it. It looks at some of the techniques used to discover useful information in big data, such as customer preferences or how fast an epidemic is spreading. Big data analytics is changing rapidly as datasets grow in size and classical statistics makes room for this new paradigm. An example of big data analytics is the algorithmic method called MapReduce, a distributed data processing system that forms part of the core functionality of the Hadoop Ecosystem. Amazon, Google, Facebook, and many others use Hadoop to store and process their data.
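The MapReduce pattern mentioned above can be illustrated with a minimal single-process sketch (the function names and the word-count task are illustrative; a real Hadoop job distributes the map, shuffle, and reduce phases across a cluster):

```python
from collections import defaultdict

def map_phase(documents):
    """Map step: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.lower().split():
            yield word, 1

def shuffle(pairs):
    """Shuffle step: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce step: sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data analytics", "big data at scale"]
counts = reduce_phase(shuffle(map_phase(docs)))
# counts: {'big': 2, 'data': 2, 'analytics': 1, 'at': 1, 'scale': 1}
```

Because each map call and each reduce call touches only its own slice of the data, the framework can run thousands of them in parallel on different machines, which is what makes the model suit very large datasets.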

2018 ◽  
Vol 15 (3) ◽  
Author(s):  
Blagoj Ristevski ◽  
Ming Chen

Abstract This paper surveys big data, highlighting big data analytics in medicine and healthcare. The big data characteristics: value, volume, velocity, variety, veracity and variability are described. Big data analytics in medicine and healthcare covers the integration and analysis of large amounts of complex heterogeneous data such as various omics data (genomics, epigenomics, transcriptomics, proteomics, metabolomics, interactomics, pharmacogenomics, diseasomics), biomedical data and electronic health records data. We underline the challenging issues of big data privacy and security. Regarding the big data characteristics, some directions on using suitable and promising open-source distributed data processing software platforms are given.


2016 ◽  
Vol 58 (4) ◽  
Author(s):  
Wolfram Wingerath ◽  
Felix Gessert ◽  
Steffen Friedrich ◽  
Norbert Ritter

Abstract With the rise of Web 2.0 and the Internet of Things, it has become feasible to track all kinds of information over time, in particular fine-grained user activities, sensor data on their environment and even their biometrics. However, while efficiency remains mandatory for any application trying to cope with huge amounts of data, only part of the potential of today's Big Data repositories can be exploited using traditional batch-oriented approaches, as the value of data often decays quickly and high latency becomes unacceptable in some applications. In the last couple of years, several distributed data processing systems have emerged that deviate from the batch-oriented approach and tackle data items as they arrive, thus acknowledging the growing importance of timeliness and velocity in Big Data analytics. In this article, we give an overview of the state of the art of stream processors for low-latency Big Data analytics and conduct a qualitative comparison of the most popular contenders, namely Storm and its abstraction layer Trident, Samza, and Spark Streaming. We describe their respective underlying rationales, the guarantees they provide, and discuss the trade-offs that come with selecting one of them for a particular task.
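The item-at-a-time processing model that distinguishes these stream processors from batch systems can be sketched in a few lines: instead of re-scanning the full dataset, state is updated incrementally as each event arrives. This is a minimal single-process illustration (class and key names are invented for the example; real systems like Storm or Samza distribute this logic and add delivery guarantees):

```python
from collections import Counter, deque

class SlidingWindowCounter:
    """Count events per key over the most recent `window` time units,
    updating incrementally per event (stream model) rather than
    recomputing over the whole dataset (batch model)."""

    def __init__(self, window):
        self.window = window
        self.events = deque()   # (timestamp, key) in arrival order
        self.counts = Counter()

    def observe(self, timestamp, key):
        self.events.append((timestamp, key))
        self.counts[key] += 1
        # Evict events that have fallen out of the window.
        while self.events and self.events[0][0] <= timestamp - self.window:
            _, old_key = self.events.popleft()
            self.counts[old_key] -= 1
        return dict(self.counts)

w = SlidingWindowCounter(window=10)
w.observe(1, "sensor_a")
w.observe(5, "sensor_b")
snapshot = w.observe(12, "sensor_a")  # the event at t=1 has expired
# snapshot: {'sensor_a': 1, 'sensor_b': 1}
```

Each `observe` call does a constant amount of work per event, which is why such operators can keep latency low even at high input rates, at the cost of the state-management and fault-tolerance machinery the surveyed systems provide.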


2017 ◽  
pp. 83-99
Author(s):  
Sivamathi Chokkalingam ◽  
Vijayarani S.

The term Big Data refers to large-scale information management and analysis technologies that exceed the capability of traditional data processing technologies. Big Data is differentiated from traditional technologies in three ways: the volume, velocity and variety of the data. Big data analytics is the process of analyzing large data sets that contain a variety of data types to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful business information. Since Big Data is a new and emerging field, there is a need for the development of new technologies and algorithms for handling big data. The main objective of this paper is to provide knowledge about the various research challenges of Big Data analytics. A brief overview of the various types of Big Data analytics is given in this paper. For each type of analytics, the paper describes the process steps and tools, and gives a banking application. Some research challenges of big data analytics, and possible solutions to those challenges, are also discussed.


2017 ◽  
Vol 2017 ◽  
pp. 1-16 ◽  
Author(s):  
Dillon Chrimes ◽  
Hamid Zamani

Big data analytics (BDA) is important for reducing healthcare costs. However, there are many challenges of data aggregation, maintenance, integration, translation, analysis, and security/privacy. The study objective, to establish an interactive BDA platform with simulated patient data using open-source software technologies, was achieved by constructing a platform framework on the Hadoop Distributed File System (HDFS) using HBase (a key-value NoSQL database). Distributed data structures were generated from benchmarked hospital-specific metadata of nine billion patient records. At optimized iteration, HDFS ingestion of HFiles to HBase store files revealed sustained availability over hundreds of iterations; however, completing MapReduce to HBase required a week (for 10 TB) and a month (for 30 TB, three billion indexed patient records), respectively. Inconsistencies found in MapReduce limited the capacity to generate and replicate data efficiently. Apache Spark and Drill showed high performance, with high usability for technical support but poor usability for clinical services. Modelling the hospital system on patient-centric data was challenging in HBase, whereby not all data profiles were fully integrated with the complex patient-to-hospital relationships. However, we recommend using HBase to achieve secured patient data while querying entire hospital volumes in a simplified clinical event model across clinical services.
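The patient-centric storage model the study built on HBase rests on HBase's key-value layout: every cell is addressed by a row key plus a column family:qualifier, and rows are kept sorted by row key, so row-key design determines which records a range scan returns together. This is a minimal in-memory stand-in, not the study's actual schema; the table name, column names, and the reversed-timestamp trick are illustrative (the latter is a common HBase idiom for newest-first ordering):

```python
MAX_TS = 10**10  # upper bound used to reverse timestamps

def row_key(patient_id, timestamp):
    """Patient-centric row key: '<patient>#<reversed timestamp>' keeps one
    patient's events adjacent, newest first, in lexicographic row order."""
    return f"{patient_id}#{MAX_TS - timestamp:010d}"

class TinyKVStore:
    """In-memory stand-in for an HBase table: sorted row keys, prefix scans."""

    def __init__(self):
        self.rows = {}

    def put(self, key, column, value):
        self.rows.setdefault(key, {})[column] = value

    def scan_prefix(self, prefix):
        # HBase serves this as a cheap range scan over sorted row keys.
        for key in sorted(self.rows):
            if key.startswith(prefix):
                yield key, self.rows[key]

store = TinyKVStore()
store.put(row_key("p001", 1700000000), "event:type", "admission")
store.put(row_key("p001", 1700086400), "event:type", "lab_result")
store.put(row_key("p002", 1700000500), "event:type", "admission")

# Scanning the "p001#" prefix returns that patient's events, newest first.
p001_events = [cols["event:type"] for _, cols in store.scan_prefix("p001#")]
# p001_events: ['lab_result', 'admission']
```

The trade-off the abstract hints at follows directly: queries that fit the row-key order (all events for one patient) are fast, while relationships that cut across it (patient-to-hospital links) need secondary structures or denormalization, which is why not all data profiles integrated cleanly.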


2017 ◽  
Vol 7 (1) ◽  
pp. 183-195
Author(s):  
Sasikala V

Big data analytics is the process of examining large data sets to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful business information. The analytical findings can lead to more effective marketing, new revenue opportunities, better customer service, improved operational efficiency, competitive advantages over rival organizations and other business benefits.


Author(s):  
M. Baby Nirmala

In this emerging era of analytics 3.0, where big data is at the heart of the conversation in all sectors, extracting the full potential of this vast data is accomplished by many vendors through their new-generation analytical processing systems. This chapter gives a brief introduction to the categories of analytical processing systems, followed by some prominent analytical platforms, appliances, frameworks, engines, fabrics, solutions, tools, and products of the big data vendors. Finally, it deals with big data analytics in the network, its security, WAN optimization tools, and techniques for cloud-based big data analytics.


Big Data ◽  
2016 ◽  
pp. 1859-1894
Author(s):  
Pethuru Raj

This chapter is crafted to give a business-centric view of big data analytics. Readers will find the major application domains and use cases of big data analytics and the compelling needs and reasons for wholeheartedly embracing this new paradigm. The emerging use cases include the use of real-time data, such as sensor data, to detect any abnormalities in plant and machinery, and batch processing of sensor data collected over a period to conduct failure analysis of plant and machinery. The author describes the short-term as well as the long-term benefits, and addresses all kinds of doubts and misgivings about this new idea, which has been pervading and penetrating every tangible domain. The ultimate goal is to demystify this cutting-edge technology so that its acceptance and adoption levels rise significantly in the days to come.

