Advances in Data Mining and Database Management - Handbook of Research on Cloud Infrastructures for Big Data Analytics
Published By IGI Global
ISBN: 9781466658646, 9781466658653
Total Documents: 21 (five years: 0) · H-Index: 4 (five years: 0)

Author(s):  
Haoliang Wang ◽  
Wei Liu ◽  
Tolga Soyata

The amount of data acquired, stored, and processed annually over the Internet has exceeded the processing capabilities of modern computer systems, including supercomputers with multi-Petaflop processing power, giving rise to the term Big Data. Continuous research efforts to build systems that can cope with this vast amount of data are underway. The authors introduce the ongoing research on three fronts: 1) On the Acquisition front, they introduce a concept that has come to the forefront in recent years, the Internet of Things (IoT), which will be one of the major sources of Big Data generation in the coming decades, and provide a brief survey of IoT and the ongoing research in this field. 2) On the Cloud Storage and Processing front, they survey techniques to efficiently store the acquired Big Data in the cloud, index it, and prepare it for processing. While IoT relates primarily to sensor nodes and thin devices, the authors study this storage and processing aspect of Big Data within the framework of Cloud Computing. 3) On the Mobile Access front, they survey existing infrastructures for accessing Big Data efficiently via mobile devices. This survey also covers intermediate devices, such as a Cloudlet, which accelerate Big Data collection from IoT and access to Big Data for applications that require near-real-time response.


Author(s):  
Ganesh Chandra Deka

Analytics tools can suggest the most favourable future plan by analyzing "Why" and "How" alongside What, Who, Where, and When. Descriptive, Predictive, and Prescriptive analytics are the three kinds of analytics currently in use, and a clear understanding of them enables an organization to chalk out the most suitable action plan, taking various probable outcomes into account. Corporations are currently flooded with structured, semi-structured, unstructured, and hybrid data; hence, existing Business Intelligence (BI) practices are not sufficient to harness the potential of this sea of data. This change in requirements has made cloud-based "Analytics as a Service (AaaS)" the ultimate choice. In this chapter, recent trends in Predictive, Prescriptive, and Big Data analytics, along with some AaaS solutions, are discussed.


Author(s):  
Ravishankar Palaniappan

Data visualization has the potential to aid not only in exploring and analyzing large-volume datasets but also in identifying and predicting trends and anomalies/outliers in a simple and consumable way. These capabilities are vital to good and timely decisions for business advantage. Data visualization is an active research field focusing on techniques and tools for qualitative exploration in conjunction with quantitative analysis of data. However, increases in the volume, dimensionality, frequency, and interrelationships of data make the visualization process notoriously difficult, which necessitates innovative and iterative display techniques. Overlooking any dimension or relationship of the data structure, or choosing an unfitting visualization method, quickly leads to an uninterpretable "junk chart," which in turn leads to incorrect inferences or conclusions. The purpose of this chapter is to introduce the different phases of data visualization and various techniques that help connect and empower data to mine insights. It exemplifies how data visualization helps unravel important, meaningful, and useful insights, including trends and outliers, from real-world datasets that might otherwise go unnoticed. The use case in this chapter employs both simulated and real-world datasets to illustrate the effectiveness of data visualization.


Author(s):  
Ganesh Chandra Deka

NoSQL databases are designed to meet the huge data storage requirements of cloud computing and Big Data processing. They offer many advanced features in addition to conventional RDBMS features; hence, "NoSQL" databases are popularly read as "Not only SQL" databases. A variety of NoSQL databases with different features for handling exponentially growing data-intensive applications are available, in both open-source and proprietary options. This chapter discusses some of the popular NoSQL databases and their features in the light of the CAP theorem.
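The CAP trade-off the abstract refers to can be illustrated with a minimal sketch: two in-memory "replicas" that replicate asynchronously, so a read during a partition is available but stale, and the replicas converge only afterwards. This is a hypothetical illustration, not taken from the chapter and not tied to any specific NoSQL product.

```python
# Minimal sketch of the availability/consistency trade-off behind the
# CAP theorem, simulated with two in-memory replicas (hypothetical
# example; real NoSQL systems implement far richer replication).

class Replica:
    def __init__(self):
        self.store = {}

    def write(self, key, value):
        self.store[key] = value

    def read(self, key):
        return self.store.get(key)

def replicate(source, target):
    # Asynchronous replication: the target converges only when synced.
    target.store.update(source.store)

primary, secondary = Replica(), Replica()
primary.write("user:1", {"name": "Ada"})

# During a partition, a read on the secondary is stale: the system
# stays available but returns no (or old) data for this key.
assert secondary.read("user:1") is None

replicate(primary, secondary)
# After the partition heals, the replicas converge (eventual consistency).
assert secondary.read("user:1") == {"name": "Ada"}
```

Many NoSQL systems make exactly this choice (availability and partition tolerance over strong consistency), which is why the CAP theorem is the usual lens for comparing them.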


Author(s):  
Anupama C. Raman

Unstructured data is growing exponentially. Present-day storage infrastructures like Storage Area Networks (SAN) and Network Attached Storage (NAS) are not well suited to storing huge volumes of unstructured data, which has led to the development of new storage technologies like object-based storage. The huge amounts of structured and unstructured data that need to be made available in real time for analytical insights are referred to as Big Data. On account of its distinct nature, the storage infrastructure for Big Data should possess some specific features. In this chapter, the authors examine the various storage technology options available today and their suitability for storing Big Data. The chapter also provides a bird's-eye view of cloud storage technology, which is widely used for Big Data storage.


Author(s):  
Pethuru Raj

The implications of the digitization process, among a bevy of trends, are many and memorable. One is the abnormal growth in data generation, gathering, and storage due to a steady increase in the number of data sources and in their structures, scopes, sizes, and speeds. In this chapter, the author shows some of the impactful developments brewing in the IT space; how the tremendous amount of data produced and processed all over the world impacts the IT and business domains; how next-generation IT infrastructures are accordingly being refactored, remedied, and readied for the impending Big Data-induced challenges; how the Big Data analytics discipline is moving towards fulfilling the digital universe's requirement of extracting and extrapolating actionable insights for the knowledge-parched; and, finally, the establishment and sustenance of the envisioned smarter planet.


Author(s):  
Claudia Cava ◽  
Francesca Gallivanone ◽  
Christian Salvatore ◽  
Pasquale Anthony Della Rosa ◽  
Isabella Castiglioni

Bioinformatics traditionally deals with computational approaches to the analysis of big data from high-throughput technologies such as genomics, proteomics, and sequencing. Bioinformatics analysis allows the extraction of new information from big data that might help to better assess biological details at the molecular and cellular level. The wide scale and high dimensionality of Bioinformatics data have led to an increasing need for high-performance computing and storage repositories. In this chapter, the authors demonstrate the advantages of cloud computing in Bioinformatics research for high-throughput technologies.


Author(s):  
M. Thilagavathi ◽  
Daphne Lopez ◽  
B. Senthil Murugan

With the increased usage of IT solutions, a huge volume of data is generated from different sources like social networks, CRM, and healthcare applications, to name a few, and the size of the data generated grows exponentially. As cloud computing provides an optimized, shared, and virtualized IT infrastructure, it is better to leverage cloud services for storing and processing such Big Data. Securing the data is one of the major challenges in all domains. Though security and privacy have been discussed for decades, there is still a growing need for high-end methods to secure this rampant growth of data. The privacy of personal data, and of health data in particular, continues to be an important issue worldwide. Most health data in today's IT world is computerized. A patient's health data may portray different attributes such as physical and mental health and their severity, financial status, and much more. Moreover, the medical data collected from patients are shared with other stakeholders of interest, such as doctors, insurance companies, pharmacies, researchers, and other health care providers. Individuals raise concerns about the privacy of their health data in such a shared environment.


Author(s):  
Richard Millham

Data is an integral part of most business-critical applications. As business data increases in volume and in variety due to technological, business, and other factors, managing this diverse volume of data becomes more difficult. A new paradigm, data virtualization, is used for data management. Although much research has been conducted on techniques to reliably store huge amounts of data and to process it with optimal resource utilization, open questions remain on how to handle divergent data from multiple data sources. In this chapter, the authors first look at the emerging problem of "big data," with a brief introduction to the emergence of data virtualization and to an existing system that implements it. Because data virtualization requires techniques to integrate data, the authors then look at the problems of divergent data in terms of value, syntactic, semantic, and structural differences. Some proposed methods to help resolve these differences are examined in order to enable the mapping of this divergent data into a homogeneous global schema that can more easily be used for big data analysis. Finally, some tools and industrial examples are given in order to demonstrate different approaches to heterogeneous data integration.
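The mapping of divergent sources into a homogeneous global schema that this abstract describes can be sketched as a pair of per-source transformation functions. The source systems, field names, and unit conventions below are hypothetical, invented purely for illustration.

```python
# Sketch of integrating divergent source records (different field names,
# value encodings, and units) into one homogeneous global schema.
# All field names here are hypothetical examples.

def from_crm(rec):
    # CRM source: name split into two fields, amount already in dollars.
    return {"name": f"{rec['first']} {rec['last']}",
            "amount_usd": rec["amount"]}

def from_erp(rec):
    # ERP source: single full-name field, amount stored in cents.
    return {"name": rec["full_name"],
            "amount_usd": rec["cents"] / 100}

# Both sources now conform to the same global schema and can be
# queried or analyzed uniformly.
global_rows = [
    from_crm({"first": "Ada", "last": "Lovelace", "amount": 120.0}),
    from_erp({"full_name": "Alan Turing", "cents": 9950}),
]
```

Real data-virtualization systems generate or configure such mappings declaratively rather than hand-coding them, but the value, syntactic, and structural differences being reconciled are the same.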


Author(s):  
Siddesh G. M. ◽  
Srinidhi Hiriyannaiah ◽  
K. G. Srinivasa

The Internet has driven the computing world from a few gigabytes of information to terabytes and petabytes, i.e., huge volumes of information. These volumes of information come from a variety of sources spanning structured to unstructured data formats. The information needs to be updated within a short span of time and be available on demand on cheaper infrastructure. Data that spans the three Vs, namely Volume, Variety, and Velocity, is called Big Data. The challenge is to store and process this Big Data, run analytics on it, make critical decisions based on the results of processing, and obtain the best outcomes. In this chapter, the authors discuss the capabilities of Big Data, its uses, and the processing of Big Data using Hadoop technologies and tools from the Apache Software Foundation.
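The Hadoop processing model mentioned above is MapReduce, which can be sketched as a word count in plain Python: a map phase emits (key, value) pairs, and a shuffle/reduce phase groups by key and aggregates. This is an illustrative simulation only; real Hadoop jobs are written against the Hadoop APIs and run distributed across a cluster.

```python
# Word-count sketch of the MapReduce model that Hadoop implements
# (single-process simulation for illustration, not a Hadoop job).
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every document.
    for doc in documents:
        for word in doc.split():
            yield word.lower(), 1

def reduce_phase(pairs):
    # Shuffle + reduce: group pairs by key and sum the counts.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["Big Data needs Hadoop", "Hadoop processes big data"]
print(reduce_phase(map_phase(docs)))
# {'big': 2, 'data': 2, 'needs': 1, 'hadoop': 2, 'processes': 1}
```

In a real Hadoop deployment the map tasks run on the nodes holding the data blocks, and the framework handles the shuffle, fault tolerance, and scheduling that this sketch omits.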

