Advances in Data Mining and Database Management - Managing and Processing Big Data in Cloud Computing
Latest Publications


TOTAL DOCUMENTS

15
(FIVE YEARS 0)

H-INDEX

1
(FIVE YEARS 0)

Published By IGI Global

9781466697676, 9781466697683

Author(s):  
Manjunath Thimmasandra Narayanapppa ◽  
T. P. Puneeth Kumar ◽  
Ravindra S. Hegadi

Recent technological advancements have led to the generation of huge volumes of data from distinct domains (scientific sensors, health care, user-generated content, financial companies, the Internet, and supply chain systems) over the past decade. The term big data was coined to capture the meaning of this emerging trend. In addition to its huge volume, big data exhibits several unique characteristics compared with traditional data; for instance, big data is generally unstructured and requires more real-time analysis. This development calls for new system platforms for data acquisition, storage, and transmission, and for large-scale data processing mechanisms. In recent years the analytics industry's interest has expanded toward big data analytics to uncover the potential concealed in big data, such as hidden patterns or unknown correlations. The main goal of this chapter is to explore the importance of machine learning algorithms and of the computational environment, including the hardware and software required to perform analytics on big data.


Author(s):  
Maryam Qamar ◽  
Mehwish Malik ◽  
Saadia Batool ◽  
Sidra Mehmood ◽  
Asad W. Malik ◽  
...  

This work surveys research on the decentralization of Online Social Networks (OSNs): the issues with centralized designs are studied alongside possible decentralized solutions. A centralized architecture is prone to privacy breaches; a P2P architecture that decentralizes data, and thus authority, combined with encryption is one possible remedy. OSN user bases grow exponentially, causing scalability issues; a natural solution is decentralization, where users bring resources with them via personal machines or paid services. Centralized services are also not available uninterruptedly; to this end, decentralization proposes replication. Decentralized solutions are likewise proposed for the reliability issues arising in centralized systems and for the potential threat posed by a central authority. Yet no key to all problems has been found: metadata may suffice for inferences about the data, and network traffic flow can reveal information about users' relationships. The first issue can be mitigated by padding data or splitting it into uniform blocks; caching, dummy traffic, or routing through a mix of nodes are possible solutions to the second.
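The padding/splitting mitigation mentioned above can be sketched as follows. This is an illustrative sketch, not the chapter's implementation: the block size and the use of random padding are assumptions, and a real scheme would also record the true payload length so the padding can be stripped on retrieval.

```python
import os

BLOCK_SIZE = 4096  # hypothetical uniform block size


def pad_and_split(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Split data into uniform blocks, padding the last block with random
    bytes so every stored block has the same length. An observer of the
    stored blocks then learns only a rounded-up size, not the exact one."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    if not blocks:  # even an empty payload emits one full block
        blocks = [b""]
    last = blocks[-1]
    if len(last) < block_size:
        blocks[-1] = last + os.urandom(block_size - len(last))
    return blocks
```

Every object, regardless of its real size, is stored as a whole number of identical-looking blocks, which is what blunts size-based metadata inference.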


Author(s):  
Kiran Fatima ◽  
Hammad Majeed

Real-world histology tissue textures are complex to analyze and classify owing to their non-homogeneous nature and unorganized spatial intensity variations. The major challenge in solving pathological problems is the inherent complexity due to high intra-class variability and low inter-class variation in the texture of histology samples. The development of computational methods to assist pathologists in characterizing these tissue samples would have great diagnostic and prognostic value. In this chapter, an optimized texture-based evolutionary framework is proposed to assist pathologists in the classification of benign and pre-malignant tumors. The proposed framework investigates the imperative role of the RGB color channels in discriminating cancer grades or subtypes, explores higher-order statistical features at the image level, and implements an evolution-based optimization scheme for feature selection and classification. The highest classification accuracies, achieved with a quadratic SVM classifier, are 99.06% on a meningioma dataset and 90% on a breast cancer dataset.
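Image-level higher-order statistics per RGB channel, of the kind the framework explores, might look like the following minimal sketch (an illustration only; the chapter's actual feature set and any normalization are not specified here):

```python
import numpy as np


def channel_statistics(image: np.ndarray) -> np.ndarray:
    """Per-RGB-channel image-level statistics: mean, variance, and the
    higher-order moments skewness and kurtosis. `image` is an H x W x 3
    array; the result is a 12-dimensional feature vector."""
    feats = []
    for c in range(3):
        x = image[..., c].astype(np.float64).ravel()
        mu = x.mean()
        var = x.var()
        std = np.sqrt(var) + 1e-12  # guard against division by zero
        skew = ((x - mu) ** 3).mean() / std ** 3   # third standardized moment
        kurt = ((x - mu) ** 4).mean() / std ** 4   # fourth standardized moment
        feats.extend([mu, var, skew, kurt])
    return np.array(feats)
```

Such a vector would then be fed to a feature-selection stage and a classifier such as a quadratic-kernel SVM.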


Author(s):  
Bunjamin Memishi ◽  
Shadi Ibrahim ◽  
Maria S. Perez ◽  
Gabriel Antoniu

MapReduce has become a relevant framework for Big Data processing in the cloud. In large-scale clouds, failures do occur and may incur unwanted performance degradation in Big Data applications. As the reliability of MapReduce depends on how well it detects and handles failures, this book chapter investigates the problem of failure detection in the MapReduce framework. The case studies of this contribution reveal that the current static timeout value is not adequate and demonstrate significant variations in an application's response time under different timeout values. Arguing that comparatively little attention has been devoted to failure detection in the framework, the chapter presents design ideas for a new adaptive timeout.
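One simple way an adaptive timeout could work is to derive the threshold from observed task response times rather than from a fixed constant. The sketch below is illustrative only, not the chapter's design: the mean-plus-k-standard-deviations rule and the fallback value (Hadoop's classic 600-second static task timeout) are assumptions.

```python
class AdaptiveTimeout:
    """Failure-detection timeout that adapts to the workload: it tracks
    recent task response times and sets the timeout to mean + k * stddev,
    falling back to a static default until enough samples exist."""

    def __init__(self, k: float = 3.0, default: float = 600.0):
        self.k = k
        self.default = default  # static fallback, in seconds
        self.samples: list[float] = []

    def record(self, response_time: float) -> None:
        """Record the response time of a completed task."""
        self.samples.append(response_time)

    def timeout(self) -> float:
        """Current timeout: adaptive once at least two samples exist."""
        if len(self.samples) < 2:
            return self.default
        n = len(self.samples)
        mean = sum(self.samples) / n
        var = sum((s - mean) ** 2 for s in self.samples) / n
        return mean + self.k * var ** 0.5
```

A slow cluster thus raises the threshold automatically (fewer false suspicions), while a fast one lowers it (quicker detection of genuine failures).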


Author(s):  
Todor Ivanov ◽  
Sead Izberovic ◽  
Nikolaos Korfiatis

This chapter introduces heterogeneity as a perspective in the architecture of big data systems targeted at both vertical and generic workloads, and discusses how this can be linked with the existing Hadoop ecosystem (as of 2015). The cost factor of a big data solution and its characteristics can influence its architectural patterns and capabilities; to account for this, an extended model based on the 3V paradigm is introduced (Extended 3V). It is examined on a hierarchical set of four layers (Hardware, Management, Platform, and Application), with a list of components provided for each layer as well as a classification of their roles in a big data solution.


Author(s):  
Sajid Umair ◽  
Umair Muneer ◽  
Muhammad Nauman Zahoor ◽  
Asad W. Malik

Owing to the wide variety of smartphones and their capability to support heavy applications, their demand is increasing day by day. With this increase in computation capability and processing power, Mobile Cloud Computing (MCC) has become an emerging field. Building on cloud computing, the mobile cloud provides significant advantages in reliability and portability. The challenges involved in mobile cloud computing are energy consumption, computation power, and processing ability. The mobile cloud provides a way to use cloud resources on mobile devices, but traditional smartphone models do not support the cloud, so researchers have introduced new models for the development of MCC. Certain phases still need improvement, and the field attracts many researchers. The purpose of this chapter is to analyze and summarize the challenges involved in this field and the work done so far.
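The energy-consumption challenge noted above is often framed as an offloading decision: compute locally or send the work to the cloud. The rule below is a textbook-style sketch, not taken from this chapter; all parameter names are hypothetical, and real models also account for cloud execution time, idle power, and latency.

```python
def should_offload(cycles: float, local_speed: float, local_power: float,
                   data_bits: float, bandwidth: float, tx_power: float) -> bool:
    """Energy-based offloading rule: offload when transmitting the input
    data to the cloud costs less energy than executing locally.

    cycles      - CPU cycles the task needs
    local_speed - device CPU speed, cycles/second
    local_power - device compute power draw, watts
    data_bits   - input size to upload, bits
    bandwidth   - uplink rate, bits/second
    tx_power    - radio transmit power draw, watts
    """
    e_local = local_power * cycles / local_speed  # joules to compute on the phone
    e_offload = tx_power * data_bits / bandwidth  # joules to upload the input
    return e_offload < e_local
```

The trade-off is visible directly in the formula: compute-heavy tasks with small inputs favor offloading, while data-heavy tasks on a slow link favor local execution.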


Author(s):  
Shahid Nawaz ◽  
Asad Waqar Malik ◽  
Raihan ur Rasool

Cloud computing is the practice of using server clusters hosted at remote Internet sites for the storage, processing, and retrieval of data. It offers flexibility, disaster recovery, competitiveness, and reductions in capital and operational cost for enterprises, principally small and medium ones with a meager resource base. Virtualization, at the foundation of cloud computing, allows the physical hardware layer to be used to create and administer virtualized infrastructure, storage areas, and network interfaces. Virtual machines, run on clouds to seize the inherent advantages of virtualization, are built on storage area networks (Armbrust et al., 2009). But whenever a user attempts to access them from a remote location, hundreds of megabytes of data reads result, with ensuing network congestion. The question is how to start virtual machines and load their applications in minimal time. The Ceaseless Virtual Appliance Streaming system proposed here streams virtual machines just like video on demand; it trims the burden on existing resources and offers improved network utilization.


Author(s):  
Hammad Majeed ◽  
Firoza Erum

The Internet is growing fast, with millions of web pages containing information on every topic. The data placed on the Internet is not organized, which makes the search process difficult. Classifying web pages into predefined classes can improve the organization of this data. In this chapter, a semantics-based technique is presented to classify a text corpus with high accuracy. The technique uses well-known pre-processing techniques such as word stemming, term frequency, and degree of uniqueness; in addition, a new semantic similarity measure is computed between different terms. The authors believe that semantic-similarity-based comparison, in addition to syntactic matching, makes the classification process significantly more accurate. The proposed technique is tested on a benchmark dataset and the results are compared with already published results. The obtained results are significantly better, achieved with a quite small, highly relevant feature set.
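The stemming and term-frequency pre-processing steps mentioned above can be sketched as follows. This is a deliberately rough illustration: the suffix-stripping rules are a crude stand-in for a real stemmer such as Porter's, and the chapter's actual pipeline (including its semantic similarity measure) is not reproduced here.

```python
import re
from collections import Counter


def simple_stem(word: str) -> str:
    """Very rough suffix-stripping stemmer (illustrative stand-in for a
    proper stemming algorithm)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def term_frequencies(text: str) -> Counter:
    """Lowercase, tokenize on alphabetic runs, stem, and count terms,
    so that morphological variants collapse onto one feature."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(simple_stem(t) for t in tokens)
```

Collapsing variants this way shrinks the feature set, which is in the spirit of the small, highly relevant feature set the chapter reports.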


Author(s):  
K. Indira Suthakar ◽  
M. K. Kavitha Devi

Cloud computing is based on the concepts of distributed computing, grid computing, utility computing, and virtualization. It is a virtual pool of resources provided to users via the Internet, giving them virtually unlimited pay-per-use computing resources without the burden of managing the underlying infrastructure. One of the goals of cloud computing service providers is to use resources efficiently and gain maximum profit, which makes task scheduling a core and challenging issue in cloud computing. This chapter surveys different scheduling strategies and algorithms in cloud computing.
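As a flavor of the scheduling strategies such a survey covers, here is a minimal greedy heuristic (longest task first onto the least-loaded VM). It is one classic example among the many algorithms surveyed, not a specific scheme from the chapter, and it models tasks simply as running times.

```python
def greedy_schedule(tasks: list[float], n_vms: int) -> list[list[float]]:
    """Greedy longest-task-first scheduling: sort tasks by decreasing
    running time, then assign each to the VM with the smallest current
    load. A simple heuristic for keeping the makespan (the finish time
    of the busiest VM) low."""
    vms: list[list[float]] = [[] for _ in range(n_vms)]
    loads = [0.0] * n_vms
    for t in sorted(tasks, reverse=True):
        i = loads.index(min(loads))  # least-loaded VM so far
        vms[i].append(t)
        loads[i] += t
    return vms
```

Placing the longest tasks first leaves the short ones to even out the load at the end, which is why this heuristic usually beats naive round-robin on makespan.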


Author(s):  
Manjunath Thimmasandra Narayanapppa ◽  
A. Channabasamma ◽  
Ravindra S. Hegadi

The amount of data all around us is increasing second by second, and as a result the size of the databases used in today's enterprises is growing at an exponential rate. At the same time, the need to process and analyze this bulky data for business decision making has also increased. Several business and scientific applications generate terabytes of data that have to be processed efficiently on a daily basis. Data is collected and stored at unprecedented rates, and the challenge is not only to store and manage the huge amount of data but also to analyze it and extract meaningful value from it. This has contributed to the problem of big data faced by industry: the inability of usual software tools and database systems to manage and process big data sets within reasonable time limits. The main focus of the chapter is on unstructured data analysis.

