Advances in Data Mining and Database Management - Managing and Processing Big Data in Cloud Computing
Latest Publications


TOTAL DOCUMENTS

15
(FIVE YEARS 0)

H-INDEX

1
(FIVE YEARS 0)

Published By IGI Global

9781466697676, 9781466697683

Author(s):  
Manjunath Thimmasandra Narayanapppa ◽  
T. P. Puneeth Kumar ◽  
Ravindra S. Hegadi

Recent technological advancements have led to the generation of huge volumes of data from distinct domains (scientific sensors, health care, user-generated content, financial companies, the Internet, and supply chain systems) over the past decade. The term big data was coined to capture the meaning of this emerging trend. In addition to its huge volume, big data exhibits several unique characteristics compared with traditional data; for instance, big data is generally unstructured and requires more real-time analysis. This development calls for new system platforms for data acquisition, storage, and transmission, and for large-scale data processing mechanisms. In recent years the analytics industry's interest has expanded toward big data analytics to uncover the potential concealed in big data, such as hidden patterns or unknown correlations. The main goal of this chapter is to explore the importance of machine learning algorithms and of the computational environment, including the hardware and software required to perform analytics on big data.


Author(s):  
Maryam Qamar ◽  
Mehwish Malik ◽  
Saadia Batool ◽  
Sidra Mehmood ◽  
Asad W. Malik ◽  
...  

This work surveys research on the decentralization of Online Social Networks (OSNs): the issues with centralized designs are studied alongside possible decentralized solutions. A centralized architecture is prone to privacy breaches; a P2P architecture that decentralizes data, and thus authority, combined with encryption is one possible remedy. OSN user bases grow exponentially, causing scalability issues; a natural solution is decentralization, where users bring resources with them via personal machines or paid services. Centralized services are also not available uninterruptedly; to this end, decentralization proposes replication. Decentralized solutions are likewise proposed for the reliability issues arising in centralized systems and for the potential threat posed by a central authority. Yet no key to all problems has been found: metadata may suffice for inferences about the data, and network traffic flow can reveal information about users' relationships. The first issue can be mitigated by padding data or splitting it into uniform blocks; caching, dummy traffic, or routing through a mix of nodes are possible solutions to the second.
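The padding/splitting mitigation mentioned above can be sketched as follows. This is an illustrative sketch, not the chapter's implementation: the block size and the use of random padding are assumptions, and a real scheme would also record the true payload length so the padding can be stripped on retrieval.

```python
import os

BLOCK_SIZE = 4096  # hypothetical uniform block size


def pad_and_split(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Split data into uniform blocks, padding the last block with random
    bytes so every stored block has the same length. An observer of the
    stored blocks then learns only a rounded-up size, not the exact one."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    if not blocks:  # even an empty payload emits one full block
        blocks = [b""]
    last = blocks[-1]
    if len(last) < block_size:
        blocks[-1] = last + os.urandom(block_size - len(last))
    return blocks
```

Every object, regardless of its real size, is stored as a whole number of identical-looking blocks, which is what blunts size-based metadata inference.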


Author(s):  
Kiran Fatima ◽  
Hammad Majeed

Real-world histology tissue textures are complex to analyze and classify owing to their non-homogeneous nature and unorganized spatial intensity variations. The major challenge in solving pathological problems is the inherent complexity due to high intra-class variability and low inter-class variation in the texture of histology samples. The development of computational methods to assist pathologists in characterizing these tissue samples would have great diagnostic and prognostic value. In this chapter, an optimized texture-based evolutionary framework is proposed to assist pathologists in the classification of benign and pre-malignant tumors. The proposed framework investigates the imperative role of the RGB color channels in discriminating cancer grades or subtypes, explores higher-order statistical features at the image level, and implements an evolution-based optimization scheme for feature selection and classification. The highest classification accuracies, achieved with a quadratic SVM classifier, are 99.06% on a meningioma dataset and 90% on a breast cancer dataset.
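Image-level higher-order statistics per RGB channel, of the kind the framework explores, might look like the following minimal sketch (an illustration only; the chapter's actual feature set and any normalization are not specified here):

```python
import numpy as np


def channel_statistics(image: np.ndarray) -> np.ndarray:
    """Per-RGB-channel image-level statistics: mean, variance, and the
    higher-order moments skewness and kurtosis. `image` is an H x W x 3
    array; the result is a 12-dimensional feature vector."""
    feats = []
    for c in range(3):
        x = image[..., c].astype(np.float64).ravel()
        mu = x.mean()
        var = x.var()
        std = np.sqrt(var) + 1e-12  # guard against division by zero
        skew = ((x - mu) ** 3).mean() / std ** 3   # third standardized moment
        kurt = ((x - mu) ** 4).mean() / std ** 4   # fourth standardized moment
        feats.extend([mu, var, skew, kurt])
    return np.array(feats)
```

Such a vector would then be fed to a feature-selection stage and a classifier such as a quadratic-kernel SVM.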


Author(s):  
Bunjamin Memishi ◽  
Shadi Ibrahim ◽  
Maria S. Perez ◽  
Gabriel Antoniu

MapReduce has become a relevant framework for Big Data processing in the cloud. In large-scale clouds, failures do occur and may incur unwanted performance degradation in Big Data applications. As the reliability of MapReduce depends on how well it detects and handles failures, this book chapter investigates the problem of failure detection in the MapReduce framework. The case studies of this contribution reveal that the current static timeout value is not adequate and demonstrate significant variations in an application's response time under different timeout values. Arguing that comparatively little attention has been devoted to failure detection in the framework, the chapter presents design ideas for a new adaptive timeout.
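One simple way an adaptive timeout could work is to derive the threshold from observed task response times rather than from a fixed constant. The sketch below is illustrative only, not the chapter's design: the mean-plus-k-standard-deviations rule and the fallback value (Hadoop's classic 600-second static task timeout) are assumptions.

```python
class AdaptiveTimeout:
    """Failure-detection timeout that adapts to the workload: it tracks
    recent task response times and sets the timeout to mean + k * stddev,
    falling back to a static default until enough samples exist."""

    def __init__(self, k: float = 3.0, default: float = 600.0):
        self.k = k
        self.default = default  # static fallback, in seconds
        self.samples: list[float] = []

    def record(self, response_time: float) -> None:
        """Record the response time of a completed task."""
        self.samples.append(response_time)

    def timeout(self) -> float:
        """Current timeout: adaptive once at least two samples exist."""
        if len(self.samples) < 2:
            return self.default
        n = len(self.samples)
        mean = sum(self.samples) / n
        var = sum((s - mean) ** 2 for s in self.samples) / n
        return mean + self.k * var ** 0.5
```

A slow cluster thus raises the threshold automatically (fewer false suspicions), while a fast one lowers it (quicker detection of genuine failures).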


Author(s):  
Todor Ivanov ◽  
Sead Izberovic ◽  
Nikolaos Korfiatis

This chapter introduces heterogeneity as a perspective in the architecture of big data systems targeted at both vertical and generic workloads, and discusses how this can be linked with the existing Hadoop ecosystem (as of 2015). The cost factor of a big data solution and its characteristics can influence its architectural patterns and capabilities; to account for this, an extended model based on the 3V paradigm is introduced (Extended 3V). It is examined on a hierarchical set of four layers (Hardware, Management, Platform, and Application), with a list of components provided for each layer as well as a classification of their roles in a big data solution.


Author(s):  
Sajid Umair ◽  
Umair Muneer ◽  
Muhammad Nauman Zahoor ◽  
Asad W. Malik

Owing to the wide variety of smartphones and their capability to support heavy applications, their demand is increasing day by day. With this increase in computation capability and processing power, Mobile Cloud Computing (MCC) has become an emerging field. Building on cloud computing, the mobile cloud provides significant advantages in reliability and portability. The challenges involved in mobile cloud computing are energy consumption, computation power, and processing ability. The mobile cloud provides a way to use cloud resources on mobile devices, but traditional smartphone models do not support the cloud, so researchers have introduced new models for the development of MCC. Certain phases still need improvement, and the field attracts many researchers. The purpose of this chapter is to analyze and summarize the challenges involved in this field and the work done so far.
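The energy-consumption challenge noted above is often framed as an offloading decision: compute locally or send the work to the cloud. The rule below is a textbook-style sketch, not taken from this chapter; all parameter names are hypothetical, and real models also account for cloud execution time, idle power, and latency.

```python
def should_offload(cycles: float, local_speed: float, local_power: float,
                   data_bits: float, bandwidth: float, tx_power: float) -> bool:
    """Energy-based offloading rule: offload when transmitting the input
    data to the cloud costs less energy than executing locally.

    cycles      - CPU cycles the task needs
    local_speed - device CPU speed, cycles/second
    local_power - device compute power draw, watts
    data_bits   - input size to upload, bits
    bandwidth   - uplink rate, bits/second
    tx_power    - radio transmit power draw, watts
    """
    e_local = local_power * cycles / local_speed  # joules to compute on the phone
    e_offload = tx_power * data_bits / bandwidth  # joules to upload the input
    return e_offload < e_local
```

The trade-off is visible directly in the formula: compute-heavy tasks with small inputs favor offloading, while data-heavy tasks on a slow link favor local execution.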


Author(s):  
Shahid Nawaz ◽  
Asad Waqar Malik ◽  
Raihan ur Rasool

Cloud computing is the practice of using server clusters hosted at remote Internet sites for the storage, processing, and retrieval of data. It offers flexibility, disaster recovery, competitiveness, and reductions in capital and operational cost for enterprises, principally small and medium ones with a meager resource base. Virtualization, at the foundation of cloud computing, allows the physical hardware layer to be used to create and administer virtualized infrastructure, storage areas, and network interfaces. Virtual machines, run on clouds to seize the inherent advantages of virtualization, are built on storage area networks (Armbrust et al., 2009). But whenever a user attempts to access them from a remote location, hundreds of megabytes of data reads result, with ensuing network congestion. The question is how to start virtual machines and load their applications in minimal time. The Ceaseless Virtual Appliance Streaming system proposed here streams virtual machines just like video on demand; it trims the burden on existing resources and offers improved network utilization.


Author(s):  
Hammad Majeed ◽  
Firoza Erum

The Internet is growing fast, with millions of web pages containing information on every topic. The data placed on the Internet is not organized, which makes the search process difficult. Classifying web pages into predefined classes can improve the organization of this data. In this chapter, a semantics-based technique is presented to classify a text corpus with high accuracy. The technique uses well-known pre-processing techniques such as word stemming, term frequency, and degree of uniqueness; in addition, a new semantic similarity measure is computed between different terms. The authors believe that semantic-similarity-based comparison, in addition to syntactic matching, makes the classification process significantly more accurate. The proposed technique is tested on a benchmark dataset and the results are compared with already published results. The obtained results are significantly better, achieved with a quite small, highly relevant feature set.
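The stemming and term-frequency pre-processing steps mentioned above can be sketched as follows. This is a deliberately rough illustration: the suffix-stripping rules are a crude stand-in for a real stemmer such as Porter's, and the chapter's actual pipeline (including its semantic similarity measure) is not reproduced here.

```python
import re
from collections import Counter


def simple_stem(word: str) -> str:
    """Very rough suffix-stripping stemmer (illustrative stand-in for a
    proper stemming algorithm)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def term_frequencies(text: str) -> Counter:
    """Lowercase, tokenize on alphabetic runs, stem, and count terms,
    so that morphological variants collapse onto one feature."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(simple_stem(t) for t in tokens)
```

Collapsing variants this way shrinks the feature set, which is in the spirit of the small, highly relevant feature set the chapter reports.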


Author(s):  
K. Indira Suthakar ◽  
M. K. Kavitha Devi

Cloud computing is based on the concepts of distributed computing, grid computing, utility computing, and virtualization. It is a virtual pool of resources provided to users via the Internet, giving them virtually unlimited pay-per-use computing resources without the burden of managing the underlying infrastructure. One of the goals of cloud computing service providers is to use resources efficiently and gain maximum profit, which makes task scheduling a core and challenging issue in cloud computing. This chapter surveys different scheduling strategies and algorithms in cloud computing.
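As a flavor of the scheduling strategies such a survey covers, here is a minimal greedy heuristic (longest task first onto the least-loaded VM). It is one classic example among the many algorithms surveyed, not a specific scheme from the chapter, and it models tasks simply as running times.

```python
def greedy_schedule(tasks: list[float], n_vms: int) -> list[list[float]]:
    """Greedy longest-task-first scheduling: sort tasks by decreasing
    running time, then assign each to the VM with the smallest current
    load. A simple heuristic for keeping the makespan (the finish time
    of the busiest VM) low."""
    vms: list[list[float]] = [[] for _ in range(n_vms)]
    loads = [0.0] * n_vms
    for t in sorted(tasks, reverse=True):
        i = loads.index(min(loads))  # least-loaded VM so far
        vms[i].append(t)
        loads[i] += t
    return vms
```

Placing the longest tasks first leaves the short ones to even out the load at the end, which is why this heuristic usually beats naive round-robin on makespan.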


Author(s):  
Manjunath Thimmasandra Narayanapppa ◽  
A. Channabasamma ◽  
Ravindra S. Hegadi

The amount of data all around us is increasing second by second, and as a result the size of the databases used in today's enterprises is growing at an exponential rate. At the same time, the need to process and analyze this bulky data for business decision making has also increased. Several business and scientific applications generate terabytes of data that have to be processed efficiently on a daily basis. Data is collected and stored at unprecedented rates, and the challenge is not only to store and manage the huge amount of data but also to analyze it and extract meaningful value from it. This has contributed to the problem of big data faced by industry: the inability of usual software tools and database systems to manage and process big data sets within reasonable time limits. The main focus of the chapter is on unstructured data analysis.

