Handbook of Research on Trends and Future Directions in Big Data and Web Intelligence - Advances in Data Mining and Database Management
Latest Publications


TOTAL DOCUMENTS: 20 (FIVE YEARS: 0)
H-INDEX: 2 (FIVE YEARS: 0)

Published By IGI Global
ISBN: 9781466685055, 9781466685062

Author(s):  
Khawaja Tehseen Ahmed ◽  
Mazhar Ul-Haq ◽  
Arsalaan Ahmed Shaikh ◽  
Raihan ur Rasool

With the advancement of technology we are heading towards a paperless environment, yet a large number of documents still exist only on paper in our daily lives. This has created the need to digitize these paper documents, archive them, and make them viewable at any time. Even a small organization may hold thousands or millions of documents, or more. This chapter presents a comparative analysis of different programming languages and libraries for parallel processing of a huge stream of images whose arrival times and volumes are unpredictable. Since parallelism can be implemented at different levels, different algorithms and techniques are also discussed. The chapter also surveys the state of the art and existing technical solutions for implementing this parallelization on a hybrid platform for real-time processing of the images in a stream. Experimental results obtained using Apache Hadoop in combination with OpenMP are also discussed.
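
The chapter's actual Hadoop-plus-OpenMP code is not reproduced in this listing; as a minimal sketch of the same two-level pattern, the Python Hadoop Streaming mapper below lets Hadoop distribute image paths across the cluster while a local process pool stands in for OpenMP's node-level threads. The process_image routine and the input format (one image path per line) are assumptions for illustration.

```python
#!/usr/bin/env python3
"""Hypothetical Hadoop Streaming mapper: cluster-level distribution by
Hadoop, node-level parallelism by a local process pool (standing in for
the chapter's OpenMP threads)."""
import sys
from multiprocessing import Pool

def process_image(path):
    # Placeholder for the per-image kernel (e.g. binarisation or OCR);
    # a real mapper would fetch the file from HDFS and run the actual work.
    return f"{path}\tprocessed"

def main():
    # Each input line is assumed to carry one image path.
    paths = [line.strip() for line in sys.stdin if line.strip()]
    with Pool() as pool:                       # node-level parallelism
        for result in pool.imap_unordered(process_image, paths):
            print(result)                      # key\tvalue pairs for Hadoop

if __name__ == "__main__":
    main()
```

Such a script would typically be wired in through the Hadoop Streaming jar with the -mapper option pointing at it; the exact job configuration used in the chapter is not reproduced here.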


Author(s):  
Amir A. Khwaja

The big data explosion has already happened, and the situation is only going to intensify with the sheer number of data sources and the high-end technology now prevalent everywhere, generating data at a frantic pace. One of the most important aspects of big data is the ability to capture, process, and analyze data as it arrives, in real time, to allow real-time business decisions. Alternative approaches must be investigated, especially ones based on highly parallel and real-time computation for big data processing. The chapter presents RealSpec, a real-time specification language that may be used for modeling big data analytics thanks to inherent language features needed for real-time big data processing, such as concurrent processes, multi-threading, resource modeling, timing constraints, and exception handling. The chapter provides an overview of RealSpec and applies the language to a detailed big data event recognition case study to demonstrate its applicability to big data framework and analytics modeling.
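
RealSpec itself is not shown in this listing, and the sketch below is not RealSpec; it is a hypothetical Python analogue of the kind of behaviour the abstract lists: concurrent workers, a per-event timing constraint, and an exception-handling path. The event format, deadline, and recognition rule are invented.

```python
"""Hypothetical analogue (not RealSpec): concurrent event recognition with a
per-event timing constraint and an exception-handling path."""
import queue
import threading
import time

DEADLINE_S = 0.05                     # assumed per-event timing constraint

def recognise(event):
    # Placeholder recognition rule; real analytics would go here.
    return "ALERT" if event["value"] > 100 else "OK"

def worker(events):
    while True:
        event = events.get()
        if event is None:             # poison pill: shut the worker down
            break
        start = time.monotonic()
        try:
            label = recognise(event)
            if time.monotonic() - start > DEADLINE_S:
                raise TimeoutError(f"deadline missed for event {event['id']}")
            print(event["id"], label)
        except TimeoutError as exc:   # exception-handling path
            print("timing violation:", exc)

if __name__ == "__main__":
    q = queue.Queue()
    workers = [threading.Thread(target=worker, args=(q,)) for _ in range(4)]
    for w in workers:
        w.start()
    for i in range(10):               # toy event stream
        q.put({"id": i, "value": i * 20})
    for _ in workers:
        q.put(None)                   # one shutdown signal per worker
    for w in workers:
        w.join()
```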


Author(s):  
Usman Akhtar ◽  
Mehdi Hassan

The availability of huge amounts of heterogeneous data from different sources on the Internet has been termed the problem of Big Data. Clustering is widely used as a knowledge discovery tool that separates data into manageable parts, and clustering algorithms that scale to big databases are needed. In this chapter we explore various schemes that have been used to tackle big databases. Statistical features are extracted from the given dataset, redundant and irrelevant features are eliminated, and the most important features are selected by a genetic algorithm (GA). Clustering with reduced feature sets requires less computational time and fewer resources. Experiments performed on standard datasets indicate that the proposed scheme offers high clustering accuracy. Various quality measures were computed to check clustering quality, and the results show that the proposed methodology improves them significantly, offering high-quality clustering.
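
As a hedged illustration of the described approach (not the authors' exact algorithm or parameters), the sketch below uses a simple genetic algorithm to select a feature subset and scores each candidate subset by the silhouette of a k-means clustering on the reduced data. The synthetic dataset, population size, and mutation rate are arbitrary choices.

```python
"""GA-driven feature selection for clustering: binary masks evolve towards
feature subsets that give well-separated k-means clusters (illustrative)."""
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
X, _ = make_blobs(n_samples=300, n_features=10, centers=4, random_state=0)
X[:, 5:] = rng.normal(size=(300, 5))          # make half the features irrelevant

def fitness(mask):
    # Silhouette of k-means on the selected features only.
    if mask.sum() == 0:
        return -1.0
    sub = X[:, mask == 1]
    labels = KMeans(n_clusters=4, n_init=5, random_state=0).fit_predict(sub)
    return silhouette_score(sub, labels)

pop = rng.integers(0, 2, size=(20, X.shape[1]))   # random binary feature masks
for gen in range(15):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[-10:]]       # keep the fittest half
    children = []
    for _ in range(10):
        a, b = parents[rng.integers(10)], parents[rng.integers(10)]
        cut = rng.integers(1, X.shape[1])
        child = np.concatenate([a[:cut], b[cut:]])  # one-point crossover
        flip = rng.random(X.shape[1]) < 0.1         # bit-flip mutation
        children.append(np.where(flip, 1 - child, child))
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(m) for m in pop])]
print("selected features:", np.flatnonzero(best))
```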


Author(s):  
Zahid Hussain Qaisar ◽  
Farooq Ahmad

Regression testing is an important activity during the maintenance phase, and a key task during software maintenance is finding the impact of a change. Change is one of the essential attributes of software: quality software is open to change and makes it easy for developers to carry out the required modifications. Because modification plays a vital role in software development, it is highly important to find the impact of a modification and to identify the change in the software. In software testing this issue receives particular attention, because after a change we have to identify its impact and observe closely what has happened, or will happen, as a result of the change made, or about to be made, in the software. After a change, the testing team has to adjust its testing strategy and devise new test cases to perform the testing activity efficiently. Regression testing is performed when software that has already been tested is modified; the important task is then to adjust the tests generated in the previous testing cycles. This study presents an approach based on analyzing VDM (Vienna Development Method) specifications to find the impact of change, describing how a change can be identified and its impact analyzed. The approach classifies the test cases of the original test suite into three classes: obsolete, re-testable, and reusable. The technique not only classifies the original test cases but also generates the new test cases required for regression testing.
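
A minimal, hypothetical sketch of the classification step is shown below: given the specification operations that were removed or changed, and the operations each test case exercises, test cases are split into obsolete, re-testable, and reusable. The operation names and coverage data are invented; in the chapter this information is derived from analysis of the VDM specification.

```python
"""Hypothetical obsolete / re-testable / reusable split driven by a change set."""

# Operations removed or changed in the modified specification (assumed).
removed_ops = {"withdraw_overdraft"}
changed_ops = {"withdraw", "transfer"}

# Which operations each test case exercises (assumed coverage data).
coverage = {
    "TC1": {"deposit"},
    "TC2": {"withdraw"},
    "TC3": {"withdraw_overdraft"},
    "TC4": {"deposit", "transfer"},
}

def classify(ops):
    if ops & removed_ops:
        return "obsolete"      # exercises behaviour that no longer exists
    if ops & changed_ops:
        return "re-testable"   # must be re-run against the modified software
    return "reusable"          # unaffected by the change

for test, ops in coverage.items():
    print(test, "->", classify(ops))
```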


Author(s):  
Shefali Virkar

Over the last few decades, unprecedented advances in communications technology have collapsed vast spatial and temporal differences and made it possible for people to form connections in a manner not thought possible before. Centred chiefly on information, this revolution has transformed the way in which people around the world think, work, share, and communicate. Information and Communication Technologies (ICTs) promise a future of a highly interconnected world, wherein action is not limited by physical boundaries and constrained physical space is replaced by a virtual 'cyberspace' not subject to traditional hierarchies and power relations. But is the promise of ICTs chimerical? To tackle these issues, which are central to the global policy debate over the potential development contributions of ICTs, and to examine whether, and to what extent, disparities in access to ICTs exist, this chapter demonstrates the ways in which ICTs may be used as tools to further global economic, social, and political advancement, to shape actor behaviour, and to enhance institutional functioning, particularly in the Third World.


Author(s):  
Wajid Khan ◽  
Fiaz Hussain ◽  
Edmond C. Prakash

The arrival of e-commerce systems has contributed significantly to the economy and has also played a vital role in collecting huge amounts of transactional data in the form of online orders and web enquiries. With such a large volume of data, it is becoming increasingly difficult to analyse business and consumer behaviour. There is a growing need for business analytics tools that help decision makers understand their data properly, since understanding the data reveals hidden trends, supports effective resource utilisation, improves decision-making ability, and deepens understanding of the business and its core values.
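
As a small, hypothetical example of the kind of aggregation such analytical tools perform, the pandas sketch below rolls raw orders up into monthly revenue per category, one simple way of surfacing trends hidden in transactional data. The columns and figures are invented.

```python
"""Hypothetical order data rolled up into monthly revenue per category."""
import pandas as pd

orders = pd.DataFrame({
    "order_date": pd.to_datetime(["2015-01-05", "2015-01-20", "2015-02-03",
                                  "2015-02-18", "2015-03-02", "2015-03-25"]),
    "category": ["books", "books", "electronics",
                 "books", "electronics", "electronics"],
    "amount": [20.0, 35.0, 250.0, 15.0, 180.0, 320.0],
})

# Monthly revenue per category: a compact view of trends in raw orders.
monthly = (orders
           .groupby([orders["order_date"].dt.to_period("M"), "category"])["amount"]
           .sum()
           .unstack(fill_value=0))
print(monthly)
```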


Author(s):  
Carlos Q. Gómez ◽  
Marco A. Villegas ◽  
Fausto P. García ◽  
Diego J. Pedregal

Condition Monitoring (CM) is the process of determining the state of a system from a number of measured parameters. This 'condition' is tracked over time to detect any developing fault or undesired behaviour. As Information and Communication Technologies (ICT) continue to expand the range of possible applications and gain industrial maturity, the appearance of new sensor technologies such as Macro Fiber Composites (MFC) has opened new possibilities for addressing CM in industrial scenarios. The huge amount of data collected by MFC sensors could overwhelm most conventional monitoring systems, requiring new approaches to take real advantage of the data. A Big Data approach makes it possible to exploit these large volumes of data by integrating the appropriate algorithms and technologies into a unified platform. This chapter proposes a real-time condition monitoring approach in which the system is continuously monitored, allowing online analysis.
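
The chapter's monitoring algorithm is not reproduced here; as a rough sketch of online analysis, the Python snippet below watches a simulated sensor stream and flags readings that fall outside a rolling statistical band. The signal model, window length, and threshold are assumptions.

```python
"""Online check on a simulated sensor stream: flag readings outside a
rolling statistical band (illustrative only, not the chapter's algorithm)."""
from collections import deque
import math
import random

random.seed(1)
window = deque(maxlen=200)            # recent readings form the baseline

def reading(t):
    # Simulated signal: Gaussian noise plus injected faults at three instants.
    return random.gauss(0.0, 1.0) + (8.0 if t in (600, 750, 900) else 0.0)

for t in range(1000):
    x = reading(t)
    if len(window) == window.maxlen:
        mean = sum(window) / len(window)
        std = math.sqrt(sum((v - mean) ** 2 for v in window) / len(window))
        if abs(x - mean) > 4 * std:   # simple condition check
            print(f"t={t}: possible developing fault, reading {x:.2f}")
    window.append(x)
```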


Author(s):  
Abubakr Gafar Abdalla ◽  
Tarig Mohamed Ahmed ◽  
Mohamed Elhassan Seliaman

The web is a rich, dynamic, and fast-growing source of data for mining, providing great opportunities that are often not exploited. Web data pose a real challenge to traditional data mining techniques because of their huge volume and unstructured nature. Web logs contain information about the interactions between visitors and the website, and analyzing these logs provides insights into visitors' behavior, usage patterns, and trends. Web usage mining, also known as web log mining, is the process of applying data mining techniques to discover useful information hidden in web server logs. Web logs are primarily used by web administrators to know how much traffic they get and to detect broken links and other types of errors. Web usage mining extracts useful information that can benefit a number of application areas such as web personalization, website restructuring, system performance improvement, and business intelligence. The web usage mining process involves three main phases: pre-processing, pattern discovery, and pattern analysis. Various pre-processing techniques have been proposed to extract information from log files and group primitive data items into meaningful, higher-level abstractions suitable for mining, usually in the form of visitors' sessions. The major data mining techniques used for pattern discovery in web usage mining are clustering, association analysis, classification, and sequential pattern discovery. This chapter discusses the process of web usage mining, its procedures, methods, and pattern discovery techniques. The chapter also presents a practical example using real web log data.
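
As an illustration of the pre-processing phase, the sketch below parses Common Log Format entries and groups each visitor's requests into sessions using a 30-minute inactivity timeout, a common heuristic. The sample log lines are invented.

```python
"""Pre-processing sketch: parse Common Log Format lines and group each
visitor's requests into sessions with a 30-minute inactivity timeout."""
import re
from datetime import datetime, timedelta

LOG_RE = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) \S+" (\d+) \S+')
TIMEOUT = timedelta(minutes=30)

raw_log = """\
10.0.0.1 - - [10/Oct/2015:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326
10.0.0.1 - - [10/Oct/2015:13:58:02 +0000] "GET /products.html HTTP/1.1" 200 1045
10.0.0.1 - - [10/Oct/2015:15:10:11 +0000] "GET /contact.html HTTP/1.1" 200 512
10.0.0.2 - - [10/Oct/2015:13:57:40 +0000] "GET /index.html HTTP/1.1" 200 2326
"""

sessions = {}                          # (ip, session number) -> visited pages
last_seen = {}                         # ip -> (last timestamp, session number)
for line in raw_log.splitlines():
    match = LOG_RE.match(line)
    if not match:
        continue                       # skip malformed entries
    ip, ts, method, url, status = match.groups()
    t = datetime.strptime(ts, "%d/%b/%Y:%H:%M:%S %z")
    prev = last_seen.get(ip)
    if prev is None:
        sid = 0
    elif t - prev[0] > TIMEOUT:
        sid = prev[1] + 1              # inactivity gap: start a new session
    else:
        sid = prev[1]
    last_seen[ip] = (t, sid)
    sessions.setdefault((ip, sid), []).append(url)

for (ip, sid), pages in sessions.items():
    print(ip, f"session {sid}:", " -> ".join(pages))
```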


Author(s):  
Alberto Pliego ◽  
Fausto Pedro García Márquez

The growing amount of available data generates complex problems when the data need to be processed. These data usually come from different sources and describe different issues, yet in many cases they can be interrelated to gather strategic information useful for decision-making processes in a multitude of businesses. For a qualitative and quantitative analysis of a complex decision-making process, it is critical to employ a correct method because of the large number of operations required. With this purpose, this chapter presents an approach that applies Binary Decision Diagrams to Logical Decision Trees. It allows a Main Problem to be addressed by establishing its different causes, called Basic Causes, and their interrelations. Cases with a large number of Basic Causes incur substantial computational cost, because this is an NP-hard problem. The chapter therefore presents a new approach for analysing large Logical Decision Trees. The size of the Logical Decision Tree is not the only factor that affects the computational cost; the resolution procedure (ordering of the Basic Causes, number of AND/OR gates, etc.) can also vary this cost widely. A new approach to reduce the complexity of the problem is presented here. It makes use of data derived from simpler problems that require less computational cost in order to obtain a good solution. The method does not provide an exact solution, but the approximations achieved deviate little from the exact one.
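
As a hedged illustration (not the authors' implementation), the sketch below computes the probability of the Main Problem from a small Logical Decision Tree of AND/OR gates over Basic Causes, using Shannon decomposition with a fixed ordering of the Basic Causes and memoisation of repeated sub-formulas, which is the core sharing idea a BDD exploits. The tree and the probabilities are invented.

```python
"""Top-event probability of a toy Logical Decision Tree via Shannon
decomposition with a fixed Basic Cause ordering (illustrative only)."""
from functools import lru_cache

# Main Problem = (BC1 AND BC2) OR (BC2 AND BC3) OR BC4
TREE = ("OR", ("AND", "BC1", "BC2"), ("AND", "BC2", "BC3"), "BC4")
PROB = {"BC1": 0.05, "BC2": 0.10, "BC3": 0.02, "BC4": 0.01}
ORDER = ("BC1", "BC2", "BC3", "BC4")           # fixed variable ordering

def cofactor(node, cause, value):
    """Fix one Basic Cause to True/False and simplify the resulting tree."""
    if isinstance(node, str):
        return value if node == cause else node
    op, *children = node
    kids = [cofactor(c, cause, value) for c in children]
    absorbing, neutral = (False, True) if op == "AND" else (True, False)
    if any(k is absorbing for k in kids):
        return absorbing
    kids = [k for k in kids if k is not neutral]
    if not kids:
        return neutral
    return kids[0] if len(kids) == 1 else (op, *kids)

@lru_cache(maxsize=None)
def probability(node, index=0):
    """P(node is True): Shannon decomposition over ORDER, memoised so that
    repeated sub-formulas are evaluated only once."""
    if node is True:
        return 1.0
    if node is False:
        return 0.0
    cause = ORDER[index]
    p = PROB[cause]
    return (p * probability(cofactor(node, cause, True), index + 1)
            + (1 - p) * probability(cofactor(node, cause, False), index + 1))

if __name__ == "__main__":
    print(f"P(Main Problem) = {probability(TREE):.6f}")
```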


Author(s):  
Jafreezal Jaafar ◽  
Kamaluddeen Usman Danyaro ◽  
M. S. Liew

This chapter discusses the veracity of data. The veracity issue is the challenge of imprecision in big data caused by the influx of data from diverse sources. To overcome this problem, the chapter proposes a fuzzy knowledge-based framework that enhances the accessibility of Web data and resolves inconsistency in the data model. D2RQ, Protégé, and fuzzy Web Ontology Language (OWL) applications were used for configuration and performance evaluation. The chapter also provides a completeness fuzzy knowledge-based algorithm, which was used to determine the robustness and adaptability of the knowledge base. The results show that D2RQ is more scalable in the performance comparison. Finally, conclusions and future lines of research are provided.

