Research and Trends in Data Mining Technologies and Applications
Latest Publications



Published By IGI Global

9781599042718, 9781599042732

Author(s):  
Daniel Wu ◽  
Xiaohua Hu

In this chapter, we report a comprehensive evaluation of the topological structure of protein-protein interaction (PPI) networks, by mining and analyzing graphs constructed from the popular data sets publicly available to the bioinformatics research community. We compare the topology of these networks across different species, different confidence levels, and different experimental systems used to obtain the interaction data. Our results confirm the well-accepted claim that the degree distribution follows a power law. However, further statistical analysis shows that the residuals are not independent of the fitted values, indicating that the power law model may be inadequate. Our results also show that the dependence of the average clustering coefficient on the vertex degree is far from a power law, contradicting many published results. For the first time, we report that the average vertex density exhibits a strong power law dependence on the vertex degree for the networks studied, regardless of species, confidence levels, and experimental systems. We also present an efficient and accurate approach to detecting a community in a protein-protein interaction network from a given seed protein. Our experimental results show strong structural and functional relationships among member proteins within each of the communities identified by our approach, as verified by the MIPS complex catalog database and annotations.
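
The quantity behind the clustering result above, the average clustering coefficient as a function of vertex degree, can be sketched as follows. This is a minimal illustration on a toy undirected graph, not the chapter's analysis pipeline; the protein names, the edge list, and the function names are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def build_adjacency(edges):
    """Build an undirected adjacency map, ignoring self-loops."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def clustering_coefficient(adj, v):
    """Fraction of pairs of v's neighbors that are themselves linked."""
    neighbors = adj[v]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))

def average_clustering_by_degree(adj):
    """Map each degree k to the mean clustering coefficient of degree-k vertices."""
    buckets = defaultdict(list)
    for v in list(adj):
        buckets[len(adj[v])].append(clustering_coefficient(adj, v))
    return {k: sum(cs) / len(cs) for k, cs in sorted(buckets.items())}

# Hypothetical toy interaction data (not taken from any real PPI data set).
toy_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
toy_adj = build_adjacency(toy_edges)
ck = average_clustering_by_degree(toy_adj)
```

On a real network, plotting the values of `ck` against degree on log-log axes is one way to inspect whether the dependence resembles a power law.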


Author(s):  
Mafruz Ashrafi ◽  
David Taniar ◽  
Kate Smith

Association rule mining is one of the most widely used data mining techniques. To achieve a better performance, many efficient algorithms have been proposed. Despite these efforts, many of these algorithms require a large amount of main memory to enumerate all frequent itemsets, especially when the dataset is large or the user-specified support is low. Thus, it becomes apparent that we need to have an efficient main memory handling technique, which allows association rule mining algorithms to handle larger datasets in the main memory. To achieve this goal, in this chapter we propose an algorithm for vertical association rule mining that compresses a vertical dataset in an efficient manner, using bit vectors. Our performance evaluations show that the compression ratio attained by our proposed technique is better than those of the other well-known techniques.
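
To make the vertical, bit-vector layout concrete, here is a minimal sketch in which each item maps to an integer whose bit `tid` is set when transaction `tid` contains the item. It shows only the plain representation and support counting by bitwise AND; it does not implement the compression scheme proposed in the chapter, and the function names and sample transactions are assumptions made for illustration.

```python
def to_vertical_bitvectors(transactions):
    """Return {item: int}, where bit tid is set if transaction tid has the item."""
    vectors = {}
    for tid, items in enumerate(transactions):
        for item in items:
            vectors[item] = vectors.get(item, 0) | (1 << tid)
    return vectors

def support(itemset, vectors):
    """Count transactions containing every item, by AND-ing the bit vectors."""
    items = list(itemset)
    if not items:
        raise ValueError("itemset must not be empty")
    acc = vectors.get(items[0], 0)
    for item in items[1:]:
        acc &= vectors.get(item, 0)
    return bin(acc).count("1")

# Hypothetical sample data.
sample_transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
bitvectors = to_vertical_bitvectors(sample_transactions)
```

For example, `support({"bread", "milk"}, bitvectors)` intersects two vectors with a single AND instead of rescanning the transactions.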


Author(s):  
Richi Nayak

Web services have recently received much attention in businesses. However, a number of challenges remain to be resolved, such as lack of experience in estimating the costs, lack of service innovation and monitoring, and lack of methods for locating appropriate services. One possible approach is to learn from the experiences in Web services and from other similar situations. Such a task requires the use of data mining to represent generalizations on common situations. This chapter examines how some of the issues of Web services can be addressed through data mining.


Author(s):  
Alex Freitas ◽  
André Carvalho

In machine learning and data mining, most work on classification problems deals with flat classification, where each instance is assigned to one of a set of possible classes and there is no hierarchical relationship between the classes. There are, however, more complex classification problems where the classes to be predicted are hierarchically related. This chapter presents a tutorial on the hierarchical classification techniques found in the literature. We also discuss how hierarchical classification techniques have been applied to the area of bioinformatics (particularly the prediction of protein function), where hierarchical classification problems are often found.


Author(s):  
Fedja Hadzic ◽  
Tharam Dillon ◽  
Henry Tan ◽  
Ling Feng ◽  
Elizabeth Chang

Association rule mining is one of the most popular pattern discovery methods used in data mining. Frequent pattern extraction is an essential step in association rule mining. Most of the proposed algorithms for extracting frequent patterns are based on the downward closure lemma and use the support and confidence framework. In this chapter we investigate an alternative method for mining frequent patterns in a transactional database. The Self-Organizing Map (SOM) is an unsupervised neural network that effectively creates spatially organized internal representations of the features and abstractions detected in the input space. It is one of the most popular clustering techniques, and it reveals existing similarities in the input space by performing a topology-preserving mapping. These promising properties indicate that such a clustering technique can be used to detect frequent patterns in a top-down manner, as opposed to the traditional approach that employs a bottom-up lattice search. Issues that are frequently raised when using a clustering technique for the purpose of finding association rules are: (i) the completeness of the association rule set, (ii) the support level for the rules generated, and (iii) the confidence level for the rules generated. We present some case studies analyzing the relationships between the SOM approach and the traditional association rule framework, and propose a way to constrain the clustering technique so that the traditional support constraint can be approximated. Through our experiments, we demonstrate how a clustering approach can be used for discovering frequent patterns.
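
For contrast with the SOM approach, the traditional bottom-up search that relies on downward closure (every subset of a frequent itemset is itself frequent) can be sketched as below. This is a deliberately simple, Apriori-style illustration and not the method investigated in the chapter; the function name, the absolute support threshold, and the sample data are assumptions.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Level-wise search: return {itemset: support} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidate):
        return sum(1 for t in transactions if candidate <= t)

    items = {i for t in transactions for i in t}
    level = {frozenset([i]) for i in items if count(frozenset([i])) >= min_support}
    result = {}
    k = 1
    while level:
        for itemset in level:
            result[itemset] = count(itemset)
        k += 1
        # Join frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Downward closure: drop any candidate with an infrequent (k-1)-subset.
        level = {
            c for c in candidates
            if all(frozenset(sub) in result for sub in combinations(c, k - 1))
            and count(c) >= min_support
        }
    return result

# Hypothetical sample data; min_support is an absolute transaction count.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
found = frequent_itemsets(baskets, min_support=2)
```

Here the three-item candidate is pruned without counting, because its subset `{"milk", "butter"}` is not frequent.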


Author(s):  
Karlton Sequeira ◽  
Mohammed J. Zaki

Very often, related data may be collected by a number of sources, which may be unable to share their entire datasets for reasons like confidentiality agreements, dataset size, and so forth. However, these sources may be willing to share a condensed model of their datasets. If some substructures of the condensed models of such datasets, from different sources, are found to be unusually similar, policies successfully applied to one may be successfully applied to the others. In this chapter, we propose a framework for constructing condensed models of datasets and algorithms to find similar substructure in pairs of such models. The algorithms are based on the tensor product. We test our framework on pairs of synthetic datasets and compare our algorithms with an existing one. Finally, we apply it to basketball player statistics for two National Basketball Association (NBA) seasons, and to breast cancer datasets. The results are statistically more interesting than results obtained from independent analysis of the datasets.


Author(s):  
Wenyuan Li ◽  
Wee-Keong Ng ◽  
Kok-Leong Ong

Graphs offer one of the most expressive representations for characterizing complex data, which makes graph mining an emerging and promising domain in data mining. Meanwhile, graphs have a long history of study, with many theoretical results from various foundational fields, such as mathematics, physics, and artificial intelligence. In this chapter, we systematically review theories and techniques newly studied and proposed in these areas. Moreover, we focus on those approaches that are potentially valuable to graph-based data mining. These approaches provide different perspectives and motivations for this new domain. To illustrate how a method from another area contributes to graph-based data mining, we present a case study on a classic graph problem that can be widely applied in many application areas. Our results show that methods from foundational areas may contribute to graph-based data mining.


Author(s):  
Irene Ntoutsi ◽  
Nikos Pelekis ◽  
Yannis Theodoridis

Many patterns are available nowadays due to the widespread use of knowledge discovery in databases (KDD), as a result of the overwhelming amount of data. This “flood” of patterns imposes new challenges regarding their management. Pattern comparison, which aims at evaluating how close to each other two patterns are, is one of these challenges and has a variety of applications. In this chapter we investigate issues regarding the pattern comparison problem and present an overview of the work performed so far in this domain. Due to the heterogeneity of data mining patterns, we focus on the most popular pattern types, namely frequent itemsets and association rules, clusters and clusterings, and decision trees.


Author(s):  
Lixin Fu

In high-dimensional data sets, both the number of dimensions and the cardinalities of the dimensions are large and data is often very sparse, that is, most cubes are empty. For such large data sets, it is a well-known challenging problem to compute the aggregation of a measure over arbitrary combinations of dimensions efficiently. However, in real-world applications, users are usually not interested in all the sparse cubes, most of which are empty or contain only one or a few tuples. Instead, they focus more on the “big picture” information, that is, the highly aggregated data, where the “where clauses” of the SQL queries involve only a few dimensions. Although the input data set is sparse, this aggregate data is dense. The existing multi-pass, full-cube computation algorithms are prohibitively slow for this type of application involving very large input data sets. We propose a new dynamic data structure called Restricted Sparse Statistics Tree (RSST) and a novel cube evaluation algorithm, which are especially well suited for efficiently computing dense sub-cubes embedded in high-dimensional sparse data sets. RSST only computes the aggregations of non-empty cube cells where the number of non-star coordinates (i.e., the number of group by attributes) is restricted to be no more than a user-specified threshold. Our algorithms are scalable and I/O efficient. RSST is incrementally maintainable, which makes it suitable for data warehousing and the analysis of streaming data. We have compared our algorithms with state-of-the-art cube computation algorithms such as Dwarf and QCT in construction times, query response times, and data compression. Experiments demonstrate the excellent performance and good scalability of our approach.
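
The restriction on non-star coordinates can be illustrated with a naive dictionary-based sketch: in a single pass over the input, each row updates only those cells that keep at most a threshold number of dimensions and replace the rest with a star. This is not the RSST data structure or the chapter's algorithm, only an illustration of which cells are materialized; the function name, the SUM measure, and the sample rows are assumptions.

```python
from collections import defaultdict
from itertools import combinations

STAR = "*"

def restricted_cube(rows, num_dims, max_group_by):
    """Sum the measure over every cell with at most max_group_by non-star coordinates.

    Each row is (dim_1, ..., dim_n, measure). Only non-empty cells are stored.
    """
    cells = defaultdict(float)
    for *coords, measure in rows:
        for k in range(max_group_by + 1):
            for kept in combinations(range(num_dims), k):
                key = tuple(
                    coords[d] if d in kept else STAR for d in range(num_dims)
                )
                cells[key] += measure
    return dict(cells)

# Hypothetical fact rows: (region, product, year, sales).
facts = [
    ("north", "tv", "2006", 100.0),
    ("north", "radio", "2006", 50.0),
    ("south", "tv", "2007", 70.0),
]
cube = restricted_cube(facts, num_dims=3, max_group_by=1)
```

With a threshold of 1, the sketch stores the grand total and the one-dimensional group-bys, and never creates a cell such as `("north", "tv", "*")`.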


Author(s):  
Torben Pedersen ◽  
Jesper Thorhauge ◽  
Søren Jespersen

Enormous amounts of information about Web site user behavior are collected in Web server logs. However, this information is only useful if it can be queried and analyzed to provide high-level knowledge about user navigation patterns, a task that requires powerful techniques. This chapter presents a number of approaches that combine data warehousing and data mining techniques in order to analyze Web logs. After introducing the well-known click and session data warehouse (DW) schemas, the chapter presents the subsession schema, which allows fast queries on sequences of page visits. Then, the chapter presents the so-called “hybrid” technique, which combines DW Web log schemas with a data mining technique called Hypertext Probabilistic Grammars, thereby providing fast and flexible constraint-based Web log analysis. Finally, the chapter presents a “post-check enhanced” improvement of the hybrid technique.
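
As a rough illustration of why precomputed page sequences make such queries fast, the sketch below enumerates the subsessions of each session and counts how many sessions contain each one. It assumes that a subsession is a contiguous run of at least two page visits within a session, which is an interpretation rather than a definition quoted from the chapter; the function names and page names are hypothetical, and a real data warehouse would hold these rows in a fact table rather than in memory.

```python
from collections import Counter

def subsessions(session, min_length=2):
    """Yield every contiguous run of page visits with at least min_length pages."""
    n = len(session)
    for start in range(n):
        for end in range(start + min_length, n + 1):
            yield tuple(session[start:end])

def count_subsessions(sessions):
    """Count, for each subsession, the number of sessions that contain it."""
    counts = Counter()
    for session in sessions:
        # A set is used so a repeated run counts once per session.
        counts.update(set(subsessions(session)))
    return counts

# Hypothetical click data: one list of visited pages per session.
web_sessions = [
    ["home", "products", "cart"],
    ["home", "products", "contact"],
]
sequence_counts = count_subsessions(web_sessions)
```

A question such as "how many sessions went from home to products" then becomes a single lookup, `sequence_counts[("home", "products")]`.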

