Scalable Fuzzy Algorithms for Data Management and Analysis
Latest Publications

Total documents: 16 (last five years: 0)
H-index: 2 (last five years: 0)
Published by IGI Global
ISBN: 9781605668581, 9781605668598

Author(s):  
François Deliège ◽  
Torben Bach Pedersen

The emergence of music recommendation systems calls for the development of new data management technologies able to query vast music collections. In this chapter, the authors present a music warehouse prototype able to perform efficient nearest neighbor searches in an arbitrary song similarity space. Using fuzzy song sets, the music warehouse offers a practical solution to three concrete musical data management scenarios: user musical preferences, user feedback, and song similarities. The authors investigate three practical approaches to tackle the storage issues of fuzzy song sets: tables, arrays, and compressed bitmaps. They confront theoretical estimates with practical implementation results and show that, from a storage point of view, arrays and compressed bitmaps are both effective data structure solutions. With respect to speed, the authors show that operations on compressed bitmaps offer a significant gain in performance for fuzzy song sets comprising a large number of songs. Finally, the authors argue that the presented results are not limited to music recommendation systems but can be applied to other domains.
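To illustrate why bitmaps suit fuzzy set operations, here is a minimal sketch of one well-known encoding: membership grades quantized into a few levels, with one bitmap per level cut. This is an illustrative scheme, not necessarily the chapter's exact implementation, and it omits compression; fuzzy union (max) and intersection (min) reduce to bitwise OR and AND per level.

```python
# Sketch: a fuzzy song set stored as level-cut bitmaps (illustrative encoding).
# Bitmap l holds the songs whose quantized membership grade exceeds level l,
# so the bitmaps are nested and a song's grade is the number of bits set for it.

LEVELS = 4  # quantization granularity for membership grades

def to_bitmaps(fuzzy_set):
    """fuzzy_set: dict song_id -> membership grade in [0, 1]."""
    maps = [0] * LEVELS
    for song_id, mu in fuzzy_set.items():
        grade = min(LEVELS, round(mu * LEVELS))
        for level in range(grade):
            maps[level] |= 1 << song_id
    return maps

def union(a, b):          # fuzzy max, level by level
    return [x | y for x, y in zip(a, b)]

def intersection(a, b):   # fuzzy min, level by level
    return [x & y for x, y in zip(a, b)]

def grade(maps, song_id):
    """Recover the (quantized) membership grade of a song."""
    return sum((maps[level] >> song_id) & 1 for level in range(LEVELS)) / LEVELS
```

With this nesting, OR/AND on the bitmaps implement exactly the max/min of the quantized grades, which is what makes bitwise operations on compressed bitmaps fast for large song sets.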


Author(s):  
Christophe Marsala ◽  
Marcin Detyniecki

In this chapter, the authors focus on the use of forests of fuzzy decision trees (FFDT) in a video mining application. They discuss how to learn from large-scale video data sets and how to use the trained FFDTs to detect concepts in a high number of video shots. Moreover, the authors study the effect on performance of the size of the forest, and of the use of fuzzy logic during the classification process. The experiments are performed on a well-known non-video dataset and on a real TV-quality video benchmark.
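A small sketch may clarify how a fuzzy decision tree differs from a crisp one during classification: a sample can descend both branches of a soft split with partial membership, and a forest averages the resulting class degrees. The tree encoding and the soft-threshold membership function below are illustrative assumptions, not the chapter's implementation.

```python
# Sketch: evaluating a (tiny) forest of fuzzy decision trees.
# A node is either ('leaf', {class: degree}) or
# ('split', feature, threshold, width, low_child, high_child).

def mu_high(x, threshold, width):
    """Degree to which x is 'high' w.r.t. a soft threshold (piecewise linear)."""
    if x <= threshold - width:
        return 0.0
    if x >= threshold + width:
        return 1.0
    return (x - (threshold - width)) / (2 * width)

def classify(node, sample):
    """Propagate the sample down both branches, weighted by membership."""
    if node[0] == 'leaf':
        return node[1]
    _, feature, threshold, width, low, high = node
    m_high = mu_high(sample[feature], threshold, width)
    out = {}
    for child, m in ((low, 1.0 - m_high), (high, m_high)):
        if m > 0:
            for cls, degree in classify(child, sample).items():
                out[cls] = out.get(cls, 0.0) + m * degree
    return out

def forest_classify(trees, sample):
    """Average the class degrees over all trees in the forest."""
    agg = {}
    for tree in trees:
        for cls, degree in classify(tree, sample).items():
            agg[cls] = agg.get(cls, 0.0) + degree / len(trees)
    return agg
```

A sample near a split threshold thus receives graded class degrees rather than a hard decision, which is the behavior whose effect on performance the chapter studies.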


Author(s):  
Christian Borgelt ◽  
Xiaomeng Wang

In this chapter, the authors introduce SaM, a split-and-merge algorithm for frequent item set mining. Its core advantages are its extremely simple data structure and processing scheme, which not only make it very easy to implement, but also fairly easy to execute on external storage, thus rendering it a highly useful method if the data to mine cannot be loaded into main memory. Furthermore, the authors present extensions of this algorithm, which allow for approximate or “fuzzy” frequent item set mining in the sense that missing items can be inserted into transactions with a user-specified penalty. Finally, they present experiments comparing their new method with classical frequent item set mining algorithms (like Apriori, Eclat and FP-growth) and with the approximate frequent item set mining version of RElim (an algorithm the authors proposed in an earlier paper and improved in the meantime).
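The split-and-merge scheme can be sketched compactly: the database is an array of (count, transaction) pairs with transactions as sorted tuples; each step splits off the transactions led by the smallest item (summing their counts gives that item's support), recurses on the stripped suffixes, then merges the suffixes back. The sketch below follows this scheme in memory only; it is a simplified exact variant and does not include the fuzzy extension with insertion penalties described in the chapter.

```python
# Sketch: a simplified in-memory split-and-merge frequent item set miner.
from collections import Counter

def make_db(transactions):
    """Canonical form: each transaction a sorted tuple; duplicates merged with counts."""
    counts = Counter(tuple(sorted(set(t))) for t in transactions if t)
    return sorted(((c, t) for t, c in counts.items()), key=lambda p: p[1])

def sam(db, min_supp, prefix=(), out=None):
    if out is None:
        out = {}
    while db:
        item = db[0][1][0]                  # smallest remaining item
        split, rest = [], []
        for cnt, items in db:               # split step
            if items[0] == item:
                split.append((cnt, items[1:]))
            else:
                rest.append((cnt, items))
        supp = sum(c for c, _ in split)     # support of prefix + (item,)
        if supp >= min_supp:
            out[prefix + (item,)] = supp
            sam([p for p in split if p[1]], min_supp, prefix + (item,), out)
        merged = Counter()                  # merge step: fold suffixes back in
        for cnt, items in rest + split:
            if items:
                merged[items] += cnt
        db = sorted(((c, t) for t, c in merged.items()), key=lambda p: p[1])
    return out
```

Because the only operations are a linear split and a sorted merge over an array, the same scheme maps naturally onto external storage, which is the property the chapter exploits.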


Author(s):  
Ronald R. Yager

The ordered weighted averaging (OWA) operator is introduced and the author discusses how it can provide a basis for generating summarizing statistics over large data sets. The author further notes how different forms of OWA operators, and hence different summarizing statistics, can be induced using weight-generating functions. The author shows how these weight-generating functions can provide a vehicle with which a data analyst can express desired summarizing statistics. Modern data analysis requires the use of more human-focused summarizing statistics than those classically used. The author’s goal here is to develop ideas that enable a human-focused approach to summarizing statistics. Using these ideas, one can envision a computer-aided construction of the weight-generating functions based upon a combination of graphical and linguistic specifications provided by a data analyst describing his desired summarization.
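The mechanics are compact enough to show directly: an OWA operator applies its weights to the values sorted in descending order, and a weight-generating function (a RIM quantifier Q) induces the weights as w_i = Q(i/n) − Q((i−1)/n). The sketch below shows how different quantifiers yield different summarizing statistics, from the mean to the maximum.

```python
# Sketch: OWA aggregation with weights induced by a RIM quantifier.

def rim_weights(quantifier, n):
    """w_i = Q(i/n) - Q((i-1)/n) for a regular increasing monotone quantifier Q."""
    return [quantifier(i / n) - quantifier((i - 1) / n) for i in range(1, n + 1)]

def owa(values, weights):
    """OWA: the weights are applied to the values sorted in descending order."""
    return sum(w * v for w, v in zip(weights, sorted(values, reverse=True)))
```

For example, Q(x) = x yields uniform weights (the arithmetic mean), while a quantifier that jumps to 1 immediately puts all weight on the largest value (the maximum); intermediate shapes give the softer, more human-focused statistics the chapter discusses.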


Author(s):  
Janusz Kacprzyk ◽  
Slawomir Zadrozny

The authors discuss aspects of the scalability of data mining tools understood differently from the usual question of whether a tool retains its intended functionality as the problem size increases. They introduce a new concept of cognitive (perceptual) scalability, understood as whether, as the problem size increases, the method remains fully functional in the sense of being able to provide intuitively appealing and comprehensible results to the human user. The authors argue that the use of natural language in linguistic data summaries provides high cognitive (perceptual) scalability because natural language is the only fully natural means of human communication and provides a common language for individuals and groups of different backgrounds, skills, and knowledge. They show that the use of Zadeh’s protoforms as general representations of linguistic data summaries, proposed by Kacprzyk and Zadrozny (2002; 2005a; 2005b), amplifies this advantage, leading to an ultimate cognitive (perceptual) scalability.
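A linguistic data summary instantiates a protoform such as "Q y's are S" (e.g., "most employees are young"), and its truth degree can be computed with Zadeh's calculus of linguistically quantified propositions: apply the fuzzy quantifier Q to the mean membership of the records in S. The predicates below (a "young" membership function and a piecewise-linear "most") are illustrative choices, not the authors' definitions.

```python
# Sketch: truth degree of the protoform "Q y's are S".

def truth_of_summary(records, mu_s, mu_q):
    """Zadeh's calculus: Q applied to the mean membership of the records in S."""
    proportion = sum(mu_s(r) for r in records) / len(records)
    return mu_q(proportion)

def mu_young(age):                     # S: "young" (illustrative)
    return max(0.0, min(1.0, (40 - age) / 10))

def mu_most(p):                        # Q: "most" (illustrative piecewise linear)
    return max(0.0, min(1.0, 2 * p - 0.6))
```

Because the output is a single short natural-language sentence with a truth degree, the result stays comprehensible no matter how many records were summarized, which is the cognitive-scalability point of the chapter.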


Author(s):  
Gloria Bordogna ◽  
Alessandro Campi ◽  
Stefania Ronchi ◽  
Giuseppe Psaila

In this chapter the authors consider the problem of defining a flexible approach for exploring huge amounts of results retrieved by several Internet search services (like search engines). The goal is to offer users a way to discover relevant hidden relationships between documents. The proposal is motivated by the observation that visualization paradigms, based on either the ranked list or clustered results, do not allow users to fully appreciate and understand the retrieved contents. In the case of long ranked lists, the user generally analyzes only the first few pages. On the other hand, when the documents are clustered, the user has no means of understanding their contents other than looking at the cluster labels. When the same query is submitted to distinct search services, they may produce partially overlapping clustered results, where clusters identified by distinct labels collect some common documents. Moreover, clusters with similar labels, but containing distinct documents, may be produced as well. In such a situation, it may be useful to compare, combine and rank the cluster contents to single out relevant documents. In this chapter the authors present a novel manipulation language, in which several operators (inspired by relational algebra) and distinct ranking methods can be exploited to analyze the clusters’ contents. New clusters can be generated and ranked based on distinct criteria, by combining (i.e., overlapping, refining and intersecting) clusters in a set-oriented fashion. Specifically, the chapter is focused on the ranking methods defined for each operator of the language.
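As a rough illustration of one such set-oriented operator, the sketch below intersects two clustered result sets (each a mapping from cluster label to a set of document ids) and ranks the new clusters by the overlap of their parent clusters. The Jaccard-based ranking is an illustrative choice, not one of the chapter's actual ranking methods.

```python
# Sketch: an intersection-style operator over two clustered result sets,
# with new clusters ranked by parent-cluster overlap (Jaccard, illustrative).

def intersect_and_rank(result_a, result_b):
    """result_a, result_b: dicts mapping cluster label -> set of document ids.
    Returns [(combined_label, common_docs, score)] sorted by descending score."""
    combined = []
    for label_a, docs_a in result_a.items():
        for label_b, docs_b in result_b.items():
            common = docs_a & docs_b
            if common:
                score = len(common) / len(docs_a | docs_b)
                combined.append(((label_a, label_b), common, score))
    return sorted(combined, key=lambda c: -c[2])
```

Operators like refinement or overlap would differ only in how the document sets are combined and how each new cluster is scored, which is why the chapter pairs each operator with its own ranking method.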


Author(s):  
Giorgos Stoilos ◽  
Jeff Z. Pan ◽  
Giorgos Stamou

In the last couple of years it has become widely acknowledged that uncertainty and fuzzy extensions to ontology languages, like description logics (DLs) and OWL, could play a significant role in the improvement of many Semantic Web (SW) applications like matching, merging and ranking. Unfortunately, existing fuzzy reasoners focus on very expressive fuzzy ontology languages, like OWL, and are thus not able to handle the scale of data that the Web provides. For those reasons much research effort has been focused on providing fuzzy extensions and algorithms for tractable ontology languages. In this chapter, the authors present some recent results about reasoning and fuzzy query answering over tractable/polynomial fuzzy ontology languages, namely Fuzzy DL-Lite and Fuzzy EL+. Fuzzy DL-Lite provides scalable algorithms for very expressive (extended) conjunctive queries, while Fuzzy EL+ provides polynomial algorithms for knowledge classification. For the Fuzzy DL-Lite case the authors also report on an implementation in the ONTOSEARCH2 system and preliminary, but encouraging, benchmarking results.
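The flavor of fuzzy conjunctive query answering can be conveyed with a toy evaluator: fuzzy assertions carry degrees, and under Gödel semantics the degree of an answer is the minimum of its atoms' degrees, with answers ranked by degree. This brute-force sketch enumerates all bindings and ignores everything that makes Fuzzy DL-Lite scalable (TBox rewriting, database-backed evaluation); the hotel example is invented for illustration.

```python
# Sketch: ranked answers to a fuzzy conjunctive query over fuzzy assertions.
from itertools import product

def fuzzy_answers(abox, query, threshold=0.0):
    """abox: list of (predicate, args, degree); query: list of (predicate, args)
    where arguments starting with '?' are variables. Degree of an answer is the
    minimum (Goedel t-norm) over its atoms; answers are sorted by degree."""
    facts = {(p, args): d for p, args, d in abox}
    individuals = sorted({c for _, args, _ in abox for c in args})
    variables = sorted({t for _, args in query for t in args if t.startswith('?')})
    answers = []
    for values in product(individuals, repeat=len(variables)):
        env = dict(zip(variables, values))
        degree = min(facts.get((p, tuple(env.get(t, t) for t in args)), 0.0)
                     for p, args in query)
        if degree > threshold:
            answers.append((env, degree))
    return sorted(answers, key=lambda a: -a[1])
```

The tractable languages in the chapter achieve the same semantics without enumeration, which is what makes them usable at Web scale.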


Author(s):  
Nicolás Marín ◽  
Carlos Molina ◽  
Daniel Sánchez ◽  
M. Amparo Vila

The use of online analytical processing (OLAP) systems as data sources for data mining techniques has been widely studied and has resulted in what is known as online analytical mining (OLAM). As a result of both the use of OLAP technology in new fields of knowledge and the merging of data from different sources, it has become necessary for models to support imprecision, and hence for OLAM methods that are able to deal with this imprecision. Association rules are one of the most used data mining techniques. Several proposals enable the extraction of association rules on DataCubes, but few of these deal with imprecision in the process, and those that do produce complex rule sets. In this chapter the authors present a method that manages the imprecision and reduces the complexity. They study the influence of the use of fuzzy logic on problems of different sizes, comparing the results with a crisp approach.
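A standard way to bring imprecision into association rule mining, shown here as a sketch rather than the chapter's specific method, is to let each row hold membership degrees for fuzzy items and to define support and confidence through a t-norm (minimum below): a row supports an itemset to the degree of its least-satisfied member.

```python
# Sketch: fuzzy support and confidence with the minimum t-norm.

def fuzzy_support(rows, itemset):
    """rows: list of dicts mapping fuzzy item -> membership degree in [0, 1]."""
    return sum(min(row.get(i, 0.0) for i in itemset) for row in rows) / len(rows)

def fuzzy_confidence(rows, antecedent, consequent):
    """Confidence of the fuzzy rule antecedent -> consequent."""
    return fuzzy_support(rows, antecedent + consequent) / fuzzy_support(rows, antecedent)
```

With crisp 0/1 memberships these definitions collapse to the classical support and confidence, which is what makes a direct fuzzy-versus-crisp comparison, as in the chapter's experiments, meaningful.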


Author(s):  
Lawrence O. Hall ◽  
Dmitry B. Goldgof ◽  
Juana Canul-Reich ◽  
Prodip Hore ◽  
Weijian Cheng ◽  
...  

This chapter examines how to scale algorithms which learn fuzzy models from the increasing amounts of labeled or unlabeled data that are becoming available. Large data repositories are increasingly available, such as records of network transmissions, customer transactions, medical data, and so on. A question arises about how to utilize the data effectively for both supervised and unsupervised fuzzy learning. This chapter focuses on ensemble approaches to learning fuzzy models for large data sets which may be labeled or unlabeled. Further, the authors examine ways of scaling fuzzy clustering to extremely large data sets. Examples from existing data repositories, some quite large, are given to show that the approaches discussed here are effective.
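Fuzzy c-means (FCM) is the usual starting point for scalable fuzzy clustering: it alternates a membership update u_ik = 1 / Σ_j (d_ik / d_jk)^(2/(m−1)) with a membership-weighted center update. The sketch below is a plain in-memory FCM; one simple scaling strategy, illustrative rather than the chapter's specific algorithms, is to run it on a random sample and then extend the memberships to the full data set.

```python
# Sketch: basic fuzzy c-means in pure Python.
import random

def fcm(points, c, m=2.0, iters=40, seed=0, init=None):
    """points: list of tuples; returns (centers, memberships)."""
    rng = random.Random(seed)
    centers = [list(p) for p in (init if init is not None else rng.sample(points, c))]
    dim = len(points[0])
    for _ in range(iters):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1)).
        U = []
        for x in points:
            d = [max(sum((xk - vk) ** 2 for xk, vk in zip(x, v)) ** 0.5, 1e-12)
                 for v in centers]
            U.append([1.0 / sum((d[i] / d[j]) ** (2 / (m - 1)) for j in range(c))
                      for i in range(c)])
        # Center update: weighted mean with weights u_ik^m.
        for i in range(c):
            den = sum(row[i] ** m for row in U)
            centers[i] = [sum(row[i] ** m * x[k] for x, row in zip(points, U)) / den
                          for k in range(dim)]
    return centers, U
```

Each iteration touches every point, so the cost is linear in the data size per pass; sampling, single-pass chunking, or ensembles of runs on partitions, as discussed in the chapter, all aim to cut the number of such full passes.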


Author(s):  
Koldo Basterretxea ◽  
Inés del Campo

This chapter describes two decades of evolution of electronic hardware for fuzzy computing, and discusses the new trends and challenges that are currently being faced in this field. First, the authors analyze the two main design approaches followed from the publication of the first fuzzy chip designs until the consolidation of reconfigurable hardware: the digital approach and the analog approach. Second, the evolution of fuzzy hardware based on reconfigurable devices, from traditional field programmable gate arrays to complex system-on-programmable-chip solutions, is described and its relationship with the scalability issue is explained. The reconfigurable approach is completed by analyzing a cutting-edge design methodology known as dynamic partial reconfiguration and by reviewing some evolvable fuzzy hardware designs. Lastly, regarding fuzzy data-mining processing, the main proposals to speed up data-mining workloads are presented: multiprocessor architectures, reconfigurable hardware, and high-performance reconfigurable computing.

