Discovering Personalized Novel Knowledge from Text

Author(s):  
Yi-fang Brook Wu ◽  
Xin Chen

This chapter presents a methodology for personalized knowledge discovery from text. A long-standing problem in text mining is that it derives a large number of rules, many of which are already known to the user. Our proposed algorithm derives the user’s background knowledge from a set of documents provided by the user and exploits that knowledge during knowledge discovery from text. Keywords are extracted from the background documents and clustered into a concept hierarchy that captures the semantic usage of the keywords and their relationships in those documents. Target documents are retrieved by selecting documents that are relevant to the user’s background. Association rules are then discovered among noun phrases extracted from the target documents. The novelty of an association rule is defined as the semantic distance between the rule’s antecedent and consequent in the background knowledge. The experiment shows that our novelty measure performs better than support and confidence in identifying novel knowledge.
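The idea of scoring novelty as a semantic distance in a concept hierarchy can be illustrated with a minimal sketch. It assumes the hierarchy is a tree stored as a child-to-parent mapping and that distance is the number of edges on the path through the lowest common ancestor. The abstract does not specify the chapter's actual distance formula, and the concept names and the `semantic_distance` helper below are hypothetical.

```python
def path_to_root(node, parent):
    """Return the list of nodes from `node` up to the root of the hierarchy."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def semantic_distance(a, b, parent):
    """Count edges between concepts a and b via their lowest common ancestor.

    Assumption: edge counting is used as the distance; the chapter may define
    semantic distance differently.
    """
    depth_from_a = {n: i for i, n in enumerate(path_to_root(a, parent))}
    for steps_from_b, n in enumerate(path_to_root(b, parent)):
        if n in depth_from_a:
            return depth_from_a[n] + steps_from_b
    raise ValueError("concepts do not share an ancestor")


# Hypothetical concept hierarchy (child -> parent), for illustration only.
parent = {
    "apriori": "association rules",
    "fp-growth": "association rules",
    "association rules": "data mining",
    "indexing": "information retrieval",
    "data mining": "root",
    "information retrieval": "root",
}

# A rule linking close concepts scores low; one linking distant concepts scores high.
print(semantic_distance("apriori", "fp-growth", parent))  # 2
print(semantic_distance("apriori", "indexing", parent))   # 5
```

Under this reading, a rule whose antecedent and consequent sit far apart in the user's background hierarchy would be ranked as more novel than one connecting sibling concepts.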

2010 ◽  
Vol 108-111 ◽  
pp. 50-56 ◽  
Author(s):  
Liang Zhong Shen

Due to the popularity of knowledge discovery and data mining, in practice as well as among academic and corporate professionals, association rule mining is receiving increasing attention. Data mining technology is applied to analyze the data held in databases. This paper puts forward a new method that is suited to the design of distributed databases.


2013 ◽  
Vol 694-697 ◽  
pp. 2317-2321
Author(s):  
Hui Wang

The goal of knowledge discovery is to extract hidden or useful unknown knowledge from databases, while the objective of knowledge hiding is to prevent certain confidential data or knowledge from being extracted through data mining techniques. This paper focuses on hiding sensitive association rules. The side effects of existing data mining technology are investigated, the problem of sensitive association rule hiding is described formally, and representative sanitizing strategies for hiding sensitive association rules are discussed.


A huge amount of data is generated every minute on the internet. This data is of no use until we can extract useful information from it. Data mining is the process of extracting useful information or knowledge from this data so that it can be used for various purposes. Discovering association rules is one of the most important data mining tasks. Association rules take the form IF ... THEN. The left-hand part of the rule (the IF part) is called the antecedent and defines the condition; the right-hand part (the THEN part) is called the consequent and defines the result. In this paper, we present an overview and comparison of the Apriori, Apriori PT and Frequent Itemsets algorithms of the association component in the Tanagra tool. We analyzed performance in terms of execution time and memory used for different numbers of instances, support values and rule lengths on the Spambase dataset. The results show that when the support value is increased, Apriori PT takes less execution time and Apriori takes less memory. When the number of instances is reduced, Frequent Itemsets performs best in both memory and execution time. When the rule length is increased, the Apriori algorithm performs better than Apriori PT and Frequent Itemsets.
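To make the support threshold and the level-wise search concrete, here is a small sketch of support, confidence and an Apriori-style frequent itemset search on toy transactions. It is a textbook-style illustration, not the Tanagra implementation. The transactions and the names `support`, `confidence` and `frequent_itemsets` are invented for this example.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent) / support(antecedent)."""
    supp_a = support(antecedent, transactions)
    if supp_a == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), transactions) / supp_a


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search: grow itemsets one item at a time."""
    level = [frozenset([i]) for i in {i for t in transactions for i in t}]
    result = {}
    k = 1
    while level:
        # Keep only the candidates that meet the support threshold.
        kept = {c: s for c in level
                if (s := support(c, transactions)) >= min_support}
        result.update(kept)
        k += 1
        # Join frequent itemsets into size-k candidates.
        candidates = {a | b for a in kept for b in kept if len(a | b) == k}
        # Prune candidates that have an infrequent (k-1)-subset.
        level = [c for c in candidates
                 if all(frozenset(s) in kept for s in combinations(c, k - 1))]
    return result


# Toy transactions, for illustration only.
transactions = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
)]

print(support({"bread", "milk"}, transactions))            # 0.5
print(len(frequent_itemsets(transactions, 0.5)))           # 5
```

Raising `min_support` shrinks the candidate set at every level, which is one reason execution time and memory depend on the support value.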


2016 ◽  
pp. 713-732
Author(s):  
Asmae Dami ◽  
Mohamed Fakir ◽  
Belaid Bouikhalene

This chapter lies at the intersection of two research themes: Information Retrieval and Knowledge Discovery from texts (text mining). Its purpose is two-fold. First, it focuses on Information Retrieval (IR), which aims to implement a set of models and systems for selecting documents that satisfy a user's information needs, expressed as a query. An information retrieval system is composed mainly of two processes: representation and retrieval. The representation process is called indexing; it represents documents and queries by descriptors, or indexes, which reflect the contents of the documents. The retrieval process consists of comparing the document representations with the query representation. The second aim is to discover the relationships between the terms (keywords) that describe the documents in a document database. These correlations between terms are extracted using a text mining technique, namely association rules.
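The two processes described above, indexing and retrieval, can be sketched with a simple inverted index and a Boolean AND match. This is only one possible retrieval model, chosen here for brevity. The abstract does not say which model the authors use, and the documents and the `build_index` and `retrieve` names are hypothetical.

```python
from collections import defaultdict


def build_index(docs):
    """Indexing: map each term (descriptor) to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def retrieve(query, index):
    """Retrieval: return documents whose descriptors include every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    postings = [index.get(t, set()) for t in terms]
    return set.intersection(*postings)


# Toy document collection, for illustration only.
docs = {
    1: "association rules text mining",
    2: "information retrieval indexing",
    3: "text retrieval",
}
index = build_index(docs)

print(retrieve("text mining", index))  # {1}
print(sorted(retrieve("retrieval", index)))  # [2, 3]
```

The same per-document term sets could then serve as the transactions from which term-to-term association rules are mined.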


Author(s):  
Mihai Gabroveanu

In recent years the amount of data stored in databases has grown very quickly. Data mining, also known as knowledge discovery in databases, is the process of discovering potentially useful hidden knowledge or relations among data in large databases. An important task in the data mining process is the discovery of association rules. An association rule describes an interesting relationship between different attributes. There are different kinds of association rules: Boolean (crisp) association rules, quantitative association rules, fuzzy association rules, etc. In this chapter, we present the basic concepts of Boolean and fuzzy association rules and describe the methods used to discover them by presenting the most important algorithms.
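One common way to extend support and confidence from Boolean to fuzzy rules is to replace set membership with membership degrees in [0, 1] and combine them with a t-norm. The sketch below assumes the minimum t-norm and averages over records. Other t-norms, such as the product, are also used, and the chapter's exact definitions may differ. The records and fuzzy item labels are invented.

```python
def fuzzy_support(itemset, records):
    """Average, over records, of the minimum membership degree of the items.

    Assumption: the minimum t-norm combines membership degrees.
    """
    return sum(min(r.get(i, 0.0) for i in itemset) for r in records) / len(records)


def fuzzy_confidence(antecedent, consequent, records):
    """Fuzzy support of the whole rule divided by fuzzy support of the antecedent."""
    supp_a = fuzzy_support(antecedent, records)
    if supp_a == 0:
        return 0.0
    return fuzzy_support(list(antecedent) + list(consequent), records) / supp_a


# Each record maps a fuzzy item to a membership degree (hypothetical values).
records = [
    {"age=young": 0.8, "income=high": 0.3},
    {"age=young": 0.2, "income=high": 0.9},
    {"age=young": 1.0, "income=high": 0.6},
]

print(fuzzy_support(["age=young", "income=high"], records))
print(fuzzy_confidence(["age=young"], ["income=high"], records))
```

With degrees restricted to 0 and 1, these definitions reduce to the usual Boolean support and confidence.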


Author(s):  
Jianchao Han ◽  
Mohsen Beheshti

Mining association rules is an important task in data mining and knowledge discovery. Traditional association rule mining is built on transaction databases, which imposes some limitations. Two of these limitations are: 1) each transaction contains only binary items, meaning that an item either occurs in a transaction or does not; 2) only positive association rules are discovered, while negative associations are ignored. Mining fuzzy association rules has been proposed to address the first limitation, while mining algorithms for negative association rules have been developed to resolve the second. In this paper, we combine these two approaches to propose a novel approach for mining both positive and negative fuzzy association rules. An interestingness measure for both positive and negative fuzzy association rules is proposed, the algorithm for mining these rules is described, and an illustrative example is presented to demonstrate how the measure and the algorithm work.
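For the crisp (Boolean) case, the statistics of a negative rule A => not B follow directly from those of the positive rule, since supp(A and not B) = supp(A) - supp(A and B). The sketch below shows only that identity on toy transactions. It is not the paper's fuzzy interestingness measure, and the `negative_rule_stats` helper and the data are hypothetical.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    return sum(items <= t for t in transactions) / len(transactions)


def negative_rule_stats(antecedent, consequent, transactions):
    """Support and confidence of the negative rule: antecedent => NOT consequent."""
    supp_a = support(antecedent, transactions)
    supp_ab = support(set(antecedent) | set(consequent), transactions)
    supp_neg = supp_a - supp_ab  # transactions with A but without all of B
    conf_neg = supp_neg / supp_a if supp_a else 0.0
    return supp_neg, conf_neg


# Toy transactions, for illustration only.
transactions = [frozenset(t) for t in (
    {"tea", "sugar"},
    {"tea"},
    {"tea"},
    {"coffee", "sugar"},
)]

supp_neg, conf_neg = negative_rule_stats({"tea"}, {"sugar"}, transactions)
print(supp_neg, conf_neg)
```

Here the negative rule "tea => not sugar" is stronger than the positive rule "tea => sugar", the kind of association that a positive-only miner would miss.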


2014 ◽  
Vol 7 (4) ◽  
pp. 42-62
Author(s):  
Asmae Dami ◽  
Mohamed Fakir ◽  
Belaid Bouikhalene

This paper lies at the intersection of two research themes: Information Retrieval and Knowledge Discovery from texts (text mining). Its purpose is two-fold. First, it focuses on Information Retrieval (IR), which aims to implement a set of models and systems for selecting documents that satisfy a user's information needs, expressed as a query. An information retrieval system is composed mainly of two processes: representation and retrieval. The representation process is called indexing; it represents documents and queries by descriptors, or indexes, which reflect the contents of the documents. The retrieval process consists of comparing the document representations with the query representation. The second aim is to discover the relationships between the terms (keywords) that describe the documents in a document database. These correlations between terms are extracted using a text mining technique, namely association rules.


2016 ◽  
Vol 8 (1) ◽  
Author(s):  
Kezia Sumangkut ◽  
Arie S.M. Lumenta ◽  
Virginia Tulenan

Abstract --- The increasingly rapid growth of the modern retail market can be seen in shopping venues such as supermarkets, minimarkets, wholesalers, and the like, which are built to serve consumers' needs. Making use of the large volume of transaction data can yield interesting knowledge for setting policy and shelf-placement strategy. The proliferation of modern retail and of business competitors of this kind is tied to a shift in consumer thinking: consumers who once looked only for low prices now also pay attention to safety, cleanliness, comfort, friendly service, the completeness of the product range, and shelf placement. Therefore, in this study the author takes up the problem of Analyzing Shopping Patterns at the Daily Mart Supermarket to Determine Product Layout Using the FP-Growth Algorithm, as part of the service commonly provided at the Daily Mart supermarket, and to achieve this the author applies the KDD (Knowledge Discovery in Database) methodology. One of the data mining techniques used in this study is Association Rule in Java Weka, which is used to find pattern knowledge in consumer purchases. The result of this study is purchase-pattern (receipt) data with high confidence values, which serves as the basis for recommending a layout according to the items most frequently bought. Keywords --- Data Mining, Association Rules, Market Based Analysis, Java Weka

