Using Prior Knowledge in Data Mining

Author(s):  
Francesca A. Lisi

One of the most important and challenging problems in current Data Mining research is the definition of the prior knowledge, which can originate from the process or from the domain. This contextual information may help select the appropriate information, features or techniques, reduce the space of hypotheses, represent the output in a more comprehensible way and improve the process. An ontological foundation is a precondition for efficient automated usage of such information (Chandrasekaran et al., 1999). An ontology is a formal explicit specification of a shared conceptualization for a domain of interest (Gruber, 1993). Among other things, this definition emphasizes the fact that an ontology has to be specified in a language that comes with a formal semantics. Due to this formalization, ontologies provide the machine-interpretable meaning of concepts and relations that is expected when using a semantic-based approach (Staab & Studer, 2004). In its most prevalent use in Artificial Intelligence (AI), an ontology refers to an engineering artifact (more precisely, one produced according to the principles of Ontological Engineering (Gómez-Pérez et al., 2004)), constituted by a specific vocabulary used to describe a certain reality, plus a set of explicit assumptions regarding the intended meaning of the vocabulary words. This set of assumptions usually has the form of a First-Order Logic (FOL) theory, where vocabulary words appear as unary or binary predicate names, respectively called concepts and relations. In the simplest case, an ontology describes a hierarchy of concepts related by subsumption relationships; in more sophisticated cases, suitable axioms are added in order to express other relationships between concepts and to constrain their intended interpretation. Ontologies can play several roles in Data Mining (Nigro et al., 2007). In this chapter we investigate the use of ontologies as prior knowledge in Data Mining.
As an illustrative case throughout the chapter, we choose the task of Frequent Pattern Discovery, since it is the most representative product of the cross-fertilization among Databases, Machine Learning and Statistics that has given rise to Data Mining. Indeed, it is central to an entire class of descriptive tasks in Data Mining, among which Association Rule Mining (Agrawal et al., 1993; Agrawal & Srikant, 1994) is the most popular. A pattern is considered as an intensional description (expressed in a given language L) of a subset of a data set r. The support of a pattern is the relative frequency of the pattern within r and is computed with the evaluation function supp. The task of Frequent Pattern Discovery aims at the extraction of all frequent patterns, i.e. all patterns whose support exceeds a user-defined threshold of minimum support. The blueprint of most algorithms for Frequent Pattern Discovery is the levelwise search (Mannila & Toivonen, 1997). It is based on the following assumption: if a generality order ⪰ for the language L of patterns can be found such that ⪰ is monotonic w.r.t. supp, then the resulting space (L, ⪰) can be searched breadth-first by starting from the most general pattern in L and alternating candidate generation and candidate evaluation phases.
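The levelwise search described above can be illustrated with a minimal sketch for the simplest pattern language, plain itemsets ordered by set inclusion. This is an Apriori-style illustration under stated assumptions, not the chapter's ontology-based method: the function names `support` and `levelwise_frequent_itemsets` are hypothetical, and the threshold test uses "at least" where the text says "exceeds".

```python
from itertools import combinations


def support(pattern, transactions):
    """Relative frequency of the transactions that contain every item of `pattern`."""
    return sum(pattern <= t for t in transactions) / len(transactions)


def levelwise_frequent_itemsets(transactions, min_support):
    """Breadth-first search from general (small) to specific (large) itemsets.

    Assumption: patterns are itemsets and generality is set inclusion, so
    support can only drop when a pattern is specialized (monotonicity).
    """
    transactions = [frozenset(t) for t in transactions]
    items = sorted({item for t in transactions for item in t})
    level = [frozenset([item]) for item in items]  # most general non-empty patterns
    frequent = {}
    while level:
        # Candidate evaluation: keep the patterns that meet the threshold.
        survivors = {}
        for candidate in level:
            s = support(candidate, transactions)
            if s >= min_support:  # assumption: "at least" rather than strictly above
                survivors[candidate] = s
        frequent.update(survivors)
        # Candidate generation: join survivors into patterns one item larger,
        # then prune any candidate that has an infrequent sub-pattern.
        k = len(level[0]) + 1
        joined = {a | b for a in survivors for b in survivors if len(a | b) == k}
        level = [
            c for c in joined
            if all(frozenset(sub) in survivors for sub in combinations(c, k - 1))
        ]
    return frequent
```

The pruning step is where monotonicity pays off: a candidate is never evaluated against the data if one of its more general sub-patterns has already been found infrequent.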

2014 ◽  
Vol 926-930 ◽  
pp. 2786-2789
Author(s):  
Jing Zhu Li ◽  
Qian Li ◽  
Tai Yu Liu ◽  
Wei Hong Niu

Data mining is a multidisciplinary field that took shape gradually over the 20th century. This paper reviews data mining modeling, algorithms, applications and software tools. It presents the definition of data mining, the scope and characteristics of data sets, and the various practical situations in which data mining is applied; summarizes the basic steps and processes of data mining in practical applications; discusses data mining tasks and modeling issues across a variety of applications; lists the algorithms that are currently popular in the field and briefly analyzes the issues to consider in algorithm design; gives an overview of the current use of data mining algorithms in a number of areas; describes in more detail the performance and developers of current data mining software tools; and, finally, considers the prospects and directions for the development of data mining.


Author(s):  
Benito van der Zander

Pattern matching in a broad sense is a common feature of modern functional programming languages, answering the question of whether one complex structured object has the same form as another complex structured object, for some definition of “the same”. In XQuery, path expressions and switch and typeswitch statements are often described as performing pattern matching, but these are merely impoverished flavors of matching when compared to the real thing. We describe a syntax for general pattern matching based on regular expressions for XML/HTML/JSONiq trees, how these patterns are matched against input data, and how this pattern matching can be integrated into the syntax and semantics of the XQuery language. At the end we summarize real-world experience using it for large-scale data mining of library web catalogs.


Author(s):  
Héctor Oscar Nigro ◽  
Sandra Elizabeth González Císaro

Nowadays, one of the most important and challenging problems in the Knowledge Discovery in Databases (KDD) process, or Data Mining, is the definition of the prior knowledge, which can originate either from the process or from the domain. This contextual information may help select the appropriate information, features or techniques, reduce the space of hypotheses, represent the output in a more comprehensible way and improve the whole process.


Author(s):  
Junsong Yuan

One of the central themes in data mining research is the discovery of frequent and repetitive patterns in data. The success of frequent pattern mining (Han, Cheng, Xin, & Yan, 2007) in structured data (e.g., transaction data) and semi-structured data (e.g., text) has recently aroused our curiosity in applying it to multimedia data. Given a collection of unlabeled images, videos or audio recordings, the objective of repetitive pattern discovery is to find similar patterns, if there are any, that appear repetitively in the whole dataset. Discovering such repetitive patterns in multimedia data brings interesting new problems to data mining research. It also provides opportunities for solving traditional tasks in multimedia research, including visual similarity matching (Boiman & Irani, 2006), visual object retrieval (Sivic & Zisserman, 2004; Philbin, Chum, Isard, Sivic & Zisserman, 2007), categorization (Grauman & Darrell, 2006), recognition (Quack, Ferrari, Leibe & Gool, 2007; Amores, Sebe, & Radeva, 2007), as well as audio object search and indexing (Herley, 2006).
• In image mining, frequent or repetitive patterns can be similar image texture regions, a specific visual object, or a category of objects. These repetitive patterns appear in a sub-collection of the images (Hong & Huang, 2004; Tan & Ngo, 2005; Yuan & Wu, 2007; Yuan, Wu & Yang, 2007; Yuan, Li, Fu, Wu & Huang, 2007).
• In video mining, repetitive patterns can be repetitive short video clips (e.g. commercials) or temporal visual events that happen frequently in the given videos (Wang, Liu & Yang, 2005; Xie, Kennedy, Chang, Divakaran, Sun, & Lin, 2004; Yang, Xue, & Tian, 2005; Yuan, Wang, Meng, Wu & Li, 2007).
• In audio mining, repetitive patterns can be repeated structures appearing in music (Lartillot, 2005) or broadcast audio (Herley, 2006).
Repetitive pattern discovery is a challenging problem because we do not have any a priori knowledge of the possible repetitive patterns. For example, it is generally unknown in advance (i) what the repetitive patterns look like (e.g. shape and appearance of the repetitive object/contents of the repetitive clip); (ii) where they are (location) and how large they are (scale of the repetitive object or length of the repetitive clip); (iii) how many repetitive patterns there are in total and how many instances each repetitive pattern has; or even (iv) whether such repetitive patterns exist at all. An exhaustive solution needs to search through all possible pattern sizes and locations, and is thus extremely computationally demanding, if not impossible.


Author(s):  
Xiong Wang

Data management, in general terms, refers to activities that involve the acquisition, storage, and retrieval of data. Traditionally, information retrieval is facilitated through queries, such as exact search, nearest neighbor search, range search, etc. In the last decade, data mining has emerged as one of the most dynamic fields at the frontier of data management. Data mining refers to the process of extracting useful knowledge from data. Popular data mining techniques include association rule discovery, frequent pattern discovery, classification, and clustering. In this chapter, we discuss data management for a specific type of data, i.e., three-dimensional structures. While research on text and multimedia data management has attracted considerable attention and substantial progress has been made, data management in three-dimensional structures is still in its infancy (Castelli & Bergman, 2001; Paquet & Rioux, 1999). Data management in 3D structures raises several interesting problems:
1. Similarity search
2. Pattern discovery
3. Classification
4. Clustering


2010 ◽  
Vol 10 (3) ◽  
pp. 251-289 ◽  
Author(s):  
JOANNA JÓZEFOWSKA ◽  
AGNIESZKA ŁAWRYNOWICZ ◽  
TOMASZ ŁUKASZEWSKI

We propose a new method for mining frequent patterns in a language that combines both Semantic Web ontologies and rules. In particular, we consider the setting of using a language that combines description logics (DLs) with DL-safe rules. This setting is important for the practical application of data mining to the Semantic Web. We focus on the relation of the semantics of the representation formalism to the task of frequent pattern discovery, and for the core of our method, we propose an algorithm that exploits the semantics of the combined knowledge base. We have developed a proof-of-concept data mining implementation of this method. Using it, we have empirically shown that using the combined knowledge base to perform semantic tests can make data mining faster by pruning useless candidate patterns before their evaluation. We have also shown that the quality of the set of patterns produced may be improved: the patterns are more compact, and there are fewer patterns. We conclude that exploiting the semantics of a chosen representation formalism is key to the design and application of (onto-)relational frequent pattern discovery methods.


2021 ◽  
Vol 11 (4) ◽  
pp. 1715
Author(s):  
Jieh-Ren Chang ◽  
You-Shyang Chen ◽  
Chien-Ku Lin ◽  
Ming-Fu Cheng

Storage devices in the computer industry have gradually transformed from the hard disk drive (HDD) to the solid-state drive (SSD), of which the key component is error correction in not-and (NAND) flash memory. While NAND flash memory is still under development, it remains limited by the “program and erase” cycle (PE cycle). Therefore, the improvement of quality and the formulation of a customer service strategy are topics worthy of discussion at this stage. This study takes computer company A as its research object and collects more than 8000 SSD error records from its customers. These records are analyzed with data mining, using the frequent pattern growth (FP-Growth) association rule algorithm, to identify association rules among errors, with a minimum support of 90 and a minimum confidence of 10 set as the thresholds. Based on the rules, three improvement strategies for production control are suggested: (1) use of the association rules to speed up the judgment of the SSD error condition by customer service personnel, (2) a quality strategy, and (3) a customer service strategy.
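To make the two thresholds concrete, the following minimal sketch computes the support and confidence of a single association rule. It is not the study's implementation and does not perform FP-Growth itself; the function name `rule_metrics` and the error labels in the usage example are invented for illustration only.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent.

    support    = share of records containing both sides of the rule
    confidence = share of records with the antecedent that also hold the consequent
    """
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    both = antecedent | consequent
    n_antecedent = sum(antecedent <= set(t) for t in transactions)
    n_both = sum(both <= set(t) for t in transactions)
    support = n_both / len(transactions)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Hypothetical error records (labels invented for this example only).
records = [
    {"bad_block", "read_fail"},
    {"bad_block", "read_fail"},
    {"bad_block"},
    {"timeout"},
]
sup, conf = rule_metrics(records, {"bad_block"}, {"read_fail"})
```

A rule is kept only when both values reach their respective thresholds; an algorithm such as FP-Growth serves to find the frequent itemsets efficiently before rules are derived from them.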


2018 ◽  
Vol E101.D (3) ◽  
pp. 593-601
Author(s):  
Shouhei FUKUNAGA ◽  
Yoshimasa TAKABATAKE ◽  
Tomohiro I ◽  
Hiroshi SAKAMOTO
