Mining Repetitive Patterns in Multimedia Data

Author(s):  
Junsong Yuan

One of the central themes in data mining research is the discovery of frequent and repetitive patterns in data. The success of frequent pattern mining (Han, Cheng, Xin, & Yan, 2007) in structured data (e.g., transaction data) and semi-structured data (e.g., text) has recently prompted interest in applying it to multimedia data. Given a collection of unlabeled images, videos, or audio recordings, the objective of repetitive pattern discovery is to find similar patterns, if any exist, that appear repeatedly across the whole dataset. Discovering such repetitive patterns in multimedia data raises interesting new problems in data mining research. It also provides opportunities for solving traditional tasks in multimedia research, including visual similarity matching (Boiman & Irani, 2006), visual object retrieval (Sivic & Zisserman, 2004; Philbin, Chum, Isard, Sivic & Zisserman, 2007), categorization (Grauman & Darrell, 2006), recognition (Quack, Ferrari, Leibe & Gool, 2007; Amores, Sebe, & Radeva, 2007), as well as audio object search and indexing (Herley, 2006).
• In image mining, frequent or repetitive patterns can be similar image texture regions, a specific visual object, or a category of objects. These repetitive patterns appear in a sub-collection of the images (Hong & Huang, 2004; Tan & Ngo, 2005; Yuan & Wu, 2007; Yuan, Wu & Yang, 2007; Yuan, Li, Fu, Wu & Huang, 2007).
• In video mining, repetitive patterns can be repetitive short video clips (e.g., commercials) or temporal visual events that happen frequently in the given videos (Wang, Liu & Yang, 2005; Xie, Kennedy, Chang, Divakaran, Sun, & Lin, 2004; Yang, Xue, & Tian, 2005; Yuan, Wang, Meng, Wu & Li, 2007).
• In audio mining, repetitive patterns can be repeated structures appearing in music (Lartillot, 2005) or broadcast audio (Herley, 2006).
Repetitive pattern discovery is a challenging problem because we do not have any a priori knowledge of the possible repetitive patterns. For example, it is generally unknown in advance (i) what the repetitive patterns look like (e.g., the shape and appearance of the repetitive object, or the contents of the repetitive clip); (ii) where they are located and how large they are (the scale of the repetitive object or the length of the repetitive clip); (iii) how many repetitive patterns there are in total and how many instances each one has; or even (iv) whether such repetitive patterns exist at all. An exhaustive solution needs to search through all possible pattern sizes and locations and is therefore extremely computationally demanding, if not infeasible.
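To give a rough sense of why exhaustive search is so costly, the following minimal sketch counts the candidate regions in a single image and the candidate clips in a single video. It assumes candidates are axis-aligned rectangles on a pixel grid and contiguous frame ranges; the function names are hypothetical and are not taken from the cited work.

```python
def count_subwindows(width: int, height: int) -> int:
    """Count axis-aligned rectangular sub-windows in a width x height pixel grid.

    Assumption: a candidate pattern is any rectangle with integer pixel bounds.
    """
    # A rectangle is fixed by choosing 2 of the (width + 1) vertical boundaries
    # and 2 of the (height + 1) horizontal boundaries.
    horizontal_choices = width * (width + 1) // 2
    vertical_choices = height * (height + 1) // 2
    return horizontal_choices * vertical_choices


def count_clips(num_frames: int) -> int:
    """Count contiguous clips (start frame, end frame) in a video of num_frames frames."""
    return num_frames * (num_frames + 1) // 2
```

Under these assumptions a single 640 x 480 image already has more than 10^10 candidate windows, and a repetitive pattern search would additionally have to compare candidates across images.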

Author(s):  
Xiong Wang

Data management, in general terms, refers to activities that involve the acquisition, storage, and retrieval of data. Traditionally, information retrieval is facilitated through queries, such as exact search, nearest neighbor search, and range search. In the last decade, data mining has emerged as one of the most dynamic fields at the frontier of data management. Data mining refers to the process of extracting useful knowledge from data. Popular data mining techniques include association rule discovery, frequent pattern discovery, classification, and clustering. In this chapter, we discuss data management for a specific type of data, namely three-dimensional structures. While research on text and multimedia data management has attracted considerable attention and substantial progress has been made, data management in three-dimensional structures is still in its infancy (Castelli & Bergman, 2001; Paquet & Rioux, 1999). Data management in 3D structures raises several interesting problems:
1. Similarity search
2. Pattern discovery
3. Classification
4. Clustering


Author(s):  
Manish Gupta ◽  
Jiawei Han

Sequential data is omnipresent, and sequential pattern mining methods have been applied in a large number of domains to analyze this data and identify patterns. Such patterns have been used to implement efficient systems that make recommendations based on previously observed patterns, help in making predictions, improve the usability of systems, detect events, and in general help in making strategic product decisions. In this chapter, we discuss the applications of sequential data mining in a variety of domains such as healthcare, education, Web usage mining, text mining, bioinformatics, telecommunications, and intrusion detection. We conclude with a summary of the work.


Author(s):  
Anne Denton

Time series data is of interest to most science and engineering disciplines and analysis techniques have been developed for hundreds of years. There have, however, in recent years been new developments in data mining techniques, such as frequent pattern mining, that take a different perspective of data. Traditional techniques were not meant for such pattern-oriented approaches. There is, as a result, a significant need for research that extends traditional time-series analysis, in particular clustering, to the requirements of the new data mining algorithms.


2013 ◽  
Vol 443 ◽  
pp. 402-406 ◽  
Author(s):  
Shang Gao ◽  
Mei Mei Li

With the rapid growth in the number of mobile phone users, a large amount of graph data has accumulated, and graph data mining has gradually become a hot research area. Traditional data mining techniques such as clustering, classification, and frequent pattern mining have gradually been extended to graph data. This paper introduces current research progress in graph data mining; summarizes its characteristics, practical significance, main problems, and application scenarios; and discusses the outlook for the field, in which research on uncertain graph data in particular is becoming a trend and a hot spot.


2017 ◽  
Vol 10 (13) ◽  
pp. 191
Author(s):  
Nikhil Jamdar ◽  
A Vijayalakshmi

Many data mining algorithms search for interesting patterns in transactional databases of precise data. Frequent pattern mining is a technique for finding frequently occurring items. Most existing techniques find all interesting patterns in a collection of precise data, where the items occurring in each transaction are known to the system with certainty. In many real-time applications, however, users are interested in only a tiny portion of the large set of frequent patterns. The proposed user-constrained mining approach helps to find the frequent patterns in which the user is interested, and does so efficiently by applying user constraints to collections of uncertain data. Users specify their interests in the form of constraints, and the approach uses the MapReduce model to find uncertain frequent patterns that satisfy the user-specified constraints.
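The idea of constrained mining over uncertain data can be illustrated with a minimal single-machine sketch. It assumes the common expected-support model, in which each item in a transaction carries an existence probability and items are independent; the map and reduce steps are plain Python functions that only mimic the MapReduce pattern. The function names, the constraint interface, and the `max_size` cutoff are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from itertools import combinations
from math import prod


def map_phase(transaction, max_size, constraint):
    """Emit (itemset, probability) pairs for one uncertain transaction.

    transaction: dict mapping item -> existence probability (assumed independent).
    constraint: user-supplied predicate on an itemset (a tuple of items).
    """
    items = sorted(transaction)
    for size in range(1, max_size + 1):
        for itemset in combinations(items, size):
            if constraint(itemset):  # push the user constraint into the map step
                yield itemset, prod(transaction[i] for i in itemset)


def reduce_phase(pairs):
    """Sum the emitted probabilities per itemset to obtain expected support."""
    totals = defaultdict(float)
    for itemset, probability in pairs:
        totals[itemset] += probability
    return totals


def mine(transactions, min_expected_support, max_size, constraint):
    """Return itemsets that satisfy the constraint and the expected-support threshold."""
    pairs = (kv for t in transactions for kv in map_phase(t, max_size, constraint))
    totals = reduce_phase(pairs)
    return {s: v for s, v in totals.items() if v >= min_expected_support}
```

A call such as `mine(transactions, 1.0, 2, lambda s: "a" in s)` would keep only itemsets of up to two items that contain the hypothetical item `"a"` and have an expected support of at least 1.0. This brute-force enumeration is for illustration only and would not scale to large item universes.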


2014 ◽  
Vol 2014 ◽  
pp. 1-11 ◽  
Author(s):  
Weifeng Li ◽  
Xiaoyun Cheng ◽  
Zhengyu Duan ◽  
Dongyuan Yang ◽  
Gaohua Guo

An overall understanding of spatial interaction and exact knowledge of its dynamic evolution are required in urban planning and transportation planning. This study aimed to analyze spatial interaction based on large-scale mobile phone data. This newly available mass dataset required a new methodology compatible with its particular characteristics. A three-stage framework is proposed in this paper, comprising data preprocessing, critical activity identification, and spatial interaction measurement. The proposed framework introduces frequent pattern mining and measures spatial interaction by the associations obtained. A case study of three communities in Shanghai was carried out to verify the proposed method and demonstrate its practical application. The resulting spatial interaction patterns and representative features support the rationality of the proposed framework.


2012 ◽  
Vol 195-196 ◽  
pp. 984-986
Author(s):  
Ming Ru Zhao ◽  
Yuan Sun ◽  
Jian Guo ◽  
Ping Ping Dong

Frequent itemset mining is an important data mining task and a central theme in data mining research. The Apriori algorithm is one of the most important algorithms for mining frequent itemsets. However, Apriori scans the database many times, so its efficiency is relatively low. This paper therefore investigates a frequent itemset mining algorithm based on an across linker. Compared with the classical algorithm, the improved algorithm shows clear advantages.
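For reference, the repeated database scans of classical Apriori can be seen in the minimal sketch below. It shows the textbook level-wise algorithm only, not the across-linker variant proposed in the paper, whose details are not given here; the function name and the use of absolute support counts are assumptions made for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for all itemsets with count >= min_support."""
    transactions = [frozenset(t) for t in transactions]

    # Scan 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
        }
        # One further full scan of the database for each level k.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result
```

Each iteration of the `while` loop performs another pass over all transactions, which is the inefficiency the abstract refers to.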


2017 ◽  
Author(s):  
Michael Phinney

Frequent pattern mining is a classic data mining technique, generally applicable to a wide range of application domains, and a mature area of research. The fundamental challenge arises from the combinatorial nature of frequent itemsets, whose number scales exponentially with the number of unique items. Apriori-based and FPTree-based algorithms have dominated the space thus far. Initial phases of this research relied on the Apriori algorithm and utilized a distributed computing environment; we proposed the Cartesian Scheduler to manage Apriori's candidate generation process. To address the limitations of bottom-up frequent pattern mining algorithms such as Apriori and FPGrowth, we propose the Frequent Hierarchical Pattern Tree (FHPTree): a tree structure and a new frequent pattern mining paradigm. The classic problem is redefined as frequent hierarchical pattern mining, where the goal is to detect frequent maximal pattern covers. Under the proposed paradigm, compressed representations of maximal patterns are mined using a top-down FHPTree traversal, FHPGrowth, which detects large patterns before their subsets, thus yielding significant reductions in computation time. The FHPTree memory footprint is small; the number of nodes in the structure scales linearly with the number of unique items. Additionally, the FHPTree serves as a persistent, dynamic data structure to index frequent patterns and enable efficient searches. When the search space is exponential, efficient targeted mining capabilities are paramount; this is one of the key contributions of the FHPTree. This dissertation will demonstrate the performance of FHPGrowth, achieving a 300x speedup over state-of-the-art maximal pattern mining algorithms and approximately a 2400x speedup when utilizing FHPGrowth in a distributed computing environment. In addition, we point to future research opportunities and suggest various modifications to further optimize the FHPTree and FHPGrowth. Moreover, the methods we offer will have an impact on other data mining research areas, including contrast set mining as well as spatial and temporal mining.
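To make the notions of an exponential search space and maximal patterns concrete, here is a minimal brute-force sketch. It is not the FHPTree or FHPGrowth, whose internals are not described in this abstract; it only assumes the standard definition that a frequent itemset is maximal when no proper superset of it is frequent. The function names are hypothetical.

```python
def search_space_size(num_items: int) -> int:
    """Number of non-empty itemsets over num_items unique items."""
    return 2 ** num_items - 1


def maximal_itemsets(frequent):
    """Keep only the frequent itemsets that have no frequent proper superset.

    frequent: an iterable of itemsets (any iterables of hashable items),
    assumed to be the complete set of frequent itemsets.
    """
    sets = {frozenset(s) for s in frequent}
    # `s < other` is the proper-subset test on frozensets.
    return {s for s in sets if not any(s < other for other in sets)}
```

A bottom-up miner must enumerate every frequent subset before reaching the maximal itemsets, whereas the abstract describes a top-down traversal that finds the large patterns first.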

