Pattern Synthesis for Large-Scale Pattern Recognition

Author(s):  
P. Viswanath ◽  
M. Narasimha Murty ◽  
Shalabh Bhatnagar

Two major problems in applying any pattern recognition technique for large and high-dimensional data are (a) high computational requirements and (b) curse of dimensionality (Duda, Hart, & Stork, 2000). Algorithmic improvements and approximate methods can solve the first problem, whereas feature selection (Guyon & Elisseeff, 2003), feature extraction (Terabe, Washio, Motoda, Katai, & Sawaragi, 2002), and bootstrapping techniques (Efron, 1979; Hamamoto, Uchimura, & Tomita, 1997) can tackle the second problem. We propose a novel and unified solution for these problems by deriving a compact and generalized abstraction of the data. By this term, we mean a compact representation of the given patterns from which one can retrieve not only the original patterns but also some artificial patterns. The compactness of the abstraction reduces the computational requirements, and its generalization reduces the curse of dimensionality effect. Pattern synthesis techniques accompanied with compact representations attempt to derive compact and generalized abstractions of the data. These techniques are applied with nearest neighbor classifier (NNC), which is a popular nonparametric classifier used in many fields, including data mining, since its conception in the early 1950s (Dasarathy, 2002).

Author(s):  
P. Viswanath ◽  
Narasimha M. Murty ◽  
Bhatnagar Shalabh

Parametric methods first choose the form of the model or hypotheses and estimates the necessary parameters from the given dataset. The form, which is chosen, based on experience or domain knowledge, often, need not be the same thing as that which actually exists (Duda, Hart & Stork, 2000). Further, apart from being highly error-prone, this type of methods shows very poor adaptability for dynamically changing datasets. On the other hand, non-parametric pattern recognition methods are attractive because they do not derive any model, but works with the given dataset directly. These methods are highly adaptive for dynamically changing datasets. Two widely used non-parametric pattern recognition methods are (a) the nearest neighbor based classification and (b) the Parzen-Window based density estimation (Duda, Hart & Stork, 2000). Two major problems in applying the non-parametric methods, especially, with large and high dimensional datasets are (a) the high computational requirements and (b) the curse of dimensionality (Duda, Hart & Stork, 2000). Algorithmic improvements, approximate methods can solve the first problem whereas feature selection (Isabelle Guyon & André Elisseeff, 2003), feature extraction (Terabe, Washio, Motoda, Katai & Sawaragi, 2002) and bootstrapping techniques (Efron, 1979; Hamamoto, Uchimura & Tomita, 1997) can tackle the second problem. We propose a novel and unified solution for these problems by deriving a compact and generalized abstraction of the data. By this term, we mean a compact representation of the given patterns from which one can retrieve not only the original patterns but also some artificial patterns. The compactness of the abstraction reduces the computational requirements, and its generalization reduces the curse of dimensionality effect. Pattern synthesis techniques accompanied with compact representations attempt to derive compact and generalized abstractions of the data. These techniques are applied with (a) the nearest neighbor classifier (NNC) which is a popular non-parametric classifier used in many fields including data mining since its conception in the early fifties (Dasarathy, 2002) and (b) the Parzen-Window based density estimation which is a well known non-parametric density estimation method (Duda, Hart & Stork, 2000).


2020 ◽  
Vol 2020 ◽  
pp. 1-8
Author(s):  
Fengying Li ◽  
Enyi Yang ◽  
Anqiao Ma ◽  
Rongsheng Dong

The application of appropriate graph data compression technology to store and manipulate graph data with tens of thousands of nodes and edges is a prerequisite for analyzing large-scale graph data. The traditional K2-tree representation scheme mechanically partitions the adjacency matrix, which causes the dense interval to be split, resulting in additional storage overhead. As the size of the graph data increases, the query time of K2-tree continues to increase. In view of the above problems, we propose a compact representation scheme for graph data based on grid clustering and K2-tree. Firstly, we divide the adjacency matrix into several grids of the same size. Then, we continuously filter and merge these grids until grid density satisfies the given density threshold. Finally, for each large grid that meets the density, K2-tree compact representation is performed. On this basis, we further give the relevant node neighbor query algorithm. The experimental results show that compared with the current best K2-BDC algorithm, our scheme can achieve better time/space tradeoff.


Author(s):  
C. Radha

An important problem in pattern recognition is that of pattern classification. The objective of classification is to determine a discriminant function which is consistent with the given training examples and performs reasonably well on an unlabeled test set of examples. The degree of performance of the classifier on the test examples, known as its generalization performance, is an important issue in the design of the classifier. It has been established that a good generalization performance can be achieved by providing the learner with a sufficiently large number of discriminative training examples. However, in many domains, it is infeasible or expensive to obtain a sufficiently large training set. Various mechanisms have been proposed in literature to combat this problem. Active Learning techniques (Angluin, 1998; Seung, Opper, & Sompolinsky, 1992) reduce the number of training examples required by carefully choosing discriminative training examples. Bootstrapping (Efron, 1979; Hamamoto, Uchimura & Tomita, 1997) and other pattern synthesis techniques generate a synthetic training set from the given training set. We present some of these techniques and propose some general mechanisms for pattern synthesis.


2013 ◽  
Vol 9 (3) ◽  
pp. 1099-1109
Author(s):  
Dr. H. B. Kekre ◽  
Dr. Tanuja K. Sarode ◽  
Jagruti K. Save

The paper presents a new approach of finding nearest neighbor in image classification algorithm by proposing efficient method for similarity measure. Generally in supervised classification, after finding the feature vectors of training images and testing images, nearest neighbor classifier does the classification job. This classifier uses different distance measures such as Euclidean distance, Manhattan distance etc. to find the nearest training feature vector. This paper proposes to use Mean Squared Error (MSE) to find the nearness between two images. Initially Independent Principal Component Analysis (PCA),which we discussed in our earlier work, is applied to images of each class to generate Eigen coordinate system for that class. Then for the given test image, a set of feature vectors is generated. New images are reconstructed using each Eigen coordinate system and the corresponding test feature vector. Lowest MSE between the given test image and new reconstructed image indicates the corresponding class for that image. The experiments are conducted on COIL-100 database. The performance is also compared with  distance based nearest neighbor classifier. Results show that the proposed method achieves high accuracy even for small size of training set.


Author(s):  
Junjie Chen ◽  
William K. Cheung ◽  
Anran Wang

Hashing is an efficient approximate nearest neighbor search method and has been widely adopted for large-scale multimedia retrieval. While supervised learning is more popular for the data-dependent hashing, deep unsupervised hashing methods have recently been developed to learn non-linear transformations for converting multimedia inputs to binary codes. Most of existing deep unsupervised hashing methods make use of a quadratic constraint for minimizing the difference between the compact representations and the target binary codes, which inevitably causes severe information loss. In this paper, we propose a novel deep unsupervised method called DeepQuan for hashing. The DeepQuan model utilizes a deep autoencoder network, where the encoder is used to learn compact representations and the decoder is for manifold preservation. To contrast with the existing unsupervised methods, DeepQuan learns the binary codes by minimizing the quantization error through product quantization technique. Furthermore, a weighted triplet loss is proposed to avoid trivial solution and poor generalization. Extensive experimental results on standard datasets show that the proposed DeepQuan model outperforms the state-of-the-art unsupervised hashing methods for image retrieval tasks.


2005 ◽  
Vol 38 (8) ◽  
pp. 1187-1195 ◽  
Author(s):  
P. Viswanath ◽  
Narasimha Murty ◽  
Shalabh Bhatnagar

Sign in / Sign up

Export Citation Format

Share Document