A Semantics Sensitive Framework of Organization and Retrieval for Multimedia Databases

Author(s):  
Zhiping Shi ◽  
Qingyong Li ◽  
Qing He ◽  
Zhongzhi Shi

Semantics-based retrieval is a growing trend in Content-Based Multimedia Retrieval (CBMR). Typically, multimedia databases offer two kinds of clues for querying: perceptive features and semantic classes. In this chapter, we propose a novel framework for multimedia database organization and retrieval that integrates perceptive features and semantic classes. To this end, a semantics-supervised cluster-based index organization approach (SSCI for short) is developed: the entire data set is divided hierarchically into clusters until the objects within a cluster are not only close in the perceptive feature space but also belong to the same semantic class; an index entry is then built for each cluster. In particular, the perceptive feature vectors of a cluster are stored contiguously on disk. Furthermore, the SSCI supports a relevance feedback approach in which users mark positive and negative examples at the granularity of clusters rather than single objects. Our experiments show that the proposed framework significantly improves both the retrieval speed and the precision of CBMR systems.
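
To make the SSCI scheme concrete, here is a minimal sketch, assuming NumPy and scikit-learn; the function name build_ssci_index, the k-means splitter, and the stopping thresholds are hypothetical stand-ins for the paper's actual criteria. The data set is split recursively until every cluster is semantically pure, and each finished cluster yields one index entry: its member indices (whose feature vectors would be stored contiguously on disk) and a centroid for matching queries.

```python
# Hypothetical sketch of the SSCI organization step (names and thresholds are
# illustrative): split recursively with k-means until each cluster is
# semantically pure, then record one index entry per finished cluster.
import numpy as np
from sklearn.cluster import KMeans

def build_ssci_index(features, labels, max_cluster_size=100, k=2):
    """Return a list of (member_indices, centroid) index entries."""
    index = []

    def split(idx):
        # Finished: all members share one semantic class and the cluster is
        # small enough; its vectors would be stored contiguously on disk.
        if len(set(labels[idx])) == 1 and len(idx) <= max_cluster_size:
            index.append((idx, features[idx].mean(axis=0)))
            return
        km = KMeans(n_clusters=min(k, len(idx)), n_init=10).fit(features[idx])
        for c in range(km.n_clusters):
            sub = idx[km.labels_ == c]
            if len(sub) == len(idx):          # k-means made no progress; stop
                index.append((sub, features[sub].mean(axis=0)))
                return
            if len(sub) > 0:
                split(sub)

    split(np.arange(len(features)))
    return index
```

At query time, the query vector is compared against the cluster centroids first, and only the closest clusters need to be read from disk, which is where the retrieval speed-up comes from.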

Author(s):  
Guang-Ho Cha

Principal component analysis (PCA) is an important tool in many areas, including data reduction and interpretation, information retrieval, and image processing. Kernel PCA has recently been proposed as a nonlinear extension of the popular PCA. The basic idea is to first map the input space into a feature space via a nonlinear map and then compute the principal components in that feature space. This paper illustrates the potential of kernel PCA for dimensionality reduction and feature extraction in multimedia retrieval. Using Gaussian kernels, the principal components are computed in the feature space of an image data set and used as new dimensions to approximate image features. Extensive experimental results show that kernel PCA outperforms linear PCA in content-based image retrieval, with respect to both retrieval quality and retrieval precision.

Keywords: principal component analysis, kernel principal component analysis, multimedia retrieval, dimensionality reduction, image retrieval
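
A minimal sketch of the kernel PCA step, assuming scikit-learn; the dimensions and the synthetic data below are placeholders for real image feature vectors.

```python
# Hedged sketch: kernel PCA with a Gaussian (RBF) kernel as a nonlinear
# dimensionality reducer for image features; the data here is synthetic.
import numpy as np
from sklearn.decomposition import KernelPCA

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 256))       # stand-in for 256-d image features

kpca = KernelPCA(n_components=32, kernel="rbf", gamma=1.0 / X.shape[1])
X_reduced = kpca.fit_transform(X)     # 32 nonlinear principal components

# A query is projected into the same space before nearest-neighbour search.
query = rng.normal(size=(1, 256))
q_reduced = kpca.transform(query)
```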


2013 ◽  
Vol 2013 ◽  
pp. 1-6 ◽  
Author(s):  
Ersen Yılmaz

A two-stage expert system for cardiac arrhythmia diagnosis is proposed. In the first stage, the Fisher score is used for feature selection to reduce the dimension of the data set's feature space. The second stage is the classification stage, in which a least-squares support vector machine (LS-SVM) classifier is applied to the feature subset selected in the first stage to diagnose cardiac arrhythmia. The performance of the proposed expert system is evaluated on the arrhythmia data set from the UCI Machine Learning Repository.
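
A sketch of the two-stage pipeline under stated assumptions: the Fisher score below is the standard between-class to within-class variance ratio, and since scikit-learn ships no LS-SVM, an ordinary SVC stands in for the paper's least-squares variant.

```python
# Stage 1: rank features by Fisher score; Stage 2: fit an SVM (stand-in for
# the paper's LS-SVM) on the top-ranked feature subset.
import numpy as np
from sklearn.svm import SVC

def fisher_scores(X, y):
    """Per-feature Fisher score: between-class over within-class variance."""
    classes = np.unique(y)
    mu = X.mean(axis=0)
    num = np.zeros(X.shape[1])
    den = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        num += len(Xc) * (Xc.mean(axis=0) - mu) ** 2
        den += len(Xc) * Xc.var(axis=0)
    return num / np.maximum(den, 1e-12)

def select_and_fit(X_train, y_train, n_features=50):
    top = np.argsort(fisher_scores(X_train, y_train))[::-1][:n_features]
    clf = SVC(kernel="rbf").fit(X_train[:, top], y_train)
    return clf, top   # keep `top` to index test data the same way
```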


2018 ◽  
Vol 2018 ◽  
pp. 1-15 ◽  
Author(s):  
Huaping Guo ◽  
Xiaoyu Diao ◽  
Hongbing Liu

Rotation Forest is an ensemble learning approach that achieves better performance than Bagging and Boosting by building accurate and diverse classifiers over rotated feature spaces. However, like other conventional classifiers, Rotation Forest does not work well on imbalanced data, which have far fewer examples of one class (the minority class) than of the other (the majority class), and for which the cost of misclassifying minority-class examples is often much higher than the converse. This paper proposes a novel method called Embedding Undersampling Rotation Forest (EURF) to handle this problem by (1) sampling subsets from the majority class and learning a projection matrix from each subset, and (2) obtaining training sets by projecting re-undersampled subsets of the original data set onto the new spaces defined by these matrices, and constructing an individual classifier from each training set. In the first step, undersampling forces the rotation matrix to better capture the features of the minority class without harming the diversity between individual classifiers. In the second step, the undersampling technique aims to improve the performance of individual classifiers on the minority class. The experimental results show that EURF achieves significantly better performance than other state-of-the-art methods.
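
A simplified, hedged sketch of the EURF recipe: each ensemble member learns a PCA "rotation" on one undersampled subset, projects a second, re-undersampled subset through it, and fits a decision tree. (A full Rotation Forest rotates feature sub-groups separately, which this sketch omits.)

```python
# Simplified EURF sketch: per member, learn a rotation on an undersampled
# subset, project a re-undersampled set, and fit a tree on the projection.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier

def undersample(X, y, rng):
    """Balance classes by sampling the majority down to the minority size."""
    minority = min(np.unique(y), key=lambda c: (y == c).sum())
    min_idx = np.flatnonzero(y == minority)
    maj_idx = rng.choice(np.flatnonzero(y != minority), len(min_idx),
                         replace=False)
    idx = np.concatenate([min_idx, maj_idx])
    return X[idx], y[idx]

def fit_eurf(X, y, n_members=10, seed=0):
    rng = np.random.default_rng(seed)
    members = []
    for _ in range(n_members):
        Xr, _ = undersample(X, y, rng)        # learn the rotation here
        rotation = PCA().fit(Xr)
        Xt, yt = undersample(X, y, rng)       # re-undersample for training
        tree = DecisionTreeClassifier().fit(rotation.transform(Xt), yt)
        members.append((rotation, tree))
    return members

def predict_eurf(members, X):
    votes = np.stack([t.predict(r.transform(X)) for r, t in members])
    # Majority vote across members (binary labels assumed to be 0/1 here).
    return (votes.mean(axis=0) >= 0.5).astype(int)
```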


2021 ◽  
Author(s):  
Rogini Runghen ◽  
Daniel B Stouffer ◽  
Giulio Valentino Dalla Riva

Collecting network interaction data is difficult. Non-exhaustive sampling and complex hidden processes often result in an incomplete data set. Thus, identifying potentially present but unobserved interactions is crucial both for understanding the structure of large-scale data and for predicting how previously unseen elements will interact. Recent studies in network analysis have shown that accounting for metadata (such as node attributes) can improve both our understanding of how nodes interact with one another and the accuracy of link prediction. However, the dimension of the object we need to learn in order to predict interactions grows quickly with the number of nodes, which becomes computationally and conceptually challenging for large networks. Here, we present a new predictive procedure combining a graph embedding method with machine learning techniques to predict interactions on the basis of node metadata. Graph embedding methods project the nodes of a network onto a low-dimensional latent feature space. The positions of the nodes in the latent feature space can then be used to predict interactions between them. Learning a mapping from the nodes' metadata to their positions in the latent feature space corresponds to a classic, low-dimensional machine learning problem. In the present study, we used the Random Dot Product Graph (RDPG) model to estimate the embedding of an observed network, and we tested different neural network architectures for predicting the positions of nodes in the latent feature space. Flexible machine learning techniques for mapping nodes onto their latent positions allow us to account for multivariate and possibly complex node metadata. To illustrate the utility of the proposed procedure, we apply it to a large dataset of tourist visits to destinations across New Zealand. We found that our procedure accurately predicts interactions for both existing nodes and nodes newly added to the network, while remaining computationally feasible even for very large networks. Overall, our study highlights that by exploiting the properties of a well-understood statistical model for complex networks and combining it with standard machine learning techniques, we can simplify the link prediction problem when incorporating multivariate node metadata. Our procedure can be immediately applied to different types of networks and to a wide variety of data from different systems. As such, from both a network science and a data science perspective, our work offers a flexible and generalisable procedure for link prediction.
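
A hedged sketch of the two components, assuming NumPy and scikit-learn; the toy eigendecomposition below stands in for a dedicated RDPG/adjacency spectral embedder (e.g. the graspologic library), and the network shape is hypothetical. First estimate latent positions from the adjacency matrix, then train a small neural network from node metadata to those positions so that new nodes can be placed and scored.

```python
# Sketch: adjacency spectral embedding (an RDPG estimator) plus a neural
# network mapping node metadata to latent positions for link prediction.
import numpy as np
from sklearn.neural_network import MLPRegressor

def rdpg_embed(A, d=8):
    """Estimate latent positions X with A ~ X X^T via truncated eigendecomposition."""
    vals, vecs = np.linalg.eigh(A.astype(float))
    top = np.argsort(np.abs(vals))[::-1][:d]   # d largest-magnitude eigenpairs
    return vecs[:, top] * np.sqrt(np.abs(vals[top]))

def fit_metadata_map(metadata, latent):
    """Learn metadata -> latent position, so that new nodes can be placed."""
    return MLPRegressor(hidden_layer_sizes=(64, 64),
                        max_iter=2000).fit(metadata, latent)

def predict_links(model, meta_u, meta_v):
    # In an RDPG, dot products of latent positions approximate edge
    # probabilities, so we clip the predicted products into [0, 1].
    pu, pv = model.predict(meta_u), model.predict(meta_v)
    return np.clip((pu * pv).sum(axis=1), 0.0, 1.0)
```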


2021 ◽  
Author(s):  
ElMehdi SAOUDI ◽  
Said Jai Andaloussi

Abstract: With the rapid growth of the volume of video data and the development of multimedia technologies, it has become necessary to browse and search through information stored in large multimedia databases quickly and accurately. To this end, content-based video retrieval (CBVR) has become an active area of research over the last decade. In this paper, we propose a content-based video retrieval system that returns videos similar to a query video from a large multimedia data set. The approach uses motion-vector-based signatures to describe the visual content and machine learning techniques to extract key-frames for rapid browsing and efficient video indexing. We have implemented the proposed approach both on a single machine and on a real-time distributed cluster, in order to evaluate the real-time performance aspect, especially when the number and size of the videos are large. Experiments are performed on various benchmark action and activity recognition data sets, and the results show the effectiveness of the proposed method in both accuracy and processing time compared to state-of-the-art methods.
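
The paper's signature and indexing pipeline is not reproduced here, but the key-frame extraction step can be illustrated with a common clustering recipe, assuming OpenCV and scikit-learn; the histogram features and cluster count are illustrative choices, not the authors' exact method.

```python
# Illustrative key-frame extraction: cluster frame colour histograms with
# k-means and keep the frame nearest each cluster centre.
import cv2
import numpy as np
from sklearn.cluster import KMeans

def extract_keyframes(path, n_keyframes=10):
    cap = cv2.VideoCapture(path)
    frames, hists = [], []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(frame)
        h = cv2.calcHist([frame], [0, 1, 2], None, [8, 8, 8], [0, 256] * 3)
        hists.append(cv2.normalize(h, None).flatten())
    cap.release()
    if not hists:
        return []
    H = np.asarray(hists)
    km = KMeans(n_clusters=min(n_keyframes, len(H)), n_init=10).fit(H)
    # Pick the frame closest to each cluster centre as a key-frame.
    keep = [int(np.argmin(np.linalg.norm(H - c, axis=1)))
            for c in km.cluster_centers_]
    return [frames[i] for i in sorted(set(keep))]
```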


2021 ◽  
Vol 50 (1) ◽  
pp. 138-152
Author(s):  
Mujeeb Ur Rehman ◽  
Dost Muhammad Khan

Recently, anomaly detection has drawn growing attention from data mining researchers, and its reputation has risen steadily across practical domains such as product marketing, fraud detection, medical diagnosis, fault detection, and many other fields. Outlier detection in high-dimensional data poses exceptional challenges for data mining experts because of the curse of dimensionality and the increasing resemblance of distant and adjoining points. Traditional algorithms and techniques perform outlier detection over the full feature space; such customary methodologies concentrate largely on low-dimensional data and are therefore ineffective at discovering anomalies in data sets with a high number of dimensions. Digging out the anomalies present in a high-dimensional data set becomes very difficult and tiresome when all subspace projections need to be explored. All data points in high-dimensional data behave like similar observations because of an intrinsic property of such data: the contrast between the distances of observations vanishes as the number of dimensions tends towards infinity. This research work proposes a novel technique that explores the deviation among all data points and embeds its findings inside well-established density-based techniques. The technique opens a new breadth of research towards resolving the inherent problems of high-dimensional data, where outliers reside within clusters of different densities. A high-dimensional dataset from the UCI Machine Learning Repository is chosen to test the proposed technique, and its results are compared with those of density-based techniques to evaluate its efficiency.
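
The paper's exact deviation measure is not spelled out in the abstract, so the sketch below is only a plausible instance of the general recipe: compute a per-point deviation score, embed it alongside the original features, and hand the result to a standard density-based detector (here LOF from scikit-learn).

```python
# Hedged sketch: augment the feature space with a simple per-point deviation
# score, then run Local Outlier Factor; the paper's measure may differ.
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

def deviation_augmented_lof(X, n_neighbors=20):
    z = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)   # per-feature z-scores
    deviation = np.abs(z).mean(axis=1, keepdims=True)    # mean absolute deviation
    X_aug = np.hstack([X, deviation])
    lof = LocalOutlierFactor(n_neighbors=n_neighbors)
    labels = lof.fit_predict(X_aug)                      # -1 marks outliers
    return labels, -lof.negative_outlier_factor_         # labels and scores
```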


2020 ◽  
Vol 25 (1) ◽  
pp. 101-123
Author(s):  
Dirk Speelman ◽  
Stefan Grondelaers ◽  
Benedikt Szmrecsanyi ◽  
Kris Heylen

Abstract: In this paper, we revisit earlier analyses of the distribution of er 'there' in adjunct-initial sentences to demonstrate the merits of computational upscaling in syntactic variation research. Unlike previous studies, in which major semantic and pragmatic predictors (viz. adjunct type, adjunct concreteness, and verb specificity) had to be coded manually, the present study operationalizes these predictors on the basis of distributional analysis: instead of hand-coding for specific semantic classes, we determine the semantic class of the adjunct, verb, and subject automatically by clustering the lexemes in those slots according to their 'semantic passport' (as established from their distributional behaviour in a reference corpus). These clusters are subsequently interpreted as proxies for semantic classes. In addition, the pragmatic factor 'subject predictability' is operationalized automatically on the basis of collocational attraction measures, as well as the distributional similarity between the other slots and the subject. We demonstrate that the distribution of er can be modelled as successfully with the automated approach as in manual annotation-based studies. Crucially, the new method replicates our earlier findings that the Netherlandic data are easier to model than the Belgian data, and that lexical collocations play a bigger role in the Netherlandic than in the Belgian data. On a methodological level, the proposed automatization opens up a window of opportunities. Most important is its scalability: it allows a larger gamut of alternations to be investigated in one study, and much larger datasets to represent each alternation.
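
As a rough illustration of the distributional step only (corpus processing and weighting are omitted, and the paper's own pipeline may differ): lexemes are represented by their co-occurrence profiles and clustered, and the cluster labels then serve as proxies for semantic classes.

```python
# Rough illustration: lexemes as (weighted) co-occurrence vectors, L2-normalised
# so that Euclidean k-means approximates cosine clustering; cluster IDs are
# used as proxies for semantic classes.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize

def semantic_class_proxies(cooc_matrix, lexemes, n_classes=20, seed=0):
    """cooc_matrix: lexeme-by-context co-occurrence counts (e.g. PPMI-weighted)."""
    vectors = normalize(np.asarray(cooc_matrix, dtype=float))
    km = KMeans(n_clusters=n_classes, n_init=10, random_state=seed).fit(vectors)
    return dict(zip(lexemes, km.labels_))   # lexeme -> semantic class proxy
```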


2020 ◽  
Vol 34 (04) ◽  
pp. 5620-5627 ◽  
Author(s):  
Murat Sensoy ◽  
Lance Kaplan ◽  
Federico Cerutti ◽  
Maryam Saleki

Deep neural networks are often ignorant about what they do not know and overconfident when they make uninformed predictions. Some recent approaches quantify classification uncertainty directly by training the model to output high uncertainty for data samples close to class boundaries or outside the training distribution. These approaches use an auxiliary data set during training to represent out-of-distribution samples. However, selecting or creating such an auxiliary data set is non-trivial, especially for high-dimensional data such as images. In this work, we develop a novel neural network model that is able to express both aleatoric and epistemic uncertainty, so as to distinguish decision-boundary and out-of-distribution regions of the feature space. To this end, variational autoencoders and generative adversarial networks are incorporated to automatically generate out-of-distribution exemplars for training. Through extensive analysis, we demonstrate that the proposed approach provides better uncertainty estimates for in-distribution samples, out-of-distribution samples, and adversarial examples on well-known data sets than state-of-the-art approaches, including recent Bayesian approaches for neural networks and anomaly detection methods.
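
The paper's VAE/GAN generator is not reproduced here; as a crude, hedged stand-in, the sketch below decodes latent codes drawn from the low-density tails of a linear PCA latent space back into input space, producing exemplars that would be labelled as high-uncertainty during training.

```python
# Crude stand-in for the idea of synthesising out-of-distribution (OOD)
# exemplars: a linear PCA "autoencoder" replaces the paper's VAE/GAN, and
# latent codes are sampled far from the training distribution, then decoded.
import numpy as np
from sklearn.decomposition import PCA

def make_ood_exemplars(X_train, n_samples=200, d=16, tail_scale=4.0, seed=0):
    rng = np.random.default_rng(seed)
    pca = PCA(n_components=d).fit(X_train)
    Z = pca.transform(X_train)
    mu, sigma = Z.mean(axis=0), Z.std(axis=0)
    # Sample latent codes several standard deviations away from the data.
    signs = rng.choice([-1.0, 1.0], size=(n_samples, d))
    Z_ood = mu + signs * tail_scale * sigma * (1 + rng.random((n_samples, d)))
    return pca.inverse_transform(Z_ood)      # decoded OOD exemplars
```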


2015 ◽  
Vol 54 ◽  
pp. 83-122 ◽  
Author(s):  
Ruben Izquierdo ◽  
Armando Suarez ◽  
German Rigau

As empirically demonstrated by the Word Sense Disambiguation (WSD) tasks of the last SensEval/SemEval exercises, assigning the appropriate meaning to words in context has resisted all attempts at a fully satisfactory solution. Many authors argue that one possible reason could be the use of inappropriate sets of word meanings. In particular, WordNet has been used as a de facto standard repository of word meanings in most of these tasks. Thus, instead of using the word senses defined in WordNet, some approaches have derived semantic classes representing groups of word senses. However, the meanings represented by WordNet have only been used for WSD at a very fine-grained sense level or at a very coarse-grained semantic class level (also called SuperSenses). We suspect that an appropriate level of abstraction could lie between these two levels. The contributions of this paper are manifold. First, we propose a simple method to automatically derive semantic classes at intermediate levels of abstraction covering all nominal and verbal WordNet meanings. Second, we empirically demonstrate that our automatically derived semantic classes outperform classical approaches based on word senses and on more coarse-grained sense groupings. Third, we demonstrate that our supervised WSD system benefits from using these new semantic classes as additional semantic features while reducing the number of training examples required. Finally, we demonstrate the robustness of our supervised semantic-class-based WSD system when tested on an out-of-domain corpus.
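
The paper's derivation method is not reproduced here; the sketch below only illustrates the general idea of an intermediate abstraction level, by grouping WordNet senses under the hypernym found a fixed number of steps below the root. It assumes NLTK with the WordNet corpus installed (nltk.download("wordnet")).

```python
# Illustrative, not the paper's algorithm: map each synset to the ancestor
# found `depth` steps from the root of its longest hypernym path.
from nltk.corpus import wordnet as wn

def semantic_class(synset, depth=4):
    """Return the ancestor at `depth` steps from the root as the class label."""
    path = max(synset.hypernym_paths(), key=len)   # longest path to the root
    return path[min(depth, len(path) - 1)].name()

# Example: all noun senses of "bank" mapped to intermediate-level classes.
classes = {s.name(): semantic_class(s) for s in wn.synsets("bank", pos=wn.NOUN)}
```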


Author(s):  
Shu-Ching Chen

Exponential technological advances have produced high-resolution devices, such as digital cameras, scanners, monitors, and printers, which enable multimedia data to be captured and displayed using high-density storage devices. Furthermore, more and more applications must handle multimedia data. However, the gap between the characteristics of the various media types and application requirements has created the need to develop advanced techniques for multimedia data management and for the extraction of relevant information from multimedia databases. Although many research efforts have been devoted to multimedia databases and data management, the field is still far from mature. The purpose of this article is to discuss how existing techniques, methodologies, and tools address the relevant issues and challenges, to enable a better understanding of multimedia databases and data management. The focuses include: (1) how to develop a formal structure that can capture the distinguishing content of the media data in a multimedia database (MMDB) and form an abstract space over which the data can be queried; (2) how to develop advanced content analysis and retrieval techniques that bridge the gap between semantic meaning and low-level media characteristics to improve multimedia information retrieval; and (3) how to develop query mechanisms that can handle complex spatial, temporal, and/or spatio-temporal relationships of multimedia data to answer the imprecise and incomplete queries issued to an MMDB.

