Connected bit minwise hashing for large-scale linear SVM

Author(s):  
Jingjing Tang ◽  
Yingjie Tian ◽  
Dalian Liu
Author(s):  
Jun Long ◽  
Qunfeng Liu ◽  
Xinpan Yuan ◽  
Chengyuan Zhang ◽  
Junfeng Liu ◽  
...  

Image similarity measures play an important role in nearest neighbor search and duplicate detection for large-scale image datasets. Recently, Minwise Hashing (or Minhash) and its related hashing algorithms have achieved strong performance in large-scale image retrieval systems. However, these applications require a large number of image-pair comparisons, which can consume considerable computation time and degrade performance. To quickly obtain the image pairs whose similarities are higher than a specific threshold T (e.g., 0.5), we propose a dynamic threshold filter of Minwise Hashing for image similarity measures. It greatly reduces the calculation time by terminating unnecessary comparisons early. We also find that the filter can be extended to other hashing algorithms whose estimators follow the binomial distribution, such as b-Bit Minwise Hashing and One Permutation Hashing. In this paper, we use the Bag-of-Visual-Words (BoVW) model based on the Scale Invariant Feature Transform (SIFT) to represent the image features. Experiments on real image datasets show that the filter is correct and effective.
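
The early-termination idea in this abstract can be illustrated with a minimal sketch. It is not the paper's dynamic threshold filter, which relies on the binomial distribution of the estimator; it uses a simpler deterministic bound instead: stop as soon as the threshold is already reached, or can no longer be reached even if every remaining signature position matches. The function names, the hash family `(a * w + b) % PRIME`, and the use of integer visual-word IDs are all assumptions made for illustration.

```python
import math
import random

PRIME = 2_147_483_647  # Mersenne prime 2^31 - 1; word IDs are assumed to be in [0, PRIME)

def make_hash_params(k, seed=0):
    """Draw k (a, b) pairs for the hypothetical hash family h(w) = (a*w + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(k)]

def minhash_signature(word_ids, params):
    """Return one minimum hash value per (a, b) pair for a set of visual-word IDs."""
    return [min((a * w + b) % PRIME for w in word_ids) for a, b in params]

def passes_threshold(sig_a, sig_b, threshold):
    """Decide whether the Minhash estimate (matches / k) is at least `threshold`.

    Returns (passed, comparisons_used). The loop exits early when the outcome
    is already certain, which is a deterministic stand-in for the paper's
    probabilistic filter.
    """
    k = len(sig_a)
    needed = math.ceil(threshold * k)  # matches required to reach the threshold
    matches = 0
    for i, (x, y) in enumerate(zip(sig_a, sig_b), start=1):
        if x == y:
            matches += 1
        if matches >= needed:
            return True, i   # threshold already reached
        if matches + (k - i) < needed:
            return False, i  # threshold unreachable even if all remaining positions match
    return matches >= needed, k

params = make_hash_params(100, seed=1)
sig_x = minhash_signature({1, 2, 3, 4}, params)
sig_y = minhash_signature({10, 20, 30}, params)
print(passes_threshold(sig_x, sig_x, 0.5))  # identical sets
print(passes_threshold(sig_x, sig_y, 0.5))  # disjoint sets
```

With 100 hash functions and T = 0.5, a pair is accepted or rejected after roughly half of the comparisons in the extreme cases. A probabilistic filter such as the one the abstract describes would be expected to reject dissimilar pairs earlier still, at the cost of a small error probability.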


Sensors ◽  
2020 ◽  
Vol 20 (19) ◽  
pp. 5710
Author(s):  
Yukun Wu ◽  
Wei William Lee ◽  
Xuan Gong ◽  
Hui Wang

Owing to the constraints of time and space complexity, network intrusion detection systems (NIDSs) based on support vector machines (SVMs) face the “curse of dimensionality” in a large-scale, high-dimensional feature space. This study proposes a joint training model that combines a stacked autoencoder (SAE) with an SVM and the kernel approximation technique. The training model uses the SAE to perform feature dimension reduction and uses random Fourier features to perform kernel approximation. The random Fourier mapping is then explicitly applied to the sub-sample to generate the random feature space, making it possible for a linear SVM to uniformly approximate the Gaussian kernel SVM. Finally, the SAE is trained jointly with the efficient linear SVM. We studied the effects of the SAE structure and the random Fourier features on classification performance, and compared that performance with that of other training models, including some without kernel approximation. We also compared the accuracy of the proposed model with that of other models, including basic machine learning models and state-of-the-art models from the literature. The experimental results demonstrate that the proposed model outperforms the previously proposed methods in terms of classification performance and also reduces the training time. Our model is feasible and works efficiently on large-scale datasets.
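
The kernel approximation step mentioned in this abstract can be sketched in a few lines. The example below shows only the standard random Fourier feature map for the Gaussian kernel exp(-gamma * ||x - y||^2); the SAE and the SVM training are omitted, and the function names and parameter values are assumptions for illustration, not the authors' implementation.

```python
import math
import random

def make_rff_map(input_dim, num_features, gamma, seed=0):
    """Build a random Fourier feature map z(x) such that z(x) . z(y)
    approximates the Gaussian kernel exp(-gamma * ||x - y||^2)."""
    rng = random.Random(seed)
    std = math.sqrt(2.0 * gamma)  # frequencies are drawn from N(0, 2 * gamma * I)
    weights = [[rng.gauss(0.0, std) for _ in range(input_dim)]
               for _ in range(num_features)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def transform(x):
        # One cosine feature per random frequency vector and phase offset.
        return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(weights, offsets)]

    return transform

def gaussian_kernel(x, y, gamma):
    """Exact Gaussian (RBF) kernel value, used here only for comparison."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

transform = make_rff_map(input_dim=3, num_features=5000, gamma=0.5, seed=42)
x, y = [0.1, 0.4, -0.3], [0.3, 0.0, 0.2]
approx = sum(p * q for p, q in zip(transform(x), transform(y)))
print(approx, gaussian_kernel(x, y, 0.5))  # the two values should be close
```

Because the inner product of the mapped vectors approximates the kernel value, a linear SVM trained on `transform(x)` behaves approximately like a Gaussian kernel SVM on `x`, which is the property the abstract relies on. The approximation error shrinks roughly as one over the square root of `num_features`.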


2020 ◽  
Vol 9 (1) ◽  
pp. 2640-2645

This paper studies Question Categorization (QC), primarily in order to understand customers' search intent. In these searches, the items in the question list relate to category labels in the taxonomy tree under examination. Product search queries, however, are often vague or narrow, and they change over time with new products and seasonal trends. Traditional supervised approaches to E-Commerce QC are impractical because of the high volume of traffic and the high cost of manual annotation in E-Commerce search engines. Here, clickstream data is utilized to determine the effectiveness of a channel's marketplace. The paper therefore uses customer clicks to collect large-scale question categorization data in an unsupervised manner, and the system mainly relies on the SVM algorithm. The data are multiclass, and a multi-label classifier is used to classify them. The paper works with a large multi-label data set of specific, individual queries from a specific category. Several sophisticated text classifiers are compared. The paper calculates micro-F1 scores for the top and leaf categories of a linear SVM ensemble.
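
For readers unfamiliar with the metric named in this abstract, the following is a minimal sketch of micro-averaged F1 for multi-label predictions. Micro-averaging pools true positives, false positives, and false negatives over all labels before computing F1. The function name and the example queries' labels are hypothetical and are not taken from the paper.

```python
def micro_f1(true_labels, predicted_labels):
    """Micro-averaged F1 over paired collections of label sets."""
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        truth, pred = set(truth), set(pred)
        tp += len(truth & pred)   # labels predicted and correct
        fp += len(pred - truth)   # labels predicted but wrong
        fn += len(truth - pred)   # labels missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Hypothetical category labels for two search queries.
truth = [{"shoes", "sport"}, {"phone"}]
pred = [{"shoes"}, {"phone", "case"}]
print(micro_f1(truth, pred))  # 2 TP, 1 FP, 1 FN gives 4 / 6
```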


2019 ◽  
Vol 6 (2) ◽  
pp. 3948-3961
Author(s):  
Chunkai Zhang ◽  
Panbo Tian ◽  
Xudong Zhang ◽  
Zoe L Jiang ◽  
Lin Yao ◽  
...  

1999 ◽  
Vol 173 ◽  
pp. 243-248
Author(s):  
D. Kubáček ◽  
A. Galád ◽  
A. Pravda

Abstract: The unusual short-period comet 29P/Schwassmann-Wachmann 1 has inspired many observers to try to explain its unpredictable outbursts. In this paper, large-scale structures and features in the inner part of the coma in time periods around outbursts are studied. CCD images were taken at Whipple Observatory, Mt. Hopkins, in 1989 and at Astronomical Observatory, Modra, from 1995 to 1998. Photographic plates of the comet were taken at Harvard College Observatory, Oak Ridge, from 1974 to 1982. The latter were first digitized so that the same image-processing techniques could be applied to optimize the visibility of features in the coma during outbursts. Outbursts and coma structures show various shapes.


1994 ◽  
Vol 144 ◽  
pp. 29-33
Author(s):  
P. Ambrož

Abstract: The large-scale coronal structures observed during the sporadically visible solar eclipses were compared with the numerically extrapolated field-line structures of the coronal magnetic field. A characteristic relationship between the observed structures of coronal plasma and the magnetic field line configurations was determined. The long-term evolution of large-scale coronal structures inferred from photospheric magnetic observations in the course of the 11- and 22-year solar cycles is described. Some known parameters, such as the source surface radius and the coronal rotation rate, are discussed and interpreted. A relation between the large-scale photospheric magnetic field evolution and the coronal structure rearrangement is demonstrated.

