A pornographic web page detecting method based on SVM model using text and image features

2006 ◽  
Vol 1 (4) ◽  
pp. 264
Author(s):  
Rung Ching Chen ◽  
Chun Te Ho
Entropy ◽  
2021 ◽  
Vol 23 (11) ◽  
pp. 1502
Author(s):  
Ben Wilkes ◽  
Igor Vatolkin ◽  
Heinrich Müller

We present a multi-modal genre recognition framework that considers the modalities audio, text, and image by features extracted from audio signals, album cover images, and lyrics of music tracks. In contrast to pure learning of features by a neural network as done in the related work, handcrafted features designed for a respective modality are also integrated, allowing for higher interpretability of created models and further theoretical analysis of the impact of individual features on genre prediction. Genre recognition is performed by binary classification of a music track with respect to each genre based on combinations of elementary features. For feature combination a two-level technique is used, which combines aggregation into fixed-length feature vectors with confidence-based fusion of classification results. Extensive experiments have been conducted for three classifier models (Naïve Bayes, Support Vector Machine, and Random Forest) and numerous feature combinations. The results are presented visually, with data reduction for improved perceptibility achieved by multi-objective analysis and restriction to non-dominated data. Feature- and classifier-related hypotheses are formulated based on the data, and their statistical significance is formally analyzed. The statistical analysis shows that the combination of two modalities almost always leads to a significant increase of performance and the combination of three modalities in several cases.


2021 ◽  
Vol 32 (4) ◽  
pp. 1-13
Author(s):  
Xia Feng ◽  
Zhiyi Hu ◽  
Caihua Liu ◽  
W. H. Ip ◽  
Huiying Chen

In recent years, deep learning has achieved remarkable results in the text-image retrieval task. However, only global image features are considered, and the vital local information is ignored. This results in a failure to match the text well. Considering that object-level image features can help the matching between text and image, this article proposes a text-image retrieval method that fuses salient image feature representation. Fusion of salient features at the object level can improve the understanding of image semantics and thus improve the performance of text-image retrieval. The experimental results show that the method proposed in the paper is comparable to the latest methods, and the recall rate of some retrieval results is better than the current work.


2019 ◽  
Vol 29 (1) ◽  
pp. 1179-1187
Author(s):  
N. Karthika ◽  
B. Janet

Abstract Text documents are significant arrangements of various words, while images are significant arrangements of various pixels/features. In addition, text and image data share a similar semantic structural pattern. With reference to this research, the feature pair is defined as a pair of adjacent image features. The innovative feature pair index graph (FPIG) is constructed from the unique feature pair selected, which is constructed using an inverted index structure. The constructed FPIG is helpful in clustering, classifying and retrieving the image data. The proposed FPIG method is validated against the traditional KMeans++, KMeans and Farthest First cluster methods which have the serious drawback of initial centroid selection and local optima. The FPIG method is analyzed using Iris flower image data, and the analysis yields 88% better results than Farthest First and 28.97% better results than conventional KMeans in terms of sum of squared errors. The paper also discusses the scope for further research in the proposed methodology.


2021 ◽  
pp. 1-13
Author(s):  
Shuo Shi ◽  
Changwei Huo ◽  
Yingchun Guo ◽  
Stephen Lean ◽  
Gang Yan ◽  
...  

Person re-identification with natural language description is a process of retrieving the corresponding person’s image from an image dataset according to a text description of the person. The key challenge in this cross-modal task is to extract visual and text features and construct loss functions to achieve cross-modal matching between text and image. Firstly, we designed a two-branch network framework for person re-identification with natural language description. In this framework we include the following: a Bi-directional Long Short-Term Memory (Bi-LSTM) network is used to extract text features and a truncated attention mechanism is proposed to select the principal component of the text features; a MobileNet is used to extract image features. Secondly, we proposed a Cascade Loss Function (CLF), which includes cross-modal matching loss and single modal classification loss, both with relative entropy function, to fully exploit the identity-level information. The experimental results on the CUHK-PEDES dataset demonstrate that our method achieves better results in Top-5 and Top-10 than other current 10 state-of-the-art algorithms.


Sign in / Sign up

Export Citation Format

Share Document