Detecting tag spams for social bookmarking Websites using a text mining approach

2014 ◽  
Vol 13 (02) ◽  
pp. 387-406 ◽  
Author(s):  
Hsin-Chang Yang ◽  
Chung-Hong Lee

Social bookmarking Websites are popular because they provide platforms on which Web pages are easy to browse and organize. Users can add tags to Web pages to make the pages easier to understand and retrieve. However, spam tags can also be added to increase the chance that a Web page is referenced, which is troublesome for users who end up accessing Web pages of no interest to them. In this work, we propose a scheme that automatically detects such tag spam using a text mining approach based on the self-organizing map (SOM) model. We used SOM to find the associations among Web pages as well as among tags. These associations were then used to discover the relationships between Web pages and tags, and tag spam can be detected according to those relationships. Experiments conducted on a set of Web pages collected from a social bookmarking site produced promising results.

2019 ◽  
Vol 1 (1) ◽  
pp. 194-202
Author(s):  
Adrian Costea

This paper assesses the financial performance of Romania’s non-banking financial institutions (NFIs) using a neural network training algorithm proposed by Kohonen, namely the Self-Organizing Maps algorithm. The algorithm takes the financial dataset and positions each observation on a self-organizing map (a two-dimensional map), which can later be used to visualize the trajectory of an individual NFI and explain it in terms of different performance dimensions, such as capital adequacy, asset quality and profitability. Further, we use the map as an early-warning system intended to forecast the NFIs’ future performance (whether they would stay in or be eliminated from the NFI’s Special Register three quarters into the future). The results are promising: the model is able to correctly predict NFIs’ performance movements. Finally, we compared the results of our SOM-based model with those obtained by applying a multivariate logit-based model. The SOM model performed worse in discriminating the NFIs’ performance: the performance classes were not clearly defined and the results were harder to interpret. In contrast, the multivariate logit coefficients are readily interpretable, and an individual default probability estimate is obtained for each new observation. However, we can benefit from the results of both techniques: the visualization capabilities of the SOM model and the interpretability of the multivariate logit-based model.
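As a rough illustration of how a self-organizing map places each observation on a two-dimensional grid, the following is a minimal NumPy sketch. It is not the implementation used in the paper: the grid size, the linearly decaying learning rate and neighbourhood width, and the function names `train_som` and `best_matching_unit` are all assumptions chosen for brevity.

```python
import numpy as np

def best_matching_unit(weights, x):
    """Return the (row, col) of the unit whose weight vector is closest to x."""
    dist = np.linalg.norm(weights - x, axis=-1)
    return np.unravel_index(np.argmin(dist), dist.shape)

def train_som(data, rows=4, cols=4, epochs=50, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small rectangular SOM; returns weights of shape (rows, cols, dim)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    weights = rng.random((rows, cols, data.shape[1]))
    # Grid coordinates of every unit, used by the Gaussian neighbourhood.
    grid = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                 # assumed linear decay
            sigma = sigma0 * (1.0 - frac) + 1e-3    # assumed linear decay
            bmu = best_matching_unit(weights, x)
            # Squared grid distance of every unit to the winning unit.
            d2 = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))
            # Pull each unit toward x, weighted by its neighbourhood value.
            weights += lr * h[..., None] * (x - weights)
            step += 1
    return weights
```

After training, calling `best_matching_unit(weights, x)` for successive observations of the same institution would give a sequence of grid positions, which is one way to read the "trajectory" described above.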


2019 ◽  
Vol 9 (5) ◽  
pp. 182
Author(s):  
Abdulfattah Omar ◽  
Aldawsari Bader Deraan

This study proposes an integrated framework that considers letter-pair frequencies/combinations along with the lexical features of documents as a means of identifying the authorship of short texts posted anonymously on social media. Taking a quantitative morpho-lexical approach, this study tests the hypothesis that letter information, or mapping, can identify unique stylistic features. As such, stable word combinations and morphological patterns can be used successfully for authorship detection in relation to very short texts. This method offers significant potential in the fight against online hate speech, which is often posted anonymously and where authorship is difficult to identify. The data analyzed is from a corpus of 12,240 tweets derived from 87 Twitter accounts. A self-organizing map (SOM) model was used to classify input patterns in the tweets that shared common features. Tweets grouped in a particular class displayed features that suggested they were written by a particular author. The results indicate that the accuracy of classification according to the proposed system was around 76%. Up to 22% of this accuracy was lost, however, when only distinctive words were used, and 26% was lost when the classification procedure was based solely on letter combinations and morphological patterns. The integration of letter-pairs and morphological patterns had the advantage of improving accuracy when determining the author of a given tweet. This indicates that combining different linguistic variables in a single system leads to better performance in classifying very short texts. It is also clear that the use of a self-organizing map (SOM) led to better clustering performance because of its capacity to integrate two different linguistic levels for each author profile.
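To make the idea of letter-pair frequencies concrete, here is a small sketch that counts adjacent letter pairs within each word of a short text. The exact features used in the study are not given in the abstract, so the tokenization (whitespace split, lowercasing, non-letters dropped) and the name `letter_pair_frequencies` are assumptions.

```python
from collections import Counter

def letter_pair_frequencies(text):
    """Relative frequency of adjacent letter pairs within each word of text."""
    counts = Counter()
    for word in text.lower().split():
        # Keep letters only, so punctuation and digits do not form pairs.
        letters = [ch for ch in word if ch.isalpha()]
        counts.update(a + b for a, b in zip(letters, letters[1:]))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pair: n / total for pair, n in counts.items()}
```

A vector built from these frequencies over a fixed set of pairs could be concatenated with lexical features and passed to a clustering model such as a SOM.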


Author(s):  
Tien Ho-Phuoc ◽  
Anne Guerin-Dugue

The Self-Organizing Map (Kohonen, 1997) is an effective and very popular tool for data clustering and visualization. With this method, the input samples are projected into a low-dimensional space while preserving their topology. The samples are described by a set of features, and the input space is generally a high-dimensional space R^d. 2D or 3D maps are very often used for visualization in a low-dimensional space (2 or 3 dimensions). For many applications, usually in psychology, biology, genetics, and image and signal processing, such a vector description is not available; only pair-wise dissimilarity data is provided. For instance, applications in text mining or DNA exploration are very important in this field, and the observations are usually described through their proximities expressed by the “Levenshtein”, or “String Edit”, distances (Levenshtein, 1966). The first approach consists of the transformation of a dissimilarity matrix into a true Euclidean distance matrix. A straightforward strategy is to use “Multidimensional Scaling” techniques (Borg & Groenen, 1997) to provide a feature space, so that the initial vector SOM algorithm can be used directly. If this transformation involves great distortions, the initial vector model for SOM is no longer valid, and the analysis of dissimilarity data requires specific techniques (Jain & Dubes, 1988; Van Cutsem, 1994), of which the Dissimilarity Self-Organizing Map (DSOM) is a recent one. Consequently, adaptation of the Self-Organizing Map (SOM) to dissimilarity data is of growing interest. During the last decade, different proposals have emerged to extend the vector SOM model to pair-wise dissimilarity data. The main motivation is to cope with large proximity databases for data mining. In this article, we present a new adaptation of the SOM algorithm, which is compared with two existing ones.
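The pair-wise dissimilarity data mentioned above can be illustrated with the Levenshtein (string edit) distance. The sketch below uses the standard dynamic-programming formulation with unit costs for insertion, deletion and substitution; the helper `dissimilarity_matrix` is a hypothetical name, and nothing here reproduces the DSOM algorithm itself.

```python
def levenshtein(a, b):
    """Edit distance between sequences a and b with unit operation costs."""
    # prev[j] holds the distance between the first i-1 items of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def dissimilarity_matrix(items):
    """Full symmetric matrix of pair-wise edit distances."""
    return [[levenshtein(x, y) for y in items] for x in items]
```

A matrix like this is the only input available when no feature vectors exist, which is the setting that dissimilarity-based SOM variants are meant to handle.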


2011 ◽  
Vol 2011 ◽  
pp. 1-11 ◽  
Author(s):  
S. Hasan ◽  
S. M. Shamsuddin

Multistrategy learning that combines the Self-Organizing Map (SOM) and Particle Swarm Optimization (PSO) is commonly implemented in the clustering domain because of its capability to handle complex data characteristics. However, some of these multistrategy learning architectures have weaknesses, such as slow convergence and a tendency to become trapped in local minima. This paper proposes multistrategy learning of the SOM lattice structure with Particle Swarm Optimization, called ESOMPSO, for solving various classification problems. The SOM lattice structure is enhanced by introducing a new hexagon formulation for better mapping quality in data classification and labeling. The weights of the enhanced SOM are optimized using PSO to obtain better output quality. The proposed method has been tested on various standard datasets, with substantial comparisons against the existing SOM network and various distance measures. The results show that the proposed method yields promising results, with better average accuracy and quantization errors than the other methods, supported by significance tests.

