Detecting tag spams for social bookmarking Websites using a text mining approach

Social bookmarking Websites are popular because they provide platforms that make Web pages easy to browse and organize. Users can add tags to Web pages to make them easier to understand and retrieve. However, spam tags can also be added to increase a Web page's chance of being referenced, which troubles users by leading them to pages they are not interested in. In this work, we propose a scheme to detect such tag spam automatically using a text mining approach based on the self-organizing map (SOM) model. We use the SOM to find associations among Web pages as well as among tags. These associations are then used to discover the relationships between Web pages and tags, and tag spam is detected according to these relationships. Experiments conducted on a set of Web pages collected from a social bookmarking site yielded promising results.

On building early-warning systems for preventing the deterioration of financial institutions’ performance

Proceedings of the International Conference on Applied Statistics ◽

10.2478/icas-2019-0017 ◽

2019 ◽

Vol 1 (1) ◽

pp. 194-202

Author(s):

Adrian Costea

Keyword(s):

Financial Institutions ◽

Early Warning ◽

Warning System ◽

Early Warning Systems ◽

Capital Adequacy ◽

Self Organizing Map ◽

Self Organizing Maps ◽

Performance Dimensions ◽

SOM Model ◽

Self Organizing

This paper assesses the financial performance of Romania’s non-banking financial institutions (NFIs) using a neural network training algorithm proposed by Kohonen, namely the Self-Organizing Maps algorithm. The algorithm takes the financial dataset and positions each observation on a self-organizing map (a two-dimensional map), which can later be used to visualize the trajectory of an individual NFI and explain it in terms of different performance dimensions, such as capital adequacy, assets’ quality and profitability. Further, we use the map as an early-warning system intended to forecast the NFIs’ future performance (whether they would stay in or be eliminated from the NFI’s Special Register three quarters into the future). The results are promising: the model is able to correctly predict NFIs’ performance movements. Finally, we compared the results of our SOM-based model with those obtained by applying a multivariate logit-based model. The SOM model performed worse in discriminating the NFIs’ performance: the performance classes were not clearly defined and the results were harder to interpret. In contrast, the multivariate logit coefficients are readily interpretable, and an individual default probability estimate is obtained for each new observation. However, we can benefit from the results of both techniques: the visualization capabilities of the SOM model and the interpretability of the multivariate logit-based model.
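
To make the mapping step concrete, the following is a minimal sketch of a rectangular self-organizing map in plain Python. It is not the paper's implementation: the grid size, the linear decay schedules, the Gaussian neighbourhood, and the function names `train_som` and `best_matching_unit` are all assumptions chosen for illustration. Each observation is assigned to the grid cell whose weight vector is closest, so a sequence of observations for one institution would trace a path across the grid.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the (row, col) of the unit whose weight vector is closest to x."""
    best, best_dist = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            dist = sum((a - b) ** 2 for a, b in zip(x, w))
            if dist < best_dist:
                best, best_dist = (r, c), dist
    return best


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a small SOM; returns a rows x cols grid of weight vectors."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = max(rows, cols) / 2.0  # assumed initial neighbourhood radius
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                 # learning rate decays linearly
            sigma = sigma0 * (1.0 - frac) + 1e-3    # radius shrinks over time
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    grid_dist2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-grid_dist2 / (2.0 * sigma * sigma))
                    w = weights[r][c]
                    for k in range(dim):
                        # Pull the unit toward x, scaled by its grid distance to the winner.
                        w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


# Hypothetical two-feature observations (already scaled to [0, 1]).
observations = [[0.10, 0.10], [0.12, 0.08], [0.90, 0.90], [0.88, 0.92]]
som = train_som(observations, rows=3, cols=3)
trajectory = [best_matching_unit(som, obs) for obs in observations]
```

Here `trajectory` is the list of grid cells visited by the observations in order, which is the kind of path one would plot on the map.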

A Web text mining approach based on self-organizing map

Proceedings of the second international workshop on Web information and data management - WIDM '99 ◽

10.1145/319759.319789 ◽

1999 ◽

Cited By ~ 18

Author(s):

Chung-Hong Lee ◽

Hsin-Chang Yang

Keyword(s):

Text Mining ◽

Self Organizing Map ◽

Web Text Mining ◽

Self Organizing

HDGSOMr: A High Dimensional Growing Self-Organizing Map Using Randomness for Efficient Web and Text Mining

The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05) ◽

10.1109/wi.2005.70 ◽

2005 ◽

Cited By ~ 10

Author(s):

R. Amarasiri ◽

D. Alahakoon ◽

K. Smith ◽

M. Premaratne

Keyword(s):

Text Mining ◽

High Dimensional ◽

Self Organizing Map ◽

Self Organizing

Image Retrieval Using the Curvature Scale Space (CSS) Technique and the Self-Organizing Map (SOM) Model under Affine Transforms

7th International Conference on Hybrid Intelligent Systems (HIS 2007) ◽

10.1109/ichis.2007.4344053 ◽

2007 ◽

Author(s):

Carlos W.D. de Almeida ◽

Renata M.C.R. de Souza ◽

Carlos E.B. Rodrigues ◽

Nicomedes L. Junior Cavalcanti

Keyword(s):

Image Retrieval ◽

Scale Space ◽

Self Organizing Map ◽

Curvature Scale Space ◽

SOM Model ◽

Affine Transforms ◽

Self Organizing

Towards a Linguistic Stylometric Model for the Authorship Detection in Cybercrime Investigations

International Journal of English Linguistics ◽

10.5539/ijel.v9n5p182 ◽

2019 ◽

Vol 9 (5) ◽

pp. 182

Author(s):

Abdulfattah Omar ◽

Aldawsari Bader Deraan

Keyword(s):

Hate Speech ◽

Letter Pair ◽

Integrated System ◽

Self Organizing Map ◽

Linguistic Variables ◽

Lexical Approach ◽

SOM Model ◽

Improving Accuracy ◽

Morphological Patterns ◽

Self Organizing

This study proposes an integrated framework that considers letter-pair frequencies/combinations along with the lexical features of documents as a means to identifying the authorship of short texts posted anonymously on social media. Taking a quantitative morpho-lexical approach, this study tests the hypothesis that letter information, or mapping, can identify unique stylistic features. As such, stable word combinations and morphological patterns can be used successfully for authorship detection in relation to very short texts. This method offers significant potential in the fight against online hate speech, which is often posted anonymously and where authorship is difficult to identify. The data analyzed is from a corpus of 12,240 tweets derived from 87 Twitter accounts. A self-organizing map (SOM) model was used to classify input patterns in the tweets that shared common features. Tweets grouped in a particular class displayed features that suggested they were written by a particular author. The results indicate that the accuracy of classification according to the proposed system was around 76%. Up to 22% of this accuracy was lost, however, when only distinctive words were used and 26% was lost when the classification procedure was based solely on letter combinations and morphological patterns. The integration of letter-pairs and morphological patterns had the advantage of improving accuracy when determining the author of a given tweet. This indicates that the integration of different linguistic variables into an integrated system leads to better performance in classifying very short texts. It is also clear that the use of a self-organizing map (SOM) led to better clustering performance because of its capacity to integrate two different linguistic levels for each author profile.
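
As an illustration of what a letter-pair feature might look like, the sketch below builds a relative-frequency profile of adjacent letter pairs within words and compares two profiles by cosine similarity. The study does not specify its exact feature extraction, so the tokenization, the normalization, the similarity measure, and the names `letter_pair_profile` and `cosine_similarity` are assumptions made for this example only.

```python
import math
from collections import Counter


def letter_pair_profile(text):
    """Relative frequency of adjacent letter pairs, counted within each word."""
    counts = Counter()
    for word in text.lower().split():
        letters = [ch for ch in word if ch.isalpha()]
        counts.update(a + b for a, b in zip(letters, letters[1:]))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pair: n / total for pair, n in counts.items()}


def cosine_similarity(p, q):
    """Cosine similarity between two sparse profiles stored as dicts."""
    dot = sum(v * q.get(k, 0.0) for k, v in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)


# Hypothetical short texts; real input would be the tweets of each account.
profile_a = letter_pair_profile("the weather is rather nice")
profile_b = letter_pair_profile("the other theory is neither here nor there")
similarity = cosine_similarity(profile_a, profile_b)
```

Profiles like these could serve as input vectors to a clustering model such as a SOM, alongside lexical features.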

Incorporating self-organizing map with text mining techniques for text hierarchy generation

Applied Soft Computing ◽

10.1016/j.asoc.2015.05.005 ◽

2015 ◽

Vol 34 ◽

pp. 251-259 ◽

Cited By ~ 8

Author(s):

Hsin-Chang Yang ◽

Chung-Hong Lee ◽

Han-Wei Hsiao

Keyword(s):

Text Mining ◽

Self Organizing Map ◽

Hierarchy Generation ◽

Self Organizing

A New Self-Organizing Map for Dissimilarity Data

Encyclopedia of Artificial Intelligence ◽

10.4018/978-1-59904-849-9.ch182 ◽

2011 ◽

pp. 1244-1252

Author(s):

Tien Ho-Phuoc ◽

Anne Guerin-Dugue

Keyword(s):

Dimensional Space ◽

Initial Vector ◽

Self Organizing Map ◽

Dissimilarity Data ◽

Low Dimension ◽

SOM Algorithm ◽

SOM Model ◽

Dimension Space ◽

Self Organizing

The Self-Organizing Map (Kohonen, 1997) is an effective and very popular tool for data clustering and visualization. With this method, the input samples are projected into a low-dimensional space while preserving their topology. The samples are described by a set of features, and the input space is generally a high-dimensional space R^d. 2D or 3D maps are very often used for visualization in a low-dimensional space (2 or 3 dimensions). For many applications, typically in psychology, biology, genetics, and image and signal processing, such a vector description is not available; only pair-wise dissimilarity data is provided. For instance, applications in Text Mining or DNA exploration are very important in this field, and the observations are usually described through their proximities expressed by the “Levenshtein”, or “String Edit”, distance (Levenshtein, 1966). The first approach consists of transforming the dissimilarity matrix into a true Euclidean distance matrix. A straightforward strategy is to use “Multidimensional Scaling” techniques (Borg & Groenen, 1997) to provide a feature space, so that the initial vector SOM algorithm can be applied directly. If this transformation involves great distortions, the initial vector model for SOM is no longer valid, and the analysis of dissimilarity data requires specific techniques (Jain & Dubes, 1988; Van Cutsem, 1994); the Dissimilarity Self Organizing Map (DSOM) is a new one. Consequently, adaptation of the Self-Organizing Map (SOM) to dissimilarity data is of growing interest. During the last decade, different proposals have emerged to extend the vector SOM model to pair-wise dissimilarity data. The main motivation is to cope with large proximity databases for data mining. In this article, we present a new adaptation of the SOM algorithm, which is compared with two existing ones.
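
For readers unfamiliar with pair-wise dissimilarity data, the sketch below computes the Levenshtein (string edit) distance with the standard dynamic-programming recurrence and assembles a symmetric dissimilarity matrix for a few strings. It only illustrates the kind of input a dissimilarity SOM consumes and is not the article's algorithm; the function names and the sample sequences are hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def dissimilarity_matrix(items):
    """Symmetric matrix of pair-wise edit distances with a zero diagonal."""
    n = len(items)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = levenshtein(items[i], items[j])
    return matrix


# Hypothetical short sequences standing in for real observations.
sequences = ["ACGT", "ACGA", "TTTT"]
distances = dissimilarity_matrix(sequences)
```

With no feature vectors available, a matrix such as `distances` is the only description of the data that the clustering method can use.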

A novel self-organizing map algorithm for text mining

2010 International Conference on System Science and Engineering ◽

10.1109/icsse.2010.5551734 ◽

2010 ◽

Cited By ~ 5

Author(s):

Hsin-Chang Yang ◽

Chung-Hong Lee

Keyword(s):

Text Mining ◽

Self Organizing Map ◽

Map Algorithm ◽

Self Organizing

Web page clustering using a self-organizing map of user navigation patterns

Decision Support Systems ◽

10.1016/s0167-9236(02)00109-4 ◽

2003 ◽

Vol 35 (2) ◽

pp. 245-256 ◽

Cited By ~ 74

Author(s):

Kate A. Smith ◽

Alan Ng

Keyword(s):

Self Organizing Map ◽

Web Page ◽

Navigation Patterns ◽

Web Page Clustering ◽

User Navigation ◽

Self Organizing

Multistrategy Self-Organizing Map Learning for Classification Problems

Computational Intelligence and Neuroscience ◽

10.1155/2011/121787 ◽

2011 ◽

Vol 2011 ◽

pp. 1-11 ◽

Cited By ~ 8

Author(s):

S. Hasan ◽

S. M. Shamsuddin

Keyword(s):

Lattice Structure ◽

Promising Result ◽

Particle Swarm ◽

Convergence Time ◽

Multistrategy Learning ◽

Complex Data ◽

Self Organizing Map ◽

Classification Problems ◽

Average Accuracy ◽

Self Organizing

Multistrategy learning that combines the Self-Organizing Map (SOM) with Particle Swarm Optimization (PSO) is commonly used in the clustering domain because of its ability to handle complex data characteristics. However, some of these multistrategy learning architectures have weaknesses, such as slow convergence and a tendency to become trapped in local minima. This paper proposes multistrategy learning of the SOM lattice structure with Particle Swarm Optimisation, called ESOMPSO, for solving various classification problems. The SOM lattice structure is enhanced by introducing a new hexagon formulation for better mapping quality in data classification and labeling. The weights of the enhanced SOM are optimised using PSO to obtain better output quality. The proposed method has been tested on various standard datasets and compared extensively with the existing SOM network and with various distance measures. The results show that the proposed method yields promising results, with better average accuracy and quantisation errors than the other methods, as well as convincing significance tests.
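
The PSO component can be pictured with a generic sketch such as the one below, which minimizes an arbitrary objective function. This is a textbook global-best PSO, not ESOMPSO: the inertia and acceleration coefficients, the search bounds, and the name `pso_minimize` are assumptions, and the objective used here is a simple sphere function rather than a SOM quality measure.

```python
import random


def pso_minimize(objective, dim, n_particles=20, iterations=100,
                 inertia=0.7, c_personal=1.5, c_global=1.5,
                 bounds=(-1.0, 1.0), seed=0):
    """Global-best particle swarm optimization; returns (best_position, best_score)."""
    rng = random.Random(seed)
    low, high = bounds
    positions = [[rng.uniform(low, high) for _ in range(dim)]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_score = [objective(p) for p in positions]
    g = min(range(n_particles), key=personal_score.__getitem__)
    global_best, global_score = personal_best[g][:], personal_score[g]

    for _ in range(iterations):
        for i in range(n_particles):
            for k in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends momentum, pull to own best, and pull to swarm best.
                velocities[i][k] = (inertia * velocities[i][k]
                                    + c_personal * r1 * (personal_best[i][k] - positions[i][k])
                                    + c_global * r2 * (global_best[k] - positions[i][k]))
                positions[i][k] += velocities[i][k]
            score = objective(positions[i])
            if score < personal_score[i]:
                personal_best[i], personal_score[i] = positions[i][:], score
                if score < global_score:
                    global_best, global_score = positions[i][:], score
    return global_best, global_score


def sphere(p):
    """Simple convex test objective with its minimum at the origin."""
    return sum(v * v for v in p)


best_position, best_score = pso_minimize(sphere, dim=2)
```

In a SOM setting, the particle position would encode candidate map weights and the objective would be a map quality measure such as the quantisation error; that coupling is left out here.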
