A General Framework for Increasing the Robustness of PCA-Based Correlation Clustering Algorithms

Graph clustering and community detection are central problems in modern data mining. The increasing need for analyzing billion-scale data calls for faster and more scalable algorithms for these problems. There are certain trade-offs between the quality and speed of such clustering algorithms. In this paper, we design scalable algorithms that achieve high quality when evaluated based on ground truth. We develop a generalized sequential and shared-memory parallel framework based on the LAMBDACC objective (introduced by Veldt et al.), which encompasses modularity and correlation clustering. Our framework consists of highly-optimized implementations that scale to large data sets of billions of edges and that obtain high-quality clusters compared to ground-truth data, on both unweighted and weighted graphs. Our empirical evaluation shows that this framework improves the state-of-the-art trade-offs between speed and quality of scalable community detection. For example, on a 30-core machine with two-way hyper-threading, our implementations achieve orders of magnitude speedups over other correlation clustering baselines, and up to 28.44× speedups over our own sequential baselines while maintaining or improving quality.
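As a rough illustration of the kind of objective this abstract refers to, the sketch below scores a clustering under the standard unweighted LambdaCC disagreement formulation, assuming the form usually attributed to Veldt et al.: a cut edge costs 1 - λ, and a non-adjacent pair placed in the same cluster costs λ. The function name `lambdacc_cost`, the toy graph, and the brute-force loop over all pairs are illustrative assumptions; this is not the paper's parallel framework.

```python
from itertools import combinations


def lambdacc_cost(n, edges, labels, lam):
    """Disagreement cost of a clustering (assumed unweighted LambdaCC form).

    n      -- number of vertices, numbered 0..n-1
    edges  -- iterable of undirected edges (i, j)
    labels -- labels[i] is the cluster id of vertex i
    lam    -- resolution parameter, expected in (0, 1)
    """
    edge_set = {frozenset(e) for e in edges}
    cost = 0.0
    for i, j in combinations(range(n), 2):
        same_cluster = labels[i] == labels[j]
        if frozenset((i, j)) in edge_set:
            if not same_cluster:
                cost += 1.0 - lam  # an edge cut by the clustering
        elif same_cluster:
            cost += lam  # a non-adjacent pair placed together
    return cost


# Hypothetical toy graph: two triangles joined by the bridge edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
two_clusters = [0, 0, 0, 1, 1, 1]
one_cluster = [0, 0, 0, 0, 0, 0]

cost_split = lambdacc_cost(6, edges, two_clusters, 0.5)
cost_merged = lambdacc_cost(6, edges, one_cluster, 0.5)
```

On this toy graph, λ = 0.5 makes the two-triangle split cheaper than a single cluster, while a small value such as λ = 0.05 reverses the preference, which is how the parameter trades off between coarse and fine clusterings.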

User encoding for clustering in very sparse recommender systems tasks

Multimedia Tools and Applications ◽ 10.1007/s11042-021-11564-x ◽ 2021

Author(s): Pablo Pérez-Núnez ◽ Jorge Díez ◽ Oscar Luaces ◽ Antonio Bahamonde

Keyword(s): Recommender Systems ◽ Real World ◽ General Framework ◽ Service Providers ◽ New Products ◽ Clustering Algorithms ◽ Clustering Methods ◽ Homogeneous Groups ◽ Feature Values ◽ Common User

Abstract: Recommender systems are a very useful tool that lets companies and service providers focus on the preferences of their customers, helping those customers avoid an overwhelming variety of choices. In this context, clustering tools can play an important role in detecting groups of customers with similar tastes. Companies can thus run personalized marketing campaigns, offering their users new products that have been consumed by other users with comparable preferences. In this paper we present a general framework to cluster users with respect to their tastes when the stored records of interactions between users and products are extremely scarce. Clustering methods commonly employ the values of features describing the samples to be clustered (users in our case), but such features are not always available. We propose some alternative representations for users, in which their tastes are captured to some extent, so that clustering algorithms can take advantage of them and form groups that are more homogeneous in this regard. To illustrate the performance of the whole framework, we tested it on six popular datasets commonly used as benchmarks for recommender systems, as well as on an extremely sparse real-world dataset that records readers' preferences when clicking promoted links in digital publications. In the experimental section we compare our proposed representations to other common user encodings. We show that clustering users based only on their feature values, or only on the items they have evaluated, yields the worst scores in terms of taste homogeneity.

Unsupervised Clustering of Neighborhood Associations and Image Segmentation Applications

Algorithms ◽ 10.3390/a13120309 ◽ 2020 ◽ Vol 13 (12) ◽ pp. 309

Author(s): Zhenggang Wang ◽ Xuantong Li ◽ Jin Jin ◽ Zhong Liu ◽ Wei Liu

Keyword(s): Remote Sensing ◽ Clustering Analysis ◽ Clustering Algorithms ◽ Optimal Solution ◽ Remote Sensing Image ◽ Neighborhood Density ◽ Correlation Clustering ◽ Density Correlation ◽ Advantages And Disadvantages ◽ Data Points

Clustering irregularly shaped data has always been a difficult problem in clustering analysis. In this paper, by analyzing the advantages and disadvantages of existing clustering analysis algorithms, a new neighborhood density correlation clustering (NDCC) algorithm is proposed for quickly discovering arbitrarily shaped clusters. Because the density of the center region of any cluster is greater than that of its edge region, the data points can be divided into core, edge, and noise points, and the density correlation of the core points within their neighborhoods can then be used to form a cluster. Furthermore, by constructing an objective function and optimizing the parameters automatically, a locally optimal result that is close to the globally optimal solution can be obtained. This algorithm avoids the clustering errors caused by iso-density points between clusters. We compare this algorithm with five other clustering algorithms and verify it on two common remote sensing image datasets. The results show that it can cluster the same ground objects in remote sensing images into one class and distinguish different ground objects. NDCC is strongly robust to irregularly scattered datasets and can solve the clustering problem for remote sensing images.
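The split into core, edge, and noise points described above can be sketched with a simple neighborhood count. This is a DBSCAN-style approximation under stated assumptions, not the authors' NDCC procedure: the radius `eps`, the threshold `min_pts`, and the function name `classify_points` are all hypothetical, and the abstract's automatic parameter optimization is not reproduced.

```python
from math import dist


def classify_points(points, eps, min_pts):
    """Label each point 'core', 'edge', or 'noise' by neighborhood density.

    A point is 'core' if at least min_pts other points lie within eps,
    'edge' if it is not core but has a core point within eps,
    and 'noise' otherwise. (Assumed rule, in the style of DBSCAN.)
    """
    neighbors = [
        [j for j, q in enumerate(points) if j != i and dist(p, q) <= eps]
        for i, p in enumerate(points)
    ]
    core = {i for i, nb in enumerate(neighbors) if len(nb) >= min_pts}
    labels = []
    for i, nb in enumerate(neighbors):
        if i in core:
            labels.append("core")
        elif any(j in core for j in nb):
            labels.append("edge")
        else:
            labels.append("noise")
    return labels


# Hypothetical data: a dense unit square, one point on its fringe, one far away.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (10, 10)]
labels = classify_points(points, eps=1.0, min_pts=2)
```

Here the four corners of the square come out as core points, the fringe point at (2, 1) as an edge point, and the distant point as noise.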

Handling WSD using Hierarchical Clustering Algorithm with sentences

International Journal of Scientific Research in Science Engineering and Technology ◽ 10.32628/ijsrset1841120 ◽ 2018 ◽ pp. 83-88

Author(s): Mohana Priya K ◽ Pooja Ragavi S ◽ Krishna Priya G

Keyword(s): Hierarchical Clustering ◽ Similarity Measure ◽ Clustering Algorithm ◽ Clustering Algorithms ◽ Cosine Similarity Measure ◽ Hierarchical Clustering Algorithm ◽ Multiple Levels ◽ Pos Tagger ◽ Sentence Clustering ◽ The Right

Clustering is the process of grouping objects into subsets that have meaning in the context of a particular problem. It does not rely on predefined classes. It is referred to as an unsupervised learning method because no information is provided about the "right answer" for any of the objects. Many clustering algorithms have been proposed and are used in different applications. Sentence clustering is one of the best clustering techniques. A hierarchical clustering algorithm is applied at multiple levels for accuracy. A POS tagger is used for tagging, together with the Porter stemmer. The WordNet dictionary is used to determine similarity by invoking the Jiang-Conrath and cosine similarity measures. Grouping is performed with respect to the highest similarity value, using the mean as a threshold. This paper incorporates many parameters for finding the similarity between words. To identify the disambiguated words, sense identification is performed for the adjectives and a comparison is made. SemCor and machine learning datasets are employed. Compared with previous results for WSD, our work shows a substantial improvement, reaching 91.2%.
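The idea of grouping sentences by cosine similarity against a mean threshold can be sketched as follows. This is a simplified illustration under stated assumptions: it uses plain bag-of-words cosine similarity only (no WordNet, Jiang-Conrath measure, POS tagging, or stemming), and it links every pair whose similarity exceeds the mean in a single pass rather than reproducing the paper's multi-level hierarchical procedure. The names `cosine_similarity` and `group_by_mean_threshold` and the example sentences are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between the word-count vectors of two sentences."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = sqrt(sum(c * c for c in va.values())) * sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def group_by_mean_threshold(sentences):
    """Link sentence pairs whose similarity exceeds the mean pairwise similarity."""
    n = len(sentences)
    if n < 2:
        return [[i] for i in range(n)]
    sims = {
        (i, j): cosine_similarity(sentences[i], sentences[j])
        for i, j in combinations(range(n), 2)
    }
    threshold = sum(sims.values()) / len(sims)  # mean similarity as the cut-off

    parent = list(range(n))  # union-find forest over sentence indices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (i, j), s in sims.items():
        if s > threshold:
            parent[find(j)] = find(i)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical sentences using two senses of the ambiguous word "bank".
sentences = ["bank loan money", "bank loan rate", "river bank water", "river bank mud"]
groups = group_by_mean_threshold(sentences)
```

On these four sentences the financial and river uses of "bank" end up in separate groups, because only the within-sense pairs score above the mean similarity.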

Spatial management for protogynous sex-changing fishes: a general framework for coastal systems

Marine Ecology Progress Series ◽ 10.3354/meps11574 ◽ 2016 ◽ Vol 543 ◽ pp. 223-240 ◽ Cited By ~ 11

Author(s): EE Easter ◽ JW White

Keyword(s): General Framework ◽ Coastal Systems ◽ Spatial Management

Phonologically conditioned allomorphy in the morphology of Surmiran (Rumantsch)

WORD Structure ◽ 10.3366/e1750124508000184 ◽ 2008 ◽ Vol 1 (2) ◽ pp. 109-134 ◽ Cited By ~ 18

Author(s): Stephen R. Anderson

Keyword(s): Optimality Theory ◽ General Framework ◽ Great Majority ◽ Historical Change ◽ Romance Languages ◽ Morphological Properties ◽ Entire System ◽ Verbal System ◽ Extended Sense ◽ Phonological Rule

Alternations between allomorphs that are not directly related by phonological rule, but whose selection is governed by phonological properties of the environment, have attracted the sporadic attention of phonologists and morphologists. Such phenomena are commonly limited to rather small corners of a language's structure, however, and as a result have not been a major theoretical focus. This paper examines a set of alternations in Surmiran, a Swiss Rumantsch language, that have this character and that pervade the entire system of the language. It is shown that the alternations in question, best attested in the verbal system, are not conditioned by any coherent set of morphological properties (either straightforwardly or in the extended sense of ‘morphomes’ explored in other Romance languages by Maiden). These alternations are, however, straightforwardly aligned with the location of stress in words, and an analysis is proposed within the general framework of Optimality Theory to express this. The resulting system of phonologically conditioned allomorphy turns out to include the great majority of patterning which one might be tempted to treat as productive phonology, but which has been rendered opaque (and subsequently morphologized) as a result of the working of historical change.

More's usage of Latin verbal predicates: the particular case of fio

Moreana ◽ 10.3366/more.2019.0053 ◽ 2019 ◽ Vol 56 (Number 211) (1) ◽ pp. 97-120

Author(s): Concepción Cabrillana

Keyword(s): General Framework ◽ Medieval Latin ◽ Lexical Semantic ◽ Syntactic Features ◽ Comparative Review ◽ Poetic Text

This article addresses Thomas More's use of an especially complex Latin predicate, fio, as a means of examining the degree of classicism in this aspect of his writing. To this end, the main lexical-semantic and syntactic features of the verb in Classical Latin are presented, and a comparative review is made of More's use of the predicate—and also its use in texts contemporaneous to More, as well as in Late and Medieval Latin—in both prose and poetry. The analysis shows that he works within a general framework of classicism, although he introduces some of his own idiosyncrasies, these essentially relating to the meaning of the verb that he employs in a preferential way and to the variety of verbal forms that occur in his poetic text.
