Design of an Unsupervised Machine Learning-Based Movie Recommender System

Preprints, 2020

DOI: 10.20944/preprints202001.0124.v1

Author(s): Debby Cintia Ganesha Putri, Jenq-Shiou Leu, Pavel Seda

Keyword(s): Recommender System; Clustering Algorithm; System Development; Clustering Algorithms; Mean Shift; Computational Time; Agglomerative Clustering; Method Performance; Cluster Validity Indices; Validity Indices

This research aims to identify similarities among groups of people in order to build a film recommender system for users. Users often have difficulty finding suitable movies because of the growing amount of movie information, and a recommender system helps them choose preferred movies based on the available features. In this study, the recommender system obtains its groupings with several clustering algorithms: K-Means, BIRCH, mini-batch K-Means, mean shift, affinity propagation, agglomerative clustering, and spectral clustering. We propose methods for optimizing K such that no cluster significantly increases the variance. The groupings are limited to the Genre and Tags attributes of movies. This research may also help identify better methods for evaluating clustering algorithms. To verify the quality of the recommender system, we adopted the mean squared error (MSE), cluster validity measures such as the Dunn Matrix and other Cluster Validity Indices, and social network analysis (SNA) measures such as Degree Centrality, Closeness Centrality, and Betweenness Centrality. We also used average similarity, computational time, association rules mined with the Apriori algorithm, and clustering performance evaluation as measures for comparing the recommender systems, using the Silhouette Coefficient, the Calinski-Harabasz Index, and the Davies-Bouldin Index.
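The abstract mentions optimizing K and scoring partitions with the Silhouette Coefficient. As a minimal sketch, not the authors' code, the Python below runs Lloyd's k-means for several candidate K values and keeps the K with the highest mean silhouette. Assumptions: deterministic farthest-first seeding (swapped in for random or k-means++ initialization so the example is reproducible), Euclidean distance, and hypothetical two-dimensional points standing in for genre/tag feature vectors.

```python
import math

def farthest_first(points, k):
    """Deterministic seeding: start at points[0], then add the point farthest from its nearest chosen centre."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(math.dist(p, c) for c in centres)))
    return centres

def kmeans(points, k, iters=100):
    """Lloyd's k-means; returns one cluster label per point."""
    centres = farthest_first(points, k)
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centre.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centres[c])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centres will not move either
        labels = new_labels
        # Update step: move each centre to the mean of its members (kept in place if empty).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient s(i) = (b - a) / max(a, b); needs at least two clusters."""
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points) if j != i and labels[j] == labels[i]]
        if not own:
            scores.append(0.0)  # singleton cluster: s(i) = 0 by convention
            continue
        a = sum(own) / len(own)  # mean distance to the point's own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical 2-D feature vectors: three groups of four items.
points = [(1, 1), (1.2, 0.8), (0.8, 1.1), (1.1, 1.3),
          (5, 5), (5.2, 4.9), (4.8, 5.1), (5.1, 5.3),
          (9, 1), (9.2, 0.9), (8.8, 1.2), (9.1, 0.7)]
scores = {k: silhouette(points, kmeans(points, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Silhouette is only one way to pick K; an elbow on the within-cluster variance is the other common choice and fits the abstract's "variance" wording equally well.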


Design of an Unsupervised Machine Learning-Based Movie Recommender System

Symmetry, Vol 12 (2), pp. 185, 2020

DOI: 10.3390/sym12020185

Cited By ~ 3

Author(s): Debby Cintia Ganesha Putri, Jenq-Shiou Leu, Pavel Seda

Keyword(s): Recommender System; Clustering Algorithm; System Development; Clustering Algorithms; Mean Shift; Computational Time; Agglomerative Clustering; Method Performance; Cluster Validity Indices; Validity Indices

This research aims to identify similarities among groups of people in order to build a film recommender system for users. Users often have difficulty finding suitable movies because of the growing amount of movie information, and a recommender system helps them choose preferred movies based on the available features. In this study, the recommender system obtains its groupings with several clustering algorithms: K-Means, BIRCH, mini-batch K-Means, mean shift, affinity propagation, agglomerative clustering, and spectral clustering. We propose methods for optimizing K such that no cluster significantly increases the variance. The groupings are limited to the Genre and Tags attributes of movies. This research may also help identify better methods for evaluating clustering algorithms. To verify the quality of the recommender system, we adopted the mean squared error (MSE), cluster validity measures such as the Dunn Matrix and other Cluster Validity Indices, and social network analysis (SNA) measures such as Degree Centrality, Closeness Centrality, and Betweenness Centrality. We also used average similarity, computational time, association rules mined with the Apriori algorithm, and clustering performance evaluation as measures for comparing the recommender systems, using the Silhouette Coefficient, the Calinski-Harabasz Index, and the Davies-Bouldin Index.
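The Calinski-Harabasz and Davies-Bouldin indices named above can be computed directly from a labelled partition. The sketch below follows the common textbook definitions (Euclidean distance, centroid-based scatter) and is not taken from the paper; scikit-learn provides library versions as `sklearn.metrics.calinski_harabasz_score` and `sklearn.metrics.davies_bouldin_score`. The points and both labellings are hypothetical.

```python
import math

def centroid(pts):
    return tuple(sum(col) / len(pts) for col in zip(*pts))

def group(points, labels):
    """Collect the points of each cluster label into its own list."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    return list(clusters.values())

def davies_bouldin(points, labels):
    """Mean over clusters of the worst (s_i + s_j) / d(c_i, c_j) ratio; lower is better."""
    groups = group(points, labels)
    cents = [centroid(g) for g in groups]
    # s_i: average distance of a cluster's points to its centroid.
    scatter = [sum(math.dist(p, c) for p in g) / len(g) for g, c in zip(groups, cents)]
    k = len(groups)
    return sum(
        max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j]) for j in range(k) if j != i)
        for i in range(k)
    ) / k

def calinski_harabasz(points, labels):
    """Between- over within-cluster dispersion, each divided by its degrees of freedom; higher is better."""
    groups = group(points, labels)
    n, k = len(points), len(groups)
    overall = centroid(points)
    between = sum(len(g) * math.dist(centroid(g), overall) ** 2 for g in groups)
    within = sum(math.dist(p, centroid(g)) ** 2 for g in groups for p in g)
    return (between / (k - 1)) / (within / (n - k))

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good = [0, 0, 0, 1, 1, 1]  # matches the two visible groups
bad = [0, 1, 0, 1, 0, 1]   # mixes them
for name, labels in (("good", good), ("bad", bad)):
    print(name, davies_bouldin(points, labels), calinski_harabasz(points, labels))
```

The two indices point in opposite directions: the better partition gets the lower Davies-Bouldin value and the higher Calinski-Harabasz value.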


Clustering Algorithms and Validation Indices for a Wide mmWave Spectrum

Information, Vol 10 (9), pp. 287, 2019

DOI: 10.3390/info10090287

Cited By ~ 2

Author(s): Bogdan Antonescu, Miead Tehrani Moayyed, Stefano Basagni

Keyword(s): Communication Systems; Clustering Algorithm; Radio Channel; Clustering Algorithms; Wireless Communication Systems; Cluster Validity Indices; Validity Indices; Wide Range; Radio Signals; Urban Scenario

Radio channel propagation models for the millimeter wave (mmWave) spectrum are extremely important for planning future 5G wireless communication systems. Transmitted radio signals are received as clusters of multipath rays, and identifying these clusters gives a better picture of the spatial and temporal characteristics of the mmWave channel. This paper deals with the clustering process and its validation across a wide range of frequencies in the mmWave spectrum below 100 GHz. Through simulations, we show that in outdoor communication scenarios the clustering of received rays is influenced by the frequency of the transmitted signal. This reflects the sparse character of the mmWave spectrum (i.e., we obtain fewer rays at the receiver for the same urban scenario). We use the well-known k-means clustering algorithm to group the rays arriving at the receiver. The accuracy of this partitioning is studied with both cluster validity indices (CVIs) and score fusion techniques. Finally, we analyze how the clustering solution changes with narrower-beam antennas and compare the cluster characteristics for different types of antennas.
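The abstract does not say which score fusion technique is used. As one assumed possibility, the sketch below fuses several cluster validity indices with a Borda-style rank sum: each index ranks the candidate K values from best to worst, and the K with the lowest total rank wins. The index names, the function `borda_fusion`, and all score values are hypothetical and made up for illustration; they are not results from the paper.

```python
def borda_fusion(index_scores, higher_is_better):
    """Fuse several cluster validity indices by summing each candidate K's rank (0 = best)."""
    totals = {}
    for name, scores in index_scores.items():
        # Sort candidate K values from best to worst for this index.
        ordered = sorted(scores, key=scores.get, reverse=higher_is_better[name])
        for rank, k in enumerate(ordered):
            totals[k] = totals.get(k, 0) + rank
    # Lowest total rank wins; ties go to the smaller K.
    best = min(totals, key=lambda k: (totals[k], k))
    return best, totals

# Illustrative index values for K = 2, 3, 4 (invented for the example).
index_scores = {
    "silhouette": {2: 0.41, 3: 0.55, 4: 0.48},
    "calinski_harabasz": {2: 120.0, 3: 150.0, 4: 160.0},
    "davies_bouldin": {2: 0.90, 3: 0.70, 4: 0.75},
}
higher_is_better = {"silhouette": True, "calinski_harabasz": True, "davies_bouldin": False}
best, totals = borda_fusion(index_scores, higher_is_better)
print(best, totals)
```

Rank fusion ignores how large the gaps between scores are; normalising each index to [0, 1] and averaging is a common alternative.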


Role of Cluster Validity Indices in Delineation of Precipitation Regions

Water, Vol 12 (5), pp. 1372, 2020

DOI: 10.3390/w12051372

Author(s): Nikhil Bhatia, Jency M. Sojan, Slobodan Simonovic, Roshan Srivastav

Keyword(s): Clustering Algorithm; Clustering Algorithms; Optimal Number; Ratio Test; Cluster Validity; Number Of Clusters; Cluster Validity Indices; Validity Indices; Point Data; Optimal Number Of Clusters

The delineation of precipitation regions aims to identify homogeneous zones in which the characteristics of the precipitation process are statistically similar. The regionalization process has three main components: (i) delineation of regions using clustering algorithms, (ii) determination of the optimal number of regions using cluster validity indices (CVIs), and (iii) validation of the regions for homogeneity using the L-moment ratio test. The choice of the optimal number of clusters significantly affects the homogeneity of the regions. The objective of this study is to investigate how well various CVIs identify the optimal number of clusters, i.e., the number that maximizes the homogeneity of the precipitation regions. The k-means clustering algorithm is adopted to delineate the regions using location-based attributes for two large areas of Canada, namely the Prairies and the Great Lakes-St. Lawrence lowlands (GL-SL) region. Seasonal precipitation data for 55 years (1951–2005) are derived from the high-resolution ANUSPLIN gridded point data for Canada. The results indicate that both the optimal number of clusters and the regional homogeneity depend on the CVI adopted. Of the 42 cluster indices considered, 15 outperform the others in identifying homogeneous precipitation regions. The Dunn, Det_Ratio, and $\mathrm{Trace}(W^{-1}B)$ indices were found to be the best for all seasons in both regions.
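Of the indices singled out above, the Dunn index is the simplest to state: the smallest distance between points in different clusters divided by the largest cluster diameter, so higher values indicate compact, well-separated clusters. The sketch below uses this common definition with Euclidean distance; the exact variant used in the study, and the determinant- and trace-based indices, are not reproduced. The station coordinates and the two candidate partitions are hypothetical.

```python
import math
from itertools import combinations

def dunn_index(points, labels):
    """Smallest between-cluster point distance divided by the largest cluster diameter (higher is better)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    groups = list(clusters.values())
    # Largest distance between two points of the same cluster.
    diameter = max((math.dist(p, q) for g in groups for p, q in combinations(g, 2)), default=0.0)
    # Smallest distance between two points of different clusters.
    separation = min(math.dist(p, q) for g, h in combinations(groups, 2) for p in g for q in h)
    return separation / diameter  # undefined if every cluster is a single point

# Hypothetical station coordinates and two candidate partitions (K = 3 and K = 2).
points = [(0, 0), (0, 1), (5, 0), (5, 1), (10, 0), (10, 1)]
three = [0, 0, 1, 1, 2, 2]
two = [0, 0, 0, 0, 1, 1]
print(dunn_index(points, three), dunn_index(points, two))
```

Scanning K and keeping the partition with the highest Dunn value is how such an index is used to select the number of regions.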


Robustness of fish assemblages derived from three hierarchical agglomerative clustering algorithms performed on Icelandic groundfish survey data

ICES Journal of Marine Science, Vol 68 (1), pp. 189-200, 2010

DOI: 10.1093/icesjms/fsq144

Cited By ~ 12

Author(s): Warsha Singh, Einar Hjorleifsson, Gunnar Stefansson

Keyword(s): Survey Data; Fish Assemblages; Clustering Algorithms; Agglomerative Clustering; Cluster Validity Indices; Validity Indices; Hierarchical Agglomerative Clustering; Species Area; Species Groups; The Stability

Singh, W., Hjorleifsson, E., and Stefansson, G. 2011. Robustness of fish assemblages derived from three hierarchical agglomerative clustering algorithms performed on Icelandic groundfish survey data. – ICES Journal of Marine Science, 68: 189–200. Heatmaps are used to identify species–area assemblages based on Icelandic groundfish survey data. Hierarchical agglomerative clustering algorithms are widely applied in species assemblage studies and form the basis for heatmaps. First, the robustness of fish assemblages derived from three clustering algorithms (average, complete, and Ward's linkage) was examined. For statistical reliability, the use of a bootstrap resampling technique to generate confidence values for the clusters is emphasized. Two cluster validity indices were used to measure the efficiency and the quality of the clusters. To examine the stability of the results, clustering was carried out across different sample sizes and levels of data smoothing. Second, cluster analysis was carried out using different combinations of data standardization and dissimilarity measures. Ward's linkage gave the most robust fish assemblages for both modes of data analysis. Four fish assemblages were identified, which could be characterized by depth and geographic distribution. Ward's linkage was then used to generate a heatmap to determine the species–area relationships, and specific areas were characterized by the identified species groups.
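The three linkages compared above differ only in how the distance from a newly merged cluster to every other cluster is updated. The sketch below implements naive agglomerative clustering with the Lance-Williams update for average, complete, and Ward's linkage. It is an illustration, not the authors' pipeline (the bootstrap confidence values are not shown); the station data are hypothetical, and Ward's update is applied to squared Euclidean distances, where it is exact.

```python
import math
from itertools import combinations

def lance_williams(method, ni, nj, nk):
    """(alpha_i, alpha_j, beta, gamma) for d(i+j, k) = a_i*d(i,k) + a_j*d(j,k) + b*d(i,j) + g*|d(i,k) - d(j,k)|."""
    if method == "average":
        return ni / (ni + nj), nj / (ni + nj), 0.0, 0.0
    if method == "complete":
        return 0.5, 0.5, 0.0, 0.5
    if method == "ward":
        t = ni + nj + nk
        return (ni + nk) / t, (nj + nk) / t, -nk / t, 0.0
    raise ValueError(f"unknown linkage: {method}")

def agglomerate(points, n_clusters, method="ward"):
    """Naive agglomerative clustering; returns the member indices of each final cluster."""
    clusters = {i: [i] for i in range(len(points))}
    # Ward's update is exact on squared Euclidean distances; average/complete use plain distances.
    power = 2 if method == "ward" else 1
    d = {frozenset((i, j)): math.dist(points[i], points[j]) ** power
         for i, j in combinations(clusters, 2)}
    next_id = len(points)
    while len(clusters) > n_clusters:
        pair = min(d, key=d.get)  # closest pair of clusters
        i, j = tuple(pair)
        ni, nj = len(clusters[i]), len(clusters[j])
        merged = clusters.pop(i) + clusters.pop(j)
        for k in clusters:  # distances from the new cluster to every remaining one
            a_i, a_j, b, g = lance_williams(method, ni, nj, len(clusters[k]))
            dik, djk = d.pop(frozenset((i, k))), d.pop(frozenset((j, k)))
            d[frozenset((next_id, k))] = a_i * dik + a_j * djk + b * d[pair] + g * abs(dik - djk)
        del d[pair]
        clusters[next_id] = merged
        next_id += 1
    return sorted(sorted(m) for m in clusters.values())

# Hypothetical survey stations described by two standardised variables.
points = [(0, 0), (0, 1), (1, 0), (8, 8), (8, 9), (9, 8), (20, 0), (20, 1)]
for method in ("average", "complete", "ward"):
    print(method, agglomerate(points, 3, method))
```

On well-separated data all three linkages agree; robustness comparisons like the one above only become informative when resampling or noisier data pushes them apart.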


A New Clustering Algorithm Based On Cluster Validity Indices

Discovery Science - Lecture Notes in Computer Science, pp. 322-329, 2004

DOI: 10.1007/978-3-540-30214-8_27

Author(s): Minho Kim, R. S. Ramakrishna

Keyword(s): Clustering Algorithm; Cluster Validity; Cluster Validity Indices; Validity Indices


Assessment of some combinations of hard and fuzzy clustering techniques for regionalisation of catchments in Sefidroud basin

Journal of Hydroinformatics, Vol 18 (6), pp. 1033-1054, 2016

DOI: 10.2166/hydro.2016.239

Cited By ~ 7

Author(s): Ali Ahani, S. Saeid Mousavi Nadoushani

Keyword(s): Objective Function; Clustering Algorithms; Hybrid Approach; Hybrid Algorithms; Flood Frequency Analysis; Cluster Validity; FCM Algorithm; Cluster Validity Indices; Validity Indices; Regional Flood Frequency Analysis

Cluster analysis methods are well-known techniques for the regionalisation of catchments in regional flood frequency analysis. In this study, a fuzzy extension of hybrid clustering algorithms is evaluated. Self-organizing feature maps and four hierarchical clustering algorithms were used to provide the initial cluster centres for the fuzzy c-means (FCM) algorithm. The hybrid approach was used to regionalise catchments in the Sefidroud basin based on feature vectors of five catchment attributes: longitude and latitude, drainage area, runoff coefficient, and mean annual precipitation. According to both the objective function values and the cluster validity indices, the performance of the FCM algorithm was often improved by the proposed hybrid approach. In terms of minimizing the objective function, the combination of Ward's algorithm and FCM gave the best results, whereas according to the cluster validity indices, other hybrid algorithms, such as combinations of single linkage or complete linkage with FCM, gave the most desirable results. In addition, all the examined hybrid algorithms identified two well-defined homogeneous regions in the Sefidroud basin.
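The hybrid approach above hands initial centres from another method to fuzzy c-means. The sketch below shows the standard FCM iterations (Bezdek's membership and centre updates with fuzzifier m = 2) started from supplied centres, and returns the objective J_m used to compare runs. It is a minimal illustration, not the authors' implementation: the initial centres are hypothetical stand-ins for the output of a Ward or SOM step, and the attribute values are invented.

```python
import math

def fuzzy_c_means(points, centres, m=2.0, iters=100, tol=1e-6):
    """Fuzzy c-means started from the given centres; returns (memberships U, centres, objective J_m)."""
    c = len(centres)
    centres = [tuple(v) for v in centres]
    dims = len(points[0])
    for _ in range(iters):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk) ** (2 / (m - 1)).
        U = []
        for p in points:
            d = [max(math.dist(p, v), 1e-12) for v in centres]  # guard against zero distance
            U.append([1.0 / sum((d[i] / d[j]) ** (2 / (m - 1)) for j in range(c)) for i in range(c)])
        # Centre update: mean of all points weighted by u_ik ** m.
        new_centres = []
        for i in range(c):
            w = [row[i] ** m for row in U]
            total = sum(w)
            new_centres.append(tuple(sum(wk * p[dim] for wk, p in zip(w, points)) / total
                                     for dim in range(dims)))
        shift = max(math.dist(a, b) for a, b in zip(centres, new_centres))
        centres = new_centres
        if shift < tol:
            break
    # Objective J_m = sum_k sum_i u_ik^m * ||x_k - v_i||^2, using the last memberships and final centres.
    J = sum(U[k][i] ** m * math.dist(p, centres[i]) ** 2 for k, p in enumerate(points) for i in range(c))
    return U, centres, J

# Hypothetical scaled catchment attributes; initial centres as if taken from a hierarchical step.
points = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
U, centres, J = fuzzy_c_means(points, [(0.5, 0.5), (9.5, 9.5)])
print([round(u, 3) for u in U[0]], centres, J)
```

Because FCM only finds a local minimum of J_m, the initial centres matter, which is the motivation for seeding it with a hierarchical or SOM result.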


A Comparison of K-Means and Mean Shift Algorithms

2021

DOI: 10.20944/preprints202108.0140.v1

Author(s): Mehak Nigar Shumaila

Keyword(s): Cluster Analysis; Data Analysis; Time Complexity; Clustering Algorithm; Clustering Algorithms; Mean Shift; Prediction Performance; Learning Problem; Cluster A; Formation Of Groups

Clustering, also known as cluster analysis, is a learning problem that is solved without human supervision. It is widely and effectively used in data analysis to observe and identify interesting, useful, or desired patterns in data. Clustering works by dividing the data into groups of similar objects based on the characteristics it identifies; each resulting group is called a cluster. Objects within a cluster are similar to one another and differ from the objects in other clusters. Clustering is important in many aspects of data analysis because it reveals the intrinsic grouping of the objects in a batch of unlabeled raw data, based on their attributes. There is no textbook criterion for a good clustering, because the process is highly customizable and what counts as good depends on each user's needs. For the same reason, there is no outright best clustering algorithm: the choice depends heavily on the user's scenario and requirements. This paper compares two clustering algorithms, k-means and mean shift, in terms of time complexity, training, prediction performance, and clustering accuracy.
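Unlike k-means, mean shift does not need the number of clusters in advance; it needs a bandwidth instead. The sketch below moves every point to the mean of its neighbours within the bandwidth until it settles on a density mode, then merges modes closer than half a bandwidth. The flat (uniform) kernel and the merge threshold are assumptions for illustration, not the configuration compared in the paper, and the data are hypothetical.

```python
import math

def mean_shift(points, bandwidth, iters=100, tol=1e-6):
    """Flat-kernel mean shift: shift each point to the mean of its neighbours until it reaches a mode."""
    modes = []
    for p in points:
        x = p
        for _ in range(iters):
            window = [q for q in points if math.dist(x, q) <= bandwidth]
            new_x = tuple(sum(col) / len(window) for col in zip(*window))
            if math.dist(x, new_x) < tol:
                break
            x = new_x
        modes.append(x)
    # Merge modes closer than half a bandwidth into a single cluster centre.
    centres, labels = [], []
    for x in modes:
        for idx, c in enumerate(centres):
            if math.dist(x, c) < bandwidth / 2:
                labels.append(idx)
                break
        else:
            centres.append(x)
            labels.append(len(centres) - 1)
    return labels, centres

points = [(0, 0), (0, 1), (1, 0), (1, 1), (8, 8), (8, 9), (9, 8), (9, 9)]
labels, centres = mean_shift(points, bandwidth=2.0)
print(labels, centres)
```

Here two clusters emerge from the bandwidth alone, whereas k-means would have to be told k = 2. The cost is time: this naive version compares every point with every other point on each shift, which is one reason the time-complexity comparison in the paper matters.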


Boundary Matching and Interior Connectivity-Based Cluster Validity Analysis

Applied Sciences, Vol 10 (4), pp. 1337, 2020

DOI: 10.3390/app10041337

Cited By ~ 2

Author(s): Qi Li, Shihong Yue, Yaru Wang, Mingliang Ding, Jia Li, ...

Keyword(s): Clustering Analysis; Clustering Algorithm; Clustering Algorithms; Experimental Results; Cluster Validity; Validity Index; Boundary Points; Validity Indices; Boundary Matching; Interior Points

The evaluation of clustering results plays an important role in clustering analysis. However, in practice, existing validity indices are limited to specific clustering algorithms, clustering parameters, and assumptions. In this paper, we propose a novel validity index that addresses these problems using two complementary measures: boundary point matching and interior point connectivity. First, after any clustering algorithm has been run on a dataset, we extract all boundary points of the dataset and of its partitioned clusters using a nonparametric metric, and compute the boundary point matching measure. Second, we measure the interior point connectivity of both the dataset and all the partitioned clusters. The proposed validity index can evaluate clustering results obtained on the same dataset by different clustering algorithms, which existing validity indices cannot evaluate at all. Experimental results demonstrate that the proposed index can evaluate clustering results obtained with an arbitrary clustering algorithm and find the optimal clustering parameters.
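The abstract does not specify the nonparametric metric used to extract boundary points. As a stand-in, the sketch below uses a common heuristic: points that appear in few other points' k-nearest-neighbour lists (low reverse-kNN counts) tend to lie on the edge of the data. The function names, the choice k = 2, and the 20% cut-off are hypothetical and chosen for illustration; the boundary matching and interior connectivity measures themselves are not reproduced.

```python
import math

def reverse_knn_counts(points, k):
    """For each point, count how many other points list it among their k nearest neighbours."""
    counts = [0] * len(points)
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        for j in others[:k]:
            counts[j] += 1
    return counts

def boundary_points(points, k, fraction=0.2):
    """Flag the given fraction of points with the lowest reverse-kNN counts as boundary points."""
    counts = reverse_knn_counts(points, k)
    n_boundary = max(1, round(fraction * len(points)))
    order = sorted(range(len(points)), key=lambda i: counts[i])
    return sorted(order[:n_boundary])

# Hypothetical 1-D data embedded in 2-D: the two ends of the line should be flagged.
points = [(float(x), 0.0) for x in range(10)]
counts = reverse_knn_counts(points, 2)
boundary = boundary_points(points, 2)
print(counts, boundary)
```

Running the same detector on the whole dataset and on each cluster separately gives two boundary sets that can then be compared, which is the general idea behind a boundary-matching measure.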


Enhancing Machine Learning Aptitude Using Significant Cluster Identification for Augmented Image Refining

International Journal of Pattern Recognition and Artificial Intelligence, Vol 34 (09), pp. 2051009, 2019

DOI: 10.1142/s021800142051009x

Author(s): Dhanalakshmi Samiappan, S. Latha, T. Rama Rao, Deepak Verma, CSA Sriharsha

Keyword(s): Machine Learning; Clustering Algorithm; Clustering Algorithms; Speckle Noise; Computational Time; Significant Cluster; Edge Preservation; Cluster Identification; Base Method; Window Selection

Enhancing an image by removing noise while preserving its useful features and edges is one of the most important tasks in image analysis. In this paper, we propose Significant Cluster Identification for Maximum Edge Preservation (SCI-MEP), which works in parallel with clustering algorithms and improves the efficiency of the machine learning aptitude. Affinity propagation (AP) is used as the base method to obtain clusters from a learnt dictionary, with adaptive window selection; these clusters are then refined using SCI-MEP to preserve the semantic components of the image. Because only the significant clusters are processed, the computational time is drastically reduced. The flexibility of SCI-MEP allows it to be integrated with any clustering algorithm to improve its efficiency. The method was tested and verified on removing Gaussian noise, rain noise, and speckle noise from images. Our results show that SCI-MEP considerably improves on the existing algorithms in terms of performance evaluation metrics.
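Affinity propagation, the base clustering method above, chooses exemplars by passing "responsibility" and "availability" messages between data points. The sketch below follows Frey and Dueck's update rules with damping on negative squared Euclidean similarities. It is a plain, O(n^3)-per-iteration illustration on hypothetical 2-D points, not the dictionary-based setup or the SCI-MEP refinement described in the paper. The preference (self-similarity) controls how many exemplars emerge; when none is given, the sketch uses roughly the median similarity.

```python
import math

def affinity_propagation(points, preference=None, damping=0.5, iters=200):
    """Affinity propagation on negative squared Euclidean similarities; returns (exemplars, labels)."""
    n = len(points)
    S = [[-math.dist(p, q) ** 2 for q in points] for p in points]
    if preference is None:
        off = sorted(S[i][k] for i in range(n) for k in range(n) if i != k)
        preference = off[len(off) // 2]  # roughly the median similarity
    for i in range(n):
        S[i][i] = preference  # how strongly each point wants to be an exemplar
    R = [[0.0] * n for _ in range(n)]
    A = [[0.0] * n for _ in range(n)]
    for _ in range(iters):
        # Responsibility: r(i,k) = s(i,k) - max over k' != k of [a(i,k') + s(i,k')].
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(vals[kk] for kk in range(n) if kk != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best_other)
        # Availability: a(k,k) = sum_{i' != k} max(0, r(i',k));
        # a(i,k) = min(0, r(k,k) + sum_{i' not in {i,k}} max(0, r(i',k))) for i != k.
        for k in range(n):
            pos = [max(0.0, R[ip][k]) for ip in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:  # fall back to the single strongest candidate
        exemplars = [max(range(n), key=lambda k: R[k][k] + A[k][k])]
    # Each non-exemplar joins its most similar exemplar; labels are exemplar indices.
    labels = [i if i in exemplars else max(exemplars, key=lambda k: S[i][k]) for i in range(n)]
    return exemplars, labels

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
exemplars, labels = affinity_propagation(points, preference=-10.0)
print(exemplars, labels)
```

Like mean shift, affinity propagation does not take the number of clusters as input; raising the preference toward zero yields more exemplars, lowering it yields fewer.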


Diesel engine injection faults' detection and classification utilizing unsupervised fuzzy clustering techniques

Proceedings of the Institution of Mechanical Engineers Part C Journal of Mechanical Engineering Science, Vol 233 (16), pp. 5622-5636, 2019

DOI: 10.1177/0954406219849089

Author(s): Ezzeddine Ftoutou, Mnaouar Chouchane

Keyword(s): Diesel Engine; Fuzzy Clustering; Clustering Algorithms; Variable Number; High Detection Rate; Time Frequency; Cluster Validity Indices; Validity Indices; Number Of Classes; Unsupervised Fuzzy Clustering

Using unsupervised fuzzy clustering, this study attempts to design a new scheme for the unsupervised detection and classification of two injection faults, based on time–frequency analysis of the vibration signals of a four-stroke, six-cylinder in-line internal combustion diesel engine. To this end, two new methods, a modified S-transform and two-dimensional non-negative matrix factorization, are used. Three fuzzy clustering algorithms and nine cluster validity indices, evaluated over a variable number of classes, are also used to detect and classify the fault classes. Implementing these methods resulted in a high detection rate for the injection faults.
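The abstract does not name its nine cluster validity indices. Two classic indices for fuzzy partitions are Bezdek's partition coefficient (PC, closer to 1 means crisper memberships) and partition entropy (PE, closer to 0 means crisper). The sketch below computes both from a membership matrix U with one row per sample and one column per class. Whether these two are among the nine indices used is an assumption, and the matrices are illustrative.

```python
import math

def partition_coefficient(U):
    """PC = (1/n) * sum_k sum_i u_ik^2; ranges from 1/c (fully fuzzy) to 1 (crisp)."""
    return sum(u * u for row in U for u in row) / len(U)

def partition_entropy(U):
    """PE = -(1/n) * sum_k sum_i u_ik * log(u_ik); ranges from 0 (crisp) to log(c) (fully fuzzy)."""
    return -sum(u * math.log(u) for row in U for u in row if u > 0) / len(U)

crisp = [[1, 0], [0, 1], [1, 0]]            # every sample fully in one class
fuzzy = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]  # every sample split evenly
for name, U in (("crisp", crisp), ("fuzzy", fuzzy)):
    print(name, partition_coefficient(U), partition_entropy(U))
```

When such indices are evaluated over a range of class counts, the count with the highest PC or lowest PE is a common pick, although both tend to favour small numbers of clusters.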
