An Effective Collaborative User Model Using Hybrid Clustering Recommendation Methods

2021 ◽  
Vol 26 (2) ◽  
pp. 151-158
Author(s):  
Maryam Khanian Najafabadi ◽  
Azlinah Mohamed ◽  
Madhavan A/L Balan Nair ◽  
Sayed Mojtaba Tabibian

Collaborative Filtering (CF) is widely regarded as the most successful recommendation technique: recommendations are made from the past rating records of like-minded users. However, significant growth in the numbers of users and items has hurt the efficiency of CF and raised key issues of computation and recommendation quality, such as high dimensionality and data sparsity. In this study, a hybrid method is proposed that addresses these problems through a neighborhood-selection process for each user built on two clustering algorithms: item-based k-means clustering and user-based fuzzy clustering. Item-based k-means clustering was chosen for its low computational cost, which addresses the high-dimensionality issue. To create user groups and find correlations between users, we employed user-based fuzzy clustering, which had not previously been used in user-based CF; it computes each user's degree of membership in the sets of clustered items. Furthermore, a new similarity metric was designed that computes the similarity between users by incorporating the output of the user-based fuzzy clustering. This metric is an alternative to the basic similarity metrics in CF and was shown to provide high-quality recommendations and a noticeable improvement in recommendation accuracy. The proposed method was evaluated on two benchmark datasets, MovieLens and LastFM, in comparison with existing recommendation methods.
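The abstract does not give the metric's exact form. As a rough sketch, one plausible reading is a cosine similarity over co-rated items scaled by the overlap of the two users' fuzzy cluster memberships; the function name and the overlap term below are assumptions, not the paper's formula:

```python
import numpy as np

def fuzzy_weighted_similarity(ratings, memberships, u, v, eps=1e-9):
    """Cosine similarity over co-rated items, scaled by the overlap of the
    two users' fuzzy cluster membership vectors (hypothetical formulation;
    the paper's exact metric is not given in the abstract)."""
    mask = (ratings[u] > 0) & (ratings[v] > 0)          # co-rated items only
    if not mask.any():
        return 0.0
    ru, rv = ratings[u][mask], ratings[v][mask]
    cos = ru @ rv / (np.linalg.norm(ru) * np.linalg.norm(rv) + eps)
    # membership overlap in [0, 1]: identical membership vectors give 1
    overlap = np.minimum(memberships[u], memberships[v]).sum()
    return cos * overlap
```

Users with identical ratings but disjoint fuzzy memberships would score 0 under this sketch, which is the intended effect of letting the clustering shape the neighborhood.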

2018 ◽  
Vol 2018 ◽  
pp. 1-8 ◽  
Author(s):  
Israa Abdzaid Atiyah ◽  
Adel Mohammadpour ◽  
S. Mahmoud Taheri

A novel hybrid clustering method, named KC-Means clustering, is proposed to improve upon the clustering time of the Fuzzy C-Means algorithm. The proposed method combines the K-Means and Fuzzy C-Means algorithms in two stages. In the first stage, the K-Means algorithm is applied to the dataset to find the centers of a fixed number of groups. In the second stage, the Fuzzy C-Means algorithm is applied to the centers obtained in the first stage. Comparisons are then made between the proposed and other algorithms in terms of processing time and accuracy. In addition, the clustering algorithms are applied to several benchmark datasets to verify their performance. Finally, a class of Minkowski distances is used to determine the influence of the distance measure on clustering performance.
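The two-stage pipeline can be sketched directly from the abstract. The minimal K-Means and Fuzzy C-Means implementations below are generic stand-ins, not the authors' code:

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's K-Means; returns the k cluster centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == j].mean(0) if (labels == j).any()
                            else centers[j] for j in range(k)])
    return centers

def fuzzy_c_means(X, c, m=2.0, iters=100, seed=0, eps=1e-9):
    """Standard FCM; returns the membership matrix U and c centers."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(1, keepdims=True)
    for _ in range(iters):
        Um = U ** m
        centers = (Um.T @ X) / (Um.sum(0)[:, None] + eps)
        d = np.sqrt(((X[:, None] - centers) ** 2).sum(-1)) + eps
        # U_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))
        U = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2 / (m - 1))).sum(2)
    return U, centers

def kc_means(X, n_groups, c, **kw):
    """Stage 1: K-Means on the full dataset to get group centers.
    Stage 2: FCM on those centers only, which is where the speedup
    over running FCM on all points comes from."""
    centers = kmeans(X, n_groups)
    return fuzzy_c_means(centers, c, **kw)
```

Because stage 2 operates on `n_groups` centers rather than all data points, its per-iteration cost no longer depends on the dataset size.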


Author(s):  
Nibras Othman Abdul Wahid ◽  
Saif Aamer Fadhil ◽  
Noor Abbood Jasim

Unsupervised data clustering is one of the most useful and descriptive tasks in data mining: it seeks to identify homogeneous groups of objects based on similarity and is used in many applications. Clustering is a key problem in data mining that has attracted much attention. One of the best-known clustering algorithms is K-means, which has been applied successfully to many problems. To improve the quality of K-means, researchers have hybridized it with optimization algorithms. In this paper, the Lion Optimization Algorithm (LOA), a nature-inspired heuristic, and a Genetic Algorithm (GA) were adapted for K-means data clustering by tuning the main parameters of LOA. The unusual lifestyle of lions and their cooperative behavior were the primary inspiration for the development of this optimization algorithm. The GA is used when clusters need to be reallocated, via the genetic operators crossover and mutation. The experimental results reflect the ability of this approach in clustering analysis on a number of benchmark datasets from the UCI Machine Learning Repository.
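The GA half of the hybrid can be illustrated with a toy chromosome-per-assignment scheme. Everything here (fitness as within-cluster SSE, one-point crossover, random-reassignment mutation, the parameter values) is a generic sketch, and the LOA component is omitted entirely:

```python
import numpy as np

def sse(X, labels, k):
    """Within-cluster sum of squared errors (lower is fitter)."""
    return sum(((X[labels == j] - X[labels == j].mean(0)) ** 2).sum()
               for j in range(k) if (labels == j).any())

def ga_cluster(X, k, pop=20, gens=40, pm=0.05, seed=0):
    """Toy GA clustering: each chromosome assigns every point to a
    cluster; one-point crossover and random-reassignment mutation
    evolve the population, with the fitter half kept each generation."""
    rng = np.random.default_rng(seed)
    P = rng.integers(0, k, size=(pop, len(X)))
    for _ in range(gens):
        fit = np.array([sse(X, ind, k) for ind in P])
        P = P[fit.argsort()]                       # elitist sort, best first
        children = []
        while len(children) < pop // 2:
            a, b = P[rng.integers(0, pop // 2, 2)]  # parents from fitter half
            cut = rng.integers(1, len(X))
            child = np.concatenate([a[:cut], b[cut:]])  # one-point crossover
            mut = rng.random(len(X)) < pm               # mutation: reassign
            child[mut] = rng.integers(0, k, mut.sum())
            children.append(child)
        P = np.vstack([P[:pop - len(children)], children])
    fit = np.array([sse(X, ind, k) for ind in P])
    return P[fit.argmin()]
```
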


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Narjes Rohani ◽  
Changiz Eslahchi

Drug-Drug Interaction (DDI) prediction is one of the most critical issues in drug development and health. Proposing appropriate computational methods for predicting unknown DDIs with high precision is challenging. We propose NDD, a neural network-based method for drug-drug interaction prediction, which predicts unknown DDIs using various information about drugs. Multiple drug similarities based on drug substructure, target, side effect, off-label side effect, pathway, transporter, and indication data are calculated. NDD first uses a heuristic similarity selection process and then integrates the selected similarities with a nonlinear similarity fusion method to obtain high-level features. Afterward, it uses a neural network for interaction prediction. The similarity selection and similarity integration parts of NDD were proposed in previous studies of other problems; our novelty is to combine these parts with a new neural network architecture and apply them in the context of DDI prediction. We compared NDD with six machine learning classifiers and six state-of-the-art graph-based methods on three benchmark datasets. NDD achieved superior performance in cross-validation, with AUPR ranging from 0.830 to 0.947, AUC from 0.954 to 0.994, and F-measure from 0.772 to 0.902. Moreover, cumulative evidence from case studies on numerous drug pairs further confirms the ability of NDD to predict unknown DDIs. The evaluations corroborate that NDD is an efficient method for predicting unknown DDIs. The data and implementation of NDD are available at https://github.com/nrohani/NDD.
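NDD's selection and fusion steps are only named in the abstract. As an illustrative stand-in, the snippet below drops near-duplicate similarity matrices by a correlation threshold and fuses the rest by simple averaging; NDD itself uses a heuristic selection process and a nonlinear fusion method, so both functions here are assumptions:

```python
import numpy as np

def select_similarities(sims, max_corr=0.95):
    """Greedy heuristic: keep a similarity matrix only if its flattened
    entries correlate below `max_corr` with every matrix already kept.
    Schematic stand-in for NDD's similarity-selection step."""
    kept = []
    for S in sims:
        v = S.ravel()
        if all(abs(np.corrcoef(v, K.ravel())[0, 1]) < max_corr for K in kept):
            kept.append(S)
    return kept

def fuse(sims):
    """Element-wise average fusion (NDD uses a nonlinear fusion instead)."""
    return np.mean(sims, axis=0)
```

The fused matrix would then supply pairwise features for the downstream interaction classifier.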


Author(s):  
Givanna H Putri ◽  
Irena Koprinska ◽  
Thomas M Ashhurst ◽  
Nicholas J C King ◽  
Mark N Read

Motivation: Many ‘automated gating’ algorithms now exist to cluster cytometry and single-cell sequencing data into discrete populations. Comparative algorithm evaluations on benchmark datasets rely either on a single performance metric, or a few metrics considered independently of one another. However, single metrics emphasize different aspects of clustering performance and do not rank clustering solutions in the same order. This underlies the lack of consensus between comparative studies regarding optimal clustering algorithms and undermines the translatability of results onto other non-benchmark datasets.
Results: We propose the Pareto fronts framework as an integrative evaluation protocol, wherein individual metrics are instead leveraged as complementary perspectives. Judged superior are algorithms that provide the best trade-off between the multiple metrics considered simultaneously. This yields a more comprehensive and complete view of clustering performance. Moreover, by broadly and systematically sampling algorithm parameter values using the Latin Hypercube sampling method, our evaluation protocol minimizes (un)fortunate parameter value selections as confounding factors. Furthermore, it reveals how meticulously each algorithm must be tuned in order to obtain good results, vital knowledge for users with novel data. We exemplify the protocol by conducting a comparative study between three clustering algorithms (ChronoClust, FlowSOM and Phenograph) using four common performance metrics applied across four cytometry benchmark datasets. To our knowledge, this is the first time Pareto fronts have been used to evaluate the performance of clustering algorithms in any application domain.
Availability and implementation: Implementation of our Pareto front methodology and all scripts and datasets to reproduce this article are available at https://github.com/ghar1821/ParetoBench.
Supplementary information: Supplementary data are available at Bioinformatics online.
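The core idea, keeping every configuration that no other configuration beats on all metrics at once, is easy to state in code. This is a generic Pareto-front routine (each point being one algorithm/parameter configuration scored by several metrics, higher = better), not the authors' ParetoBench implementation:

```python
def pareto_front(points):
    """Return the non-dominated points among tuples of metric values.
    A point q dominates p if q is at least as good on every metric
    and strictly better on at least one."""
    def dominates(a, b):
        return (all(x >= y for x, y in zip(a, b))
                and any(x > y for x, y in zip(a, b)))
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

A configuration that tops one metric but trails badly on another can still sit on the front, which is exactly the trade-off view the protocol argues for.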


1995 ◽  
Vol 05 (02) ◽  
pp. 239-259
Author(s):  
SU HWAN KIM ◽  
SEON WOOK KIM ◽  
TAE WON RHEE

For data analysis, it is very important to combine data with similar attribute values into a categorically homogeneous subset, called a cluster; this technique is called clustering. Generally, crisp clustering algorithms are sensitive to noise, because each datum must be assigned to exactly one cluster. To address this problem, the fuzzy c-means, fuzzy maximum likelihood estimation, and optimal fuzzy clustering algorithms have been proposed within fuzzy set theory. They, however, require a lot of processing time because of exhaustive iteration over the data and their memberships. In particular, a large memory footprint degrades performance in real-time processing applications, because swapping between main memory and secondary memory takes too long. To overcome these limitations, an extended fuzzy clustering algorithm based on the unsupervised optimal fuzzy clustering algorithm is proposed in this paper. The algorithm assigns a weight factor to each distinct datum according to its occurrence rate. It also considers the degree of importance of each attribute, which determines the characteristics of the data. The worst case is when the whole dataset follows a uniform normal distribution, meaning all attributes are equally important. The proposed extended fuzzy clustering algorithm outperforms the unsupervised optimal fuzzy clustering algorithm in terms of memory space and execution time in most cases. For simulation, the proposed algorithm is applied to color image segmentation; automatic target detection and multipeak detection are also considered as applications. These schemes can be applied to any other fuzzy clustering algorithm.
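A sketch of the extension described above, assuming (since the abstract gives no update rules) that the occurrence-rate weight enters the center update and the attribute importances scale the distance; both choices are this sketch's, not the paper's:

```python
import numpy as np

def weighted_fcm(X, w, attr_imp, c, m=2.0, iters=100, seed=0, eps=1e-9):
    """Fuzzy C-Means over *distinct* data, where w[i] is the occurrence
    count of datum i (so duplicates are stored once) and attr_imp scales
    each attribute's contribution to the distance."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(1, keepdims=True)
    for _ in range(iters):
        Wm = w[:, None] * U ** m                  # occurrence-weighted memberships
        centers = (Wm.T @ X) / (Wm.sum(0)[:, None] + eps)
        diff = (X[:, None] - centers) * np.sqrt(attr_imp)  # attribute importance
        d = np.sqrt((diff ** 2).sum(-1)) + eps
        U = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2 / (m - 1))).sum(2)
    return U, centers
```

Storing each distinct datum once with its count is where the memory and time savings over plain FCM come from.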


2011 ◽  
Vol 211-212 ◽  
pp. 793-797
Author(s):  
Chin Chun Chen ◽  
Yuan Horng Lin ◽  
Jeng Ming Yih ◽  
Sue Fen Huang

Interpretive structural modeling is applied to construct a knowledge structure of linear algebra. A new fuzzy clustering algorithm, an improved fuzzy c-means based on the Mahalanobis distance, performs better than the standard fuzzy c-means algorithm, and each cluster of data can readily describe the features of its knowledge structure individually. The results show six clusters, each with its own cognitive characteristics. The methodology makes knowledge management in the classroom more feasible.
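The substituted distance can be shown in isolation. This computes squared Mahalanobis distances from each point to one cluster center, with the covariance matrix supplied by the caller; how the paper estimates it per cluster is not stated in the abstract:

```python
import numpy as np

def mahalanobis_fcm_distance(X, center, cov):
    """Per-point squared Mahalanobis distance (x - c)^T cov^-1 (x - c),
    the distance an improved-FCM variant substitutes for Euclidean so
    that elongated, correlated clusters are handled properly."""
    inv = np.linalg.inv(cov)
    diff = X - center
    return np.einsum('ij,jk,ik->i', diff, inv, diff)
```

With an identity covariance this reduces to the ordinary squared Euclidean distance, so plain FCM is the special case.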


2020 ◽  
Vol 8 (6) ◽  
pp. 5669-5672

In this paper, we use a deep learning technique for dual-shot face detection, i.e. detecting multiple faces in an image. As data grows daily in volume and dimensionality, recognizing multiple faces is a major problem, and time spent on manual image inspection is wasted. To save time and maximize accuracy, we implemented a fast preprocessing method based on a Convolutional Neural Network (CNN) together with a feature extraction technique that isolates the features relevant to detecting and identifying images/faces. With this robust method, our aim is to detect dual faces efficiently; the technique reduces feature cardinality while preserving the distinctive properties of the data. The experiments were performed on the widely used face detection benchmark datasets WIDER FACE and FDDB. CNN with feature extraction demonstrates superior results, and its accuracy was analyzed in depth with the CNN classifier.
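As a schematic of the CNN feature extraction step (not the paper's dual-shot detector), one convolution/ReLU/max-pool stage in plain NumPy:

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2-D cross-correlation, the basic CNN building block."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i:i + kh, j:j + kw] * kernel).sum()
    return out

def feature_map(img, kernel):
    """One conv -> ReLU -> 2x2 max-pool stage: this is how a CNN reduces
    feature cardinality while keeping the discriminative responses."""
    a = np.maximum(conv2d(img, kernel), 0.0)           # ReLU
    h, w = a.shape[0] // 2 * 2, a.shape[1] // 2 * 2    # crop to even size
    return a[:h, :w].reshape(h // 2, 2, w // 2, 2).max((1, 3))
```

A horizontal-gradient kernel, for example, fires only at vertical edges, and the pooling keeps just the strongest response per 2x2 region.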


2020 ◽  
Author(s):  
Aristidis G. Vrahatis ◽  
Sotiris Tasoulis ◽  
Spiros Georgakopoulos ◽  
Vassilis Plagianakos

Nowadays biomedical data are generated exponentially, creating datasets of ultra-high dimensionality and complexity for analysis. This revolution, driven by recent advances in biotechnologies, has led to big-data and data-driven computational approaches. An indicative example is the emerging single-cell RNA-sequencing (scRNA-seq) technology, which isolates and measures individual cells. Although scRNA-seq has revolutionized the biotechnology domain, the computational analysis of such data is a major challenge because of their ultra-high dimensionality and complexity. In this direction, we study the properties, effectiveness and generalization of the recently proposed MRPV algorithm for single-cell RNA-seq data. MRPV is an ensemble classification technique utilizing multiple ultra-low-dimensional randomly projected spaces. A given classifier determines the class of each sample in every independent space, and a majority voting scheme defines the predominant class. We show that random projection ensembles offer a platform not only for low computational time but also for enhanced classification performance. The developed methodologies were applied to four real high-dimensional biomedical datasets from single-cell RNA-seq studies and compared against well-known, similar classification tools. Experimental results showed that, from simple building blocks, we can create a computationally fast, simple, yet effective approach for single-cell RNA-seq data of ultra-high dimensionality.
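The MRPV scheme, many ultra-low-dimensional random projections with a majority vote across them, can be sketched as follows. The nearest-centroid base classifier is an assumption standing in for whatever classifier the authors plug in:

```python
import numpy as np

def rp_ensemble_predict(Xtr, ytr, Xte, n_spaces=10, dim=5, seed=0):
    """Majority vote over classifiers trained in independent random
    ultra-low-dimensional projections (a sketch of the MRPV idea)."""
    rng = np.random.default_rng(seed)
    classes = np.unique(ytr)
    votes = np.zeros((len(Xte), len(classes)), dtype=int)
    for _ in range(n_spaces):
        R = rng.normal(size=(Xtr.shape[1], dim)) / np.sqrt(dim)  # random projection
        Ptr, Pte = Xtr @ R, Xte @ R
        # base learner: nearest centroid in the projected space
        centroids = np.stack([Ptr[ytr == c].mean(0) for c in classes])
        d = ((Pte[:, None] - centroids) ** 2).sum(-1)
        votes[np.arange(len(Xte)), d.argmin(1)] += 1
    return classes[votes.argmax(1)]
```

Each projection is cheap because `dim` is tiny relative to the gene dimension, and the vote smooths over individually unlucky projections.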

