Enhancement of Sales promotion using Clustering Techniques in Data Mart

Clustering is an important research topic in wide range of unsupervised classification application. Clustering is a technique, which divides a data into meaningful groups. K-means algorithm is one of the popular clustering algorithms. It belongs to partition based grouping techniques, which are based on the iterative relocation of data points between clusters. It does not support global clustering and it has linear time complexity of O(n2). The existing and conventional data clustering algorithms were nâ€™t designed to handle the huge amount of data. So, to overcome these issues Golay code clustering algorithm is selected. Golay code based system used to facilitate the identification of the set of codeword incarnate similar object behaviors. The time complexity associated with Golay code-clustering algorithm is O(n). In this work, the collected sales data is pre processed by removing all null and empty attributes, then eliminating redundant, and noise data. To enhance the sales promotion, K-means and Golay code clustering algorithms are used to cluster the sales data in terms of place and item. Performances of these algorithms are analyzed in terms of accuracy and execution time. Our results show that the Golay code algorithm outperforms than K-mean algorithm in all factors.

Download Full-text

An improved OPTICS clustering algorithm for discovering clusters with uneven densities

Intelligent Data Analysis ◽

10.3233/ida-205497 ◽

2021 ◽

Vol 25 (6) ◽

pp. 1453-1471

Author(s):

Chunhua Tang ◽

Han Wang ◽

Zhiwen Wang ◽

Xiangkun Zeng ◽

Huaran Yan ◽

...

Keyword(s):

Time Complexity ◽

Clustering Algorithm ◽

Nearest Neighbor ◽

Clustering Algorithms ◽

Substantial Improvement ◽

Experimental Results ◽

High Time ◽

Parameter Setting ◽

K Nearest Neighbor ◽

Density Based Clustering

Most density-based clustering algorithms have the problems of difficult parameter setting, high time complexity, poor noise recognition, and weak clustering for datasets with uneven density. To solve these problems, this paper proposes FOP-OPTICS algorithm (Finding of the Ordering Peaks Based on OPTICS), which is a substantial improvement of OPTICS (Ordering Points To Identify the Clustering Structure). The proposed algorithm finds the demarcation point (DP) from the Augmented Cluster-Ordering generated by OPTICS and uses the reachability-distance of DP as the radius of neighborhood eps of its corresponding cluster. It overcomes the weakness of most algorithms in clustering datasets with uneven densities. By computing the distance of the k-nearest neighbor of each point, it reduces the time complexity of OPTICS; by calculating density-mutation points within the clusters, it can efficiently recognize noise. The experimental results show that FOP-OPTICS has the lowest time complexity, and outperforms other algorithms in parameter setting and noise recognition.

Download Full-text

The fast clustering algorithm for the big data based on K-means

International Journal of Wavelets Multiresolution and Information Processing ◽

10.1142/s0219691320500538 ◽

2020 ◽

Vol 18 (06) ◽

pp. 2050053

Author(s):

Ting Xie ◽

Taiping Zhang

Keyword(s):

Big Data ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Feature Space ◽

Data Sets ◽

Benchmark Data ◽

Clustering Model ◽

Alternating Direction ◽

Learning Technique ◽

Noise Data

As a powerful unsupervised learning technique, clustering is the fundamental task of big data analysis. However, many traditional clustering algorithms for big data that is a collection of high dimension, sparse and noise data do not perform well both in terms of computational efficiency and clustering accuracy. To alleviate these problems, this paper presents Feature K-means clustering model on the feature space of big data and introduces its fast algorithm based on Alternating Direction Multiplier Method (ADMM). We show the equivalence of the Feature K-means model in the original space and the feature space and prove the convergence of its iterative algorithm. Computationally, we compare the Feature K-means with Spherical K-means and Kernel K-means on several benchmark data sets, including artificial data and four face databases. Experiments show that the proposed approach is comparable to the state-of-the-art algorithm in big data clustering.

Download Full-text

A Comparison of K-Means and Mean Shift Algorithms

10.20944/preprints202108.0140.v1 ◽

2021 ◽

Author(s):

Mehak Nigar Shumaila

Keyword(s):

Cluster Analysis ◽

Data Analysis ◽

Time Complexity ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Mean Shift ◽

Prediction Performance ◽

Learning Problem ◽

Cluster A ◽

Formation Of Groups

Clustering, or otherwise known as cluster analysis, is a learning problem that takes place without any human supervision. This technique has often been utilized, much efficiently, in data analysis, and serves for observing and identifying interesting, useful, or desired patterns in the said data. The clustering technique functions by performing a structured division of the data involved, in similar objects based on the characteristics that it identifies. This process results in the formation of groups, and each group that is formed, is called a cluster. A single said cluster consists of objects from the data, that have similarities among other objects found in the same cluster, and resemble differences when compared to objects identified from the data that now exist in other clusters. The process of clustering is very significant in various aspects of data analysis, as it determines and presents the intrinsic grouping of objects present in the data, based on their attributes, in a batch of unlabeled raw data. A textbook or otherwise said, good criteria, does not exist in this method of cluster analysis. That is because this process is so different and so customizable for every user, that needs it in his/her various and different needs. There is no outright best clustering algorithm, as it massively depends on the user’s scenario and needs. This paper is intended to compare and study two different clustering algorithms. The algorithms under investigation are k-mean and mean shift. These algorithms are compared according to the following factors: time complexity, training, prediction performance and accuracy of the clustering algorithms.

Download Full-text

Adaptive Initialization Method for K-Means Algorithm

Frontiers in Artificial Intelligence ◽

10.3389/frai.2021.740817 ◽

2021 ◽

Vol 4 ◽

Author(s):

Jie Yang ◽

Yu-Kai Wang ◽

Xin Yao ◽

Chin-Teng Lin

Keyword(s):

Time Complexity ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Real Life ◽

Superior Performance ◽

Local Optima ◽

Initial Cluster ◽

Higher Dimensional ◽

Real World Datasets ◽

Random Method

The K-means algorithm is a widely used clustering algorithm that offers simplicity and efficiency. However, the traditional K-means algorithm uses a random method to determine the initial cluster centers, which make clustering results prone to local optima and then result in worse clustering performance. In this research, we propose an adaptive initialization method for the K-means algorithm (AIMK) which can adapt to the various characteristics in different datasets and obtain better clustering performance with stable results. For larger or higher-dimensional datasets, we even leverage random sampling in AIMK (name as AIMK-RS) to reduce the time complexity. 22 real-world datasets were applied for performance comparisons. The experimental results show AIMK and AIMK-RS outperform the current initialization methods and several well-known clustering algorithms. Specifically, AIMK-RS can significantly reduce the time complexity to O (n). Moreover, we exploit AIMK to initialize K-medoids and spectral clustering, and better performance is also explored. The above results demonstrate superior performance and good scalability by AIMK or AIMK-RS. In the future, we would like to apply AIMK to more partition-based clustering algorithms to solve real-life practical problems.

Download Full-text

COMBINING FUZZY PROBABILITY AND FUZZY CLUSTERING FOR MULTISPECTRAL SATELLITE IMAGERY CLASSIFICATION

Vietnam Journal of Science and Technology ◽

10.15625/0866-708x/54/3/6463 ◽

2016 ◽

Vol 54 (3) ◽

pp. 300 ◽

Cited By ~ 2

Author(s):

Mai Dinh Sinh ◽

Le Hung Trinh ◽

Ngo Thanh Long

Keyword(s):

Fuzzy Clustering ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Satellite Image ◽

Great Influence ◽

Unsupervised Classification ◽

Classification Algorithms ◽

Fuzzy Probability ◽

Multispectral Satellite Images ◽

The Stability

This paper proposes a method of combining fuzzy probability and fuzzy clustering algorithm to classify on multispectral satellite images by relying on fuzzy probability to calculate the number of clusters and the centroid of clusters then using fuzzy clustering to classifying land-cover on the satellite image. In fact, the classification algorithms, the initialization of the clusters and the initial centroid of clusters have great influence on the stability of the algorithms, dealing time and classification results; the unsupervised classification algorithms such as k-Means, c-Means, Iso-data are used quite common for many problems, but the disadvantages is the low accuracy and unstable, especially when dealing with the problems on the satellite image. Results of the algorithm which are proposed show significant reduction of noise in the clusters and comparison with various clustering algorithms like k-means, iso-data, so on.

Download Full-text

On Hierarchical Linguistic-Based Clustering

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2015.p0900 ◽

2015 ◽

Vol 19 (6) ◽

pp. 900-906 ◽

Cited By ~ 1

Author(s):

Naohiko Kinoshita ◽

◽

Yasunori Endo ◽

Akira Sugawara ◽

◽

...

Keyword(s):

Data Structure ◽

Data Structures ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Fuzzy Reasoning ◽

Unsupervised Classification ◽

Clustering Techniques ◽

Model Based Clustering ◽

Model Based ◽

Soft Computing Techniques

Clustering is representative unsupervised classification. Many researchers have proposed clustering algorithms based on mathematical models – methods we call model-based clustering. Clustering techniques are very useful for determining data structures, but model-based clustering is difficult to use for analyzing data correctly because we cannot select a suitable method unless we know the data structure at least partially. The new clustering algorithm we propose introduces soft computing techniques such as fuzzy reasoning in what we call linguistic-based clustering, whose features are not incident to the data structure. We verify the method’s effectiveness through numerical examples.

Download Full-text

Hard c-Means Using Quadratic Penalty-Vector Regularization for Uncertain Data

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2012.p0831 ◽

2012 ◽

Vol 16 (7) ◽

pp. 831-840 ◽

Cited By ~ 1

Author(s):

Yasunori Endo ◽

◽

Arisa Taniguchi ◽

Yukihiro Hamasuna ◽

◽

...

Keyword(s):

Missing Values ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Uncertain Data ◽

Unsupervised Classification ◽

Real Space ◽

Clustering Methods ◽

Cluster Number ◽

Numerical Examples ◽

Classification Technique

Clustering is an unsupervised classification technique for data analysis. In general, each datum in real space is transformed into a point in a pattern space to apply clustering methods. Data cannot often be represented by a point, however, because of its uncertainty, e.g., measurement error margin and missing values in data. In this paper, we will introduce quadratic penalty-vector regularization to handle such uncertain data using Hard c-Means (HCM), which is one of the most typical clustering algorithms. We first propose a new clustering algorithm called hard c-means using quadratic penalty-vector regularization for uncertain data (HCMP). Second, we propose sequential extraction hard c-means using quadratic penalty-vector regularization (SHCMP) to handle datasets whose cluster number is unknown. Furthermore, we verify the effectiveness of our proposed algorithms through numerical examples.

Download Full-text

Linear Time Complexity Time Series Clustering with Symbolic Pattern Forest

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2019/406 ◽

2019 ◽

Author(s):

Xiaosheng Li ◽

Jessica Lin ◽

Liang Zhao

Keyword(s):

Time Series ◽

Data Storage ◽

Time Complexity ◽

Clustering Algorithm ◽

Time Series Data ◽

Linear Time ◽

Series Data ◽

Data Generation ◽

Time Series Clustering ◽

Group Structures

With increasing powering of data storage and advances in data generation and collection technologies, large volumes of time series data become available and the content is changing rapidly. This requires the data mining methods to have low time complexity to handle the huge and fast-changing data. This paper presents a novel time series clustering algorithm that has linear time complexity. The proposed algorithm partitions the data by checking some randomly selected symbolic patterns in the time series. Theoretical analysis is provided to show that group structures in the data can be revealed from this process. We evaluate the proposed algorithm extensively on all 85 datasets from the well-known UCR time series archive, and compare with the state-of-the-art approaches with statistical analysis. The results show that the proposed method is faster, and achieves better accuracy compared with other rival methods.

Download Full-text

Benchmark and application of unsupervised classification approaches for univariate data

Communications Physics ◽

10.1038/s42005-021-00549-9 ◽

2021 ◽

Vol 4 (1) ◽

Author(s):

Maria El Abbassi ◽

Jan Overbeck ◽

Oliver Braun ◽

Michel Calame ◽

Herre S. J. van der Zant ◽

...

Keyword(s):

Machine Learning ◽

A Priori ◽

Clustering Algorithms ◽

Feature Space ◽

Unsupervised Classification ◽

Optimum Number ◽

Data Sets ◽

Learning Approaches ◽

Wide Range ◽

Characteristic Features

AbstractUnsupervised machine learning, and in particular data clustering, is a powerful approach for the analysis of datasets and identification of characteristic features occurring throughout a dataset. It is gaining popularity across scientific disciplines and is particularly useful for applications without a priori knowledge of the data structure. Here, we introduce an approach for unsupervised data classification of any dataset consisting of a series of univariate measurements. It is therefore ideally suited for a wide range of measurement types. We apply it to the field of nanoelectronics and spectroscopy to identify meaningful structures in data sets. We also provide guidelines for the estimation of the optimum number of clusters. In addition, we have performed an extensive benchmark of novel and existing machine learning approaches and observe significant performance differences. Careful selection of the feature space construction method and clustering algorithms for a specific measurement type can therefore greatly improve classification accuracies.

Download Full-text

Clustering Algorithms and Validation Indices for a Wide mmWave Spectrum

Information ◽

10.3390/info10090287 ◽

2019 ◽

Vol 10 (9) ◽

pp. 287 ◽

Cited By ~ 2

Author(s):

Bogdan Antonescu ◽

Miead Tehrani Moayyed ◽

Stefano Basagni

Keyword(s):

Communication Systems ◽

Clustering Algorithm ◽

Radio Channel ◽

Clustering Algorithms ◽

Wireless Communication Systems ◽

Cluster Validity Indices ◽

Validity Indices ◽

Wide Range ◽

Radio Signals ◽

Urban Scenario

Radio channel propagation models for the millimeter wave (mmWave) spectrum are extremely important for planning future 5G wireless communication systems. Transmitted radio signals are received as clusters of multipath rays. Identifying these clusters provides better spatial and temporal characteristics of the mmWave channel. This paper deals with the clustering process and its validation across a wide range of frequencies in the mmWave spectrum below 100 GHz. By way of simulations, we show that in outdoor communication scenarios clustering of received rays is influenced by the frequency of the transmitted signal. This demonstrates the sparse characteristic of the mmWave spectrum (i.e., we obtain a lower number of rays at the receiver for the same urban scenario). We use the well-known k-means clustering algorithm to group arriving rays at the receiver. The accuracy of this partitioning is studied with both cluster validity indices (CVIs) and score fusion techniques. Finally, we analyze how the clustering solution changes with narrower-beam antennas, and we provide a comparison of the cluster characteristics for different types of antennas.

Download Full-text