Analysis of Internet Marketing Forecast Model Based on Parallel K-Means Algorithm

2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Xiaolei Chen ◽  
Sikun Ge

Based on the parallel K-means algorithm, this article studies marketing node detection on the Internet. It designs a new Internet marketing node detector and a location summary network based on an FCN (Fully Convolutional Network) that preprocesses the input nodes, and verifies their performance on the data sets. To address the shortage of Internet marketing node data sets, Internet data sets are artificially generated and used for detector training. First, the multiclass K-means algorithm is reduced to two categories suited to Internet marketing node detection: marketing nodes and background. Second, the weights in the K-means algorithm are mostly applicable only to target detection tasks; therefore, for Internet marketing node detection, the K-means algorithm is used to regress the training set and calculate 5 weights. In the simulation experiment, a weight calculation formula is used to compute the weight of each feature term. The basic idea is that a feature word that appears often in one document but rarely in other nodes is assigned a higher weight. The article also improves on some shortcomings of the k-means clustering algorithm. Standardizing the data before clustering transforms it from an irregular distribution to a cluster-like distribution, which facilitates the clustering process. Density is introduced to determine the initial cluster centers, and a purity metric is introduced to determine the appropriate density radius of each cluster center, so as to reduce the support vector machine training samples as effectively as possible.
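
The feature-term weighting idea in this abstract (frequent in one document, rare elsewhere, hence a higher weight) matches the familiar TF-IDF scheme. The abstract does not give the formula the authors used, so the following is only a minimal sketch under that assumption; the function name `tfidf_weights` and the toy documents are hypothetical.

```python
import math
from collections import Counter

def tfidf_weights(documents):
    """Return one {term: weight} dict per tokenized document (assumed TF-IDF)."""
    n_docs = len(documents)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))
    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        # Term frequency times inverse document frequency.
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical toy corpus: "news" occurs everywhere, "sale" only in the first document.
docs = [
    ["sale", "sale", "discount", "news"],
    ["news", "weather"],
    ["news", "sports"],
]
w = tfidf_weights(docs)
```

In this toy corpus, "sale" receives the largest weight in the first document, while "news", which appears in every document, receives a weight of zero.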

Author(s):  
SANGHAMITRA BANDYOPADHYAY ◽  
UJJWAL MAULIK ◽  
MALAY KUMAR PAKHIRA

An efficient partitional clustering technique, called SAKM-clustering, that integrates the power of simulated annealing for obtaining minimum energy configuration, and the searching capability of K-means algorithm is proposed in this article. The clustering methodology is used to search for appropriate clusters in multidimensional feature space such that a similarity metric of the resulting clusters is optimized. Data points are redistributed among the clusters probabilistically, so that points that are farther away from the cluster center have higher probabilities of migrating to other clusters than those which are closer to it. The superiority of the SAKM-clustering algorithm over the widely used K-means algorithm is extensively demonstrated for artificial and real life data sets.
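
The probabilistic redistribution described above can be illustrated with a Boltzmann-style assignment rule. The abstract does not state the exact probability expression used in SAKM-clustering, so the exponential form, the `temperature` parameter, and the function name below are assumptions made for illustration only.

```python
import math

def assignment_probabilities(point, centers, temperature):
    """Boltzmann-style probabilities of assigning `point` to each center."""
    dists = [math.dist(point, c) for c in centers]
    d_min = min(dists)
    # Subtracting d_min keeps the exponentials stable; the nearest center scores 1.
    scores = [math.exp(-(d - d_min) / temperature) for d in dists]
    total = sum(scores)
    return [s / total for s in scores]

# Hypothetical one-dimensional layout embedded in 2-D: two centers, two test points.
centers = [(0.0, 0.0), (10.0, 0.0)]
near = assignment_probabilities((1.0, 0.0), centers, temperature=2.0)
far = assignment_probabilities((4.0, 0.0), centers, temperature=2.0)
cold = assignment_probabilities((4.0, 0.0), centers, temperature=0.5)
```

Here the point farther from its nearest center (`far`) has a higher probability of moving to the other cluster than the closer point (`near`), and lowering the temperature (`cold`) makes such moves rarer, which is the usual annealing behaviour.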


2007 ◽  
Vol 17 (01) ◽  
pp. 71-103 ◽  
Author(s):  
NARGESS MEMARSADEGHI ◽  
DAVID M. MOUNT ◽  
NATHAN S. NETANYAHU ◽  
JACQUELINE LE MOIGNE

Clustering is central to many image processing and remote sensing applications. ISODATA is one of the most popular and widely used clustering methods in geoscience applications, but it can run slowly, particularly with large data sets. We present a more efficient approach to ISODATA clustering, which achieves better running times by storing the points in a kd-tree and through a modification of the way in which the algorithm estimates the dispersion of each cluster. We also present an approximate version of the algorithm which allows the user to further improve the running time, at the expense of lower fidelity in computing the nearest cluster center to each point. We provide both theoretical and empirical justification that our modified approach produces clusterings that are very similar to those produced by the standard ISODATA approach. We also provide empirical studies on both synthetic data and remotely sensed Landsat and MODIS images that show that our approach has significantly lower running times.


2018 ◽  
Vol 25 (2) ◽  
pp. 473-482 ◽  
Author(s):  
Fan Xu ◽  
Peter W. Tse

Unlike many traditional feature extraction methods for vibration signals, such as ensemble empirical mode decomposition (EEMD), the deep belief network (DBN) in deep learning can extract useful information automatically and reduce the reliance on experts with signal processing expertise and troubleshooting experience. In conventional fault diagnosis, classifiers such as support vector machines, random forests, and artificial neural networks require data labels for training and testing. These labels are usually based on expert knowledge, and producing them is tedious. A clustering model, on the other hand, can perform roller bearing fault diagnosis without data labels, which is more efficient. Common clustering models include fuzzy C-means (FCM), Gustafson–Kessel (GK), Gath–Geva (GG), and affinity propagation (AP). Unlike FCM, GK, and GG, which require knowledge or experience to preset the number of cluster centers, the AP clustering algorithm obtains the cluster centers automatically from the responsibility and availability calculations for all data points. To the best of the authors’ knowledge, AP is rarely used for fault diagnosis. In this paper, a method that combines a DBN with several hidden layers and AP for roller bearing fault diagnosis is proposed. For data visualization, principal component analysis (PCA) is used to reduce the dimension of the extracted features. The first two principal components are used as the input of the FCM, GK, GG, and AP models for roller bearing fault diagnosis. The experimental results show that the proposed method is superior to other combination models such as EEMD–PCA–FCM/GK/GG and DBN–PCA–FCM/GK/GG.


To address the problems of distorted center selection and slow iterative convergence in traditional clustering analysis algorithms, a novel clustering scheme based on an improved k-means algorithm is proposed. Based on an analysis of all user behavior sets contained in the initial sample, the paper proposes a weight calculation method for abnormal behaviors and an eigenvalue extraction method for abnormal behavior sets, and constructs a set of abnormal behaviors for each user from the behavior data generated by abnormal users. Then, building on the traditional k-means clustering algorithm, an improved algorithm is proposed: the compactness of all data points is calculated, and the initial cluster centers are selected among the data points with high and low compactness, which enhances clustering performance. Finally, the eigenvalues of the abnormal behavior set are used as the input of the algorithm, which outputs the clustering results for the abnormal behavior. Experimental results show that this algorithm outperforms the traditional clustering algorithm and can effectively improve the clustering of abnormal behavior.


2013 ◽  
Vol 321-324 ◽  
pp. 1943-1946
Author(s):  
Lei Gu

A clustering algorithm based on one-class support vector machines has been proposed recently. Because it uses the kernel technique, this approach can be preferable to traditional k-means clustering. A clustering ensemble method combines several partitions of all unlabeled data into a single clustering to obtain better clustering results. In this paper, the clustering ensemble method is applied to the clustering algorithm based on one-class support vector machines. Several partitions from multiple runs with different random initial data sets are combined into a final clustering result. Experiments show that the new approach can improve clustering performance.
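
Combining several partitions into one clustering can be sketched with a co-association matrix (sometimes called evidence accumulation). The abstract does not say which combination rule the author used, so this is an assumed, commonly used choice; the one-class SVM clustering runs themselves are replaced by hypothetical label lists.

```python
def co_association(partitions):
    """Fraction of partitions in which each pair of points shares a cluster."""
    n = len(partitions[0])
    matrix = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    matrix[i][j] += 1.0 / len(partitions)
    return matrix

def consensus_clusters(partitions, threshold=0.5):
    """Link points co-clustered in more than `threshold` of the runs and
    return the connected components as the final labels."""
    matrix = co_association(partitions)
    n = len(matrix)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:  # depth-first search over strongly co-associated points
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Hypothetical results of three runs on five points.
runs = [
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0],  # same grouping as the first run, different label names
    [0, 0, 0, 1, 1],  # point 2 is grouped differently in this run
]
final = consensus_clusters(runs)
```

Because the matrix only records whether two points share a cluster, the arbitrary label names produced by each run do not matter, and the single disagreeing run is outvoted.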


2012 ◽  
Vol 532-533 ◽  
pp. 1507-1511
Author(s):  
Zhen Jiang Zhao ◽  
Wei Gao ◽  
Huai Zhong Wang ◽  
Ke Fei Zhang

Support vector machines are widely used in data classification, but training time grows with the number of training samples. To address this, the ISODATA clustering algorithm is used to cluster the samples and obtain new cluster centers, which, together with the samples that are easily misclassified, form a new training set for the support vector machine. This avoids repeated training on highly similar samples while focusing on the training samples that easily lead to wrong classification. The classification accuracy of the support vector machine can be improved and the training time reduced, making the method more convenient for engineering applications.


2011 ◽  
Vol 383-390 ◽  
pp. 925-930
Author(s):  
Chun Cheng Zhang ◽  
Xiang Guang Chen ◽  
Yuan Qing Xu

In order to improve the forecasting accuracy of indoor thermal comfort, the basic principles of the fuzzy c-means clustering algorithm (FCM) and support vector machines (SVM) are analyzed. An SVM forecasting method based on FCM data preprocessing is proposed in this paper. With the proposed method, large data sets can be divided into multiple mixed groups, and each group is represented by a single regression model. The support vector machine based on fuzzy c-means clustering (FCM+SVM) and the BP neural network based on fuzzy c-means clustering (FCM+BPNN) are each applied to forecast the PMV index. The experimental results demonstrate that the FCM+SVM method has better forecasting accuracy than the FCM+BPNN method.
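
The FCM step that divides the data into groups rests on the standard fuzzy c-means membership formula. The sketch below computes the memberships of one point for fixed centers; the fuzzifier value `m=2.0`, the function name, and the sample coordinates are assumptions, and the per-group SVM regression models are not shown.

```python
import math

def fcm_memberships(point, centers, m=2.0):
    """Fuzzy c-means membership of `point` in each cluster (fuzzifier m > 1)."""
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on a center belongs fully to that center.
    if 0.0 in dists:
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    # Standard update: u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1)).
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
        for d_i in dists
    ]

# Hypothetical centers and sample.
centers = [(0.0, 0.0), (4.0, 0.0)]
u = fcm_memberships((1.0, 0.0), centers)
```

With these values the point has membership 0.9 in the first cluster and 0.1 in the second. A sample could then be routed to the regression model of its highest-membership group, or several models could be blended by membership; which of these the authors do is not stated in the abstract.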


2018 ◽  
Vol 2018 ◽  
pp. 1-14 ◽  
Author(s):  
Xiaobo Lv ◽  
Yan Ma ◽  
Xiaofu He ◽  
Hui Huang ◽  
Jie Yang

The minimum spanning tree- (MST-) based clustering method can identify clusters of arbitrary shape by removing inconsistent edges. The definition of the inconsistent edges is a major issue that has to be addressed in all MST-based clustering algorithms. In this paper, we propose a novel MST-based clustering algorithm through the cluster center initialization algorithm, called cciMST. First, in order to capture the intrinsic structure of the data sets, we propose the cluster center initialization algorithm based on geodesic distance and dual densities of the points. Second, we propose and demonstrate that the inconsistent edge is located on the shortest path between the cluster centers, so we can find the inconsistent edge with the length of the edges as well as the densities of their endpoints on the shortest path. Correspondingly, we obtain two groups of clustering results. Third, we propose a novel intercluster separation by computing the distance between the points at the intersection of clusters. Furthermore, we propose a new internal clustering validation measure to select the best clustering result. The experimental results on the synthetic data sets, real data sets, and image data sets demonstrate the good performance of the proposed MST-based method.
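
For context, the classical MST-based clustering that this line of work builds on can be sketched as follows: build the minimum spanning tree and treat the longest edges as inconsistent. This is not the cciMST algorithm itself, which locates inconsistent edges on shortest paths between initialized cluster centers; the function names and sample points are hypothetical.

```python
import math

def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph; returns (length, i, j) edges."""
    n = len(points)
    in_tree = [False] * n
    best = [(math.inf, -1)] * n  # (distance to the tree, endpoint inside the tree)
    best[0] = (0.0, -1)
    edges = []
    for _ in range(n):
        # Pick the vertex outside the tree that is closest to it.
        i = min((k for k in range(n) if not in_tree[k]), key=lambda k: best[k][0])
        in_tree[i] = True
        if best[i][1] != -1:
            edges.append((best[i][0], best[i][1], i))
        for j in range(n):
            if not in_tree[j]:
                d = math.dist(points[i], points[j])
                if d < best[j][0]:
                    best[j] = (d, i)
    return edges

def mst_clusters(points, k):
    """Drop the k-1 longest MST edges and label the remaining components."""
    kept = sorted(minimum_spanning_tree(points))[: len(points) - k]
    parent = list(range(len(points)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for _, i, j in kept:
        parent[find(i)] = find(j)
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(len(points))]

# Two hypothetical, well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = mst_clusters(points, k=2)
```

Using edge length alone works for well-separated groups like these; the abstract's point is that a better definition of the inconsistent edge is needed when clusters differ in density or touch each other.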


The number of internet and network service users is increasing, and data is being generated at very high speed. At the same time, security threats to the internet, enterprise networks, and websites are growing. Anomalies are among the most significant cyber threats on the internet; they are increasing exponentially and overwhelm the commonly used approaches to anomaly detection and classification. Anomaly detection is used in big data analytics to recognize unexpected behaviour. The size and dimensionality of data sets in network environments make it hard to recognize useful patterns, for example, to identify network traffic anomalies in large data sets. With the enormous growth of computer-network-based facilities, fast and efficient anomaly detection is a challenge. Anomaly recognition in big data sets is useful for discovering fraud and abnormal activity. Here, we focus on the problems of anomaly detection and introduce a novel machine-learning-based anomaly detection technique. The machine learning approach is used to increase the speed of anomaly detection, which is useful for detecting anomalies in large data sets. We evaluate the proposed framework through experiments on larger data sets and compare it with several existing techniques such as fuzzy, SVM (Support Vector Machine), and PSO (Particle Swarm Optimization). The proposed classifier achieves 98% accuracy with a false rate of 0.002%. The experimental results show better performance than existing anomaly detection techniques in a big data environment.


Author(s):  
Naida A. Kazibekova ◽  
Magrifa N. Agasieva ◽  
Zamira M. Ismieva
