An Improved K-Means Algorithm and its Application in Customer Classification of Network Enterprises

2014 ◽  
Vol 543-547 ◽  
pp. 2124-2127
Author(s):  
Feng Lan Luo

The K-means algorithm clusters large data sets efficiently, which makes it attractive for data mining, but its calculation instability limits its application, so intelligent optimization of K-means has become an active research field. First, the calculation instability of the original K-means algorithm is analyzed in detail. Second, the cluster seed selection method and the calculation flow of the algorithm are redesigned to speed up the calculation and improve stability. Third, the improved algorithm is implemented and evaluated in a customer classification task; the results show that it achieves better classification accuracy and calculation stability and can be used in practice for customer classification in network trade enterprises.
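
The abstract does not say how the seeds are selected, so the following is only a minimal sketch of one common way to remove the run-to-run instability of random initialization: deterministic farthest-first seeding followed by standard Lloyd iterations. The function names (`farthest_first_seeds`, `kmeans`) and the seeding rule are assumptions for illustration, not the paper's method.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first_seeds(points, k):
    """Hypothetical seeding rule: start from the point nearest the data
    mean, then repeatedly add the point farthest from the chosen seeds."""
    dim = len(points[0])
    mean = tuple(sum(p[d] for p in points) / len(points) for d in range(dim))
    seeds = [min(points, key=lambda p: sq_dist(p, mean))]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(sq_dist(p, s) for s in seeds)))
    return seeds


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm with deterministic seeds, so repeated runs agree."""
    centers = farthest_first_seeds(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(m[d] for m in members) / len(members)
                                   for d in range(len(members[0])))
    return labels, centers


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(points, 2)
```

Because no random numbers are involved, two runs on the same data return identical labels, which is the stability property the abstract is concerned with.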

Author(s):  
Adam Kiersztyn ◽  
Paweł Karczmarek ◽  
Krystyna Kiersztyn ◽  
Witold Pedrycz

2013 ◽  
Vol 694-697 ◽  
pp. 2881-2885
Author(s):  
Hai Yan Wang ◽  
Jian Xin Zhang

The information management system for dyeing textiles is the basis for accurate color classification, and machine learning methods have become a popular area of research for color classification. Traditional classification methods are simple and highly efficient, but they depend on the distribution of the sample space. If the sample data properties are not independent, prediction precision is badly affected and internal instability appears. An application of Gray-Relation to dyeing textile color classification has been designed, which offsets the shortcomings of mathematical statistics methods in system analysis. It is applicable regardless of sample size, and its quantitative results agree with qualitative analysis. On the basis of theoretical analysis, dyeing textile color classification was conducted under random sampling, uniform sampling and stratified sampling. The experimental results show that, with Gray-Relation, dyeing textile color classification does not depend on the sample space distribution and the stability of classification increases.
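
As a rough illustration of the Gray-Relation idea, the sketch below computes the standard grey relational grade between a sample and a set of class prototypes and picks the closest class. The distinguishing coefficient `rho = 0.5`, the function names, and the assumption that features are already scaled to comparable ranges are illustrative choices; the abstract does not give the paper's exact formulation.

```python
def grey_relational_grades(reference, candidates, rho=0.5):
    """Grey relational grade of each candidate sequence against the
    reference. Assumes all features are already on comparable scales."""
    deltas = [[abs(r - c) for r, c in zip(reference, cand)]
              for cand in candidates]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:
        # Every candidate equals the reference exactly.
        return [1.0] * len(candidates)
    # Coefficient per feature, averaged into one grade per candidate.
    return [sum((d_min + rho * d_max) / (d + rho * d_max) for d in row) / len(row)
            for row in deltas]


def classify(sample, prototypes):
    """Return the index of the prototype with the highest grade."""
    grades = grey_relational_grades(sample, prototypes)
    return max(range(len(prototypes)), key=lambda i: grades[i])


grades = grey_relational_grades([1, 2, 3], [[1, 2, 3], [3, 2, 1]])
best = classify([1, 2, 3], [[3, 2, 1], [1, 2, 3]])
```

The grade depends only on pointwise differences between sequences, not on an assumed probability distribution of the samples, which is consistent with the abstract's claim of independence from the sample space distribution.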


2021 ◽  
Vol 251 ◽  
pp. 02054
Author(s):  
Olga Sunneborn Gudnadottir ◽  
Daniel Gedon ◽  
Colin Desmarais ◽  
Karl Bengtsson Bernander ◽  
Raazesh Sainudiin ◽  
...  

In recent years, machine-learning methods have become increasingly important for the experiments at the Large Hadron Collider (LHC). They are utilised in everything from trigger systems to reconstruction and data analysis. The recent UCluster method is a general model providing unsupervised clustering of particle physics data that can be easily modified to provide solutions for a variety of different decision problems. In the current paper, we improve on the UCluster method by adding the option of training the model in a scalable and distributed fashion, thereby extending its utility to learn from arbitrarily large data sets. UCluster combines a graph-based neural network called ABCnet with a clustering step, using a combined loss function in the training phase. The original code is publicly available in TensorFlow v1.14 and has previously been trained on a single GPU. It shows a clustering accuracy of 81% when applied to the problem of multi-class classification of simulated jet events. Our implementation adds the distributed training functionality by utilising the Horovod distributed training framework, which necessitated a migration of the code to TensorFlow v2. Together with using parquet files for splitting data up between different compute nodes, the distributed training makes the model scalable to any amount of input data, something that will be essential for use with real LHC data sets. We find that the model is well suited for distributed training, with the training time decreasing in direct relation to the number of GPUs used. However, further improvements through a more exhaustive and possibly distributed hyper-parameter search are required in order to achieve the reported accuracy of the original UCluster method.


Author(s):  
Carlos Goncalves ◽  
Luis Assuncao ◽  
Jose C. Cunha

Data analytics applications handle large data sets subject to multiple processing phases, some of which can execute in parallel on clusters, grids or clouds. Such applications can benefit from using the MapReduce model, which only requires the end-user to define the application algorithms for input data processing and the map and reduce functions, but this poses a need to install and configure specific frameworks such as Apache Hadoop or Elastic MapReduce in Amazon Cloud. In order to provide more flexibility in defining and adjusting the application configurations, as well as in specifying the composition of the application phases and their orchestration, the authors describe an approach for supporting MapReduce stages as sub-workflows in the AWARD framework (Autonomic Workflow Activities Reconfigurable and Dynamic). The authors discuss how a text mining application is represented as a complex workflow with multiple phases, where individual workflow nodes support MapReduce computations. Access to intermediate data produced during the MapReduce computations is supported by a data sharing abstraction. The authors describe two implementations of this abstraction, one based on a shared tuple space and another based on an in-memory distributed key/value store. The authors describe the implementation of the framework, a set of developed tools, and their experimentation with the execution of the text mining algorithm over multiple Amazon EC2 (Elastic Compute Cloud) instances, and report on the speed-up and size-up results obtained with up to 20 EC2 instances and for different corpus sizes, up to 97 million words.
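
To make the map and reduce roles concrete, here is a minimal single-process sketch of the MapReduce pattern applied to word counting, a typical text mining step. It does not use AWARD, Hadoop, or any distributed runtime; the function names are hypothetical, and the in-memory `shuffle` stands in for the intermediate data sharing that the abstract describes.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group intermediate values by key, as the framework would do
    between the map and reduce stages."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


def word_count(documents):
    # Each document could be mapped on a different node; here the map
    # calls simply run one after another.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["a b a", "B c"])
```

Only `map_phase` and `reduce_phase` are application-specific; the grouping step is generic, which is why a workflow framework can provide it once for all MapReduce stages.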


2019 ◽  
Vol 64 (6) ◽  
pp. 669-675 ◽  
Author(s):  
Abdulaziz Alsayyari

A new technique for electronic fetal monitoring (EFM) using an efficient neural network structure based on the Legendre series is presented in this paper. A Legendre series-based neural network (LNN) is trained to classify the different fetal states from previously published cardiotocographic (CTG) data sets. These data sets consist of measurements of fetal heart rate (FHR) and uterine contraction (UC). The applied LNN utilizes a Legendre series expansion of the input vectors and hence can produce explicit equations describing multi-input multi-output systems. Simulations of the proposed technique in EFM demonstrate its high efficiency. Training the LNN requires only a few iterations (5–10 epochs). The applied technique makes the classification of the fetal state available through equations combining the trained LNN weights and the current measured CTG record. The performance of the proposed LNN is compared with that of other popular neural network techniques such as the Volterra neural network (VNN) in EFM. The comparison shows that the LNN outperforms the VNN in terms of lower computational requirements and faster convergence with a lower mean square error.
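
The sketch below shows what a Legendre series expansion of an input can look like, using the standard three-term recurrence for Legendre polynomials and a single linear output unit. It assumes inputs scaled to [-1, 1] and a functional-link style layout; the function names and the network layout are illustrative assumptions, since the abstract does not describe the paper's architecture in detail.

```python
def legendre_features(x, order):
    """Return [P_0(x), ..., P_order(x)] using the recurrence
    (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x).
    Assumes x has been scaled to the interval [-1, 1]."""
    values = [1.0]
    if order >= 1:
        values.append(float(x))
    for n in range(1, order):
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values


def expand_inputs(inputs, order):
    """Concatenate the Legendre features of every input component."""
    features = []
    for x in inputs:
        features.extend(legendre_features(x, order))
    return features


def linear_output(features, weights):
    """One output unit: a weighted sum of the expanded features, which is
    what makes the trained model readable as an explicit equation."""
    return sum(f * w for f, w in zip(features, weights))


feats = legendre_features(0.5, 3)
expanded = expand_inputs([0.5, -1.0], 2)
```

Because the output is linear in the expanded features, the trained weights can be written out directly as a polynomial in the inputs, which matches the abstract's point about explicit equations.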


2016 ◽  
Vol 2016 ◽  
pp. 1-10 ◽  
Author(s):  
Christian Montag ◽  
Éilish Duke ◽  
Alexander Markowetz

The present paper provides insight into an emerging research discipline called Psychoinformatics. In the context of Psychoinformatics, we emphasize the cooperation between the disciplines of psychology and computer science in handling large data sets derived from heavily used devices, such as smartphones or online social network sites, in order to shed light on a large number of psychological traits, including personality and mood. New challenges await psychologists in light of the resulting “Big Data” sets, because classic psychological methods will only in part be able to analyze this data derived from ubiquitous mobile devices, as well as other everyday technologies. As a consequence, psychologists must enrich their scientific methods through the inclusion of methods from informatics. The paper provides a brief review of one area of this research field, dealing mainly with social networks and smartphones. Moreover, we highlight how data derived from Psychoinformatics can be combined in a meaningful way with data from human neuroscience. We close the paper with some observations of areas for future research and problems that require consideration within this new discipline.


Data Mining ◽  
2013 ◽  
pp. 734-750
Author(s):  
T. Ravindra Babu ◽  
M. Narasimha Murty ◽  
S. V. Subrahmanya

Data mining deals with efficient algorithms for handling large data. When such algorithms are combined with data compaction, they can lead to superior performance. Approaches to dealing with large data include working with representatives of the data instead of the entire data set. The representatives should preferably be generated with minimal data scans. In the current chapter we discuss lossy and non-lossy data compression methods combined with clustering and classification of large data sets. We demonstrate the working of such schemes on two large data sets.

