K-Profiles: A Nonlinear Clustering Method for Pattern Detection in High Dimensional Data

2015 ◽  
Vol 2015 ◽  
pp. 1-10
Author(s):  
Kai Wang ◽  
Qing Zhao ◽  
Jianwei Lu ◽  
Tianwei Yu

With modern technologies such as microarray, deep sequencing, and liquid chromatography-mass spectrometry (LC-MS), it is possible to measure the expression levels of thousands of genes/proteins simultaneously to unravel important biological processes. A first step towards elucidating hidden patterns and understanding such massive data is the application of clustering techniques. Nonlinear relations are prevalent in high-throughput data but, in contrast to linear correlations, have remained largely unexploited. In many cases, nonlinear relations can model the biological relationships more precisely and reflect critical patterns in biological systems. Using the general dependency measure Distance Based on Conditional Ordered List (DCOL) that we introduced previously, we designed the nonlinear K-profiles clustering method, which can be seen as the nonlinear counterpart of the K-means clustering algorithm. The method has a built-in statistical testing procedure that ensures genes not belonging to any cluster do not impact the estimation of cluster profiles. Results from extensive simulation studies showed that K-profiles clustering not only outperformed the traditional linear K-means algorithm, but also performed significantly better than our previous General Dependency Hierarchical Clustering (GDHC) algorithm. We further analyzed a gene expression dataset, on which K-profiles clustering generated biologically meaningful results.
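
As a rough illustration of the alternating structure described above, the sketch below follows a K-means-style loop in which a nonlinear dependency score replaces the Euclidean distance; the Spearman-based score, the mean-based profile update, and the omission of the built-in statistical test are illustrative assumptions, not the authors' DCOL implementation.

import numpy as np
from scipy.stats import spearmanr

def dependency_score(gene, profile):
    # Stand-in for a general dependency measure such as DCOL (assumption:
    # absolute Spearman correlation, used purely for illustration).
    rho, _ = spearmanr(gene, profile)
    return abs(rho)

def k_profiles_sketch(X, k, n_iter=20, seed=0):
    # X: genes x samples; each row is one gene's expression profile.
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=X.shape[0])
    for _ in range(n_iter):
        profiles = []
        for j in range(k):
            members = X[labels == j]
            # Re-seed empty clusters with a random gene so k profiles remain.
            profiles.append(members.mean(axis=0) if len(members) else X[rng.integers(X.shape[0])])
        profiles = np.asarray(profiles)
        # Reassign each gene to the profile it depends on most strongly.
        scores = np.array([[dependency_score(x, p) for p in profiles] for x in X])
        labels = scores.argmax(axis=1)
    return labels

# Example: cluster 100 simulated genes measured across 20 samples into 3 profiles.
X = np.random.default_rng(1).normal(size=(100, 20))
print(np.bincount(k_profiles_sketch(X, k=3)))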

2019 ◽  
Vol 1 (1) ◽  
pp. 31-39
Author(s):  
Ilham Safitra Damanik ◽  
Sundari Retno Andani ◽  
Dedi Sehendro

Milk is an important dietary intake for meeting nutritional needs, consumed by both children and adults. Indonesia has many producers of fresh milk, but production is not sufficient to meet national demand. Data mining is a branch of computer science that is widely used in research, and clustering is one of its techniques: a method for grouping data, which becomes more effective as more data is used. The data used are provincial data for Indonesia from 2000 to 2017 obtained from the Central Statistics Agency. The result of this study is a grouping into two milk-producing clusters, namely high-production and low-production regions. From the 27 records on fresh milk production in Indonesia, two provinces fall into the high-production cluster, namely West Java and East Java, while the remaining 25, together with 7 provinces that were not included in the K-Means clustering calculation, belong to the low-production cluster.
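
As a rough illustration of the grouping described above, the sketch below runs two-cluster K-Means with scikit-learn on a placeholder province-by-year production matrix; the random values stand in for the Central Statistics Agency data.

import numpy as np
from sklearn.cluster import KMeans

# Hypothetical matrix: one row per province, columns = annual fresh-milk
# production figures for 2000-2017 (values here are placeholders).
production = np.random.default_rng(1).uniform(1e3, 3e5, size=(27, 18))

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(production)
labels = kmeans.labels_                        # 0/1 cluster membership per province
print(kmeans.cluster_centers_.mean(axis=1))    # tells which cluster is "high" vs "low"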


Author(s):  
Ana Belén Ramos-Guajardo

Abstract. A new clustering method for random intervals that are measured in the same units over the same group of individuals is provided. It takes into account the degree of similarity between the expected values of the random intervals, which can be analyzed by means of a two-sample similarity bootstrap test. Thus, the expectations of each pair of random intervals are compared through that test, and a p-value matrix is finally obtained. The suggested clustering algorithm operates on this matrix, where each p-value can also be seen as a kind of similarity between the random intervals. The algorithm is iterative and includes an objective stopping criterion that leads to clusters that are statistically similar internally and different from each other. Simulations are developed to show the empirical performance of the proposal, and the approach is applied to two real-life situations.
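
A rough sketch of clustering driven by a p-value matrix, in the spirit of the approach above; the bootstrap similarity test is assumed to have been run already, and the greedy merging rule and the 0.05 threshold are illustrative assumptions rather than the author's exact procedure.

import numpy as np

def pvalue_clusters_sketch(P, alpha=0.05):
    # P: symmetric matrix of p-values from pairwise similarity tests between
    # the expected values of n random intervals (placeholder input).
    n = P.shape[0]
    clusters = [{i} for i in range(n)]
    while True:
        # Similarity between two clusters: the smallest pairwise p-value,
        # so a merge requires every pair of members to pass the test.
        best, pair = -1.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                p = min(P[i, j] for i in clusters[a] for j in clusters[b])
                if p > best:
                    best, pair = p, (a, b)
        # Stop when no two clusters are statistically similar.
        if pair is None or best <= alpha:
            break
        a, b = pair
        clusters[a] |= clusters[b]
        del clusters[b]
    return clusters

# Example with a toy p-value matrix for 4 intervals:
P = np.array([[1.0, 0.8, 0.01, 0.02],
              [0.8, 1.0, 0.03, 0.01],
              [0.01, 0.03, 1.0, 0.6],
              [0.02, 0.01, 0.6, 1.0]])
print(pvalue_clusters_sketch(P))  # -> [{0, 1}, {2, 3}]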


2013 ◽  
Vol 321-324 ◽  
pp. 1939-1942
Author(s):  
Lei Gu

The locality sensitive k-means clustering method has been presented recently. Although this approach can improve clustering accuracy, it often yields unstable clustering results because randomly chosen samples are used as the initial centers. In this paper, an initialization method based on core clusters is used for locality sensitive k-means clustering. The core clusters are formed by constructing the σ-neighborhood graph, and their centers are taken as the initial centers of the locality sensitive k-means clustering. To investigate the effectiveness of our approach, several experiments are conducted on three datasets. Experimental results show that the proposed method can improve clustering performance compared to the previous locality sensitive k-means clustering.
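
The sketch below illustrates the core-cluster initialization idea under stated assumptions: points within distance σ are linked, connected components serve as core clusters, and their means seed a standard k-means run, which stands in for the locality sensitive variant.

import numpy as np
from scipy.spatial.distance import cdist
from scipy.sparse.csgraph import connected_components
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

def core_cluster_init_sketch(X, sigma, k):
    # Build the sigma-neighborhood graph: points within distance sigma are linked.
    adjacency = cdist(X, X) <= sigma
    n_comp, comp_labels = connected_components(adjacency, directed=False)
    # Treat the k largest connected components as core clusters and use
    # their means as initial centers (smaller components are ignored here).
    sizes = np.bincount(comp_labels)
    largest = np.argsort(sizes)[::-1][:k]
    return np.array([X[comp_labels == c].mean(axis=0) for c in largest])

# Toy data; standard k-means stands in for the locality sensitive variant.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.5, random_state=0)
centers = core_cluster_init_sketch(X, sigma=0.5, k=3)
labels = KMeans(n_clusters=3, init=centers, n_init=1).fit_predict(X)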


Author(s):  
J. W. Li ◽  
X. Q. Han ◽  
J. W. Jiang ◽  
Y. Hu ◽  
L. Liu

Abstract. How to establish an effective method for the analysis of large geographic spatio-temporal data, and to quickly and accurately find the hidden value behind geographic information, has become a current research focus. Researchers have found that clustering analysis methods from the data mining field can mine the knowledge and information hidden in complex and massive spatio-temporal data, and density-based clustering is one of the most important clustering methods. However, the traditional DBSCAN clustering algorithm has drawbacks in parameter selection that are difficult to overcome: the two important parameters, the Eps neighborhood and the MinPts density, need to be set manually, and the guiding principles for parameter setting in the traditional DBSCAN algorithm do not allow more suitable parameters to be selected, so accurate clustering results cannot be produced. To solve the misclassification and density-sparsity problems caused by unreasonable parameter selection in the DBSCAN clustering algorithm, this paper proposes an efficient DBSCAN-based density clustering method with improved parameter optimization. Its evaluation index function (Optimal Distance) is obtained by cycling through k-clusterings in turn and selecting the optimal solution; the optimal k value from the k-clustering is then used to cluster the samples. Through mathematical and physical analysis, the appropriate Eps and MinPts parameters can be determined, and the clustering results are finally obtained by DBSCAN. Experiments show that this method can select reasonable parameters for DBSCAN clustering, which demonstrates the superiority of the method described in this paper.
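
As a hedged illustration of data-driven parameter selection for DBSCAN, the sketch below uses the common k-distance heuristic as a stand-in for the paper's Optimal Distance index obtained by cycling k-clustering; the quantile-based elbow surrogate is an assumption.

import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Toy spatial data; in the paper's setting this would be geographic points.
X, _ = make_moons(n_samples=400, noise=0.06, random_state=0)

# k-distance heuristic: Eps is taken from the sorted distances to each
# point's k-th nearest neighbor, with k tied to MinPts.
min_pts = 5
dists, _ = NearestNeighbors(n_neighbors=min_pts).fit(X).kneighbors(X)
k_dist = np.sort(dists[:, -1])
eps = np.quantile(k_dist, 0.95)   # crude elbow surrogate (assumption)

labels = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(X)
print(f"eps={eps:.3f}, clusters={len(set(labels)) - (1 if -1 in labels else 0)}")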


Author(s):  
Muhamad Alias Md. Jedi ◽  
Robiah Adnan

TCLUST is a statistical clustering technique based on a modification of the trimmed k-means clustering algorithm. It is called a "crisp" clustering approach because each observation is either eliminated (trimmed) or assigned to a single group. TCLUST strengthens the group assignment by putting constraints on the cluster scatter matrices. The emphasis in this paper is on restricting the eigenvalues, λ, of the scatter matrices. The idea of imposing constraints is to maximize the log-likelihood function of the spurious-outlier model. A review of different robust clustering approaches is presented as a comparison to TCLUST methods. This paper discusses the nature of the TCLUST algorithm, how to determine the number of clusters or groups properly, and how to measure the strength of the group assignment. Finally, the R package tclust, which implements these types of scatter restrictions, makes the algorithm more flexible in the choice of the number of clusters and the trimming proportion.
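
A simplified sketch of the eigenvalue restriction idea: the eigenvalues of the cluster scatter matrices are clipped so that the ratio of the largest to the smallest does not exceed a constant c. The clipping level below is a heuristic chosen for illustration; the actual TCLUST procedure determines the truncation by optimizing the trimmed likelihood.

import numpy as np

def restrict_eigenvalues_sketch(scatters, c=10.0):
    # scatters: positive-definite cluster scatter (covariance) matrices.
    decomps = [np.linalg.eigh(S) for S in scatters]
    all_vals = np.concatenate([vals for vals, _ in decomps])
    # Clip every eigenvalue into [m, c*m]; this particular choice of m
    # (geometric mean scaled by sqrt(c)) is a heuristic for illustration only.
    m = np.exp(np.mean(np.log(all_vals))) / np.sqrt(c)
    return [vecs @ np.diag(np.clip(vals, m, c * m)) @ vecs.T
            for vals, vecs in decomps]

# Example: the first scatter matrix has an eigenvalue ratio of 100, which the
# restriction brings down to at most c = 10.
S1, S2 = np.diag([100.0, 1.0]), np.diag([4.0, 0.5])
for S in restrict_eigenvalues_sketch([S1, S2], c=10.0):
    print(np.round(np.linalg.eigvalsh(S), 2))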


2020 ◽  
Vol 10 (2) ◽  
pp. 21-39
Author(s):  
Archana Yashodip Chaudhari ◽  
Preeti Mulay

Intelligent electricity meters (IEMs) form a key infrastructure necessary for the growth of smart grids. IEMs generate a considerable amount of electricity data incrementally. However, on an influx of new data, traditional clustering methods re-cluster all of the data from scratch. Incremental clustering is an essential way to solve the problem of clustering dynamic data. Given the volume of IEM data and the number of data types involved, an incremental clustering method is highly complex. Microsoft Azure provides the processing power necessary to handle incremental clustering analytics. The proposed Cloud4NFICA is a scalable platform for a nearness-factor-based incremental clustering algorithm. This research uses a real dataset of Irish households collected by IEMs and related socioeconomic data. Cloud4NFICA is incremental in nature and hence accommodates the influx of new data. Cloud4NFICA was designed as infrastructure as a service. The study shows that the developed system performs well with respect to scalability.
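
A minimal sketch of nearness-based incremental assignment in the spirit of the description above: each incoming reading either joins the closest existing cluster or starts a new one. The Euclidean rule and the fixed threshold are placeholders, not the paper's nearness-factor formula or the Azure deployment.

import numpy as np

class IncrementalClustererSketch:
    # Assigns each new sample to the nearest existing cluster if it is
    # "near enough", otherwise starts a new cluster; the nearness rule
    # below is a placeholder, not the paper's nearness factor.
    def __init__(self, threshold):
        self.threshold = threshold
        self.centers, self.counts = [], []

    def add(self, x):
        if self.centers:
            d = [np.linalg.norm(x - c) for c in self.centers]
            j = int(np.argmin(d))
            if d[j] <= self.threshold:
                # Incrementally update the matched center (running mean).
                self.counts[j] += 1
                self.centers[j] += (x - self.centers[j]) / self.counts[j]
                return j
        self.centers.append(np.array(x, dtype=float))
        self.counts.append(1)
        return len(self.centers) - 1

# Streaming usage: cluster an influx of new IEM readings one at a time.
clu = IncrementalClustererSketch(threshold=2.0)
for reading in np.random.default_rng(0).normal(size=(100, 3)):
    clu.add(reading)
print(len(clu.centers))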


2013 ◽  
Vol 380-384 ◽  
pp. 1290-1293
Author(s):  
Qing Ju Guo ◽  
Wen Tian Ji ◽  
Sheng Zhong

Many research findings on clustering algorithms have been reported at home and abroad in recent years. Focusing on the traditional partition-based clustering method, the K-means algorithm, this paper analyzes its advantages and disadvantages and then combines it with an ontology-based data set to establish a semantic web model. The existing clustering algorithm is improved under various constraint conditions, with the aim of demonstrating that the improved algorithm achieves better efficiency and accuracy under the semantic web.


Algorithms ◽  
2020 ◽  
Vol 13 (7) ◽  
pp. 158
Author(s):  
Tran Dinh Khang ◽  
Nguyen Duc Vuong ◽  
Manh-Kien Tran ◽  
Michael Fowler

Clustering is an unsupervised machine learning technique with many practical applications that has gathered extensive research interest. Aside from deterministic or probabilistic techniques, fuzzy C-means clustering (FCM) is also a common clustering technique. Since the advent of the FCM method, many improvements have been made to increase clustering efficiency. These improvements focus on adjusting the membership representation of elements in the clusters, or on fuzzifying and defuzzifying techniques, as well as the distance function between elements. This study proposes a novel fuzzy clustering algorithm using multiple different fuzzification coefficients depending on the characteristics of each data sample. The proposed fuzzy clustering method has similar calculation steps to FCM with some modifications. The formulas are derived to ensure convergence. The main contribution of this approach is the utilization of multiple fuzzification coefficients as opposed to only one coefficient in the original FCM algorithm. The new algorithm is then evaluated with experiments on several common datasets and the results show that the proposed algorithm is more efficient compared to the original FCM as well as other clustering methods.
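
As a hedged sketch of the idea of per-sample fuzzification coefficients, the code below plugs a coefficient m_i for each data point into the standard FCM membership and center updates; the exact update formulas derived in the paper may differ.

import numpy as np

def fcm_multi_fuzzifier_sketch(X, k, m, n_iter=100, eps=1e-9, seed=0):
    # X: (n, d) data matrix; m: (n,) per-sample fuzzification coefficients (> 1).
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Distances from every sample to every center, kept strictly positive.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + eps   # (n, k)
        # Membership update: u_ic = 1 / sum_c' (d_ic / d_ic')^(2 / (m_i - 1)).
        ratio = d[:, :, None] / d[:, None, :]                                   # (n, k, k)
        expo = (2.0 / (m - 1.0))[:, None, None]
        u = 1.0 / (ratio ** expo).sum(axis=2)                                   # (n, k)
        # Center update weighted by u_ic ** m_i.
        w = u ** m[:, None]
        centers = (w.T @ X) / w.sum(axis=0)[:, None]
    return u, centers

# Example: coefficients vary per sample instead of a single global fuzzifier.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(5.0, 1.0, (50, 2))])
m = np.linspace(1.5, 2.5, len(X))
u, centers = fcm_multi_fuzzifier_sketch(X, k=2, m=m)
print(np.round(centers, 2))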


2019 ◽  
Vol 11 (9) ◽  
pp. 2560
Author(s):  
Hyun Ahn ◽  
Tai-Woo Chang

As the adoption of information technologies increases in the manufacturing industry, manufacturing companies should efficiently manage their data and manufacturing processes in order to enhance their manufacturing competency. Because smart factories acquire processing data from connected machines, the business process management (BPM) approach can enrich the capability of manufacturing operations management. Manufacturing companies could benefit from the well-defined methodologies and process-centric engineering practices of the BPM approach for optimizing their manufacturing processes. Based on this approach, this paper proposes a similarity-based hierarchical clustering method for manufacturing processes. To this end, we first describe process modeling based on the BPM-compliant standard so that the manufacturing processes can be controlled by BPM systems. Second, we present similarity measures for manufacturing process models that serve as a criterion for the hierarchical clustering. We then formulate the hierarchical clustering problem and describe an agglomerative clustering algorithm using the measured similarities. Our contribution rests on the assumption that a manufacturing company adopts the BPM approach and operates various manufacturing processes. We expect that our method will enable manufacturing companies to design and manage a vast number of manufacturing processes at a coarser level, and that it can also be applied to various process (re)engineering problems.
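
A minimal sketch of the agglomerative step under stated assumptions: given a precomputed similarity matrix between process models (how those similarities are measured is the paper's contribution; the values below are placeholders), similarities are turned into distances and average-linkage clustering is applied with SciPy.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Hypothetical pairwise similarities between five manufacturing process models.
S = np.array([[1.0, 0.9, 0.2, 0.3, 0.1],
              [0.9, 1.0, 0.25, 0.2, 0.15],
              [0.2, 0.25, 1.0, 0.8, 0.7],
              [0.3, 0.2, 0.8, 1.0, 0.75],
              [0.1, 0.15, 0.7, 0.75, 1.0]])

D = 1.0 - S                                                  # similarity -> dissimilarity
Z = linkage(squareform(D, checks=False), method='average')   # agglomerative merge tree
labels = fcluster(Z, t=2, criterion='maxclust')              # cut into two coarse groups
print(labels)                                                # e.g. processes {0,1} vs {2,3,4}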


2013 ◽  
Vol 380-384 ◽  
pp. 1488-1494
Author(s):  
Wang Wei ◽  
Jin Yue Peng

In the research and development of intelligent systems, clustering analysis is a very important problem. Based on the direct clustering algorithm that uses similarity measures of Vague sets as the evaluation criterion, presented in an earlier paper, the Vague direct clustering method is analyzed using different similarity measures of Vague sets. The experimental results show that direct clustering based on the similarity of Vague sets is effective, and that direct clustering based on different similarity measures of Vague sets gives essentially the same results, differing only in the clustering steps. Different algorithms can therefore be selected according to the conditions of the actual application.
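
As a hedged illustration, the sketch below clusters vague values directly by pairwise similarity; the particular similarity measure (based on t - f) and the greedy grouping rule with a 0.9 threshold are assumptions drawn from the general vague-set literature, not necessarily the measures compared in the paper.

def vague_similarity(x, y):
    # x, y are vague values (t, f): t = truth membership, f = false membership,
    # with t + f <= 1. This similarity, based on the difference of the "cores"
    # t - f, is one measure from the vague-set literature, used for illustration.
    (tx, fx), (ty, fy) = x, y
    return 1.0 - abs((tx - fx) - (ty - fy)) / 2.0

def direct_clustering_sketch(values, threshold=0.9):
    # Greedy "direct" grouping: each vague value joins the first cluster whose
    # representative it is similar enough to (assumed rule, not the paper's exact one).
    clusters = []
    for v in values:
        for c in clusters:
            if vague_similarity(v, c[0]) >= threshold:
                c.append(v)
                break
        else:
            clusters.append([v])
    return clusters

data = [(0.8, 0.1), (0.75, 0.15), (0.2, 0.7), (0.3, 0.6)]
print(direct_clustering_sketch(data))   # -> two clusters of two values each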

