How to Use Temporal-Driven Constrained Clustering to Detect Typical Evolutions

2014 ◽  
Vol 23 (04) ◽  
pp. 1460013 ◽  
Author(s):  
Marian-Andrei Rizoiu ◽  
Julien Velcin ◽  
Stéphane Lallich

In this paper, we propose a new time-aware dissimilarity measure that takes the temporal dimension into account: observations that are close in the description space but distant in time are considered dissimilar. We also propose a method to enforce segmentation contiguity by introducing, in the objective function, a penalty term inspired by the normal distribution function. We combine the two propositions into a novel time-driven constrained clustering algorithm, called TDCK-Means, which creates a partition of clusters that are coherent both in the multidimensional space and in the temporal space. This algorithm uses soft semi-supervised constraints to encourage adjacent observations belonging to the same entity to be assigned to the same cluster. We apply our algorithm to a Political Studies dataset in order to detect typical evolution phases. We adapt the Shannon entropy in order to measure the entity contiguity, and we show that our proposition consistently improves the temporal cohesion of clusters, without any significant loss in the multidimensional variance.
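The idea of a time-aware dissimilarity can be illustrated with a minimal Python sketch. The combination rule below is an assumption chosen for illustration, not the formula from the paper: it normalizes the feature distance and the time gap to [0, 1] and combines them so that a large value in either one makes the pair dissimilar. The function name and the `max_feature_dist` and `max_time_gap` parameters are hypothetical.

```python
import math


def time_aware_dissimilarity(x_i, x_j, t_i, t_j, max_feature_dist, max_time_gap):
    """Illustrative dissimilarity in [0, 1] (assumed form, not the paper's formula).

    The result is 0 only when the two observations coincide in both the
    description space and time; it approaches 1 when either component
    reaches its maximum.
    """
    # Normalized Euclidean distance in the description space, capped at 1.
    feature_term = min(math.dist(x_i, x_j) / max_feature_dist, 1.0)
    # Normalized gap between the two timestamps, capped at 1.
    time_term = min(abs(t_i - t_j) / max_time_gap, 1.0)
    # Pairs are similar only if they are close in BOTH components.
    return 1.0 - (1.0 - feature_term) * (1.0 - time_term)
```

Under this assumed form, two observations with identical descriptions that are separated by half of the maximum time gap get a dissimilarity of 0.5 rather than 0.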

2009 ◽  
pp. 150-171 ◽  
Author(s):  
Shilin Wang ◽  
Alan Wee-Chung Liew ◽  
Wing Hong Lau ◽  
Shu Hung Leung

As the first step of many visual speech recognition and visual speaker authentication systems, robust and accurate lip region segmentation is of vital importance for lip image analysis. However, most current techniques break down when dealing with lip images that have complex and inhomogeneous background regions, such as those containing mustaches and beards. To solve this problem, a Multi-class, Shape-guided FCM (MS-FCM) clustering algorithm is proposed in this chapter. In the proposed approach, one cluster is set for the lip region and a combination of multiple clusters is set for the background, which generally includes the skin region, lip shadow or beards. Using the spatial distribution of the lip cluster, a spatial penalty term that considers spatial location information is introduced and incorporated into the objective function, such that pixels having similar color but located in different regions can be differentiated. Experimental results show that the proposed algorithm provides an accurate lip-background partition even for images with complex background features.
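For context, the membership update of standard fuzzy c-means (FCM), on which shape-guided variants build, can be sketched as follows. This is the textbook FCM rule only; the spatial penalty term of MS-FCM is not given in the abstract and is not reproduced here. The function name is hypothetical.

```python
def fcm_memberships(distances, m=2.0):
    """Standard FCM memberships of one point, given its distance to each center.

    distances: non-negative distances from the point to every cluster center.
    m: fuzzifier (> 1); larger values give softer memberships.
    """
    # A point lying exactly on one or more centers belongs fully to them.
    zeros = [i for i, d in enumerate(distances) if d == 0.0]
    if zeros:
        return [1.0 / len(zeros) if i in zeros else 0.0
                for i in range(len(distances))]
    exponent = 2.0 / (m - 1.0)
    # u_i = 1 / sum_j (d_i / d_j)^(2 / (m - 1))
    return [1.0 / sum((d_i / d_j) ** exponent for d_j in distances)
            for d_i in distances]
```

With `m=2.0`, a pixel at distances 1 and 2 from two centers receives memberships 0.8 and 0.2, so color similarity alone decides the assignment; a spatial penalty is what allows location to influence it as well.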


Gut ◽  
2019 ◽  
Vol 68 (7) ◽  
pp. 1169-1179 ◽  
Author(s):  
Tao Zuo ◽  
Xiao-Juan Lu ◽  
Yu Zhang ◽  
Chun Pan Cheung ◽  
Siu Lam ◽  
...  

Objective: The pathogenesis of UC relates to gut microbiota dysbiosis. We postulate that alterations in the viral community populating the intestinal mucosa play an important role in UC pathogenesis. This study aims to characterise the mucosal virome and their functions in health and UC.
Design: Deep metagenomics sequencing of virus-like particle preparations and bacterial 16S rRNA sequencing were performed on the rectal mucosa of 167 subjects from three different geographical regions in China (UC=91; healthy controls=76). Virome and bacteriome alterations in UC mucosa were assessed and correlated with patient metadata. We applied partition around medoids clustering algorithm and classified mucosa viral communities into two clusters, referred to as mucosal virome metacommunities 1 and 2.
Results: In UC, there was an expansion of mucosa viruses, particularly Caudovirales bacteriophages, and a decrease in mucosa Caudovirales diversity, richness and evenness compared with healthy controls. Altered mucosal virome correlated with intestinal inflammation. Interindividual dissimilarity between mucosal viromes was higher in UC than controls. Escherichia phage and Enterobacteria phage were more abundant in the mucosa of UC than controls. Compared with metacommunity 1, metacommunity 2 was predominated by UC subjects and displayed a significant loss of various viral species. Patients with UC showed substantial abrogation of diverse viral functions, whereas multiple viral functions, particularly functions of bacteriophages associated with host bacteria fitness and pathogenicity, were markedly enriched in UC mucosa. Intensive transkingdom correlations between mucosa viruses and bacteria were significantly depleted in UC.
Conclusion: We demonstrated for the first time that UC is characterised by substantial alterations of the mucosa virobiota with functional distortion. Enrichment of Caudovirales bacteriophages, increased phage/bacteria virulence functions and loss of viral-bacterial correlations in the UC mucosa highlight that mucosal virome may play an important role in UC pathogenesis.


Complexity ◽  
2020 ◽  
Vol 2020 ◽  
pp. 1-16
Author(s):  
M. A. Balafar ◽  
R. Hazratgholizadeh ◽  
M. R. F. Derakhshi

Constrained clustering is intended to improve accuracy and personalization based on the constraints expressed by an Oracle. In this paper, a new constrained clustering algorithm is proposed in which some of the informative data pairs are selected during an iterative process. They are then presented to the Oracle, which answers whether their relation is “Must-link” (ML) or “Cannot-link” (CL). In each iteration, a support vector machine (SVM) is first trained on the labels produced by the current clustering. The distance matrix is created from the distance of each document to the hyperplane, and the similarity matrix is created from the cosine similarity of the word2vector representation of each document. Two types of probability (similarity and degree of similarity) are calculated and smoothed for belonging to neighborhoods. Neighborhoods are formed from the samples that the Oracle has labeled as being in the same cluster. Finally, at the end of each iteration, the data with the greatest level of uncertainty (in terms of probability) are selected for questioning the Oracle. For evaluation, the proposed method is compared with well-known state-of-the-art methods based on two criteria and over a standard dataset. The results demonstrate increased accuracy and stability with fewer questions.
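The similarity matrix mentioned in the abstract relies on cosine similarity between document vectors, which can be sketched in plain Python. Only the cosine computation is shown; how the paper builds the document vectors and combines this matrix with the SVM distances is not specified in the abstract. The function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(vectors):
    """Pairwise cosine similarities; entry [i][j] compares document i with document j."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]
```

Because cosine similarity ignores vector length, two documents whose vectors point in the same direction score 1 regardless of their magnitudes.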


2017 ◽  
Vol 25 (2) ◽  
pp. 96-113 ◽  
Author(s):  
Matin Macktoobian ◽  
Mahdi Aliyari Sh

A spatially-constrained clustering algorithm is presented in this paper. This algorithm is a distributed clustering approach that fine-tunes the optimal distances between agents of the system, using a set of spatial constraints, to strengthen the data passing among them. This method increases interconnectivity among agents and clusters, leading to improvement of the overall communicative functionality of the multi-robot system. The strategy leads to the establishment of loosely-coupled connections among the clusters. These implicit interconnections mobilize the clusters to receive and transmit information within the multi-agent system. In other words, this algorithm classifies each agent into the cluster with the lowest cost of local communication with its peers. This research demonstrates that the presented decentralized method boosts the communicative agility of the swarm by probabilistic proof of the acquired optimality. Hence, the common assumption regarding full knowledge of the agents’ primary locations has been fully relaxed compared to former methods. Consequently, the algorithm’s reliability and efficiency are confirmed. Furthermore, the method’s efficacy in passing information will improve the functionality of higher-level swarm operations, such as task assignment and swarm flocking. Analytical investigations and simulations of highly-populated swarms support the claimed efficiency and coherence.


Author(s):  
Iffat Gheyas ◽  
Simon Parkinson ◽  
Saad Khan

In this paper, we propose a fully autonomous density-based clustering algorithm named ‘Ocean’, which is inspired by the oceanic landscape and phenomena that occur in it. Ocean is an improvement over conventional algorithms regarding both distance metric and the clustering mechanism. Ocean defines the distance between two categories as the difference in the relative densities of categories. Unlike existing approaches, Ocean neither assigns the same distance to all pairs of categories, nor assigns arbitrary weights to matches and mismatches between categories that can lead to clustering errors. Ocean uses density ratios of adjacent regions in multidimensional space to detect the edges of the clusters. Ocean is robust against clusters of identical patterns. Unlike conventional approaches, Ocean neither makes any assumption regarding the data distribution within clusters, nor requires tuning of free parameters. Empirical evaluations demonstrate improved performance of Ocean over existing approaches.


Author(s):  
Erliang Zeng ◽  
Chengyong Yang ◽  
Tao Li ◽  
Giri Narasimhan

Clustering of gene expression data is a standard exploratory technique used to identify closely related genes. Many other sources of data are also likely to be of great assistance in the analysis of gene expression data. This data provides a means to begin elucidating the large-scale modular organization of the cell. The authors consider the challenging task of developing exploratory analytical techniques to deal with multiple complete and incomplete information sources. The Multi-Source Clustering (MSC) algorithm developed performs clustering with multiple, but complete, sources of data. To deal with incomplete data sources, the authors adopted the MPCK-means clustering algorithm to perform exploratory analysis on one complete source and other potentially incomplete sources provided in the form of constraints. This paper presents a new clustering algorithm, MSC, to perform exploratory analysis using two or more diverse but complete data sources; studies the effectiveness of constraint sets and the robustness of the constrained clustering algorithm using multiple sources of incomplete biological data; and incorporates such incomplete data into the constrained clustering algorithm in the form of constraint sets.
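Pairwise constraints of the kind used by constrained clustering methods such as MPCK-means are commonly expressed as must-link and cannot-link pairs. The following minimal sketch only counts how many constraints a given cluster assignment violates; it is not an implementation of MPCK-means or MSC, and the function name and data layout are assumptions for illustration.

```python
def count_violations(labels, must_link, cannot_link):
    """Count violated pairwise constraints for a cluster assignment.

    labels: cluster label of each item, indexed by item position.
    must_link: pairs (i, j) that should share a cluster.
    cannot_link: pairs (i, j) that should be in different clusters.
    Returns (violated must-link count, violated cannot-link count).
    """
    ml_violations = sum(1 for i, j in must_link if labels[i] != labels[j])
    cl_violations = sum(1 for i, j in cannot_link if labels[i] == labels[j])
    return ml_violations, cl_violations
```

An incomplete data source can contribute constraints only for the items it covers, which is why this representation suits sources that do not describe every gene.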


Author(s):  
Helton Hugo de Carvalho Júnior ◽  
Robson Luiz Moreno ◽  
Tales Cleber Pimenta

This chapter presents the viability analysis and the development of an embedded system for heart disease identification. It reduces electrocardiogram (ECG) signal processing time by reducing the number of data samples without any significant loss. The goal of the developed system is the analysis of heart signals. The ECG signals are fed into the system, which performs an initial filtering and then uses a Gustafson-Kessel fuzzy clustering algorithm for signal classification and correlation. The classification indicates common heart diseases such as angina, myocardial infarction and coronary artery diseases. The system uses the European electrocardiogram ST-T Database (EDB) as a reference for tests and evaluation. The results show that the system can perform heart disease detection on a data set reduced from 213 to just 20 samples, that is, to just 9.4% of the original set, while maintaining the same effectiveness. The system is validated on a Xilinx Spartan®-3A FPGA, which implements a Xilinx Microblaze® Soft-Core Processor running at a 50 MHz clock rate.
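The reported reduction can be checked with a short calculation. The sample counts are taken from the abstract; the variable names are illustrative.

```python
original_samples = 213  # sample count before reduction, from the abstract
reduced_samples = 20    # sample count after reduction, from the abstract

# Share of the original data that is kept, as a percentage.
kept_percent = 100.0 * reduced_samples / original_samples
# Share of the original data that is discarded.
removed_percent = 100.0 - kept_percent

print(f"kept: {kept_percent:.1f}%, removed: {removed_percent:.1f}%")
```

Keeping 20 of 213 samples corresponds to roughly 9.4% of the original set, so about 90.6% of the samples are discarded.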

