robust clustering
Recently Published Documents


Total documents: 230 (five years: 58)
H-index: 26 (five years: 5)

Author(s): Pedro C. Álvarez-Esteban, Luis A. García-Escudero

Abstract: A robust approach for clustering functional directional data is proposed. The proposal adapts “impartial trimming” techniques to this particular framework. Impartial trimming lets the dataset itself determine which curves appear to be the most outlying. A feasible algorithm, justified by some theoretical properties, is proposed for its practical implementation. A “warping” approach is also introduced, which allows controlled time warping to be included in the robust clustering procedure in order to detect typical “templates”. The proposed methodology is illustrated on a real data analysis problem, where it is applied to cluster aircraft trajectories.
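To make the idea of impartial trimming concrete, the following is a minimal sketch of trimmed k-means on ordinary points in the plane. It is not the authors' algorithm for functional directional data: the Euclidean distance, the function name `trimmed_kmeans`, the parameter `alpha` (fraction of points to trim) and the toy data are all assumptions made for illustration.

```python
import math

def trimmed_kmeans(points, init_centers, alpha, n_iter=50):
    """Trimmed k-means sketch: ignore the alpha fraction of points
    that lie farthest from their nearest center (hypothetical helper)."""
    n_keep = len(points) - int(alpha * len(points))
    centers = [tuple(c) for c in init_centers]
    for _ in range(n_iter):
        # (distance to nearest center, point index, center index), sorted by distance
        nearest = sorted(
            min((math.dist(p, c), i, j) for j, c in enumerate(centers))
            for i, p in enumerate(points)
        )
        # Impartial trimming: the data decide which points are discarded.
        kept = nearest[:n_keep]
        new_centers = []
        for j, c in enumerate(centers):
            members = [points[i] for _, i, jj in kept if jj == j]
            # Mean of the kept members; an empty cluster keeps its old center.
            new_centers.append(
                tuple(sum(x) / len(members) for x in zip(*members)) if members else c
            )
        if new_centers == centers:
            break
        centers = new_centers
    labels = {i: j for _, i, j in kept}
    trimmed = sorted(set(range(len(points))) - set(labels))
    return centers, labels, trimmed

# Toy data (made up): two tight groups and one gross outlier at index 6.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, -50)]
centers, labels, trimmed = trimmed_kmeans(points, [(0, 0), (10, 10)], alpha=0.15)
print(trimmed)
```

Because the trimmed set is recomputed in every iteration, a point discarded early can re-enter later if the centers move toward it.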


Sensors, 2021, Vol 21 (24), pp. 8220
Author(s): Stephen Clark, Nik Lomax, Michelle Morris, Francesca Pontin, Mark Birkin

Many researchers are beginning to adopt wrist-worn accelerometers to objectively measure personal activity levels. Data from these devices are often used to summarise such activity in terms of averages, variances, exceedances, and patterns within a profile. In this study, we report the development of a clustering that uses the whole activity profile. This was achieved by applying the robust clustering technique of k-medoids to an extensive data set of over 90,000 activity profiles, collected as part of the UK Biobank study. We identified nine distinct activity profiles in these data, which captured both the pattern of activity throughout a week and the intensity of the activity: “Active 9 to 5”, “Active”, “Morning Movers”, “Get up and Active”, “Live for the Weekend”, “Moderates”, “Leisurely 9 to 5”, “Sedate” and “Inactive”. These patterns are differentiated by sociodemographic, socioeconomic, health, and circadian rhythm data collected by UK Biobank. The utility of these findings is that they sit alongside existing summary measures of physical activity and provide a way to typify distinct activity patterns that may help to explain other health and morbidity outcomes, e.g., BMI or COVID-19. This research will be returned to the UK Biobank for other researchers to use.
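As an illustration of why k-medoids is considered robust, here is a minimal sketch of the simple alternating variant (assign, then re-pick each medoid), not the study's actual pipeline. The four-value "profiles", the Euclidean distance and the names `assign` and `k_medoids` are assumptions for this example.

```python
import math

def assign(profiles, medoids):
    """Label each profile with the position of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda j: math.dist(p, profiles[medoids[j]]))
        for p in profiles
    ]

def k_medoids(profiles, init_medoids, n_iter=100):
    """Alternating k-medoids sketch; medoids are indices into `profiles`."""
    medoids = list(init_medoids)
    for _ in range(n_iter):
        labels = assign(profiles, medoids)
        new_medoids = []
        for j, old in enumerate(medoids):
            members = [i for i, lab in enumerate(labels) if lab == j]
            # The new medoid is the member with the smallest total distance
            # to all other members of its cluster.
            cost = lambda c: sum(math.dist(profiles[c], profiles[i]) for i in members)
            new_medoids.append(min(members, key=cost) if members else old)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(profiles, medoids)

# Made-up activity profiles; index 6 has one extreme reading.
profiles = [
    [1, 5, 5, 1], [1, 6, 5, 1], [2, 5, 6, 1],
    [0, 0, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0],
    [1, 5, 5, 40],
]
medoids, labels = k_medoids(profiles, init_medoids=[1, 5])
print(medoids, labels)
```

Each cluster representative is always an observed profile, so a single extreme reading cannot drag it away in the way it would drag a mean.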


Author(s): Andreas Wunsch, Tanja Liesch, Stefan Broda

Abstract: Hydrograph clustering helps to identify dynamic patterns within aquifer systems, an important foundation for characterizing groundwater systems and their influences, which is necessary to manage groundwater resources effectively. We develop an unsupervised modeling approach to characterize and cluster hydrographs at regional scale according to their dynamics. We apply feature-based clustering to improve the exploitation of heterogeneous datasets, explore the usefulness of existing features, and propose new features specifically suited to describing groundwater hydrographs. The clustering itself is based on a combination of Self-Organizing Maps with a modified DS2L algorithm, which automatically derives the number of clusters but also allows the level of detail of the clustering to be adjusted. We further develop a framework that combines these methods with ensemble modeling, internal cluster validation indices, resampling, and consensus voting to obtain a robust clustering result and remove arbitrariness from the feature selection process. Further, we propose a measure to sort hydrographs within clusters, useful for both interpretability and visualization. We test the framework with weekly data from the Upper Rhine Graben System, using more than 1800 hydrographs from a period of 30 years (1986-2016). The results show that our approach is capable of identifying homogeneous groups of hydrograph dynamics. The resulting clusters show both known and previously unknown spatial patterns, some of which correspond clearly to external controlling factors, such as intensive groundwater management in the northern part of the test area. The framework is easily transferable to other regions and, by adapting the describing features, also to other time-series clustering applications.
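The consensus step can be sketched in a generic form: count how often each pair of series lands in the same cluster across ensemble runs, then group the pairs that co-occur in a majority of runs. This is only an illustration of consensus voting, not the authors' SOM and DS2L framework. The function names, the 0.5 threshold and the example labelings are assumptions.

```python
def co_association(labelings):
    """Fraction of runs in which items i and j share a cluster."""
    n, m = len(labelings[0]), len(labelings)
    return [
        [sum(run[i] == run[j] for run in labelings) / m for j in range(n)]
        for i in range(n)
    ]

def consensus_clusters(labelings, threshold=0.5):
    """Group items linked by co-association above `threshold`
    (connected components of the thresholded co-association graph)."""
    co = co_association(labelings)
    n = len(co)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Four made-up ensemble runs over six series; label values need not
# match between runs, only the groupings matter.
runs = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [2, 2, 2, 5, 5, 5],
]
print(consensus_clusters(runs))
```

Working on pairwise co-occurrence avoids having to match cluster labels between runs, which is why individual runs may use different label values or even different numbers of clusters.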


2021, Vol 2021, pp. 1-16
Author(s): Melisew Tefera Belachew

Determining the number of clusters in high-dimensional real-life datasets and interpreting the final outcome are among the challenging problems in data science. Discovering the number of classes in cancer and microarray data plays a vital role in the treatment and diagnosis of cancers and other related diseases. Nonnegative matrix factorization (NMF) plays an important role as an efficient exploratory tool for extracting basis features inherent in massive data. Some algorithms based on incorporating sparsity constraints in the nonconvex NMF optimization problem have been applied in the past to analyze microarray datasets. However, to the best of our knowledge, none of these algorithms uses the block coordinate descent method, which is known for providing closed-form solutions. In this paper, we apply an algorithm based on columnwise partitioning and rank-one matrix approximation. We test this algorithm on two well-known cancer datasets: leukemia and multiple myeloma. The numerical results indicate that the proposed algorithm performs significantly better than related state-of-the-art methods. In particular, it is shown that this method is capable of robust clustering and of discovering larger cancer classes in which the cluster splits are stable.
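To show what a closed-form block coordinate descent update looks like, the sketch below uses the standard rank-one (HALS-style) NMF updates without sparsity constraints. It is a stand-in for illustration, not the paper's algorithm. The function names, the small `eps` guard and the toy matrix are assumptions.

```python
def sq_error(X, W, H):
    """Squared Frobenius error of the approximation X ~ W H."""
    r = len(H)
    return sum(
        (X[i][j] - sum(W[i][l] * H[l][j] for l in range(r))) ** 2
        for i in range(len(X)) for j in range(len(X[0]))
    )

def nmf_rank_one_bcd(X, W0, H0, n_iter=100, eps=1e-12):
    """HALS-style NMF: update one column of W and one row of H at a time,
    each by a closed-form nonnegative least-squares step."""
    m, n, r = len(X), len(X[0]), len(H0)
    W = [row[:] for row in W0]
    H = [row[:] for row in H0]
    for _ in range(n_iter):
        for k in range(r):
            # Residual of X after removing every rank-one term except the k-th.
            R = [
                [X[i][j] - sum(W[i][l] * H[l][j] for l in range(r) if l != k)
                 for j in range(n)]
                for i in range(m)
            ]
            hh = sum(h * h for h in H[k]) + eps  # eps avoids division by zero
            for i in range(m):
                W[i][k] = max(0.0, sum(R[i][j] * H[k][j] for j in range(n)) / hh)
            ww = sum(W[i][k] ** 2 for i in range(m)) + eps
            for j in range(n):
                H[k][j] = max(0.0, sum(W[i][k] * R[i][j] for i in range(m)) / ww)
    return W, H

# Made-up nonnegative matrix with two obvious blocks.
X = [[2.0, 2.0, 0.0], [2.0, 2.0, 0.0], [0.0, 0.0, 3.0]]
W0 = [[1.0, 0.5], [1.0, 0.5], [0.5, 1.0]]
H0 = [[1.0, 1.0, 0.5], [0.5, 0.5, 1.0]]
W, H = nmf_rank_one_bcd(X, W0, H0)
# A common way to read clusters off an NMF: assign each column of X
# to the row of H with the largest coefficient.
clusters = [max(range(len(H)), key=lambda k: H[k][j]) for j in range(len(X[0]))]
print(clusters)
```

Each inner step solves a one-block least-squares problem exactly and then clips to zero, which is what keeps the updates closed-form.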


2021
Author(s): Marc Creixell, Aaron Samuel Meyer

Cell signaling is orchestrated in part through a network of protein kinases and phosphatases. Dysregulation of kinase signaling is widespread in diseases such as cancer and is readily targetable through inhibitors of kinase enzymatic activity. Mass spectrometry-based analysis of kinase signaling can provide a global view of kinase signaling regulation, but making sense of these data is complicated by their stochastic coverage of the proteome, the measurement of substrates rather than kinase signaling itself, and the scale of the data collected. Here, we implement a dual data and motif clustering strategy (DDMC) that simultaneously clusters substrate peptides into similarly regulated groups based on their variation within an experiment and their sequence profile. We show that this can help to identify putative upstream kinases and yield more robust clustering. We apply this clustering to large-scale clinical proteomic profiling of lung cancer and identify conserved proteomic signatures of tumorigenicity, genetic mutations, and tumor immune infiltration. We propose that DDMC provides a general and flexible clustering strategy for the analysis of phosphoproteomic data.

