ABOUT PARAMETRIZATION OF SELECTION OF SIGNIFICANT CLUSTERS

2019 ◽  
Vol 4 (1) ◽  
pp. 64-67
Author(s):  
Pavel Kim

One of the fundamental tasks of cluster analysis is to partition multidimensional data samples into clusters: groups of objects that are close in the sense of some given similarity measure. In some problems the number of clusters is set a priori, but more often it must be determined while solving the clustering problem. With a large number of clusters, especially if the data are “noisy,” the results become difficult for experts to analyze, so the number of clusters under consideration is reduced artificially. Formal means of merging “neighboring” clusters are considered, providing a basis for parameterizing the number of significant clusters in the “natural” clustering model [1].
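The idea of merging “neighboring” clusters under a tunable parameter can be illustrated with a minimal sketch. It assumes that clusters are summarized by their centroids and that two clusters count as neighbors when their centroids lie within a distance `threshold`. Both the function name and this merging rule are illustrative assumptions, not the formal procedure of the paper.

```python
from math import dist


def merge_neighbouring_clusters(centroids, threshold):
    """Group cluster indices whose centroids are chained together by
    distances <= threshold (an assumed merging rule, for illustration)."""
    parent = list(range(len(centroids)))

    def find(i):
        # Follow parent links to the group's root, shortening the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(centroids)):
        for j in range(i + 1, len(centroids)):
            if dist(centroids[i], centroids[j]) <= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(centroids)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical centroids of five initial clusters.
centroids = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0), (5.2, 5.1), (10.0, 0.0)]
merged = merge_neighbouring_clusters(centroids, threshold=1.0)
print(merged)
```

Raising `threshold` lowers the number of clusters that remain, so in this sketch the threshold plays the role of the parameter that controls how many significant clusters are kept.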

1990 ◽  
Vol 29 (03) ◽  
pp. 200-204 ◽  
Author(s):  
J. A. Koziol

Abstract. A basic problem of cluster analysis is the determination or selection of the number of clusters evinced in any set of data. We address this issue with multinomial data using Akaike’s information criterion and demonstrate its utility in identifying an appropriate number of clusters of tumor types with similar profiles of cell surface antigens.
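As a rough illustration of how Akaike’s information criterion can compare candidate groupings of multinomial data, the sketch below scores a partition of count profiles with AIC = 2p − 2 ln L. It assumes that all rows in a cluster share one vector of category probabilities, estimated from the pooled counts, and it drops the multinomial coefficients because they do not depend on the partition. The function name and the toy counts are hypothetical and are not taken from the paper.

```python
from math import log


def multinomial_aic(counts, partition):
    """AIC of a partition of count rows, assuming one shared multinomial
    probability vector per cluster (illustrative model, not the paper's)."""
    n_categories = len(counts[0])
    log_lik = 0.0
    for cluster in partition:
        # Pool the counts of all rows assigned to this cluster.
        pooled = [sum(counts[r][j] for r in cluster) for j in range(n_categories)]
        total = sum(pooled)
        # Maximized log-likelihood up to a partition-independent constant.
        log_lik += sum(n * log(n / total) for n in pooled if n > 0)
    # Each cluster contributes (n_categories - 1) free probabilities.
    n_params = len(partition) * (n_categories - 1)
    return 2 * n_params - 2 * log_lik


# Hypothetical counts: four profiles over two categories.
counts = [[90, 10], [88, 12], [10, 90], [12, 88]]
candidates = {
    1: [[0, 1, 2, 3]],
    2: [[0, 1], [2, 3]],
    4: [[0], [1], [2], [3]],
}
best_k = min(candidates, key=lambda k: multinomial_aic(counts, candidates[k]))
print(best_k)
```

The penalty term grows with the number of clusters, so a finer partition is preferred only when it improves the fit by more than the added parameters cost.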


2017 ◽  
Vol 13 (2) ◽  
pp. 1-12 ◽  
Author(s):  
Jungmok Ma

One of the major obstacles in applying the k-means clustering algorithm is the selection of the number of clusters k. A k-means clustering algorithm based on multi-attribute utility theory (MAUT) is proposed to tackle the problem by incorporating user preferences. Using MAUT, the decision maker's value structure for the number of clusters and other attributes can be modeled quantitatively and used as an objective function of the k-means. A target clustering problem from the military targeting process is used to demonstrate the MAUT-based k-means and to provide a comparative study. The results show that existing clustering algorithms do not necessarily reflect user preferences, while the MAUT-based k-means provides a systematic framework for preference modeling in cluster analysis.


2008 ◽  
Vol 2 (1) ◽  
pp. 65-70 ◽  
Author(s):  
M. Cabello ◽  
J. A. G. Orza ◽  
V. Galiano ◽  
G. Ruiz

Abstract. Backtrajectory differences and clustering sensitivity to the meteorological input data are studied. Trajectories arriving in Southeast Spain (Elche) at 3000, 1500 and 500 m for the 7-year period 2000–2006 have been computed using two widely used meteorological data sets: the NCEP/NCAR Reanalysis and the FNL data sets. Differences between trajectories grow linearly at least up to 48 h and grow faster after 72 h. A k-means cluster analysis performed on each set of trajectories shows differences in the identified clusters (main flows), partially because the number of clusters in each clustering solution differs for the trajectories arriving at 3000 and 1500 m. Trajectory membership in the identified flows is in general more sensitive to the input meteorological data than to the initial selection of cluster centroids.


2020 ◽  
Vol 35 (4) ◽  
pp. 1879-1894
Author(s):  
Jonas M. B. Haslbeck ◽  
Dirk U. Wulff

Abstract We improve instability-based methods for the selection of the number of clusters k in cluster analysis by developing a corrected clustering distance that corrects for the unwanted influence of the distribution of cluster sizes on cluster instability. We show that our corrected instability measure outperforms current instability-based measures across the whole sequence of possible k, overcoming limitations of current instability-based methods for large k. We also compare, for the first time, model-based and model-free approaches to determining cluster instability and find their performance to be comparable. We make our method available in the R-package .
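For orientation, instability-based methods rest on a distance between two clusterings of the same objects. A minimal sketch of one common uncorrected form is given below: the fraction of object pairs on which the two clusterings disagree about whether the pair shares a cluster. This is an assumed baseline for illustration only; it is not the corrected distance proposed in the paper, and the function name is hypothetical.

```python
from itertools import combinations


def clustering_distance(labels_a, labels_b):
    """Fraction of object pairs that are co-clustered in one labelling
    but separated in the other (uncorrected pairwise disagreement)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    disagreements = sum(
        (labels_a[i] == labels_a[j]) != (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return disagreements / len(pairs)


# The same partition under different label names has distance 0.
same = clustering_distance([0, 0, 1, 1], [1, 1, 0, 0])
# A partition that cuts across the first one disagrees on 4 of 6 pairs.
crossed = clustering_distance([0, 0, 1, 1], [0, 1, 0, 1])
print(same, crossed)
```

An instability estimate for a given k would average such distances over clusterings fitted to resampled data, and the k with the lowest instability would be selected.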


Author(s):  
Maria A. Milkova

Information now accumulates so rapidly that the usual concept of iterative search requires revision. In a world oversaturated with information, covering and analyzing a problem comprehensively places high demands on search methods. An innovative approach to search should flexibly take into account the large body of already accumulated knowledge and a priori requirements for the results. The results, in turn, should immediately provide a roadmap of the field being studied, with the option of as much detail as possible. Search based on topic modeling, so-called topic search, meets all these requirements and thereby streamlines work with information, increases the efficiency of knowledge production, and helps avoid cognitive biases in the perception of information, which is important at both the micro and macro levels. To demonstrate topic search, the article considers the task of analyzing an import substitution program on the basis of patent data. The program includes plans for 22 industries and contains more than 1,500 products and technologies proposed for import substitution. Patent search based on topic modeling makes it possible to search directly by blocks of a priori information (the terms of the industrial import substitution plans) and to obtain a selection of relevant documents for each industry. This approach not only provides a comprehensive picture of the effectiveness of the program as a whole, but also gives a more detailed, visual view of which groups of products and technologies have been patented.


2011 ◽  
Vol 8 (1) ◽  
pp. 201-210
Author(s):  
R.M. Bogdanov

The problem of determining the repair sections of a main oil pipeline is solved on the basis of the classification of images using distance functions and the clustering principle. The criteria characterizing a cluster are defined by given values; a defect is assigned to a cluster by comparison with these values. Procedures for redistributing defects among cluster zones are provided, and the cluster zone parameters are updated. Calculations demonstrate the range of variation in defect density across pipeline sections and the general capability of cluster analysis to configure linear objects with arbitrary density.
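The assignment step can be pictured with a small sketch. It assumes that each defect is described only by its position along the pipeline in kilometres, that a cluster zone is characterized by the mean position of its defects, and that a defect joins the nearest zone only if it lies within a given distance of that mean. The function name, the one-dimensional setting and the threshold rule are assumptions made for illustration; the abstract does not specify them.

```python
def assign_defects(positions_km, max_distance_km):
    """Assign each defect to the nearest cluster zone within the threshold,
    opening a new zone otherwise (assumed rule, for illustration)."""
    zones = []  # each zone is a list of defect positions
    for x in positions_km:
        best = None
        for zone in zones:
            centre = sum(zone) / len(zone)  # zone parameter, updated as defects join
            d = abs(x - centre)
            if d <= max_distance_km and (best is None or d < best[0]):
                best = (d, zone)
        if best is None:
            zones.append([x])
        else:
            best[1].append(x)
    return zones


# Hypothetical defect positions along a pipeline, in kilometres.
zones = assign_defects([1.0, 1.2, 1.5, 10.0, 10.4, 25.0], max_distance_km=1.0)
print([len(z) for z in zones])
```

Each resulting zone is a candidate repair section; the number of defects it holds, divided by its extent, gives a simple measure of local defect density.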


Author(s):  
Laure Fournier ◽  
Lena Costaridou ◽  
Luc Bidaut ◽  
Nicolas Michoux ◽  
Frederic E. Lecouvet ◽  
...  

Abstract Existing quantitative imaging biomarkers (QIBs) are associated with known biological tissue characteristics and follow a well-understood path of technical, biological and clinical validation before incorporation into clinical trials. In radiomics, novel data-driven processes extract numerous visually imperceptible statistical features from the imaging data with no a priori assumptions on their correlation with biological processes. The selection of relevant features (radiomic signature) and incorporation into clinical trials therefore requires additional considerations to ensure meaningful imaging endpoints. Also, the number of radiomic features tested means that power calculations would result in sample sizes impossible to achieve within clinical trials. This article examines how the process of standardising and validating data-driven imaging biomarkers differs from those based on biological associations. Radiomic signatures are best developed initially on datasets that represent diversity of acquisition protocols as well as diversity of disease and of normal findings, rather than within clinical trials with standardised and optimised protocols as this would risk the selection of radiomic features being linked to the imaging process rather than the pathology. Normalisation through discretisation and feature harmonisation are essential pre-processing steps. Biological correlation may be performed after the technical and clinical validity of a radiomic signature is established, but is not mandatory. Feature selection may be part of discovery within a radiomics-specific trial or represent exploratory endpoints within an established trial; a previously validated radiomic signature may even be used as a primary/secondary endpoint, particularly if associations are demonstrated with specific biological processes and pathways being targeted within clinical trials. 
Key Points
• Data-driven processes like radiomics risk false discoveries due to high-dimensionality of the dataset compared to sample size, making adequate diversity of the data, cross-validation and external validation essential to mitigate the risks of spurious associations and overfitting.
• Use of radiomic signatures within clinical trials requires multistep standardisation of image acquisition, image analysis and data mining processes.
• Biological correlation may be established after clinical validation but is not mandatory.

