Cell Type Hierarchy Reconstruction via Reconciliation of Multi-resolution Cluster Tree

2021 ◽  
Author(s):  
Minshi Peng ◽  
Brie Wamsley ◽  
Andrew Elkins ◽  
Daniel M Geschwind ◽  
Yuting Wei ◽  
...  

Abstract A wealth of clustering algorithms is available for single-cell RNA sequencing (scRNA-seq), but it remains challenging to compare and characterize the features across different scales of resolution. To resolve this challenge, Multi-resolution Reconciled Tree (MRtree) builds a hierarchical tree structure based on multi-resolution partitions; it is highly flexible and can be coupled with most scRNA-seq clustering algorithms. MRtree outperforms bottom-up or divisive hierarchical clustering approaches because it inherits the robustness and versatility of a flat clustering approach while maintaining the hierarchical structure of cells. Application to fetal brain cells yields insight into subtypes of cells that can be reliably estimated.
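
The idea of linking flat partitions obtained at different resolutions can be illustrated with a deliberately naive sketch: each fine-resolution cluster is attached to the coarse-resolution cluster with which it shares the most cells. This is not MRtree's reconciliation procedure, which the abstract does not detail; the function name `link_resolutions` and the toy labels are hypothetical.

```python
from collections import Counter

def link_resolutions(coarse, fine):
    """Map each fine cluster to the coarse cluster sharing the most cells.

    coarse, fine: per-cell cluster labels at two resolutions (same cell order).
    Naive majority-overlap linking, NOT the MRtree reconciliation algorithm.
    """
    overlap = {}
    for c, f in zip(coarse, fine):
        overlap.setdefault(f, Counter())[c] += 1
    return {f: counts.most_common(1)[0][0] for f, counts in overlap.items()}

# Toy labels for six cells; fine cluster "a2" straddles coarse clusters A and B.
coarse = ["A", "A", "A", "A", "B", "B"]
fine = ["a1", "a1", "a2", "a2", "a2", "b1"]
parents = link_resolutions(coarse, fine)
```

Here `a2` is attached to `A` because two of its three cells fall in `A`. A reconciliation method would additionally have to decide what to do with the minority cell, which majority linking simply ignores.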

Author(s):  
A. Suhaibah ◽  
U. Uznir ◽  
F. Anton ◽  
D. Mioc ◽  
A. A. Rahman

Supply Chain Management (SCM) is the management of the flow of products and goods from their point of origin to their point of consumption. The information and datasets gathered during SCM are massive and complex, because SCM spans several processes such as procurement, product development and commercialization, physical distribution, outsourcing and partnerships. In practice, SCM datasets need to be managed and maintained to provide better service to three main categories of users: distributors, customers and suppliers. To manage these datasets, a data constellation structure is used to accommodate the data in a spatial database. However, geospatial databases introduce several problems; for example, database performance deteriorates, especially during query operations. We strongly believe that a more practical hierarchical tree structure is required for efficient SCM processing. In addition, a three-dimensional approach is required for managing SCM datasets, since they involve multi-level locations such as shop lots and residential apartments. The 3D R-Tree has been increasingly used for 3D geospatial database management due to its simplicity and extensibility. However, it suffers from serious overlaps between nodes. In this paper, we propose a partition-based clustering for the construction of a hierarchical tree structure. Several datasets are tested using the proposed method, and the percentage of overlapping nodes and the volume coverage are computed and compared with the original 3D R-Tree and other practical approaches. The experiments substantiate that the hierarchical structure of the proposed partition-based clustering is capable of preserving minimal overlap and coverage. Query performance was tested using 300,000 points of an SCM dataset, and the results are presented in this paper. This paper also discusses the outlook of the structure for future reference.
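
Node overlap in a 3D R-Tree comes down to the intersection volume of axis-aligned bounding boxes. The sketch below shows one way that quantity could be computed. The box representation and the choice to express overlap as a percentage of the first box's volume are assumptions, since the abstract does not say how the paper defines its overlap percentage.

```python
def volume(box):
    """Volume of an axis-aligned box given as ((xmin, ymin, zmin), (xmax, ymax, zmax))."""
    lo, hi = box
    v = 1.0
    for a, b in zip(lo, hi):
        v *= (b - a)
    return v

def overlap_volume(a, b):
    """Intersection volume of two axis-aligned 3D boxes (0.0 if they do not overlap)."""
    vol = 1.0
    for lo_a, hi_a, lo_b, hi_b in zip(a[0], a[1], b[0], b[1]):
        side = min(hi_a, hi_b) - max(lo_a, lo_b)
        if side <= 0:
            return 0.0  # separated (or merely touching) along this axis
        vol *= side
    return vol

def overlap_percentage(a, b):
    """Overlap as a percentage of box a's volume (an assumed definition)."""
    return 100.0 * overlap_volume(a, b) / volume(a)

node_a = ((0, 0, 0), (2, 2, 2))
node_b = ((1, 1, 1), (3, 3, 3))
node_c = ((5, 5, 5), (6, 6, 6))
```

With these toy nodes, `node_a` and `node_b` share a unit cube, while `node_c` is disjoint from both.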


2013 ◽  
Vol 284-287 ◽  
pp. 3051-3055
Author(s):  
Lin Chih Chen

Academic search engines, such as Google Scholar and Scirus, provide a Web-based interface that helps researchers effectively find relevant scientific articles. However, current academic search engines lack the ability to cluster search results into a hierarchical tree structure. In this paper, we develop a post-search academic search engine using a mixed clustering method. In this method, we first adopt suffix tree clustering and a two-way hash mechanism to generate all meaningful labels. We then develop a divisive hierarchical clustering algorithm to organize the labels into a hierarchical tree. Based on the experimental results, we conclude that clustering the search results with our mixed clustering method gives significant performance gains over current academic search engines. In this paper, we make two contributions. First, we present a high-performance academic search engine based on our mixed clustering method. Second, we develop a divisive hierarchical clustering algorithm to organize all returned search results into a hierarchical tree structure.


2021 ◽  
Vol 10 (7) ◽  
pp. 432
Author(s):  
Nicolai Moos ◽  
Carsten Juergens ◽  
Andreas P. Redecker

This paper describes a methodological approach that is able to analyse socio-demographic and socio-economic data in large-scale spatial detail. Using two variables, population density and annual income, it investigates the spatial relationship of these variables to identify locations of imbalance or disparity, assisted by bivariate choropleth maps. The aim is to gain a deeper insight into spatial components of socioeconomic nexuses, such as the relationships between the two variables, especially for high-resolution spatial units. The methodology can assist political decision-making, target-group advertising in geo-marketing, and site searches for new shop locations, as well as further socioeconomic research and urban planning. The developed methodology was tested in a national case study in Germany and is easily transferable to other countries with comparable datasets. The analysis was carried out using data on population density and average annual income linked to spatially referenced polygons of postal codes. These were initially disaggregated via a readapted three-class dasymetric mapping approach and allocated to large-scale city block polygons. Univariate and bivariate choropleth maps generated from the resulting datasets were then used to identify and compare spatial economic disparities for a study area in North Rhine-Westphalia (NRW), Germany. Subsequently, based on these variables, a multivariate clustering approach was conducted for a demonstration area in Dortmund. The results show that the spatially disaggregated data allow more detailed insight into spatial patterns of socioeconomic attributes than the coarser data related to postal code polygons.
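
A bivariate choropleth map assigns each spatial unit a combined class built from the classes of two variables. The sketch below assumes three classes per variable with tercile breaks, giving up to nine combined classes. The abstract does not state the paper's class count or break method, so both are assumptions, and the input values are made up for illustration.

```python
from bisect import bisect_right
from statistics import quantiles

def tercile_class(values):
    """Return a class index 0, 1 or 2 per value, using tercile cut points."""
    cuts = quantiles(values, n=3)  # two cut points splitting the data in thirds
    return [bisect_right(cuts, v) for v in values]

def bivariate_classes(density, income):
    """Combine per-variable classes into labels such as '1-3' (low density, high income)."""
    d = tercile_class(density)
    i = tercile_class(income)
    return [f"{a + 1}-{b + 1}" for a, b in zip(d, i)]

# Hypothetical values for six city blocks.
density = [10, 20, 30, 40, 50, 60]
income = [60, 50, 40, 30, 20, 10]
classes = bivariate_classes(density, income)
```

Each label can then be mapped to one cell of a 3 × 3 colour legend. Blocks labelled `1-3` or `3-1` are the ones where the two variables diverge most.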


Author(s):  
R. R. Gharieb ◽  
G. Gendy ◽  
H. Selim

In this paper, the standard hard C-means (HCM) clustering approach to image segmentation is modified by incorporating weighted membership Kullback–Leibler (KL) divergence and local data information into the HCM objective function. The membership KL divergence, used for fuzzification, measures the proximity between each cluster membership function of a pixel and the locally-smoothed value of the membership in the pixel vicinity. The fuzzification weight is a function of the pixel-to-cluster-center distances. The pixel-to-cluster-center distance used is composed of the original pixel data distance plus a fraction of the distance generated from the locally-smoothed pixel data. It is shown that the obtained membership function of a pixel is proportional to the locally-smoothed membership function of this pixel multiplied by an exponentially distributed function of the minus pixel distance relative to the minimum distance provided by the nearest cluster-center to the pixel. Therefore, because it incorporates the locally-smoothed membership and data information in addition to the relative distance, which is more tolerant to additive noise than the absolute distance, the proposed algorithm has a threefold noise-handling process. The presented algorithm, named local data and membership KL divergence based fuzzy C-means (LDMKLFCM), is tested on synthetic and real-world noisy images, and its results are compared with those of several FCM-based clustering algorithms.
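
The proportionality described above can be sketched for a single pixel as follows. The abstract gives only the general shape of the membership function, so three things here are assumptions rather than the paper's formula: reading "relative distance" as the difference from the minimum distance, the scale parameter `gamma`, and the normalisation across clusters.

```python
import math

def memberships(distances, smoothed, gamma=1.0):
    """Membership of one pixel in each cluster.

    distances: pixel-to-cluster-center distances, one per cluster.
    smoothed:  locally-smoothed memberships of the pixel, one per cluster.
    gamma:     hypothetical scale parameter (not taken from the source).
    """
    d_min = min(distances)
    # Smoothed membership times an exponential of minus the relative distance.
    raw = [s * math.exp(-(d - d_min) / gamma) for d, s in zip(distances, smoothed)]
    total = sum(raw)
    return [r / total for r in raw]  # normalise so memberships sum to one

u = memberships(distances=[1.0, 3.0], smoothed=[0.5, 0.5])
```

Because only the difference from the nearest center enters the exponent, adding the same constant to every distance leaves the memberships unchanged in this sketch.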


2014 ◽  
Vol 2014 ◽  
pp. 1-11 ◽  
Author(s):  
Lopamudra Dey ◽  
Sanjay Chakraborty

Clustering is a technique whose significance and applications are spread over various fields. Clustering is an unsupervised process in data mining, which is why the proper evaluation of the results and the measurement of the compactness and separability of the clusters are important issues. The procedure of evaluating the results of a clustering algorithm is known as cluster validity measurement. Different types of indices are used to solve different types of problems, and index selection depends on the kind of available data. This paper first proposes a Canonical PSO based K-means clustering algorithm, analyses some important clustering indices (intercluster, intracluster), and then evaluates the effects of those indices on a real-time air pollution database and on wholesale customer, wine, and vehicle datasets using typical K-means, Canonical PSO based K-means, simple PSO based K-means, DBSCAN, and hierarchical clustering algorithms. This paper also describes the nature of the clusters and finally compares the performance of these clustering algorithms according to the validity assessment. It also identifies which of these algorithms is most desirable for producing properly compact clusters on these particular real-life datasets. It deals with the behaviour of these clustering algorithms with respect to validation indices and presents their evaluation results in mathematical and graphical forms.
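
Compactness and separability can be made concrete with two simple measures: the mean distance of points to their own centroid (intracluster) and the smallest distance between centroids (intercluster). These are common textbook definitions chosen for illustration. The abstract does not specify which index formulas the paper uses.

```python
from math import dist

def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def intra_cluster(clusters):
    """Mean distance from each point to its own cluster centroid (lower = more compact)."""
    total, count = 0.0, 0
    for pts in clusters.values():
        c = centroid(pts)
        total += sum(dist(p, c) for p in pts)
        count += len(pts)
    return total / count

def inter_cluster(clusters):
    """Smallest distance between any two cluster centroids (higher = better separated)."""
    cents = [centroid(pts) for pts in clusters.values()]
    return min(dist(a, b) for i, a in enumerate(cents) for b in cents[i + 1:])

# Toy clustering: label -> member points.
clusters = {0: [(0, 0), (0, 2)], 1: [(10, 0), (10, 2)]}
compactness = intra_cluster(clusters)
separation = inter_cluster(clusters)
```

A good clustering keeps `compactness` small relative to `separation`. Many validity indices combine the two, often as a ratio.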


2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Giji Kiruba ◽  
Benita

Abstract The energy performance of IoT-MWSNs may be improved by using a suitable clustering technique for integrating IoT sensors. Clustering, on the other hand, incurs additional overhead, such as determining the cluster head and forming clusters. Environmental Energy Attentive Clustering with Remote Nodes (E2ACRN) is a unique environmental energy attentive clustering approach for IoT-MWSNs proposed in this study. The cluster head (CH) in E2ACRN is determined entirely by weight. The residual energy of each IoT sensor and the local average energy of all IoT sensors in the cluster are used to calculate the weight. Inappropriately planned allocated clustering techniques might result in nodes being too far away from the CH. These distant nodes use more energy to communicate with the sink. The ambient average energy, remoteness among IoT sensors, and the sink are used to determine whether a distant node transmits its information to a CH in the previous cycle or to the sink, in order to lengthen lifetime. The simulation results of the current technique revealed that E2ACRN performs better than previous clustering algorithms.
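
Weight-based cluster-head selection can be sketched as below. The abstract says only that the weight depends on each sensor's residual energy and the cluster's local average energy. The ratio used here is an assumed stand-in, not the E2ACRN formula, and the node names and energy values are hypothetical.

```python
def select_cluster_head(residual_energy):
    """Pick the node with the highest weight as cluster head.

    residual_energy: dict mapping node id -> remaining energy (e.g. joules).
    Weight = residual energy / local average energy (assumed form).
    """
    local_avg = sum(residual_energy.values()) / len(residual_energy)
    weights = {node: e / local_avg for node, e in residual_energy.items()}
    head = max(weights, key=weights.get)
    return head, weights

# Hypothetical cluster of three sensors.
head, weights = select_cluster_head({"n1": 0.5, "n2": 1.0, "n3": 1.5})
```

With a ratio weight, nodes above the local average score higher than 1, so the head role tends to move to better-charged sensors as energy is consumed.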


Author(s):  
Juhi Singh ◽  
Mandeep Mittal ◽  
Sarla Pareek

Due to the increased availability of individual customer data, it is possible to predict customer buying patterns. Customers can be segmented using clustering algorithms based on parameters such as Recency, Frequency and Monetary (RFM) values. The data can further be analyzed to infer rules among two or more purchases of a customer. In this chapter we present a clustering algorithm, the enhanced k-means algorithm, which is based on the k-means algorithm, to divide customers into segments. After segmentation, each segment is mined with the Apriori algorithm to infer rules so that the customer's purchase behavior can be predicted. From a large number of association rules with sufficient coverage, the customer's purchasing pattern can be predicted. An experiment on a real database is carried out to evaluate the effectiveness and utility of the approach. The results show that the proposed approach provides good insight into customer segmentation, so that customer behavior can be predicted.
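
Before any clustering, each customer has to be reduced to an RFM feature vector. The sketch below shows one plausible way to derive it from raw transactions. The record layout, the function name `rfm`, and the sample data are assumptions made for illustration, and the enhanced k-means step itself is not reproduced.

```python
from datetime import date

def rfm(transactions, today):
    """Compute (recency_days, frequency, monetary) per customer.

    transactions: iterable of (customer_id, purchase_date, amount).
    today:        reference date for measuring recency.
    """
    table = {}
    for cust, day, amount in transactions:
        rec = table.setdefault(cust, {"last": day, "frequency": 0, "monetary": 0.0})
        rec["last"] = max(rec["last"], day)  # most recent purchase so far
        rec["frequency"] += 1                # number of purchases
        rec["monetary"] += amount            # total spend
    return {
        cust: ((today - rec["last"]).days, rec["frequency"], rec["monetary"])
        for cust, rec in table.items()
    }

# Hypothetical purchase log.
transactions = [
    ("c1", date(2024, 1, 5), 20.0),
    ("c1", date(2024, 1, 25), 30.0),
    ("c2", date(2024, 1, 10), 100.0),
]
features = rfm(transactions, today=date(2024, 1, 31))
```

The resulting triples would normally be rescaled before clustering, since monetary values can otherwise dominate distance computations.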


Author(s):  
Wilson Wong

Feature-based semantic measurements have played a dominant role in conventional data clustering algorithms for many existing applications. However, the applicability of existing data clustering approaches to a wider range of applications is limited due to issues such as complexity involved in semantic computation, long pre-processing time required for feature preparation, and poor extensibility of semantic measurement due to non-incremental feature source. This chapter first summarises the many commonly used clustering algorithms and feature-based semantic measurements, and then highlights the shortcomings to make way for the proposal of an adaptive clustering approach based on featureless semantic measurements. The chapter concludes with experiments demonstrating the performance and wide applicability of the proposed clustering approach.


Author(s):  
Gerald Schaefer

As image databases are growing, efficient and effective methods for managing such large collections are highly sought after. Content-based approaches have shown large potential in this area as they do not require textual annotation of images. However, while the query-by-example concept is at the moment the most commonly adopted retrieval method for image databases, it is only of limited practical use. Techniques which allow human-centred navigation and visualization of complete image collections therefore provide an interesting alternative. In this chapter we present an effective and efficient approach for user-centred navigation of large image databases. Image thumbnails are projected onto a spherical surface so that images that are visually similar are located close to each other in the visualization space. To avoid overlapping and occlusion effects, images are placed on a regular grid structure, while large databases are handled through a clustering technique paired with a hierarchical tree structure, which allows for an intuitive real-time browsing experience.


2020 ◽  
Vol 2020 ◽  
pp. 1-15
Author(s):  
Peng Zhang ◽  
Kun She

The goal of clustering analysis is to group a set of data points into several clusters based on similarity or distance. The similarity or distance is usually a scalar in numerous traditional clustering algorithms. Nevertheless, a vector, such as data gravitational force, contains more information than a scalar and can be applied in clustering analysis to improve clustering performance. Therefore, this paper proposes a three-stage hierarchical clustering approach called GHC, which takes advantage of the vector characteristic of data gravitational force inspired by the law of universal gravitation. In the first stage, a sparse gravitational graph is constructed based on the top k data gravitations between each data point and its neighbors in the local region. Then the sparse graph is partitioned into many subgraphs by the gravitational influence coefficient. In the last stage, a satisfactory clustering result is obtained by iteratively merging these subgraphs using a new linkage criterion. To demonstrate the performance of the GHC algorithm, experiments on synthetic and real-world data sets are conducted, and the results show that the GHC algorithm achieves better performance than other existing clustering algorithms.
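
The first stage relies on a force vector between data points. A minimal sketch is given below, assuming unit masses, a gravitational constant of 1 and the inverse-square law; the paper's own mass definition, neighbourhood restriction and influence coefficient are not given in the abstract and are not modelled here.

```python
from math import dist

def gravitation(p, q, mass_p=1.0, mass_q=1.0):
    """Force vector exerted on p by q: m_p * m_q * (q - p) / |q - p|^3 (G assumed 1)."""
    r = dist(p, q)  # p and q must be distinct points, otherwise r is zero
    scale = mass_p * mass_q / r ** 3
    return tuple(scale * (b - a) for a, b in zip(p, q))

def top_k_forces(points, i, k):
    """Return the k strongest forces acting on points[i] as (neighbor_index, force_vector)."""
    origin = (0.0,) * len(points[i])
    forces = []
    for j, q in enumerate(points):
        if j != i:
            f = gravitation(points[i], q)
            forces.append((dist(f, origin), j, f))  # magnitude first, for sorting
    forces.sort(reverse=True)
    return [(j, f) for _, j, f in forces[:k]]

points = [(0, 0), (1, 0), (0, 2), (5, 5)]
strongest = top_k_forces(points, i=0, k=2)
```

Keeping only the top `k` forces per point is what makes the resulting graph sparse. Unlike a scalar distance, each retained edge also carries a direction.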

