A Multiple-Label Guided Clustering Algorithm for Historical Document Dating and Localization

Regression-guided clustering is introduced as a means of constructing circulation-to-environment synoptic climatological classifications. Rather than applying an unsupervised clustering algorithm to synoptic-scale atmospheric circulation data, one instead augments the atmospheric circulation dataset with predictions from a supervised regression model linking circulation to environment. The combined dataset is then entered into the clustering algorithm. The level of influence of the environmental dataset can be controlled by a simple weighting factor. The method is generic in that the choice of regression model and clustering algorithm is left to the user. Examples are given using standard multivariate linear regression models and the k-means clustering algorithm, both established methods in synoptic climatology. Results for southern British Columbia, Canada, indicate that model performance can be made to range between that of a fully unsupervised algorithm and a fully supervised algorithm.
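The augmentation step described in this abstract can be illustrated with a minimal sketch. It assumes a single environmental variable, an ordinary least squares fit, and a plain k-means loop. The function names (`fit_linear`, `kmeans`, `regression_guided_clusters`) and the synthetic data are hypothetical and are not taken from the paper.

```python
import random

def fit_linear(X, y):
    """Least-squares fit with intercept via the normal equations (small problems only)."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented matrix [A^T A | A^T y]
    M = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(p):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][p] / M[i][i] for i in range(p)]

def predict(beta, X):
    return [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in X]

def kmeans(points, k, iters=50, seed=0):
    """Plain k-means with random initial centers; returns one label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k),
                      key=lambda c: sum((a - b) ** 2 for a, b in zip(pt, centers[c])))
                  for pt in points]
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def regression_guided_clusters(X, y, k, weight):
    """Append weighted regression predictions to X, then cluster the combined data."""
    y_hat = predict(fit_linear(X, y), X)
    augmented = [list(r) + [weight * v] for r, v in zip(X, y_hat)]
    return kmeans(augmented, k)

# Synthetic illustration only: two circulation-like features, one environmental variable.
X = [[0.0, 1.0], [0.2, 0.9], [0.1, 1.1], [5.0, 4.0], [5.2, 4.1], [4.9, 3.8]]
y = [1.0, 1.2, 0.9, 6.0, 6.3, 5.8]
labels = regression_guided_clusters(X, y, k=2, weight=1.0)
```

In this sketch, `weight=0` reduces to unsupervised k-means on the circulation data alone, while larger weights let the regression predictions dominate the distance calculation. Column scaling is omitted here and would likely matter in practice.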


“Follow the Leader”: A Centrality Guided Clustering and Its Application to Social Network Analysis

The Scientific World JOURNAL ◽

10.1155/2013/368568 ◽

2013 ◽

Vol 2013 ◽

pp. 1-9 ◽

Cited By ~ 8

Author(s):

Qin Wu ◽

Xingqin Qi ◽

Eddie Fuller ◽

Cun-Quan Zhang

Keyword(s):

Social Network ◽

Network Analysis ◽

Clustering Algorithm ◽

Data Sets ◽

Network Clustering ◽

Clustering Methods ◽

Social Network Data ◽

Centrality Score ◽

Vertex Centrality ◽

Guided Clustering

Within graph theory and network analysis, the centrality of a vertex measures its relative importance within a graph. Centrality plays a key role in network analysis and has been widely studied using different methods. Inspired by the idea of vertex centrality, a novel centrality guided clustering (CGC) is proposed in this paper. Unlike traditional clustering methods, which usually choose the initial center of a cluster randomly, the CGC algorithm starts from a “LEADER” (a vertex with the highest centrality score), and a new “member” is added to the same cluster as the “LEADER” when some criterion is satisfied. The CGC algorithm also supports overlapping membership. Experiments on three benchmark social network data sets are presented, and the results indicate that the proposed CGC algorithm works well in social network clustering.
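The leader-then-members idea can be sketched as follows. The abstract does not state the centrality measure or the membership criterion, so this sketch assumes degree centrality and a hypothetical rule: a vertex joins when at least a fraction `tau` of its neighbours are already in the cluster. It also produces non-overlapping clusters, unlike the CGC algorithm described.

```python
def degree_centrality(graph):
    """Degree used as a stand-in centrality score (an assumption, not the paper's choice)."""
    return {v: len(nbrs) for v, nbrs in graph.items()}

def leader_clusters(graph, tau=0.5):
    """Grow clusters from the most central unassigned vertex (the leader).

    graph: dict mapping each vertex to the set of its neighbours.
    tau:   hypothetical joining threshold on the fraction of a vertex's
           neighbours that are already in the cluster.
    """
    centrality = degree_centrality(graph)
    unassigned = set(graph)
    clusters = []
    while unassigned:
        # Leader: highest centrality among unassigned vertices; sorting makes ties deterministic.
        leader = max(sorted(unassigned), key=lambda v: centrality[v])
        cluster = {leader}
        unassigned.discard(leader)
        changed = True
        while changed:
            changed = False
            for v in sorted(unassigned):
                nbrs = graph[v]
                if nbrs and len(nbrs & cluster) / len(nbrs) >= tau:
                    cluster.add(v)
                    unassigned.discard(v)
                    changed = True
        clusters.append(cluster)
    return clusters

# Toy graph: two triangles joined by the single edge c-d.
toy_graph = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
toy_clusters = leader_clusters(toy_graph)
```

On the toy graph, the first leader is `c` and the second is `d`, and each triangle ends up as its own cluster.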


Combining an Evolution-guided Clustering Algorithm and Haplotype-based LRT in Family Association Studies

BMC Genetics ◽

10.1186/1471-2156-12-48 ◽

2011 ◽

Vol 12 (1) ◽

pp. 48 ◽

Cited By ~ 3

Author(s):

Mei-Hsien Lee ◽

Jung-Ying Tzeng ◽

Su-Yun Huang ◽

Chuhsing Hsiao

Keyword(s):

Clustering Algorithm ◽

Association Studies ◽

Guided Clustering


Distributed Entropy Energy-Efficient Clustering algorithm for cluster head selection (DEEEC)

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-189135 ◽

2020 ◽

Vol 39 (6) ◽

pp. 8139-8147

Author(s):

Ranganathan Arun ◽

Rangaswamy Balamurugan

Keyword(s):

Energy Efficient ◽

Clustering Algorithm ◽

Cluster Head ◽

Residual Energy ◽

Energy Utilization ◽

Sensor Nodes ◽

Second Stage ◽

Energy Efficient Clustering ◽

Two Stages ◽

Ch Selection

In Wireless Sensor Networks (WSNs), the energy available to sensor nodes is limited. To extend the lifetime of a WSN, it is essential to minimize energy consumption. Electing a head of group, or Cluster Head (CH), from among the nodes with higher energy is a prominent method for extending the lifetime of a WSN. Both intra-cluster and inter-cluster communication depend on the CH, so the energy level of the CH determines the lifetime of its cluster. When designing clustering algorithms, the difficult task is to determine the energy consumption of heterogeneous WSNs (HWSNs). The proposed work, based on Chaotic Firefly Algorithm CH (CFACH) selection, is named the “Novel Distributed Entropy Energy-Efficient Clustering Algorithm”, or DEEEC for short, and targets HWSNs. The DEEEC algorithm for CH selection has two main stages. In the first stage, temporary CHs and their entropy values are identified using a correlative measure of residual and original energy. In addition, the sensor nodes must automatically predict the rotating epoch and its entropy value within the clustering algorithm. In the second stage, any cluster member with larger residual energy can displace the temporary CH in forming the deciding set. Through these two stages of CH selection, nodes with more energy have a higher probability of becoming CHs. The DEEEC algorithm is simulated in MATLAB. The simulation results show that DEEEC improves energy use and network lifetime compared with the traditional clustering protocols currently used in heterogeneous WSNs.
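The two-stage structure can be illustrated with a deliberately simplified sketch. It is not the DEEEC algorithm: the entropy term, the rotating epoch, and the chaotic firefly component are all omitted. It assumes that stage one ranks nodes by the ratio of residual to initial energy, that nodes join the nearest temporary head, and that stage two hands the role to the member with the largest residual energy. The `Node` type and `select_cluster_heads` function are hypothetical names.

```python
import math
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    x: float
    y: float
    initial_energy: float
    residual_energy: float

def select_cluster_heads(nodes, n_heads):
    """Return a mapping {temporary head name: final head name}."""
    # Stage 1 (assumed form): temporary heads are the nodes with the
    # highest residual/initial energy ratio.
    ranked = sorted(nodes,
                    key=lambda n: n.residual_energy / n.initial_energy,
                    reverse=True)
    temp_heads = ranked[:n_heads]

    # Each node joins the cluster of its nearest temporary head.
    clusters = {h.name: [] for h in temp_heads}
    for n in nodes:
        nearest = min(temp_heads, key=lambda h: math.hypot(n.x - h.x, n.y - h.y))
        clusters[nearest.name].append(n)

    # Stage 2 (assumed form): the member with the largest residual energy
    # takes over from the temporary head.
    final = {}
    for head_name, members in clusters.items():
        if members:
            final[head_name] = max(members, key=lambda n: n.residual_energy).name
        else:
            final[head_name] = head_name
    return final

# Synthetic heterogeneous network: node B starts with more energy than the others.
network = [
    Node("A", 0.0, 0.0, 2.0, 1.8),
    Node("B", 1.0, 0.0, 4.0, 2.4),
    Node("C", 10.0, 0.0, 2.0, 1.6),
    Node("D", 11.0, 0.0, 2.0, 1.0),
]
heads = select_cluster_heads(network, n_heads=2)
```

In this toy network, `A` is a temporary head because it has the best energy ratio, but `B` takes over in stage two because its absolute residual energy is larger.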


Handling WSD using Hierarchical Clustering Algorithm with sentences

International Journal of Scientific Research in Science Engineering and Technology ◽

10.32628/ijsrset1841120 ◽

2018 ◽

pp. 83-88

Author(s):

Mohana Priya K ◽

Pooja Ragavi S ◽

Krishna Priya G

Keyword(s):

Hierarchical Clustering ◽

Similarity Measure ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Cosine Similarity Measure ◽

Hierarchical Clustering Algorithm ◽

Multiple Levels ◽

Pos Tagger ◽

Sentence Clustering ◽

The Right

Clustering is the process of grouping objects into subsets that have meaning in the context of a particular problem. It does not rely on predefined classes. It is referred to as an unsupervised learning method because no information is provided about the "right answer" for any of the objects. Many clustering algorithms have been proposed and are used in different applications. Sentence clustering is one of the best clustering techniques. The Hierarchical Clustering Algorithm is applied at multiple levels for accuracy. A POS tagger and the Porter stemmer are used for tagging. The WordNet dictionary is used to determine similarity by invoking the Jiang-Conrath and cosine similarity measures. Grouping is performed with respect to the highest similarity value, using a mean threshold. This paper incorporates many parameters for finding similarity between words. To identify the disambiguated words, sense identification is performed for the adjectives and the results are compared. The SemCor and machine learning datasets are employed. Compared with previous results for WSD, our work shows a substantial improvement, reaching 91.2%.
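Grouping by cosine similarity against a mean threshold can be sketched in a few lines. This sketch swaps in plain word-count vectors for the WordNet and Jiang-Conrath similarity the abstract mentions, skips POS tagging and stemming, and performs a single level of single-link merging rather than multiple hierarchical levels. The function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counter objects)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def cluster_sentences(sentences):
    """Group sentence indices whose pairwise similarity exceeds the mean similarity."""
    if len(sentences) < 2:
        return [[i] for i in range(len(sentences))]
    vecs = [Counter(s.lower().split()) for s in sentences]
    sims = {(i, j): cosine(vecs[i], vecs[j])
            for i, j in combinations(range(len(vecs)), 2)}
    threshold = sum(sims.values()) / len(sims)  # mean threshold

    # Union-find: merge every pair whose similarity is above the threshold.
    parent = list(range(len(vecs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for (i, j), s in sims.items():
        if s > threshold:
            parent[find(j)] = find(i)

    groups = {}
    for i in range(len(vecs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Illustrative sentences only: two uses of "bank"-like contexts versus river contexts.
examples = [
    "the bank approved the loan",
    "the bank denied the loan",
    "river water flooded fields",
    "river water covered fields",
]
groups = cluster_sentences(examples)
```

A fuller version would repeat the merge on cluster representatives to obtain multiple levels, and would replace the word-count cosine with a sense-aware similarity.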


K-MEANS CLUSTERING ALGORITHM BASED CLASSIFICATION OF SOIL FERTILITY IN NORTH WEST NIGERIA

FUDMA Journal of Sciences ◽

10.33003/fjs-2020-0402-363 ◽

2020 ◽

Vol 4 (2) ◽

pp. 780-787

Author(s):

Ibrahim Hassan Hayatu ◽

Abdullahi Mohammed ◽

Barroon Ahmad Isma’eel ◽

Sahabi Yusuf Ali

Keyword(s):

Soil Fertility ◽

Crop Yield ◽

Clustering Algorithm ◽

Soil Samples ◽

North West ◽

R Programming ◽

Available Information ◽

Northwest Region ◽

The Relationship

Soil fertility determines a plant's development and, through bumper harvests, underpins food sufficiency and the security of lives and property. Soil fertility varies by region and therefore determines which crops should be planted. However, there is no repository or other source of information about soil fertility for any region of Nigeria, especially the Northwest. The only available information is soil samples with their attributes, which give little or no information to the average farmer. This has affected crop yield in all regions, particularly the Northwest, resulting in lower food production. This study therefore aims to classify soil data by fertility in the Northwest region of Nigeria using R programming. Data were obtained from the Department of Soil Science at Ahmadu Bello University, Zaria. The data comprise 400 soil samples, each with 13 attributes. The relationship between soil attributes was observed based on the data. The k-means clustering algorithm was used to analyze soil fertility clusters. Four clusters were identified, with cluster 1 having the highest fertility, followed by cluster 2, and fertility decreasing as the cluster number increases. Identifying the most fertile clusters will guide farmers on where best to concentrate their planting in order to improve productivity and crop yield.
