Exemplars can Reciprocate Principal Components

2021 ◽  
Vol 20 ◽  
pp. 30-38
Author(s):  
Kieran Greer

This paper presents a clustering algorithm that is an extension of the Category Trees algorithm. Category Trees is a clustering method that creates tree structures that branch on category type and not feature. The development in this paper is to consider a secondary order of clustering that is not the category to which the data row belongs, but the tree, representing a single classifier, that it is eventually clustered with. Each tree branches to store subsets of other categories, but the rows in those subsets may also be related. This paper is therefore concerned with looking at that second level of clustering between the category subsets, to try to determine if there is any consistency over it. It is argued that Principal Components may be a related and reciprocal type of structure, and there is an even bigger question about the relation between exemplars and principal components, in general. The theory is demonstrated using the Portugal Forest Fires dataset as a case study. The Category Trees are then combined with other Self-Organising algorithms from the author and it is suggested that they all belong to the same family type, which is an Entropy-style of classifier. Some analysis of classifier types is also presented.

2021 ◽  
Vol 2021 ◽  
pp. 1-10
Author(s):  
Yaping Li

The main objective of this paper is to present a new clustering algorithm for metadata trees based on the K-prototypes algorithm, the GSO (glowworm swarm optimization) algorithm, and maximal frequent path (MFP). Metadata tree clustering involves computing the feature vector of each metadata tree and then clustering those feature vectors, so traditional data clustering methods are not directly suitable for metadata trees. As the main method for calculating the feature vectors, MFP suffers from high computational complexity and loss of key information. The K-prototypes algorithm is generally suitable for clustering mixed-attribute data such as feature vectors, but it is sensitive to the initial clustering centers. Compared with other swarm intelligence algorithms, the GSO algorithm offers more efficient global search, which suits multimodal problems and can also be used to optimize the K-prototypes algorithm. To address clustering accuracy and high data dimensionality when clustering metadata tree structures, this paper combines the GSO algorithm, the K-prototypes algorithm, and MFP to design a new metadata structure clustering method. First, MFP is used to describe metadata tree features, and the key parameter of categorical data is introduced into the MFP feature vector so that it describes the metadata tree more accurately. Second, GSO is combined with K-prototypes to design GSOKP, which clusters feature vectors containing both numeric and categorical data so as to improve clustering accuracy. Finally, tests are conducted on a set of metadata trees. The experimental results show that the designed metadata tree clustering method, GSOKP-FP, has certain advantages in clustering accuracy and time complexity.
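As a rough illustration of why K-prototypes suits mixed-attribute feature vectors, the sketch below shows the commonly used K-prototypes dissimilarity: squared Euclidean distance on the numeric part plus a weighted count of categorical mismatches. It is not the GSOKP method of the paper; the function names and the weight `gamma` are assumptions made for this example.

```python
def k_prototypes_distance(num_a, cat_a, num_b, cat_b, gamma=1.0):
    """Mixed dissimilarity between two records.

    Numeric attributes contribute squared Euclidean distance; categorical
    attributes contribute gamma for every position where the values differ.
    gamma is a hypothetical weight chosen by the user.
    """
    numeric = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    categorical = sum(1 for x, y in zip(cat_a, cat_b) if x != y)
    return numeric + gamma * categorical


def nearest_prototype(num, cat, prototypes, gamma=1.0):
    """Return the index of the closest (numeric, categorical) prototype."""
    distances = [
        k_prototypes_distance(num, cat, p_num, p_cat, gamma)
        for p_num, p_cat in prototypes
    ]
    return min(range(len(distances)), key=distances.__getitem__)


if __name__ == "__main__":
    # Two illustrative prototypes, each with two numeric and one categorical attribute.
    prototypes = [([0.0, 0.0], ["x"]), ([5.0, 5.0], ["y"])]
    print(nearest_prototype([4.0, 5.0], ["y"], prototypes))
```

Because the assignment depends on where the prototypes start, a poor initial choice can lead to a poor partition, which is the sensitivity the abstract proposes to reduce with a global search such as GSO.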


DYNA ◽  
2019 ◽  
Vol 86 (211) ◽  
pp. 94-101
Author(s):  
Jesica Rubiano Moreno ◽  
Carlos Alonso Malaver ◽  
Samuel Nucamendi Guillén ◽  
Carlos López Hernández

The aim of this study is to introduce a new clustering method for ipsative variables. The method can be used for nominal or ordinal variables whose responses must be mutually exclusive, and it is independent of the data distribution. The proposed method is applied to outline motivational profiles for individuals based on a set of declared preferences. A case study is used to analyze the performance of the proposed algorithm by comparing its results with those of the PAM method. The results show that the proposed method generates a better segmentation with more differentiated groups. An extensive study was conducted to validate the performance of the clustering method against a set of random groups using clustering measures.


2013 ◽  
Vol 756-759 ◽  
pp. 3849-3854
Author(s):  
Xi Yang Yang ◽  
Fu Sheng Yu

A novel kernel-based semi-supervised fuzzy clustering algorithm is proposed, and its iterative formula is given. The new algorithm can effectively improve the efficiency of clustering. Using the Fisher projection algorithm, two principal components are extracted from 7 hue statistics and 11 green value statistics, and the new semi-supervised clustering method is then applied to recognize angular leaf spot disease of Bauhinia blakeana. The results show a consistency rate of 100% for the labeled leaves and above 95% for the unlabeled leaves.


Author(s):  
Nabila Amalia Khairani ◽  
Edi Sutoyo

Forest and land fires are disasters that often occur in Indonesia. In 2007, 2012 and 2015, forest fires in Sumatra and Kalimantan attracted global attention because they brought smog pollution to neighboring countries. One of the regions with the most fire hotspots is West Kalimantan Province. Forest and land fires affect health, especially in the communities around the scene, as well as economic and social conditions. One way to address this is to know the location of the fire areas and to analyze the causes of forest and land fires. Given this impact, the purpose of this study is to apply clustering with the k-means algorithm to determine the hotspot-prone areas in West Kalimantan Province, and to evaluate the resulting clusters. Data mining is a suitable way to obtain information on hotspot areas. The data mining method used is clustering, because it can turn hotspot data into information about areas prone to hotspots. The clustering uses the k-means algorithm, which groups data based on similar characteristics. The hotspot data are grouped into 3 clusters: 284 hotspots in cluster 0, which covers hazardous areas, 215 hotspots in non-prone areas, and 129 hotspots in very vulnerable areas. The clustering results were then evaluated using the Davies-Bouldin Index (DBI), giving a value of 3.112, which indicates that the 3-cluster result was not optimal.
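To make the evaluation step concrete, here is a minimal sketch of the Davies-Bouldin Index on already-clustered points. Lower values indicate more compact, better separated clusters. The sketch uses plain Euclidean distance and invented toy coordinates; it does not reproduce the study's data or its reported value, and the function names are assumptions for this example.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def davies_bouldin(clusters):
    """Davies-Bouldin Index for a list of clusters (each a list of points).

    For every cluster, take the worst-case ratio of combined within-cluster
    scatter to between-centroid distance, then average over all clusters.
    Requires at least two clusters with distinct centroids.
    """
    centroids = [centroid(c) for c in clusters]
    scatter = [
        sum(math.dist(p, m) for p in c) / len(c)
        for c, m in zip(clusters, centroids)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in range(k)
            if j != i
        )
    return total / k


if __name__ == "__main__":
    # Toy data: the same two clusters, once far apart and once close together.
    well_separated = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
    overlapping = [[(0, 0), (0, 2)], [(1, 0), (1, 2)]]
    print(davies_bouldin(well_separated), davies_bouldin(overlapping))
```

In practice the index is usually computed for several candidate cluster counts and the count with the lowest value is preferred.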


Author(s):  
Muchamad Kurniawan ◽  
Rani Rotul Muhima ◽  
Siti Agustini

One cause of forest fires spreading is slow handling when a fire occurs. This can be anticipated by determining how many extinguishing units should be placed at the centers of the hotspots. To obtain hotspots, we use the active fire dataset provided by NASA. Clustering is used to find the most suitable centroid points. The clustering methods we use are K-Means, Fuzzy C-Means (FCM), and Average Linkage. K-Means is chosen because it is a simple method that has been applied in many areas. FCM is a partition-based clustering algorithm that extends the K-Means method. Hierarchical clustering is represented by the Average Linkage method. The measurement used is the sum of the internal distances of each cluster, and elbow evaluation is used to select the optimal number of clusters. K-Means gave the best results, with a total distance of 145.35 km and 4 clusters as the best number. The total distances obtained with the FCM and Linkage methods were 154.13 km and 266.61 km, respectively.
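The measurement described above can be sketched as follows: run K-Means for several cluster counts, sum each point's distance to its nearest centroid, and look for the count after which the total stops dropping sharply (the elbow). This is only an illustration under stated assumptions: it uses Euclidean distance on invented toy points rather than geographic distance on the NASA data, a deterministic initialization chosen for reproducibility, and hypothetical function names.

```python
import math


def kmeans(points, k, iterations=50):
    """Lloyd's algorithm with a simple deterministic initialization.

    Initial centroids are points taken at evenly spaced positions in the
    input list (an assumption made to keep the example reproducible).
    """
    centroids = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid; keep the old one if a cluster is empty.
        updated = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def total_internal_distance(points, centroids):
    """Sum over all points of the distance to the nearest centroid."""
    return sum(min(math.dist(p, c) for c in centroids) for p in points)


def elbow_curve(points, max_k):
    """Total internal distance for each k from 1 to max_k."""
    return {
        k: total_internal_distance(points, kmeans(points, k))
        for k in range(1, max_k + 1)
    }


if __name__ == "__main__":
    # Toy data with two obvious groups.
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    for k, total in elbow_curve(data, 3).items():
        print(k, round(total, 3))
```

On the toy data the total falls steeply from one cluster to two and only slightly from two to three, which is the pattern the elbow evaluation looks for.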


2019 ◽  
Vol 1 (1) ◽  
pp. 31-39
Author(s):  
Ilham Safitra Damanik ◽  
Sundari Retno Andani ◽  
Dedi Sehendro

Milk is an important part of the diet for meeting nutritional needs, for both children and adults. Indonesia has many producers of fresh milk, but production is not sufficient to meet national demand. Data mining is a field of computer science that is widely used in research, and one of its techniques is clustering, a method that groups data. Clustering gives better results when a large amount of data is used. The data used here are provincial data for Indonesia from 2000 to 2017, obtained from the Central Statistics Agency. The study produces two clusters of milk-producing regions: high producers and low producers. Of the 27 records of fresh milk production in Indonesia, two provinces fall in the high-level cluster: West Java and East Java. The remaining 25, along with 7 provinces that were not included in the K-Means clustering calculation, are placed in the low-level cluster.


1989 ◽  
Vol 54 (10) ◽  
pp. 2692-2710 ◽  
Author(s):  
František Babinec ◽  
Mirko Dohnal

The problem of transformation of data on the reliability of chemical equipment obtained in particular conditions to other equipment in other conditions is treated. A fuzzy clustering algorithm is defined for this problem. The method is illustrated on a case study.


2021 ◽  
Vol 13 (6) ◽  
pp. 3246
Author(s):  
Zoe Slattery ◽  
Richard Fenner

Building on the existing literature, this study examines whether specific drivers of forest fragmentation cause particular fragmentation characteristics, and how these characteristics can be linked to their effects on forest-dwelling species. This research uses Landsat remote imaging to examine the changing patterns of forests. It focuses on areas which have undergone a high level of a specific fragmentation driver, in particular either agricultural expansion or commodity-driven deforestation. Seven municipalities in the states of Rondônia and Mato Grosso in Brazil are selected as case study areas, as these states experienced a high level of commodity-driven deforestation and agricultural expansion respectively. Land cover maps of each municipality are created using the Geographical Information System software ArcGIS Spatial Analyst extension. The resulting categorical maps are input into Fragstats fragmentation software to calculate quantifiable fragmentation metrics for each municipality. To determine the effects that these characteristics are likely to cause, this study uses a literature review to determine how species traits affect their responses to forest fragmentation. Results indicate that, in areas that underwent agricultural expansion, the remaining forest patches became more complex in shape with longer edges and lost a large amount of core area. This negatively affects species which are either highly dispersive or specialist to core forest habitat. In areas that underwent commodity-driven deforestation, it was more likely that forest patches would become less aggregated and create disjunct core areas. This negatively affects smaller, sedentary animals which do not naturally travel long distances. This study is significant in that it links individual fragmentation drivers to their landscape characteristics, and in turn uses these to predict effects on species with particular traits. 
This information will prove useful for forest managers, particularly in the case study municipalities examined in this study, in deciding which species require further protection measures. The methodology could be applied to other drivers of forest fragmentation such as forest fires.

