number of clusters
Recently Published Documents


Total documents: 1056 (five years: 368)

H-index: 44 (five years: 5)

2022 ◽  
Vol 10 (4) ◽  
pp. 583-593
Author(s):  
Syiva Multi Fani ◽  
Rukun Santoso ◽  
Suparti Suparti

Social media is computer-based technology that facilitates the sharing of ideas, thoughts, and information through virtual networks and communities. Twitter is one of the most popular social media platforms in Indonesia, with 78 million users, and businesses rely heavily on it for advertising. By knowing which types of tweet content their followers retweet most, businesses can use that content to advertise to Twitter users. This study applies text mining to cluster tweets from the @bliblidotcom Twitter account using K-means, with the best number of clusters chosen by the Silhouette Coefficient method, in order to determine which types of tweet content @bliblidotcom followers retweet most. Tweets with the most retweets and favorites are discount offers and flash sales, so Blibli Indonesia could use this kind of tweet to advertise on Twitter, because the prize quiz tweets are liked by followers of the @bliblidotcom account.
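As a rough illustration of how a silhouette coefficient can rank candidate numbers of clusters, the sketch below scores two hand-written labelings of the same toy points. The points and labels are hypothetical stand-ins for tweet feature vectors and K-means output; they are not data from the study.

```python
from math import dist

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (needs at least 2 clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention: singleton clusters score 0
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(dist(p, q) for q in other) / len(other)
                for other_lab, other in clusters.items() if other_lab != lab)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical 2-D feature vectors forming two obvious groups
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels_k2 = [0, 0, 0, 1, 1, 1]  # assumed result of clustering with k = 2
labels_k3 = [0, 0, 1, 2, 2, 2]  # assumed result of clustering with k = 3

score_k2 = silhouette_score(points, labels_k2)
score_k3 = silhouette_score(points, labels_k3)
best_k = 2 if score_k2 >= score_k3 else 3
print(score_k2, score_k3, best_k)
```

The candidate with the higher mean silhouette is preferred; in practice the labels would come from running the clustering once per candidate k.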


SinkrOn ◽  
2022 ◽  
Vol 7 (1) ◽  
pp. 28-32
Author(s):  
Desi Puspita ◽  
Sasmita Sasmita

The purpose of this study was to analyze the application of the k-means algorithm in grouping tourist visits to the city of Pagar Alam. Grouping tourist objects with k-means begins by determining the number of clusters to be formed and the initial centroid value of each cluster, then calculating the distance between each object and each centroid and assigning each object to the centroid at minimum distance. Data from the Tourism Office of the City of Pagar Alam cover 10 leading tourism objects. The research data are the numbers of tourist visitors in 2020, during the COVID-19 pandemic. The data are grouped into 4 clusters: C1 = high number of visitors, C2 = moderate number of visitors, C3 = low number of visitors, and C4 = very low number of visitors. The centroid values used are C1 = 92,494, C2 = 71,658, C3 = 26,981, and C4 = 4,485. The resulting grouping is C1 = Green Paradise tourism; C2 = Janang Orange Gardens; C3 = Curup Tujuh Kenangan, Curup Mangkok, Curup dew, Tegur Wangi Site, and Pelang Kenidai Village; and C4 = Lumai Site, Tebing Tinggi Site, and Tanjung Aro Site. The C4 grouping is a note for the government of the City of Pagar Alam in its efforts to increase the number of tourist visitors.
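The k-means steps described above can be sketched for one-dimensional data as follows. The visitor counts are hypothetical, and reading the abstract's centroid values as whole visitor counts is an assumption; the abstract does not give the per-site figures.

```python
def kmeans_1d(values, centroids, max_iter=100):
    """Plain 1-D k-means: assign to the nearest centroid, then recompute means."""
    centroids = list(centroids)
    for _ in range(max_iter):
        # Assignment step: each value goes to the index of its nearest centroid
        labels = [min(range(len(centroids)), key=lambda j: abs(v - centroids[j]))
                  for v in values]
        # Update step: each centroid becomes the mean of its assigned values
        new_centroids = []
        for j, c in enumerate(centroids):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centroids.append(sum(members) / len(members) if members else c)
        if new_centroids == centroids:
            break  # converged: assignments no longer move the centroids
        centroids = new_centroids
    return labels, centroids

# Hypothetical annual visitor counts for six sites (not the study's data)
visitors = [95000, 70000, 30000, 25000, 5000, 4000]
# Initial centroids taken from the abstract, read as visitor counts (assumption)
initial = [92494, 71658, 26981, 4485]

labels, centroids = kmeans_1d(visitors, initial)
print(labels)     # [0, 1, 2, 2, 3, 3]
print(centroids)  # [95000.0, 70000.0, 27500.0, 4500.0]
```

Index 0 corresponds to C1 (high), and index 3 to C4 (very low), following the order of the initial centroids.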


2021 ◽  
Vol 10 (3) ◽  
pp. 359-366
Author(s):  
Hanik Malikhatin ◽  
Agus Rusgiyono ◽  
Di Asih I Maruddani

Prospective Indonesian migrant workers (TKI) who apply for passports at the Immigration Office Class I Non TPI Pati have different destination countries and choose different PPTKIS agencies. Grouping prospective TKI by their characteristics is therefore needed as a reference for the government in improving the protection of TKI in destination countries and in supervising the PPTKIS that manage them more strictly. The purpose of this research is to group prospective TKI workers by their characteristics using the optimal number of clusters. The method used is k-Modes clustering with k = 2, 3, 4, and 5, a method that can group categorical data. The optimal number of clusters is determined using the Dunn Index. To make grouping the data easier, a Graphical User Interface (GUI) application was built with RStudio. Based on the analysis, the optimal number of clusters is two, with a Dunn Index value of 0.4. Cluster 1 consists mostly of male TKI workers (51.04%), aged ≥ 20 years (91.93%), with Malaysia as the destination country (47%), who chose PPTKIS Surya Jaya Utama Abadi (37.51%), while cluster 2 consists mostly of male TKI workers (94.10%), aged ≥ 20 years (82.31%), with South Korea as the destination country (77.95%), who chose PPTKIS BNP2TKI (99.78%).
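For categorical records, one common way to compute a Dunn Index is to use the simple matching dissimilarity (the count of differing attributes) that k-modes also relies on. The sketch below assumes that definition; the abstract does not say which variant the authors used, and the records are hypothetical.

```python
from itertools import combinations

def mismatch(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))

def dunn_index(clusters):
    """Smallest between-cluster distance over largest within-cluster diameter."""
    min_separation = min(
        mismatch(a, b)
        for c1, c2 in combinations(clusters, 2)
        for a in c1 for b in c2
    )
    max_diameter = max(
        mismatch(a, b)
        for c in clusters
        for a, b in combinations(c, 2)
    )
    # Assumes at least one cluster contains two non-identical records
    return min_separation / max_diameter

# Hypothetical records: (sex, age group, destination country)
cluster_1 = [("M", "20+", "Malaysia"), ("F", "20+", "Malaysia"), ("F", "<20", "Malaysia")]
cluster_2 = [("M", "20+", "Korea"), ("M", "20+", "Korea")]

dunn = dunn_index([cluster_1, cluster_2])
print(dunn)  # 0.5
```

A larger value indicates clusters that are compact relative to their separation, so the k with the highest Dunn Index would be preferred.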


Author(s):  
Afdelia Novianti ◽  
Irsyifa Mayzela Afnan ◽  
Rafi Ilmi Badri Utama ◽  
Edy Widodo

Poverty is an essential issue for every country, including Indonesia. Poverty can be caused by the scarcity of basic necessities or the difficulty of accessing education and employment. In 2019, Papua Province had the highest poverty percentage of any province, at 27.53%. In view of this, this study groups the districts of Papua Province by similar characteristics to describe poverty conditions, using the variables Percentage of Poor Population, Gross Regional Domestic Product, Open Unemployment Rate, Life Expectancy, Literacy Rate, and Population Working in the Agricultural Sector and the K-medoids clustering algorithm. The results indicate that the optimal number of clusters to describe poverty conditions in Papua Province is 4, with a variance of 0.012: the first cluster consists of 10 districts, the second of 5 districts, the third of 12 districts, and the fourth of 2 districts.


2021 ◽  
Author(s):  
Amir Mosavi ◽  
Majid

Identifying the number of oil families in petroleum basins provides practical and valuable information in petroleum geochemistry studies, from exploration to development. Oil family grouping helps track migration pathways, identify the number of active source rocks, and examine reservoir continuity. To date, almost all oil family typing studies have used common statistical methods such as principal component analysis (PCA) and hierarchical clustering analysis (HCA). However, there is no publication on using artificial neural networks (ANNs) to examine the oil families in petroleum basins. Hence, oil family typing calls for novel techniques rather than common and overused ones. This paper is the first report of oil family typing using ANNs as robust computational methods. To this end, a self-organizing map (SOM) neural network combined with three clustering validity indices was applied to oil samples from oilfields in the Iranian part of the Persian Gulf. For the SOM network, ten default clusters were selected at first. Afterwards, three clustering validity coefficients, namely the Calinski-Harabasz (CH), Silhouette (SI), and Davies-Bouldin (DB) indices, were used to find the optimum number of clusters. Among the ten default clusters, the maximum CH (62) and SI (0.58) were obtained for four clusters. Likewise, the lowest DB (0.8) was obtained for four clusters. Thus, all three validation coefficients indicated four as the optimum number of clusters, or oil families. The number of oil families identified in the present report is consistent with those previously reported by other researchers in the same study area. However, the techniques used in the present paper, which have not been implemented so far, can be introduced as more straightforward for clustering in oil family typing than the common and overused methods of PCA and HCA.
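To make the direction of two of these indices concrete (CH is maximized, DB is minimized), the sketch below computes both on toy 2-D points using their standard textbook definitions. The points and the two candidate partitions are hypothetical, and this is not the SOM pipeline used in the paper.

```python
from math import dist

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def calinski_harabasz(clusters):
    """Between-cluster over within-cluster dispersion (higher is better)."""
    all_points = [p for c in clusters for p in c]
    n, k = len(all_points), len(clusters)
    overall = centroid(all_points)
    between = sum(len(c) * dist(centroid(c), overall) ** 2 for c in clusters)
    within = sum(dist(p, centroid(c)) ** 2 for c in clusters for p in c)
    return (between / (k - 1)) / (within / (n - k))

def davies_bouldin(clusters):
    """Mean worst-case scatter-to-separation ratio (lower is better)."""
    cents = [centroid(c) for c in clusters]
    scatter = [sum(dist(p, m) for p in c) / len(c) for c, m in zip(clusters, cents)]
    worst = [
        max((scatter[i] + scatter[j]) / dist(cents[i], cents[j])
            for j in range(len(clusters)) if j != i)
        for i in range(len(clusters))
    ]
    return sum(worst) / len(worst)

# Two hypothetical partitions of the same six points (both need >= 2 clusters)
good = [[(0, 0), (0, 1), (1, 0)], [(10, 10), (10, 11), (11, 10)]]
bad = [[(0, 0), (0, 1)], [(1, 0)], [(10, 10), (10, 11), (11, 10)]]

ch_good, ch_bad = calinski_harabasz(good), calinski_harabasz(bad)
db_good, db_bad = davies_bouldin(good), davies_bouldin(bad)
print(ch_good, ch_bad, db_good, db_bad)
```

Here the two-cluster partition scores a higher CH and a lower DB than the three-cluster one, which is the kind of agreement between indices the abstract reports for four clusters.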


2021 ◽  
Author(s):  
Yuansong zeng ◽  
Zhuoyi Wei ◽  
Fengqi Zhong ◽  
Zixiang Pan ◽  
Yutong Lu ◽  
...  

Clustering analysis is widely used on single-cell RNA-sequencing (scRNA-seq) data to discover cell heterogeneity and cell states. While many clustering methods have been developed for scRNA-seq analysis, most of them require the number of clusters to be provided. However, it is not easy to know the exact number of cell types in advance, and estimates based on experience are not always reliable. Here, we have developed ADClust, an automatic deep embedding clustering method for scRNA-seq data, which can accurately cluster cells without requiring a predefined number of clusters. Specifically, ADClust first obtains a low-dimensional representation through a pre-trained autoencoder and uses the representations to cluster cells into initial micro-clusters. The micro-clusters are then compared with one another by a statistical test, and similar micro-clusters are merged into larger clusters. According to the clustering, cell representations are updated so that each cell is pulled toward the centres of its assigned cluster and of similar clusters, while cells are kept apart to maintain distances between clusters. This is accomplished by jointly optimizing the carefully designed clustering and autoencoder loss functions. The merging process continues until convergence. ADClust was tested on eleven real scRNA-seq datasets and was shown to outperform existing methods in both clustering performance and accuracy of the determined number of clusters. More importantly, our model provides high speed and scalability for large datasets.


2021 ◽  
Vol 6 (3) ◽  
pp. 136
Author(s):  
Aski Widdatul Fuadah ◽  
Fajrin Nurman Arifin ◽  
Oktalia Juwita

Clustering is the process of grouping data based on similarity with other members of a group. Food security is a country's ability to provide individuals with food that does not conflict with their beliefs, religion, and culture and that supports a healthy, active, and productive life. Food instability and food insecurity can be caused by many factors, one of which is natural disasters. In 2020, Jember Regency experienced 121 natural disasters. The optimal value of K is determined in order to get the right number of groups from the clustering process, in this case using the elbow method. The data used in the clustering process are the sub-districts of Jember Regency, described by transient attributes, or natural disaster events. Based on the results of grouping the sub-district data for numbers of clusters from k = 1 to k = 10, the optimal value was found at k = 4, with an SSE (Sum of Square Error) value of 24,809.
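The elbow method compares the SSE obtained for each candidate k and looks for the point where further increases in k stop paying off. A minimal sketch on hypothetical one-dimensional data follows; the evenly spaced initial centroids are a simplification chosen to keep the run deterministic, and the counts are not the Jember data.

```python
def kmeans_1d(values, k, iters=50):
    """Plain 1-D k-means with evenly spaced initial centroids (a simplification)."""
    ordered = sorted(values)
    # Deterministic start: pick k values spread across the sorted data
    centroids = [ordered[(2 * j + 1) * len(ordered) // (2 * k)] for j in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            groups[nearest].append(v)
        # Keep the old centroid if a group ends up empty
        centroids = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
    return groups, centroids

def sse(groups, centroids):
    """Sum of squared errors: squared distance of each value to its centroid."""
    return sum((v - c) ** 2 for g, c in zip(groups, centroids) for v in g)

# Hypothetical disaster counts per sub-district, forming three obvious groups
counts = [1, 2, 3, 10, 11, 12, 20, 21, 22]

sse_by_k = {k: sse(*kmeans_1d(counts, k)) for k in range(1, 5)}
for k, value in sse_by_k.items():
    print(k, round(value, 2))
```

On this toy data the SSE falls steeply up to k = 3 and barely changes afterwards, so the curve bends at k = 3; with real data the bend is usually judged from a plot.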


2021 ◽  
Vol 6 (3) ◽  
pp. 187
Author(s):  
Cepy Sukmayadi ◽  
Aji Primajaya ◽  
Iqbal Maulana

Flood disasters often occur during the rainy season, and Karawang is one area that is often flooded. Based on the risk index from BNPB, flooding in Karawang affected 84% of the community, so efforts are needed to reduce and overcome flood disasters. Such efforts begin with knowing which areas are prone to flooding. Therefore, this study aims to determine flood-prone areas in Karawang as an initial step in tackling flood disasters. The research groups flood-prone areas using the k-medoids algorithm. K-medoids is a partitional clustering method that groups a set of objects into a number of clusters, using an actual object from the set to represent each cluster. The attributes used are flood-causing factors: rainfall, elevation (land height), population density, and distance to the river. The study found three levels of flood potential: low, medium, and high. There is 1 sub-district with low flood potential, there are 24 sub-districts with moderate flood potential, and there are 5 sub-districts with high flood potential. Testing with the silhouette coefficient gives a value of 0.370.
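The idea that each cluster is represented by one of its own objects can be sketched with a simple alternating k-medoids loop. This is a simplified variant, not necessarily the exact algorithm used in the study, and the points are hypothetical two-attribute records rather than the Karawang data.

```python
from math import dist

def k_medoids(points, medoids, max_iter=100):
    """Alternating k-medoids: assign to nearest medoid, then re-pick each medoid."""
    medoids = list(medoids)
    for _ in range(max_iter):
        clusters = [[] for _ in medoids]
        for p in points:
            nearest = min(range(len(medoids)), key=lambda j: dist(p, medoids[j]))
            clusters[nearest].append(p)
        # New medoid: the member with the smallest total distance to its cluster
        new_medoids = [
            min(c, key=lambda cand: sum(dist(cand, q) for q in c)) if c else m
            for c, m in zip(clusters, medoids)
        ]
        if new_medoids == medoids:
            break  # medoids are stable, so the clusters are final
        medoids = new_medoids
    return medoids, clusters

# Hypothetical scaled attributes per sub-district, e.g. (rainfall, elevation)
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
initial_medoids = [(1, 2), (9, 8)]  # arbitrary starting objects

medoids, clusters = k_medoids(points, initial_medoids)
print(medoids)   # [(1, 1), (8, 8)]
print(clusters)
```

Unlike a k-means centroid, each medoid is always one of the input objects, which makes the result less sensitive to outliers and easier to interpret.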

