silhouette width
Recently Published Documents


Total documents: 20 (five years: 11)
H-index: 4 (five years: 1)

2021 ◽  
Vol 14 (2) ◽  
pp. 137-143
Author(s):  
Nur Laita Rizki Amalia ◽  
Ahmad Afif Supianto ◽  
Nanang Yudi Setiawan ◽  
Vicky Zilvan ◽  
Asri Rizki Yuliani ◽  
...  

Learning activities are a process of delivering information or messages from teachers to students. SMPN 4 Sidoarjo is a state junior high school (JHS) located in Sidoarjo Regency. The academic score data collected during the learning process were not well organized, which made it difficult for teachers and school principals to monitor student learning performance. The data used here are 222 Bahasa Indonesia scores from one teacher for the 2019/2020 school year. Students are clustered with K-Means. The number of clusters is determined using the elbow method and displayed in graphic form. The clustering results can serve as a reference for teachers in forming study groups and choosing the best treatment for each cluster. The best clustering results are validated using the Davies-Bouldin Index, Silhouette Width, and Calinski-Harabasz Index. Three clusters were obtained for each class level, while the number of clusters ranges from two to five for each study group. A dashboard is used to visualize the clustering results. Usability testing with the System Usability Scale (SUS) gave a score of 87.5, which means that the dashboard is acceptable to SMPN 4 Sidoarjo.
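As an illustration of one of the validation indices named in this abstract, the following minimal sketch computes the Calinski-Harabasz index for one-dimensional scores, using the standard definition CH = (BSS / (k - 1)) / (WSS / (n - k)). The scores, the labels, and the function name `calinski_harabasz_1d` are made up for this example and are not taken from the study.

```python
def calinski_harabasz_1d(values, labels):
    """Calinski-Harabasz index for 1-D data: (BSS / (k - 1)) / (WSS / (n - k))."""
    n = len(values)
    overall_mean = sum(values) / n
    clusters = {}
    for value, label in zip(values, labels):
        clusters.setdefault(label, []).append(value)
    k = len(clusters)
    bss = 0.0  # between-cluster sum of squares
    wss = 0.0  # within-cluster sum of squares
    for members in clusters.values():
        centre = sum(members) / len(members)
        bss += len(members) * (centre - overall_mean) ** 2
        wss += sum((v - centre) ** 2 for v in members)
    return (bss / (k - 1)) / (wss / (n - k))

# Hypothetical scores; not data from the abstract.
scores = [55, 60, 58, 80, 85, 82]
ch_separated = calinski_harabasz_1d(scores, [0, 0, 0, 1, 1, 1])
ch_mixed = calinski_harabasz_1d(scores, [0, 1, 0, 1, 0, 1])
print(ch_separated, ch_mixed)
```

A higher value indicates more compact, better separated clusters, so the labelling that splits low and high scores should score far higher than the interleaved one. The sketch assumes at least two clusters and non-zero within-cluster variation.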


2021 ◽  
Vol 158 ◽  
pp. 107190
Author(s):  
Fatima Batool ◽  
Christian Hennig

Forests ◽  
2021 ◽  
Vol 12 (5) ◽  
pp. 649
Author(s):  
Oliver Limberger ◽  
Jürgen Homeier ◽  
Nina Farwig ◽  
Franz Pucha-Cofrep ◽  
Andreas Fries ◽  
...  

Few plant functional types (PFTs) with fixed average traits are used in land surface models (LSMs) to consider feedback between vegetation and the changing atmosphere. It is uncertain if highly diverse vegetation requires more local PFTs. Here, we analyzed how 52 tree species of a megadiverse mountain rain forest separate into local tree functional types (TFTs) for two functions: biomass production and solar radiation partitioning. We derived optical trait indicators (OTIs) by relating leaf optical metrics and functional traits through factor analysis. We distinguished four OTIs explaining 38%, 21%, 15%, and 12% of the variance, of which two were considered important for biomass production and four for solar radiation partitioning. The clustering of species-specific OTI values resulted in seven and eight TFTs for the two functions, respectively. The first TFT ensemble (P-TFTs) represented a transition from low to high productive types. The P-TFTs were separated with a fair average silhouette width of 0.41 and differed markedly in their main trait related to productivity, Specific Leaf Area (SLA), which ranged from 43.6 to 128.2 cm²/g. The second ensemble (E-TFTs) delineates low and high reflective types, which were subdivided by different levels of visible (VIS) and near-infrared (NIR) albedo. The E-TFTs were separated with an average silhouette width of 0.28 and primarily defined by their VIS/NIR albedo. The eight TFTs revealed an especially pronounced range in NIR reflectance of 5.9% (VIS 2.8%), which is important for ecosystem radiation partitioning. Both TFT sets were grouped along elevation, modified by local edaphic gradients and species-specific traits. The VIS and NIR albedo were related to altitude and structural leaf traits (SLA), with NIR albedo showing more complex associations with biochemical traits and leaf water. The TFTs will support LSM simulations used to analyze the functioning of mountain rainforests under climate change.


2020 ◽  
Vol 13 (2) ◽  
pp. 11
Author(s):  
Bekti Endar Susilowati ◽  
Pardomuan Robinson Sihombing

Principal Component Analysis (PCA) is a multivariate analysis used to replace variables with a small number of principal components without losing too much information. In other words, it is used to explain the underlying variance-covariance structure of a large set of variables through a few linear combinations of these variables. PCA is strongly affected by the presence of outliers because it is based on the covariance matrix, which is sensitive to outliers. Therefore, this analysis uses a PCA that is robust to outliers, namely ROBPCA, or Hubert PCA. The resulting principal components are then used as input for cluster analysis with the CLARA (Clustering Large Applications) method. CLARA is a k-medoids method that is robust to outliers and well suited to large data sets. In a case study of the variables that make up the happiness index, based on The World Happiness Report 2018, the CLARA method with Manhattan distance gave the best overall average silhouette width at 5 clusters.


2019 ◽  
Vol 9 (23) ◽  
pp. 13231-13243 ◽  
Author(s):  
Attila Lengyel ◽  
Zoltán Botta‐Dukát

2019 ◽  
Vol 490 (3) ◽  
pp. 3392-3403
Author(s):  
Mario Pasquato ◽  
Chul Chung

Globular clusters (GCs) have historically been subdivided into either two (disc/halo) or three (disc/inner-halo/outer-halo) groups based on their orbital, chemical, and internal physical properties. The qualitative nature of this subdivision makes it impossible to determine whether the natural number of groups is actually two, three, or more. In this paper we use cluster analysis on the (log M, log σ0, log Re, [Fe/H], log |Z|) space to show that the intrinsic number of GC groups is actually either k = 2 or k = 3, with the latter being favoured albeit non-significantly. In the k = 2 case, the Partitioning Around Medoids (PAM) clustering algorithm recovers a metal-poor halo GC group and a metal-rich disc GC group. With k = 3 the three groups can be interpreted as disc/inner-halo/outer-halo families. For each group we obtain a medoid, i.e. a representative element (NGC 6352, NGC 5986, and NGC 5466 for the disc, inner halo, and outer halo, respectively), and a measure of how strongly each GC is associated with its group, the so-called silhouette width. Using the latter, we find a correlation with age for both disc and outer halo GCs: the stronger the association of a GC with the disc (outer halo) group, the younger (older) it is. Our findings are aligned with previous work based on very different approaches, such as cladistic analysis, suggesting that the grouping we obtain is quite robust and represents some genuine underlying physical subdivision of GCs. We provide a catalogue where we list the assigned group for each GC.
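The silhouette width mentioned here has a standard per-object definition: s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from object i to the other members of its own group and b(i) is the smallest mean distance from i to the members of any other group. The sketch below is a minimal illustration of that definition with Euclidean distance. The points, the labels, and the function name `silhouette_widths` are invented for the example and have no connection to the globular cluster catalogue.

```python
from math import dist  # Euclidean distance, Python 3.8+

def silhouette_widths(points, labels):
    """Return s(i) = (b - a) / max(a, b) for every point (needs >= 2 clusters)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    widths = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            widths.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return widths

# Hypothetical 2-D points forming two well separated pairs.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
labels = [0, 0, 1, 1]
widths = silhouette_widths(points, labels)
print(widths)
```

Values close to 1 indicate an object that sits firmly in its group, values near 0 an object on the border between two groups, and negative values an object that is on average closer to another group.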


2019 ◽  
Author(s):  
Ateeq Muhammed Khaliq ◽  
RG Sharathchandra ◽  
Meenakshi Rajamohan

This study aims to create a tumor heterogeneity-based model for predicting the best features of lung adenocarcinoma (LUAD) in multiple cancer subtypes using the Least Absolute Shrinkage and Selection Operator (LASSO). The RNA-Seq raw count data of 533 LUAD samples and 59 normal samples were downloaded from the TCGA data portal. Based on a consensus clustering method, the samples were divided into two subtypes, and the clusters were validated using silhouette width. Furthermore, for each subtype we estimated the abundance of immune and non-immune stromal cell populations that infiltrated the cancer tissue. We established the LASSO model for predicting each subtype's best features. Enrichment pathway analysis was then carried out. Finally, the validity of the LASSO model for identifying features was established by survival analysis. Our study suggests that unsupervised clustering and machine learning methods such as LASSO model-based feature selection can be effectively used to predict relevant genes which might play an essential role in cancer diagnosis.


2019 ◽  
Vol 8 (2) ◽  
pp. 161-170
Author(s):  
Milla Alifatun Nahdliyah ◽  
Tatik Widiharih ◽  
Alan Prahutama

The k-medoids method is a non-hierarchical clustering method that classifies n objects into k clusters with similar characteristics. This clustering algorithm uses the medoid as its cluster center. The medoid is the most centrally located object in a cluster, so it is robust to outliers. In cluster analysis, objects are grouped by similarity. Similarity can be measured with distance measures such as Euclidean distance and cityblock distance. The distance used in cluster analysis can affect the clustering results. The quality of the clustering results can then be assessed with internal criteria such as silhouette width and the C-index. In this research, the k-medoids method is used to classify regencies/cities in Central Java based on the type and number of crimes. The optimal clustering is at k = 4 with Euclidean distance, where the silhouette index = 0.3862593 and the C-index = 0.043893.
Keywords: Clustering, k-Medoids, Euclidean distance, Cityblock distance, Silhouette index, C-index, Crime
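To illustrate how the choice of distance can change an internal validity index, the sketch below evaluates the mean silhouette width of one fixed labelling under Euclidean and cityblock distance. It is a simplification: in the study the distance also drives the k-medoids clustering itself, whereas here only the index is recomputed. The points, labels, and function names are hypothetical.

```python
def euclidean(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q)) ** 0.5

def cityblock(p, q):
    return sum(abs(x - y) for x, y in zip(p, q))

def mean_silhouette(points, labels, metric):
    """Mean silhouette width of a labelling under the given distance function."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    total = 0.0
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            continue  # singleton clusters contribute a width of 0
        a = sum(metric(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(metric(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)

# Hypothetical data: two clusters offset both horizontally and vertically.
points = [(0, 0), (2, 0), (3, 3), (5, 3)]
labels = [0, 0, 1, 1]
euclidean_mean = mean_silhouette(points, labels, euclidean)
cityblock_mean = mean_silhouette(points, labels, cityblock)
print(euclidean_mean, cityblock_mean)
```

For this toy layout the same partition looks better separated under cityblock distance than under Euclidean distance, which is why silhouette values should only be compared when they were computed with the same distance.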


2019 ◽  
Author(s):  
Attila Lengyel ◽  
David W. Roberts ◽  
Zoltán Botta-Dukát

Aims: To introduce REMOS, a new iterative reallocation method (with two variants) for vegetation classification, and to compare its performance with OPTSIL. We test (1) how effectively REMOS and OPTSIL maximize mean silhouette width and minimize the number of negative silhouette widths when run on classifications with different structure; (2) how these three methods differ in runtime with different sample sizes; and (3) if classifications by the three reallocation methods differ in the number of diagnostic species, a surrogate for interpretability.

Study area: Simulation; example data sets from grasslands in Hungary and forests in Wyoming and Utah, USA.

Methods: We classified random subsets of simulated data with the flexible-beta algorithm for different values of beta. These classifications were subsequently optimized by REMOS and OPTSIL and compared for mean silhouette widths and proportion of negative silhouette widths. Then, we classified three vegetation data sets of different sizes from two to ten clusters, optimized them with the reallocation methods, and compared their runtimes, mean silhouette widths, numbers of negative silhouette widths, and the number of diagnostic species.

Results: In terms of mean silhouette width, OPTSIL performed the best when the initial classifications already had high mean silhouette width. REMOS algorithms had slightly lower mean silhouette width than what was maximally achievable with OPTSIL but their efficiency was consistent across different initial classifications; thus REMOS was significantly superior to OPTSIL when the initial classification had low mean silhouette width. REMOS resulted in zero or a negligible number of negative silhouette widths across all classifications. OPTSIL performed similarly when the initial classification was effective but could not reach as low a proportion of misclassified objects when the initial classification was inefficient. REMOS algorithms were typically more than an order of magnitude faster to calculate than OPTSIL. There was no clear difference between REMOS and OPTSIL in the number of diagnostic species.

Conclusions: REMOS algorithms may be preferable to OPTSIL when (1) the primary objective is to reduce or eliminate negative silhouette widths in a classification, (2) the initial classification has low mean silhouette width, or (3) the time efficiency of the algorithm is important because of the size of the data set or the high number of clusters.
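The general idea of silhouette-based reallocation can be sketched as follows. This is a simplified illustration and not the REMOS or OPTSIL algorithm, whose exact rules are not given in the abstract: it assumes that every object with a negative silhouette width is moved to its nearest neighbouring cluster, and that this repeats until no negative widths remain or an iteration cap is reached. All names and data are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+

def widths_and_neighbours(points, labels):
    """For each point, return (silhouette width, label of nearest other cluster)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    result = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        a = sum(dist(points[i], points[j]) for j in own) / len(own) if own else 0.0
        b, nearest = min(
            (sum(dist(points[i], points[j]) for j in members) / len(members), other)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        width = (b - a) / denom if own and denom > 0 else 0.0
        result.append((width, nearest))
    return result

def reallocate_negatives(points, labels, max_iter=20):
    """Illustrative loop (an assumption, not REMOS): move negative-width points."""
    labels = list(labels)
    for _ in range(max_iter):
        if len(set(labels)) < 2:
            break  # silhouette widths are undefined for a single cluster
        info = widths_and_neighbours(points, labels)
        movers = [(i, nearest) for i, (width, nearest) in enumerate(info) if width < 0]
        if not movers:
            break
        for i, nearest in movers:
            labels[i] = nearest
    return labels

# Hypothetical data: the third point starts in the wrong cluster.
points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
initial = [0, 0, 1, 1, 1, 1]
final = reallocate_negatives(points, initial)
print(final)
```

The iteration cap guards against cycling, which a naive loop of this kind does not rule out. The mean silhouette width and the count of negative widths before and after the loop are the two quantities the abstract uses to compare methods.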

