silhouette index
Recently Published Documents



2022 ◽  
pp. 104-120
Author(s):  
Siarudin Mohamad ◽  
San Afri Awang ◽  
Ronggo Sadono ◽  
Priyono Suryanto

Small-scale privately owned forests (SSPF) exhibit various cropping patterns, which can be identified from stand structure and species composition. Recognizing and classifying these patterns is required for further planning and policy development. This study therefore aims to classify the cropping patterns of SSPF in Ciamis Regency, West Java Province, Indonesia. Data were collected by observing the stand structure and species composition of 150 plots of land across three sub-districts representing the central, northern, and southern regions of Ciamis Regency. Four categorical variables were used (tree species composition, age, spatial distribution, and intercropping pattern) together with two continuous variables (stand density and basal area). The patterns were classified with a Two-Step Cluster algorithm using a log-likelihood distance measure and automatic clustering based on Schwarz's Bayesian Information Criterion, and were validated with the silhouette index. In addition, a multicollinearity test was conducted to reduce redundancy in the variable set. The results showed that cluster quality, as measured by the silhouette index, improved when the tree spatial distribution variable, which exhibited multicollinearity, was excluded. The cropping patterns were classified into three categories: tree crops (group 1), mixed-tree lots (group 2), and agrisilviculture (group 3). Group 1 consisted of stands with one or two commercial tree species, in several cases intercropped. Group 2 contained uneven-aged mixed-tree stands without any crops. Group 3 consisted of an intercropping system of uneven-aged mixed-tree stands and crops. The results suggest that further analysis is needed to relate the cropping patterns to the socio-economic characteristics of the landowners and to strategies for developing a sustainable SSPF.
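For readers unfamiliar with the validation measure named in this abstract, the following is a minimal sketch of how a silhouette index can be computed. It assumes plain Euclidean distance on numeric features, which is not the log-likelihood distance the study reports, and the function name `silhouette_index` is hypothetical.

```python
import numpy as np

def silhouette_index(X, labels):
    """Mean silhouette width over all points (Euclidean distance assumed)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    # Full pairwise distance matrix; fine for small data sets
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    clusters = np.unique(labels)
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue  # singleton cluster: s(i) = 0 by convention
        # a(i): mean distance to the other members of the point's own cluster
        a = dist[i, same].sum() / (n_same - 1)
        # b(i): smallest mean distance to the members of any other cluster
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        scores[i] = (b - a) / denom if denom > 0 else 0.0
    return scores.mean()

# Two tight, well-separated groups: a good labeling scores near 1,
# a labeling that mixes the groups scores below 0.
X = [[0, 0], [0, 1], [10, 0], [10, 1]]
good = silhouette_index(X, [0, 0, 1, 1])
bad = silhouette_index(X, [0, 1, 0, 1])
print(round(good, 3), round(bad, 3))
```

Values lie between -1 and 1, so comparing the index before and after dropping a variable, as the study does, amounts to comparing two such means.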


AI ◽  
2021 ◽  
Vol 3 (1) ◽  
pp. 1-22
Author(s):  
Jean-Sébastien Dessureault ◽  
Daniel Massicotte

This paper examines the critical decision process of reducing the dimensionality of a dataset before applying a clustering algorithm. Choosing between extracting and selecting features is always a challenge. It is also not obvious how to evaluate the importance of features, since the most popular methods for doing so are usually intended for supervised learning. This paper proposes a novel method called “Decision Process for Dimensionality Reduction before Clustering” (DPDRC). It chooses the best dimensionality reduction method (selection or extraction) according to the data scientist’s parameters and the profile of the data, with the aim of applying a clustering process at the end. It uses a Feature Ranking Process Based on Silhouette Decomposition (FRSD) algorithm, a Principal Component Analysis (PCA) algorithm, and a K-means algorithm along with its metric, the Silhouette Index (SI). The paper presents five scenarios based on different parameters and discusses the impacts, advantages, and disadvantages of each choice that can be made in this unsupervised learning process.
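The choice between extraction and selection can be illustrated with a small sketch. This is not the DPDRC or FRSD procedure from the paper; it only compares the Silhouette Index of K-means on raw data, on PCA-extracted components, and on a hand-picked feature subset, using scikit-learn and synthetic data that is an assumption of this example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Synthetic data (assumption): 3 groups in 2 informative columns plus 8 noise columns
centers = np.array([[0, 0], [6, 0], [0, 6]])
informative = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])
noise = rng.normal(scale=1.0, size=(150, 8))
X = np.hstack([informative, noise])

def kmeans_si(data, k=3):
    """Cluster with K-means and return the Silhouette Index of the result."""
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(data)
    return silhouette_score(data, labels)

si_raw = kmeans_si(X)                                      # no reduction
si_pca = kmeans_si(PCA(n_components=2).fit_transform(X))   # feature extraction
si_sel = kmeans_si(X[:, :2])                               # feature selection (columns known here)

print(f"raw={si_raw:.3f}  pca={si_pca:.3f}  selected={si_sel:.3f}")
```

In this toy setting the informative columns are known in advance; in practice, ranking features without labels is exactly the problem the paper addresses.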


Author(s):  
Gary Reyes ◽  
Laura Lanzarini ◽  
Waldo Hasperué ◽  
Aurelio F. Bariviera

Given the large volume of georeferenced information generated and stored by many types of devices, the study and improvement of techniques capable of operating on these data is an area of great interest. The analysis of vehicular trajectories with the aim of forming clusters and identifying emerging patterns is very useful for characterizing and analyzing transportation flows in cities. This paper presents a new trajectory clustering method capable of identifying clusters of vehicular sub-trajectories in various sectors of a city. The proposed method uses an auxiliary structure to determine the correct location of the centroid of each group or set of sub-trajectories during the adaptive process. The method was applied to three real databases and compared with other relevant methods, achieving satisfactory results and showing good cluster quality according to the Silhouette index.


2021 ◽  
Vol 21 (2) ◽  
pp. 38-56
Author(s):  
Kinga Kądziołka

Research background: The multidimensional assessment of the attractiveness of cryptocurrency exchanges is an important issue, because the risk that such an exchange collapses or is used for illegal purposes is higher than for traditional exchanges. Purpose: The aim of the work is to create a ranking and identify groups of cryptocurrency exchanges with a similar level of attractiveness. Research methodology: 13 different composite indicators were considered, and one of them was chosen as representative according to the similarity of the resulting rankings. Clustering methods were used to identify groups of exchanges with a similar level of the constructed measure. Result: According to the adopted criteria of ranking similarity, the best indicator was the taxonomic measure constructed using the standardized sum method with equal weights. Combining hierarchical clustering with the k-means algorithm improved the quality of clustering as measured by the silhouette index. Novelty: The originality of the paper lies in the use of different methods of multidimensional comparative analysis on the cryptocurrency market.
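One common way to combine hierarchical clustering with k-means is to seed k-means with the centroids of the hierarchical solution. The sketch below assumes that reading; the abstract does not say how the two methods were combined. The scores are synthetic, and Ward linkage is an arbitrary choice for the example.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(1)
# Hypothetical one-dimensional composite scores for 40 exchanges (synthetic)
scores = np.concatenate([
    rng.normal(0.2, 0.05, 15),
    rng.normal(0.5, 0.05, 15),
    rng.normal(0.8, 0.05, 10),
]).reshape(-1, 1)

k = 3
# Step 1: hierarchical clustering gives an initial partition
hier_labels = AgglomerativeClustering(n_clusters=k, linkage="ward").fit_predict(scores)
# Step 2: the centroids of that partition seed k-means, which then refines it
centroids = np.vstack([scores[hier_labels == c].mean(axis=0) for c in range(k)])
km = KMeans(n_clusters=k, init=centroids, n_init=1).fit(scores)

si_hier = silhouette_score(scores, hier_labels)
si_combined = silhouette_score(scores, km.labels_)
print(f"hierarchical={si_hier:.3f}  hierarchical+k-means={si_combined:.3f}")
```

The refinement step minimizes within-cluster variance, not the silhouette index itself, so an improvement is plausible but not guaranteed on every data set.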


2021 ◽  
Vol 13 (21) ◽  
pp. 4250
Author(s):  
Jordi Mahardika Puntu ◽  
Ping-Yu Chang ◽  
Ding-Jiun Lin ◽  
Haiyina Hasbia Amania ◽  
Yonatan Garkebo Doyoro

We aim to develop a comprehensive tunnel lining detection method and clustering technique for semi-automatic rebar identification in order to investigate the ten tunnels along the South-link Line Railway of Taiwan (SLRT). We used a Ground Penetrating Radar (GPR) instrument with a 1000 MHz antenna frequency, placed on a versatile antenna holder that adapts to the tunnel’s condition. We call this a Vehicle-mounted Ground Penetrating Radar (VMGPR) system. We detected the tunnel lining boundary according to the Fresnel Reflection Coefficient (FRC) in both A-scan and B-scan data, then estimated the thinning of the tunnel linings. By applying the Hilbert Transform (HT), we extracted the envelope to obtain an overview of the energy distribution in our data. Once we obtained the filtered radargram, we used it to estimate the Two-dimensional Forward Modeling (TDFM) simulation parameters. Specifically, we produced the TDFM model with different levels of random noise (0–30%) for the rebar model. The rebar model and the field data were clustered with Hierarchical Agglomerative Clustering (HAC) and evaluated using the Silhouette Index (SI). Taken together, these results suggest three boundaries of the tunnel lining, i.e., the air–second lining boundary, the second–first lining boundary, and the first–wall rock boundary. Among the tunnels that we scanned, the Fangye 1 tunnel is the only one in category B, with the highest percentage of thinning lining, i.e., 13.39%, whereas the other tunnels are in category A, with a percentage of thinning lining of 0–1.71%. Based on the clustered radargram, the TDFM model for rebar identification is consistent with the field data, where k = 2 is the best choice to represent our data set. The clustered radargram shows that the TDFM model can mimic the field data. The most striking result is that the TDFM model with 30% random noise seems to describe our data well, where the rebar response is rough due to the high noise level on the radargram.


Author(s):  
Wawan Gunawan

With the development of information and communication technology, more and more data are used in problem solving. However, with so much data available, it is very difficult to find the desired information. Data mining is therefore performed to extract knowledge automatically from large data sets by searching for interesting patterns contained in the data. This study uses the DBSCAN algorithm. The data are spatial data on students of Universitas Mercu Buana. From these data, the researcher takes the information in the resulting scatterplot, applies the DBSCAN algorithm to see which clusters are formed, and validates them with the Silhouette Index. The study concludes that the DBSCAN algorithm was successfully implemented on the Universitas Mercu Buana student data, and that the test results of the DBSCAN implementation are influenced by two parameter values, namely Minimum Points and Epsilon.
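The dependence on Epsilon and Minimum Points can be sketched with scikit-learn, where the two parameters are called `eps` and `min_samples`. The points below are synthetic stand-ins, not the student data, and the helper name `dbscan_si` is hypothetical. Excluding noise points before computing the silhouette is one possible convention, assumed here.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(42)
# Synthetic 2-D points (assumption): two dense groups far apart
points = np.vstack([
    rng.normal([0, 0], 0.3, (60, 2)),
    rng.normal([5, 5], 0.3, (60, 2)),
])

def dbscan_si(X, eps, min_samples):
    """Run DBSCAN and return (number of clusters, silhouette over non-noise points)."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    mask = labels != -1  # DBSCAN labels noise points as -1
    n_clusters = len(set(labels[mask]))
    if n_clusters < 2:
        return n_clusters, None  # silhouette is undefined for fewer than 2 clusters
    return n_clusters, silhouette_score(X[mask], labels[mask])

# A small grid over Epsilon shows how strongly the result depends on it
for eps in (0.05, 0.5, 10.0):
    print(eps, dbscan_si(points, eps, min_samples=5))
```

A very large Epsilon merges everything into one cluster, while a very small one leaves most points as noise, which is why both parameters need tuning before validation is meaningful.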


2021 ◽  
Vol 15 ◽  
Author(s):  
Ori Carmi ◽  
Adi Gross ◽  
Nadav Ivzan ◽  
Lamberto La Franca ◽  
Nairouz Farah ◽  
...  

The localization and measurement of neuronal activity magnitude at high spatial and temporal resolution are essential for mapping and better understanding neuronal systems and mechanisms. One such example is the generation of retinotopic maps, which correlates localized retinal stimulation with the corresponding specific visual cortex responses. Here we evaluated and compared seven different methods for extracting and localizing cortical responses from voltage-sensitive dye imaging recordings, elicited by visual stimuli projected directly on the rat retina by a customized projection system. The performance of these methods was evaluated both qualitatively and quantitatively by means of two cluster separation metrics, namely, the (adjusted) Silhouette Index (SI) and the (adjusted) Davies-Bouldin Index (DBI). These metrics were validated using simulated data, which showed that Temporally Structured Component Analysis (TSCA) outperformed all other analysis methods for localizing cortical responses and generating high-resolution retinotopic maps. The analysis methods, as well as the use of cluster separation metrics proposed here, can facilitate future research aiming to localize specific activity at high resolution in the visual cortex or other brain areas.


2021 ◽  
Vol 18 (1) ◽  
pp. 130-140
Author(s):  
Yanuwar Reinaldi ◽  
Nurissaidah Ulinnuha ◽  
Moh. Hafiyusholeh

Community welfare is an important concern for a region and is central to national development. Welfare in Indonesia is fairly unequal, especially in East Java. One way to map the regions of East Java by the welfare of their people is clustering, and hierarchical clustering is one such method. This study compares the single linkage, complete linkage, and average linkage methods to determine which performs best. The results show that average linkage with three clusters is the best, with a silhouette index of 0.6054. The first cluster contains 23 regions, namely the cities/districts with the highest community welfare; the second contains 11 regions with moderate community welfare; and the third contains 4 regions with the lowest community welfare.
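A comparison of linkage methods by silhouette index might look like the sketch below. It uses scikit-learn and 38 synthetic rows in place of the welfare indicators, so the numbers it prints are not the study's results.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(7)
# 38 synthetic rows with 3 standardized indicators each (assumption, not the study's data)
X = np.vstack([
    rng.normal(0, 0.4, (23, 3)),
    rng.normal(3, 0.4, (11, 3)),
    rng.normal(6, 0.4, (4, 3)),
])

# Score every combination of linkage method and number of clusters
results = {}
for linkage in ("single", "complete", "average"):
    for k in range(2, 6):
        labels = AgglomerativeClustering(n_clusters=k, linkage=linkage).fit_predict(X)
        results[(linkage, k)] = silhouette_score(X, labels)

best = max(results, key=results.get)
print("best (linkage, k):", best, "silhouette:", round(results[best], 4))
```

On cleanly separated data the three linkages tend to agree; differences such as those reported in the abstract appear when clusters overlap or contain outliers.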


2021 ◽  
Vol 4 (S3) ◽  
Author(s):  
Alexander Bogensperger ◽  
Yann Fabel

With increasing digitization, new opportunities emerge concerning the availability and use of data in the energy sector. A comprehensive literature review shows an abundance of available unsupervised clustering algorithms as well as internal, relative, and external cluster validation indices (CVI) to evaluate the results. Yet comparing different clustering results on the same dataset, produced with different algorithms and with a specific practical goal in mind, still proves scientifically challenging. A large variety of CVI are described and consolidated in commonly used composite indices (e.g., Davies-Bouldin index, silhouette index, Dunn index). Previous works show the challenges surrounding these composite indices: they serve a generalized evaluation of cluster quality, which in many cases does not suit individual clustering goals. The paper presents the current state of science and existing cluster validation indices, and proposes a practical method for combining them into an individual composite index using Multi Criteria Decision Analysis (MCDA). The methodology is applied to two energy economic use cases: clustering load profiles of bidirectional electric vehicles and of municipalities.
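As a rough illustration of combining validation indices into one goal-specific score, the sketch below uses a simple weighted sum after min-max normalization. This is only one basic MCDA scheme and is not necessarily the method proposed in the paper. The index values, the weights, and the function name `composite_index` are all hypothetical.

```python
import numpy as np

# Hypothetical index values for three candidate clusterings (one row each).
# Columns: silhouette (higher is better), Dunn (higher is better),
# Davies-Bouldin (lower is better).
cvi = np.array([
    [0.55, 0.10, 0.90],
    [0.60, 0.08, 0.70],
    [0.40, 0.15, 1.20],
])
higher_is_better = np.array([True, True, False])
weights = np.array([0.5, 0.2, 0.3])  # assumed analyst preferences, summing to 1

def composite_index(cvi, higher_is_better, weights):
    """Weighted sum of min-max normalized indices, oriented so that larger is better."""
    lo, hi = cvi.min(axis=0), cvi.max(axis=0)
    norm = (cvi - lo) / np.where(hi > lo, hi - lo, 1.0)   # scale each column to [0, 1]
    norm = np.where(higher_is_better, norm, 1.0 - norm)   # flip lower-is-better columns
    return norm @ weights

scores = composite_index(cvi, higher_is_better, weights)
best = int(np.argmax(scores))
print(scores.round(3), "best candidate:", best)
```

Changing the weights changes the ranking, which is the point of the approach: the composite reflects the practical goal of the clustering instead of a generic notion of quality.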


2021 ◽  
Vol 36 (Supplement_1) ◽  
Author(s):  
E Panacheva ◽  
D Pochernikov ◽  
E Voroshilina

Study question: What are the differences in the semen microbiota composition of patients with asthenozoospermia and normospermia according to cluster analysis of PCR data? Summary answer: The detection rate of four stable semen microbiota clusters and the dominant bacteria groups varied between patients with asthenozoospermia and normospermia. What is known already: Most research on normal and pathological semen microbiota is based on 16S rRNA gene-specific next generation sequencing (NGS). It has shown that the microbiota is represented by polymicrobial communities (clusters) that consist of microorganisms from different genera and bacterial phyla. Although highly informative, NGS has several weaknesses: complex sample preparation, difficult sample intake control, a long analysis process, complicated interpretation of results, and high cost of equipment and reagents. These factors make it virtually impossible to use this approach in routine medical practice. Quantitative real-time PCR (RT-PCR) is far more suitable for this. Study design, size, duration: Patients included in the study (n = 301) came to the “Garmonia” Medical Center (Yekaterinburg, Russia) either seeking preconception care or for infertility treatment. Depending on the spermiogram results, they were divided into two groups: Group 1 (n = 171), asthenozoospermia, and Group 2 (n = 130), normospermia. Participants/materials, setting, methods: Semen microbiota was analyzed using the RT-PCR kit Androflor (DNA-Technology, Russia). Cluster analysis was performed for 201 samples with a total bacterial load (TBL) of at least 10^3 GE/ml (asthenozoospermia = 96, normospermia = 105). Cluster analysis was conducted using the k-means++ algorithm in scikit-learn. The Silhouette index and the Davies–Bouldin index (DBI) were used to confirm the stability of the clusters.
Main results and the role of chance: In samples with both normospermia and asthenozoospermia, four stable microbiota clusters were distinguished. Cluster I was characterized by the prevalence of obligate anaerobes; Lactobacillus spp. were prevalent in Cluster II; Gram-positive facultative anaerobes were prevalent in Cluster III; and Enterobacteriaceae/Enterococcus spp. were prevalent in Cluster IV. Cluster I was detected most often in both groups. However, in normospermia it was represented by various obligate anaerobes without pronounced quantitative predominance of any bacteria group. In samples with asthenozoospermia, one of the following bacteria groups was prevalent in Cluster I: Bacteroides spp./Porphyromonas spp./Prevotella spp., Peptostreptococcus spp./Parvimonas spp., or Eubacterium spp. In samples with asthenozoospermia, Cluster II was characterized by the prevalence of Lactobacillus spp., while in samples with normospermia other bacteria groups, mainly obligate anaerobes, were present along with lactobacilli. In samples with normospermia, Corynebacterium spp. and Streptococcus spp., typical of the normal microbiota of the male UGT, were prevalent in Cluster III. In samples with asthenozoospermia, Cluster III was characterized by the prevalence of Staphylococcus spp. In samples with asthenozoospermia, Lactobacillus spp. were present in Cluster IV along with Enterobacteriaceae/Enterococcus spp., which was not typical of the samples with normospermia. Limitations, reasons for caution: Cluster analysis was not conducted for the samples with TBL lower than 10^3 GE/ml, since their results were incompatible with the data received for the negative control samples. Wider implications of the findings: Further research could determine the detection rate of the described bacterial clusters in semen with other pathologies.
Establishing the relationship between the characteristics of semen microbiota and infertility in men might allow the development of new algorithms for treating patients with reproductive disorders, depending on the composition of semen microbiota. Trial registration number: not applicable.
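The clustering setup named in the methods (k-means++ in scikit-learn, checked with the Silhouette index and the Davies–Bouldin index) can be sketched as follows. The feature vectors are synthetic stand-ins for the PCR data, and scanning k from 2 to 7 is an assumption of this example, not a step reported in the abstract.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score, silhouette_score

rng = np.random.default_rng(3)
# Synthetic feature vectors (assumption): 4 groups of 50 samples in 4 dimensions
centers = np.eye(4) * 5
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 4)) for c in centers])

si, dbi = {}, {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=0).fit_predict(X)
    si[k] = silhouette_score(X, labels)        # higher is better
    dbi[k] = davies_bouldin_score(X, labels)   # lower is better
    print(k, round(si[k], 3), round(dbi[k], 3))

best_by_si = max(si, key=si.get)
best_by_dbi = min(dbi, key=dbi.get)
print("best k by silhouette:", best_by_si, "| best k by Davies-Bouldin:", best_by_dbi)
```

When the two indices agree on the same number of clusters, as they do on this toy data, that agreement is one piece of evidence that the partition is stable.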

