clustering data
Recently Published Documents


TOTAL DOCUMENTS

373
(FIVE YEARS 136)

H-INDEX

21
(FIVE YEARS 4)

2022 ◽  
Vol 10 (4) ◽  
pp. 583-593
Author(s):  
Syiva Multi Fani ◽  
Rukun Santoso ◽  
Suparti Suparti

Social media is computer-based technology that facilitates the sharing of ideas, thoughts, and information through virtual networks and communities. Twitter is one of the most popular social media platforms in Indonesia, with 78 million users, and businesses rely heavily on it for advertising. By knowing which types of tweet content their followers retweet most, businesses can use that content to advertise to Twitter users. This study applies text mining to tweets from the @bliblidotcom Twitter account, clustering them with the K-means method and choosing the best number of clusters with the Silhouette Coefficient, to determine which types of tweet content @bliblidotcom followers retweet most. The tweets with the most retweets and favorites are discount offers and flash sales, so Blibli Indonesia could use this kind of tweet for advertising on Twitter. Prize quiz tweets are also liked by followers of the @bliblidotcom account.
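The combination described above, K-means with the number of clusters picked by the Silhouette Coefficient, can be sketched as follows. This is a minimal illustration and not the authors' pipeline: it uses a small set of made-up 2-D points (`POINTS`) in place of tweet feature vectors, and the farthest-first seeding in `farthest_first` is an assumption chosen only to make the run deterministic.

```python
import math

# Hypothetical toy data: three tight groups of 2-D points.
POINTS = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (0, 10), (0, 11), (1, 10), (1, 11)]

def farthest_first(points, k):
    """Deterministic seeding (an assumption): start at the first point, then
    repeatedly add the point farthest from its nearest chosen center."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, iters=50):
    """Lloyd's algorithm; returns one cluster label per point."""
    centers = farthest_first(points, k)
    labels = None
    for _ in range(iters):
        # Assignment step: nearest center by Euclidean distance.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient; assumes at least two non-empty clusters."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    scores = []
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

def choose_k(points, max_k):
    """Try k = 2..max_k and keep the k with the highest mean silhouette."""
    best = None
    for k in range(2, max_k + 1):
        labels = kmeans(points, k)
        score = silhouette(points, labels)
        if best is None or score > best[2]:
            best = (k, labels, score)
    return best
```

Calling `choose_k(POINTS, 5)` returns the selected `k`, the labels, and the silhouette score. For real tweet text, the points would first have to be turned into numeric vectors (for example term weights), which this sketch leaves out.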


Author(s):  
Trisna Yuniarti ◽  
Dahliyah Hayati

Oil palm is the most productive plantation crop in Indonesia. Government strategies and policies on oil palm plantations continue to be developed because the plantation area increases every year. Segmenting oil palm plantations by area, production, and productivity aims to identify groups of potential oil palm plantations across Indonesia, and can inform the strategies and policies the government formulates. The plantations are grouped using the K-Means clustering data mining technique with 3 clusters specified. The data mining stages run from data collection through to representation; of the 34 data sets collected, only 25 could be processed further. The grouping produced three plantation segments: 72% of plantations fall in the low-potential group, 20% in the medium-potential group, and 8% in the high-potential group.


2021 ◽  
Vol 7 (3) ◽  
pp. 340
Author(s):  
Ricky Akbar ◽  
Mutia Octaviany

This study aims to design and build a dashboard and data clustering for the public licensing services of the Investment and One-Stop Integrated Services Office (Dinas Penanaman Modal dan Pelayanan Terpadu Satu Pintu, DPMPTSP) of Dharmasraya Regency, using Business Intelligence to simplify data analysis and decision-making by management. Until now, data analysis has been done conventionally: data is taken from the application, converted to Microsoft Excel, and only then analyzed. This process is time-consuming and produces inaccurate analysis results. For public licensing services, DPMPTSP already uses SiCantik, an integrated web-based application for managing business licensing and other public services under Online Single Submission (OSS). However, this application only manages data; it has no dashboard visualization to ease analysis. Before the dashboard was designed and the data clustered, data from the application database was processed through Extract-Transform-Load (ETL) using Pentaho Data Integration (PDI), while the dashboard and the clustering were built with Microsoft Power BI. The result of this study is an interactive, easy-to-understand dashboard visualization, together with data groupings presented as cluster charts that help management make decisions.


2021 ◽  
Vol 11 (24) ◽  
pp. 12054
Author(s):  
Neila Mezghani ◽  
Rayan Soltana ◽  
Youssef Ouakrim ◽  
Alix Cagnin ◽  
Alexandre Fuentes ◽  
...  

The purpose of this study is to identify healthy phenotypes in knee kinematics based on clustering data analysis. Our analysis uses the 3D knee kinematics curves, namely, flexion/extension, abduction/adduction, and tibial internal/external rotation, measured via a KneeKG™ system during a gait task. We investigated two data representation approaches that are based on the joint analysis of the three dimensions. The first is a global approach that concatenates the kinematic data without any dimensionality reduction. The second is a local approach that uses a set of 69 biomechanical parameters of interest extracted from the 3D kinematic curves. The data representations are followed by a clustering process, based on the BIRCH (balanced iterative reducing and clustering using hierarchies) discriminant model, to separate 3D knee kinematics into homogeneous groups or clusters. Phenotypes were obtained by averaging those groups. We validated the clusters using inter-cluster correlation and statistical hypothesis tests. The simulation results showed that the global approach is more efficient, and it allows the identification of three descriptive 3D kinematic phenotypes within a healthy knee population.
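To make the clustering step more concrete, the sketch below shows the clustering-feature (CF) summary that BIRCH is built on, together with the group averaging used to form a phenotype. It is a deliberately flattened simplification under stated assumptions: the CF tree, branching factor, and final global clustering of full BIRCH are omitted, and the names `CF`, `build_cfs`, and the `threshold` value are hypothetical rather than taken from the paper.

```python
import math

class CF:
    """Clustering feature: point count, linear sum, and sum of squared norms."""
    def __init__(self, point):
        self.n = 1
        self.ls = list(point)
        self.ss = sum(x * x for x in point)

    def centroid(self):
        # The group average; in the abstract's terms, a candidate phenotype.
        return [s / self.n for s in self.ls]

    def radius_if_added(self, point):
        """Radius (RMS distance to the centroid) if `point` were absorbed."""
        n = self.n + 1
        ls = [s + x for s, x in zip(self.ls, point)]
        ss = self.ss + sum(x * x for x in point)
        var = ss / n - sum((s / n) ** 2 for s in ls)
        return math.sqrt(max(var, 0.0))  # clamp tiny negative rounding error

    def add(self, point):
        self.n += 1
        self.ls = [s + x for s, x in zip(self.ls, point)]
        self.ss += sum(x * x for x in point)

def build_cfs(points, threshold):
    """Single pass: absorb each point into the nearest CF if the resulting
    radius stays within `threshold`, otherwise start a new CF."""
    cfs, labels = [], []
    for p in points:
        if cfs:
            i = min(range(len(cfs)), key=lambda j: math.dist(p, cfs[j].centroid()))
            if cfs[i].radius_if_added(p) <= threshold:
                cfs[i].add(p)
                labels.append(i)
                continue
        cfs.append(CF(p))
        labels.append(len(cfs) - 1)
    return cfs, labels
```

Because each CF stores only a count and two sums, a group's centroid can be read off at any time without revisiting the raw curves, which is the property that lets BIRCH work incrementally.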


2021 ◽  
Vol 923 (2) ◽  
pp. 154
Author(s):  
Jeremy L. Tinker

We apply a new galaxy group-finder to the Main Galaxy Sample of the SDSS. This algorithm introduces new freedom to assign halos to galaxies that is self-calibrated by comparing the catalog to complementary data. These include galaxy clustering data and measurements of the total satellite luminosity from deep-imaging data. We present constraints on the galaxy-halo connection for star-forming and quiescent populations. The results of the self-calibrated group catalog differ in several key ways from previous group catalogs and halo-occupation analyses. The transition halo mass scale, where half of the halos contain quiescent central galaxies, is at $M_h \sim 10^{12.4}\,h^{-1}\,M_\odot$, significantly higher than other constraints. Additionally, the width of the transition from predominantly star-forming halos to quiescent halos occurs over a narrower range in halo mass. Quiescent central galaxies in low-mass halos are significantly more massive than star-forming centrals at the same halo mass, but this difference reverses above the transition halo mass. We find that the scatter in $\log M_*$ at fixed $M_h$ is ∼0.2 dex for massive halos, in agreement with previous estimates, but rises sharply at lower halo masses. The halo masses assigned by the group catalog are in good agreement with weak-lensing estimates for star-forming and quiescent central galaxies. We discuss possible improvements to the algorithm made clear by this first application to data. The group catalog is made publicly available.


2021 ◽  
Vol 3 (2) ◽  
pp. 11-18
Author(s):  
Ainur Rahman ◽  
Heri Suroyo
Keyword(s):  
Low Cost ◽  

This research focuses on text mining analysis of electronic products sold on the e-commerce platform Shopee, using the K-Means algorithm in Python. The scraped data consists of comment text, sales figures, and star ratings. Word-cloud analysis of the product comments shows that low-cost and high-cost smartphones on the Shopee marketplace tend to share the same word-cloud pattern: the dominant words are neutral or positive, while words with negative meaning tend not to be dominant. For low-cost smartphones, the most frequent words are barang (item), mantap (great), kirim (ship), and bagus (good); for high-cost products, they are kirim (ship), cepat (fast), and bagus (good). The chart from the K-means clustering shows that products with 0 to 1000 sales receive the highest star ratings, while the lowest star ratings go to products with sales between 1500 and 2000 or more.


2021 ◽  
Vol 27 (11) ◽  
pp. 1203-1221
Author(s):  
Amal Rekik ◽  
Salma Jamoussi

Clustering data streams in order to detect trending topics on social networks is a challenging task that interests researchers in the big data field. Analyzing such data raises several requirements because of its large volume and evolving nature. For this purpose, we propose in this paper a new evolving clustering method that takes into account the incremental nature of the data and meets its principal requirements. Our method uses a deep learning technique to learn incrementally from unlabelled examples that are generated at high speed and need to be clustered instantly. To evaluate the performance of our method, we conducted several experiments using the Sanders, HCR and Terr-Attacks datasets.


Entropy ◽  
2021 ◽  
Vol 23 (11) ◽  
pp. 1503
Author(s):  
Shunki Kyoya ◽  
Kenji Yamanishi

Finite mixture models are widely used for modeling and clustering data. When they are used for clustering, they are often interpreted by regarding each component as one cluster. However, this assumption may be invalid when the components overlap. It leads to the issue of analyzing such overlaps to correctly understand the models. The primary purpose of this paper is to establish a theoretical framework for interpreting the overlapping mixture models by estimating how they overlap, using measures of information such as entropy and mutual information. This is achieved by merging components to regard multiple components as one cluster and summarizing the merging results. First, we propose three conditions that any merging criterion should satisfy. Then, we investigate whether several existing merging criteria satisfy the conditions and modify them to fulfill more conditions. Second, we propose a novel concept named clustering summarization to evaluate the merging results. In it, we can quantify how overlapped and biased the clusters are, using mutual information-based criteria. Using artificial and real datasets, we empirically demonstrate that our methods of modifying criteria and summarizing results are effective for understanding the cluster structures. We therefore give a new view of interpretability/explainability for model-based clustering.
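As a rough illustration of why overlapping components complicate the "one component, one cluster" reading, the sketch below computes the average entropy of the posterior component assignments for a one-dimensional Gaussian mixture. This is not one of the merging criteria or the clustering summarization proposed in the paper; the function names, the mixture parameters, and the evaluation grid are all assumptions made for the example.

```python
import math

def gaussian_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))

def responsibilities(x, weights, means, stds):
    """Posterior probability of each component given the observation x."""
    dens = [w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
    total = sum(dens)
    return [d / total for d in dens]

def mean_posterior_entropy(xs, weights, means, stds):
    """Average Shannon entropy (in nats) of the component assignments.
    Values near 0 mean each point clearly belongs to one component;
    larger values mean the components overlap on the data."""
    total = 0.0
    for x in xs:
        r = responsibilities(x, weights, means, stds)
        total += -sum(p * math.log(p) for p in r if p > 0)
    return total / len(xs)

# Hypothetical evaluation grid and two equal-weight, unit-variance mixtures.
XS = [i * 0.5 for i in range(-4, 13)]  # -2.0, -1.5, ..., 6.0
overlapping = mean_posterior_entropy(XS, [0.5, 0.5], [0.0, 1.0], [1.0, 1.0])
separated = mean_posterior_entropy(XS, [0.5, 0.5], [0.0, 8.0], [1.0, 1.0])
```

Here `overlapping` comes out larger than `separated`. When the entropy is high, treating the two components as a single cluster, which is the kind of merging the abstract discusses, may describe the data better than treating them as two.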


2021 ◽  
Author(s):  
Thorsten Horn ◽  
Kalin D. Narov ◽  
Kristen A. Panfilio

Parental RNA interference (pRNAi) is a powerful and widely used method for gene-specific knockdown. Yet in insects its efficacy varies between species, and how the systemic RNAi response is transmitted from mother to offspring remains elusive. Using the flour beetle Tribolium castaneum, we report an RT-qPCR strategy to unmask the presence of double-stranded RNA (dsRNA) distinct from endogenous mRNA. We find that the injected dsRNA is directly transmitted into the egg and persists throughout embryogenesis. Despite this depletion of dsRNA from the mother, we show that strong pRNAi can persist for months before waning at strain-specific rates. In seeking the receptor proteins for cellular uptake of long dsRNA into the egg, we lastly present a phylogenomics profiling approach to ascertain macroevolutionary distributions of candidate proteins. We demonstrate a visualization strategy based on taxonomically hierarchical assessment of orthology clustering data to rapidly assess gene age and copy number changes, refined by several lines of sequence-based evidence. We use this approach to document repeated losses of SID-1-like channel proteins in the arthropods, including wholesale loss in the Heteroptera (true bugs), which are nonetheless highly sensitive to pRNAi. Overall, we elucidate practical considerations for insect pRNAi against a backdrop of outstanding questions on the molecular mechanism of dsRNA transmission to achieve long-term, systemic knockdown.


Author(s):  
I Putu Noven Hartawan ◽  
I Made Oka Widyantara ◽  
A. A. I. N. E. Karyawati ◽  
Ngurah Indra Er ◽  
Ketut Buda Artana ◽  
...  
