final cluster
Recently Published Documents


TOTAL DOCUMENTS

30
(FIVE YEARS 13)

H-INDEX

7
(FIVE YEARS 0)

PLoS ONE ◽  
2021 ◽  
Vol 16 (7) ◽  
pp. e0254090
Author(s):  
Nicole C. Nelson ◽  
Kelsey Ichikawa ◽  
Julie Chung ◽  
Momin M. Malik

To those involved in discussions about rigor, reproducibility, and replication in science, conversations about the “reproducibility crisis” can appear ill-structured. Seemingly very different issues, concerning the purity of reagents, the accessibility of computational code, or misaligned incentives in academic research writ large, are all collected under this label. Prior work has attempted to address this problem by creating analytical definitions of reproducibility. We take a novel empirical, mixed-methods approach to understanding variation in reproducibility discussions, using a combination of grounded theory and correspondence analysis to examine how a variety of authors narrate the story of the reproducibility crisis. Contrary to expectations, this analysis demonstrates that there is a clear thematic core to reproducibility discussions, centered on the incentive structure of science, the transparency of methods and data, and the need to reform academic publishing. However, we also identify three clusters of discussion that are distinct from the main body of articles: one focused on reagents, another on statistical methods, and a final cluster focused on the heterogeneity of the natural world. Although there are discursive differences between scientific and popular articles, we find no strong differences in how scientists and journalists write about the reproducibility crisis. Our findings demonstrate the value of using qualitative methods to identify the bounds and features of reproducibility discourse, and identify distinct vocabularies and constituencies that reformers should engage with to promote change.


2021 ◽  
Vol 5 (3) ◽  
pp. 565-575
Author(s):  
Arief Wibowo ◽  
Moh Makruf ◽  
Inge Virdyna ◽  
Farah Chikita Venna

The Covid-19 pandemic has changed many patterns of community activity. Large-Scale Social Restrictions were implemented to reduce transmission of the virus, and this clearly affected transportation: operators introduced new regulations reducing passenger capacity in each fleet, for example in TransJakarta services. This study categorizes the TransJakarta corridors before and during the Covid-19 pandemic. The K-Means and K-Medoids clustering methods are used to obtain accurate results, with calculations performed in Microsoft Excel, RapidMiner, and the Python programming language. With the K-Means algorithm, the optimum number of clusters before the Covid-19 pandemic is 3, with a Davies-Bouldin Index (DBI) value of 0.184, and during the pandemic it is 2 clusters, with a DBI value of 0.188. With the K-Medoids algorithm, the optimum number of clusters before the pandemic is 3, with a DBI value of 0.200, and during the pandemic it is 4 clusters, with a DBI value of 0.190. The final cluster is determined by majority voting across all the tools used.
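The model-selection step described above, running K-Means for several candidate cluster counts and keeping the one with the lowest Davies-Bouldin Index, can be sketched as follows. This is a minimal illustration on synthetic data standing in for the corridor features, not the study's actual dataset or pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score

rng = np.random.default_rng(0)
# Synthetic stand-in for per-corridor features (e.g. ridership statistics).
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(40, 2)) for c in (0.0, 3.0, 6.0)])

best_k, best_dbi = None, np.inf
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    dbi = davies_bouldin_score(X, labels)  # lower DBI = better-separated clusters
    if dbi < best_dbi:
        best_k, best_dbi = k, dbi

print(best_k, round(best_dbi, 3))
```

The same loop could be repeated with a K-Medoids implementation, and the per-tool winners combined by majority vote as the abstract describes.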


2021 ◽  
Author(s):  
Marija Eric

The purpose of this thesis is to develop a methodology for hydrological modelling of the performance of Low Impact Development technologies using an Urban Hydrological Response Unit approach. A K-Means Cluster Analysis procedure was carried out to create clusters of lot parcels, which represented the Urban Hydrological Response Units. Different sampling methods were used to select lots from each cluster to model before and after Low Impact Development implementation. The runoff response (m3) of an approximate final cluster centre was used to calculate the total runoff (m3) of each cluster. After adding the total runoff (m3) for a group of 15 clusters, the benchmark runoff value (m3) from modelling all lots was closely approached both with and without Low Impact Development. A random sample of 7% and 90% of lots from each cluster, for a group of three clusters, closely approached the benchmark runoff value (m3) without and with Low Impact Development, respectively.
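The estimation step above, modelling only a representative lot per cluster and scaling its runoff by the cluster size, can be sketched with made-up numbers. The lot attributes and the runoff function below are invented placeholders for the thesis's hydrological model, chosen only to show the bookkeeping.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
# Hypothetical lot attributes (e.g. normalized area, imperviousness) for 300 lots.
lots = rng.uniform(0, 1, size=(300, 2))

km = KMeans(n_clusters=15, n_init=10, random_state=1).fit(lots)

def runoff_m3(lot):
    # Placeholder for a full hydrological model run on one lot.
    return 10.0 + 50.0 * lot[0] * lot[1]

benchmark = sum(runoff_m3(lot) for lot in lots)  # model every lot (expensive)
estimate = sum(
    runoff_m3(km.cluster_centers_[k]) * np.sum(km.labels_ == k)
    for k in range(km.n_clusters)
)                                                # only 15 model runs
rel_error = abs(estimate - benchmark) / benchmark
print(round(rel_error, 4))
```

With reasonably homogeneous clusters the cluster-centre response is close to the within-cluster mean, so the scaled estimate approaches the benchmark at a fraction of the modelling cost.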




2021 ◽  
Author(s):  
Yashuang Mu ◽  
Wei Wei ◽  
Hongyue Guo ◽  
Lijun Sun

Abstract In this study, a layered parallel algorithm based on the fuzzy c-means (FCM) technique, called LP-FCM, is proposed in the Map-Reduce framework for data clustering problems. The LP-FCM comprises three layers. The first layer follows a parallel data partitioning method, developed to randomly divide the original dataset into several subdatasets. The second layer uses a parallel cluster-center searching method based on Map-Reduce: the classic FCM algorithm searches for the cluster centers of each subdataset in the Map phase, and all the centers are gathered in the Reduce phase, where the FCM technique is applied again to confirm the final cluster centers. The third layer implements a parallel data clustering method based on the final cluster centers. Feasibility, in terms of clustering accuracy, is evaluated by comparison with classic sequential clustering algorithms with random initialization, including K-means, K-medoids, FCM, and MinMax k-means, on 20 benchmark datasets. Furthermore, clustering time and parallel performance are tested on generated large-scale datasets.
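The three layers can be sketched in plain NumPy, with the Map-Reduce machinery replaced by a loop over subdatasets. This is a minimal single-machine illustration of the layered idea, not the LP-FCM implementation: the FCM routine below uses a deterministic farthest-first initialization (an assumption, chosen for reproducibility) rather than the paper's random initialization.

```python
import numpy as np

def farthest_first(X, c):
    # Deterministic init: start from the first point, then repeatedly add
    # the point farthest from the centers chosen so far.
    centers = [X[0]]
    for _ in range(c - 1):
        d = np.linalg.norm(X[:, None] - np.asarray(centers)[None], axis=2).min(axis=1)
        centers.append(X[np.argmax(d)])
    return np.asarray(centers)

def fcm(X, c, m=2.0, n_iter=60):
    # Plain fuzzy c-means; returns the c cluster centers.
    centers = farthest_first(X, c)
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None] - centers[None], axis=2) + 1e-12
        U = d ** (-2.0 / (m - 1.0))
        U /= U.sum(axis=1, keepdims=True)          # fuzzy memberships
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
    return centers

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.2, size=(60, 2)) for c in (0.0, 2.0, 4.0)])

# Layer 1: randomly partition the data into subdatasets (the Map split).
parts = np.array_split(rng.permutation(X), 3)
# Layer 2: FCM per subdataset ("Map"), then FCM again over the gathered centers ("Reduce").
gathered = np.vstack([fcm(p, 3) for p in parts])
final_centers = fcm(gathered, 3)
# Layer 3: assign every point to its nearest final center.
labels = np.linalg.norm(X[:, None] - final_centers[None], axis=2).argmin(axis=1)
print(np.bincount(labels))
```

Because each Map task only touches its own subdataset and the Reduce step clusters a handful of centers, the expensive passes over the data parallelize naturally.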


Author(s):  
Tahmim Jeba ◽  
Tarek Mahmud ◽  
Pritom S. Akash ◽  
Nadia Nahar

Code smells are indicators of flaws in the design and development phases that decrease the maintainability and reusability of a system. A system with an uneven distribution of responsibilities among its classes is produced by one of the most hazardous code smells, the God Class. To address this issue, an extract class refactoring technique is proposed that incorporates both cohesion and contextual aspects of a class. In this work, greater emphasis is placed on the code documentation to extract classes with higher contextual similarity. First, the source code is analyzed to generate a set of clusters of extracted methods. Second, another set of clusters is generated by analyzing the code documentation. These two sets are then merged into a final cluster set used to extract the God Class. Finally, an automatic refactoring approach is followed to build the newly identified classes. Using two different metrics, a comparative analysis shows that cohesion among the classes increases when context is added to the refactoring process. Moreover, a manual inspection confirms that the methods of the refactored classes are contextually organized. This recommendation of God Class extraction can significantly reduce the developers' burden of refactoring on their own and help them maintain software systems.
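The abstract does not spell out the merge step, but one simple way to combine two clusterings of the same methods is to intersect them, so that two methods stay together only when both the structural view and the documentation view agree. The method names and cluster assignments below are invented for illustration; the paper's actual merge procedure may differ.

```python
from collections import defaultdict

# Hypothetical God Class methods with two independent cluster assignments:
# one from source-code analysis, one from documentation analysis.
structural = {"load": 0, "save": 0, "render": 1, "draw": 1, "auth": 2}
contextual = {"load": 0, "save": 0, "render": 0, "draw": 1, "auth": 1}

# Intersect the two partitions: group methods by their pair of labels.
merged = defaultdict(set)
for method in structural:
    merged[(structural[method], contextual[method])].add(method)

final_clusters = [sorted(c) for c in merged.values()]
print(final_clusters)
```

Here "load" and "save" end up together because both views agree, while "render" and "draw" are split apart because the documentation view separates them.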


2020 ◽  
Vol 17 (1) ◽  
pp. 72
Author(s):  
Natelda Rosaldiah Timisela
Keyword(s):  
Z Score ◽  

<p>This study aims to identify market segments for organic vegetables. The research sample consisted of organic vegetable farmers and traders taken by census, 13 respondents each, while the consumer sample of 40 people was taken incidentally. The final cluster analysis shows that in cluster 1 the negative z-scores are for age and occupation, while in cluster 2 the negative z-scores are for education and income. This means that in cluster 1, consumers' age and occupation play only a minimal role in the choice of organic vegetables for consumption; likewise, in cluster 2, consumers' education and income are still too low for organic vegetables to be chosen as a consumer need. The largest F value (56.156) is seen for the education z-score, with a significance of 0.000, which is statistically significant. This means that education most strongly differentiates the characteristics of the two clusters; in other words, education differs markedly between cluster 1 and cluster 2.</p>


2020 ◽  
Author(s):  
Nicole Christine Nelson ◽  
Kelsey Ichikawa ◽  
Julie Chung ◽  
Momin Malik

Addressing issues with the reproducibility of results is critical for scientific progress, but conflicting ideas about the sources of and solutions to irreproducibility are a barrier to change. Prior work has attempted to address this problem by creating analytical definitions of reproducibility. We take a novel empirical, mixed-methods approach to understanding variation in reproducibility conversations, which yields a map of the discursive dimensions of these conversations. This analysis demonstrates that concerns about the incentive structure of science, the transparency of methods and data, and the need to reform academic publishing form the core of reproducibility discussions. We also identify three clusters of discussion that are distinct from the main group: one focused on reagents, another on statistical methods, and a final cluster focused on the heterogeneity of the natural world. Although there are discursive differences between scientific and popular articles, there are no strong differences in how scientists and journalists write about the reproducibility crisis. Our findings show that conversations about reproducibility have a clear underlying structure, despite the broad scope and scale of the crisis. Our map demonstrates the value of using qualitative methods to identify the bounds and features of reproducibility discourse, and identifies distinct vocabularies and constituencies that reformers should engage with to promote change.


2020 ◽  
Vol 497 (4) ◽  
pp. 5220-5228
Author(s):  
Weiguang Cui ◽  
Jiaqi Qiao ◽  
Romeel Davé ◽  
Alexander Knebe ◽  
John A Peacock ◽  
...  

ABSTRACT Protoclusters, which will yield galaxy clusters at lower redshift, can provide valuable information on the formation of galaxy clusters. However, identifying progenitors of galaxy clusters in observations is not an easy task, especially at high redshift. Different priors have been used to estimate the overdense regions that are thought to mark the locations of protoclusters. In this paper, we use mimicked Ly α-emitting galaxies at z = 5.7 to identify protoclusters in the MultiDark galaxies, which are populated by applying three different semi-analytic models to the $1\, h^{-1}\, {\rm Gpc}$ MultiDark Planck2 simulation. To compare with observational results, we extend criterion 1 (a Ly α luminosity limited sample) to criterion 2 (a match to the observed mean galaxy number density). To further statistically study the finding efficiency of this method, we enlarge the identified protocluster sample (criterion 3) to about 3500 at z = 5.7 and study their final mass distribution. The number of overdense regions and their selection probability depend on the semi-analytic models and, strongly, on the three selection criteria (partly by design). The protoclusters identified with criterion 1 are associated with a typical final cluster mass of $2.82\pm 0.92 \times 10^{15} \, \rm {M_{\odot }}$, which is in agreement with the prediction (within ±1σ) for an observed massive protocluster at z = 5.7. Identifying more protoclusters allows us to investigate the efficiency of this method, which is more suitable for identifying the most massive clusters: completeness ($\mathbb {C}$) drops rapidly with decreasing halo mass. We further find that it is hard to achieve high purity ($\mathbb {P}$) and high completeness simultaneously.
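The two diagnostics at the end of the abstract can be illustrated with toy set arithmetic: counting a candidate overdense region as a hit when it ends up in a true massive cluster, completeness is the fraction of true clusters recovered and purity is the fraction of candidates that are genuine. The IDs below are invented, and the paper's precise matching definitions may differ; this sketch only shows why the two quantities trade off.

```python
# Hypothetical halo IDs: candidates flagged at z = 5.7 vs. the true
# massive clusters they should trace at z = 0.
identified = {101, 102, 103, 205, 301}          # candidate protoclusters
true_clusters = {101, 102, 103, 104, 105, 106}  # massive z = 0 clusters

hits = identified & true_clusters
completeness = len(hits) / len(true_clusters)   # true clusters recovered
purity = len(hits) / len(identified)            # candidates that are genuine
print(completeness, purity)  # 0.5 0.6
```

Loosening the selection adds candidates, which can raise completeness while diluting purity, matching the abstract's finding that both are hard to keep high at once.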


2020 ◽  
Vol 10 (3) ◽  
pp. 579-585
Author(s):  
Hui Zhang ◽  
Hongjie Zhang

Accurate segmentation of brain tissue has important guiding significance and practical application value for the diagnosis of brain diseases. Brain magnetic resonance imaging (MRI) data have high dimensionality and large sample sizes, creating considerable computational complexity in image processing. To process large sample data efficiently, this article integrates a proposed block clustering strategy with the classic fuzzy C-means (FCM) clustering algorithm into a block-based integrated FCM clustering algorithm (BI-FCM). The algorithm first divides each image into blocks and clusters each subimage using FCM. The cluster centers of all subimages are then clustered again using FCM to obtain the final cluster centers. Finally, the distance from each pixel to each final cluster center is computed, and pixels are assigned accordingly. The dataset used in this experiment is the Simulated Brain Database (SBD). The results show that the BI-FCM algorithm handles the large-sample processing problem well and that the underlying theory is simple and effective.

