The utility of clusters and a Hungarian clustering algorithm

Implicit in the k–means algorithm is a way to assign a value, or utility, to a cluster of points: take the centroid of the points, and the value of the cluster is the sum of squared distances from the centroid to each point in the cluster. The aim of this paper is to introduce an alternative way to assign a value to a cluster, and motivation for it is provided. Moreover, whereas the k–means algorithm has no natural way to determine k when it is unknown, our method of evaluating a cluster can be used to find good clusters sequentially. The idea uses optimization over permutations: clusters are given by the cycles of a permutation generated by the Hungarian algorithm.
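The two ingredients above can be sketched in Python: the k–means value of a cluster (its within-cluster sum of squared distances to the centroid) and one plausible reading of the Hungarian idea. The abstract does not give the authors' exact utility, so the clustering rule here is an assumption: the cost matrix holds pairwise Euclidean distances, self-assignment is forbidden with a large diagonal penalty (otherwise the identity permutation is trivially optimal), and each cycle of the cheapest permutation is taken as one cluster. The function names are hypothetical; the Hungarian step is the standard O(n^3) Kuhn–Munkres method with potentials.

```python
import math


def hungarian(cost):
    """Minimum-cost assignment for a square cost matrix (Kuhn-Munkres with potentials).

    Returns assignment[i] = column matched to row i.
    """
    n = len(cost)
    INF = math.inf
    u = [0.0] * (n + 1)  # row potentials (1-indexed; index 0 is a sentinel)
    v = [0.0] * (n + 1)  # column potentials
    p = [0] * (n + 1)    # p[j] = row currently matched to column j
    way = [0] * (n + 1)  # previous column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:  # reached a free column
                break
        while j0:  # flip matches along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[p[j] - 1] = j - 1
    return assignment


def cycles(perm):
    """Split a permutation into its cycles; each cycle is read as one cluster."""
    seen, out = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        cycle, i = [], start
        while i not in seen:
            seen.add(i)
            cycle.append(i)
            i = perm[i]
        out.append(cycle)
    return out


def hungarian_clusters(points):
    """Hypothetical rule: cycles of the cheapest permutation with self-matches forbidden."""
    n = len(points)
    dist = [[math.dist(a, b) for b in points] for a in points]
    big = sum(map(sum, dist)) + 1.0  # costlier than any permutation that avoids the diagonal
    for i in range(n):
        dist[i][i] = big
    return cycles(hungarian(dist))


def kmeans_value(cluster):
    """k-means value of a cluster: sum of squared distances to its centroid."""
    centroid = [sum(col) / len(cluster) for col in zip(*cluster)]
    return sum(math.dist(p, centroid) ** 2 for p in cluster)


pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
groups = hungarian_clusters(pts)
print(groups)  # [[0, 1], [2, 3]]
print([kmeans_value([pts[i] for i in g]) for g in groups])  # [0.5, 0.5]
```

For larger inputs, `scipy.optimize.linear_sum_assignment` solves the same assignment problem and could replace the hand-written `hungarian`.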

Motivated Bayesians: Feeling Moral While Acting Egoistically

The Journal of Economic Perspectives ◽

10.1257/jep.30.3.189 ◽

2016 ◽

Vol 30 (3) ◽

pp. 189-212 ◽

Cited By ~ 47

Author(s):

Francesca Gino ◽

Michael I. Norton ◽

Roberto A. Weber

Keyword(s):

Utility Function ◽

Empirical Evidence ◽

Ample Evidence ◽

Clear Cut ◽

Economic Framework ◽

A Value ◽

Psychology And Economics ◽

Self Interest ◽

Growing Body ◽

Natural Way

Research yields ample evidence that individuals' behavior often reflects an apparent concern for moral considerations. A natural way to interpret evidence of such motives using an economic framework is to add an argument to the utility function such that agents obtain utility both from outcomes that yield only personal benefits and from acting kindly, honestly, or according to some other notion of “right.” Indeed, such interpretations can account for much of the existing empirical evidence. However, a growing body of research at the intersection of psychology and economics produces findings inconsistent with such straightforward, preference-based interpretations of moral behavior. In particular, while people are often willing to take a moral action that imposes personal material costs when confronted with a clear-cut choice between “right” and “wrong,” such decisions often seem to be dramatically influenced by the specific contexts in which they occur. Specifically, when the context provides sufficient flexibility to allow a plausible justification that one can act egoistically while remaining moral, people seize on such opportunities to prioritize self-interest at the expense of morality. In other words, people who appear to exhibit a preference for being moral may in fact be placing a value on feeling moral, often accomplishing this goal by manipulating the way they process information to justify egoistic actions while maintaining this feeling of morality.

Fuzzy clustering and fuzzy c-means partition cluster analysis and validation studies on a subset of citescore dataset

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v9i4.pp2760-2770 ◽

2019 ◽

Vol 9 (4) ◽

pp. 2760

Author(s):

K. Varada Rajkumar ◽

Adimulam Yesubabu ◽

K. Subrahmanyam

Keyword(s):

Cluster Analysis ◽

Fuzzy Clustering ◽

Time Complexity ◽

Clustering Algorithm ◽

Fuzzy Cluster ◽

Fuzzy Cluster Analysis ◽

Fuzzy C Means ◽

A Value ◽

Data Points ◽

Partition Clustering

A hard partition clustering algorithm assigns each point, including points equally distant from two clusters, to exactly one cluster. Fuzzy cluster analysis instead assigns membership coefficients, so that data points equidistant between two clusters can belong to more than one cluster at the same time. For a subset of the CiteScore dataset, fuzzy clustering (fanny) and fuzzy c-means (fcm) algorithms were implemented to study the data points that lie equally distant between clusters. Before analysis, the clusterability of the dataset was evaluated with the Hopkins statistic, which resulted in 0.4371, a value < 0.5, indicating that the data is highly clusterable. The optimal number of clusters was determined using the NbClust package, where nine different indices proposed a 3-cluster solution as best. Further, an appropriate value of the fuzziness parameter m was evaluated by varying m from 1 to 2 and examining the resulting distribution of membership values. The coefficient of variation (CV), also known as relative variability, was evaluated to study the spread of the data. The time complexity of the fuzzy clustering (fanny) and fuzzy c-means algorithms was evaluated by keeping the number of data points constant and varying the number of clusters.
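The membership idea above can be sketched with the standard fuzzy c-means updates. This is a minimal pure-Python sketch, not the R `fanny` or `fcm` implementations the study used: centers are weighted means with weights u^m, and each membership is u_j = 1 / Σ_k (d_j / d_k)^(2/(m-1)). The function names, the random initialisation, and the toy points are hypothetical; m must be strictly greater than 1, and memberships become crisp as m approaches 1.

```python
import math
import random


def memberships(p, centers, m):
    """FCM memberships of point p: u_j = 1 / sum_k (d_j / d_k) ** (2 / (m - 1))."""
    d = [math.dist(p, c) for c in centers]
    if any(x == 0 for x in d):
        # p sits exactly on one or more centers: share membership among them
        hits = [x == 0 for x in d]
        return [1 / sum(hits) if h else 0.0 for h in hits]
    e = 2 / (m - 1)
    return [1 / sum((dj / dk) ** e for dk in d) for dj in d]


def fcm(points, c, m=2.0, iters=100, seed=0):
    """Fuzzy c-means: alternate weighted-center and membership updates."""
    rng = random.Random(seed)
    dim = len(points[0])
    u = []
    for _ in points:
        row = [rng.random() for _ in range(c)]
        u.append([x / sum(row) for x in row])  # each membership row sums to 1
    for _ in range(iters):
        centers = []
        for j in range(c):
            w = [row[j] ** m for row in u]  # fuzzified weights u_ij ** m
            centers.append([sum(wi * p[k] for wi, p in zip(w, points)) / sum(w)
                            for k in range(dim)])
        u = [memberships(p, centers, m) for p in points]
    return centers, u


centers, u = fcm([(0, 0), (0, 1), (10, 0), (10, 1)], c=2)
print(sorted(centers))
print(memberships((5, 0.5), [[0, 0.5], [10, 0.5]], 2.0))  # [0.5, 0.5]
```

A point exactly halfway between two centers receives membership 0.5 in each, which is the situation the abstract contrasts with hard partitioning.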

An Experimental Study on Centrality Measures Using Clustering

Computers ◽

10.3390/computers10090115 ◽

2021 ◽

Vol 10 (9) ◽

pp. 115

Author(s):

Péter Marjai ◽

Bence Szabari ◽

Attila Kiss

Keyword(s):

Clustering Algorithm ◽

Graph Clustering ◽

Great Accuracy ◽

Centrality Measures ◽

Graph Database ◽

Modern Life ◽

Original Graph ◽

Large Graphs ◽

A Value ◽

Networks Biology

Graphs can be found in almost every part of modern life: social networks, road networks, biology, and so on. Finding the most important nodes is a vital issue. To date, numerous centrality measures have been proposed to address this problem; however, each has its drawbacks, for example, not scaling well on large graphs. In this paper, we investigate the ranking efficiency and the execution time of a method that uses graph clustering to reduce the time needed to identify the vital nodes. Graph clustering groups neighboring nodes into communities. These groups are then used to create subgraphs of the original graph, which are smaller and easier to measure. To assess the efficiency, we investigate different aspects of accuracy. First, we compare the top 10 nodes produced by the original closeness and betweenness methods with the nodes produced by this method. Then, we examine what percentage of the first n nodes are equal between the original and the clustered ranking. Centrality measures also assign a value to each node, so lastly we investigate the sum of the centrality values of the top n nodes. We also evaluate the runtime of the investigated method and of the original measures in a plain implementation, using a graph database. Based on our experiments, our method greatly reduces the time consumption of the investigated centrality measures, especially with the Louvain algorithm. The first accuracy experiment showed that examining only the top 10 nodes is not sufficient to properly evaluate precision. The second experiment showed that the investigated method, combined with the Paris algorithm, has around 45–60% accuracy for betweenness centrality. On the other hand, the last experiment showed that the investigated method has great accuracy for closeness centrality, especially with the Louvain clustering algorithm.
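The cluster-then-measure idea and the top-n comparison can be sketched in pure Python for closeness centrality. This is a simplified sketch under stated assumptions, not the paper's implementation: the communities are given by hand (in the study they come from algorithms such as Louvain or Paris, which are not implemented here), closeness is computed with BFS as (reachable nodes − 1) / sum of distances inside each community's induced subgraph, and betweenness is not covered. All names and the toy graph are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source, allowed):
    """Hop distances from source, walking only through nodes in `allowed`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in allowed and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness(adj, nodes):
    """Closeness (reachable - 1) / sum of distances, restricted to the given nodes."""
    nodes = set(nodes)
    scores = {}
    for s in nodes:
        dist = bfs_distances(adj, s, nodes)
        total = sum(dist.values())
        scores[s] = (len(dist) - 1) / total if total else 0.0
    return scores


def clustered_closeness(adj, communities):
    """Score each node only inside the subgraph induced by its own community."""
    scores = {}
    for community in communities:
        scores.update(closeness(adj, community))
    return scores


def top_n_overlap(a, b, n):
    """Fraction of top-n nodes (ties broken by node id) shared by two rankings."""
    def top(scores):
        return set(sorted(scores, key=lambda k: (-scores[k], k))[:n])
    return len(top(a) & top(b)) / n


# Two triangles joined by the bridge edge 2-3 (hypothetical toy graph).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
full = closeness(adj, adj)
clustered = clustered_closeness(adj, [{0, 1, 2}, {3, 4, 5}])
print(full[2], clustered[2], top_n_overlap(full, clustered, 2))
```

In this toy graph the bridge nodes 2 and 3 rank highest on the full graph, but inside their own triangles every node scores 1.0, so the top-2 overlap drops to 0.0. Small examples like this show why the choice of clustering and the size of n both affect measured accuracy.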

Davies Bouldin Index Algorithm for Optimizing Clustering Case Studies Mapping School Facilities

TEM Journal ◽

10.18421/tem103-13 ◽

2021 ◽

pp. 1099-1103

Author(s):

Yudhistira Arie Wijaya ◽

Dedy Achmad Kurniady ◽

Eddy Setyanto ◽

Wahdan Sanur Tarihoran ◽

Dadan Rusmana ◽

...

Keyword(s):

High School ◽

Clustering Algorithm ◽

School Facilities ◽

Vocational High School ◽

Number Of Clusters ◽

Level Of Education ◽

A Value ◽

Cluster Set ◽

The Government ◽

Government Website

A lower Davies Bouldin Index (DBI) indicates a better clustering, so the DBI can serve as the criterion for selecting the best cluster set. The purpose of this research is to optimize the clustering results using the DBI. The data, obtained from the government website (https://www.bps.go.id), are the number of villages that have school facilities at each level of education, namely senior high school and vocational high school. The method used is k-means. The results show that among the tested numbers of clusters (k = 2, 3, 4, 5, 6), the optimal DBI is obtained for k = 2, with a value of 0.168 (Measure Type = Mixed Measures). For k = 2, the mapping yields L0 (low) = 31 provinces and L1 (high) = 3 provinces. The final centroids obtained for the clusters are L0 (315 and 155) and L1 (1710 and 1259). Based on the results of mapping by optimizing k-means and DBI, more than 90% of the villages still have school facilities, especially at the high school and vocational high school levels.
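The DBI criterion used above can be computed directly. A minimal sketch, assuming numeric points with Euclidean distances (the study's "Mixed Measures" tool setting is not reproduced): for each cluster i, the scatter s_i is the mean distance of its points to the centroid. The ratio R_ij = (s_i + s_j) / d(c_i, c_j) measures how badly clusters i and j overlap, and the DBI averages the worst ratio of every cluster. The function names and toy clusters are hypothetical; scikit-learn's `sklearn.metrics.davies_bouldin_score` implements the same definition.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return [sum(col) / len(points) for col in zip(*points)]


def davies_bouldin(clusters):
    """Davies-Bouldin index of a partition given as lists of points (lower is better)."""
    cents = [centroid(c) for c in clusters]
    # s_i: mean distance of cluster i's points to its centroid
    scatter = [sum(math.dist(p, ctr) for p in c) / len(c)
               for c, ctr in zip(clusters, cents)]
    k = len(clusters)
    worst = []
    for i in range(k):
        # keep the largest (worst) similarity ratio R_ij for cluster i
        worst.append(max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                         for j in range(k) if j != i))
    return sum(worst) / k


loose = [[(0, 0), (0, 2)], [(10, 0), (10, 2)]]
tight = [[(0, 0), (0, 1)], [(10, 0), (10, 1)]]
print(davies_bouldin(loose), davies_bouldin(tight))  # 0.2 0.1
```

Choosing k by running k-means for each candidate k and keeping the partition with the smallest index is the selection rule the abstract describes.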

Exploring the potential microRNA sponge interactions of breast cancer based on some known interactions

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720020500079 ◽

2020 ◽

Vol 18 (03) ◽

pp. 2050007

Author(s):

Lei Tian ◽

Shu-Lin Wang

Keyword(s):

Breast Cancer ◽

Clustering Algorithm ◽

Human Cancer ◽

Classification Performance ◽

New Method ◽

Superior Performance ◽

Mirna Sponge ◽

A Value ◽

Microrna Sponge ◽

Mirna Sponges

MicroRNA (miRNA) sponge regulatory mechanisms play an important role in the development of human cancer. Here, we develop a new method to explore potential miRNA sponge interactions (EPMSIs) in breast cancer. Starting from some known interactions and a matched gene expression profile, EPMSIs infers other potential miRNA sponge interactions for breast cancer. Each interaction is inferred with a value representing its intensity. We then apply a clustering algorithm called BCPlaid to the potential interactions. Ten modules are identified, nine of which are closely associated with biological enrichments. When a classification algorithm is employed to separate normal and tumor samples within each module, each module demonstrates strong classification performance. Overall, EPMSIs provides a new method, with superior performance, for exploring the miRNA sponge regulatory network in breast cancer.

Automatic group-wise whole-brain short association fiber bundle labeling based on clustering and cortical surface information

10.21203/rs.2.20420/v1 ◽

2020 ◽

Author(s):

Andrea Vázquez ◽

Narciso López-López ◽

Josselin Houenou ◽

Cyril Poupon ◽

Jean-François Mangin ◽

...

Keyword(s):

White Matter ◽

Execution Time ◽

Fiber Bundle ◽

Clustering Algorithm ◽

Distance Measure ◽

Clustering Methods ◽

Good Correspondence ◽

Hungarian Algorithm ◽

Brain White Matter ◽

Fiber Clustering

Abstract Background: Diffusion MRI is the preferred non-invasive in vivo modality for studying brain white matter connections. Tractography datasets contain 3D streamlines that can be analyzed to study the main brain white matter tracts. Fiber clustering methods have been used to automatically regroup similar fibers into clusters. However, due to inter-subject variability and artifacts, the resulting clusters are difficult to process when looking for common connections across subjects, especially for superficial white matter. Methods: We present an automatic method for labeling short association bundles in a group of subjects. The method is based on an intra-subject fiber clustering that generates compact fiber clusters. Subsequently, the clusters are labeled based on the cortical connectivity of the fibers, taking the Desikan-Killiany atlas as reference, and named according to their relative position along one axis. Finally, two different strategies were applied and compared for labeling inter-subject bundles: matching with the Hungarian algorithm, and a well-known fiber clustering algorithm called QuickBundles. Results: Individual labeling was executed over four subjects, with an execution time of 3.6 minutes. An inspection of individual labeling based on a distance measure showed good correspondence among the four tested subjects. Two inter-subject labeling methods were successfully implemented, applied to 20 subjects, and compared using a set of distance thresholds ranging from a conservative value of 10 mm to a moderate value of 21 mm. The Hungarian algorithm led to high correspondence but low reproducibility at all thresholds, with an execution time of 96 seconds. QuickBundles led to better correspondence and reproducibility, with a shorter execution time of 9 seconds. Overall, the whole processing for the inter-subject labeling over 20 subjects takes 1.17 hours.
Conclusion: We implemented a method for the automatic labeling of short bundles in individuals, based on an intra-subject clustering and the connectivity of the clusters with the cortex. The labels provide useful information for the visualization and analysis of individual connections, which is very difficult without any additional information. Furthermore, we provide two fast inter-subject bundle labeling methods. The obtained clusters could be used for performing manual or automatic connectivity analysis in individuals or across subjects. Keywords: fiber labeling; clustering; fiber bundle; tractography; superficial white matter
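The QuickBundles strategy mentioned above can be sketched as a one-pass greedy clustering. This is a simplified pure-Python sketch of the published QuickBundles idea (Garyfallidis et al.), not the pipeline used in this study. It assumes every streamline has already been resampled to the same number of 3D points in millimetres, and it uses the minimum average direct-flip (MDF) distance, so a fiber stored in reverse order still matches its bundle. The names, toy fibers, and the 10 mm threshold in the example are hypothetical; the abstract's 10–21 mm thresholds were used to compare inter-subject labels. DIPY provides a full implementation as `dipy.segment.clustering.QuickBundles`.

```python
import math


def mdf(a, b):
    """Minimum average direct-flip distance between two equal-length streamlines."""
    direct = sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)
    flipped = sum(math.dist(p, q) for p, q in zip(a, reversed(b))) / len(a)
    return min(direct, flipped)


def quickbundles(streamlines, threshold):
    """Assign each streamline to the nearest bundle centroid, or start a new bundle."""
    centroids, members = [], []
    for idx, s in enumerate(streamlines):
        dists = [mdf(s, c) for c in centroids]
        best = min(range(len(dists)), key=dists.__getitem__, default=None)
        if best is None or dists[best] > threshold:
            centroids.append([list(p) for p in s])  # start a new bundle
            members.append([idx])
            continue
        c = centroids[best]
        # orient s like the centroid before averaging it in
        same = sum(math.dist(p, q) for p, q in zip(c, s))
        flip = sum(math.dist(p, q) for p, q in zip(c, reversed(s)))
        aligned = s if same <= flip else s[::-1]
        k = len(members[best])
        centroids[best] = [[(k * cp + ap) / (k + 1) for cp, ap in zip(cpt, apt)]
                           for cpt, apt in zip(c, aligned)]
        members[best].append(idx)
    return centroids, members


fibers = [
    [(0, 0, 0), (5, 0, 0), (10, 0, 0)],
    [(10, 1, 0), (5, 1, 0), (0, 1, 0)],  # same path, stored in reverse
    [(0, 30, 0), (5, 30, 0), (10, 30, 0)],
]
centroids, members = quickbundles(fibers, threshold=10.0)
print(members)  # [[0, 1], [2]]
```

A single pass over the streamlines, with no pairwise optimization, is consistent with QuickBundles running much faster than the Hungarian matching in the reported timings.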

The Level of Student Satisfaction with the Online Learning Process During a Pandemic Using the K-means Algorithm

Jurnal INFORM ◽

10.25139/inform.v6i2.3945 ◽

2021 ◽

Vol 6 (2) ◽

pp. 123-126

Author(s):

Talitha Syahla Janiar Arifin ◽

Nakia Natassa ◽

Dinda Khoirunnisa ◽

Retno Hendrowati

Keyword(s):

Online Learning ◽

Student Satisfaction ◽

Research Methods ◽

No Value ◽

Learning Process ◽

Quantitative Research ◽

Clustering Algorithm ◽

Quantitative Research Methods ◽

A Value ◽

Level Of Satisfaction

During the pandemic, the number of Covid-19 cases increased and grew out of control every day. This prompted the Indonesian government to set policies requiring schools to use online learning methods. However, online learning does not run smoothly in all circles, because several factors hinder the learning process. Difficulties with the internet network, limited data quotas, unfamiliarity with learning media, and an unsupportive environment are obstacles that make online learning ineffective. The purpose of this study was to determine the level of satisfaction with online learning during the pandemic. This study uses quantitative research methods with a descriptive approach, and the quantitative data are processed by data mining with the K-Means Clustering Algorithm. The clustering process groups the levels of student satisfaction. The dataset was obtained from a questionnaire containing statements of satisfaction and dissatisfaction. The clusters are high, medium, and low. In the final iteration, the satisfied statements were categorized as high, with a value of 11.79, compared with the dissatisfied statements, which were categorized as moderate, with a value of 7.46. In contrast, the low category had no value (0.00). The cluster results indicate that students are satisfied with online learning, with a value of 9.33.
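The K-Means step described above can be sketched with Lloyd's algorithm: assign every point to its nearest centroid, recompute each centroid as the mean of its points, and stop when the centroids no longer change. The scores below are hypothetical and are not the study's questionnaire data. The initial centroids are supplied by hand for reproducibility, and naming clusters low/medium/high by sorting their centroids is an assumption about how such labels are usually attached.

```python
import math


def kmeans(points, init, iters=100):
    """Lloyd's k-means: alternate nearest-centroid assignment and centroid means."""
    centroids = [list(c) for c in init]
    groups = []
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            groups[nearest].append(p)
        # keep an empty cluster's old centroid instead of dividing by zero
        new = [[sum(col) / len(g) for col in zip(*g)] if g else c
               for g, c in zip(groups, centroids)]
        if new == centroids:
            break  # assignments are stable
        centroids = new
    return centroids, groups


scores = [(x,) for x in [2, 3, 4, 9, 10, 11, 18, 19, 20]]  # hypothetical totals
centroids, groups = kmeans(scores, init=[(0,), (10,), (20,)])
order = sorted(range(len(centroids)), key=lambda j: centroids[j][0])
labels = dict(zip(order, ["low", "medium", "high"]))
print(centroids, labels)  # [[3.0], [10.0], [19.0]] {0: 'low', 1: 'medium', 2: 'high'}
```

Lloyd's algorithm only finds a local optimum that depends on the initial centroids, which is one reason studies like this report the values reached at the final iteration.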

Corrections to Yukawa couplings from higher dimensional operators in a natural SUSY SO(10) and HL-LHC implications

Journal of High Energy Physics ◽

10.1007/jhep01(2021)047 ◽

2021 ◽

Vol 2021 (1) ◽

Author(s):

Amin Aboubrahim ◽

Pran Nath ◽

Raza M. Syed

Keyword(s):

Gauge Group ◽

Hadron Collider ◽

High Energy ◽

Spontaneous Breaking ◽

Electroweak Symmetry ◽

Yukawa Couplings ◽

A Value ◽

Higher Dimensional ◽

Higgs Fields ◽

Natural Way

Abstract We consider a class of unified models based on the gauge group SO(10) which, with an appropriate choice of Higgs representations, generate in a natural way a pair of light Higgs doublets needed to accomplish electroweak symmetry breaking. In this class of models, higher dimensional operators of the form matter-matter-Higgs-Higgs in the superpotential, after spontaneous breaking of the GUT symmetry, generate contributions to Yukawa couplings which are comparable to those from cubic interactions. Specifically, we consider an SO(10) model with a sector of heavy Higgs fields consisting of 126 + $\overline{126}$ + 210, which breaks the GUT symmetry down to the standard model gauge group, and a sector of light Higgs fields consisting of 2 × 10 + 120. In this model we compute the corrections from the quartic interactions to the Yukawa couplings of the top and bottom quarks and the tau lepton. It is then shown that including these corrections in the GUT-scale Yukawas makes the top, bottom, and tau masses consistent with experiment for low tan β, with tan β as low as 5–10. We compute the sparticle spectrum for a set of benchmarks and find that the relic density is satisfied via a compressed spectrum and coannihilation, with three types of coannihilation appearing: chargino-neutralino, stop-neutralino, and stau-neutralino. We investigate chargino-neutralino coannihilation in detail for the possibility of observing the light chargino at the high luminosity LHC (HL-LHC) and at the high energy LHC (HE-LHC), a possible future 27 TeV hadron collider. It is shown that all benchmark models but one can be discovered at the HL-LHC and all would be discoverable at the HE-LHC. The models discoverable at both machines require a much shorter time scale and a lower integrated luminosity at the HE-LHC.
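Why low tan β is notable can be seen from the standard tree-level relations of the minimal supersymmetric standard model, which are textbook background and not taken from the paper. The up-type doublet gives mass to the top, the down-type doublet to the bottom and tau, and tan β is the ratio of their vacuum expectation values (convention v ≈ 174 GeV):

```latex
m_t = y_t\, v_u, \qquad m_b = y_b\, v_d, \qquad m_\tau = y_\tau\, v_d, \qquad
\tan\beta = \frac{v_u}{v_d}, \qquad v_u^2 + v_d^2 = v^2 \approx (174\ \text{GeV})^2
\quad\Longrightarrow\quad \frac{y_b}{y_t} = \frac{m_b}{m_t}\,\tan\beta .
```

Ignoring renormalization-group running, nearly equal top and bottom Yukawas (as in minimal SO(10) Yukawa unification) would push tan β toward m_t/m_b, i.e. large values. At tan β ≈ 5–10 the bottom Yukawa must instead be much smaller than the top Yukawa, which is the kind of splitting the computed quartic corrections help supply.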

Multiplatform collaborative detection resource scheduling method using K‐means clustering algorithm and Hungarian algorithm

Concurrency and Computation Practice and Experience ◽

10.1002/cpe.6075 ◽

2020 ◽

Author(s):

Tianquan Ni ◽

Yi Jiang

Keyword(s):

Clustering Algorithm ◽

Resource Scheduling ◽

Hungarian Algorithm ◽

Scheduling Method ◽

Collaborative Detection

Application of K-Means Clustering Algorithm for Determination of Fire-Prone Areas Utilizing Hotspots in West Kalimantan Province

International Journal of Advances in Data and Information Systems ◽

10.25008/ijadis.v1i1.7 ◽

2020 ◽

Vol 1 (1) ◽

pp. 9-16

Author(s):

Nabila Amalia Khairani ◽

Edi Sutoyo

Keyword(s):

Data Mining ◽

Forest Fires ◽

Clustering Algorithm ◽

Social Aspects ◽

Mining Method ◽

Clustering Method ◽

West Kalimantan ◽

A Value ◽

The Impact

Forest and land fires are disasters that often occur in Indonesia. In 2007, 2012, and 2015, forest fires in Sumatra and Kalimantan attracted global attention because they brought smog pollution to neighboring countries. One of the regions with the most fire hotspots is West Kalimantan Province. Forest and land fires have an impact on health, especially for the communities around the fires, as well as on economic and social aspects. One way to address this is to identify the locations of fire areas and analyze the causes of forest and land fires. Given these impacts, the purpose of this study is to apply clustering with the k-means algorithm to determine hotspot-prone areas in West Kalimantan Province, and to evaluate the resulting clusters. Data mining is a suitable method for extracting information on hotspot areas. The data mining method used is clustering, because it can turn hotspot data into information about areas prone to hotspots. The k-means algorithm groups data based on similar characteristics. The hotspot data were grouped into 3 clusters: cluster 0 contains 284 hotspots in hazardous areas, with 215 hotspots in non-prone areas and 129 hotspots in very vulnerable areas. The clustering results were then evaluated using the Davies-Bouldin Index (DBI), giving a value of 3.112, which indicates that the 3-cluster result was not optimal.