AB0430 DEVELOP A REPLICABLE MODEL FOR RATIONAL SELECTION OF STRATEGIES IN TREAT-TO-TARGET AND MAINTAIN-BEING-TARGET: REAL WORLD DATA MINING VIA SMART SYSTEM OF DISEASE MANAGEMENT (SSDM)

Author(s): Rong Mu, LI Chun, Jing Yang, Xiaohan Wang, Bin Wu, ...

Author(s): Deepali Virmani, Nikita Jain, Ketan Parikh, Shefali Upadhyaya, Abhishek Srivastav

This article examines how data become relevant when they can be organized, linked with other data, and grouped into clusters. Clustering is the process of organizing a given set of objects into disjoint groups called clusters. There are a number of clustering algorithms, such as k-means, k-medoids, and normalized k-means, so the focus is on the efficiency and accuracy of these algorithms, on the time clustering takes, and on reducing overlap between clusters. K-means is one of the simplest unsupervised learning algorithms that solves the well-known clustering problem. The k-means algorithm partitions data into K clusters around randomly chosen centroids; because it operates only on numeric values, it cannot directly cluster real-world data containing categorical values, and poor selection of initial centroids can result in poor clustering. This article proposes a variant of k-means that, by selecting the initial centres deliberately and normalizing the data, yields better clustering, reduced overlap, and shorter clustering time.
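As a rough illustration of the two ingredients the abstract emphasizes (normalizing the data and choosing the initial centres deliberately rather than at random), the sketch below combines min-max normalization with a k-means++-style spread-out seeding before the usual assignment/update iterations. This is a generic Python sketch under those assumptions, not the authors' published variant, and all function names are illustrative.

```python
import numpy as np

def normalize(X):
    """Min-max normalize each feature to [0, 1] so no single attribute dominates the distance."""
    mins, maxs = X.min(axis=0), X.max(axis=0)
    return (X - mins) / np.where(maxs > mins, maxs - mins, 1.0)

def pick_initial_centres(X, k, rng):
    """Spread the initial centres apart (k-means++-style) instead of drawing them uniformly at random."""
    centres = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centres], axis=0)
        centres.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centres)

def kmeans(X, k, iters=100, seed=0):
    """Plain Lloyd-style k-means on normalized data with deliberate seeding (illustrative only)."""
    rng = np.random.default_rng(seed)
    X = normalize(np.asarray(X, dtype=float))
    centres = pick_initial_centres(X, k, rng)
    for _ in range(iters):
        # Assign each point to its nearest centre.
        labels = np.argmin(((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2), axis=1)
        # Recompute each centre as the mean of its points; keep the old centre if a cluster empties.
        new_centres = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
                                for j in range(k)])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels, centres
```

Both steps address the abstract's criticisms of plain k-means: unscaled attributes distort the distance computation, and poor random initial centroids can lead to poor clustering.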


2021, Vol. 9 (8), pp. 623-623
Author(s): Fangtao Yin, Hongyu Zhu, Songlin Hong, Chen Sun, Jie Wang, ...

Entropy, 2021, Vol. 23 (12), pp. 1621
Author(s): Przemysław Juszczuk, Jan Kozak, Grzegorz Dziczkowski, Szymon Głowania, Tomasz Jach, ...

In the era of the Internet of Things and big data, we are faced with managing a flood of information. The complexity and volume of data presented to the decision-maker are enormous, and existing methods often fail to derive nonredundant information quickly, so selecting the most satisfactory set of solutions is often a struggle. This article investigates the possibility of using the entropy measure as an indicator of data difficulty. To do so, we focus on real-world data covering various fields related to markets (the real estate market and financial markets), sports data, fake news data, and more. The problem is twofold: first, since we deal with unprocessed, inconsistent data, additional preprocessing is necessary; second, we use the entropy-based measure to capture the nonredundant, noncorrelated core information in the data. The research is conducted with well-known algorithms from the classification domain to assess the quality of solutions derived from the initial preprocessing and from the information indicated by the entropy measure. Finally, the best 25% of attributes (in the sense of the entropy measure) are selected, the whole classification procedure is performed once again, and the results are compared.
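A minimal sketch of how an entropy-based ranking and the "keep the best 25% of attributes" step might look in Python; the article's exact entropy measure, discretization, and classifiers are not specified here, so this sketch uses plain Shannon entropy and information gain on discrete attributes, and every name below is illustrative rather than the authors' code.

```python
import numpy as np
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy (in bits) of a discrete sequence of values."""
    counts = np.array(list(Counter(values).values()), dtype=float)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(attribute, labels):
    """Drop in label entropy after splitting on one discrete attribute."""
    total, n = shannon_entropy(labels), len(labels)
    conditional = 0.0
    for v in set(attribute):
        subset = [l for a, l in zip(attribute, labels) if a == v]
        conditional += len(subset) / n * shannon_entropy(subset)
    return total - conditional

def select_top_quarter(columns, labels):
    """Rank attributes by the entropy-based score and keep the best 25% of them."""
    scores = [information_gain(col, labels) for col in columns]
    k = max(1, len(columns) // 4)
    return sorted(np.argsort(scores)[::-1][:k].tolist())
```

After the reduced attribute set is obtained, the same classifiers used on the full, preprocessed data can be retrained on the selected columns and the two sets of results compared, mirroring the procedure described above.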

